Cascading (Part 1): Log Parsing
This example comes from the official Cascading site, so here is the code straight away:
    package com.wyf.cascade;

    import java.util.Properties;

    import cascading.flow.Flow;
    import cascading.flow.FlowConnector;
    import cascading.operation.regex.RegexParser;
    import cascading.pipe.Each;
    import cascading.pipe.Pipe;
    import cascading.scheme.TextLine;
    import cascading.tap.Hfs;
    import cascading.tap.Lfs;
    import cascading.tap.Tap;
    import cascading.tuple.Fields;

    /**
     * Log parsing
     *
     * @author: wyf
     * @version: Jul 12, 2013 2:47:44 PM
     */
    public class LogParser {
        public static void main(String[] args) {
            String inputPath = "/home/wyf/workspace/HadoopCascading/data/apache.200.txt";
            String outputPath = "/home/wyf/workspace/HadoopCascading/data/output";

            // Create the source tap on the local file system; the default
            // TextLine scheme declares two fields, "offset" and "line"
            Tap localLogTap = new Lfs(new TextLine(), inputPath);

            // Declare the names of the fields parsed out of each log line
            Fields apacheFields = new Fields("ip", "time", "method", "event", "status", "size");

            // Define the regular expression used to parse the log file
            String apacheRegex = "^([^ ]*) +[^ ]* +[^ ]* +\\[([^]]*)\\] +\\\"([^ ]*) ([^ ]*) [^ ]*\\\" ([^ ]*) ([^ ]*).*$";

            // Set the order in which the capture groups are emitted.
            // Note: with { 5, 6, 1, 2, 3, 4 } group 5 fills the first declared
            // field ("ip"), group 6 the second, and so on; use
            // { 1, 2, 3, 4, 5, 6 } if each group should match its field name
            int[] allGroups = { 5, 6, 1, 2, 3, 4 };

            // Create the stream parser
            RegexParser parser = new RegexParser(apacheFields, apacheRegex, allGroups);

            // Create the pipe element: the pipe is named "parser" and reads the "line" field
            Pipe importPipe = new Each("parser", new Fields("line"), parser);

            // Create the sink tap; the default TextLine scheme writes all fields
            Tap remoteLogTap = new Hfs(new TextLine(), outputPath);

            // Set the jar of the current application
            Properties properties = new Properties();
            FlowConnector.setApplicationJarClass(properties, LogParser.class);

            // Connect the source tap to the sink tap through the pipe
            Flow parsedLogFlow = new FlowConnector(properties).connect(localLogTap, remoteLogTap, importPipe);

            // Start the log-parsing flow
            parsedLogFlow.start();

            // Block until the job has finished
            parsedLogFlow.complete();
        }
    }
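The regular expression and the group order from the listing above can be tried on their own with plain java.util.regex, without Cascading or Hadoop. The following is only a sketch: the class name ApacheRegexSketch, the helper extract and the sample log line are made up for illustration, and it assumes that RegexParser pairs the i-th entry of the group array with the i-th declared field.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApacheRegexSketch {
    // Same pattern as apacheRegex in LogParser
    static final String APACHE_REGEX =
        "^([^ ]*) +[^ ]* +[^ ]* +\\[([^]]*)\\] +\\\"([^ ]*) ([^ ]*) [^ ]*\\\" ([^ ]*) ([^ ]*).*$";

    // Hypothetical sample line, not taken from apache.200.txt
    static final String SAMPLE =
        "127.0.0.1 - - [12/Jul/2013:14:47:44 +0800] \"GET /index.html HTTP/1.1\" 200 2326";

    // Returns the captured values in the order given by the groups array
    static String[] extract(String line, int[] groups) {
        Matcher m = Pattern.compile(APACHE_REGEX).matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not match: " + line);
        }
        String[] values = new String[groups.length];
        for (int i = 0; i < groups.length; i++) {
            values[i] = m.group(groups[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        String[] names = { "ip", "time", "method", "event", "status", "size" };
        int[] groups = { 5, 6, 1, 2, 3, 4 }; // same order as allGroups in LogParser
        String[] values = extract(SAMPLE, groups);
        // Pair each declared name with the value in the same position
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + values[i]);
        }
    }
}
```

Under that assumption, the order { 5, 6, 1, 2, 3, 4 } places the status code under "ip" and the client address under "method". Passing { 1, 2, 3, 4, 5, 6 } instead lines each captured value up with its field name.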
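Understanding the example:
Cascading runs a job as a flow. First create a "start node" (localLogTap, the source tap), then create an "end node" (remoteLogTap, the sink tap). Strictly speaking these names are not accurate, but they will do for this example. Then connect the start node and the end node with "a line" (importPipe). OK, the flow is now ready to run.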
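The "start node, line, end node" picture can be mimicked in a few lines of plain Java. This is an analogy only, not the Cascading API: the class FlowAnalogy and its run method are hypothetical, a List stands in for each tap, and a Function stands in for the pipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FlowAnalogy {
    // Hypothetical stand-in for connecting a source, a pipe and a sink:
    // every record read from the source passes through the pipe into the sink
    static <A, B> void run(List<A> source, Function<A, B> pipe, List<B> sink) {
        for (A record : source) {
            sink.add(pipe.apply(record));
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("first line", "second line"); // "start node"
        List<Integer> sink = new ArrayList<>();                           // "end node"
        Function<String, Integer> pipe = String::length;                  // "the line" between them
        run(source, pipe, sink);
        System.out.println(sink);
    }
}
```

As in the LogParser example, nothing happens while the three pieces are being declared; work is done only once they are connected and run.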