Hadoop in Action: Chaining Multiple MapReduce Jobs
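Chaining MapReduce jobs in sequence
Form: mapreduce-1 | mapreduce-2 | mapreduce-3
- In run(), configure each new job in turn and launch it with JobClient.runJob(), which blocks until that job has completed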
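@Override
public int run(String[] args) throws Exception {
    // First job: runJob() blocks until it has completed
    JobConf job1 = new JobConf(getConf(), getClass());
    JobClient.runJob(job1);
    // Second job: starts only after the first has finished
    JobConf job2 = new JobConf(getConf(), getClass());
    JobClient.runJob(job2);
    return 0; // run() is declared to return int
}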
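The idea behind the run() method above is that each stage consumes what the previous stage produced. The following is a minimal plain-Java sketch of that data flow, not Hadoop code: the class SequentialChainSketch and the two stages upperCase and dropShort are hypothetical names invented for illustration, and in-memory lists stand in for the directories that real jobs would read and write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java stand-in for a sequential chain; no Hadoop classes are used.
class SequentialChainSketch {

    // Runs the stages in order: the output of one stage is the input of the next,
    // the way job2 would read what job1 wrote.
    static List<String> runChain(List<String> input,
                                 List<Function<List<String>, List<String>>> stages) {
        List<String> current = input;
        for (Function<List<String>, List<String>> stage : stages) {
            current = stage.apply(current); // returns only when this stage is done
        }
        return current;
    }

    // Hypothetical stage 1: upper-case every record.
    static List<String> upperCase(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            out.add(r.toUpperCase());
        }
        return out;
    }

    // Hypothetical stage 2: drop records shorter than three characters.
    static List<String> dropShort(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String r : records) {
            if (r.length() >= 3) {
                out.add(r);
            }
        }
        return out;
    }

    // Builds the two-stage chain and runs it on a small sample.
    static List<String> demo() {
        List<Function<List<String>, List<String>>> stages = new ArrayList<>();
        stages.add(SequentialChainSketch::upperCase);
        stages.add(SequentialChainSketch::dropShort);
        return runChain(Arrays.asList("ab", "hadoop", "job"), stages);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [HADOOP, JOB]
    }
}
```

In a real driver the hand-off would go through the file system: the output path of the first job would be configured as the input path of the second.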
Chaining MapReduce jobs with complex dependencies
- Managed through the Job and JobControl classes
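// For Job objects x and y
x.addDependingJob(y);       // add a dependency: x will not start until y has finished
jobControl.addJob(x);       // x and y are both managed by the JobControl object
jobControl.addJob(y);
jobControl.allFinished();   // monitoring methods of the JobControl object
jobControl.getFailedJobs();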
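To make the dependency rule concrete, here is a small plain-Java sketch of the scheduling idea, under the assumption that a job becomes runnable once all of its prerequisites have finished. It is not the JobControl implementation: DependencyOrderSketch and its string-named jobs are hypothetical, and only the method names addJob and addDependingJob are borrowed from the notes above. Real JobControl objects also track job state and are typically run in their own thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of dependency-ordered execution; no Hadoop classes are used.
class DependencyOrderSketch {
    // Job name -> names of the jobs that must finish first.
    private final Map<String, List<String>> dependsOn = new LinkedHashMap<>();

    void addJob(String name) {
        dependsOn.putIfAbsent(name, new ArrayList<>());
    }

    // "job" will not start until "prerequisite" has finished.
    void addDependingJob(String job, String prerequisite) {
        addJob(job);
        addJob(prerequisite);
        dependsOn.get(job).add(prerequisite);
    }

    // Returns an order in which every job starts only after its prerequisites.
    List<String> runOrder() {
        List<String> finished = new ArrayList<>();
        while (finished.size() < dependsOn.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                boolean ready = finished.containsAll(e.getValue());
                if (!finished.contains(e.getKey()) && ready) {
                    finished.add(e.getKey()); // "run" the job
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency");
            }
        }
        return finished;
    }

    // x depends on y; z depends on both x and y.
    static List<String> demo() {
        DependencyOrderSketch control = new DependencyOrderSketch();
        control.addDependingJob("x", "y");
        control.addDependingJob("z", "x");
        control.addDependingJob("z", "y");
        return control.runOrder();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [y, x, z]
    }
}
```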
Chaining preprocessing and postprocessing steps
Form: MAP+ | REDUCE | MAP*
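- ChainMapper/ChainReducer: reduce the amount of intermediate output that is written out
- addMapper/setReducer interface
  - job, mapperConf: the global and local JobConf objects
  - kclass: the Mapper class
  - the input and output key/value class types
  - byValue: whether the output key/value pairs (MapOutputKey and MapOutputValue) are passed to the next step by value
    - true: pass by value
    - false: pass by reference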
public static <K1, V1, K2, V2> void
addMapper(JobConf job,
Class<? extends Mapper<K1, V1, K2, V2>> kclass,
Class<? extends K1> inputKeyClass,
Class<? extends V1> inputValueClass,
Class<? extends K2> outputKeyClass,
Class<? extends V2> outputValueClass,
boolean byValue,
JobConf mapperConf)
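The byValue flag matters when a step reuses its output objects. The sketch below shows one way the difference can surface, using plain Java rather than Hadoop: MutableText is a hypothetical stand-in for a reusable Writable such as Text, and pass() imitates an upstream step that emits the same object repeatedly while the downstream step buffers what it receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of by-value versus by-reference hand-off; no Hadoop classes are used.
class ByValueSketch {

    // Minimal mutable record, standing in for a reusable Writable.
    static final class MutableText {
        private String value = "";

        void set(String v) {
            value = v;
        }

        String get() {
            return value;
        }

        MutableText copy() {
            MutableText c = new MutableText();
            c.set(value);
            return c;
        }
    }

    // The upstream step reuses ONE output object; the downstream step keeps what it is handed.
    static List<String> pass(List<String> inputs, boolean byValue) {
        MutableText reused = new MutableText();
        List<MutableText> buffered = new ArrayList<>();
        for (String in : inputs) {
            reused.set(in);
            // By value: downstream gets its own copy. By reference: it gets the shared object.
            buffered.add(byValue ? reused.copy() : reused);
        }
        List<String> seen = new ArrayList<>();
        for (MutableText t : buffered) {
            seen.add(t.get());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c");
        System.out.println(pass(inputs, true));  // prints [a, b, c]
        System.out.println(pass(inputs, false)); // prints [c, c, c]
    }
}
```

Passing by reference avoids the copy, so it is cheaper, but it is only safe when no step relies on a key or value staying unchanged after it has been handed on.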
Example: a MapReduce driver with preprocessing and postprocessing
- Map1 | Map2 | Reduce | Map3 | Map4
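- ChainMapper.addMapper: adds every step that runs before the Reduce
- ChainReducer.addMapper: adds the steps that run after the Reduce
- A step's local JobConf object takes precedence over the global one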
@Override
public int run(String[] args) throws Exception {
    JobConf job = new JobConf(getConf(), getClass());
    job.setJobName("ChainJob");
    job.setInputFormat(TextInputFormat.class);
    job.setOutputFormat(TextOutputFormat.class);
    // Assumption: args[0] is the input path and args[1] is the output path
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobConf map1Conf = new JobConf(false); // loadDefaults=false: creates a local configuration object
    ChainMapper.addMapper(job, Map1.class, LongWritable.class, Text.class,
            Text.class, Text.class, true, map1Conf);
    JobConf map2Conf = new JobConf(false);
    ChainMapper.addMapper(job, Map2.class, Text.class, Text.class,
            LongWritable.class, Text.class, true, map2Conf);
    JobConf reduceConf = new JobConf(false);
    ChainReducer.setReducer(job, ReducerClass.class, LongWritable.class, Text.class,
            Text.class, Text.class, true, reduceConf);
    JobConf map3Conf = new JobConf(false);
    ChainReducer.addMapper(job, Map3.class, Text.class, Text.class,
            LongWritable.class, Text.class, true, map3Conf);
    JobConf map4Conf = new JobConf(false);
    ChainReducer.addMapper(job, Map4.class, LongWritable.class, Text.class,
            LongWritable.class, Text.class, true, map4Conf);
    JobClient.runJob(job);
    return 0;
}
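The precedence rule noted before the driver (a step's local JobConf wins over the global one) can be pictured as an overlay of two key/value maps. The sketch below is plain Java and only an illustration of that rule, not Hadoop's merging code: ConfPrecedenceSketch and the keys sample.threshold and sample.mode are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of local-over-global configuration precedence; no Hadoop classes are used.
class ConfPrecedenceSketch {

    // Effective settings for one chained step: start from the global values,
    // then let the step's local values override them.
    static Map<String, String> effective(Map<String, String> global,
                                         Map<String, String> local) {
        Map<String, String> merged = new HashMap<>(global);
        merged.putAll(local);
        return merged;
    }

    // Builds a small global/local pair and merges them.
    static Map<String, String> demo() {
        Map<String, String> global = new HashMap<>();
        global.put("sample.threshold", "10"); // hypothetical key
        global.put("sample.mode", "fast");    // hypothetical key

        Map<String, String> local = new HashMap<>();
        local.put("sample.threshold", "3");   // overrides the global value for this step only

        return effective(global, local);
    }

    public static void main(String[] args) {
        Map<String, String> conf = demo();
        System.out.println(conf.get("sample.threshold")); // prints 3
        System.out.println(conf.get("sample.mode"));      // prints fast
    }
}
```

Settings that a step does not override fall through to the global value, which is why an empty local configuration object is sufficient for steps that need nothing special.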