First, a quick overview of what some of the built-in APIs are for:

Configuration conf = new Configuration(); // load the Hadoop configuration
Job job = new Job(conf, "job name"); // instantiate a job
job.setOutputKeyClass(output key class);
job.setOutputValueClass(output value class);
FileInputFormat.addInputPath(job, new Path(input HDFS path));
FileOutputFormat.setOutputPath(job, new Path(output HDFS path));
job.setMapperClass(Mapper class);
job.setCombinerClass(Combiner class);
job.setReducerClass(Reducer class);
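
Note that setOutputKeyClass/setOutputValueClass declare the final (reduce) output types. If the Mapper emits different types than the Reducer, the map output types must be declared separately; Job provides two more setters for that:

job.setMapOutputKeyClass(map output key class);
job.setMapOutputValueClass(map output value class);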

Next, the development workflow with Eclipse:

1. Create a new project and write the code; don't forget to declare package hadoop.test;

package hadoop.test;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    static class MaxTemperatureMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Self-prepared data, a simplified version of weather report records:
            // in each fixed-width line, characters 5-8 hold the year and
            // characters 15-18 hold the temperature (see the sample data below).
            String year = line.substring(5, 9);
            int airTemperature = Integer.parseInt(line.substring(15, 19));

            if (airTemperature != MISSING) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    static class MaxTemperatureReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Keep the highest temperature seen for this year.
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) {

        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }

        try {
            Job job = new Job();
            job.setJarByClass(MaxTemperature.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);

        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
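
The API overview at the top also mentions setCombinerClass, which this example leaves unused. Because taking a maximum is associative and commutative, the reducer can safely double as the combiner; an optional one-line addition to the driver above (not part of the original code):

job.setCombinerClass(MaxTemperatureReducer.class); // pre-aggregate map output locally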

2. Add the Hadoop jar to the project in Eclipse
Right-click the project root and choose Properties. In the dialog that opens, select the Java Build Path entry in the list on the left, then the Libraries tab in the panel on the right. Click the "Add External JARs" button and select hadoop-core-0.20.203.0.jar.
All the errors flagged by Eclipse's code checking should now disappear.

3. Export the jar
Right-click the Java file and choose Export, then select JAR file; specify a path and name, e.g. E:\HadoopTest.jar, click Next through the remaining pages, and on the last page select the main class, e.g. hadoop.test.MaxTemperature.

4. Upload the jar file to a Hadoop cluster or a pseudo-distributed cluster for testing.
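
Assuming the jar was exported on a Windows machine as in step 3 and the cluster's master node is reachable over SSH (the host name "master" below is hypothetical; from Git Bash or a similar shell, E:\HadoopTest.jar is addressed as /e/HadoopTest.jar), the upload might look like:

scp /e/HadoopTest.jar root@master:/root/java_hadoop/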

5. Sample data (each record is one fixed-width line):
aaaaa1990aaaaaa0039a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0031a
bbbbb1991bbbbbb0020a
ccccc1992cccccc0030c
ddddd1993dddddd0033d
eeeee1994eeeeee0031e
aaaaa1990aaaaaa0041a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0044a
bbbbb1991bbbbbb0045a
ccccc1992cccccc0041c
ddddd1993dddddd0023d
eeeee1994eeeeee0041e
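
As a quick sanity check of the mapper's fixed-width offsets against the first record (a standalone snippet for illustration):

String line = "aaaaa1990aaaaaa0039a";
String year = line.substring(5, 9);                  // "1990"
int temp = Integer.parseInt(line.substring(15, 19)); // 39

This only works if every record keeps exactly these column positions, so the data lines must not carry extra leading characters.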

6. Run Hadoop with the following command:
/root/hadoop-0.20.203.0/bin/hadoop jar /root/java_hadoop/HadoopTest.jar /root/log/ /root/out/
Mind the actual paths when you run this.
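
Note that with HDFS configured, /root/log/ and /root/out/ are interpreted as HDFS paths: the input data must be uploaded beforehand, and the output directory must not already exist or the job will fail. A plausible preparation step (data.txt is a hypothetical local file holding the sample data above):

/root/hadoop-0.20.203.0/bin/hadoop fs -put data.txt /root/log/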

7. View the results
cat /root/out/part-r-00000
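
(If the output landed on HDFS rather than the local filesystem, use hadoop fs -cat /root/out/part-r-00000 instead.) For the sample data above, the result should be:

1990	44
1991	45
1992	41
1993	43
1994	41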