Implementing WordCount with Hadoop MapReduce

1. Create a WCMapper class that extends Mapper
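import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
            // Receive the input value (V1): one line of text
            String line = value.toString();
            // Split the line into words on runs of whitespace
            String[] words = line.split("\\s+");
            // Loop over the words
            for (String w : words) {
                  // Skip the empty token that leading whitespace produces
                  if (w.isEmpty()) {
                        continue;
                  }
                  // Each occurrence counts as one: emit (word, 1)
                  context.write(new Text(w), new LongWritable(1));
            }
      }
}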
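
To make the map step concrete, here is a minimal plain-Java sketch of what it emits for a single line. It uses no Hadoop classes; the class name MapPhaseSketch and the method mapLine are hypothetical names chosen for illustration, and splitting on runs of whitespace while skipping empty tokens is a choice of this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the map step; MapPhaseSketch is a hypothetical name.
final class MapPhaseSketch {

    // Mirrors the idea of WCMapper.map: emit one (word, 1) pair per word in the line.
    static List<Map.Entry<String, Long>> mapLine(String line) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            emitted.add(new SimpleEntry<>(word, 1L));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Repeated words are not merged here; each occurrence gets its own pair.
        for (Map.Entry<String, Long> pair : mapLine("hello tom hello jerry")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

The mapper does no counting of its own: "hello" appears twice in the sample line, so two separate (hello, 1) pairs are emitted, and adding them up is left to the reducer.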
 
2. Create a WCReducer class that extends Reducer
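import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
                  throws IOException, InterruptedException {
            // The incoming key (K2) is reused unchanged as the output key (K3)
            // Define a counter
            long counter = 0;
            // Loop over v2s, the values grouped under this key
            for (LongWritable i : v2s) {
                  counter += i.get();
            }
            // Emit (word, total count)
            context.write(key, new LongWritable(counter));
      }
}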
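
Between the map and reduce steps, the framework groups all values that share a key and hands each group to reduce. The following plain-Java sketch imitates that grouping and the summation. The names ReducePhaseSketch, groupByKey and reduce are hypothetical, and a TreeMap stands in for Hadoop's sorted shuffle, which is only an approximation of the real mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for shuffle + reduce; ReducePhaseSketch is a hypothetical name.
final class ReducePhaseSketch {

    // Imitates the shuffle: every emitted word contributes a 1 to the list for its key.
    static Map<String, List<Long>> groupByKey(List<String> emittedWords) {
        Map<String, List<Long>> grouped = new TreeMap<>(); // keys kept in sorted order
        for (String word : emittedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
        }
        return grouped;
    }

    // Mirrors the idea of WCReducer.reduce: sum all values received for one key.
    static long reduce(Iterable<Long> values) {
        long counter = 0;
        for (long v : values) {
            counter += v;
        }
        return counter;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> grouped =
                groupByKey(Arrays.asList("hello", "tom", "hello", "jerry"));
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```

For the sample input, reduce is called once per distinct word, so "hello" receives the list [1, 1] and yields 2, while "jerry" and "tom" each yield 1.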

3. Implement the main method in the WordCount class
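import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/*
 * 1. Analyze the business logic and determine the input and output data formats.
 * 2. Define a class that extends org.apache.hadoop.mapreduce.Mapper and
 *    override map() to implement the business logic and emit the new key/value pairs.
 * 3. Define a class that extends org.apache.hadoop.mapreduce.Reducer and
 *    override reduce() to implement the business logic.
 * 4. Assemble the custom mapper and reducer through a Job object.
 */
public class WordCount {
      public static void main(String[] args) throws Exception {
            // Build the Job object
            Job job = Job.getInstance(new Configuration());

            // Note: pass the class that contains the main method
            job.setJarByClass(WordCount.class);

            // Configure the Mapper
            job.setMapperClass(WCMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            FileInputFormat.setInputPaths(job, new Path("/words.txt"));

            // Configure the Reducer
            job.setReducerClass(WCReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            // The output directory must not exist yet, or the job fails at submission
            FileOutputFormat.setOutputPath(job, new Path("/wcount619"));

            // Submit the job, wait for completion, and exit non-zero on failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
}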
 
 
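4. Package the classes as wc.jar, upload it to the Linux machine, and run it with Hadoop
     hadoop jar /root/wc.jar
If the jar's manifest does not declare a main class, pass the class name after the jar path:
     hadoop jar /root/wc.jar WordCount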
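
Once the job finishes, the results are written under the output directory set in the driver (/wcount619). With the default text output format, each reducer writes a part-r-NNNNN file containing one "word<TAB>count" line per key, which can be printed with, for example, hadoop fs -cat /wcount619/part-r-00000. The sketch below shows one way to parse such lines back into a map. It is a minimal illustration only: OutputLinesSketch and parse are hypothetical names, and the sample lines are made up rather than taken from a real run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Parses "word<TAB>count" lines; OutputLinesSketch is a hypothetical name.
final class OutputLinesSketch {

    // Turns lines in the default text output layout into a word -> count map,
    // preserving the order in which the lines appear.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the expected layout, not output from a real job.
        List<String> sample = Arrays.asList("hello\t2", "jerry\t1", "tom\t1");
        System.out.println(parse(sample));
    }
}
```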
posted @ 2017-06-11 15:00  独立小桥风满袖