9.11
3. 使用 MapReduce 实现词频统计
概述
MapReduce 是 Hadoop 用于处理大规模数据的核心编程模型。本文将通过 MapReduce 代码实现简单的词频统计任务。
内容
MapReduce 工作原理:Mapper 和 Reducer
Hadoop 项目结构
MapReduce 程序代码
代码示例
public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } }
本文来自博客园,作者:赵千万,转载请注明原文链接:https://www.cnblogs.com/zhaoqianwan/p/18300660
千万千万赵千万