9.11

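3. Implementing Word Count with MapReduce

Overview

MapReduce is Hadoop's core programming model for processing large-scale data. This post implements a simple word-frequency count job in MapReduce: the Mapper emits a (word, 1) pair for every word it reads, and the Reducer sums those pairs per word.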

Contents

How MapReduce works: Mapper and Reducer

Hadoop project structure

MapReduce program code
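The first item above, how the Mapper and Reducer cooperate, can be sketched in plain Java without a Hadoop cluster. This is only an in-memory illustration under stated assumptions: the class name MapReduceSketch and its methods map, shuffle, reduce, and wordCount are hypothetical and are not part of the Hadoop API. Real Hadoop distributes these phases across nodes and performs the shuffle itself.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the map -> shuffle -> reduce flow (no Hadoop involved).
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key. TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Runs the three phases in sequence over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

The map and reduce methods here mirror the logic of the Hadoop Mapper and Reducer in this post. Only the shuffle step, which Hadoop runs between the two phases, is written out by hand.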

Code Example

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in an input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
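One detail of the mapper above is worth seeing in isolation: StringTokenizer with its default delimiters splits only on whitespace, so "Hello," and "hello" are counted as different words. The sketch below shows that behavior next to one possible normalization step. The class TokenizeSketch and the method normalizedTokens are hypothetical additions for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical helper class illustrating how the mapper's tokenization behaves.
class TokenizeSketch {

    // Same tokenization as the mapper: split on space, tab, newline, carriage return, form feed.
    static List<String> rawTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            tokens.add(itr.nextToken());
        }
        return tokens;
    }

    // Assumed normalization (not in the original code): lowercase each token and
    // strip leading and trailing ASCII punctuation, dropping tokens that become empty.
    static List<String> normalizedTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : rawTokens(line)) {
            String cleaned = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "Hello, hello world.";
        System.out.println(rawTokens(line));        // prints [Hello,, hello, world.]
        System.out.println(normalizedTokens(line)); // prints [hello, hello, world]
    }
}
```

Whether to normalize depends on the task. If case and punctuation should not distinguish words, a step like this could be applied to each token before the mapper calls word.set.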

posted @ 2024-09-11 19:51 赵千万
