大数据学习(4)MapReduce编程Helloworld:WordCount
Maven依赖:
<dependency> <groupId>jdk.tools</groupId> <artifactId>jdk.tools</artifactId> <version>1.6</version> <scope>system</scope> <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version>2.6.5</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version>2.6.5</version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-mapreduce-client-core</artifactId> <version>2.6.5</version> </dependency>
Mapper类:
public class WordcountMapper extends Mapper<LongWritable,Text,Text,IntWritable>{ @Override protected void map(LongWritable key, Text value, Context context)throws IOException, InterruptedException { String line = value.toString(); for(String word : line.split(" ")) { context.write(new Text(word), new IntWritable(1)); } } }
Reducer类:
public class WordcountReducer extends Reducer<Text, IntWritable,Text, IntWritable> { @Override protected void reduce(Text key, Iterable<IntWritable> values,Context context) throws IOException, InterruptedException { int count = 0; for(IntWritable value : values) { count += value.get(); } context.write(key , new IntWritable(count)); } }
启动类:
public class WordcountLancher { public static void main(String[] args) throws Exception{ String inputPath = args[0]; String outputPath = args[1]; Job job = Job.getInstance(); job.setMapperClass(WordcountMapper.class); job.setReducerClass(WordcountReducer.class); job.setMapOutputKeyClass(Text.class); job.setMapOutputValueClass(IntWritable.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.setInputPaths(job, new Path(inputPath)); FileOutputFormat.setOutputPath(job, new Path(outputPath)); boolean success = job.waitForCompletion(true); System.exit(success ? 0 : 1); } }
在HDFS中准备输入数据:
hadoop fs -mkdir -p /wordcount/input hadoop fs -put LICENSE.txt /wordcount/input
记得启动yarn:
start-yarn.sh
启动map-reduce程序:
hadoop jar wordcount.jar me.huqiao.hadoop.mr.WordcountLancher /wordcount/input /wordcount/output
查看结果:
hadoop fs -cat /wordcount/output/part-r-00000 |more
分类:
大数据/Hadoop
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· Java 中堆内存和栈内存上的数据分布和特点
· 开发中对象命名的一点思考
· .NET Core内存结构体系(Windows环境)底层原理浅谈
· C# 深度学习:对抗生成网络(GAN)训练头像生成模型
· .NET 适配 HarmonyOS 进展
· 手把手教你更优雅的享受 DeepSeek
· AI工具推荐:领先的开源 AI 代码助手——Continue
· 探秘Transformer系列之(2)---总体架构
· V-Control:一个基于 .NET MAUI 的开箱即用的UI组件库
· 乌龟冬眠箱湿度监控系统和AI辅助建议功能的实现