03Hadoop

1. MapReduce 介绍

1.1. MapReduce 设计构思和框架结构



3. WordCount

  • 需求: 在一堆给定的文本文件中统计输出每一个单词出现的总次数



4. MapReduce 运行模式

yarn jar hadoop_hdfs_operate‐1.0‐SNAPSHOT.jar
cn.itcast.hdfs.demo1.JobMain

5. MapReduce 分区


Step 4. Main 入口

public class PartitionMain  extends Configured implements Tool {
    public static void main(String[] args) throws  Exception{
        int run = ToolRunner.run(new Configuration(), new 
PartitionMain(), args);
        System.exit(run);
    }
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(super.getConf(), 
PartitionMain.class.getSimpleName());
        job.setJarByClass(PartitionMain.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextInputFormat.addInputPath(job,new 
Path("hdfs://192.168.52.250:8020/partitioner"));
        TextOutputFormat.setOutputPath(job,new 
Path("hdfs://192.168.52.250:8020/outpartition"));
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setReducerClass(MyReducer.class);
        /**
         * 设置我们的分区类,以及我们的reducetask的个数,注意reduceTask的个数
一定要与我们的
         * 分区数保持一致
         */
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(2);
        boolean b = job.waitForCompletion(true);
        return b?0:1;
    }
}
posted @ 2021-12-06 21:43  起跑线小言  阅读(22)  评论(0编辑  收藏  举报