Hadoop MapReduce program: average

Requirement: compute each student's average score across several courses.

Sample input, math.txt:

zhangsan 90
lisi 88
wanghua 80

china.txt:

zhangsan 80
lisi 90
wanghua 88

Output (Hadoop sorts the reducer output by key, so the names appear in this order):

lisi 89
wanghua 84
zhangsan 85

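Analysis:

Mapper:

1. <k1,v1>: k1 is the byte offset of the line within the input file (not a line number); v1 is the text of the line.

2. <k2,v2>: k2 is the student's name; v2 is the score.

Reducer:

3. <k3,v3>: k3 is a key (a student's name) after the shuffle has grouped identical keys; v3 is the list of that student's scores, list<int>.

4. Output <k4,v4>: k4 is the name; v4 is the average score.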
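The four stages above can be traced without a cluster. The following plain-Java sketch (Java 8, no Hadoop classes; the class and method names are made up for illustration) feeds the sample lines through a map step, a "shuffle" that groups scores by name in a sorted TreeMap, and a reduce step that uses integer division like AverageReduce. It is a single-process model of the data flow under those assumptions, not how Hadoop actually schedules or executes the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, single-process model of <k1,v1> -> <k2,v2> -> <k3,v3> -> <k4,v4> (no Hadoop needed).
public class AverageFlowSketch {

    // Map: v1 = "name score" line -> (k2 = name, v2 = score), added straight into the shuffle groups
    static void map(String line, Map<String, List<Integer>> shuffle) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            return; // skip blank or malformed lines
        }
        // Shuffle: group values by key; TreeMap keeps the (ASCII) names sorted, like Hadoop's key sort
        shuffle.computeIfAbsent(fields[0], k -> new ArrayList<>())
               .add(Integer.parseInt(fields[1]));
    }

    // Reduce: (k3 = name, v3 = list<int>) -> v4 = average, truncated by integer division
    static int reduce(List<Integer> scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum / scores.size();
    }

    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffle);
        }
        Map<String, Integer> result = new LinkedHashMap<>(); // preserves the sorted key order
        for (Map.Entry<String, List<Integer>> e : shuffle.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "zhangsan 90", "lisi 88", "wanghua 80",   // math.txt
            "zhangsan 80", "lisi 90", "wanghua 88");  // china.txt
        for (Map.Entry<String, Integer> e : run(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running main prints one tab-separated line per student, in sorted key order, which matches the output listed for the sample.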

Program:

AverageMapper class:

package com.cn.average;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Input:  <byte offset, line>, e.g. "zhangsan 90"
 * Output: <name, score>
 */
public class AverageMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused across map() calls to avoid allocating new Writables per record
    private final Text name = new Text();
    private final IntWritable score = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split "name score" on any run of whitespace
        String[] fields = value.toString().trim().split("\\s+");
        // Skip blank or incomplete lines instead of failing the task
        if (fields.length < 2) {
            return;
        }
        name.set(fields[0]);
        score.set(Integer.parseInt(fields[1]));
        context.write(name, score);
    }
}
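If an input file contains a blank line, or a line with a missing or non-numeric score, a mapper that indexes straight into the tokens throws an exception and the map task fails. The parsing rule can be checked outside Hadoop. Below is a minimal plain-Java sketch, assuming each record has the form "name score"; parse() and the class name are hypothetical helpers, not part of the original program or of the Hadoop API. Unlike the mapper above, it also drops lines whose score is not an integer.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;

// Sketch of a tolerant "name score" parser; parse() is a hypothetical helper, not Hadoop API.
public class LineParseSketch {

    // Returns (name, score), or empty for blank, short, or non-numeric lines.
    static Optional<Map.Entry<String, Integer>> parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            return Optional.empty(); // blank line ("" splits to [""]) or missing score
        }
        try {
            int score = Integer.parseInt(fields[1]);
            Map.Entry<String, Integer> entry = new SimpleImmutableEntry<>(fields[0], score);
            return Optional.of(entry);
        } catch (NumberFormatException e) {
            return Optional.empty(); // score is not an integer
        }
    }

    public static void main(String[] args) {
        String[] lines = {"zhangsan 90", "", "lisi", "lisi abc", "  wanghua   80 "};
        for (String line : lines) {
            System.out.println("[" + line + "] -> " + parse(line));
        }
    }
}
```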

AverageReduce class:

package com.cn.average;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Input:  <name, [score, score, ...]>
 * Output: <name, average score>
 */
public class AverageReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }
        // reduce() is only called for keys with at least one value, so count > 0.
        // int / int truncates toward zero (e.g. 171 / 2 = 85).
        context.write(key, new IntWritable(sum / count));
    }
}
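The reducer divides two ints, so the average is truncated toward zero. The sample averages happen to be whole numbers (170 / 2, 178 / 2, 168 / 2), but scores of 85 and 86 would give 85, not 85.5. The plain-Java comparison below (class and method names are illustrative) shows the difference. If fractional averages are wanted, one option, not part of the original program, is to compute in double and emit a DoubleWritable, changing the job's output value class to match.

```java
// Compares truncating integer division (as in the reducer) with floating-point averaging.
public class AverageRoundingSketch {

    static int truncatedAverage(int[] scores) {
        int sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return sum / scores.length; // int / int truncates toward zero
    }

    static double exactAverage(int[] scores) {
        long sum = 0;
        for (int s : scores) {
            sum += s;
        }
        return (double) sum / scores.length; // floating-point division keeps the fraction
    }

    public static void main(String[] args) {
        int[] scores = {85, 86};
        System.out.println(truncatedAverage(scores));         // 85
        System.out.println(exactAverage(scores));             // 85.5
        System.out.println(Math.round(exactAverage(scores))); // 86
    }
}
```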

DataAverage class:

package com.cn.average;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Driver: computes each student's average score.
 * @author root
 *
 */
public class DataAverage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: DataAverage <in> <out>");
            System.exit(2);
        }
        // Create the job (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(conf, "Data Average");
        job.setJarByClass(DataAverage.class);

        // Input path (e.g. a directory holding math.txt and china.txt) and output path;
        // the output directory must not exist yet
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        // Mapper and reducer classes
        job.setMapperClass(AverageMapper.class);
        job.setReducerClass(AverageReduce.class);

        // Output key/value types; the map output uses the same types,
        // so no separate setMapOutputKeyClass/setMapOutputValueClass calls are needed
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
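The driver registers no combiner, and AverageReduce should not be reused as one (for example via job.setCombinerClass(AverageReduce.class)): a combiner may run on any subset of a key's values, and an average of partial averages is generally not the overall average. A combiner-safe alternative, offered here as a design option rather than the original program's approach, is to carry (sum, count) pairs, which merge correctly in any grouping, and divide only in the final reducer. The plain-Java sketch below uses a hypothetical SumCount class instead of Hadoop Writable types.

```java
// Shows why averaging partial averages is wrong and why (sum, count) pairs merge correctly.
public class CombinerSketch {

    // Partial aggregate a hypothetical combiner could emit instead of an average.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        static SumCount of(int... scores) {
            long sum = 0;
            for (int s : scores) {
                sum += s;
            }
            return new SumCount(sum, scores.length);
        }

        // Merging is associative and commutative, so any grouping gives the same result.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return (double) sum / count;
        }
    }

    public static void main(String[] args) {
        // One student's scores split unevenly across two map tasks
        SumCount taskA = SumCount.of(90, 80);
        SumCount taskB = SumCount.of(70);

        double averageOfAverages = (taskA.average() + taskB.average()) / 2; // (85 + 70) / 2
        double trueAverage = taskA.merge(taskB).average();                 // 240 / 3

        System.out.println(averageOfAverages); // 77.5
        System.out.println(trueAverage);       // 80.0
    }
}
```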

A little summary every day always brings something new.

posted @ 2016-08-11 00:09 麻雀虽小五脏俱全
