MRUnit

Setting up the development environment:

1. Download the latest version of MRUnit from the Apache website, choosing the build that matches your Hadoop version, and add the jar to your IDE's classpath.

2. Download the latest Mockito and JUnit jars and add them to the classpath as well.

Adding the dependency through Maven:

<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <classifier>hadoop1</classifier>
</dependency>

Use the hadoop2 classifier instead if you are running Hadoop 2.
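
For a Hadoop 2 project, the dependency might look like the sketch below. It assumes the same 0.9.0-incubating release and adds a test scope (not in the original snippet), since MRUnit is normally needed only by unit tests. Check which versions and classifiers are actually published for your setup before relying on it.

```xml
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <!-- hadoop2 classifier, as noted above, for Hadoop 2 -->
  <classifier>hadoop2</classifier>
  <!-- Assumption: MRUnit is used only from test code -->
  <scope>test</scope>
</dependency>
```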

 

MRUnit test example:

The MRUnit testing framework is built on JUnit. The following example uses MRUnit to unit test a MapReduce program.

The data records look like this:

 

    CDRID;CDRType;Phone1;Phone2;SMS Status Code
    655209;1;796764372490213;804422938115889;6
    353415;0;356857119806206;287572231184798;4
    835699;1;252280313968413;889717902341635;0

 

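The MapReduce program analyzes these records, finds every record whose CDRType is 1 (an SMS record), and emits its SMS status code. For the sample records above, the Mapper output is: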

6, 1
0, 1

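The Reducer takes these pairs as input and outputs the number of times each status code occurs in the records. The corresponding Mapper and Reducer code is as follows: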

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private Text status = new Text();
  private final static IntWritable addOne = new IntWritable(1);

  /**
   * Emits (SMS status code, 1) for every record whose CDRType is 1.
   */
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    // Sample record format: 655209;1;796764372490213;804422938115889;6
    String[] line = value.toString().split(";");
    // CDRType 1 marks an SMS CDR
    if (Integer.parseInt(line[1]) == 1) {
      status.set(line[4]);
      context.write(status, addOne);
    }
  }
}
The corresponding Reducer code is as follows:
public class SMSCDRReducer extends
  Reducer<Text, IntWritable, Text, IntWritable> {
 
  protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws java.io.IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
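
The map and reduce logic above can be traced without a Hadoop installation. The following is a minimal plain-Java sketch, not part of the original code: the class name CdrPipelineSketch and its two methods are hypothetical stand-ins that mimic what SMSCDRMapper and SMSCDRReducer compute, using ordinary collections in place of Hadoop types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java stand-in for the SMSCDRMapper/SMSCDRReducer pair. */
final class CdrPipelineSketch {

  /** Mimics map(): returns the status code of an SMS record (CDRType 1), or null otherwise. */
  static String mapRecord(String record) {
    String[] fields = record.split(";");
    return Integer.parseInt(fields[1]) == 1 ? fields[4] : null;
  }

  /** Mimics shuffle plus reduce(): adds 1 per emitted status code, keeping first-seen order. */
  static Map<String, Integer> countStatuses(List<String> records) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String record : records) {
      String status = mapRecord(record);
      if (status != null) {
        counts.merge(status, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> records = new ArrayList<>();
    records.add("655209;1;796764372490213;804422938115889;6");
    records.add("353415;0;356857119806206;287572231184798;4");
    records.add("835699;1;252280313968413;889717902341635;0");
    System.out.println(countStatuses(records)); // prints {6=1, 0=1}
  }
}
```

The second sample record has CDRType 0, so it contributes nothing, which matches the Mapper output listed earlier.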

The complete MRUnit test class is as follows:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;
 
public class SMSCDRMapperReducerTest {
  MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
  ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
  MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
 
  @Before
  public void setUp() {
    SMSCDRMapper mapper = new SMSCDRMapper();
    SMSCDRReducer reducer = new SMSCDRReducer();
    mapDriver = MapDriver.newMapDriver(mapper);
    reduceDriver = ReduceDriver.newReduceDriver(reducer);
    mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
  }
 
  @Test
  public void testMapper() {
    mapDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    mapDriver.withOutput(new Text("6"), new IntWritable(1));
    mapDriver.runTest();
  }
 
  @Test
  public void testReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    reduceDriver.withInput(new Text("6"), values);
    reduceDriver.withOutput(new Text("6"), new IntWritable(2));
    reduceDriver.runTest();
  }
   
  @Test
  public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    // One SMS record with status 6 goes in, so the reducer should count 1
    mapReduceDriver.withOutput(new Text("6"), new IntWritable(1));
    mapReduceDriver.runTest();
  }
}

Run the test class: the tests pass if the Mapper and Reducer code is correct and fail otherwise.

Testing counters:

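A common use of counters is to track malformed records in the data. For example, when an input CDR record is not of SMS type, the Mapper ignores the record and increments a counter.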

The Mapper with the counter added is as follows:

public class SMSCDRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
 
  private Text status = new Text();
  private final static IntWritable addOne = new IntWritable(1);
 
  static enum CDRCounter {
    NonSMSCDR;
  };
 
  /**
   * Returns the SMS status code and its count
   */
  protected void map(LongWritable key, Text value, Context context) throws java.io.IOException, InterruptedException {
 
    String[] line = value.toString().split(";");
    // If record is of SMS CDR
    if (Integer.parseInt(line[1]) == 1) {
      status.set(line[4]);
      context.write(status, addOne);
    } else {// CDR record is not of type SMS so increment the counter
      context.getCounter(CDRCounter.NonSMSCDR).increment(1);
    }
  }
}

The modified Mapper test is as follows:

  // Requires: import static org.junit.Assert.assertEquals;
  @Test
  public void testMapper() {
    // CDRType is 0, so this record is not an SMS CDR
    mapDriver.withInput(new LongWritable(), new Text(
        "655209;0;796764372490213;804422938115889;6"));
    // No withOutput(...) call: the mapper should emit nothing for this record
    mapDriver.runTest();
    assertEquals("Expected 1 counter increment", 1, mapDriver.getCounters()
        .findCounter(SMSCDRMapper.CDRCounter.NonSMSCDR).getValue());
  }

When the CDR is not of SMS type, the counter is incremented, and the assertion checks this. Counters can be added to the Reducer and tested in the same way.
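
The counter above only separates SMS from non-SMS records. Since counters are commonly used to track malformed input, the plain-Java sketch below shows one way such a classification could be tallied. Everything in it is hypothetical: the CdrCounterSketch class, the RecordKind enum, and the choice to treat records with fewer than five fields or a non-numeric CDRType as malformed are assumptions, not part of the original Mapper.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: tallies record kinds the way Hadoop counters would. */
final class CdrCounterSketch {

  /** Assumed categories; the original Mapper only has NonSMSCDR. */
  enum RecordKind { SMS, NON_SMS, MALFORMED }

  /** Classifies one semicolon-separated CDR line. */
  static RecordKind classify(String record) {
    String[] fields = record.split(";");
    if (fields.length < 5) {
      return RecordKind.MALFORMED; // too few fields to hold a status code
    }
    try {
      return Integer.parseInt(fields[1]) == 1 ? RecordKind.SMS : RecordKind.NON_SMS;
    } catch (NumberFormatException e) {
      return RecordKind.MALFORMED; // CDRType is not a number
    }
  }

  /** Increments one counter per record, like getCounter(...).increment(1). */
  static Map<RecordKind, Long> tally(String... records) {
    Map<RecordKind, Long> counters = new EnumMap<>(RecordKind.class);
    for (String record : records) {
      counters.merge(classify(record), 1L, Long::sum);
    }
    return counters;
  }

  public static void main(String[] args) {
    System.out.println(tally(
        "655209;1;796764372490213;804422938115889;6",
        "353415;0;356857119806206;287572231184798;4",
        "broken;record"));
  }
}
```

In a real Mapper the same branches would call context.getCounter(...) with the matching enum constant, and an MRUnit test could assert on each counter value as in the test shown above.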

Testing parameter passing:

Parameters can be passed to the code under test through the Configuration class.

In the Mapper and Reducer classes, read configuration parameters with Configuration.get(). In the test class, create a new configuration object:

Configuration conf = new Configuration(); 

Add the following code to the setUp method:

mapDriver.setConfiguration(conf);
conf.set("myParameter1", "20");
conf.set("myParameter2", "23");

The test class needs to pass these parameters to the mapper.
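
Values stored with conf.set(...) are strings, so code that needs a number has to parse them and decide what to do when a key is missing. The sketch below illustrates that lookup pattern in plain Java. It is only a stand-in: a java.util.Map plays the role of Hadoop's Configuration, and the ParameterLookupSketch class and its getInt helper are hypothetical names. In a real Mapper the value would come from context.getConfiguration().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for reading string parameters from a Configuration. */
final class ParameterLookupSketch {

  /** Returns the integer stored under name, or fallback when the key is absent. */
  static int getInt(Map<String, String> conf, String name, int fallback) {
    String raw = conf.get(name);
    return raw == null ? fallback : Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    // Same keys and values as the setUp snippet above
    Map<String, String> conf = new HashMap<>();
    conf.put("myParameter1", "20");
    conf.put("myParameter2", "23");

    System.out.println(getInt(conf, "myParameter1", 0));   // prints 20
    System.out.println(getInt(conf, "notConfigured", -1)); // prints -1
  }
}
```

Supplying an explicit fallback keeps a test from failing with a NullPointerException when it forgets to set a parameter, at the cost of possibly hiding the omission.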

References:

https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial

Posted 2015-10-19 23:15 by wmymartin