Reduce端实现 JOIN

需求

> 假如数据量巨大,两表的数据是以文件的形式存储在 HDFS 中, 需要用 MapReduce 程序来实现以下 SQL 查询运算
>
> ```sql
> select a.id,a.date,b.name,b.category_id,b.price from t_order a left join t_product b on a.pid = b.id
> ```

##### 商品表

| id | pname | category_id | price |
| ----- | ------ | ----------- | ----- |
| P0001 | 小米5 | 1000 | 2000 |
| P0002 | 锤子T1 | 1000 | 3000 |

##### 订单数据表

| id | date | pid | amount |
| ---- | -------- | ----- | ------ |
| 1001 | 20150710 | P0001 | 2 |
| 1002 | 20150710 | P0002 | 3 |

###2.2 步骤

通过将关联的条件作为map输出的key,将两表满足join条件的数据并携带数据所来源的文件信息,发往同一个reduce task,在reduce中进行数据的串联

####Step 1: 定义 Mapper

```java
public class ReduceJoinMapper extends Mapper<LongWritable,Text,Text,Text> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//1:判断数据来自哪个文件
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String fileName = fileSplit.getPath().getName();
if(fileName.equals("product.txt")){
//数据来自商品表
//2:将K1和V1转为K2和V2,写入上下文中
String[] split = value.toString().split(",");
String productId = split[0];

context.write(new Text(productId), value);

}else{
//数据来自订单表
//2:将K1和V1转为K2和V2,写入上下文中
String[] split = value.toString().split(",");
String productId = split[2];

context.write(new Text(productId), value);

}

}
}

```

####Step 2: 定义 Reducer

```java
public class ReduceJoinMapper extends Mapper<LongWritable,Text,Text,Text> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//1:判断数据来自哪个文件

FileSplit fileSplit = (FileSplit) context.getInputSplit();
String fileName = fileSplit.getPath().getName();
if(fileName.equals("product.txt")){
//数据来自商品表
//2:将K1和V1转为K2和V2,写入上下文中
String[] split = value.toString().split(",");
String productId = split[0];

context.write(new Text(productId), value);

}else{
//数据来自订单表
//2:将K1和V1转为K2和V2,写入上下文中
String[] split = value.toString().split(",");
String productId = split[2];

context.write(new Text(productId), value);

}

 

}
}
```

####Step 3: 定义主类

```java
public class ReduceJoinReducer extends Reducer<Text,Text,Text,Text> {
@Override
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
//1:遍历集合,获取V3 (first +second)
String first = "";
String second = "";
for (Text value : values) {
if(value.toString().startsWith("p")){
first = value.toString();
}else{
second += value.toString();
}

}
//2:将K3和V3写入上下文中
context.write(key, new Text(first+"\t"+second));
}
}
```

posted on 2021-09-20 08:14  季昂  阅读(55)  评论(0编辑  收藏  举报