多表关联

输入是两个文件,一个代表工厂表,包含工厂名列和地址编号列;另一个代表地址表,包含地址名列和地址编号列。要求从输入数据中找出工厂名地址名对应关系,输出"工厂名——地址名"表。

    样例输入如下所示。

    1)factory:

 

factoryname                    addressed

Beijing Red Star                    1

Shenzhen Thunder                3

Guangzhou Honda                2

Beijing Rising                       1

Guangzhou Development Bank      2

Tencent                        3

Back of Beijing                     1

 

    2)address:

 

addressID    addressname

1            Beijing

2            Guangzhou

3            Shenzhen

4            Xian

 

    样例输出如下所示。

 

factoryname                        addressname

Back of Beijing                          Beijing

Beijing Red Star                        Beijing

Beijing Rising                          Beijing

Guangzhou Development Bank          Guangzhou

Guangzhou Honda                    Guangzhou

Shenzhen Thunder                    Shenzhen

Tencent                            Shenzhen

 

设计思路:和单表关联百分之九十九相似,因为单表关联也算是多表关联(自连接),最重要的还是找到”中间数据“,这里的中间数据

              很明显是addressId,一句话概括(所有引用"xId"的factory都属于"xId所对应addressname " 。这里只需要按照ID分组,把

              该组里所有的factoryaddressname 做个笛卡尔积就得出了答案。

 

代码:

   

public static class Map extends Mapper<LongWritable, Text, Text,Text> {
        
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            
            //输入的一行预处理文本
            StringTokenizer itr = new StringTokenizer(value.toString(),"\t");
            String[] values = new String[2];
            int i = 0;
            while (itr.hasMoreElements()) {
                values[i] = itr.nextToken();
                i++;
            }
            if (!value.toString().contains("factoryname")&&!value.toString().contains("addressed")) {
                if (values[0].charAt(0)>='0'&&values[0].charAt(0)<='9') {//说明是右表的ID
                    context.write(new Text(values[0]),new Text("Right+"+values[1]));
                }else{                                                     //说明是左表的factory
                    context.write(new Text(values[1]),new Text("Left+"+values[0]));
                }
            }
            
        }
    }

    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            
            
            List<String> childs  = new ArrayList<String>();
            List<String> parents = new ArrayList<String>();
            
            for (Text value : values) {
                String[] split = value.toString().split("\\+");
                if (split[0].equals("Right")) {
                    parents.add(split[1]);
                }else{
                    childs.add(split[1]);
                }
            }
            
            for (String cds : childs) {
                for (String pts : parents) {
                    context.write(new Text(cds), new Text(pts));
                }
            }
            
        }
    }

 

 

posted @ 2015-04-02 16:04  Mr._L  阅读(130)  评论(0编辑  收藏  举报