504. Inverted Index (Map Reduce) lintcode

https://www.lintcode.com/problem/inverted-index-map-reduce/description -- description of the map reduce problem

1. Click the submit button to view the problem.

2. Logic of map reduce: each call to map or reduce handles exactly one key-value pair. For map the value is one document; for reduce it is the iterator over all ids collected for one word.

Given two documents as follows:

[{"id":1,"content":"This is the content of document1"},

{"id":2,"content":"This is the content of document2"}]

after map:

This 1, is 1, .. This 2, is 2,

hidden shuffle (sort and transport): how does it sort, according to the key or the whole pair??
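To make the map and shuffle steps concrete, here is a minimal local simulation. It is only a sketch: the real shuffle on LintCode is hidden, so the class name ShuffleSketch, its helper methods, and the choice of a TreeMap (keys sorted, values kept in arrival order) are all assumptions made for illustration, not the judge's actual behavior.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation; the real shuffle on LintCode is hidden.
class ShuffleSketch {

    // Map step: emit one (word, id) pair per word occurrence.
    static List<Map.Entry<String, Integer>> map(int id, String content) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : content.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, id));
            }
        }
        return pairs;
    }

    // Shuffle step (assumed): group values by key. The TreeMap keeps keys
    // sorted; values stay in arrival order within each key.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.addAll(map(1, "This is the content of document1"));
        pairs.addAll(map(2, "This is the content of document2"));
        // Each word now maps to the list of document ids that emitted it.
        System.out.println(shuffle(pairs));
    }
}
```

In this sketch the grouping is by key only, so a word that occurs twice in one document shows up with the same id twice in its value list.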

after reduce (merge) -- before reduce runs, the iterator over the ids for each key is already available

This <1,2>, is <1,2>;

Caution! If a word is repeated within a document, you will probably get duplicate ids, e.g. the <1,1,2> if "the" appears twice in the first document.

solution -- in reduce, compare the previous value (prev) with the current one (now) and only add the current one when they differ.
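The prev/cur check only removes duplicates that arrive next to each other. A small sketch of that check, next to a set-based variant that does not depend on ordering, may make the difference clearer. The class and method names here (DedupSketch, dedupAdjacent, dedupAny) are hypothetical, and the set-based variant is an alternative technique, not what the solution in these notes uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class for illustration only.
class DedupSketch {

    // Adjacent check: skip a value equal to the one just before it.
    // Duplicates that are not neighbours are NOT removed.
    static List<Integer> dedupAdjacent(Iterator<Integer> values) {
        List<Integer> res = new ArrayList<>();
        Integer prev = null; // no previous value yet
        while (values.hasNext()) {
            Integer now = values.next();
            if (!now.equals(prev)) {
                res.add(now);
            }
            prev = now;
        }
        return res;
    }

    // Set-based alternative: removes duplicates wherever they occur and
    // keeps the first-seen order.
    static List<Integer> dedupAny(Iterator<Integer> values) {
        LinkedHashSet<Integer> seen = new LinkedHashSet<>();
        while (values.hasNext()) {
            seen.add(values.next());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(dedupAdjacent(Arrays.asList(1, 1, 2).iterator()));
        System.out.println(dedupAdjacent(Arrays.asList(1, 2, 1).iterator()));
        System.out.println(dedupAny(Arrays.asList(1, 2, 1).iterator()));
    }
}
```

If the ids for one word are not guaranteed to arrive grouped, the set-based version is the safer of the two, at the cost of extra memory for the set.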

 

code 

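import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Document (fields: id, content) and OutputCollector are supplied by the LintCode judge.
public class InvertedIndex {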

    public static class Map {
        public void map(String key, Document value,
                        OutputCollector<String, Integer> output) {
            // Emit one (word, document id) pair per word occurrence.
            int id = value.id;
            // trim() first so leading whitespace does not produce an empty first token.
            String[] words = value.content.trim().split("\\s+");
            for (String word : words) {
                // Empty or all-whitespace content still yields one empty token; skip it.
                if (word.isEmpty()) continue;
                // A repeated word emits a repeated pair; reduce removes the duplicates.
                output.collect(word, id);
            }
            // Ps. output.collect(String key, int value);
        }
    }

    public static class Reduce {
        public void reduce(String key, Iterator<Integer> values,
                           OutputCollector<String, List<Integer>> output) {
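            // Collect the ids for this word, skipping an id equal to the previous one.
            // This relies on equal ids arriving next to each other.
            List<Integer> res = new ArrayList<>();
            Integer prev = null; // no previous id yet, so no sentinel id is needed
            while (values.hasNext()) {
                int now = values.next();
                if (prev == null || prev.intValue() != now) {
                    res.add(now);
                }
                prev = now;
            }
            output.collect(key, res);
            // Ps. output.collect(String key, List<Integer> value);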
        }
    }
}

 

skills:

Iterator<Integer> iter = new .. 

iter.hasNext(); iter.next()

string.split("\\s+")
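A short sketch of these two skills, for reference. The class and method names (SkillsSketch, words, sum) are made up for illustration. It shows one detail worth remembering: split("\\s+") keeps an empty first token when the string starts with whitespace, which is why trimming first is useful, and an Iterator is usually obtained from a collection rather than constructed directly.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class for illustration only.
class SkillsSketch {

    // Split on runs of whitespace. Trimming first avoids the empty leading
    // token that split("\\s+") returns for input starting with whitespace.
    static String[] words(String content) {
        String trimmed = content.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Walk a collection with hasNext()/next().
    static int sum(List<Integer> ids) {
        Iterator<Integer> iter = ids.iterator();
        int total = 0;
        while (iter.hasNext()) {
            total += iter.next();
        }
        return total;
    }

    public static void main(String[] args) {
        // Without trim(): the first token is the empty string.
        System.out.println(Arrays.toString(" a  b".split("\\s+")));
        // With trim(): only the real words remain.
        System.out.println(Arrays.toString(words(" a  b")));
        System.out.println(sum(Arrays.asList(1, 2, 3)));
    }
}
```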

 

posted @ 2018-05-07 12:46  wz30