Hadoop test questions: 5 per day, 35 in total, day 2

Source: http://www.cnblogs.com/jarlean/archive/2013/04/09/3009855.html

Q6. What is the purpose of RecordReader in Hadoop?
The InputSplit defines a slice of work, but does not describe how to access it. The RecordReader class actually loads the data from its source and converts it into (key, value) pairs suitable for reading by the Mapper. The RecordReader instance is defined by the InputFormat. (The RecordReader turns the input file into (key, value) form.)
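As an illustration (not from the original answer), here is a minimal sketch of a Mapper whose input types match what the standard TextInputFormat / LineRecordReader pair produces: the byte offset of each line as the key and the line text as the value. The class name LineMapper is illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With TextInputFormat, the LineRecordReader hands the Mapper one
// (byte offset, line text) pair per input line.
public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Each call to map() receives one (key, value) pair produced by the RecordReader.
        context.write(new Text(line.toString().trim()), offset);
    }
}
```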
Q7. After the Map phase finishes, the Hadoop framework does "Partitioning, Shuffle and Sort". Explain what happens in this phase.
- Partitioning (decides which reducer process receives which map output key/value pairs)
Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine, for all of its output (key, value) pairs, which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same (see the Partitioner sketch after this list).
- Shuffle (the map output is moved to the reducers, and the values sharing a key are grouped together, e.g. into the form (1, (a, b, c)))
- Sort (before a node runs the reduce operation, it first sorts the intermediate data)
Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer.
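As a sketch of the partitioning step (not part of the original answer), the custom Partitioner below derives the partition from the key alone, so identical keys always reach the same reducer regardless of which mapper emitted them. The class name FirstLetterPartitioner and the key/value types are illustrative; such a class would be registered with job.setPartitionerClass(...).

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes records by the first character of the key; identical keys always
// land in the same partition, no matter which mapper produced them.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        // charAt() returns the Unicode code point at the given position.
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}
```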
Q9. If no custom partitioner is defined in Hadoop, then how is data partitioned before it is sent to the reducer?
The default partitioner computes a hash value for the key and assigns the partition based on this result. (Hadoop ships with a default partitioner; it hashes the key and derives the partition number from that hash.)
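For reference, the logic of Hadoop's default hash-based partitioning amounts to the sketch below: the key's hashCode is masked to a non-negative value and taken modulo the number of reduce tasks. HashLikePartitioner is an illustrative re-implementation, not the library class itself.

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative re-implementation of the default hash-based partitioning:
// partition = (non-negative hash of key) mod (number of reducers).
public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```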
Q10. What is a Combiner? (A combine function: the data produced by the map is written into the Combiner, and the Combiner then passes the data on to the reduce.)
The Combiner is a "mini-reduce" process which operates only on data generated by a mapper. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers.
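As a sketch (class names are illustrative), a summing reducer for a word-count style job can also serve as the Combiner, because addition is associative and commutative; the Combiner then pre-aggregates each mapper's output on its own node before it is sent to the reducers.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts for each key; usable both as the Combiner and as the Reducer.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

In the job driver this would be wired up with job.setCombinerClass(SumReducer.class) and job.setReducerClass(SumReducer.class).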
