Hadoop test questions: 5 per day, 35 in total, day 5

Source: http://www.cnblogs.com/jarlean/archive/2013/04/12/3015911.html

Q21. What characteristic of the Streaming API makes it flexible enough to run MapReduce jobs in languages like Perl, Ruby, awk, etc.?
Hadoop Streaming allows arbitrary programs to be used for the Mapper and Reducer phases of a MapReduce job: both Mappers and Reducers receive their input on stdin and emit output (key, value) pairs on stdout. By default, the key and value on each line are separated by a tab. (Any program that reads and writes in this standard form can serve as a MapReduce task.)
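The stdin/stdout contract described above can be sketched as a word count in Python. This is a minimal illustration, not the only way to structure a Streaming job: the function names and the `map`/`reduce` command-line argument are hypothetical conventions, and the reducer assumes its input arrives sorted by key, as the framework's shuffle phase delivers it.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Runs only when a role argument is given (hypothetical convention: "map" or "reduce").
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Because both roles only read stdin and write stdout, the same script can be tested locally with a shell pipeline (map, sort, reduce) before it is handed to the streaming jar through its `-mapper` and `-reducer` options.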
Q22. What is the Distributed Cache in Hadoop?
The Distributed Cache is a facility provided by the MapReduce framework to cache files (text, archives, jars and so on) needed by applications during execution of a job. The framework copies the necessary files to a slave node before any tasks for the job are executed on that node. (The distributed cache copies the files to the slave nodes broadcast-style, which reduces the time cost of join operations.)
Q23. What is the benefit of the Distributed Cache? Why can't we just keep the file in HDFS and have the application read it?
Because the Distributed Cache is much faster. The file is copied to each task tracker once, before the job's tasks start on that node. Whether the task tracker then runs 10 or 100 mappers or reducers, they all use the same local copy. If instead the job code reads the file directly from HDFS, every mapper accesses HDFS on its own, so a task tracker that runs 100 map tasks reads the file from HDFS 100 times. HDFS is also not very efficient when used like this. (The distributed cache copies the file to each node before the job runs, which improves efficiency. The blogger adds that this can multiply the number of processes per node, so it is not always practical.)
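A map-side join shows the "load once, reuse for every record" pattern this answer describes. The sketch below assumes a Streaming job in which a small tab-separated lookup file has already been shipped to the task's working directory (for example through the streaming `-files` option). The file name `countries.tsv`, the function names, and the record layout are all hypothetical.

```python
import sys


def load_lookup(lines):
    """Build a key -> value dict from tab-separated lines of the cached file."""
    table = {}
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        table[key] = value
    return table


def join_records(records, table, default="UNKNOWN"):
    """Append the looked-up value to each record, keyed on its first field."""
    for record in records:
        record = record.rstrip("\n")
        key = record.split("\t", 1)[0]
        yield f"{record}\t{table.get(key, default)}"


# Runs only when the local name of the cached file is given,
# e.g. the hypothetical countries.tsv.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        lookup = load_lookup(handle)  # read once per task, not once per record
    for out in join_records(sys.stdin, lookup):
        print(out)
```

The lookup table is read from local disk a single time per task, and no task opens an HDFS stream for it, which is the saving the answer attributes to the cache.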
Q24. What mechanism does the Hadoop framework provide to synchronize changes made to the Distributed Cache while the application is running?
This is a trick question. There is no such mechanism. The Distributed Cache is, by design, read-only during job execution. (The distributed cache is designed only to be read, so nothing keeps changes synchronized across tasks.)
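Q25. Have you ever used Counters in Hadoop? Give us an example scenario.
Anybody who claims to have worked on a Hadoop project is expected to have used counters. A typical scenario is counting malformed or skipped input records, so that data-quality problems show up in the job summary. (Ha, anyone who has done a Hadoop project should know about counters.)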
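One possible scenario, sketched for a Streaming job: count malformed input records so they appear in the job's counter summary. Streaming tasks update counters by writing lines of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The group name `DataQuality`, the counter name `MALFORMED_RECORDS`, the three-field record layout, and the function names are assumptions made for illustration.

```python
import sys


def counter_line(group, name, amount=1):
    """Format a Hadoop Streaming counter update (to be written to stderr)."""
    return f"reporter:counter:{group},{name},{amount}"


def parse_records(lines, expected_fields=3, report=None):
    """Yield well-formed tab-separated records; count and skip malformed ones."""
    if report is None:
        def report(message):
            print(message, file=sys.stderr)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected_fields:
            # Hypothetical group and counter names.
            report(counter_line("DataQuality", "MALFORMED_RECORDS"))
            continue
        yield fields
```

Passing `report` as a parameter keeps the parsing logic testable without capturing stderr. Inside a real task, the default writes straight to stderr, where the framework picks up the counter updates.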

posted @ 2013-04-12 08:48 jarlean