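YARN vs. MapReduce 1
Apache YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2. YARN provides resource management and application scheduling for the cluster. Its APIs are used to request and work with cluster resources.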
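MapReduce 1:
Responsibilities of the JobTracker:
(1) Job scheduling (matching tasks with TaskTrackers)
(2) Task progress monitoring (keeping track of tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter totals)
(3) Storing job history for completed jobs
Responsibilities of the TaskTracker:
Running tasks and sending progress reports to the JobTracker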
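The division of labor between the two daemons can be sketched with a toy simulation. This is only an illustration, not Hadoop code: ToyJobTracker, ToyTaskTracker, and run_job are hypothetical names, and the sketch assumes every running task finishes between two heartbeats.

```python
from collections import deque


class ToyTaskTracker:
    """Hypothetical stand-in for a TaskTracker: runs tasks, reports progress."""

    def __init__(self, name, slots):
        self.name = name
        self.free_slots = slots
        self.running = []

    def heartbeat(self):
        # Assumption: every running task finishes between two heartbeats.
        # Report the finished tasks and the number of free slots.
        finished, self.running = self.running, []
        self.free_slots += len(finished)
        return finished, self.free_slots

    def launch(self, task):
        self.free_slots -= 1
        self.running.append(task)


class ToyJobTracker:
    """Hypothetical stand-in for a JobTracker: schedules, monitors, keeps history."""

    def __init__(self):
        self.pending = deque()
        self.history = []  # (3) history of completed tasks

    def submit(self, tasks):
        self.pending.extend(tasks)

    def on_heartbeat(self, tracker):
        finished, free = tracker.heartbeat()
        # (2) bookkeeping: record what the tracker reported as done
        self.history.extend((tracker.name, task) for task in finished)
        # (1) scheduling: match pending tasks to the tracker's free slots
        while free > 0 and self.pending:
            tracker.launch(self.pending.popleft())
            free -= 1


def run_job(tasks, trackers):
    """Drive heartbeats until nothing is pending or running."""
    job_tracker = ToyJobTracker()
    job_tracker.submit(tasks)
    while job_tracker.pending or any(t.running for t in trackers):
        for tracker in trackers:
            job_tracker.on_heartbeat(tracker)
    return job_tracker.history


trackers = [ToyTaskTracker("A", slots=2), ToyTaskTracker("B", slots=1)]
print(run_job(["m0", "m1", "m2"], trackers))
# [('A', 'm0'), ('A', 'm1'), ('B', 'm2')]
```

Note that a single ToyJobTracker object holds the queue, the per-task state, and the history. That concentration of duties in one daemon is what the sections below refer to.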
Scalability:
MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
YARN is designed to scale up to 10,000 nodes and 100,000 tasks.
Availability:
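High availability (HA) is usually achieved by replicating the state needed for another daemon to take over the work needed to provide the service, in the event of the service daemon failing.
The JobTracker's in-memory state is large, complex, and constantly changing (each task status is updated every few seconds), which makes HA very hard to support. In YARN, the JobTracker's responsibilities are split between the ResourceManager (RM) and per-application ApplicationMasters (AM), so HA can be provided for each piece separately, and failures of the RM, a NodeManager (NM), or an AM can each be recovered from.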
Utilization:
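In MapReduce 1, each TaskTracker is statically configured with a fixed number of fixed-size slots, divided into map slots (which can run only map tasks) and reduce slots (which can run only reduce tasks). As a result, MRv1 can end up with map slots sitting idle while no reduce slot is free, so reduce tasks have to wait. In addition, a slot that is too large wastes resources, while a slot that is too small may cause the task to fail.
In YARN, each NodeManager manages a pool of resources. Resources are fine-grained, so an application simply requests what it needs.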
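The difference between typed slots and a shared pool can be illustrated with a small sketch. It is a simplified model, not Hadoop's scheduler: the function names are hypothetical, memory is the only resource considered, and the node size (4096 MB) and slot layout (2 map + 2 reduce slots) are assumed values chosen for the example.

```python
def schedule_with_slots(tasks, map_slots, reduce_slots):
    """MRv1-style: a task can start only in a free slot of its own kind."""
    free = {"map": map_slots, "reduce": reduce_slots}
    started, waiting = [], []
    for kind, _mem_mb in tasks:
        if free[kind] > 0:
            free[kind] -= 1
            started.append(kind)
        else:
            waiting.append(kind)
    return started, waiting


def schedule_with_pool(tasks, pool_mb):
    """YARN-style: each task asks for memory out of one shared pool."""
    started, waiting = [], []
    for kind, mem_mb in tasks:
        if mem_mb <= pool_mb:
            pool_mb -= mem_mb
            started.append(kind)
        else:
            waiting.append(kind)
    return started, waiting


# Three reduce tasks of 1024 MB each arrive on a node with 4096 MB of memory.
tasks = [("reduce", 1024)] * 3

# Assumed MRv1 layout: 2 map slots + 2 reduce slots, 1024 MB each.
print(schedule_with_slots(tasks, map_slots=2, reduce_slots=2))
# (['reduce', 'reduce'], ['reduce'])

# The same memory as a single pool.
print(schedule_with_pool(tasks, pool_mb=4096))
# (['reduce', 'reduce', 'reduce'], [])
```

With slots, the third reduce task waits even though 2048 MB of map-slot memory is idle. With a pool, all three tasks start and 1024 MB remains for whatever is requested next.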
Multitenancy:
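YARN's biggest advantage is that it decouples resource management from MapReduce, so it can support distributed applications other than MapReduce. For example, Spark can use YARN as its cluster manager.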