Hadoop 2.0.x [1]: Apache Hadoop NextGen MapReduce (YARN), Translation and Analysis
MapReduce underwent a complete overhaul in hadoop-0.23, and the result is what we now call MapReduce 2.0 (MRv2), or YARN.
From 2.0 onward, Hadoop is better understood as a cloud computing platform that can host multiple computing frameworks.
The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical MapReduce sense or a DAG (directed acyclic graph) of jobs.
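To make "a single job or a DAG of jobs" concrete, here is a toy Python sketch (Python 3.9+, not Hadoop code) in which a per-application master orders the jobs of an application so that each job runs only after the jobs it depends on. The job names and the `run_order` helper are hypothetical; ordering uses the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order for an application's jobs.

    `dag` maps each (hypothetical) job name to the set of jobs it depends on.
    A classic single MapReduce job is simply a DAG with one node and no edges.
    """
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(dag).static_order())


single_job = {"wordcount": set()}
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "lookup": set(),
    "join": {"clean", "lookup"},
}

print(run_order(single_job))  # ['wordcount']
print(run_order(pipeline))    # some order with extract < clean < join and lookup < join
```

The same per-application master handles both cases; only the shape of the dependency graph differs.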
The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.
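A minimal sketch of what "ultimate authority" means in practice, assuming a hypothetical `ToyResourceManager` that tracks only memory: NodeManagers register their capacity, every grant goes through one central ledger, and a node is never promised more than it reported. First-fit placement is an illustrative choice, not YARN's actual placement logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyResourceManager:
    """Toy model of the RM as the single arbiter of cluster memory."""
    # node name -> memory (MB) still unallocated on that node
    free_mb: dict[str, int] = field(default_factory=dict)
    # every grant as (application, node, MB), kept in one central ledger
    grants: list[tuple[str, str, int]] = field(default_factory=list)

    def register_node(self, node: str, capacity_mb: int) -> None:
        # A NodeManager joining the cluster reports its total capacity.
        self.free_mb[node] = capacity_mb

    def request(self, app: str, memory_mb: int) -> Optional[str]:
        # First-fit placement: the first node with enough free memory wins.
        for node, free in self.free_mb.items():
            if free >= memory_mb:
                self.free_mb[node] = free - memory_mb
                self.grants.append((app, node, memory_mb))
                return node
        return None  # no node can host this container right now

    def release(self, app: str, node: str, memory_mb: int) -> None:
        # Raises ValueError if no such grant was ever made.
        self.grants.remove((app, node, memory_mb))
        self.free_mb[node] += memory_mb


rm = ToyResourceManager()
rm.register_node("nm1", 4096)
rm.register_node("nm2", 2048)
print(rm.request("app_A", 3072))  # nm1
print(rm.request("app_B", 2048))  # nm2
print(rm.request("app_C", 2048))  # None (1024 MB free on nm1, 0 on nm2)
```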
The per-application ApplicationMaster is, in effect, a framework-specific library. It negotiates resources from the ResourceManager and works with the NodeManager(s) to execute and monitor the tasks.
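The negotiation described here is iterative: the ApplicationMaster keeps asking until it holds enough containers, then hands each container to a NodeManager to launch a task. The toy loop below assumes a hypothetical RM that grants at most a few containers per heartbeat; `FakeRM`, `allocate`, and `launch` are illustrative stand-ins, not the YARN client API.

```python
class FakeRM:
    """Hypothetical RM that grants at most `per_heartbeat` containers per call."""

    def __init__(self, total: int, per_heartbeat: int):
        self.remaining = total
        self.per_heartbeat = per_heartbeat
        self.next_id = 0

    def allocate(self, wanted: int) -> list[str]:
        n = min(wanted, self.per_heartbeat, self.remaining)
        self.remaining -= n
        ids = [f"container_{self.next_id + i}" for i in range(n)]
        self.next_id += n
        return ids


def launch(container: str, task: str) -> str:
    # Stand-in for asking the NodeManager that hosts `container` to start `task`.
    return f"{task}@{container}"


def run_application(rm: FakeRM, tasks: list[str], max_heartbeats: int = 10) -> list[str]:
    pending = list(tasks)
    launched = []
    for _ in range(max_heartbeats):
        if not pending:
            break
        # Ask only for what is still missing; the RM may grant fewer.
        for container in rm.allocate(len(pending)):
            launched.append(launch(container, pending.pop(0)))
    return launched


rm = FakeRM(total=5, per_heartbeat=2)
print(run_application(rm, ["map0", "map1", "map2", "reduce0"]))
# ['map0@container_0', 'map1@container_1', 'map2@container_2', 'reduce0@container_3']
```

If the RM runs out of capacity, the loop simply returns the tasks it managed to launch within `max_heartbeats`.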
The ResourceManager has two main components: the Scheduler and the ApplicationsManager.
The Scheduler is responsible for allocating resources to the various running applications, subject to familiar constraints such as capacities and queues. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status. It also offers no guarantees about restarting tasks that fail, whether because of application failure or hardware failure. The Scheduler performs its scheduling function based on the resource requirements of the applications, expressed through the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, and network. In the first version, only memory is supported.
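A minimal sketch of the Container abstraction, assuming a hypothetical `Resource` record with several dimensions of which, as in the first version described above, only memory takes part in the fit check. The field names are illustrative, not YARN's own classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical multi-dimensional resource request or capacity."""
    memory_mb: int
    vcores: int = 0        # recorded, but ignored by this memory-only scheduler
    disk_mb: int = 0       # likewise ignored
    network_mbps: int = 0  # likewise ignored


def fits(request: Resource, available: Resource) -> bool:
    # First-version behaviour: only the memory dimension is compared.
    return request.memory_mb <= available.memory_mb


def subtract(available: Resource, used: Resource) -> Resource:
    # Bookkeeping keeps every dimension, even those not used for scheduling.
    return Resource(
        memory_mb=available.memory_mb - used.memory_mb,
        vcores=available.vcores - used.vcores,
        disk_mb=available.disk_mb - used.disk_mb,
        network_mbps=available.network_mbps - used.network_mbps,
    )


node = Resource(memory_mb=8192, vcores=4)
cpu_heavy = Resource(memory_mb=1024, vcores=16)
print(fits(cpu_heavy, node))  # True: the 16 vcores are never checked
```

Keeping the abstraction multi-dimensional while checking a single dimension is what lets a scheduler add CPU or other resources later without changing the request format.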
The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications, etc. The current MapReduce schedulers, such as the CapacityScheduler and the FairScheduler, are examples of such plug-ins.
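The plug-in design is essentially the strategy pattern: the ResourceManager owns the allocation mechanics and delegates the "who goes next" decision to a policy object. On a real cluster the plug-in is chosen with the `yarn.resourcemanager.scheduler.class` property in yarn-site.xml (for example `org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler`). The Python below is only a toy; `FifoPolicy` and `FairSharePolicy` are simplified, hypothetical stand-ins, not the real CapacityScheduler or FairScheduler algorithms.

```python
from abc import ABC, abstractmethod


class SchedulingPolicy(ABC):
    """Decides which waiting application receives the next container."""

    @abstractmethod
    def pick(self, waiting: list[str], allocated_mb: dict[str, int]) -> str: ...


class FifoPolicy(SchedulingPolicy):
    def pick(self, waiting, allocated_mb):
        return waiting[0]  # the earliest submission always wins


class FairSharePolicy(SchedulingPolicy):
    def pick(self, waiting, allocated_mb):
        # Favour the application holding the least memory; min() keeps the
        # first minimum, so ties go to the earlier submission.
        return min(waiting, key=lambda app: allocated_mb.get(app, 0))


def assign(policy: SchedulingPolicy, waiting: list[str], containers: int, size_mb: int) -> dict[str, int]:
    """Hand out `containers` equal-sized containers according to `policy`."""
    allocated = {app: 0 for app in waiting}
    for _ in range(containers):
        allocated[policy.pick(waiting, allocated)] += size_mb
    return allocated


apps = ["etl", "adhoc"]
print(assign(FifoPolicy(), apps, 4, 1024))       # {'etl': 4096, 'adhoc': 0}
print(assign(FairSharePolicy(), apps, 4, 1024))  # {'etl': 2048, 'adhoc': 2048}
```

Swapping the policy object changes the outcome without touching `assign`, which is the point of making the scheduler pluggable.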
The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.
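In the CapacityScheduler each queue's capacity is configured as a percentage of its parent (through properties such as `yarn.scheduler.capacity.root.queues` and `yarn.scheduler.capacity.<queue-path>.capacity`), and the capacities of sibling queues must add up to 100. A queue's share of the whole cluster is therefore the product of the percentages along its path from `root`. The toy sketch below computes those shares for a hypothetical queue tree; the queue names and numbers are made up.

```python
def absolute_capacities(tree: dict[str, dict], parent_share: float = 1.0, prefix: str = "root") -> dict[str, float]:
    """Map each queue path to its fraction of the whole cluster.

    `tree` maps a child queue name to {"capacity": percent of parent,
    "children": {...}}. Raises ValueError if siblings do not sum to 100.
    """
    total = sum(q["capacity"] for q in tree.values())
    if tree and abs(total - 100) > 1e-9:
        raise ValueError(f"children of {prefix} sum to {total}, expected 100")
    shares: dict[str, float] = {}
    for name, q in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * q["capacity"] / 100
        shares[path] = share
        # Recurse: a child's share is a fraction of this queue's share.
        shares.update(absolute_capacities(q.get("children", {}), share, path))
    return shares


queues = {
    "prod": {"capacity": 70, "children": {
        "etl": {"capacity": 60},
        "reports": {"capacity": 40},
    }},
    "dev": {"capacity": 30},
}

for path, share in absolute_capacities(queues).items():
    print(f"{path}: {share:.0%}")
```

Here `root.prod.etl` ends up with 60% of 70%, i.e. 42% of the cluster, which is the kind of predictability the hierarchy is meant to give.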
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container in which the application-specific ApplicationMaster executes, and providing the service that restarts the ApplicationMaster container on failure.
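The restart duty amounts to a bounded retry loop around "obtain a container and run the ApplicationMaster in it". The sketch assumes a hypothetical `max_attempts` limit and a caller-supplied `launch_am` function; in real YARN the limit comes from configuration and the launch goes through a NodeManager, neither of which is modelled here.

```python
from typing import Callable


def run_with_am_restarts(launch_am: Callable[[int], bool], max_attempts: int = 2) -> tuple[bool, int]:
    """Run the ApplicationMaster, restarting it after failures.

    `launch_am(attempt)` stands in for allocating the AM's first container and
    running the AM there; it returns True if the AM finished successfully.
    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if launch_am(attempt):
            return True, attempt
    return False, max_attempts


def flaky_am(attempt: int) -> bool:
    # Hypothetical AM that crashes on its first attempt only.
    return attempt >= 2


print(run_with_am_restarts(flaky_am))                  # (True, 2)
print(run_with_am_restarts(lambda attempt: False, 3))  # (False, 3)
```

Note that this retries only the ApplicationMaster itself; as the text says, restarting individual failed tasks is not the Scheduler's job.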
The NodeManager is the per-machine framework agent responsible for containers: it monitors their resource usage (CPU, memory, disk, network) and reports it to the ResourceManager/Scheduler.
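A toy version of the NodeManager's reporting role: it sums the resource usage of the containers it hosts and packages the totals into one per-node report for the ResourceManager. The `ContainerUsage` fields and the report layout are hypothetical and are not the actual NodeManager heartbeat protocol.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    """Hypothetical usage sample for one container on this node."""
    container_id: str
    cpu_percent: float
    memory_mb: int
    disk_mb: int
    network_kbps: int


def build_report(node: str, containers: list[ContainerUsage]) -> dict:
    """Aggregate per-container usage into one report for the ResourceManager."""
    return {
        "node": node,
        "containers": [c.container_id for c in containers],
        "cpu_percent": sum(c.cpu_percent for c in containers),
        "memory_mb": sum(c.memory_mb for c in containers),
        "disk_mb": sum(c.disk_mb for c in containers),
        "network_kbps": sum(c.network_kbps for c in containers),
    }


usage = [
    ContainerUsage("container_01", 35.0, 1024, 200, 500),
    ContainerUsage("container_02", 80.0, 2048, 50, 1200),
]
report = build_report("nm-host-1", usage)
print(report["memory_mb"])  # 3072
```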
The per-application ApplicationMaster is responsible for negotiating appropriate resource containers from the Scheduler, tracking their status, and monitoring their progress.
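Tracking status and monitoring progress can be as simple as keeping one state per task and reporting the fraction that have finished. The `State` values and the `ProgressTracker` class below are hypothetical, chosen only to illustrate the kind of bookkeeping an ApplicationMaster does.

```python
from collections import Counter
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class ProgressTracker:
    """Per-application bookkeeping of task states (toy model)."""

    def __init__(self, tasks: list[str]):
        self.states = {t: State.PENDING for t in tasks}

    def update(self, task: str, state: State) -> None:
        if task not in self.states:
            raise KeyError(f"unknown task {task!r}")
        self.states[task] = state

    def progress(self) -> float:
        # Fraction of tasks that have finished successfully.
        done = sum(1 for s in self.states.values() if s is State.SUCCEEDED)
        return done / len(self.states) if self.states else 1.0

    def summary(self) -> dict[str, int]:
        # Count of tasks in each state, keyed by the state's string value.
        return dict(Counter(s.value for s in self.states.values()))


tracker = ProgressTracker(["m0", "m1", "m2", "r0"])
tracker.update("m0", State.SUCCEEDED)
tracker.update("m1", State.RUNNING)
tracker.update("m2", State.FAILED)
print(tracker.progress())  # 0.25
print(tracker.summary())
```

A failed task shows up in `summary()` but not in `progress()`, leaving the decision to retry it with the ApplicationMaster rather than the Scheduler.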
MRv2 maintains API compatibility with the previous stable release (hadoop-0.20.205). This means that all MapReduce jobs should still run unchanged on top of MRv2 after just a recompile.