Big Data [III]: YARN Cluster Deployment
1. Overview
YARN is a resource-management and job-scheduling framework. It uses a master/slave architecture and consists of three main components: the ResourceManager (RM), the NodeManager (NM), and the ApplicationMaster (AM).
>The ResourceManager monitors, allocates, and manages all cluster resources; it runs on the master node.
>A NodeManager maintains an individual node; one runs on each slave node.
>An ApplicationMaster schedules and coordinates one specific application; it exists only while that application has tasks running.
The RM has full control over all applications and sole authority over resource allocation, while each AM negotiates resources with the RM and communicates with the NodeManagers to execute and monitor its tasks.
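Each of these daemons also exposes a web interface and a REST API: by default the ResourceManager listens on port 8088 of the master and each NodeManager on port 8042 (both ports also show up in the job logs later in this post). Once the cluster configured below is running, a quick sanity check is, for example (a sketch; the hostname master matches the configuration used in this post):
curl -s http://master:8088/ws/v1/cluster/info      # ResourceManager REST API: cluster and RM state
curl -s http://master:8088/ws/v1/cluster/metrics   # number of active NodeManagers, containers, memory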
2. Execution Flow
1. The client submits an application to the RM, including everything needed to launch the application's ApplicationMaster: the ApplicationMaster program itself, the command that starts it, the user program, and so on.
2. The ResourceManager allocates a container and starts the ApplicationMaster in it.
3. While starting, the ApplicationMaster registers itself with the ResourceManager; after a successful start it keeps a heartbeat with the RM.
4. The ApplicationMaster requests the required number of containers from the ResourceManager.
5. The ResourceManager returns information about the containers granted to the ApplicationMaster. The AM initializes each granted container and, once the launch information is ready, contacts the corresponding NodeManager and asks it to start the container. The AM keeps a heartbeat with the NMs, which lets it monitor and manage the tasks running on them.
6. While a container is running, the ApplicationMaster monitors it; the container reports its progress, status, and other information back to its AM over RPC.
7. While the application is running, the client talks directly to the AM to obtain the application's status, progress updates, and other information.
8. When the application finishes, the ApplicationMaster deregisters itself from the ResourceManager and allows its containers to be reclaimed.
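The same life cycle can be observed from the command line once an application has been submitted. A minimal sketch (the application ID is a placeholder; use the one printed when you submit a job in section 3.2 or 3.3):
/usr/cstor/hadoop/bin/yarn application -list                                    # applications known to the RM and their states
/usr/cstor/hadoop/bin/yarn application -status application_1501872322130_0001   # ACCEPTED -> RUNNING -> FINISHED for one application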
3. Managing the YARN Cluster
3.1 Configuring the YARN cluster
>Switch to the master server. This assumes the HDFS daemons are already running; see the previous post for how to start them >> http://www.cnblogs.com/1996swg/p/7286136.html
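To confirm that prerequisite before continuing, one option is (assuming the same installation path used throughout this post):
jps                                           # master should list NameNode (and possibly SecondaryNameNode)
/usr/cstor/hadoop/bin/hdfs dfsadmin -report   # lists the live DataNodes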
>Specify the YARN master node: edit the file "/usr/cstor/hadoop/etc/hadoop/yarn-site.xml" and insert the following between its configuration tags:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
yarn-site.xml is the configuration file for the YARN daemons. The first property sets the hostname of the ResourceManager; the second configures mapreduce_shuffle as the auxiliary service run by the NodeManagers, which is required for MapReduce programs to run.
>Copy the finished YARN configuration file to the slaveX machines and to client.
The commands are as follows. List the target machines: cat ~/data/4/machines
Copy the file to them: for x in `cat ~/data/4/machines` ; do echo $x ; scp /usr/cstor/hadoop/etc/hadoop/yarn-site.xml $x:/usr/cstor/hadoop/etc/hadoop/ ; done;
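If you want to double-check that every machine received the same file, one possibility is to compare checksums (a sketch, relying on the same passwordless ssh that the scp loop uses):
md5sum /usr/cstor/hadoop/etc/hadoop/yarn-site.xml                                                      # checksum on master
for x in `cat ~/data/4/machines` ; do ssh $x md5sum /usr/cstor/hadoop/etc/hadoop/yarn-site.xml ; done   # should match on every machine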
>Confirm that the slaves file has already been configured; check it on the master machine.
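For example, on master (the hostnames are the ones that appear in the job logs below):
cat /usr/cstor/hadoop/etc/hadoop/slaves   # expected: slave1, slave2, slave3, one per line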
>Start YARN across the whole cluster with the command /usr/cstor/hadoop/sbin/start-yarn.sh
>Verify with the jps command, and run the same check on each of the slave machines.
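As a rough guide to what to expect (assuming only the daemons configured in this and the previous post are running): jps on master should now show ResourceManager in addition to the HDFS daemons, and jps on each slave should show NodeManager alongside DataNode. The ResourceManager's own view of the cluster can also be checked:
/usr/cstor/hadoop/bin/yarn node -list   # should list one RUNNING node per slave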
3.2 Submitting a DistributedShell job from the client machine
DistributedShell can be regarded as the "hello world" of YARN programming; it runs a user-supplied shell command or shell script in parallel across the cluster.
-jar specifies the jar file containing the ApplicationMaster, and -shell_command specifies the shell command the ApplicationMaster should execute.
Open another connection to the client machine and run:
/usr/cstor/hadoop/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/cstor/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_command uptime
The output is:
17/08/05 02:51:34 INFO distributedshell.Client: Initializing Client
17/08/05 02:51:34 INFO distributedshell.Client: Running Client
17/08/05 02:51:34 INFO client.RMProxy: Connecting to ResourceManager at master/10.1.21.27:8032
17/08/05 02:51:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/05 02:51:34 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=3
17/08/05 02:51:34 INFO distributedshell.Client: Got Cluster node info from ASM
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave1:42602, nodeAddressslave1:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave2:57070, nodeAddressslave2:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Got node report from ASM for, nodeId=slave3:38580, nodeAddressslave3:8042, nodeRackName/default-rack, nodeNumContainers0
17/08/05 02:51:34 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
17/08/05 02:51:34 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
17/08/05 02:51:35 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 8192
17/08/05 02:51:35 INFO distributedshell.Client: Max virtual cores capabililty of resources in this cluster 32
17/08/05 02:51:35 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment
17/08/05 02:51:35 INFO distributedshell.Client: Set the environment for the application master
17/08/05 02:51:35 INFO distributedshell.Client: Setting up app master command
17/08/05 02:51:35 INFO distributedshell.Client: Completed setting up app master command {{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
17/08/05 02:51:35 INFO distributedshell.Client: Submitting application to ASM
17/08/05 02:51:36 INFO impl.YarnClientImpl: Submitted application application_1501872322130_0001
17/08/05 02:51:37 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:38 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:39 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:40 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:41 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:42 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:43 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:44 INFO distributedshell.Client: Got application report from ASM for, appId=1, clientToAMToken=null, appDiagnostics=, appMasterHost=slave2/10.1.32.41, appQueue=default, appMasterRpcPort=-1, appStartTime=1501872695990, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=http://master:8088/proxy/application_1501872322130_0001/, appUser=root
17/08/05 02:51:44 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop
17/08/05 02:51:44 INFO distributedshell.Client: Application completed successfully
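DistributedShell accepts several other options besides -jar and -shell_command; for example, the command can be run in multiple containers at once. A hedged sketch (option names as in the Hadoop 2.7.1 DistributedShell client):
/usr/cstor/hadoop/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/cstor/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_command hostname -num_containers 3 -container_memory 128
With -shell_command hostname and -num_containers 3, each container's stdout should contain the hostname of the node it ran on, which shows which NodeManagers actually executed the command.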
3.3 Submitting a MapReduce job from the client machine
(1) Configure MapReduce jobs to run on YARN
First, on the master machine, rename the file "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml.template" to "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml";
then edit the file and insert the following between its configuration tags:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Finally, copy the master's "/usr/cstor/hadoop/etc/hadoop/mapred-site.xml" to the slaveX machines and to client (using the same copy method as for the YARN configuration above), and restart the cluster.
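Taken together, the step looks roughly like this on master (a sketch reusing the machine list from section 3.1; restart HDFS as well if you prefer a full cluster restart):
cd /usr/cstor/hadoop/etc/hadoop
mv mapred-site.xml.template mapred-site.xml                                                # rename the template
vi mapred-site.xml                                                                         # add the mapreduce.framework.name property above
for x in `cat ~/data/4/machines` ; do scp mapred-site.xml $x:/usr/cstor/hadoop/etc/hadoop/ ; done   # push to slaveX and client
/usr/cstor/hadoop/sbin/stop-yarn.sh && /usr/cstor/hadoop/sbin/start-yarn.sh                # restart YARN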
(2) Submit the PI Estimator job from the client machine
First change into the Hadoop installation directory, /usr/cstor/hadoop/, then submit the PI Estimator job.
The last two arguments of the command below mean the following: the first is the number of map tasks to run, here 2; the second is the number of samples taken by each map task; their product is the total number of samples. PI Estimator computes the value of Pi with a Monte Carlo method.
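In brief, the Monte Carlo idea behind the estimate (my summary, not part of the job output) is: points are sampled in a unit square and the fraction falling inside the inscribed circle is counted; since the circle covers pi/4 of the square's area, the job reports Pi ≈ 4 × (points inside the circle) / (total points). With 2 × 10 = 20 samples in this run, the result of 3.80 shown below corresponds to 19 of the 20 points landing inside the circle.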
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10
The output is as follows:
Number of Maps = 2
Samples per Map = 10
17/08/05 03:03:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/08/05 03:03:31 INFO client.RMProxy: Connecting to ResourceManager at master/10.1.21.27:8032
17/08/05 03:03:32 INFO input.FileInputFormat: Total input paths to process : 2
17/08/05 03:03:32 INFO mapreduce.JobSubmitter: number of splits:2
17/08/05 03:03:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501872322130_0002
17/08/05 03:03:32 INFO impl.YarnClientImpl: Submitted application application_1501872322130_0002
17/08/05 03:03:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1501872322130_0002/
17/08/05 03:03:32 INFO mapreduce.Job: Running job: job_1501872322130_0002
17/08/05 03:03:39 INFO mapreduce.Job: Job job_1501872322130_0002 running in uber mode : false
17/08/05 03:03:39 INFO mapreduce.Job: map 0% reduce 0%
17/08/05 03:03:45 INFO mapreduce.Job: map 50% reduce 0%
17/08/05 03:03:46 INFO mapreduce.Job: map 100% reduce 0%
17/08/05 03:03:52 INFO mapreduce.Job: map 100% reduce 100%
17/08/05 03:03:52 INFO mapreduce.Job: Job job_1501872322130_0002 completed successfully
17/08/05 03:03:52 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=347208
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=522
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=7932
        Total time spent by all reduces in occupied slots (ms)=3443
        Total time spent by all map tasks (ms)=7932
        Total time spent by all reduce tasks (ms)=3443
        Total vcore-seconds taken by all map tasks=7932
        Total vcore-seconds taken by all reduce tasks=3443
        Total megabyte-seconds taken by all map tasks=8122368
        Total megabyte-seconds taken by all reduce tasks=3525632
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=286
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=347
        CPU time spent (ms)=2630
        Physical memory (bytes) snapshot=683196416
        Virtual memory (bytes) snapshot=2444324864
        Total committed heap usage (bytes)=603979776
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.592 seconds
Estimated value of Pi is 3.80000000000000000000
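To get a closer estimate, increase either argument; more maps and more samples per map improve the accuracy at the cost of more work. For example (same jar and program, larger arguments; output not shown here):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 1000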
Summary:
There is no need to study the YARN framework in great depth at this point; it is enough to get the configuration and environment working so that the MapReduce material that follows has somewhere to run.
In recent versions of Hadoop, YARN serves as the resource-management and scheduling framework, and it is the runtime environment in which Hadoop MapReduce programs live. MapReduce is not tied to YARN, though: it can also run on schedulers such as Mesos or Corona, and each scheduling framework requires its own adaptation of Hadoop.