I. The client interacts only with the ResourceManager. The main steps are:
 
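1. Register with the ResourceManager by requesting a new application, and read the RM's response. The response reports maximumResourceCapability, the largest allocation a single container may request: the maximum memory and the maximum number of VCores. It is not the cluster's total free capacity.
2. The client builds an ApplicationSubmissionContext. Its contents include the ApplicationId, ApplicationName, mem, vcores, priority, queue, and a ContainerLaunchContext. Of these, mem, vcores, and the ContainerLaunchContext are required. The ContainerLaunchContext needs three things set:
  • localResources: local files the container needs at run time, such as scripts or jars. They are usually uploaded to HDFS first, and the container downloads them when it starts.
  • environment: environment variables to set for the container.
  • commands: the command the container executes, a Linux shell command. Once the ApplicationSubmissionContext has been submitted to the RM, this command is run to start the ApplicationMaster.
3. Submit the ApplicationSubmissionContext by calling submitApplication(), which returns an ApplicationId.
4. The client uses the ApplicationId to poll the application's status until it reaches a terminal state: FINISHED, FAILED, or KILLED.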
 
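II. Source code walkthrough:
 
The APIs below implement the four steps above. They are not the only option; other interfaces can achieve the same result.
The client shown here submits an ApplicationMaster jar.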
 
1. Register with the ResourceManager:
// yarnClient is assumed to be a YarnClient that has already been initialized and started.
YarnClientApplication app = yarnClient.createApplication();
GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
Read the maximum resources a single container may request:
Resource clusterMax = appResponse.getMaximumResourceCapability();
int maxMem = clusterMax.getMemory();        // in MB
int maxCore = clusterMax.getVirtualCores();
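
Because the reported maximum bounds a single container request, a client usually sizes its ApplicationMaster request against it before building the submission context. The following is a minimal, stdlib-only sketch of that check. `AmResourceSketch`, `clamp`, and `requireFits` are hypothetical names, and the plain ints stand in for the values returned by `getMemory()` and `getVirtualCores()`.

```java
/** Hypothetical helper: sizes an AM request against the reported maximum. */
final class AmResourceSketch {
    private AmResourceSketch() {}

    /** Returns the desired amount, capped at the maximum the RM reported. */
    static int clamp(int desired, int max) {
        if (desired <= 0 || max <= 0) {
            throw new IllegalArgumentException("desired and max must be positive");
        }
        return Math.min(desired, max);
    }

    /** Fails fast instead of silently shrinking a request that cannot fit. */
    static void requireFits(String what, int required, int max) {
        if (required > max) {
            throw new IllegalStateException(
                what + " request " + required + " exceeds the maximum " + max);
        }
    }
}
```

`AmResourceSketch.clamp(1024, maxMem)` mirrors the `Math.min(...)` call used in step 2. `requireFits` is an alternative for an ApplicationMaster that cannot run with less than it asks for, where an early, readable error is preferable to a shrunken request.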
 
2. Prepare the ApplicationSubmissionContext that describes the ApplicationMaster to run:
// The record types come from org.apache.hadoop.yarn.api.records; Records is in
// org.apache.hadoop.yarn.util and ApplicationConstants in org.apache.hadoop.yarn.api.
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);

/* mem, vcores: cap the AM request at the maximum the RM reported */
Resource amResource = Records.newRecord(Resource.class);
amResource.setMemory(Math.min(clusterMax.getMemory(), 1024));
amResource.setVirtualCores(Math.min(clusterMax.getVirtualCores(), 4));
appContext.setResource(amResource);

/* localResources: toLocalResource(...) is a helper that is not shown in this post;
   presumably it uploads the jar to HDFS and returns a LocalResource describing it.
   appMasterJar is the local path of the AM jar. */
Map<String, LocalResource> localResourceMap = new HashMap<String, LocalResource>();
File appMasterJarFile = new File(appMasterJar);
localResourceMap.put(appMasterJarFile.getName(), toLocalResource(fs, appResponse.getApplicationId().toString(), appMasterJarFile));
clc.setLocalResources(localResourceMap);

/* commands: launch the AM main class, redirecting stdout and stderr to the container log directory */
StringBuilder cmd = new StringBuilder();
cmd.append("\"" + ApplicationConstants.Environment.JAVA_HOME.$() + "/bin/java\"")
        .append(" ")
        .append(appMasterMainClass)
        .append(" ");

cmd.append("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT)
        .append(" ")
        .append("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);
clc.setCommands(Collections.singletonList(cmd.toString()));

/* environment: hadoopClassPath() is another helper that is not shown in this post */
Map<String, String> envMap = new HashMap<String, String>();
envMap.put("CLASSPATH", hadoopClassPath());
System.out.println(hadoopClassPath());  // debug output of the classpath
clc.setEnvironment(envMap);

/* Attach the ContainerLaunchContext to the ApplicationSubmissionContext */
appContext.setAMContainerSpec(clc);
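
To make the commands and environment entries concrete, here is a minimal, stdlib-only sketch of the two strings the listing above hands to the ContainerLaunchContext. It does not call the YARN API. `LaunchSpecSketch`, `amCommand`, and `classPath` are hypothetical names, and the literal `<LOG_DIR>` is assumed to match the value of `ApplicationConstants.LOG_DIR_EXPANSION_VAR`. Putting `./*` on the classpath is likewise an assumption, based on localized resources being linked into the container's working directory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assembles the AM launch command and CLASSPATH as plain strings. */
final class LaunchSpecSketch {
    // Assumed placeholder that the NodeManager replaces with the
    // container's log directory at launch time.
    static final String LOG_DIR = "<LOG_DIR>";

    private LaunchSpecSketch() {}

    /** Builds: "<javaHome>/bin/java" <mainClass> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr */
    static String amCommand(String javaHome, String mainClass) {
        List<String> parts = new ArrayList<>();
        parts.add("\"" + javaHome + "/bin/java\"");
        parts.add(mainClass);
        parts.add("1>" + LOG_DIR + "/stdout");
        parts.add("2>" + LOG_DIR + "/stderr");
        return String.join(" ", parts);
    }

    /** Joins classpath entries, putting the container working directory first. */
    static String classPath(List<String> entries, String separator) {
        List<String> all = new ArrayList<>();
        all.add("./*");
        for (String entry : entries) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                all.add(trimmed); // skip blank entries, keep the rest in order
            }
        }
        return String.join(separator, all);
    }
}
```

Building the command from a list of parts avoids the stray trailing spaces that chained `append(" ")` calls tend to leave, and printing the result before submission is an easy way to spot a wrong main class or log redirection.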

 

3. Submit the application:

ApplicationId appId = yarnClient.submitApplication(appContext);

 

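4. Check the application's status and wait for it to finish:

// Stop on any terminal state; checking only FINISHED would loop forever after FAILED or KILLED.
EnumSet<YarnApplicationState> terminal = EnumSet.of(
        YarnApplicationState.FINISHED, YarnApplicationState.FAILED, YarnApplicationState.KILLED);
ApplicationReport report = yarnClient.getApplicationReport(appId);
while (!terminal.contains(report.getYarnApplicationState())) {
    report = yarnClient.getApplicationReport(appId);
    LOG.info(String.format("%f %s", report.getProgress(), report.getYarnApplicationState()));
    Thread.sleep(1000);
}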
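
The polling logic can be exercised without a cluster. The sketch below is a stdlib-only stand-in, not the YARN API: `PollSketch`, its `State` enum (which only mirrors the names of `YarnApplicationState`), and `waitForTerminal` are hypothetical, and the `Supplier` plays the role of a call that fetches the current state. It also adds a poll limit, an assumption of this sketch, so that a stuck application cannot block the client forever.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for the client's polling loop. */
final class PollSketch {
    /** Mirrors the names of YarnApplicationState's values. */
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /** States after which the application will not change any more. */
    static final Set<State> TERMINAL =
        EnumSet.of(State.FINISHED, State.FAILED, State.KILLED);

    private PollSketch() {}

    /** Polls until a terminal state is seen, or gives up after maxPolls attempts. */
    static State waitForTerminal(Supplier<State> fetch, long intervalMs, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            State state = fetch.get();
            if (TERMINAL.contains(state)) {
                return state;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        throw new IllegalStateException("no terminal state after " + maxPolls + " polls");
    }
}
```

In a real client the supplier would wrap the report lookup. Note that FINISHED only means the application ran to completion; whether it succeeded is reported separately, through the report's final application status.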

 

References:

A very easy-to-follow example, which most of this post draws on: http://www.qingpingshan.com/rjbc/dashuju/151240.html

Dong Xicheng (董西城) on how to write a YARN application: http://dongxicheng.org/mapreduce-nextgen/how-to-write-an-yarn-applicationmaster/

Hadoop's DistributedShell example, run guide: https://www.thesisscientist.com/docs/JeffBarner/04fae500-dfd5-4614-836d-70e1808e0aae.pdf

Posted on 2018-07-26 20:42 by 今天天蓝蓝