A simple YARN client implementation

I. The client talks only to the ResourceManager. The main steps are:

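1. Register with the ResourceManager (RM), that is, request a new application, and receive the RM's response. In the response the RM reports the cluster's resource limit (maximumResourceCapability): the maximum number of vcores and the maximum amount of memory that a single container can be allocated.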

2. The client builds an ApplicationSubmissionContext. It contains the ApplicationId, the application name, memory, vcores, priority, queue, and a ContainerLaunchContext. Of these, memory, vcores, and the ContainerLaunchContext are required. Three things need to be set in the ContainerLaunchContext:

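localResources: local files the container needs at run time, such as scripts or JARs. They are usually uploaded to HDFS first and downloaded again when the container runs.
environment: environment variables to set for the container at run time.
commands: the commands the container executes, written as Linux shell commands. Once the ApplicationSubmissionContext has been submitted to the RM, this command is run to start the ApplicationMaster.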

3. Submit the ApplicationSubmissionContext by calling submitApplication(), which returns an ApplicationId.

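4. The client uses the ApplicationId to query how the job is running until the application's state becomes FINISHED (an application that fails or is killed ends in FAILED or KILLED instead).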

II. Source code walkthrough:

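The interfaces below are the ones used here to implement the four steps above. They are not the only option; other interfaces can achieve the same result.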

The client below submits an ApplicationMaster JAR.

1. Register with the ResourceManager:

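// yarnClient is a started YarnClient: YarnClient.createYarnClient(), then init(conf) and start().
YarnClientApplication app = yarnClient.createApplication();
// The response carries the new ApplicationId and the cluster's resource limits.
GetNewApplicationResponse appResponse = app.getNewApplicationResponse();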

Find out the maximum resources the cluster can allocate:

Resource clusterMax = appResponse.getMaximumResourceCapability();
int maxMem = clusterMax.getMemory();        // in MB
int maxCore = clusterMax.getVirtualCores();
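The reported maximum is what the client later uses to cap its own request for the ApplicationMaster container. Here is a minimal, Hadoop-free sketch of that capping step. The `Capability` class is a hypothetical stand-in for YARN's `Resource` record, and the numbers in `main` are made up for illustration.

```java
class AmResourceRequest {
    /** Hypothetical stand-in for YARN's Resource record: memory in MB plus vcores. */
    static final class Capability {
        final int memoryMb;
        final int vcores;

        Capability(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Never ask for more than the RM says a single container can get. */
    static Capability capToClusterMax(Capability wanted, Capability clusterMax) {
        return new Capability(
                Math.min(wanted.memoryMb, clusterMax.memoryMb),
                Math.min(wanted.vcores, clusterMax.vcores));
    }

    public static void main(String[] args) {
        Capability clusterMax = new Capability(8192, 2); // made-up RM response
        Capability wanted = new Capability(1024, 4);     // what the client would like
        Capability granted = capToClusterMax(wanted, clusterMax);
        System.out.println(granted.memoryMb + " MB, " + granted.vcores + " vcores");
    }
}
```

Capping on the client side keeps the request within what the RM advertised, rather than leaving an oversized request to be refused at submission time.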

2. Prepare the ApplicationSubmissionContext, which describes how to run the ApplicationMaster:

ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);

/* mem, vcores: ask for at most 1024 MB and 4 vcores, capped at the cluster maximum */
Resource amResource = Records.newRecord(Resource.class);
amResource.setMemory(Math.min(clusterMax.getMemory(), 1024));
amResource.setVirtualCores(Math.min(clusterMax.getVirtualCores(), 4));
appContext.setResource(amResource);

/* localResources: toLocalResource(...) is a helper of this client (not shown here) */
Map<String, LocalResource> localResourceMap = new HashMap<String, LocalResource>();
File appMasterJarFile = new File(appMasterJar);
localResourceMap.put(appMasterJarFile.getName(),
        toLocalResource(fs, appResponse.getApplicationId().toString(), appMasterJarFile));
clc.setLocalResources(localResourceMap);

/* commands: run the AM main class and redirect stdout/stderr to the container log dir */
StringBuilder cmd = new StringBuilder();
cmd.append("\"" + ApplicationConstants.Environment.JAVA_HOME.$() + "/bin/java\"")
        .append(" ")
        .append(appMasterMainClass)
        .append(" ");

cmd.append("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT)
        .append(" ")
        .append("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);
clc.setCommands(Collections.singletonList(cmd.toString()));

/* environment: hadoopClassPath() is a helper of this client (not shown here) */
Map<String, String> envMap = new HashMap<String, String>();
envMap.put("CLASSPATH", hadoopClassPath());
System.out.println(hadoopClassPath());  // print the classpath for debugging
clc.setEnvironment(envMap);

/* attach the ContainerLaunchContext to the ApplicationSubmissionContext */
appContext.setAMContainerSpec(clc);
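To see what the commands entry looks like once the constants are expanded, here is a small Hadoop-free sketch that rebuilds the same string. The literal values are assumptions about what the Hadoop constants resolve to on a Unix-like node (`$JAVA_HOME` for `ApplicationConstants.Environment.JAVA_HOME.$()`, `<LOG_DIR>` for `LOG_DIR_EXPANSION_VAR`, `stdout` and `stderr` for the log file names, `/` for `Path.SEPARATOR`), and `com.example.MyAppMaster` is a made-up class name.

```java
import java.util.Collections;
import java.util.List;

class AmLaunchCommand {
    // Assumed expansions of the Hadoop constants on a Unix-like NodeManager.
    static final String JAVA_HOME_VAR = "$JAVA_HOME"; // Environment.JAVA_HOME.$()
    static final String LOG_DIR = "<LOG_DIR>";        // LOG_DIR_EXPANSION_VAR
    static final String SEPARATOR = "/";              // Path.SEPARATOR
    static final String STDOUT = "stdout";            // ApplicationConstants.STDOUT
    static final String STDERR = "stderr";            // ApplicationConstants.STDERR

    /** Builds the single shell command that starts the ApplicationMaster JVM. */
    static List<String> build(String appMasterMainClass) {
        StringBuilder cmd = new StringBuilder();
        cmd.append("\"" + JAVA_HOME_VAR + "/bin/java\"")
                .append(" ")
                .append(appMasterMainClass)
                .append(" ")
                .append("1>" + LOG_DIR + SEPARATOR + STDOUT)
                .append(" ")
                .append("2>" + LOG_DIR + SEPARATOR + STDERR);
        return Collections.singletonList(cmd.toString());
    }

    public static void main(String[] args) {
        // com.example.MyAppMaster is a hypothetical main class.
        System.out.println(build("com.example.MyAppMaster").get(0));
    }
}
```

The `<LOG_DIR>` placeholder is left in the string on purpose: the client does not know the container's log directory, so the NodeManager substitutes the real path when it launches the container.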

3. Submit the job:

ApplicationId appId = yarnClient.submitApplication(appContext);

4. Check how the job is running and wait for it to end:

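// Poll the RM once a second until the application reaches a terminal state.
ApplicationReport report = yarnClient.getApplicationReport(appId);
YarnApplicationState state = report.getYarnApplicationState();
while (state != YarnApplicationState.FINISHED
        && state != YarnApplicationState.FAILED
        && state != YarnApplicationState.KILLED) {
    LOG.info(String.format("%f %s", report.getProgress(), state));
    Thread.sleep(1000);
    report = yarnClient.getApplicationReport(appId);
    state = report.getYarnApplicationState();
}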
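An application that fails or is killed never reaches FINISHED, so a polling loop generally has to stop on any terminal state. The sketch below shows that pattern without a cluster. The `State` enum is a local stand-in that mirrors the names in `YarnApplicationState`, and the `Supplier` plays the role of the call that fetches the report; both are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

class AppStatePoller {
    /** Local stand-in mirroring the names of YarnApplicationState. */
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /** States after which the application will not change any more. */
    static final Set<State> TERMINAL = EnumSet.of(State.FINISHED, State.FAILED, State.KILLED);

    /** Polls fetchState until a terminal state is seen, sleeping between polls. */
    static State waitForCompletion(Supplier<State> fetchState, long pollMillis) {
        State state = fetchState.get();
        while (!TERMINAL.contains(state)) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                throw new IllegalStateException("interrupted while waiting", e);
            }
            state = fetchState.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // A fake sequence of reports standing in for repeated RM queries.
        Iterator<State> fake =
                Arrays.asList(State.ACCEPTED, State.RUNNING, State.FAILED).iterator();
        State last = waitForCompletion(fake::next, 10);
        System.out.println("final state: " + last);
    }
}
```

Returning the last state lets the caller tell a successful run from a failed or killed one instead of treating every exit from the loop as success.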

References:

A very easy-to-follow example, on which most of this post is based: http://www.qingpingshan.com/rjbc/dashuju/151240.html

Dong Xicheng's explanation of how to write a YARN application: http://dongxicheng.org/mapreduce-nextgen/how-to-write-an-yarn-applicationmaster/

Hadoop's distributedShell example, with a guide to running it: https://www.thesisscientist.com/docs/JeffBarner/04fae500-dfd5-4614-836d-70e1808e0aae.pdf

Posted on 2018-07-26 20:42 by 今天天蓝蓝