Hadoop Source Analysis: RMAppAttempt FSM
Overview
RMAppAttempt state machine
Figure 1-1
APP_ACCEPTED Handle
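An RMAppAttempt is created and started by RMApp. After it submits its request to the scheduler, it enters the SUBMITTED state. The scheduler validates the request, creates an internal application object, submits it to the queue to await scheduling, and sends an APP_ACCEPTED event to the dispatcher. That event is ultimately handled by the RMAppAttempt (CapacityScheduler is used as the example here):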
    FiCaSchedulerApp SchedulerApp =
        new FiCaSchedulerApp(applicationAttemptId, user, queue,
            queue.getActiveUsersManager(), rmContext);

    // Submit to the queue
    try {
      queue.submitApplication(SchedulerApp, user, queueName);
    } catch (AccessControlException ace) {
      LOG.info("Failed to submit application " + applicationAttemptId +
          " to queue " + queueName + " from user " + user, ace);
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMAppAttemptRejectedEvent(applicationAttemptId,
              ace.toString()));
      return;
    }

    applications.put(applicationAttemptId, SchedulerApp);

    LOG.info("Application Submission: " + applicationAttemptId +
        ", user: " + user +
        " queue: " + queue +
        ", currently active: " + applications.size());

    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppAttemptEvent(applicationAttemptId,
            RMAppAttemptEventType.APP_ACCEPTED));
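On receiving this event, the state machine invokes ScheduleTransition, which registers the attempt in the scheduler's wait queue. If the master is managed, the state machine then enters the SCHEDULED state.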
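The branch taken by ScheduleTransition can be pictured with a small, self-contained sketch. This is not the Hadoop code: the class, the string attempt ids, and the list standing in for the scheduler are hypothetical, and the target state for the unmanaged case is an assumption, since the text above only covers the managed case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the real YARN types.
enum AttemptState { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING }

class ScheduleTransitionSketch {
    // Stand-in for the scheduler's wait queue of AM container requests.
    final List<String> pendingAmRequests = new ArrayList<>();

    // Handles APP_ACCEPTED while in SUBMITTED and returns the next state.
    AttemptState onAppAccepted(String attemptId, boolean unmanagedAm) {
        if (!unmanagedAm) {
            // Managed AM: queue a request for the AM container.
            pendingAmRequests.add(attemptId);
            return AttemptState.SCHEDULED;
        }
        // Unmanaged AM: nothing is queued (target state name is an assumption).
        return AttemptState.LAUNCHED_UNMANAGED_SAVING;
    }

    public static void main(String[] args) {
        ScheduleTransitionSketch fsm = new ScheduleTransitionSketch();
        System.out.println(fsm.onAppAccepted("attempt_1", false)); // SCHEDULED
        System.out.println(fsm.pendingAmRequests);                 // [attempt_1]
    }
}
```

The point of the sketch is that a single event can lead to more than one target state, so the transition itself has to compute and return the next state.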
CONTAINER_ALLOCATED Handle
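Once the state machine is in this state, the system waits for the next heartbeat from a NodeManager (NM) node. When the heartbeat arrives, the scheduler checks that node's currently available capacity; if there is enough, it allocates a container object for the application on that node: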
In LeafQueue
    // Create the container if necessary
    Container container =
        getContainer(rmContainer, application, node, capability, priority);

    // something went wrong getting/creating the container
    if (container == null) {
      LOG.warn("Couldn't get container for allocation!");
      return Resources.none();
    }

    // Can we allocate a container on this node?
    int availableContainers =
        resourceCalculator.computeAvailableContainers(available, capability);
    if (availableContainers > 0) {
      // Allocate...

      // Did we previously reserve containers at this 'priority'?
      if (rmContainer != null){
        unreserve(application, priority, node, rmContainer);
      }

      // Create container tokens in secure-mode
      if (UserGroupInformation.isSecurityEnabled()) {
        ContainerToken containerToken =
            createContainerToken(application, container);
        if (containerToken == null) {
          // Something went wrong...
          return Resources.none();
        }
        container.setContainerToken(containerToken);
      }

      // Inform the application
      RMContainer allocatedContainer =
          application.allocate(type, node, priority, request, container);

      // Does the application need this resource?
      if (allocatedContainer == null) {
        return Resources.none();
      }

      // Inform the node
      node.allocateContainer(application.getApplicationId(),
          allocatedContainer);

The first container is used to run the ApplicationMaster.
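The capacity check in the excerpt (availableContainers > 0) can be illustrated with a rough sketch. It assumes resources are compared by memory only; the class name and the megabyte parameters are hypothetical and do not reproduce the real resourceCalculator API.

```java
// Hypothetical, memory-only stand-in for the scheduler's capacity check.
class CapacityCheckSketch {
    // How many containers of size requiredMb fit into availableMb.
    static int computeAvailableContainers(int availableMb, int requiredMb) {
        if (requiredMb <= 0) {
            throw new IllegalArgumentException("requiredMb must be positive");
        }
        // Integer division; a negative availability counts as zero containers.
        return Math.max(0, availableMb / requiredMb);
    }

    // Mirrors the branch in the excerpt: allocate only if at least one fits.
    static boolean canAllocate(int availableMb, int requiredMb) {
        return computeAvailableContainers(availableMb, requiredMb) > 0;
    }

    public static void main(String[] args) {
        System.out.println(computeAvailableContainers(4096, 1024)); // 4
        System.out.println(canAllocate(512, 1024));                 // false
    }
}
```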
After the container is allocated successfully, the AppAttempt asks the scheduler for the allocated container and sets it as the master container:
    // Acquire the AM container from the scheduler.
    Allocation amContainerAllocation =
        appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
            EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST);

    // Set the masterContainer
    appAttempt.setMasterContainer(
        amContainerAllocation.getContainers().get(0));
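It then asks the state store to save the current application state, and the AppAttempt enters the ALLOCATED_SAVING state. When the save completes, the AppAttempt receives an ATTEMPT_SAVED notification.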
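The save-then-notify handshake can be sketched as follows. Everything here is a hypothetical stand-in: the in-memory map plays the state store, the queue plays the dispatcher, and the state entered after the save (ALLOCATED) is an assumption, since the text does not name it.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory model; the real state store API is not shown here.
class AttemptSaveSketch {
    enum State { SCHEDULED, ALLOCATED_SAVING, ALLOCATED }

    final Map<String, String> store = new HashMap<>(); // attemptId -> AM container id
    final Queue<String> events = new ArrayDeque<>();   // events waiting for the attempt
    State state = State.SCHEDULED;

    // CONTAINER_ALLOCATED: persist the AM container and wait for the store.
    void onContainerAllocated(String attemptId, String masterContainerId) {
        store.put(attemptId, masterContainerId);
        state = State.ALLOCATED_SAVING;
        // The store reports completion as an event rather than a return value.
        events.add("ATTEMPT_SAVED");
    }

    // Drain queued events; ATTEMPT_SAVED ends the saving state.
    void dispatch() {
        String event;
        while ((event = events.poll()) != null) {
            if (event.equals("ATTEMPT_SAVED") && state == State.ALLOCATED_SAVING) {
                state = State.ALLOCATED;
            }
        }
    }

    public static void main(String[] args) {
        AttemptSaveSketch attempt = new AttemptSaveSketch();
        attempt.onContainerAllocated("attempt_1", "container_1");
        System.out.println(attempt.state); // ALLOCATED_SAVING
        attempt.dispatch();
        System.out.println(attempt.state); // ALLOCATED
    }
}
```

Modelling the save as an intermediate state means the attempt does not launch the AM until its allocation has been recorded.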
ATTEMPT_SAVED Handle
On receiving this event, the state machine loads and launches the container so that the master can start running:
    private void launchAttempt(){
      // Send event to launch the AM Container
      eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));
    }
    private void launch() throws IOException {
      connect();
      ContainerId masterContainerID = masterContainer.getId();
      ApplicationSubmissionContext applicationContext =
          application.getSubmissionContext();
      LOG.info("Setting up container " + masterContainer
          + " for AM " + application.getAppAttemptId());
      ContainerLaunchContext launchContext =
          createAMContainerLaunchContext(applicationContext, masterContainerID);
      StartContainerRequest request =
          recordFactory.newRecordInstance(StartContainerRequest.class);
      request.setContainerLaunchContext(launchContext);
      request.setContainer(masterContainer);
      containerMgrProxy.startContainer(request);
      LOG.info("Done launching container " + masterContainer
          + " for AM " + application.getAppAttemptId());
    }

After the launch succeeds, the attempt receives a LAUNCHED event notification:
LAUNCHED Handle
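On receiving the LAUNCHED notification, the AppAttempt registers with the monitoring thread and then waits for word that the master has started. Once started, the master must register itself with the ResourceManager, which forwards the registration event to the AppAttempt for handling.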
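Registration with the monitoring thread can be pictured with the sketch below. It assumes the monitor's job is to expire attempts whose master stops reporting in; the class, its methods, and the explicit timestamps are hypothetical and chosen to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical liveness monitor; not the real YARN class.
class LivenessMonitorSketch {
    private final long expiryMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    LivenessMonitorSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Called when the attempt reaches LAUNCHED.
    void register(String attemptId, long nowMs) {
        lastSeen.put(attemptId, nowMs);
    }

    // Called whenever the master reports in; unknown attempts are ignored.
    void ping(String attemptId, long nowMs) {
        if (lastSeen.containsKey(attemptId)) {
            lastSeen.put(attemptId, nowMs);
        }
    }

    // Removes and returns every attempt that has been silent for too long.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expiryMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        LivenessMonitorSketch monitor = new LivenessMonitorSketch(1000);
        monitor.register("attempt_1", 0);
        monitor.register("attempt_2", 0);
        monitor.ping("attempt_1", 900);
        System.out.println(monitor.expire(1500)); // [attempt_2]
    }
}
```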
REGISTERED Handle
On receiving the register message, the AppAttempt saves the master's runtime information (host, port, tracking URL) and then notifies the App:
    // Let the app know
    appAttempt.eventHandler.handle(
        new RMAppEvent(appAttempt.getAppAttemptId().getApplicationId(),
            RMAppEventType.ATTEMPT_REGISTERED));
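After registering, the ApplicationMaster keeps sending heartbeats by calling ApplicationMasterService.allocate(). On receiving a heartbeat from the ApplicationMaster, the scheduler first sends an Acquired event to the RMContainer to update the state of the containers already allocated to the AM. After updating its state, the RMContainer sends a ContainerAcquired event to notify the RMAppAttempt.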
CONTAINER_ACQUIRED Handle
When the RMAppAttempt receives this event, it adds the node that the container belongs to to its ranNodes set:
appAttempt.ranNodes.add(acquiredEvent.getContainer().getNodeId());
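Because ranNodes is a set, several containers acquired on the same node leave a single entry. A minimal sketch, with a hypothetical class and plain strings standing in for node ids:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the attempt's bookkeeping of nodes it ran on.
class RanNodesSketch {
    final Set<String> ranNodes = new HashSet<>();

    // CONTAINER_ACQUIRED: record the node hosting the acquired container.
    void onContainerAcquired(String nodeId) {
        ranNodes.add(nodeId);
    }

    public static void main(String[] args) {
        RanNodesSketch attempt = new RanNodesSketch();
        attempt.onContainerAcquired("node1:8041");
        attempt.onContainerAcquired("node1:8041"); // second container, same node
        attempt.onContainerAcquired("node2:8041");
        System.out.println(attempt.ranNodes.size()); // 2
    }
}
```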
UNREGISTERED Handle
When the job finishes, the AM unregisters itself from the ApplicationMasterService. The AppAttempt receives an UNREGISTERED event notification, performs a series of cleanup tasks, and then exits.
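To tie the handlers together, here is a table-driven sketch of the happy path walked through in this article. It is a simplification rather than the real RMAppAttempt transition table: the class is hypothetical, failure and kill arcs are left out, and the state names the article does not mention (ALLOCATED, LAUNCHED, RUNNING, FINISHED) are assumptions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical happy-path model of the attempt lifecycle.
class AttemptLifecycleSketch {
    enum State { SUBMITTED, SCHEDULED, ALLOCATED_SAVING, ALLOCATED, LAUNCHED, RUNNING, FINISHED }
    enum Event { APP_ACCEPTED, CONTAINER_ALLOCATED, ATTEMPT_SAVED, LAUNCHED, REGISTERED,
                 CONTAINER_ACQUIRED, UNREGISTERED }

    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    private static void arc(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    static {
        arc(State.SUBMITTED, Event.APP_ACCEPTED, State.SCHEDULED);
        arc(State.SCHEDULED, Event.CONTAINER_ALLOCATED, State.ALLOCATED_SAVING);
        arc(State.ALLOCATED_SAVING, Event.ATTEMPT_SAVED, State.ALLOCATED);
        arc(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED);
        arc(State.LAUNCHED, Event.REGISTERED, State.RUNNING);
        arc(State.RUNNING, Event.CONTAINER_ACQUIRED, State.RUNNING); // stays in place
        arc(State.RUNNING, Event.UNREGISTERED, State.FINISHED);
    }

    // Returns the next state, or throws if the event is not valid in this state.
    static State next(State current, Event event) {
        Map<Event, State> arcs = TABLE.get(current);
        if (arcs == null || !arcs.containsKey(event)) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        return arcs.get(event);
    }

    // Feeds a sequence of events through the table, starting from 'start'.
    static State run(State start, Event... events) {
        State state = start;
        for (Event event : events) {
            state = next(state, event);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run(State.SUBMITTED,
            Event.APP_ACCEPTED, Event.CONTAINER_ALLOCATED)); // ALLOCATED_SAVING
    }
}
```

Keeping the arcs in a table makes an out-of-order event, such as REGISTERED arriving while the attempt is still SUBMITTED, fail loudly instead of being silently accepted.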