HadoopSourceAnalyse---ResourceManager-Request Handle

Overview

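Hadoop has several broad classes of resources to manage. To manage them, Hadoop defines its own communication protocol. The table below shows the general request format.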

| h | r | p | c |
| version | Service class | AuthMethod | Serialize type (0) |
| Body length |
| 5 bytes protocol header tag | 5 bytes value | More tags, at least 3 (callId, RpcOp, RpcKind) |
| ... |
| 5 bytes request header tag | 5 bytes length | Header body 1 |
| More tags (method name, protocol class, client protocol version) |
| 5 bytes request body length | Body contents |

Figure 1-1

Hadoop uses 5 bytes to represent a 32-bit int value. Why?
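One plausible answer, assuming these tag and length fields are written as protobuf-style base-128 varints: each byte carries 7 payload bits plus one continuation bit, so an unsigned 32-bit value needs at most ceil(32 / 7) = 5 bytes, and small values need fewer. The sketch below only illustrates that encoding; the class and method names are made up for this post and are not Hadoop's.

```java
import java.io.ByteArrayOutputStream;

final class VarintSketch {
    /** Encodes a 32-bit value (treated as unsigned) as a base-128 varint, low 7 bits first. */
    static byte[] encodeVarint32(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 data bits, continuation bit set
            value >>>= 7;                     // unsigned shift, so the loop always ends
        }
        out.write(value);                     // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a varint produced by encodeVarint32. */
    static int decodeVarint32(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;                // continuation bit clear: done
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }
}
```

With this encoding the value 1 takes one byte, 300 takes two, and any value with one of the top four bits set takes the full five.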

The following sections cover, in turn, NodeManager requests, client requests, ApplicationMaster requests, and admin requests.

NodeManager Request handle

NodeManager requests are mainly sent by a node to the ResourceManager to register itself and to report its status.

Node Register To ResourceManager

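When ResourceTrackerService receives a register request from a node, it first gets the requesting node's IP, its command port, and its HTTP port, creates a local object for the node, and registers that object in its own context. It then hands the registration event to the dispatcher: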

    RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
        resolve(host), capability);

    RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
    if (oldNode == null) {
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
    } else {
      LOG.info("Reconnect from the node at: " + host);
      this.nmLivelinessMonitor.unregister(nodeId);
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeReconnectEvent(nodeId, rmNode));
    }

Finally, the node is registered with the monitor thread, which watches the node's status.

Figure 2-1

NodeHeartbeat handle

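When ResourceTrackerService receives a heartbeat request from a node, it first checks whether the node has already registered. If not, it returns a reboot command, asking the node to restart and register first. If the node is registered, the service sends a ping to the monitor thread, and the monitor updates the timestamp of the node's last activity.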
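The ping/timestamp bookkeeping can be sketched as follows. This is a minimal illustration, not the Hadoop monitor class: the class name, the explicit `nowMillis` parameters, and the choice to fold the "is this node registered?" check into the monitor are all assumptions made to keep the example short and deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LivelinessMonitorSketch {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long expiryMillis;

    LivelinessMonitorSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Starts tracking a node, as done at registration time. */
    void register(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    void unregister(String nodeId) {
        lastSeen.remove(nodeId);
    }

    /**
     * Records a heartbeat. Returns false when the node is unknown, in which
     * case the caller would answer with a "reboot and register first" command.
     */
    boolean ping(String nodeId, long nowMillis) {
        return lastSeen.replace(nodeId, nowMillis) != null;
    }

    /** Removes and returns every node whose last heartbeat is older than the expiry interval. */
    List<String> expireStale(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            boolean stale = nowMillis - entry.getValue() > expiryMillis;
            // remove(key, value) only succeeds if no newer ping arrived in between
            if (stale && lastSeen.remove(entry.getKey(), entry.getValue())) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

A real monitor would call something like `expireStale` periodically from its own thread and raise an event for each expired node; passing the clock value in explicitly here just makes the behaviour easy to check.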

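The service then checks whether the node is restricted from access; if it is, the service sends a "shutdown" command. If the request is a retransmitted heartbeat and a newer request has already been received before it, the service sends a "reboot" command. If the shared key has expired, the service also tries to send the node the latest shared key.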

Finally, the node is updated to its latest state.

    // 3. Check if it's a 'fresh' heartbeat i.e. not duplicate heartbeat
    NodeHeartbeatResponse lastNodeHeartbeatResponse = rmNode.getLastNodeHeartBeatResponse();
    if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse
        .getResponseId()) {
      LOG.info("Received duplicate heartbeat from node "
          + rmNode.getNodeAddress());
      return lastNodeHeartbeatResponse;
    } else if (remoteNodeStatus.getResponseId() + 1 < lastNodeHeartbeatResponse
        .getResponseId()) {
      LOG.info("Too far behind rm response id:"
          + lastNodeHeartbeatResponse.getResponseId() + " nm response id:"
          + remoteNodeStatus.getResponseId());
      // TODO: Just sending reboot is not enough. Think more.
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(nodeId, RMNodeEventType.REBOOTING));
      return resync;
    }

    // Heartbeat response
    NodeHeartbeatResponse nodeHeartBeatResponse = YarnServerBuilderUtils
        .newNodeHeartbeatResponse(lastNodeHeartbeatResponse.
            getResponseId() + 1, NodeAction.NORMAL, null, null, null,
            nextHeartBeatInterval);
    rmNode.updateNodeHeartbeatResponseForCleanup(nodeHeartBeatResponse);
    // Check if node's masterKey needs to be updated and if the currentKey has
    // roller over, send it across
    if (isSecurityEnabled()) {

      boolean shouldSendMasterKey = false;

      MasterKey nextMasterKeyForNode =
          this.containerTokenSecretManager.getNextKey();
      if (nextMasterKeyForNode != null) {
        // nextMasterKeyForNode can be null if there is no outstanding key that
        // is in the activation period.
        MasterKey nodeKnownMasterKey = request.getLastKnownMasterKey();
        if (nodeKnownMasterKey.getKeyId() != nextMasterKeyForNode.getKeyId()) {
          shouldSendMasterKey = true;
        }
      }
      if (shouldSendMasterKey) {
        nodeHeartBeatResponse.setMasterKey(nextMasterKeyForNode);
      }
    }

    // 4. Send status to RMNode, saving the latest response.
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeStatusEvent(nodeId, remoteNodeStatus.getNodeHealthStatus(),
            remoteNodeStatus.getContainersStatuses(), 
            remoteNodeStatus.getKeepAliveApplications(), nodeHeartBeatResponse));
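The response-id comparison in step 3 above is easy to misread, so here it is isolated as a small, self-contained sketch. The comparisons mirror the excerpt; the class, enum, and method names are invented for illustration.

```java
final class HeartbeatSketch {
    enum Outcome { DUPLICATE, RESYNC, NORMAL }

    /**
     * nmResponseId is the id of the last response the node says it has seen;
     * rmResponseId is the id of the last response the ResourceManager sent.
     */
    static Outcome classify(int nmResponseId, int rmResponseId) {
        if (nmResponseId + 1 == rmResponseId) {
            // The node is exactly one response behind: resend the last response.
            return Outcome.DUPLICATE;
        } else if (nmResponseId + 1 < rmResponseId) {
            // The node is too far behind: tell it to resync (reboot).
            return Outcome.RESYNC;
        }
        return Outcome.NORMAL;
    }

    /** A fresh heartbeat is answered with the next response id. */
    static int nextResponseId(int rmResponseId) {
        return rmResponseId + 1;
    }
}
```

For example, if the ResourceManager last sent response 5, a heartbeat carrying id 5 is fresh and gets response 6, id 4 is treated as a duplicate, and id 2 triggers a resync.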

Client Request handle

Client requests mainly use ClientRMProtocol to communicate with the ResourceManager. ClientRMProtocol is a proxy that sits between the client and the server.

GetNewApplicationRequest

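When a client asks the server to run an application, the client first needs an appId. To get one, the client sends a GetNewApplication request. The server receives the request, generates a new applicationId, and returns it to the client together with the system's current minimum/maximum resource capability: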

    response.setApplicationId(getNewApplicationId());
    // Pick up min/max resource from scheduler...
    response.setMinimumResourceCapability(scheduler
        .getMinimumResourceCapability());
    response.setMaximumResourceCapability(scheduler
        .getMaximumResourceCapability());
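How `getNewApplicationId()` produces a unique id is not shown in the excerpt. A minimal sketch, assuming the id pairs the ResourceManager's start timestamp with a monotonically increasing counter, could look like this. The class name and the exact string format are illustrative assumptions.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

final class AppIdSketch {
    private final long clusterTimestamp;                 // assumed: RM start time in millis
    private final AtomicInteger counter = new AtomicInteger(0);

    AppIdSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    /** Returns a new id; safe to call from several request-handler threads. */
    String newApplicationId() {
        int id = counter.incrementAndGet();
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }
}
```

Combining a per-start timestamp with a counter keeps ids unique across ResourceManager restarts without any persistent state, which is the property the submission step relies on when it later checks for duplicates.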

SubmitApplicationRequest

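After obtaining the applicationId, the client uses it to create an ApplicationSubmissionContext object, fills in the task information, and finally submits it through a SubmitApplicationRequest to the ResourceManager, that is, to the ClientRMService object. On receiving the request, ClientRMService extracts the ApplicationId and the user information, and checks whether the applicationId has already been submitted; if it has, an error is returned. Otherwise it updates the user information. Then, if the ResourceManager manages the AM container (the AM is not unmanaged), the server builds the resource request for that container and validates it against the Scheduler's maximum resource capability: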

    if (!submissionContext.getUnmanagedAM()) {
      ResourceRequest amReq = BuilderUtils.newResourceRequest(
          RMAppAttemptImpl.AM_CONTAINER_PRIORITY, ResourceRequest.ANY,
          submissionContext.getResource(), 1);
      try {
        SchedulerUtils.validateResourceRequest(amReq,
            scheduler.getMaximumResourceCapability());
      } catch (InvalidResourceRequestException e) {
        LOG.warn("RM app submission failed in validating AM resource request"
            + " for application " + applicationId, e);
        throw RPCUtil.getRemoteException(e);
      }
    }
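The validation call in this excerpt can be approximated by a simple bounds check. The sketch below assumes the check amounts to "requested memory must be non-negative and no larger than the scheduler's maximum"; the real `SchedulerUtils.validateResourceRequest` may check more, and the names here are hypothetical.

```java
final class ResourceValidationSketch {
    /**
     * Rejects a request whose memory is negative or exceeds the maximum the
     * scheduler is willing to grant. Throws IllegalArgumentException on failure.
     */
    static void validateMemory(int requestedMemoryMb, int maxMemoryMb) {
        if (requestedMemoryMb < 0 || requestedMemoryMb > maxMemoryMb) {
            throw new IllegalArgumentException(
                "Invalid resource request: requested memory " + requestedMemoryMb
                    + " MB, max allowed " + maxMemoryMb + " MB");
        }
    }
}
```

Failing fast at submission time means an application that could never be scheduled is rejected with a clear error instead of waiting in a queue forever.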

Once the container request is in place, the service submits the request to the ApplicationManager (this is a synchronous call):

    rmAppManager.handle(new RMAppManagerSubmitEvent(submissionContext, System
        .currentTimeMillis()));

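On receiving the request, the ApplicationManager takes the submissionContext object and the submission time from it, then extracts the applicationId, queue, priority, and applicationName, and creates and registers a new application object: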

    application =
        new RMAppImpl(applicationId, rmContext, this.conf,
            submissionContext.getApplicationName(),
            submissionContext.getAMContainerSpec().getUser(),
            submissionContext.getQueue(),
            submissionContext, this.scheduler, this.masterService,
            submitTime);

    // Sanity check - duplicate?
    if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
        null) {
      String message = "Application with id " + applicationId
          + " is already present! Cannot add a duplicate!";
      LOG.info(message);
      throw RPCUtil.getRemoteException(message);
    }

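It then registers the ACL information with the ApplicationACLsManager and generates a token for the application. Finally, it submits the request to the background thread. At this point, all the information needed to run the application has been set up: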

    // All done, start the RMApp
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMAppEvent(applicationId, isRecovered ? RMAppEventType.RECOVER:
          RMAppEventType.START));

Finally, RMAppImpl handles the request and enters the app state machine (this is left for a separate look later).
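Almost every handler in this post ends with `getDispatcher().getEventHandler().handle(event)`. The pattern behind that call can be sketched as a queue plus a table of handlers: service threads only enqueue, and a separate loop delivers each event to the handler registered for its type. This is a simplified illustration with invented names. In particular, `drain()` stands in for the body of the dispatcher thread's loop so that the example stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class DispatcherSketch {
    /** A minimal event: a type name plus the id of the object it targets. */
    static final class Event {
        final String type;
        final String targetId;

        Event(String type, String targetId) {
            this.type = type;
            this.targetId = targetId;
        }
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, Consumer<Event>> handlers = new HashMap<>();

    /** Registers the handler for one event type (done once at startup). */
    void register(String type, Consumer<Event> handler) {
        handlers.put(type, handler);
    }

    /** Called by request-handling code: enqueue the event and return immediately. */
    void handle(Event event) {
        queue.add(event);
    }

    /** Delivers every queued event in FIFO order and returns how many were delivered. */
    int drain() {
        int delivered = 0;
        Event event;
        while ((event = queue.poll()) != null) {
            Consumer<Event> handler = handlers.get(event.type);
            if (handler == null) {
                throw new IllegalStateException("no handler for " + event.type);
            }
            handler.accept(event);
            delivered++;
        }
        return delivered;
    }
}
```

The point of the design is that the RPC handler never runs state-machine code itself: it returns as soon as the event is queued, and all state changes happen in order on the delivering side.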

Figure 2-2

Most client requests to ClientRMService follow a similar flow: every app-related request is ultimately handled by RMAppImpl (the app state machine is analyzed separately later).

ApplicationMasterService handle

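ApplicationMasterService is mainly the service through which clients register the Master object; the Master monitors and tracks the execution of the job. There are three operations on the master: RegisterApplicationMaster, finishApplicationMaster, and allocate: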

RegisterApplicationMasterRequest Handle

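When ApplicationMasterService receives the request, it first extracts the ApplicationAttemptId from it and checks its authorization, then notifies the monitor thread to update the AppAttempt timestamp. Finally, it registers a new event with the Dispatcher. The registration event is handled by ApplicationAttemptEventDispatcher and ultimately by the RMAppAttemptImpl that belongs to the corresponding RMApp, which enters the AppAttempt state machine: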

    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMAppAttemptRegistrationEvent(applicationAttemptId, request
            .getHost(), request.getRpcPort(), request.getTrackingUrl()));

Figure 2-3

FinishApplicationMasterRequest

The flow is similar to the figure above; the event submitted at the end is RMAppAttemptUnregistrationEvent.
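To make "enters the AppAttempt state machine" concrete, here is a minimal table-driven sketch of how register and unregister events could move an attempt between states. The state and event names are a simplified, hypothetical subset chosen for illustration; they are not the actual transition table of RMAppAttemptImpl.

```java
import java.util.EnumMap;
import java.util.Map;

final class AttemptStateMachineSketch {
    enum State { LAUNCHED, RUNNING, FINISHED }
    enum EventType { REGISTERED, STATUS_UPDATE, UNREGISTERED }

    // (current state, event) -> next state; missing entries are invalid transitions.
    private static final Map<State, Map<EventType, State>> TRANSITIONS =
        new EnumMap<State, Map<EventType, State>>(State.class);

    static {
        addTransition(State.LAUNCHED, EventType.REGISTERED, State.RUNNING);
        addTransition(State.RUNNING, EventType.STATUS_UPDATE, State.RUNNING);
        addTransition(State.RUNNING, EventType.UNREGISTERED, State.FINISHED);
    }

    private static void addTransition(State from, EventType on, State to) {
        Map<EventType, State> row = TRANSITIONS.get(from);
        if (row == null) {
            row = new EnumMap<EventType, State>(EventType.class);
            TRANSITIONS.put(from, row);
        }
        row.put(on, to);
    }

    private State current = State.LAUNCHED;

    State getState() {
        return current;
    }

    /** Applies one event and returns the new state, or throws if the transition is not allowed. */
    State handle(EventType event) {
        Map<EventType, State> row = TRANSITIONS.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

Keeping the transitions in a table rather than in nested if/else blocks makes it obvious which events are legal in which state, for instance that an attempt cannot register again after it has finished.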

AllocateRequest

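Here the MasterService checks the request against the system's current capacity to determine whether the ask can be satisfied, and asks the Scheduler to allocate the containers the Master requests: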

    // Send the status update to the appAttempt.
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMAppAttemptStatusupdateEvent(appAttemptId, request
            .getProgress()));

    List<ResourceRequest> ask = request.getAskList();
    List<ContainerId> release = request.getReleaseList();

    // sanity check
    try {
      SchedulerUtils.validateResourceRequests(ask,
          rScheduler.getMaximumResourceCapability());
    } catch (InvalidResourceRequestException e) {
      LOG.warn("Invalid resource ask by application " + appAttemptId, e);
      throw RPCUtil.getRemoteException(e);
    }
    // Send new requests to appAttempt.
    Allocation allocation =
        this.rScheduler.allocate(appAttemptId, ask, release);

Figure 2-4

Admin Request

Skipped for now!

posted on 2013-04-26 14:11 江山疯宇晴