江山疯宇晴

HadoopSourceAnalyse---ResourceManager-Request Handle

Overview

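Hadoop manages several major classes of resources. To manage them, Hadoop defines its own communication protocol; the table below shows the common request format.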

| h | r | p | c | version | Service class | AuthMethod | Serialize type (0) |
| Body length |
| 5-byte protocol header tag | 5-byte value |
| More tags, at least 3 (callId, RpcOp, RpcKind) |
| … |
| 5-byte request header tag | 5-byte length | Header body 1 |
| More tags (method name, protocol class, client protocol version) |
| 5-byte request body length | Body contents |


Figure 1-1

Hadoop uses 5 bytes to represent a 32-bit int value. Why?
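A likely answer: these 5-byte fields appear to be protobuf varints. Hadoop's protobuf-based RPC writes length-delimited messages, and protobuf prefixes each message with its length encoded as a varint32. A varint stores 7 payload bits per byte and uses the high bit as a "more bytes follow" flag, so a 32-bit value needs at most ⌈32/7⌉ = 5 bytes, while values below 128 take a single byte. The sketch below is an illustrative plain-Java encoder/decoder; the class `Varint32` is hypothetical (it is not protobuf's `CodedOutputStream`), and it treats the int as unsigned, as length prefixes are.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating varint32 encoding:
// 7 data bits per byte, high bit set means "more bytes follow".
final class Varint32 {
    private Varint32() {}

    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        // While more than 7 significant bits remain, emit the low 7 bits with
        // the continuation bit set, then shift right (unsigned shift).
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    static int decode(byte[] buf) {
        int result = 0;
        // At most 5 bytes can carry a 32-bit value.
        for (int i = 0; i < buf.length && i < 5; i++) {
            result |= (buf[i] & 0x7F) << (7 * i);
            if ((buf[i] & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint32");
    }
}

class Varint32Demo {
    public static void main(String[] args) {
        int[] samples = {1, 300, Integer.MAX_VALUE, -1};
        for (int v : samples) {
            byte[] enc = Varint32.encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decodes to "
                + Varint32.decode(enc));
        }
    }
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, and any value that needs more than 28 bits (including every negative int read as unsigned) takes the full five.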
The sections below cover, in turn, NodeManager requests, client requests, ApplicationMaster requests, and admin requests.

NodeManager Request Handle

NodeManager requests are mainly sent by a node to the ResourceManager to register itself and to report its status.

Node Register To ResourceManager

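When ResourceTrackerService receives a register request from a node, it first gets the node's host, the port on which the node receives commands (cmPort), and its HTTP port, creates a local RMNode object, and registers it in its own context (RMContext). It then dispatches an event to the dispatcher: STARTED for a new node, or a reconnect event if a node with the same NodeId is already registered: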
    RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
        resolve(host), capability);

    RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
    if (oldNode == null) {
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
    } else {
      LOG.info("Reconnect from the node at: " + host);
      this.nmLivelinessMonitor.unregister(nodeId);
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeReconnectEvent(nodeId, rmNode));
    }
Finally, the node is registered with the monitor thread (nmLivelinessMonitor), which watches the node's status.
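The register-or-reconnect decision in the code above hinges on `ConcurrentMap.putIfAbsent`: the insertion and the "was it already there?" check happen atomically, so two concurrent registrations for the same NodeId cannot both be treated as new. The sketch below models only that decision; `NodeRegistry`, `NodeInfo`, and the string event names are hypothetical stand-ins for RMContext, RMNodeImpl, and the RMNode event types, not Hadoop's API.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for RMNodeImpl: only the fields read at registration.
final class NodeInfo {
    final String nodeId;
    final String host;
    final int cmPort;
    final int httpPort;

    NodeInfo(String nodeId, String host, int cmPort, int httpPort) {
        this.nodeId = nodeId;
        this.host = host;
        this.cmPort = cmPort;
        this.httpPort = httpPort;
    }
}

// Hypothetical, simplified model of the register-or-reconnect decision.
final class NodeRegistry {
    private final ConcurrentMap<String, NodeInfo> nodes = new ConcurrentHashMap<>();
    // Stand-in for the dispatcher: records the events that would be sent.
    final List<String> dispatched = new CopyOnWriteArrayList<>();
    // Stand-in for the liveliness monitor's set of tracked nodes.
    final Set<String> monitored = ConcurrentHashMap.newKeySet();

    String register(NodeInfo node) {
        // Atomic check-and-insert: only one caller can see "no previous entry".
        NodeInfo old = nodes.putIfAbsent(node.nodeId, node);
        String event;
        if (old == null) {
            event = "STARTED";             // brand-new node
        } else {
            monitored.remove(node.nodeId); // stop tracking the old instance
            event = "RECONNECTED";         // the event would carry the new NodeInfo
        }
        dispatched.add(event + " " + node.nodeId);
        monitored.add(node.nodeId);        // (re)start liveness tracking
        return event;
    }
}

class NodeRegistryDemo {
    public static void main(String[] args) {
        NodeRegistry registry = new NodeRegistry();
        registry.register(new NodeInfo("host1:45454", "host1", 45454, 8042));
        registry.register(new NodeInfo("host1:45454", "host1", 45454, 8042));
        System.out.println(registry.dispatched);
    }
}
```

On a reconnect the map keeps the old entry; in the Hadoop code above it is the RMNodeReconnectEvent that carries the new RMNode onward.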
Figure 2-1

NodeHeartbeat handle

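When ResourceTrackerService receives a heartbeat from a node, it first checks whether the node has registered. If not, it returns a REBOOT command, telling the node to restart and register first. If it has, the service sends a ping to the monitor thread, which updates the node's last-activity timestamp.
The service then checks whether the node is restricted (not allowed to access the cluster); if so, it sends a SHUTDOWN command. Next it checks the response id: a duplicate of the last heartbeat gets the previous response again, while a retransmitted heartbeat older than a response already sent (the node is too far behind) gets a REBOOT (resync) command. If the shared master key has rolled over, the service also tries to send the node the latest key.
Finally, the node is updated to its latest status.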
    // 3. Check if it's a 'fresh' heartbeat i.e. not duplicate heartbeat
    NodeHeartbeatResponse lastNodeHeartbeatResponse = rmNode.getLastNodeHeartBeatResponse();
    if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse
        .getResponseId()) {
      LOG.info("Received duplicate heartbeat from node "
          + rmNode.getNodeAddress());
      return lastNodeHeartbeatResponse;
    } else if (remoteNodeStatus.getResponseId() + 1 < lastNodeHeartbeatResponse
        .getResponseId()) {
      LOG.info("Too far behind rm response id:"
          + lastNodeHeartbeatResponse.getResponseId() + " nm response id:"
          + remoteNodeStatus.getResponseId());
      // TODO: Just sending reboot is not enough. Think more.
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMNodeEvent(nodeId, RMNodeEventType.REBOOTING));
      return resync;
    }

    // Heartbeat response
    NodeHeartbeatResponse nodeHeartBeatResponse = YarnServerBuilderUtils
        .newNodeHeartbeatResponse(lastNodeHeartbeatResponse.
            getResponseId() + 1, NodeAction.NORMAL, null, null, null,
            nextHeartBeatInterval);
    rmNode.updateNodeHeartbeatResponseForCleanup(nodeHeartBeatResponse);
    // Check if node's masterKey needs to be updated and if the currentKey has
    // roller over, send it across
    if (isSecurityEnabled()) {

      boolean shouldSendMasterKey = false;

      MasterKey nextMasterKeyForNode =
          this.containerTokenSecretManager.getNextKey();
      if (nextMasterKeyForNode != null) {
        // nextMasterKeyForNode can be null if there is no outstanding key that
        // is in the activation period.
        MasterKey nodeKnownMasterKey = request.getLastKnownMasterKey();
        if (nodeKnownMasterKey.getKeyId() != nextMasterKeyForNode.getKeyId()) {
          shouldSendMasterKey = true;
        }
      }
      if (shouldSendMasterKey) {
        nodeHeartBeatResponse.setMasterKey(nextMasterKeyForNode);
      }
    }

    // 4. Send status to RMNode, saving the latest response.
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMNodeStatusEvent(nodeId, remoteNodeStatus.getNodeHealthStatus(),
            remoteNodeStatus.getContainersStatuses(), 
            remoteNodeStatus.getKeepAliveApplications(), nodeHeartBeatResponse));
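The duplicate/stale check in step 3 compares two counters: the response id the NodeManager echoes back and the id of the last response the RM sent. The RM increments its id on every fresh heartbeat, so `nm + 1 == rm` means the NM is resending a heartbeat that was already answered (resend the cached response), `nm + 1 < rm` means the NM is further behind (resync), and anything else is a fresh heartbeat. A minimal sketch of only that classification; `HeartbeatCheck` and `HeartbeatKind` are hypothetical names.

```java
// Hypothetical names; mirrors the comparison in step 3 of nodeHeartbeat().
enum HeartbeatKind { FRESH, DUPLICATE, TOO_FAR_BEHIND }

final class HeartbeatCheck {
    private HeartbeatCheck() {}

    /**
     * @param nmResponseId     response id echoed back by the NodeManager
     * @param lastRmResponseId id of the last response the RM sent to this node
     */
    static HeartbeatKind classify(int nmResponseId, int lastRmResponseId) {
        if (nmResponseId + 1 == lastRmResponseId) {
            // Resent heartbeat the RM already answered (e.g. the reply was
            // lost): return the cached response unchanged.
            return HeartbeatKind.DUPLICATE;
        } else if (nmResponseId + 1 < lastRmResponseId) {
            // The NM's view is more than one response behind: ask it to resync.
            return HeartbeatKind.TOO_FAR_BEHIND;
        }
        // Normal case: the RM answers with lastRmResponseId + 1.
        return HeartbeatKind.FRESH;
    }
}

class HeartbeatCheckDemo {
    public static void main(String[] args) {
        int rmLast = 5;
        for (int nm : new int[] {5, 4, 2}) {
            System.out.println("nm=" + nm + " rm=" + rmLast + " -> "
                + HeartbeatCheck.classify(nm, rmLast));
        }
    }
}
```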

Client Request handle

Clients talk to the ResourceManager mainly through ClientRMProtocol. On the client side this protocol is used through an RPC proxy that sits between the client and the server (ClientRMService).

GetNewApplicationRequest

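When a client wants the server to run an application, it first needs an application ID, so it sends a GetNewApplication request. The server generates a new ApplicationId and returns it to the client together with the cluster's minimum and maximum resource capability: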
    response.setApplicationId(getNewApplicationId());
    // Pick up min/max resource from scheduler...
    response.setMinimumResourceCapability(scheduler
        .getMinimumResourceCapability());
    response.setMaximumResourceCapability(scheduler
        .getMaximumResourceCapability());       
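`getNewApplicationId()` itself is not shown above. To my understanding, in this version it combines the RM's cluster start timestamp with an atomically incremented counter, which is where ids such as `application_<clusterTimestamp>_0001` come from. The sketch below assumes that scheme; `AppIdGenerator` is a hypothetical class, and the string format only mimics the textual form of an ApplicationId.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator: cluster start timestamp + per-RM counter.
final class AppIdGenerator {
    private final long clusterTimestamp;
    private final AtomicInteger counter = new AtomicInteger();

    AppIdGenerator(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    // incrementAndGet() hands every concurrent caller a distinct sequence number.
    String newApplicationId() {
        int id = counter.incrementAndGet();
        // Mimics application_<timestamp>_<id padded to at least 4 digits>.
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }
}

class AppIdGeneratorDemo {
    public static void main(String[] args) {
        AppIdGenerator gen = new AppIdGenerator(System.currentTimeMillis());
        System.out.println(gen.newApplicationId());
        System.out.println(gen.newApplicationId());
    }
}
```

Under this scheme the timestamp changes whenever the RM restarts, so ids stay distinct across restarts even though the counter starts again from 1.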

SubmitApplicationRequest

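Once the client has an ApplicationId, it uses it to create an ApplicationSubmissionContext object, fills in the task information, and submits it to the ResourceManager (that is, the ClientRMService object) in a SubmitApplicationRequest. On receiving the request, ClientRMService extracts the ApplicationId and the user information and checks whether this ApplicationId has already been submitted; if it has, an error is returned. Otherwise it updates the user information. Then, if the ResourceManager manages the AM container (that is, the application does not use an unmanaged AM), the server builds a resource request for the AM container and validates it against the scheduler's maximum resource capability: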
      if (!submissionContext.getUnmanagedAM()) {
        ResourceRequest amReq = BuilderUtils.newResourceRequest(
            RMAppAttemptImpl.AM_CONTAINER_PRIORITY, ResourceRequest.ANY,
            submissionContext.getResource(), 1);
        try {
          SchedulerUtils.validateResourceRequest(amReq,
              scheduler.getMaximumResourceCapability());
        } catch (InvalidResourceRequestException e) {
          LOG.warn("RM app submission failed in validating AM resource request"
              + " for application " + applicationId, e);
          throw RPCUtil.getRemoteException(e);
        }
      }
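`SchedulerUtils.validateResourceRequest` rejects an ask the cluster could never satisfy, and in the code above a failure is wrapped with `RPCUtil.getRemoteException` and sent back to the client. A simplified sketch of that kind of range check, assuming only memory is validated; `ResourceAsk`, `InvalidAskException`, and `AskValidator` are hypothetical local stand-ins, not the YARN classes.

```java
import java.util.List;

// Hypothetical checked exception standing in for InvalidResourceRequestException.
final class InvalidAskException extends Exception {
    InvalidAskException(String message) {
        super(message);
    }
}

// Hypothetical resource ask: memory only, in MB.
final class ResourceAsk {
    final int memoryMb;

    ResourceAsk(int memoryMb) {
        this.memoryMb = memoryMb;
    }
}

final class AskValidator {
    private AskValidator() {}

    // Reject negative asks and asks larger than the biggest container the
    // scheduler is configured to hand out.
    static void validate(ResourceAsk ask, int maxMemoryMb) throws InvalidAskException {
        if (ask.memoryMb < 0 || ask.memoryMb > maxMemoryMb) {
            throw new InvalidAskException("Invalid memory ask " + ask.memoryMb
                + " MB; must be between 0 and " + maxMemoryMb + " MB");
        }
    }

    // List form, in the spirit of validateResourceRequests() used by allocate().
    static void validateAll(List<ResourceAsk> asks, int maxMemoryMb) throws InvalidAskException {
        for (ResourceAsk ask : asks) {
            validate(ask, maxMemoryMb);
        }
    }
}

class AskValidatorDemo {
    public static void main(String[] args) {
        int maxMemoryMb = 8192;
        for (int mb : new int[] {1024, 16384}) {
            try {
                AskValidator.validate(new ResourceAsk(mb), maxMemoryMb);
                System.out.println(mb + " MB accepted");
            } catch (InvalidAskException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```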

Once the AM resource request has been validated, the service submits the application to RMAppManager (this is a synchronous call):
      rmAppManager.handle(new RMAppManagerSubmitEvent(submissionContext, System
          .currentTimeMillis()));
On receiving the event, RMAppManager takes the submissionContext object and the submit time from it, extracts the applicationId, application name, user, and queue, and creates and registers a new application object:
      application =
          new RMAppImpl(applicationId, rmContext, this.conf,
              submissionContext.getApplicationName(),
              submissionContext.getAMContainerSpec().getUser(),
              submissionContext.getQueue(),
              submissionContext, this.scheduler, this.masterService,
              submitTime);

      // Sanity check - duplicate?
      if (rmContext.getRMApps().putIfAbsent(applicationId, application) != 
          null) {
        String message = "Application with id " + applicationId
            + " is already present! Cannot add a duplicate!";
        LOG.info(message);
        throw RPCUtil.getRemoteException(message);
      } 
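It then registers the application's ACLs with the ApplicationACLsManager and generates tokens for the application. Finally, it posts an event to the background dispatcher thread; at this point everything needed to run the application has been set up: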

      // All done, start the RMApp
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMAppEvent(applicationId, isRecovered ? RMAppEventType.RECOVER:
            RMAppEventType.START));
The event is finally handled by RMAppImpl and enters the app state machine (left for a separate post).
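`rmContext.getDispatcher()` returns an asynchronous dispatcher: `handle()` only enqueues the event, and a separate thread later routes it to the handler registered for that event type, which is why the submit call can return before the application actually starts. A minimal sketch of that pattern, assuming one queue and one dispatch thread; `MiniDispatcher` and `Event` are hypothetical names, not Hadoop's AsyncDispatcher API.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical event: a type tag plus the id of the object it targets.
final class Event {
    final String type;
    final String targetId;

    Event(String type, String targetId) {
        this.type = type;
        this.targetId = targetId;
    }
}

// Minimal async dispatcher: producers enqueue, one worker thread routes by type.
final class MiniDispatcher {
    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final Map<String, Consumer<Event>> handlers = new ConcurrentHashMap<>();
    private final Thread worker = new Thread(this::loop, "dispatcher");

    void register(String type, Consumer<Event> handler) {
        handlers.put(type, handler);
    }

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    // Returns immediately; the handler runs later on the worker thread.
    void handle(Event e) {
        queue.add(e);
    }

    void stop() throws InterruptedException {
        worker.interrupt();
        worker.join();
    }

    private void loop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Event e = queue.take(); // blocks until an event arrives
                Consumer<Event> h = handlers.get(e.type);
                if (h != null) {
                    h.accept(e);
                }
            }
        } catch (InterruptedException ie) {
            // stop() was called: exit the loop quietly
        }
    }
}

class MiniDispatcherDemo {
    public static void main(String[] args) throws InterruptedException {
        MiniDispatcher d = new MiniDispatcher();
        d.register("START", e -> System.out.println("starting " + e.targetId));
        d.start();
        d.handle(new Event("START", "application_1_0001"));
        Thread.sleep(100);
        d.stop();
    }
}
```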

Figure 2-2

Most ClientRMService client requests follow a similar flow: every application-related request is eventually handled by RMAppImpl (the app state machine will be analyzed separately later).

ApplicationMasterService handle

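ApplicationMasterService is the service that ApplicationMasters, its clients, use to register themselves with the ResourceManager; the ApplicationMaster monitors and tracks the job's execution. Three operations are available to a master: RegisterApplicationMaster, FinishApplicationMaster, and Allocate: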

RegisterApplicationMasterRequest Handle

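When ApplicationMasterService receives the request, it first extracts the ApplicationAttemptId and checks its authorization, then notifies the monitor thread to update the AppAttempt's timestamp. Finally it dispatches a registration event, which is handled by ApplicationAttemptEventDispatcher and ultimately by the RMAppAttemptImpl of the corresponding RMApp, entering the AppAttempt state machine: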
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMAppAttemptRegistrationEvent(applicationAttemptId, request
              .getHost(), request.getRpcPort(), request.getTrackingUrl()));
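The monitor thread mentioned here and the node monitor (nmLivelinessMonitor in the registration code) follow the same idea: registration or a heartbeat "pings" an entry to refresh its last-seen timestamp, and a periodic checker expires entries that stay silent longer than a timeout. A minimal sketch of that idea; `LivenessTracker` is hypothetical, its method names are only loosely borrowed from Hadoop's AbstractLivelinessMonitor, and the clock is injected so expiry can be tested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical liveness tracker with an injectable millisecond clock.
final class LivenessTracker<K> {
    private final Map<K, Long> lastSeen = new HashMap<>();
    private final long expireIntervalMs;
    private final LongSupplier clock;

    LivenessTracker(long expireIntervalMs, LongSupplier clock) {
        this.expireIntervalMs = expireIntervalMs;
        this.clock = clock;
    }

    synchronized void register(K key) {
        lastSeen.put(key, clock.getAsLong());
    }

    // Called on every heartbeat: refresh the timestamp of a known entry only.
    synchronized void receivedPing(K key) {
        if (lastSeen.containsKey(key)) {
            lastSeen.put(key, clock.getAsLong());
        }
    }

    synchronized void unregister(K key) {
        lastSeen.remove(key);
    }

    // Run periodically by a checker thread: remove and return expired entries.
    synchronized List<K> expireStale() {
        long now = clock.getAsLong();
        List<K> expired = new ArrayList<>();
        Iterator<Map.Entry<K, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> e = it.next();
            if (now - e.getValue() > expireIntervalMs) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        return expired;
    }
}

class LivenessTrackerDemo {
    public static void main(String[] args) {
        LivenessTracker<String> t = new LivenessTracker<>(1000L, System::currentTimeMillis);
        t.register("appattempt_1");
        t.receivedPing("appattempt_1");
        System.out.println("expired now: " + t.expireStale());
    }
}
```

In the RM, roughly, expiry is what turns a silent node into a lost one or a silent AM attempt into an expired one; here `expireStale()` just returns the keys so the caller can decide what to do.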


Figure 2-3

FinishApplicationMasterRequest

The processing flow is similar to Figure 2-3; the event finally dispatched is an RMAppAttemptUnregistrationEvent.

AllocateRequest

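Here ApplicationMasterService forwards the AM's progress to the app attempt, validates every resource ask against the scheduler's maximum resource capability, and then calls the scheduler to allocate the containers the ApplicationMaster asked for (and release the ones it gives back):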


      // Send the status update to the appAttempt.
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMAppAttemptStatusupdateEvent(appAttemptId, request
              .getProgress()));

      List<ResourceRequest> ask = request.getAskList();
      List<ContainerId> release = request.getReleaseList();

      // sanity check
      try {
        SchedulerUtils.validateResourceRequests(ask,
            rScheduler.getMaximumResourceCapability());
      } catch (InvalidResourceRequestException e) {
        LOG.warn("Invalid resource ask by application " + appAttemptId, e);
        throw RPCUtil.getRemoteException(e);
      }
      // Send new requests to appAttempt.
      Allocation allocation =
          this.rScheduler.allocate(appAttemptId, ask, release);




Figure 2-4


Admin Request

Skipped for now.

posted on 2013-04-26 14:11  江山疯宇晴
