How the RM Receives NM and AM Heartbeats in YARN SLS

In NMSimulator.java:

  public void middleStep() throws Exception {
    // we check the lifetime for each running containers
    ContainerSimulator cs = null;
    synchronized(completedContainerList) {
      while ((cs = containerQueue.poll()) != null) {
        runningContainers.remove(cs.getId());
        completedContainerList.add(cs.getId());
        LOG.debug("Container {} has completed", cs.getId());
      }
    }
    
    // send heart beat
    NodeHeartbeatRequest beatRequest =
            Records.newRecord(NodeHeartbeatRequest.class);
    beatRequest.setLastKnownNMTokenMasterKey(masterKey);
    NodeStatus ns = Records.newRecord(NodeStatus.class);
    
    ns.setContainersStatuses(generateContainerStatusList());
    ns.setNodeId(node.getNodeID());
    ns.setKeepAliveApplications(new ArrayList<ApplicationId>());
    ns.setResponseId(responseId++);
    ns.setNodeHealthStatus(NodeHealthStatus.newInstance(true, "", 0));

    //set node & containers utilization
    if (resourceUtilizationRatio > 0 && resourceUtilizationRatio <=1) {
      int pMemUsed = Math.round(node.getTotalCapability().getMemorySize()
          * resourceUtilizationRatio);
      float cpuUsed = node.getTotalCapability().getVirtualCores()
          * resourceUtilizationRatio;
      ResourceUtilization resourceUtilization = ResourceUtilization.newInstance(
          pMemUsed, pMemUsed, cpuUsed);
      ns.setContainersUtilization(resourceUtilization);
      ns.setNodeUtilization(resourceUtilization);
    }
    beatRequest.setNodeStatus(ns);
    NodeHeartbeatResponse beatResponse =
        rm.getResourceTrackerService().nodeHeartbeat(beatRequest);
    // ... (handling of beatResponse omitted in this excerpt)
  }
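The utilization block in the excerpt only reports usage when resourceUtilizationRatio lies in (0, 1]. It then scales the node's total memory and vcores by that ratio and uses the same memory figure for both physical and virtual memory. Below is a minimal sketch of just that arithmetic. The class `UtilizationSketch` and its nested `Utilization` type are hypothetical stand-ins for SLS's Resource and ResourceUtilization records, not Hadoop classes.

```java
/**
 * Hypothetical helper mirroring the arithmetic in NMSimulator.middleStep():
 * physical memory used = round(total memory * ratio),
 * CPU used = total vcores * ratio, reported only when 0 < ratio <= 1.
 */
final class UtilizationSketch {
  /** Simplified stand-in for ResourceUtilization (pmem, vmem, cpu). */
  static final class Utilization {
    final int pmemMb;
    final int vmemMb;
    final float cpu;

    Utilization(int pmemMb, int vmemMb, float cpu) {
      this.pmemMb = pmemMb;
      this.vmemMb = vmemMb;
      this.cpu = cpu;
    }
  }

  /** Returns null when the ratio is outside (0, 1], i.e. nothing is reported. */
  static Utilization compute(long totalMemoryMb, int totalVcores, float ratio) {
    if (!(ratio > 0 && ratio <= 1)) {
      return null;
    }
    // Same expression shape as the excerpt: long * float -> float,
    // then Math.round(float) -> int
    int pMemUsed = Math.round(totalMemoryMb * ratio);
    float cpuUsed = totalVcores * ratio;
    // The excerpt passes pMemUsed for both physical and virtual memory
    return new Utilization(pMemUsed, pMemUsed, cpuUsed);
  }
}
```

For example, an 8192 MB, 8-vcore node with a ratio of 0.5 reports 4096 MB of memory and 4.0 vcores, while a ratio of 0 or 1.5 reports nothing.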

Both the NM and AM simulators extend TaskRunner.Task (a nested class of TaskRunner) and override its firstStep(), middleStep() and lastStep() methods.
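The firstStep/middleStep/lastStep split is a template-method life cycle: one setup step, a repeated step on every tick, and one teardown step. Below is a minimal single-threaded sketch of that idea. `SimTask` and `RecordingTask` are hypothetical names. The real TaskRunner schedules tasks on a thread pool through a delay queue rather than looping like this, so treat this as an illustration of the hook order only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the TaskRunner.Task life cycle:
 * firstStep() once at startTime, middleStep() every repeatInterval,
 * lastStep() once when endTime is reached. Simulated clock ticks are
 * walked on a single thread instead of being scheduled.
 */
abstract class SimTask {
  private final long startTime;
  private final long endTime;
  private final long repeatInterval;

  SimTask(long startTime, long endTime, long repeatInterval) {
    this.startTime = startTime;
    this.endTime = endTime;
    this.repeatInterval = repeatInterval;
  }

  abstract void firstStep();
  abstract void middleStep();
  abstract void lastStep();

  /** Drives the three hooks over simulated time. */
  final void runToCompletion() {
    firstStep();
    for (long t = startTime + repeatInterval; t < endTime; t += repeatInterval) {
      middleStep(); // e.g. the NM heartbeat or the AM allocate call
    }
    lastStep();
  }
}

/** Toy subclass that records which hook ran, in order. */
class RecordingTask extends SimTask {
  final List<String> calls = new ArrayList<>();

  RecordingTask(long start, long end, long interval) {
    super(start, end, interval);
  }

  @Override void firstStep() { calls.add("first"); }
  @Override void middleStep() { calls.add("middle"); }
  @Override void lastStep() { calls.add("last"); }
}
```

With a start of 0, an end of 3000 and an interval of 1000, the recorded order is first, middle, middle, last.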

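The NM sends its heartbeat to the RM in middleStep. As the code shows, it does so by calling nodeHeartbeat directly on the RM's ResourceTrackerService, obtained through rm.getResourceTrackerService(), so the heartbeat is just an ordinary in-process method call.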

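In a real YARN deployment, by contrast, the NM invokes the RM's nodeHeartbeat method remotely through the RPC mechanism (which still runs over TCP underneath).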
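The reason SLS can skip RPC is that the NM side only needs *something* that implements the heartbeat call. In Hadoop, ResourceTrackerService implements the ResourceTracker protocol, and a real NM talks to it through an RPC proxy of that same protocol. SLS simply hands the simulator the in-process service object. The sketch below illustrates the idea with hypothetical, simplified names (`HeartbeatTracker`, `InProcessTracker`, `SimulatedNode`). These are not the Hadoop types, and the string/int parameters stand in for the real request records.

```java
/** Hypothetical stand-in for the heartbeat protocol shared by RPC proxy and service. */
interface HeartbeatTracker {
  void nodeHeartbeat(String nodeId, int responseId);
}

/** Stand-in for the RM-side service living in the same JVM as the simulator. */
class InProcessTracker implements HeartbeatTracker {
  int heartbeatsReceived = 0;
  int lastResponseId = -1;

  @Override
  public void nodeHeartbeat(String nodeId, int responseId) {
    heartbeatsReceived++;
    lastResponseId = responseId;
  }
}

/** Stand-in for the simulated NM; it only sees the interface. */
class SimulatedNode {
  private final HeartbeatTracker tracker;
  private final String nodeId;
  private int responseId = 0;

  SimulatedNode(String nodeId, HeartbeatTracker tracker) {
    this.nodeId = nodeId;
    this.tracker = tracker;
  }

  /**
   * One middleStep(): a direct method call with no serialization or network
   * hop. Like the excerpt, the node increments responseId itself.
   */
  void heartbeat() {
    tracker.nodeHeartbeat(nodeId, responseId++);
  }
}
```

Swapping `InProcessTracker` for an RPC-backed implementation of the same interface would not change `SimulatedNode` at all, which is the property SLS relies on.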


For the AM:

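Honestly, I never fully worked out how the AM sends its heartbeat; my guess is that it happens in the following method (to be verified later):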

In AMSimulator.java:

  @Override
  public void middleStep() throws Exception {
    if (isAMContainerRunning) {
      // process responses in the queue
      processResponseQueue();

      // send out request
      sendContainerRequest();

      // check whether finish
      checkStop();
    }
  }
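The three calls in middleStep form a small per-tick cycle. In the MRAMSimulator excerpt further down, sendContainerRequest puts each allocate response on responseQueue, so processResponseQueue consumes it on the following tick. Below is a hypothetical, single-threaded sketch of that cycle. `AmCycleSketch` is an illustrative name, and its "allocate" is faked: the pretend RM grants one container per call, and a granted container counts as finished once its response is processed. The progress rule is the one shown in the MRAMSimulator excerpt.

```java
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch of the AMSimulator.middleStep() cycle:
 * handle queued responses, send a new request, then check for completion.
 */
class AmCycleSketch {
  final int totalContainers;
  int finishedContainers = 0;
  boolean finished = false;
  float lastReportedProgress = 0f;
  // Each element is the number of containers a fake allocate() granted
  private final LinkedBlockingQueue<Integer> responseQueue = new LinkedBlockingQueue<>();

  AmCycleSketch(int totalContainers) {
    this.totalContainers = totalContainers;
  }

  void middleStep() {
    if (finished) {
      return;
    }
    processResponseQueue(); // responses queued on the previous tick
    sendContainerRequest(); // queues a response for the next tick
    checkStop();
  }

  private void processResponseQueue() {
    Integer granted;
    while ((granted = responseQueue.poll()) != null) {
      finishedContainers += granted;
    }
  }

  private void sendContainerRequest() {
    // Progress rule from MRAMSimulator: 1.0 when there is nothing to run
    lastReportedProgress = totalContainers == 0
        ? 1.0f
        : (float) finishedContainers / totalContainers;
    // Fake allocate(): grant one container while work remains
    int granted = finishedContainers < totalContainers ? 1 : 0;
    responseQueue.offer(granted); // never blocks on an unbounded queue
  }

  private void checkStop() {
    if (finishedContainers >= totalContainers) {
      finished = true;
    }
  }
}
```

With two containers, the reported progress is 0.0, then 0.5, then 1.0, and the sketch marks itself finished on the third tick. Because each response is consumed one tick after it is produced, the simulated AM always lags the simulated RM by one heartbeat.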

Again, middleStep calls sendContainerRequest, which is implemented (overridden) in MRAMSimulator.java and StreamAMSimulator.java:

MRAMSimulator.java

......

    final AllocateRequest request = createAllocateRequest(ask);
    if (totalContainers == 0) {
      request.setProgress(1.0f);
    } else {
      request.setProgress((float) finishedContainers / totalContainers);
    }

    UserGroupInformation ugi =
            UserGroupInformation.createRemoteUser(appAttemptId.toString());
    Token<AMRMTokenIdentifier> token = rm.getRMContext().getRMApps()
            .get(appAttemptId.getApplicationId())
            .getRMAppAttempt(appAttemptId).getAMRMToken();
    ugi.addTokenIdentifier(token.decodeIdentifier());
    AllocateResponse response = ugi.doAs(
            new PrivilegedExceptionAction<AllocateResponse>() {
      @Override
      public AllocateResponse run() throws Exception {
        return rm.getApplicationMasterService().allocate(request);
      }
    });
    if (response != null) {
      responseQueue.put(response);
    }
  }
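The AM uses rm.getRMContext().getRMApps() only to look up the AMRMToken for its application attempt. The actual request goes through rm.getApplicationMasterService().allocate(request), wrapped in ugi.doAs so that the call runs as that attempt. So the AM, too, talks to the RM through a direct method call. This matches the guess above: in YARN, the AM's periodic allocate call doubles as its heartbeat to the RM.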
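Why the token lookup and ugi.doAs at all, when the call is in-process anyway? As far as I can tell from the Hadoop source, ApplicationMasterService authorizes an allocate call by looking for an AMRMTokenIdentifier on the current UGI. Without it, even a direct call would be rejected. Below is a hypothetical sketch of that pattern. `FakeRm` is an illustrative name, and a ThreadLocal plays the role of UserGroupInformation. None of this is the Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of the MRAMSimulator call pattern: fetch the token the
 * RM issued for an attempt, run allocate() "as" that attempt, and let the
 * in-process service check the token before serving the request.
 */
class FakeRm {
  private final Map<String, String> tokensByAttempt = new HashMap<>();
  // Stand-in for the credentials attached to the current UGI
  private static final ThreadLocal<String> CURRENT_TOKEN = new ThreadLocal<>();

  /** Stand-in for getRMContext().getRMApps()...getAMRMToken(). */
  String tokenFor(String attemptId) {
    return tokensByAttempt.computeIfAbsent(attemptId, id -> "token-" + id);
  }

  /** Stand-in for ugi.doAs(...): runs the action with the token attached. */
  static <T> T doAs(String token, Callable<T> action) throws Exception {
    CURRENT_TOKEN.set(token);
    try {
      return action.call();
    } finally {
      CURRENT_TOKEN.remove();
    }
  }

  /** Stand-in for getApplicationMasterService().allocate(request). */
  String allocate(String attemptId, float progress) {
    String presented = CURRENT_TOKEN.get();
    if (presented == null || !presented.equals(tokensByAttempt.get(attemptId))) {
      throw new SecurityException("unauthorized allocate for " + attemptId);
    }
    return "ok:" + attemptId + ":" + progress;
  }
}
```

Calling `allocate` inside `FakeRm.doAs(rm.tokenFor(...), ...)` succeeds, while calling it bare throws, which mirrors why the simulator cannot just invoke allocate(request) without first attaching the token.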


As you can see, the SLS code really is quite simple.

posted on 2018-07-12 22:20  sichenzhao
