Hadoop1.2.1源码解析系列:JT与TT之间的心跳通信机制——JT篇
上一篇浅析了Hadoop心跳机制的TT(TaskTracker)方面,这一篇浅析下JT(JobTracker)方面。
我们知道心跳是TT通过RPC请求调用JT的heartbeat()方法的,TT在调用JT的heartbeat回收集自身的状态信息封装到TaskTrackerStatus对象中,传递给JT。下面看看JT如何处理来自TT的心跳。
1.JobTracker.heartbeat():
// Make sure heartbeat is from a tasktracker allowed by the jobtracker. if (!acceptTaskTracker(status)) { throw new DisallowedTaskTrackerException(status); }第一步是检查发送心跳请求的TT是否属于可允许的TT,这个是根据一个HostsFileReader对象进行判断的,该对象是在实例化JT的时候创建的,这个类保存了两个队列,分别是includes和excludes队列,includes表示可以访问的host列表,excludes表示不可访问的host列表,这两个列表的内容根据两个mapred.hosts和mapred.hosts.exclude(mapred-site,xml中,默认是null)这两个参数指定的文件名读取的。具体可参考JT源码1956行。
2.JobTracker.heartbeat():
String trackerName = status.getTrackerName(); long now = clock.getTime(); if (restarted) { faultyTrackers.markTrackerHealthy(status.getHost()); } else { faultyTrackers.checkTrackerFaultTimeout(status.getHost(), now); }这一步是检查TT是否重启,是重启的话标识该TT的状态为健康的,否则检查TT的健康状态。faultyTrackers.markTrackerHealthy(status.getHost())内部将该TT所在的Host上所有的TT(从这里可以看出hadoop考虑到一个Host上可能存在多个TT的可能)从黑名单,灰名单和可能存在错误的列表上删除,也就是从potentiallyFaultyTrackers队列中移除该Host,通过更新JT的numGraylistedTrackers/numBlacklistedTrackers数量以及JT的totalMapTaskCapacity和totalReduceTaskCapacity数量。至于如何检查TT健康状态,具体是根据JT上记录的关于TT执行任务失败的次数来判断的(具体不是太理解)。
3.JobTracker.heartbeat():
HeartbeatResponse prevHeartbeatResponse = trackerToHeartbeatResponseMap.get(trackerName); boolean addRestartInfo = false; if (initialContact != true) { // If this isn't the 'initial contact' from the tasktracker, // there is something seriously wrong if the JobTracker has // no record of the 'previous heartbeat'; if so, ask the // tasktracker to re-initialize itself. if (prevHeartbeatResponse == null) { // This is the first heartbeat from the old tracker to the newly // started JobTracker if (hasRestarted()) { addRestartInfo = true; // inform the recovery manager about this tracker joining back recoveryManager.unMarkTracker(trackerName); } else { // Jobtracker might have restarted but no recovery is needed // otherwise this code should not be reached LOG.warn("Serious problem, cannot find record of 'previous' " + "heartbeat for '" + trackerName + "'; reinitializing the tasktracker"); return new HeartbeatResponse(responseId, new TaskTrackerAction[] {new ReinitTrackerAction()}); } } else { // It is completely safe to not process a 'duplicate' heartbeat from a // {@link TaskTracker} since it resends the heartbeat when rpcs are // lost see {@link TaskTracker.transmitHeartbeat()}; // acknowledge it by re-sending the previous response to let the // {@link TaskTracker} go forward. if (prevHeartbeatResponse.getResponseId() != responseId) { LOG.info("Ignoring 'duplicate' heartbeat from '" + trackerName + "'; resending the previous 'lost' response"); return prevHeartbeatResponse; } } }此处第一句从JT记录的HeartbeatResponse队列中获取该TT的HeartbeatResponse信息,即判断JT之前是否收到过该TT的心跳请求。如果initialContact!=true,表示TT不是首次连接JT,同时如果prevHeartbeatResponse==null,根据注释可以知道如果TT不是首次连接JT,而且JT中并没有该TT之前的心跳请求信息,表明This is the first heartbeat from the old tracker to the newly started JobTracker。判断hasRestarted是否为true,hasRestarted是在JT初始化(initialize()方法)时,根据recoveryManager的shouldRecover来决定的,hasRestarted=shouldRecover,所以当需要进行job恢复时,addRestartInfo会被设置为true,即需要TT进行job恢复操作,同时从recoveryManager的recoveredTrackers队列中移除该TT。如果不需要进行任务恢复,则直接返回HeartbeatResponse,并对TT下重新初始化指令(后期介绍),注意此处返回的responseId还是原来的responseId,即responseId不变。上面说的都是prevHeartbeatResponse==null时的情况,下面说说prevHeartbeatResponse!=null时如何处理,当prevHeartbeatResponse!=null时会直接返回prevHeartbeatResponse,而忽略本次心跳请求。
4.JobTracker.heartbeat():
// Process this heartbeat short newResponseId = (short)(responseId + 1); status.setLastSeen(now); if (!processHeartbeat(status, initialContact, now)) { if (prevHeartbeatResponse != null) { trackerToHeartbeatResponseMap.remove(trackerName); } return new HeartbeatResponse(newResponseId, new TaskTrackerAction[] {new ReinitTrackerAction()}); }
首先将responseId+1,然后记录心跳发送时间。接着来看看processHeartbeat()方法。
5.JobTracker.processHeartbeat():
boolean seenBefore = updateTaskTrackerStatus(trackerName, trackerStatus);根据该TT的上一次心跳发送的状态信息更新JT的一些信息,如totalMaps,totalReduces,occupiedMapSlots,occupiedReduceSlots等,接着根据本次心跳发送的TT状态信息再次更新这些变量。
6.JobTracker.processHeartbeat():
TaskTracker taskTracker = getTaskTracker(trackerName); if (initialContact) { // If it's first contact, then clear out // any state hanging around if (seenBefore) { lostTaskTracker(taskTracker); } } else { // If not first contact, there should be some record of the tracker if (!seenBefore) { LOG.warn("Status from unknown Tracker : " + trackerName); updateTaskTrackerStatus(trackerName, null); return false; } }如果该TT是首次连接JT,且存在oldStatus,则表明JT丢失了TT,具体意思应该是JT在一段时间内与TT失去了联系,之后TT恢复了,所以发送心跳时显示首次连接。lostTaskTracker(taskTracker):会将该TT从所有的队列中移除,并将该TT上记录的job清除掉(kill掉),当然对那些已经完成的Job不会进行次操作。当TT不是首次连接到JT,但是JT却没有该TT的历史status信息,则表示JT对该TT未知,所以重新更新TaskTracker状态信息。
7.JobTracker.processHeartbeat():
updateTaskStatuses(trackerStatus); updateNodeHealthStatus(trackerStatus, timeStamp);更新Task和NodeHealth信息,较复杂。
8.JobTracker.heartbeat():如果processHeartbeat()返回false,则返回HeartbeatResponse(),并下达重新初始化TT指令。
// Initialize the response to be sent for the heartbeat HeartbeatResponse response = new HeartbeatResponse(newResponseId, null); List<TaskTrackerAction> actions = new ArrayList<TaskTrackerAction>(); boolean isBlacklisted = faultyTrackers.isBlacklisted(status.getHost()); // Check for new tasks to be executed on the tasktracker if (recoveryManager.shouldSchedule() && acceptNewTasks && !isBlacklisted) { TaskTrackerStatus taskTrackerStatus = getTaskTrackerStatus(trackerName); if (taskTrackerStatus == null) { LOG.warn("Unknown task tracker polling; ignoring: " + trackerName); } else { List<Task> tasks = getSetupAndCleanupTasks(taskTrackerStatus); if (tasks == null ) { tasks = taskScheduler.assignTasks(taskTrackers.get(trackerName)); } if (tasks != null) { for (Task task : tasks) { expireLaunchingTasks.addNewTask(task.getTaskID()); if(LOG.isDebugEnabled()) { LOG.debug(trackerName + " -> LaunchTask: " + task.getTaskID()); } actions.add(new LaunchTaskAction(task)); } } } }此处会实例化一个HeartbeatResponse对象,作为本次心跳的返回值,在初始化一个TaskTrackerAction队列,用于存放JT对TT下达的指令。首先需要判断recoveryManager的recoveredTrackers是否为空,即是否有需要回复的TT,然后根据TT心跳发送的acceptNewTasks值,即表明TT是否可接收新任务,并且该TT不在黑名单中,同上满足以上条件,则JT可以为TT分配任务。分配任务的选择方式是优先CleanipTask,然后是SetupTask,然后才是Map/Reduce Task。下面来看下getSetupAndCleanupTasks()方法。
9.JobTracker.getSetupAndCleanupTasks():
// Don't assign *any* new task in safemode if (isInSafeMode()) { return null; }如果集群处于safe模式,则不分配任务。
int maxMapTasks = taskTracker.getMaxMapSlots(); int maxReduceTasks = taskTracker.getMaxReduceSlots(); int numMaps = taskTracker.countOccupiedMapSlots(); int numReduces = taskTracker.countOccupiedReduceSlots(); int numTaskTrackers = getClusterStatus().getTaskTrackers(); int numUniqueHosts = getNumberOfUniqueHosts();计算TT的最大map/reduce slot,以及已占用的map/reduce slot,以及集群可使用的TT数量,和集群的host数量。
for (Iterator<JobInProgress> it = jobs.values().iterator(); it.hasNext();) { JobInProgress job = it.next(); t = job.obtainJobCleanupTask(taskTracker, numTaskTrackers, numUniqueHosts, true); if (t != null) { return Collections.singletonList(t); } }首先获取Job的Cleanup任务,每个Job有两个Cleanup任务,分别是map和reduce的。
for (Iterator<JobInProgress> it = jobs.values().iterator(); it.hasNext();) { JobInProgress job = it.next(); t = job.obtainTaskCleanupTask(taskTracker, true); if (t != null) { return Collections.singletonList(t); } }
然后获取一个Cleanup任务的TaskAttempt。
for (Iterator<JobInProgress> it = jobs.values().iterator(); it.hasNext();) { JobInProgress job = it.next(); t = job.obtainJobSetupTask(taskTracker, numTaskTrackers, numUniqueHosts, true); if (t != null) { return Collections.singletonList(t); } }然后在获取Job的setup任务。上面这三个全部是获取的map任务,而下面是获取reduce任务,方法基本一样。
如果该方法返回null,则表示没有cleanup或者setup任务需要执行,则执行map/reduce任务。
10.JobTracker.heartbeat():
if (tasks == null ) { tasks = taskScheduler.assignTasks(taskTrackers.get(trackerName)); }此处是使用TaskScheduler调度任务,一大难点,后期分析。
11.JobTracker.heartbeat():
if (tasks != null) { for (Task task : tasks) { expireLaunchingTasks.addNewTask(task.getTaskID()); if(LOG.isDebugEnabled()) { LOG.debug(trackerName + " -> LaunchTask: " + task.getTaskID()); } actions.add(new LaunchTaskAction(task)); } }生成一个LaunchTaskAction指令。
// Check for tasks to be killed List<TaskTrackerAction> killTasksList = getTasksToKill(trackerName); if (killTasksList != null) { actions.addAll(killTasksList); } // Check for jobs to be killed/cleanedup List<TaskTrackerAction> killJobsList = getJobsForCleanup(trackerName); if (killJobsList != null) { actions.addAll(killJobsList); } // Check for tasks whose outputs can be saved List<TaskTrackerAction> commitTasksList = getTasksToSave(status); if (commitTasksList != null) { actions.addAll(commitTasksList); }以上分别是下达kill task指令,kill/cleanedup job指令,commit task指令。以上四种指令,加上一个ReinitTackerAction,这是心跳JT对TT下达的所有五种指令,以后可以相信对其进行分析。
12.JobTracker.heartbeat():
// calculate next heartbeat interval and put in heartbeat response int nextInterval = getNextHeartbeatInterval(); response.setHeartbeatInterval(nextInterval); response.setActions( actions.toArray(new TaskTrackerAction[actions.size()])); // check if the restart info is req if (addRestartInfo) { response.setRecoveredJobs(recoveryManager.getJobsToRecover()); } // Update the trackerToHeartbeatResponseMap trackerToHeartbeatResponseMap.put(trackerName, response); // Done processing the hearbeat, now remove 'marked' tasks removeMarkedTasks(trackerName);剩下一些收尾工作,如计算下次发送心跳的时间,以及设置需要TT进行恢复的任务,更新trackerToHeartbeatResponseMap队列,移除标记的task。最后返回HeartbeatResponse对象,完成心跳请求响应。
到此JT的heartbeat()完成了,中间很多地方比较复杂,都没有去深追,以后有时间可以继续研究,如有错误,请不吝指教,谢谢