Flink源码阅读(二)——checkpoint源码分析

前言

  在Flink原理——容错机制一文中,已对checkpoint的机制有了较为基础的介绍,本文着重从源码方面去分析checkpoint的过程。当然本文只是分析做checkpoint的调度过程,只是尽量弄清楚整体的逻辑,没有弄清楚其实现细节,还是有遗憾的,后期还是努力去分析实现细节。文中若是有误,欢迎大伙留言指出

  本文基于Flink1.9。

1、参数设置

  1.1 有关checkpoint常见的参数如下:

1 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
2 env.enableCheckpointing(10000);   //默认是不开启的  
3 env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);  //默认为EXACTLY_ONCE
4 env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);  //默认为0,最大值为1年
5 env.getCheckpointConfig().setCheckpointTimeout(150000);  //默认为10min
6 env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);  //默认为1

   上述参数的默认值可见flink-streaming-java*.jar中的CheckpointConfig.java,配置值是通过该类中私有configureCheckpointing()的jobGraph.setSnapshotSettings(settings)传递给runtime层的,更多设置也可以参见该类。

  1.2 参数分析

  这里着重分析enableCheckpointing()设置的baseInterval和minPauseBetweenCheckpoint之间的关系。为分析两者的关系,这里先给出源码中定义

1     /** The base checkpoint interval. Actual trigger time may be affected by the
2     * max concurrent checkpoints and minimum-pause values */
3     //checkpoint触发周期,时间触发时间还受maxConcurrentCheckpointAttempts和minPauseBetweenCheckpointsNanos影响
4     private final long baseInterval;
5     
6     /** The min time(in ns) to delay after a checkpoint could be triggered. Allows to
7      * enforce minimum processing time between checkpoint attempts */
8     //在可以触发checkpoint的时,两次checkpoint之间的时间间隔
9     private final long minPauseBetweenCheckpointsNanos;

   当baseInterval<minPauseBetweenCheckpoint时,在CheckpointCoordinator.java源码中定义如下:

1     // it does not make sense to schedule checkpoints more often then the desired
2     // time between checkpoints
3     long baseInterval = chkConfig.getCheckpointInterval();
4     if (baseInterval < minPauseBetweenCheckpoints) {
5         baseInterval = minPauseBetweenCheckpoints;
6     }

   从此可以看出,checkpoint的触发虽然设置为周期性的,但是实际触发情况,还得考虑minPauseBetweenCheckpoint和maxConcurrentCheckpointAttempts,若maxConcurrentCheckpointAttempts为1,就算满足触发时间也需等待正在执行的checkpoint结束。

2、checkpoint调用过程

  将JobGraph提交到Dispatcher后,会createJobManagerRunner和startJobManagerRunner,可以关注Dispatcher类中的createJobManagerRunner(...)方法。

  2.1 createJobManagerRunner阶段

  该阶段会创建一个JobManagerRunner实例,在该过程和checkpoint有关的是会启动listener去监听job的状态。

 1   #JobManagerRunner.java
 2     public JobManagerRunner(...) throws Exception {
 3 
 4         //..........
 5 
 6         // make sure we cleanly shut down out JobManager services if initialization fails
 7         try {
 8             //..........
 9             //加载JobGraph、library、leader选举等
10 
11             // now start the JobManager
12             //启动JobManager
13             this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
14         }
15         catch (Throwable t) {
16             //......
17         }
18     }
19     
20     //在DefaultJobMasterServiceFactory类的createJobMasterService()中新建一个JobMaster对象
21     //#JobMaster.java
22     public JobMaster(...) throws Exception {
23 
24         //........
25         //该方法中主要做了参数检查,slotPool的创建、slotPool的schedul的创建等一系列的事情
26         
27         //创建一个调度器
28         this.schedulerNG = createScheduler(jobManagerJobMetricGroup);
29         //......
30     }

   在创建调度器中核心的语句如下:

 1   //#LegacyScheduler.java中的LegacyScheduler()
 2     //创建ExecutionGraph
 3     this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup, checkNotNull(shuffleMaster), checkNotNull(partitionTracker));
 4   
 5 
 6     private ExecutionGraph createAndRestoreExecutionGraph(
 7         JobManagerJobMetricGroup currentJobManagerJobMetricGroup,
 8         ShuffleMaster<?> shuffleMaster,
 9         PartitionTracker partitionTracker) throws Exception {
10 
11         
12         ExecutionGraph newExecutionGraph = createExecutionGraph(currentJobManagerJobMetricGroup, shuffleMaster, partitionTracker);
13 
14         final CheckpointCoordinator checkpointCoordinator = newExecutionGraph.getCheckpointCoordinator();
15 
16         if (checkpointCoordinator != null) {
17             // check whether we find a valid checkpoint
18             //若state没有被恢复是否可以通过savepoint恢复
19             //......
20             }
21         }
22 
23         return newExecutionGraph;
24     }

   通过调用到达生成ExecutionGraph的核心类ExecutionGraphBuilder的在buildGraph()方法,其中该方法主要是生成ExecutionGraph和设置checkpoint,下面给出其中的核心代码:

 1     //..............
 2     //生成ExecutionGraph的核心方法,这里后期会详细分析
 3     executionGraph.attachJobGraph(sortedTopology);
 4     
 5     //.......................
 6         
 7     //在enableCheckpointing中设置CheckpointCoordinator
 8     executionGraph.enableCheckpointing(
 9         chkConfig,
10         triggerVertices,
11         ackVertices,
12         confirmVertices,
13         hooks,
14         checkpointIdCounter,
15         completedCheckpoints,
16         rootBackend,
17         checkpointStatsTracker);    

   在enableCheckpointing()方法中主要是创建了checkpoint失败是的manager、设置了checkpoint的核心类CheckpointCoordinator。

 1     //#ExecutionGraph.java
 2     public void enableCheckpointing(
 3             CheckpointCoordinatorConfiguration chkConfig,
 4             List<ExecutionJobVertex> verticesToTrigger,
 5             List<ExecutionJobVertex> verticesToWaitFor,
 6             List<ExecutionJobVertex> verticesToCommitTo,
 7             List<MasterTriggerRestoreHook<?>> masterHooks,
 8             CheckpointIDCounter checkpointIDCounter,
 9             CompletedCheckpointStore checkpointStore,
10             StateBackend checkpointStateBackend,
11             CheckpointStatsTracker statsTracker) {
12         //Job的状态必须为Created,
13         checkState(state == JobStatus.CREATED, "Job must be in CREATED state");
14         checkState(checkpointCoordinator == null, "checkpointing already enabled");
15         //checkpointing的不同状态
16         ExecutionVertex[] tasksToTrigger = collectExecutionVertices(verticesToTrigger);
17         ExecutionVertex[] tasksToWaitFor = collectExecutionVertices(verticesToWaitFor);
18         ExecutionVertex[] tasksToCommitTo = collectExecutionVertices(verticesToCommitTo);
19 
20         checkpointStatsTracker = checkNotNull(statsTracker, "CheckpointStatsTracker");
21         //checkpoint失败manager,若是checkpoint失败会根据设置来决定下一步
22         CheckpointFailureManager failureManager = new CheckpointFailureManager(
23             chkConfig.getTolerableCheckpointFailureNumber(),
24             new CheckpointFailureManager.FailJobCallback() {
25                 @Override
26                 public void failJob(Throwable cause) {
27                     getJobMasterMainThreadExecutor().execute(() -> failGlobal(cause));
28                 }
29 
30                 @Override
31                 public void failJobDueToTaskFailure(Throwable cause, ExecutionAttemptID failingTask) {
32                     getJobMasterMainThreadExecutor().execute(() -> failGlobalIfExecutionIsStillRunning(cause, failingTask));
33                 }
34             }
35         );
36 
37         // create the coordinator that triggers and commits checkpoints and holds the state
38         //checkpoint的核心类CheckpointCoordinator
39         checkpointCoordinator = new CheckpointCoordinator(
40             jobInformation.getJobId(),
41             chkConfig,
42             tasksToTrigger,
43             tasksToWaitFor,
44             tasksToCommitTo,
45             checkpointIDCounter,
46             checkpointStore,
47             checkpointStateBackend,
48             ioExecutor,
49             SharedStateRegistry.DEFAULT_FACTORY,
50             failureManager);
51 
52         // register the master hooks on the checkpoint coordinator
53         for (MasterTriggerRestoreHook<?> hook : masterHooks) {
54             if (!checkpointCoordinator.addMasterHook(hook)) {
55                 LOG.warn("Trying to register multiple checkpoint hooks with the name: {}", hook.getIdentifier());
56             }
57         }
58         //checkpoint统计
59         checkpointCoordinator.setCheckpointStatsTracker(checkpointStatsTracker);
60 
61         // interval of max long value indicates disable periodic checkpoint,
62         // the CheckpointActivatorDeactivator should be created only if the interval is not max value
63         //设置为Long.MAX_VALUE标识关闭周期性的checkpoint
64         if (chkConfig.getCheckpointInterval() != Long.MAX_VALUE) {
65             // the periodic checkpoint scheduler is activated and deactivated as a result of
66             // job status changes (running -> on, all other states -> off)
67             //只有在job的状态为running时,才会开启checkpoint的scheduler
68             //createActivatorDeactivator()创建一个listener监听器
69             //registerJobStatusListener()将listener加入监听器集合jobStatusListeners中
70             registerJobStatusListener(checkpointCoordinator.createActivatorDeactivator());
71         }
72     }
73     
74     
75     //#CheckpointCoordinator.java
76     / ------------------------------------------------------------------------
77     //  job status listener that schedules / cancels periodic checkpoints
78     // ------------------------------------------------------------------------
79     //创建一个listener监听器checkpointCoordinator.createActivatorDeactivator()
80     public JobStatusListener createActivatorDeactivator() {
81         synchronized (lock) {
82             if (shutdown) {
83                 throw new IllegalArgumentException("Checkpoint coordinator is shut down");
84             }
85 
86             if (jobStatusListener == null) {
87                 jobStatusListener = new CheckpointCoordinatorDeActivator(this);
88             }
89 
90             return jobStatusListener;
91         }
92     }

   至此,createJobManagerRunner阶段结束了,ExecutionGraph中checkpoint的配置就设置好了。

  2.2 startJobManagerRunner阶段

  在该阶段中,在获得leaderShip之后,就会启动startJobExecution,这里只给出调用涉及的类和方法:

1     //#JobManagerRunner.java类中
2     //grantLeadership(...)==>verifyJobSchedulingStatusAndStartJobManager(...)
3     //==>startJobMaster(...),该方法中核心代码为
4     startFuture = jobMasterService.start(new JobMasterId(leaderSessionId));
5     
6     //进一步调用#JobMaster.java类中的start()==>startJobExecution(...)    

   startJobExecution()方法是JobMaster类中的私有方法,具体代码分析如下:

 1   //----------------------------------------------------------------------------------------------
 2     // Internal methods
 3     //----------------------------------------------------------------------------------------------
 4 
 5     //-- job starting and stopping  -----------------------------------------------------------------
 6 
 7     private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
 8 
 9         validateRunsInMainThread();
10 
11         checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");
12 
13         if (Objects.equals(getFencingToken(), newJobMasterId)) {
14             log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);
15 
16             return Acknowledge.get();
17         }
18 
19         setNewFencingToken(newJobMasterId);
20         //启动slotPool并申请资源,该方法可以具体看看申请资源的过程
21         startJobMasterServices();
22 
23         log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);
24         //执行ExecuteGraph的切入口,先判断job的状态是否为created的,后调执行executionGraph.scheduleForExecution();
25         resetAndStartScheduler();
26 
27         return Acknowledge.get();
28     }

   在LegacyScheduler类中的方法scheduleForExecution()调度过程如下:

 1     public void scheduleForExecution() throws JobException {
 2 
 3         assertRunningInJobMasterMainThread();
 4 
 5         final long currentGlobalModVersion = globalModVersion;
 6         //任务执行之前进行状态切换从CREATED到RUNNING,
 7         //transitionState(...)方法中会通过notifyJobStatusChange(newState, error)通知jobStatusListeners集合中listeners状态改变
 8         if (transitionState(JobStatus.CREATED, JobStatus.RUNNING)) {
 9             //根据启动算子调度模式不同,采用不同的调度方案
10             final CompletableFuture<Void> newSchedulingFuture = SchedulingUtils.schedule(
11                 scheduleMode,
12                 getAllExecutionVertices(),
13                 this);
14             
15             //..............
16         }
17         else {
18             throw new IllegalStateException("Job may only be scheduled from state " + JobStatus.CREATED);
19         }
20     }
21     
22     private void notifyJobStatusChange(JobStatus newState, Throwable error) {
23         if (jobStatusListeners.size() > 0) {
24             final long timestamp = System.currentTimeMillis();
25             final Throwable serializedError = error == null ? null : new SerializedThrowable(error);
26 
27             for (JobStatusListener listener : jobStatusListeners) {
28                 try {
29                     listener.jobStatusChanges(getJobID(), newState, timestamp, serializedError);
30                 } catch (Throwable t) {
31                     LOG.warn("Error while notifying JobStatusListener", t);
32                 }
33             }
34         }
35     }
36     
37     
38     //#CheckpointCoordinatorDeActivator.java
39     public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
40         if (newJobStatus == JobStatus.RUNNING) {
41             // start the checkpoint scheduler
42             //触发checkpoint的核心方法
43             coordinator.startCheckpointScheduler();
44         } else {
45             // anything else should stop the trigger for now
46             coordinator.stopCheckpointScheduler();
47         }
48     }

   下面具体分析触发checkpoint的核心方法startCheckpointScheduler()。

  startCheckpointScheduler()方法结合注释还是比较好理解的,但由于方法太长这里就不全部贴出来了,先分析一下大致做什么了,然后给出其核心代码:

  1)检查触发checkpoint的条件。如coordinator被关闭、周期性checkpoint被禁止、在没有开启强制checkpoint的情况下没有达到最小的checkpoint间隔以及超过并发的checkpoint个数等;

  2)检查是否所有需要checkpoint和需要响应checkpoint的ACK(的task都处于running状态,否则抛出异常;

  3)若均符合,执行checkpointID = checkpointIdCounter.getAndIncrement();以生成一个新的checkpointID,然后生成一个PendingCheckpoint。其中,PendingCheckpoint仅是一个启动了的checkpoint,但是还没有被确认,直到所有的task都确认了本次checkpoint,该checkpoint对象才转化为一个CompletedCheckpoint;

  4)调度timer清理失败的checkpoint;

  5)定义一个超时callback,如果checkpoint执行了很久还没完成,就把它取消;

  6)触发MasterHooks,用户可以定义一些额外的操作,用以增强checkpoint的功能(如准备和清理外部资源);

  核心代码如下:

1     // send the messages to the tasks that trigger their checkpoint
2     //遍历ExecutionVertex,是否异步触发checkpoint
3     for (Execution execution: executions) {
4         if (props.isSynchronous()) {
5             execution.triggerSynchronousSavepoint(checkpointID, timestamp, checkpointOptions, advanceToEndOfTime);
6         } else {
7             execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
8         }
9     }

   不管是否以异步的方式触发checkpoint,最终调用的方法是Execution类中的私有方法triggerCheckpointHelper(...),具体代码如下:

 1   //Execution.java
 2     private void triggerCheckpointHelper(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {
 3 
 4         final CheckpointType checkpointType = checkpointOptions.getCheckpointType();
 5         if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {
 6             throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");
 7         }
 8 
 9         final LogicalSlot slot = assignedResource;
10 
11         if (slot != null) {
12             //TaskManagerGateway是用于与taskManager通信的组件
13             final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
14 
15             taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions, advanceToEndOfEventTime);
16         } else {
17             LOG.debug("The execution has no slot assigned. This indicates that the execution is no longer running.");
18         }
19     }

   至此,checkpointCoordinator就将做checkpoint的命令发送到TaskManager去了,下面着重分析TM中checkpoint的执行过程。

  2.3 TaskManager中checkpoint

  TaskManager 接收到触发checkpoint的RPC后,会触发生成checkpoint barrier。RpcTaskManagerGateway作为消息入口,其triggerCheckpoint(...)会调用TaskExecutor的triggerCheckpoint(...),具体过程如下:

 1   //RpcTaskManagerGateway.java
 2     public void triggerCheckpoint(ExecutionAttemptID executionAttemptID, JobID jobId, long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {
 3         taskExecutorGateway.triggerCheckpoint(
 4             executionAttemptID,
 5             checkpointId,
 6             timestamp,
 7             checkpointOptions,
 8             advanceToEndOfEventTime);
 9     }
10     
11     //TaskExecutor.java
12     @Override
13     public CompletableFuture<Acknowledge> triggerCheckpoint(
14             ExecutionAttemptID executionAttemptID,
15             long checkpointId,
16             long checkpointTimestamp,
17             CheckpointOptions checkpointOptions,
18             boolean advanceToEndOfEventTime) {
19         log.debug("Trigger checkpoint {}@{} for {}.", checkpointId, checkpointTimestamp, executionAttemptID);
20 
21         //...........
22 
23         if (task != null) {
24             //核心方法,触发生成barrier
25             task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions, advanceToEndOfEventTime);
26 
27             return CompletableFuture.completedFuture(Acknowledge.get());
28         } else {
29             final String message = "TaskManager received a checkpoint request for unknown task " + executionAttemptID + '.';
30 
31             //.........
32         }
33     }

   在Task类的triggerCheckpointBarrier(...)方法中生成了一个Runable匿名类用于执行checkpoint,然后以异步的方式触发了该Runable,具体代码如下:

 1     public void triggerCheckpointBarrier(
 2             final long checkpointID,
 3             final long checkpointTimestamp,
 4             final CheckpointOptions checkpointOptions,
 5             final boolean advanceToEndOfEventTime) {
 6 
 7         final AbstractInvokable invokable = this.invokable;
 8         //创建一个CheckpointMetaData,该对象仅有checkpointID、checkpointTimestamp两个属性
 9         final CheckpointMetaData checkpointMetaData = new CheckpointMetaData(checkpointID, checkpointTimestamp);
10 
11         if (executionState == ExecutionState.RUNNING && invokable != null) {
12 
13             //..............
14 
15             Runnable runnable = new Runnable() {
16                 @Override
17                 public void run() {
18                     // set safety net from the task's context for checkpointing thread
19                     LOG.debug("Creating FileSystem stream leak safety net for {}", Thread.currentThread().getName());
20                     FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(safetyNetCloseableRegistry);
21 
22                     try {
23                         //根据SourceStreamTask和StreamTask调用不同的方法
24                         boolean success = invokable.triggerCheckpoint(checkpointMetaData, checkpointOptions, advanceToEndOfEventTime);
25                         if (!success) {
26                             checkpointResponder.declineCheckpoint(
27                                     getJobID(), getExecutionId(), checkpointID,
28                                     new CheckpointException("Task Name" + taskName, CheckpointFailureReason.CHECKPOINT_DECLINED_TASK_NOT_READY));
29                         }
30                     }
31                     catch (Throwable t) {
32                         if (getExecutionState() == ExecutionState.RUNNING) {
33                             failExternally(new Exception(
34                                 "Error while triggering checkpoint " + checkpointID + " for " +
35                                     taskNameWithSubtask, t));
36                         } else {
37                             LOG.debug("Encountered error while triggering checkpoint {} for " +
38                                 "{} ({}) while being not in state running.", checkpointID,
39                                 taskNameWithSubtask, executionId, t);
40                         }
41                     } finally {
42                         FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(null);
43                     }
44                 }
45             };
46             //以异步的方式触发Runnable
47             executeAsyncCallRunnable(
48                     runnable,
49                     String.format("Checkpoint Trigger for %s (%s).", taskNameWithSubtask, executionId));
50         }
51         else {
52             LOG.debug("Declining checkpoint request for non-running task {} ({}).", taskNameWithSubtask, executionId);
53 
54             // send back a message that we did not do the checkpoint
55             checkpointResponder.declineCheckpoint(jobId, executionId, checkpointID,
56                     new CheckpointException("Task name with subtask : " + taskNameWithSubtask, CheckpointFailureReason.CHECKPOINT_DECLINED_TASK_NOT_READY));
57         }
58     }

   SourceStreamTask和StreamTask调用triggerCheckpoint最终都是调用StreamTask类中的triggerCheckpoint(...)方法,其核心代码为:

1   //#StreamTask.java
2     return performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics, advanceToEndOfEventTime);

   在performCheckpoint(...)方法中,主要有以下两件事:

  1、若task是running,则可以进行checkpoint,主要有以下三件事:

    1)为checkpoint做准备,一般是什么不做的,直接接受checkpoint;

    2)生成barrier,并以广播的形式发射到下游去;

    3)触发本task保存state;

  2、若不是running,通知下游取消本次checkpoint,方法是发送一个CancelCheckpointMarker,这是类似于Barrier的另一种消息。

   具体代码如下:

 1   //#StreamTask.java
 2     private boolean performCheckpoint(
 3             CheckpointMetaData checkpointMetaData,
 4             CheckpointOptions checkpointOptions,
 5             CheckpointMetrics checkpointMetrics,
 6             boolean advanceToEndOfTime) throws Exception {
 7         //......
 8 
 9         synchronized (lock) {
10             if (isRunning) {
11 
12                 if (checkpointOptions.getCheckpointType().isSynchronous()) {
13                     syncSavepointLatch.setCheckpointId(checkpointId);
14 
15                     if (advanceToEndOfTime) {
16                         advanceToEndOfEventTime();
17                     }
18                 }
19 
20                 // All of the following steps happen as an atomic step from the perspective of barriers and
21                 // records/watermarks/timers/callbacks.
22                 // We generally try to emit the checkpoint barrier as soon as possible to not affect downstream
23                 // checkpoint alignments
24 
25                 // Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
26                 //           The pre-barrier work should be nothing or minimal in the common case.
27                 operatorChain.prepareSnapshotPreBarrier(checkpointId);
28 
29                 // Step (2): Send the checkpoint barrier downstream
30                 operatorChain.broadcastCheckpointBarrier(
31                         checkpointId,
32                         checkpointMetaData.getTimestamp(),
33                         checkpointOptions);
34 
35                 // Step (3): Take the state snapshot. This should be largely asynchronous, to not
36                 //           impact progress of the streaming topology
37                 checkpointState(checkpointMetaData, checkpointOptions, checkpointMetrics);
38 
39                 return true;
40             }
41             else {
42                 //.......
43             }
44         }
45     }

    接下来分析checkpointState(...)过程。

  checkpointState(...)方法最终会调用StreamTask类中executeCheckpointing(),其中会创建一个异步对象AsyncCheckpointRunnable,用以报告该检查点已完成,关键代码如下:

 1   //#StreamTask.java类中executeCheckpointing()
 2     public void executeCheckpointing() throws Exception {
 3             startSyncPartNano = System.nanoTime();
 4 
 5             try {
 6                 //调用StreamOperator进行snapshotState的入口方法,依算子不同而变
 7                 for (StreamOperator<?> op : allOperators) {
 8                     checkpointStreamOperator(op);
 9                 }
10                 //.........
11 
12                 // we are transferring ownership over snapshotInProgressList for cleanup to the thread, active on submit
13                 AsyncCheckpointRunnable asyncCheckpointRunnable = new AsyncCheckpointRunnable(
14                     owner,
15                     operatorSnapshotsInProgress,
16                     checkpointMetaData,
17                     checkpointMetrics,
18                     startAsyncPartNano);
19 
20                 owner.cancelables.registerCloseable(asyncCheckpointRunnable);
21                 owner.asyncOperationsThreadPool.execute(asyncCheckpointRunnable);
22 
23                 //.........
24             } catch (Exception ex) {
25                 //.......
26             }
27         }

   进入AsyncCheckpointRunnable(...)中的run()方法,其中会调用StreamTask类中reportCompletedSnapshotStates(...)(对于一个无状态的job返回的null),进而调用TaskStateManagerImpl类中的reportTaskStateSnapshots(...)将TM的checkpoint汇报给JM,关键代码如下:

1     //TaskStateManagerImpl.java
2     checkpointResponder.acknowledgeCheckpoint(
3             jobId,
4             executionAttemptID,
5             checkpointId,
6             checkpointMetrics,
7             acknowledgedState);

  其逻辑是逻辑是通过rpc的方式远程调JobManager的相关方法完成报告事件。

  2.4 JobManager处理checkpoint

  通过RpcCheckpointResponder类中acknowledgeCheckpoint(...)来响应checkpoint返回的消息,该方法之后的调度过程和涉及的核心方法如下:

 1    //#JobMaster类中acknowledgeCheckpoint==>
 2     //#LegacyScheduler类中acknowledgeCheckpoint==>
 3     //#CheckpointCoordinator类中receiveAcknowledgeMessage(...)==>
 4     //completePendingCheckpoint(checkpoint);
 5     
 6     //<p>Important: This method should only be called in the checkpoint lock scope
 7     private void completePendingCheckpoint(PendingCheckpoint pendingCheckpoint) throws CheckpointException {
 8         final long checkpointId = pendingCheckpoint.getCheckpointId();
 9         final CompletedCheckpoint completedCheckpoint;
10 
11         // As a first step to complete the checkpoint, we register its state with the registry
12         Map<OperatorID, OperatorState> operatorStates = pendingCheckpoint.getOperatorStates();
13         sharedStateRegistry.registerAll(operatorStates.values());
14 
15         try {
16             try {
17                 //完成checkpoint
18                 completedCheckpoint = pendingCheckpoint.finalizeCheckpoint();
19                 failureManager.handleCheckpointSuccess(pendingCheckpoint.getCheckpointId());
20             }
21             catch (Exception e1) {
22                 // abort the current pending checkpoint if we fails to finalize the pending checkpoint.
23                 if (!pendingCheckpoint.isDiscarded()) {
24                     failPendingCheckpoint(pendingCheckpoint, CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, e1);
25                 }
26 
27                 throw new CheckpointException("Could not finalize the pending checkpoint " + checkpointId + '.',
28                     CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, e1);
29             }
30 
31             // the pending checkpoint must be discarded after the finalization
32             Preconditions.checkState(pendingCheckpoint.isDiscarded() && completedCheckpoint != null);
33 
34             try {
35                 //添加新的checkpoints,若有必要(completedCheckpoints.size() > maxNumberOfCheckpointsToRetain)删除旧的
36                 completedCheckpointStore.addCheckpoint(completedCheckpoint);
37             } catch (Exception exception) {
38                 // we failed to store the completed checkpoint. Let's clean up
39                 executor.execute(new Runnable() {
40                     @Override
41                     public void run() {
42                         try {
43                             completedCheckpoint.discardOnFailedStoring();
44                         } catch (Throwable t) {
45                             LOG.warn("Could not properly discard completed checkpoint {}.", completedCheckpoint.getCheckpointID(), t);
46                         }
47                     }
48                 });
49 
50                 throw new CheckpointException("Could not complete the pending checkpoint " + checkpointId + '.',
51                     CheckpointFailureReason.FINALIZE_CHECKPOINT_FAILURE, exception);
52             }
53         } finally {
54             pendingCheckpoints.remove(checkpointId);
55 
56             triggerQueuedRequests();
57         }
58 
59         rememberRecentCheckpointId(checkpointId);
60 
61         // drop those pending checkpoints that are at prior to the completed one
62         //删除在其之前未完成的checkpoint(优先级高的)
63         dropSubsumedCheckpoints(checkpointId);
64 
65         // record the time when this was completed, to calculate
66         // the 'min delay between checkpoints'
67         lastCheckpointCompletionNanos = System.nanoTime();
68 
69         LOG.info("Completed checkpoint {} for job {} ({} bytes in {} ms).", checkpointId, job,
70             completedCheckpoint.getStateSize(), completedCheckpoint.getDuration());
71 
72         if (LOG.isDebugEnabled()) {
73             StringBuilder builder = new StringBuilder();
74             builder.append("Checkpoint state: ");
75             for (OperatorState state : completedCheckpoint.getOperatorStates().values()) {
76                 builder.append(state);
77                 builder.append(", ");
78             }
79             // Remove last two chars ", "
80             builder.setLength(builder.length() - 2);
81 
82             LOG.debug(builder.toString());
83         }
84 
85         // send the "notify complete" call to all vertices
86         final long timestamp = completedCheckpoint.getTimestamp();
87         
88         //通知所有(TM中)operator该checkpoint已完成
89         for (ExecutionVertex ev : tasksToCommitTo) {
90             Execution ee = ev.getCurrentExecutionAttempt();
91             if (ee != null) {
92                 ee.notifyCheckpointComplete(checkpointId, timestamp);
93             }
94         }
95     }

   至此,checkpoint的整体流程分析完毕建议结合原理去理解,参考的三篇文献都是写的很好的,有时间建议看看。

 

Ref:

[1]https://www.jianshu.com/p/a40a1b92f6a2

[2]https://www.cnblogs.com/bethunebtj/p/9168274.html

[3] https://blog.csdn.net/qq475781638/article/details/92698301

 

posted @ 2019-10-30 02:33  王大咩的图书馆  阅读(1733)  评论(1编辑  收藏  举报