|NO.Z.00087|----------|BigDataEnd|--|Hadoop&Spark.V03|--|Spark.v03|Spark internals and source code|Job execution principles & Stage division & Important objects in DAGScheduler|

I. Stage Division

### --- Stage division

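~~~     Spark's task scheduling starts with DAG division, which is carried out by DAGScheduler.
~~~     DAGScheduler cuts the DAG formed by the RDD lineage, splitting one Job into several Stages.
~~~     The strategy: starting from the last RDD,
~~~     walk back through the dependencies and check whether each parent dependency is a wide dependency (that is, a Shuffle boundary); every wide dependency starts a new Stage.
~~~     RDDs linked by narrow dependencies are placed in the same Stage and can be computed in a pipelined fashion.
~~~     This backward traversal uses a depth-first search.
~~~     The last Stage is called the ResultStage; all the others are ShuffleMapStages.
~~~     Whether a Stage can be submitted depends on its parent Stages: a Stage is submitted only after all of its parent Stages have finished executing.
~~~     A Stage with no parent Stages is where submission begins.
~~~     Overall, DAGScheduler's job is fairly simple:
~~~     it only divides the DAG at the Stage level, submits Stages, and monitors the related status information.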
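
The backward walk described above can be sketched on a toy lineage graph. This is a minimal illustration, not Spark's implementation: `Node`, `Narrow`, `Shuffle`, `stageOf` and `split` are hypothetical names invented for this example, and the real DAGScheduler additionally tracks shuffle ids so that a shared parent Stage is created only once.

```scala
import scala.collection.mutable

object StageSplitSketch {
  // Hypothetical toy model of an RDD lineage; these are not Spark classes.
  sealed trait Dep { def parent: Node }
  final case class Narrow(parent: Node) extends Dep   // narrow dependency
  final case class Shuffle(parent: Node) extends Dep  // wide (shuffle) dependency
  final case class Node(name: String, deps: List[Dep] = Nil)

  /** Collects the RDDs pipelined into the stage ending at `last`, plus the
    * RDDs sitting just behind each shuffle boundary (the ends of parent stages). */
  def stageOf(last: Node): (Set[String], List[Node]) = {
    val members = mutable.Set[String]()
    val parentEnds = mutable.ListBuffer[Node]()
    var stack: List[Node] = List(last) // explicit stack: iterative depth-first search
    while (stack.nonEmpty) {
      val node = stack.head
      stack = stack.tail
      if (members.add(node.name)) {    // skip RDDs that were already visited
        node.deps.foreach {
          case Narrow(p)  => stack = p :: stack // same stage: keep walking back
          case Shuffle(p) => parentEnds += p    // stage boundary: stop here
        }
      }
    }
    (members.toSet, parentEnds.toList)
  }

  /** Returns every stage, parents before children. The last element plays the
    * role of the ResultStage; the earlier ones play the role of ShuffleMapStages.
    * Toy limitation: a parent shared by two children would be listed twice. */
  def split(finalRdd: Node): List[Set[String]] = {
    val (members, parentEnds) = stageOf(finalRdd)
    parentEnds.flatMap(split) :+ members
  }
}
```

For a lineage such as textFile, map, reduceByKey, filter, where only reduceByKey introduces a shuffle, `split` yields two stages: one holding textFile and map, and a final one holding reduceByKey and filter. Listing parents before children also matches the submission rule stated above, since a Stage can only run after its parents have finished.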

II. Important Objects in DAGScheduler

### --- Important objects in DAGScheduler

~~~     DAGSchedulerEventProcessLoop: the event loop processor inside DAGScheduler,
~~~     used to handle events of type DAGSchedulerEvent. DAGSchedulerEventProcessLoop extends EventLoop.

### --- EventLoop is an abstract class that abstracts asynchronous message processing

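~~~     It has a built-in message queue, eventQueue, declared as BlockingQueue[E] and backed by a double-ended LinkedBlockingDeque[E],
~~~     which handles both message storage and message consumption.
~~~     It has a built-in consumer thread, eventThread, which consumes the messages in the queue.
~~~     The message-handling hook is onReceive(event: E); the error-handling hook is onError(e: Throwable).
~~~     It exposes a post method for receiving messages: an external message is put into the queue and waits to be consumed.
~~~     The consumer thread is started by the start method.
~~~     Before start calls eventThread.start(), it calls the hook onStart() to prepare for startup.
~~~     The consumer thread is stopped by the stop method.
~~~     After stop calls eventThread.interrupt() and eventThread.join(), it calls the hook onStop() for follow-up work.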

III. Source Code Excerpts: EventLoop

### --- Source code excerpts

~~~     # Source excerpt: EventLoop.scala
~~~     # Lines 34-80
  private[spark] abstract class EventLoop[E](name: String) extends Logging {
    // Event queue, backed by a double-ended queue
    private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
    // Marks whether this event loop has been stopped
    private val stopped = new AtomicBoolean(false)
    // Event-processing thread
    private val eventThread = new Thread(name) {
      // Run as a daemon thread
      setDaemon(true)
      // Main run() method
      override def run(): Unit = {
        try {
          while (!stopped.get) {
            // Take the next event from the queue, blocking while it is empty
            val event = eventQueue.take()
            try {
              // Hand the event to onReceive()
              onReceive(event)
            } catch { // exception handling
              case NonFatal(e) => // non-fatal exception
                try {
                  // Delegate to the onError() callback
                  onError(e)
                } catch {
                  case NonFatal(e) => logError("Unexpected error in " + name, e)
                }
            }
          }
        } catch { // interruption and other exceptions
          case ie: InterruptedException => // exit even if eventQueue is not empty
          case NonFatal(e) => logError("Unexpected error in " + name, e)
        }
      }
    }
    // Start this event loop
    def start(): Unit = {
      // An event loop that has already been stopped cannot be started again
      if (stopped.get) {
        throw new IllegalStateException(name + " has already been stopped")
      }
      // Call onStart before starting the event thread to make sure it happens before onReceive
      // onStart() is a hook that subclasses may override
      onStart()
      // Start the event-processing thread
      eventThread.start()
    }
    // Stop this event loop
    def stop(): Unit = {
      // Use CAS to flip stopped to true, marking the event loop as stopped
      if (stopped.compareAndSet(false, true)) {
        // Interrupt the event-processing thread
        eventThread.interrupt()
        // Tracks whether onStop() has been called
        var onStopCalled = false
        try {
          // Join the event-processing thread and wait for it to exit
          eventThread.join()
          // Call onStop after the event thread exits to make sure onReceive happens before onStop
          // Set onStopCalled, then call onStop(), a hook that subclasses may override
          onStopCalled = true
          onStop()
        } catch {
          case ie: InterruptedException =>
            Thread.currentThread().interrupt()
            if (!onStopCalled) {
              // ie is thrown from `eventThread.join()`. Otherwise, we should not call `onStop` since
              // it's already called.
              onStop()
            }
        }
      } else {
        // Keep quiet to allow calling `stop` multiple times.
      }
    }

~~~     # Source excerpt: EventLoop.scala
~~~     # Lines 81-121

    /**
     * Put the event into the event queue. The event thread will process it later.
     * Posts an event by placing it on the eventQueue.
     */
    def post(event: E): Unit = {
      eventQueue.put(event)
    }
    /**
     * Return if the event thread has already been started but not yet stopped.
     *
     * Checks whether the event loop is currently active.
     */
    def isActive: Boolean = eventThread.isAlive
    /**
     * Invoked when `start()` is called but before the event thread starts.
     * Signals that the event loop is starting. The default is a no-op; subclasses may override it.
     */
    protected def onStart(): Unit = {}
    /**
     * Invoked when `stop()` is called and the event thread exits.
     * Signals that the event loop has stopped. The default is a no-op; subclasses may override it.
     */
    protected def onStop(): Unit = {}
    /**
     * Invoked in the event thread when polling events from the event queue.
     *
     * Note: Should avoid calling blocking actions in `onReceive`, or the event thread will be blocked
     * and cannot process events in time. If you want to call some blocking actions, run them in
     * another thread.
     *
     * Signals that an event was received. Abstract: subclasses must implement it.
     */
    protected def onReceive(event: E): Unit
    /**
     * Invoked if `onReceive` throws any non fatal error. Any non fatal error thrown from `onError`
     * will be ignored.
     *
     * Signals that handling an event threw an exception. Abstract: subclasses must implement it.
     */
    protected def onError(e: Throwable): Unit
  }
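
To show how a subclass such as DAGSchedulerEventProcessLoop plugs into this pattern, here is a minimal, self-contained sketch. Because EventLoop is `private[spark]`, the example uses a trimmed-down stand-in: `MiniEventLoop`, `SchedulerEvent`, `JobSubmitted`, `JobCancelled` and `demo` are hypothetical names made up for illustration, and the stand-in omits logging and the onStart/onStop hooks.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, LinkedBlockingDeque, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

object EventLoopSketch {
  // Hypothetical, simplified stand-in for Spark's private[spark] EventLoop.
  abstract class MiniEventLoop[E](name: String) {
    private val queue = new LinkedBlockingDeque[E]()
    private val stopped = new AtomicBoolean(false)
    private val thread = new Thread(name) {
      setDaemon(true)
      override def run(): Unit =
        try {
          while (!stopped.get) {
            val event = queue.take() // blocks until an event is posted
            try onReceive(event)
            catch { case NonFatal(e) => onError(e) }
          }
        } catch { case _: InterruptedException => () } // stop() interrupts take()
    }
    def start(): Unit = thread.start()
    def post(event: E): Unit = queue.put(event)
    def stop(): Unit =
      if (stopped.compareAndSet(false, true)) { thread.interrupt(); thread.join() }
    protected def onReceive(event: E): Unit
    protected def onError(e: Throwable): Unit
  }

  // Hypothetical event types, loosely in the spirit of DAGSchedulerEvent.
  sealed trait SchedulerEvent
  final case class JobSubmitted(jobId: Int) extends SchedulerEvent
  final case class JobCancelled(jobId: Int) extends SchedulerEvent

  /** Posts two events and returns what the event thread handled, in order. */
  def demo(): List[String] = {
    val seen = new ConcurrentLinkedQueue[String]()
    val handled = new CountDownLatch(2)
    val loop = new MiniEventLoop[SchedulerEvent]("mini-event-loop") {
      override protected def onReceive(event: SchedulerEvent): Unit = {
        event match {
          case JobSubmitted(id) => seen.add(s"submitted $id")
          case JobCancelled(id) => seen.add(s"cancelled $id")
        }
        handled.countDown()
      }
      override protected def onError(e: Throwable): Unit = {
        seen.add(s"error ${e.getMessage}")
        ()
      }
    }
    loop.start()
    loop.post(JobSubmitted(1)) // post returns immediately; handling is asynchronous
    loop.post(JobCancelled(1))
    handled.await(5, TimeUnit.SECONDS) // wait until both events were handled
    loop.stop()
    seen.toArray(new Array[String](0)).toList
  }
}
```

The caller of `post` never blocks on event handling: events are queued and a single consumer thread processes them in FIFO order, which is why handlers need no locking among themselves.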

IV. Source Code Excerpts: JobWaiter

### --- Source overview

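~~~     JobWaiter implements the JobListener interface and waits for a job in DAGScheduler to finish computing.
~~~     Each time a Task finishes, a callback passes its result to the handler function resultHandler.
~~~     The job is considered complete once all of its Tasks have completed.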

### --- Source code excerpts

~~~     # Source excerpt: JobWaiter.scala
~~~     # Lines 30-73
  private[spark] class JobWaiter[T](
      dagScheduler: DAGScheduler,
      val jobId: Int,
      totalTasks: Int,
      resultHandler: (Int, T) => Unit) 
    extends JobListener with Logging {
    // Number of tasks of the awaited job that have already finished
    private val finishedTasks = new AtomicInteger(0)
    // If the job is finished, this will be its result. In the case of 0 task jobs (e.g. zero
    // partition RDDs), we set the jobResult directly to JobSucceeded.
    /**
     * Represents the result of the job once it completes.
     * If totalTasks is zero there is nothing to run, so the promise is created already successful.
     */
    private val jobPromise: Promise[Unit] =
      if (totalTasks == 0) Promise.successful(()) else Promise()
    // Whether the job has finished
    def jobFinished: Boolean = jobPromise.isCompleted
    // Returns the future of jobPromise
    def completionFuture: Future[Unit] = jobPromise.future
    
    /**
     * Sends a signal to the DAGScheduler to cancel the job. The cancellation itself is handled
     * asynchronously. After the low level scheduler cancels all the tasks belonging to this job, it
     * will fail this job with a SparkException.
     *
     * Cancels execution of the job.
     */
    def cancel(): Unit = {
      // Delegate to DAGScheduler.cancelJob() to cancel the job
      dagScheduler.cancelJob(jobId)
    }
    
    // Called each time one task of the job succeeds
    override def taskSucceeded(index: Int, result: Any): Unit = {
      // resultHandler call must be synchronized in case resultHandler itself is not threadsafe.
      synchronized { // invoke the callback while holding the lock
        resultHandler(index, result.asInstanceOf[T])
      }
      // Increment the finished-task count; once every task has finished, complete jobPromise with success()
      if (finishedTasks.incrementAndGet() == totalTasks) {
        jobPromise.success(())
      }
    }
    
    // Called when the job fails
    override def jobFailed(exception: Exception): Unit = {
      // Try to complete jobPromise as a Failure; if it was already completed, just log a warning
      if (!jobPromise.tryFailure(exception)) {
        logWarning("Ignore failure", exception)
      }
    }
  }
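
The counting-plus-Promise mechanism can be tried in isolation. The following is a minimal sketch under stated assumptions: `MiniJobWaiter` and `demo` are hypothetical names, there is no DAGScheduler involved, and task completions are simulated by calling `taskSucceeded` directly.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Future, Promise}

object JobWaiterSketch {
  // Hypothetical stand-in for JobWaiter, without DAGScheduler or logging.
  class MiniJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
    private val finishedTasks = new AtomicInteger(0)
    // A job with zero tasks is complete from the start.
    private val jobPromise: Promise[Unit] =
      if (totalTasks == 0) Promise.successful(()) else Promise()

    def completionFuture: Future[Unit] = jobPromise.future

    def taskSucceeded(index: Int, result: T): Unit = {
      // Serialize handler calls in case the handler is not thread-safe.
      synchronized { resultHandler(index, result) }
      // The task that brings the count up to totalTasks completes the job.
      if (finishedTasks.incrementAndGet() == totalTasks) jobPromise.success(())
    }

    def jobFailed(e: Exception): Unit = {
      jobPromise.tryFailure(e) // no effect if the promise is already completed
      ()
    }
  }

  /** Simulates a 3-partition job whose tasks finish out of order.
    * Returns the collected results and whether completion fired only at the end. */
  def demo(): (Array[Int], Boolean) = {
    val results = new Array[Int](3)
    // The handler stores each partition's result at its partition index.
    val waiter = new MiniJobWaiter[Int](3, (index, value) => results(index) = value)
    waiter.taskSucceeded(2, 30)
    waiter.taskSucceeded(0, 10)
    val completedEarly = waiter.completionFuture.isCompleted // one task still pending
    waiter.taskSucceeded(1, 20)
    (results, waiter.completionFuture.isCompleted && !completedEarly)
  }
}
```

Because results are written by partition index, the order in which tasks finish does not affect the final array, and the future completes only when the last outstanding task reports success.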


posted on 2022-04-12 13:48 by yanqi_vip
