Apache Spark Source Code Analysis: How the Master and Worker Nodes Cooperate

Abstract

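Spark is an efficient distributed computing framework. To learn it in more depth, you need to read its source code: this gives a better understanding of how Spark works and also improves your ability to troubleshoot a cluster. This article focuses on how the Spark Master starts, how the Worker starts, and how the two communicate afterwards.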

Master Start

We start the Master with the start-master.sh shell script. The script begins the following call chain:

start-master.sh  -> spark-daemon.sh start org.apache.spark.deploy.master.Master

As we can see, the script launches the org.apache.spark.deploy.master.Master class. Some arguments are passed in at startup; as the code below shows, the Master reads its host, port, and web UI port from them.

The main method of the Master class looks like this:

private[spark] object Master extends Logging {
  val systemName = "sparkMaster"
  private val actorName = "Master"

  //master startup entry
  def main(argStrings: Array[String]) {
    SignalLogger.register(log)
    //Create SparkConf
    val conf = new SparkConf
    //Save parameters to SparkConf
    val args = new MasterArguments(argStrings, conf)
    //Create Actor System and Actor
    val (actorSystem, _, _, _) = startSystemAndActor(args.host, args.port, args.webUiPort, conf)
    // Block until the actor system terminates
    actorSystem.awaitTermination()
  }

Here we mainly look at startSystemAndActor:

  /**
   * Start the Master and return a four tuple of:
   *   (1) The Master actor system
   *   (2) The bound port
   *   (3) The web UI bound port
   *   (4) The REST server bound port, if any
   */
  def startSystemAndActor(
      host: String,
      port: Int,
      webUiPort: Int,
      conf: SparkConf): (ActorSystem, Int, Int, Option[Int]) = {
    val securityMgr = new SecurityManager(conf)

    //Creating ActorSystem with AkkaUtils
    val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port, conf = conf,
      securityManager = securityMgr)

    val actor = actorSystem.actorOf(
      Props(classOf[Master], host, boundPort, webUiPort, securityMgr, conf), actorName)
   ....
  }
}

In the version of Spark analyzed here, the low-level communication is implemented with Akka.

The ActorSystem is created first and is then used to create the actor: actorSystem.actorOf runs the Master constructor, after which the actor lifecycle methods are invoked.
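The call order described above (constructor, then preStart, then message handling) can be illustrated with a minimal sketch that does not depend on Akka. MiniActor, MiniSystem, DemoMaster, and LifecycleDemo are hypothetical names invented for this illustration; they only imitate the ordering and are not Akka's API.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an actor: NOT Akka's API, only the call order.
abstract class MiniActor {
  def preStart(): Unit = ()
  def receive: PartialFunction[Any, Unit]
}

// Hypothetical stand-in for ActorSystem.actorOf plus message delivery.
object MiniSystem {
  def actorOf(create: () => MiniActor): MiniActor = {
    val actor = create()   // 1. the constructor body runs
    actor.preStart()       // 2. the lifecycle hook runs once
    actor                  // 3. the actor is now ready to receive messages
  }

  def send(actor: MiniActor, msg: Any): Unit =
    if (actor.receive.isDefinedAt(msg)) actor.receive(msg)
}

// Records each step so the order can be inspected afterwards.
class DemoMaster(events: ArrayBuffer[String]) extends MiniActor {
  events += "constructor"
  override def preStart(): Unit = { events += "preStart" }
  def receive: PartialFunction[Any, Unit] = {
    case msg: String => events += s"receive($msg)"
  }
}

object LifecycleDemo {
  def run(): List[String] = {
    val events = ArrayBuffer.empty[String]
    val master = MiniSystem.actorOf(() => new DemoMaster(events))
    MiniSystem.send(master, "CheckForWorkerTimeOut")
    events.toList
  }
}
```

Calling LifecycleDemo.run() returns the recorded steps in the order they happened, which mirrors what Akka does for the real Master actor.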

Running the Master constructor initializes a number of fields:

 private[spark] class Master(
    host: String,
    port: Int,
    webUiPort: Int,
    val securityMgr: SecurityManager,
    val conf: SparkConf)
  extends Actor with ActorLogReceive with Logging with LeaderElectable {
  //primary constructor

  //Enable timer function
  import context.dispatcher   // to use Akka's scheduler.schedule()

  val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)

  def createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss")  // For application IDs
  // Worker timeout, in milliseconds
  val WORKER_TIMEOUT = conf.getLong("spark.worker.timeout", 60) * 1000
  val RETAINED_APPLICATIONS = conf.getInt("spark.deploy.retainedApplications", 200)
  val RETAINED_DRIVERS = conf.getInt("spark.deploy.retainedDrivers", 200)
  val REAPER_ITERATIONS = conf.getInt("spark.dead.worker.persistence", 15)
  val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")

  //A HashSet is used to save WorkerInfo
  val workers = new HashSet[WorkerInfo]
  // A HashMap from worker ID to WorkerInfo
  val idToWorker = new HashMap[String, WorkerInfo]
  val addressToWorker = new HashMap[Address, WorkerInfo]

  // A HashSet that holds the applications submitted by clients (SparkSubmit)
  val apps = new HashSet[ApplicationInfo]
  // A HashMap from application ID to ApplicationInfo
  val idToApp = new HashMap[String, ApplicationInfo]
  val actorToApp = new HashMap[ActorRef, ApplicationInfo]
  val addressToApp = new HashMap[Address, ApplicationInfo]
  //App Waiting for Scheduling
  val waitingApps = new ArrayBuffer[ApplicationInfo]
  val completedApps = new ArrayBuffer[ApplicationInfo]
  var nextAppNumber = 0
  val appIdToUI = new HashMap[String, SparkUI]

  //Save DriverInfo
  val drivers = new HashSet[DriverInfo]
  val completedDrivers = new ArrayBuffer[DriverInfo]
  val waitingDrivers = new ArrayBuffer[DriverInfo] // Drivers currently spooled for scheduling

Once the primary constructor has finished, preStart is executed, and after that the actor handles messages in its receive method.

  // Start a timer that periodically checks for timed-out workers
  // Note the CheckForWorkerTimeOut message it sends
  context.system.scheduler.schedule(0 millis, WORKER_TIMEOUT millis, self, CheckForWorkerTimeOut)

In preStart, a timer is created to check for Worker timeouts. The interval is WORKER_TIMEOUT = conf.getLong("spark.worker.timeout", 60) * 1000, which defaults to 60 seconds.
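As a minimal sketch of that computation, the snippet below reads the setting in seconds and converts it to milliseconds. TimeoutConfig and its getLong helper are hypothetical: a plain Map stands in for SparkConf, and only the key name and the default of 60 come from the code above.

```scala
object TimeoutConfig {
  // Hypothetical stand-in for SparkConf.getLong: look up a key, fall back to a default.
  def getLong(conf: Map[String, String], key: String, default: Long): Long =
    conf.get(key).map(_.toLong).getOrElse(default)

  // The setting is expressed in seconds; the timer needs milliseconds.
  def workerTimeoutMs(conf: Map[String, String]): Long =
    getLong(conf, "spark.worker.timeout", 60) * 1000
}
```

With an empty configuration this yields 60000 ms; setting spark.worker.timeout to 30 would yield 30000 ms.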

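As we have seen, Master initialization mainly consists of constructing a Master actor that waits for messages, initializing the collections that hold Worker information, and starting a timer that checks for Worker timeouts.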

Master Start sequence diagram

Worker Start

Workers are started through the slaves.sh shell script, which reads the slaves file and starts each remote Worker over ssh:

spark-daemon.sh start org.apache.spark.deploy.worker.Worker

The script launches the org.apache.spark.deploy.worker.Worker class.

Let's look at the Worker source code:

private[spark] object Worker extends Logging {
  //Worker Start Entry
  def main(argStrings: Array[String]) {
    SignalLogger.register(log)
    val conf = new SparkConf
    val args = new WorkerArguments(argStrings, conf)
    //New Actor System and Actor
    val (actorSystem, _) = startSystemAndActor(args.host, args.port, args.webUiPort, args.cores,
      args.memory, args.masters, args.workDir)
    actorSystem.awaitTermination()
  }

The most important part here is the Worker's startSystemAndActor:

  def startSystemAndActor(
      host: String,
      port: Int,
      webUiPort: Int,
      cores: Int,
      memory: Int,
      masterUrls: Array[String],
      workDir: String,
      workerNumber: Option[Int] = None,
      conf: SparkConf = new SparkConf): (ActorSystem, Int) = {

    // The LocalSparkCluster runs multiple local sparkWorkerX actor systems
    val systemName = "sparkWorker" + workerNumber.map(_.toString).getOrElse("")
    val actorName = "Worker"
    val securityMgr = new SecurityManager(conf)
    // Create the ActorSystem through AkkaUtils
    val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
      conf = conf, securityManager = securityMgr)
    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, AkkaUtils.protocol(actorSystem)))
    // actorSystem.actorOf creates the Worker actor: constructor -> preStart -> receive
    actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
      masterAkkaUrls, systemName, actorName,  workDir, conf, securityMgr), name = actorName)
    (actorSystem, boundPort)
  }

Here the Worker likewise constructs its own actor object, which completes the Worker's startup initialization.
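Two small naming steps in startSystemAndActor are easy to miss: the actor system name gets an optional numeric suffix, and each spark:// master URL is turned into an Akka actor URL. The sketch below reproduces the first expression as written above. The second function is only an assumption about what Master.toAkkaUrl produces, since its body is not shown here: it follows the general Akka path layout protocol://system@host:port/user/actor and reuses the "sparkMaster" and "Master" names from the Master object. WorkerNaming and sketchAkkaUrl are hypothetical names.

```scala
object WorkerNaming {
  // Same expression as in startSystemAndActor: "sparkWorker" plus an optional number.
  def systemName(workerNumber: Option[Int]): String =
    "sparkWorker" + workerNumber.map(_.toString).getOrElse("")

  // ASSUMPTION: illustrates the Akka path layout only; this is not Spark's implementation.
  def sketchAkkaUrl(sparkUrl: String, protocol: String): String = {
    val prefix = "spark://"
    require(sparkUrl.startsWith(prefix), s"Invalid master URL: $sparkUrl")
    val hostPort = sparkUrl.stripPrefix(prefix)
    s"$protocol://sparkMaster@$hostPort/user/Master"
  }
}
```

The optional suffix exists because, as the comment in the source notes, LocalSparkCluster runs several Worker actor systems in one process and each needs a distinct name.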

Communication Between Worker and Master

The Worker's preStart method is invoked as part of the actor lifecycle:

  override def preStart() {
    assert(!registered)
    logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
      host, port, cores, Utils.megabytesToString(memory)))
    logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
    logInfo("Spark home: " + sparkHome)
    createWorkDir()
    context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
    shuffleService.startIfEnabled()
    webUi = new WorkerWebUI(this, workDir, webUiPort)
    webUi.bind()

    //Worker registers with Master
    registerWithMaster()
    ....
  }

Here registerWithMaster is called and the Worker begins registering with the Master.

 def registerWithMaster() {
    // DisassociatedEvent may be triggered multiple times, so don't attempt registration
    // if there are outstanding registration attempts scheduled.
    registrationRetryTimer match {
      case None =>
        registered = false
        //Start registration
        tryRegisterAllMasters()
        ....
    }
  }

tryRegisterAllMasters is called from the matching case branch in registerWithMaster:

  private def tryRegisterAllMasters() {
    //Traversing the address of the master
    for (masterAkkaUrl <- masterAkkaUrls) {
      logInfo("Connecting to master " + masterAkkaUrl + "...")
      // Look up the Master actor by its Akka URL
      val actor = context.actorSelection(masterAkkaUrl)
      //Send registration information to Master
      actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)
    }
  }

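After obtaining a reference to the Master through masterAkkaUrl, the Worker sends it a RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress) message, which carries the worker ID, host, port, CPU cores, memory, and so on. The Master handles this message in its receiveWithLogging: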

override def receiveWithLogging = {
    ......

    //Accept registration information from Worker
    case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) =>
    {
      logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
        workerHost, workerPort, cores, Utils.megabytesToString(memory)))
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
        //Determine if the worker has been registered
      } else if (idToWorker.contains(id)) {
        //If registered, tell worker that registration failed
        sender ! RegisterWorkerFailed("Duplicate worker ID")
      } else {
        // Not registered yet: wrap the registration information from the Worker in a WorkerInfo
        val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
          sender, workerUiPort, publicAddress)
        if (registerWorker(worker)) {
          //Recording Worker's Information with a Persistence Engine
          persistenceEngine.addWorker(worker)
          // Reply to the Worker that registration succeeded
          sender ! RegisteredWorker(masterUrl, masterWebUiUrl)

          schedule()
        } else {
          val workerAddress = worker.actor.path.address
          logWarning("Worker registration failed. Attempted to re-register worker at same " +
            "address: " + workerAddress)
          sender ! RegisterWorkerFailed("Attempted to re-register worker at same address: "
            + workerAddress)
        }
      }
    }
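The three outcomes of the handler above (no reply while on standby, rejection of a duplicate, acceptance) can be modelled without Akka as an ordinary method that returns an optional reply. RegistrationTable, Reply, Registered, and RegisterFailed are hypothetical names for this sketch. The address check is a simplified assumption about what registerWorker does, since its body is not shown here.

```scala
import scala.collection.mutable

sealed trait Reply
case class Registered(masterUrl: String) extends Reply
case class RegisterFailed(reason: String) extends Reply

// Hypothetical, Akka-free model of the Master's registration bookkeeping.
class RegistrationTable(masterUrl: String, var standby: Boolean = false) {
  private val idToWorker = mutable.HashMap.empty[String, String] // worker ID -> address
  private val addresses = mutable.HashSet.empty[String]

  // None models "send no response", as in the STANDBY branch.
  def register(id: String, address: String): Option[Reply] = {
    if (standby) {
      None
    } else if (idToWorker.contains(id)) {
      Some(RegisterFailed("Duplicate worker ID"))
    } else if (addresses.contains(address)) {
      // Simplified stand-in for registerWorker(worker) returning false.
      Some(RegisterFailed("Attempted to re-register worker at same address: " + address))
    } else {
      idToWorker(id) = address
      addresses += address
      Some(Registered(masterUrl))
    }
  }
}
```

Returning Option[Reply] makes the standby case explicit: the caller sees that no message goes back to the Worker, which is why a Worker that hears nothing has to retry.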

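The Master's receiveWithLogging is the core of this exchange: it handles each incoming message as it arrives. When the Master receives RegisterWorker, it wraps the parameters in a WorkerInfo object, adds it to its collections through registerWorker, and records it in the persistence engine. It then replies to the Worker with sender ! RegisteredWorker(masterUrl, masterWebUiUrl). Next, look at the Worker's receiveWithLogging: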

override def receiveWithLogging = {

    case RegisteredWorker(masterUrl, masterWebUiUrl) =>
      logInfo("Successfully registered with master " + masterUrl)
      registered = true
      changeMaster(masterUrl, masterWebUiUrl)
      //Start the timer and send Heartbeat at regular intervals
      context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
      if (CLEANUP_ENABLED) {
        logInfo(s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
          CLEANUP_INTERVAL_MILLIS millis, self, WorkDirCleanup)
      }

The Worker receives the registration-success reply from the Master, starts a timer, and sends heartbeats at regular intervals.

    case SendHeartbeat =>
      // The Worker sends heartbeats to report that it is still alive
      if (connected) { master ! Heartbeat(workerId) }

The Master's receiveWithLogging receives the heartbeat message:

  override def receiveWithLogging = {
        ....
    case Heartbeat(workerId) => {
      idToWorker.get(workerId) match {
        case Some(workerInfo) =>
          //Update the last heartbeat time
          workerInfo.lastHeartbeat = System.currentTimeMillis()
          .....
      }
    }
 }

The Master records the heartbeat by updating workerInfo.lastHeartbeat = System.currentTimeMillis().

Meanwhile, the Master's timer keeps sending the internal CheckForWorkerTimeOut message to the Master itself. Each time it arrives, the Master scans its set of Workers and removes any Worker whose last heartbeat is older than the timeout (60 seconds by default).

  //Check timeout Worker
    case CheckForWorkerTimeOut => {
      timeOutDeadWorkers()
    }

The timeOutDeadWorkers method:

  def timeOutDeadWorkers() {
    // Copy the workers into an array so we don't modify the hashset while iterating through it
    val currentTime = System.currentTimeMillis()
    val toRemove = workers.filter(_.lastHeartbeat < currentTime - WORKER_TIMEOUT).toArray
    for (worker <- toRemove) {
      if (worker.state != WorkerState.DEAD) {
        logWarning("Removing %s because we got no heartbeat in %d seconds".format(
          worker.id, WORKER_TIMEOUT/1000))
        removeWorker(worker)
      } else {
        if (worker.lastHeartbeat < currentTime - ((REAPER_ITERATIONS + 1) * WORKER_TIMEOUT)) {
          workers -= worker // we've seen this DEAD worker in the UI, etc. for long enough; cull it
        }
      }
    }
  }

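A Worker is judged to have timed out when its last heartbeat time < current time - timeout. It is then passed to removeWorker. A Worker that is already in the DEAD state is dropped from the workers set only once (REAPER_ITERATIONS + 1) timeout periods have passed since its last heartbeat.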
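The decision logic of timeOutDeadWorkers can be written as a pure function, which makes the two thresholds easier to see. This is a minimal sketch: Reaper, WorkerView, and the Action types are hypothetical names, and the function only reports what would be done instead of mutating the Master's collections.

```scala
object Reaper {
  sealed trait State
  case object Alive extends State
  case object Dead extends State

  // Hypothetical, trimmed-down view of WorkerInfo.
  case class WorkerView(id: String, lastHeartbeat: Long, state: State)

  sealed trait Action
  case class RemoveWorker(id: String) extends Action // corresponds to removeWorker(worker)
  case class CullFromSet(id: String) extends Action  // corresponds to workers -= worker

  def decide(workers: Seq[WorkerView], now: Long, timeoutMs: Long,
             reaperIterations: Int): Seq[Action] =
    workers.filter(_.lastHeartbeat < now - timeoutMs).flatMap { w =>
      val action: Option[Action] =
        if (w.state != Dead) Some(RemoveWorker(w.id))
        else if (w.lastHeartbeat < now - (reaperIterations + 1) * timeoutMs) Some(CullFromSet(w.id))
        else None // DEAD, but kept around a while longer (for the UI, per the source comment)
      action
    }
}
```

With the defaults shown earlier (a 60 second timeout and REAPER_ITERATIONS = 15), a DEAD worker stays in the set until 16 minutes after its last heartbeat.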

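The Heartbeat handler on the Master also has a case None branch, for a heartbeat whose worker ID is not in idToWorker:

 case None =>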
          if (workers.map(_.id).contains(workerId)) {
            logWarning(s"Got heartbeat from unregistered worker $workerId." +
              " Asking it to re-register.")
            //Send a re-registered message
            sender ! ReconnectWorker(masterUrl)
          } else {
            logWarning(s"Got heartbeat from unregistered worker $workerId." +
              " This worker was never registered, so ignoring the heartbeat.")
          }
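Taken together, the Some and None branches give a heartbeat three possible outcomes. The sketch below models them without Akka. HeartbeatDemo, Tracker, and the Outcome values are hypothetical names; the two collections play the roles of idToWorker and workers.map(_.id) in the Master.

```scala
import scala.collection.mutable

object HeartbeatDemo {
  sealed trait Outcome
  case object Updated extends Outcome          // lastHeartbeat refreshed
  case object AskedToReconnect extends Outcome // corresponds to sender ! ReconnectWorker(masterUrl)
  case object Ignored extends Outcome          // worker was never registered

  class Tracker {
    val lastHeartbeat = mutable.HashMap.empty[String, Long] // role of idToWorker
    val knownIds = mutable.HashSet.empty[String]            // role of workers.map(_.id)

    def onHeartbeat(workerId: String, now: Long): Outcome = {
      if (lastHeartbeat.contains(workerId)) {
        lastHeartbeat(workerId) = now
        Updated
      } else if (knownIds.contains(workerId)) {
        AskedToReconnect
      } else {
        Ignored
      }
    }
  }
}
```

The middle case covers a Worker that was removed after a timeout but is still present in the workers set: it is evidently alive, so the Master asks it to register again.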

Worker and Master sequence diagram

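That covers the general communication flow once the Master and Worker have started. The next question is how executor processes are launched on the cluster to run computation tasks.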

posted @ 2014-09-18 11:36 JackYang
