Spark源码分析 – SchedulerBackend
SchedulerBackend, 两个任务, 申请资源和task执行和管理
对于SparkDeploySchedulerBackend, 基于actor模式, 主要就是启动和管理两个actor
Deploy.Client Actor, 负责资源申请, 在SparkDeploySchedulerBackend初始化的时候就会被创建, 然后Client会去到Master上注册, 最终完成在Worker上的ExecutorBackend的创建(参考, Spark源码分析 – Deploy), 并且这些ExecutorBackend都会被注册到Driver Actor上
Driver Actor, 负责task的执行
由于Spark是原先基于Mesos的, 然后为了兼容性才提供Standalone模式, 所以你可以看到Driver Actor中的接口都是mesos风格的, 在mesos的情况下应该是动态的申请资源, 然后执行task (猜测, 还没有看源码)
但对于coarse-grained Mesos mode和Spark's standalone deploy mode, 这步被简化成当TaskScheduler初始化的时候, 直接就将资源分配好了, 然后Driver Actor只是负责调度task在这些executor上执行
所以在makeOffers的注释上, 写的是Make fake resource offers, 因为这里其实没有真正的offer resources
关于Driver Actor如何调用task去执行, 关键在scheduler.resourceOffers
SchedulerBackend
package org.apache.spark.scheduler.cluster /** * A backend interface for cluster scheduling systems that allows plugging in different ones under * ClusterScheduler. We assume a Mesos-like model where the application gets resource offers as * machines become available and can launch tasks on them. */ private[spark] trait SchedulerBackend { def start(): Unit def stop(): Unit def reviveOffers(): Unit def defaultParallelism(): Int // Memory used by each executor (in megabytes) protected val executorMemory: Int = SparkContext.executorMemoryRequested // TODO: Probably want to add a killTask too }
StandaloneSchedulerBackend
用于coarse-grained Mesos mode和Spark's standalone deploy mode
可用看到主要目的, 就是创建并维护driverActor
主要的逻辑都在driverActor 中
/** * A standalone scheduler backend, which waits for standalone executors to connect to it through * Akka. These may be executed in a variety of ways, such as Mesos tasks for the coarse-grained * Mesos mode or standalone processes for Spark's standalone deploy mode (spark.deploy.*). */ private[spark] class StandaloneSchedulerBackend(scheduler: ClusterScheduler, actorSystem: ActorSystem) extends SchedulerBackend with Logging { // Use an atomic variable to track total number of cores in the cluster for simplicity and speed var totalCoreCount = new AtomicInteger(0) class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor {
// ……后面分析 } var driverActor: ActorRef = null val taskIdsOnSlave = new HashMap[String, HashSet[String]] override def start() { val properties = new ArrayBuffer[(String, String)] val iterator = System.getProperties.entrySet.iterator while (iterator.hasNext) { val entry = iterator.next val (key, value) = (entry.getKey.toString, entry.getValue.toString) if (key.startsWith("spark.") && !key.equals("spark.hostPort")) { properties += ((key, value)) } } driverActor = actorSystem.actorOf( // 关键就是创建driverActor Props(new DriverActor(properties)), name = StandaloneSchedulerBackend.ACTOR_NAME) } private val timeout = Duration.create(System.getProperty("spark.akka.askTimeout", "10").toLong, "seconds") override def stop() { try { if (driverActor != null) { val future = driverActor.ask(StopDriver)(timeout) // 关闭driverActor Await.result(future, timeout) } } catch { case e: Exception => throw new SparkException("Error stopping standalone scheduler's driver actor", e) } } override def reviveOffers() { driverActor ! ReviveOffers // 发送ReviveOffers event给driverActor } override def defaultParallelism() = Option(System.getProperty("spark.default.parallelism")) .map(_.toInt).getOrElse(math.max(totalCoreCount.get(), 2)) // Called by subclasses when notified of a lost worker def removeExecutor(executorId: String, reason: String) { try { val future = driverActor.ask(RemoveExecutor(executorId, reason))(timeout) Await.result(future, timeout) } catch { case e: Exception => throw new SparkException("Error notifying standalone scheduler's driver actor", e) } } }
DriverActor
关键的函数, makeOffers, 在executors上launch tasks, 什么时候调用?
RegisterExecutor的时候,
Task StatusUpdate的时候,
收到ReviveOffers event的时候, 新的task被submit的时候, delay scheduling被触发的时候(per second)
关于delay scheduling, 应该是为了保持活度, 当没有任何状态变化时, 仍然需要继续保持launch tasks
class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor { private val executorActor = new HashMap[String, ActorRef] // track所有executorActor Ref private val executorAddress = new HashMap[String, Address] private val executorHost = new HashMap[String, String] private val freeCores = new HashMap[String, Int] private val actorToExecutorId = new HashMap[ActorRef, String] private val addressToExecutorId = new HashMap[Address, String] override def preStart() { // Listen for remote client disconnection events, since they don't go through Akka's watch() context.system.eventStream.subscribe(self, classOf[RemoteClientLifeCycleEvent]) // Periodically revive offers to allow delay scheduling to work val reviveInterval = System.getProperty("spark.scheduler.revive.interval", "1000").toLong context.system.scheduler.schedule(0.millis, reviveInterval.millis, self, ReviveOffers) } def receive = { case RegisterExecutor(executorId, hostPort, cores) => // 接收从StandaloneExecutorBackend发来的RegisterExecutor Utils.checkHostPort(hostPort, "Host port expected " + hostPort) if (executorActor.contains(executorId)) { sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId) } else { logInfo("Registered executor: " + sender + " with ID " + executorId) sender ! RegisteredExecutor(sparkProperties) context.watch(sender) // watch executor actor executorActor(executorId) = sender executorHost(executorId) = Utils.parseHostPort(hostPort)._1 freeCores(executorId) = cores executorAddress(executorId) = sender.path.address actorToExecutorId(sender) = executorId addressToExecutorId(sender.path.address) = executorId totalCoreCount.addAndGet(cores) makeOffers() } case StatusUpdate(executorId, taskId, state, data) => scheduler.statusUpdate(taskId, state, data.value) if (TaskState.isFinished(state)) { freeCores(executorId) += 1 makeOffers(executorId) } case ReviveOffers => // 接收从StandaloneSchedulerBackend发来的ReviveOffers makeOffers() case StopDriver => sender ! true context.stop(self) case RemoveExecutor(executorId, reason) => removeExecutor(executorId, reason) sender ! true case Terminated(actor) => actorToExecutorId.get(actor).foreach(removeExecutor(_, "Akka actor terminated")) case RemoteClientDisconnected(transport, address) => addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client disconnected")) case RemoteClientShutdown(transport, address) => addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client shutdown")) } // Make fake resource offers on all executors def makeOffers() { launchTasks(scheduler.resourceOffers( executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))})) } // Make fake resource offers on just one executor
// 可以看到这里传给scheduler.resourceOffers的WorkOffer,是根据之前已经分布好的executor静态生成的
// 而不是动态得到的workeroffer, 如果用mesos, 这里应该是动态获取workeroffer, 然后传给scheduler.resourceOffers def makeOffers(executorId: String) { launchTasks(scheduler.resourceOffers( Seq(new WorkerOffer(executorId, executorHost(executorId), freeCores(executorId))))) } // Launch tasks returned by a set of resource offers def launchTasks(tasks: Seq[Seq[TaskDescription]]) { for (task <- tasks.flatten) { freeCores(task.executorId) -= 1 executorActor(task.executorId) ! LaunchTask(task) // launch就是给executorActor发送LaunchTask event } } // Remove a disconnected slave from the cluster def removeExecutor(executorId: String, reason: String) { if (executorActor.contains(executorId)) { logInfo("Executor " + executorId + " disconnected, so removing it") val numCores = freeCores(executorId) actorToExecutorId -= executorActor(executorId) addressToExecutorId -= executorAddress(executorId) executorActor -= executorId executorHost -= executorId freeCores -= executorId totalCoreCount.addAndGet(-numCores) scheduler.executorLost(executorId, SlaveLost(reason)) } } }
SparkDeploySchedulerBackend
关键就是创建和管理Driver and Client Actor
private[spark] class SparkDeploySchedulerBackend( scheduler: ClusterScheduler, sc: SparkContext, master: String, appName: String) extends StandaloneSchedulerBackend(scheduler, sc.env.actorSystem) with ClientListener with Logging { var client: Client = null
override def start() { super.start() // 调用StandaloneSchedulerBackend的start,创建DriverActor // The endpoint for executors to talk to us val driverUrl = "akka://spark@%s:%s/user/%s".format( System.getProperty("spark.driver.host"), System.getProperty("spark.driver.port"), StandaloneSchedulerBackend.ACTOR_NAME) val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}") val command = Command( // 生成worker中ExecutorRunner中执行的command, 其实就是运行StandaloneExecutorBackend "org.apache.spark.executor.StandaloneExecutorBackend", args, sc.executorEnvs) val sparkHome = sc.getSparkHome().getOrElse(null) val appDesc = new ApplicationDescription(appName, maxCores, executorMemory, command, sparkHome, "http://" + sc.ui.appUIAddress) // 生成application description client = new Client(sc.env.actorSystem, master, appDesc, this) // 创建Client Actor, 并start client.start() } override def stop() { stopping = true super.stop() client.stop() if (shutdownCallback != null) { shutdownCallback(this) } } }