Spark Source Code Walkthrough (2): SparkContext Initialization, Creating and Initializing the SparkUI


Creating and initializing the SparkUI

How SparkContext creates the SparkUI

_ui =
      if (conf.getBoolean("spark.ui.enabled", true)) {
        Some(SparkUI.createLiveUI(this, _conf, listenerBus, _jobProgressListener,
          _env.securityManager, appName, startTime = startTime))
      } else {
        // For tests, do not enable the UI
        None
      }
    // Bind the UI before starting the task scheduler to communicate
    // the bound port to the cluster manager properly
    // Bind the SparkUI to its port
    _ui.foreach(_.bind())

As the code shows, if the SparkUI service is not needed, it can be turned off by setting the property spark.ui.enabled to false.
Creating the SparkUI
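The shape of that code, an Option that holds the UI only when the flag is on, can be sketched without any Spark dependency. ToyConf, ToyUI, and createUi below are hypothetical stand-ins, not Spark classes; only the key spark.ui.enabled and its default of true come from the excerpt above.

```scala
object UiToggleSketch {
  // Hypothetical stand-in for SparkConf: string settings with a typed getter.
  final case class ToyConf(settings: Map[String, String]) {
    def getBoolean(key: String, default: Boolean): Boolean =
      settings.get(key).map(_.toBoolean).getOrElse(default)
  }

  // Hypothetical stand-in for SparkUI; bind() only records that it was called.
  final class ToyUI {
    var bound: Boolean = false
    def bind(): Unit = { bound = true }
  }

  // Same shape as the SparkContext code: Some(ui) when enabled, None otherwise.
  def createUi(conf: ToyConf): Option[ToyUI] = {
    val ui = if (conf.getBoolean("spark.ui.enabled", true)) Some(new ToyUI) else None
    ui.foreach(_.bind()) // does nothing when the UI is disabled
    ui
  }
}
```

In a real application the same switch would typically be set on the SparkConf, for example with set("spark.ui.enabled", "false"), before the SparkContext is constructed.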

def createLiveUI(
      sc: SparkContext,
      conf: SparkConf,
      listenerBus: SparkListenerBus,
      jobProgressListener: JobProgressListener,
      securityManager: SecurityManager,
      appName: String,
      startTime: Long): SparkUI = {
    create(Some(sc), conf, listenerBus, securityManager, appName,
      jobProgressListener = Some(jobProgressListener), startTime = startTime)
  }

As shown above, SparkUI.createLiveUI simply delegates to the create method, whose code is as follows:

private def create(
      sc: Option[SparkContext],
      conf: SparkConf,
      listenerBus: SparkListenerBus,
      securityManager: SecurityManager,
      appName: String,
      basePath: String = "",
      jobProgressListener: Option[JobProgressListener] = None,
      startTime: Long): SparkUI = {

    val _jobProgressListener: JobProgressListener = jobProgressListener.getOrElse {
      val listener = new JobProgressListener(conf)
      listenerBus.addListener(listener)
      listener
    }

    val environmentListener = new EnvironmentListener
    val storageStatusListener = new StorageStatusListener(conf)
    val executorsListener = new ExecutorsListener(storageStatusListener, conf)
    val storageListener = new StorageListener(storageStatusListener)
    val operationGraphListener = new RDDOperationGraphListener(conf)

    listenerBus.addListener(environmentListener)
    listenerBus.addListener(storageStatusListener)
    listenerBus.addListener(executorsListener)
    listenerBus.addListener(storageListener)
    listenerBus.addListener(operationGraphListener)

    new SparkUI(sc, conf, securityManager, environmentListener, storageStatusListener,
      executorsListener, _jobProgressListener, storageListener, operationGraphListener,
      appName, basePath, startTime)
  }
}

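As the code shows, apart from the JobProgressListener that is passed in from outside, the create method constructs several more SparkListener implementations:
(1) EnvironmentListener: tracks JVM parameters, Spark properties, Java system properties, classpath entries, and so on.
(2) StorageStatusListener: maintains the storage status of each Executor.
(3) ExecutorsListener: prepares Executor information for display in the ExecutorsTab.
(4) StorageListener: prepares Executor-related storage information for display in the BlockManagerUI (the StorageTab in the code below).
(5) RDDOperationGraphListener: builds the DAG (directed acyclic graph) of RDD operations.
These five SparkListener implementations are added to the listener list of listenerBus. Finally the SparkUI itself is created. By default, jobs and stages can be killed from the SparkUI; setting spark.ui.killEnabled to false turns this off. The initialize method of SparkUI then arranges the tabs and pages shown in the front end. The code is as follows: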
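The getOrElse at the top of create is easy to misread: a listener supplied by the caller is reused as-is and is not registered again, while a missing one is both built and added to the bus. A minimal sketch of that pattern follows; ToyListener, ToyBus, and resolveJobListener are hypothetical names, not Spark APIs.

```scala
import scala.collection.mutable.ArrayBuffer

object ListenerRegistrationSketch {
  // Hypothetical listener: it has a name and counts the events it receives.
  final class ToyListener(val name: String) {
    var received: Int = 0
    def onEvent(event: String): Unit = { received += 1 }
  }

  // Hypothetical bus: keeps listeners in registration order and fans events out.
  final class ToyBus {
    val listeners = ArrayBuffer.empty[ToyListener]
    def addListener(l: ToyListener): Unit = { listeners += l }
    def post(event: String): Unit = listeners.foreach(_.onEvent(event))
  }

  // Same shape as create(): reuse the caller's listener, or build and register one.
  def resolveJobListener(bus: ToyBus, supplied: Option[ToyListener]): ToyListener =
    supplied.getOrElse {
      val listener = new ToyListener("jobProgress")
      bus.addListener(listener)
      listener
    }
}
```

One consequence of this shape is that a caller who passes its own listener is presumably expected to have registered it on the bus already; otherwise it would never see an event.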

def initialize() {
    val jobsTab = new JobsTab(this)
    attachTab(jobsTab)
    val stagesTab = new StagesTab(this)
    attachTab(stagesTab)
    attachTab(new StorageTab(this))
    attachTab(new EnvironmentTab(this))
    attachTab(new ExecutorsTab(this))
    attachHandler(createStaticHandler(SparkUI.STATIC_RESOURCE_DIR, "/static"))
    attachHandler(createRedirectHandler("/", "/jobs/", basePath = basePath))
    attachHandler(ApiRootResource.getServletHandler(this))
    // These should be POST only, but, the YARN AM proxy won't proxy POSTs
    attachHandler(createRedirectHandler(
      "/jobs/job/kill", "/jobs/", jobsTab.handleKillRequest, httpMethods = Set("GET", "POST")))
    attachHandler(createRedirectHandler(
      "/stages/stage/kill", "/stages/", stagesTab.handleKillRequest,
      httpMethods = Set("GET", "POST")))
  }
  initialize()

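Layout and display of the SparkUI pages

In the initialize method of the previous section, a JobsTab is created and attached to the SparkUI. The JobsTab in turn attaches two pages to itself, AllJobsPage and JobPage. AllJobsPage is the page that lists all jobs when you open the SparkUI; JobPage is the page you reach by clicking a single job. Both are bound to the JobsTab by calling the attachPage method that JobsTab inherits from WebUITab.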

private[ui] class JobsTab(parent: SparkUI) extends SparkUITab(parent, "jobs") {
  val sc = parent.sc
  val killEnabled = parent.killEnabled
  val jobProgresslistener = parent.jobProgressListener
  val executorListener = parent.executorsListener
  val operationGraphListener = parent.operationGraphListener

  def isFairScheduler: Boolean =
    jobProgresslistener.schedulingMode == Some(SchedulingMode.FAIR)

  def getSparkUser: String = parent.getSparkUser

  attachPage(new AllJobsPage(this))
  attachPage(new JobPage(this))
}

After the JobsTab is created, attachTab adds it to the SparkUI's ArrayBuffer[WebUITab]. attachPage generates an org.eclipse.jetty.servlet.ServletContextHandler for each page, and attachHandler then binds each ServletContextHandler to the SparkUI, that is, adds it to handlers: ArrayBuffer[ServletContextHandler] and to the rootHandler (a ContextHandlerCollection) of the ServerInfo case class.

The SparkUI and listenerBus
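The tab, page, and handler chain can be modelled in a few lines. This is only a sketch of the bookkeeping: ToyPage, ToyTab, ToyHandler, and ToyWebUI are hypothetical, the handler is a plain path rather than a Jetty ServletContextHandler, and the page prefixes used in the example ("" for the job list, "job" for a single job) are assumptions.

```scala
import scala.collection.mutable.ArrayBuffer

object TabAttachSketch {
  // Hypothetical page: identified by its path segment under the owning tab.
  final case class ToyPage(prefix: String)

  // Hypothetical tab: collects pages, as JobsTab does through attachPage.
  class ToyTab(val prefix: String) {
    val pages = ArrayBuffer.empty[ToyPage]
    def attachPage(page: ToyPage): Unit = { pages += page }
  }

  // Hypothetical handler: stands in for a ServletContextHandler bound to one path.
  final case class ToyHandler(path: String)

  class ToyWebUI {
    val tabs = ArrayBuffer.empty[ToyTab]
    val handlers = ArrayBuffer.empty[ToyHandler]

    def attachHandler(h: ToyHandler): Unit = { handlers += h }

    // One handler per page, mounted at /<tab>/<page>; then remember the tab.
    def attachTab(tab: ToyTab): Unit = {
      tab.pages.foreach { page =>
        val path = ("/" + tab.prefix + "/" + page.prefix).stripSuffix("/")
        attachHandler(ToyHandler(path))
      }
      tabs += tab
    }
  }
}
```

Attaching a "jobs" tab that holds those two pages leaves the toy UI with one tab and two handlers, at /jobs and /jobs/job.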

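Spark defines a trait, ListenerBus, that receives events and delivers each one to the listeners registered for it. A ListenerBus consists of three main parts:
(1) A blocking event queue: of type LinkedBlockingQueue[SparkListenerEvent], with a fixed capacity of 10000.
(2) A listener array: of type ArrayBuffer[SparkListener], holding the various SparkListener implementations.
(3) A thread that matches events to listeners: this Thread keeps pulling events from the LinkedBlockingQueue, iterates over the listeners, and calls the corresponding listener method. Every event stays in the LinkedBlockingQueue for some time and is removed once the Thread has processed it.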
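The three parts listed above fit together roughly as in the sketch below. It is a toy, not Spark's implementation: ToyListener and ToyListenerBus are hypothetical names, events are plain strings, and a CopyOnWriteArrayList replaces the ArrayBuffer so that listeners can be added safely while the dispatcher thread is iterating. The choice to reject an event when the queue is full, rather than block the caller, is also an assumption of this sketch.

```scala
import java.util.concurrent.{CopyOnWriteArrayList, LinkedBlockingQueue, TimeUnit}

object ToyListenerBusSketch {
  trait ToyListener { def onEvent(event: String): Unit }

  final class ToyListenerBus(capacity: Int) {
    // (1) Bounded blocking queue of pending events.
    private val queue = new LinkedBlockingQueue[String](capacity)
    // (2) Registered listeners (thread-safe list instead of ArrayBuffer).
    private val listeners = new CopyOnWriteArrayList[ToyListener]()
    @volatile private var stopped = false

    // (3) Dispatcher thread: take an event, hand it to every listener, repeat.
    private val dispatcher: Thread = new Thread("toy-listener-bus") {
      setDaemon(true)
      override def run(): Unit = {
        // Keep going until stop() was called and the queue has been drained.
        while (!stopped || !queue.isEmpty) {
          val event = queue.poll(50, TimeUnit.MILLISECONDS)
          if (event != null) {
            val it = listeners.iterator()
            while (it.hasNext) it.next().onEvent(event)
          }
        }
      }
    }

    def addListener(l: ToyListener): Unit = { listeners.add(l); () }

    // Returns false, without blocking, when the queue is already full.
    def post(event: String): Boolean = queue.offer(event)

    def start(): Unit = dispatcher.start()

    // Signals the dispatcher to finish and waits for it to drain the queue.
    def stop(): Unit = { stopped = true; dispatcher.join() }
  }
}
```

Because post only enqueues, the code that produces an event never waits for slow listeners; the price is that an event can be rejected when the queue is at capacity.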
(To be continued...)

References

《深入理解Spark核心思想与源码分析》, by 耿嘉安
Spark UI界面原理 (How the Spark UI works): https://www.cnblogs.com/wuyida/p/6300237.html

posted @ 2020-06-30 22:36  这个小仙女真可爱