Spark Source Code Series - SparkSubmit - Submitting to a YARN Cluster

Conclusion

The spark-submit command is the command-line entry point provided by the SparkSubmit class. It parses the command-line arguments to determine the deploy mode, cluster manager, extra parameters, and so on; in this example it ends up triggering the start method of YarnClusterApplication.

Submitting to the cluster with spark-submit

The cluster submission command: spark-submit --deploy-mode cluster --master yarn --executor-cores 2 --executor-memory 1g --num-executors 4 --class xxx yyy.jar 1000
The class that implements this command is org.apache.spark.deploy.SparkSubmit.
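
Conceptually, the spark-submit script just forwards these flags to the JVM entry point, where they arrive as the args array of SparkSubmit.main. Below is a minimal sketch of that array for the command above; the object name SubmitArgsSketch is illustrative, and xxx / yyy.jar are the placeholder class and jar names from the example command.

// Sketch: the argument array that reaches SparkSubmit.main for the command above.
// "xxx" and "yyy.jar" are the placeholders from the example command.
object SubmitArgsSketch {
  def main(cmdLine: Array[String]): Unit = {
    val args = Array(
      "--deploy-mode", "cluster",
      "--master", "yarn",
      "--executor-cores", "2",
      "--executor-memory", "1g",
      "--num-executors", "4",
      "--class", "xxx",
      "yyy.jar",        // primary resource
      "1000")           // application argument
    // This array is what the launcher effectively hands to SparkSubmit.main
    println(args.mkString(" "))
  }
}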

object SparkSubmit && triggering doSubmit

package org.apache.spark.deploy

// Entry-point object; the spark-submit script ultimately invokes its main method
object SparkSubmit extends CommandLineUtils with Logging {

  override def main(args: Array[String]): Unit = {
    // An `object` cannot be instantiated with `new`; the SparkSubmit created here is an
    // anonymous subclass of the `class SparkSubmit` defined below
    val submit = new SparkSubmit() {
      override def doSubmit(args: Array[String]): Unit = {
          ...
          super.doSubmit(args)
          ...
      }
    }

    submit.doSubmit(args)
  }
  ...
}
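
Because a Scala object cannot be instantiated with new, main builds an anonymous subclass of class SparkSubmit and wraps extra behaviour around super.doSubmit. The following self-contained sketch shows the same pattern with a hypothetical Submitter class (not Spark's code):

// Hypothetical stand-in for `class SparkSubmit`: a class whose doSubmit is
// customized by an anonymous subclass, mirroring SparkSubmit.main.
class Submitter {
  def doSubmit(args: Array[String]): Unit =
    println(s"submitting with: ${args.mkString(" ")}")
}

object AnonymousSubclassSketch {
  def main(args: Array[String]): Unit = {
    // Anonymous subclass: wrap extra behaviour around super.doSubmit
    val submit = new Submitter() {
      override def doSubmit(args: Array[String]): Unit = {
        println("before submit")      // e.g. logging / error-handling hooks
        super.doSubmit(args)
        println("after submit")
      }
    }
    submit.doSubmit(args)
  }
}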

class SparkSubmit -> doSubmit

// The ordering of this class and the object above differs from the source; it is arranged this way for note-taking convenience
private[spark] class SparkSubmit extends Logging {
  def doSubmit(args: Array[String]): Unit = {
    ...
    val appArgs = parseArguments(args)
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    ...
  }

  @tailrec
  private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    def doRunMain(): Unit = {
      ...
      runMain(args, uninitLog)
      ...
    }
    doRunMain()
  }
}
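
doSubmit parses the arguments and then dispatches on the requested action; a plain spark-submit corresponds to the SUBMIT case, which flows into submit -> doRunMain -> runMain. The sketch below models that dispatch shape with hypothetical Action and handler names; it is not Spark's actual SparkSubmitAction definition.

// Illustrative model of the dispatch in doSubmit: parse, then match on the action.
object ActionDispatchSketch {
  sealed trait Action
  case object Submit extends Action
  case object Kill extends Action
  case object RequestStatus extends Action

  // Stand-in for SparkSubmitArguments' parsing: assume SUBMIT unless told otherwise
  def parseArguments(args: Array[String]): Action =
    if (args.contains("--kill")) Kill
    else if (args.contains("--status")) RequestStatus
    else Submit

  def main(args: Array[String]): Unit = {
    parseArguments(args) match {
      case Submit        => println("run submit() -> doRunMain() -> runMain()")
      case Kill          => println("kill the requested submission")
      case RequestStatus => println("query the status of a submission")
    }
  }
}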

class SparkSubmit -> runMain

  private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    // Determine the cluster manager, deploy mode, etc., and build the child environment
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
    ...
    // Load the <mainClass> class into memory via Java's Class.forName
    mainClass = Utils.classForName(childMainClass)
    ...
    // Instantiate the <mainClass> class
    val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
      mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
    } else {
      new JavaMainApplication(mainClass)
    }
    ...
    // Call the start method of <mainClass>; in this example, YarnClusterApplication.start
    app.start(childArgs.toArray, sparkConf)
    ...
  }
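
The reflection steps in runMain (load the class by name, check whether it implements SparkApplication, create it through its no-arg constructor) are plain JVM reflection. Here is a self-contained sketch of the same pattern, using a hypothetical Greeter trait and HelloApp class in place of SparkApplication and YarnClusterApplication:

// Sketch of runMain's reflection pattern with hypothetical types:
// load by name, test assignability, then instantiate via the no-arg constructor.
trait Greeter { def start(args: Array[String]): Unit }

class HelloApp extends Greeter {
  override def start(args: Array[String]): Unit =
    println(s"hello, args = ${args.mkString(",")}")
}

object ReflectionSketch {
  def main(args: Array[String]): Unit = {
    // Equivalent of: mainClass = Utils.classForName(childMainClass)
    // (resolve the binary name first so the sketch works in any package)
    val mainClass = Class.forName(classOf[HelloApp].getName)
    // Equivalent of: classOf[SparkApplication].isAssignableFrom(mainClass)
    val app: Greeter =
      if (classOf[Greeter].isAssignableFrom(mainClass)) {
        mainClass.getConstructor().newInstance().asInstanceOf[Greeter]
      } else {
        throw new IllegalArgumentException(s"$mainClass does not implement Greeter")
      }
    // Equivalent of: app.start(childArgs.toArray, sparkConf)
    app.start(Array("1000"))
  }
}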

class SparkSubmit -> prepareSubmitEnvironment

  private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
    "org.apache.spark.deploy.yarn.YarnClusterApplication"

  private[deploy] def prepareSubmitEnvironment(...): (Seq[String], Seq[String], SparkConf, String) = {
    val clusterManager: Int = args.master match {
      case "yarn" => YARN
      ...
    }
    ...
    // Create a temporary directory and download the "resource" files to the local disk
    val targetDir = Utils.createTempDir()
    ...
    def downloadResource(resource: String): String = {
      val uri = Utils.resolveURI(resource)
      val file = new File(targetDir, new Path(uri).getName)
      ...
      downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr)
      ...
    }
    ...
    // The primary resource (and similarly jars/files/archives) is passed through downloadResource
    args.primaryResource = Option(args.primaryResource).map(downloadResource).orNull
    ...
    // Set childMainClass to the fully qualified name of the cluster-manager submit class
    if (isYarnCluster) {
      childMainClass = YARN_CLUSTER_SUBMIT_CLASS
    }
    // Return the 4-tuple consumed by runMain
    (childArgs.toSeq, childClasspath.toSeq, sparkConf, childMainClass)
  }
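
The resource-handling step boils down to a decision on the URI scheme: local: / file: resources are used in place, anything else is fetched into the temporary directory. The sketch below is a simplified stand-in (names such as DownloadResourceSketch are hypothetical): it only copies file-system paths, whereas Spark's downloadFile also handles schemes such as HDFS/HTTP and takes the Hadoop configuration and security manager into account.

import java.io.File
import java.net.URI
import java.nio.file.{Files, Paths, StandardCopyOption}

// Simplified model of downloadResource: keep local/file URIs as-is, otherwise
// "download" (here: copy) the resource into a temp directory.
object DownloadResourceSketch {
  val targetDir: File = Files.createTempDirectory("spark-submit-sketch").toFile

  def downloadResource(resource: String): String = {
    val uri = new URI(resource)
    Option(uri.getScheme).getOrElse("file") match {
      case "local" | "file" =>
        resource                                   // already reachable locally
      case _ =>
        val dest = new File(targetDir, new File(uri.getPath).getName)
        // Stand-in for downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr)
        Files.copy(Paths.get(uri.getPath), dest.toPath, StandardCopyOption.REPLACE_EXISTING)
        dest.toURI.toString
    }
  }

  def main(args: Array[String]): Unit =
    println(downloadResource("file:/tmp/yyy.jar"))
}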