Spark Source Code Series - SparkSubmit - Submitting to a YARN Cluster
Conclusion
The spark-submit command is the command-line entry point provided by the SparkSubmit class. It parses the command-line arguments to work out the run mode, the cluster manager, and any extra options; in this example the flow ultimately invokes YarnClusterApplication's start method.
Submitting to the cluster with spark-submit
Command to submit to the cluster: spark-submit --deploy-mode cluster --master yarn --executor-cores 2 --executor-memory 1g --num-executors 4 --class xxx yyy.jar 1000
This command is implemented by org.apache.spark.deploy.SparkSubmit.
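As a rough illustration of what the argument parsing has to do, here is a minimal Scala sketch (not Spark's actual SparkSubmitArguments logic): options like --master consume a value, the first non-option token is taken as the primary resource (the application jar), and everything after it is passed through to the application.

```scala
// Minimal sketch of spark-submit-style argument splitting; names and
// behavior are illustrative, NOT Spark's actual SparkSubmitArguments parser.
object ArgSketch {
  // Returns (options, primary resource jar, application arguments).
  def parse(args: List[String]): (Map[String, String], String, List[String]) = {
    @annotation.tailrec
    def loop(rest: List[String],
             opts: Map[String, String]): (Map[String, String], String, List[String]) =
      rest match {
        case flag :: value :: tail if flag.startsWith("--") =>
          // an option flag consumes the next token as its value
          loop(tail, opts + (flag.stripPrefix("--") -> value))
        case jar :: tail =>
          (opts, jar, tail) // first non-option token: the app jar
        case Nil =>
          (opts, "", Nil)
      }
    loop(args, Map.empty)
  }
}
```

With the command from above, this yields the option map, the jar `yyy.jar`, and the application argument `1000`.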
object SparkSubmit -> doSubmit is triggered
package org.apache.spark.deploy

// entry class
object SparkSubmit extends CommandLineUtils with Logging {
  override def main(args: Array[String]): Unit = {
    // an object cannot be instantiated with new; the SparkSubmit here is the class SparkSubmit defined below
    val submit = new SparkSubmit() {
      override def doSubmit(args: Array[String]): Unit = {
        ...
        super.doSubmit(args)
        ...
      }
    }
    submit.doSubmit(args)
    ...
  }
}
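The `new SparkSubmit() { override def doSubmit ... }` construct is Scala's anonymous-subclass syntax: the companion object overrides doSubmit to wrap extra behavior around the parent implementation, then delegates via super. A minimal sketch of the same pattern (the class and strings are illustrative, not Spark's code):

```scala
// Sketch of the anonymous-subclass pattern: override a method to add
// behavior around the parent implementation, delegating via super.
class Submitter {
  def doSubmit(args: Array[String]): String = s"submitted ${args.length} arg(s)"
}

val submit = new Submitter {
  override def doSubmit(args: Array[String]): String =
    "log-setup; " + super.doSubmit(args) // extra wrapping logic goes here
}
```

In SparkSubmit the wrapping is used to adjust logging around the real submission.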
class SparkSubmit -> doSubmit
// this class and the object above appear out of source order here, purely for readability of these notes
private[spark] class SparkSubmit extends Logging {
  def doSubmit(args: Array[String]): Unit = {
    ...
    val appArgs = parseArguments(args)
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      ...
    }
  }

  @tailrec
  private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    def doRunMain(): Unit = {
      ...
      runMain(args, uninitLog)
      ...
    }
    doRunMain()
  }
}
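doSubmit's dispatch is a plain pattern match on the parsed action. A minimal sketch, where the enumeration values are illustrative stand-ins for SparkSubmitAction:

```scala
// Illustrative stand-in for SparkSubmitAction: parseArguments decides the
// action, and doSubmit pattern-matches on it to pick a code path.
object Action extends Enumeration {
  val Submit, Kill, RequestStatus = Value
}

def dispatch(action: Action.Value): String = action match {
  case Action.Submit        => "submit application"
  case Action.Kill          => "kill submission"
  case Action.RequestStatus => "query status"
}
```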
class SparkSubmit -> runMain
private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  // determine the cluster mode and related settings
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
  ...
  // load <mainClass> into the JVM via Java's Class.forName
  mainClass = Utils.classForName(childMainClass)
  ...
  // instantiate <mainClass>
  val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    new JavaMainApplication(mainClass)
  }
  ...
  // invoke <mainClass>'s start method -- in this example, YarnClusterApplication.start
  app.start(childArgs.toArray, sparkConf)
  ...
}
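The reflection step can be sketched as a stand-alone helper (a simplified illustration, not Spark's code): load the class by name, check assignability against the target interface, then call the no-arg constructor.

```scala
// Simplified sketch of runMain's reflection logic: Class.forName +
// isAssignableFrom + no-arg constructor. Names here are illustrative.
def instantiate(className: String, iface: Class[_]): AnyRef = {
  val cls = Class.forName(className) // load the class into the JVM
  require(iface.isAssignableFrom(cls),
    s"$className does not implement ${iface.getName}")
  cls.getConstructor().newInstance().asInstanceOf[AnyRef] // no-arg constructor
}
```

Spark's real code additionally falls back to JavaMainApplication (which invokes a static main) when the class does not implement SparkApplication, instead of rejecting it.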
class SparkSubmit -> prepareSubmitEnvironment
private[deploy] val YARN_CLUSTER_SUBMIT_CLASS =
  "org.apache.spark.deploy.yarn.YarnClusterApplication"

private[deploy] def prepareSubmitEnvironment(...) {
  val clusterManager: Int = args.master match {
    case "yarn" => YARN
    ...
  }
  ...
  // create a temp directory and download the "resource" files into it
  val targetDir = Utils.createTempDir()
  ...
  def downloadResource(resource: String): String = {
    val uri = Utils.resolveURI(resource)
    val file = new File(targetDir, new Path(uri).getName)
    ...
    downloadFile(resource, targetDir, sparkConf, hadoopConf, secMgr)
    ...
  }
  ...
  downloadResource
  ...
  // set the fully qualified class name for the chosen cluster manager (cm)
  if (isYarnCluster) {
    childMainClass = YARN_CLUSTER_SUBMIT_CLASS
  }
  // return value
  (childArgs.toSeq, childClasspath.toSeq, sparkConf, childMainClass)
}
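The dispatch on the --master string can be sketched like this; the bit-flag constants mirror the idea of Spark's private cluster-manager flags, but their values are assumptions for illustration:

```scala
// Illustrative dispatch on the --master string, in the spirit of
// prepareSubmitEnvironment; the constant values are assumed, not Spark's.
val YARN = 1
val STANDALONE = 2
val LOCAL = 4

def clusterManager(master: String): Int = master match {
  case "yarn"                        => YARN
  case m if m.startsWith("spark://") => STANDALONE
  case m if m.startsWith("local")    => LOCAL
  case other =>
    throw new IllegalArgumentException(s"unsupported master: $other")
}
```

With --master yarn and --deploy-mode cluster, isYarnCluster holds, so childMainClass becomes YarnClusterApplication, whose start method drives the rest of the submission.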
Category: Spark source code