关闭页面特效

问题导读
1、yarn提交作业的流程是怎样的？
2、run方法在ApplicationMaster里面主要干了什么工作？
3、把作业发布到yarn上面去执行，涉及到哪些类？

本来不打算写的了，但是真的是闲来无事，整天看美剧也没啥意思。这一章打算讲一下Spark on yarn的实现，1.0.0里面已经是一个stable的版本了，可是1.0.1也出来了，离1.0.0发布才一个月的时间，更新太快了，节奏跟不上啊，这里仍旧是讲1.0.0的代码，所以各位朋友也不要再问我讲的是哪个版本，目前为止发布的文章都是基于1.0.0的代码。

1、提交作业

找到main函数，里面调用了run方法，我们直接看run方法。

    val appId = runApp()
    monitorApplication(appId)
    System.exit(0)

运行App，跟踪App，最后退出。我们先看runApp吧。

  def runApp(): ApplicationId = {
    // 校验参数，内存不能小于384Mb，Executor的数量不能少于1个。
    validateArgs()
    // 这两个是父类的方法，初始化并且启动Client
    init(yarnConf)
    start()

    // 记录集群的信息(e.g, NodeManagers的数量，队列的信息).
    logClusterResourceDetails()

    // 准备提交请求到ResourcManager (specifically its ApplicationsManager (ASM)// Get a new client application.
    val newApp = super.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    val appId = newAppResponse.getApplicationId()
    // 检查集群的内存是否满足当前的作业需求
    verifyClusterResources(newAppResponse)

    // 准备资源和环境变量.
    //1.获得工作目录的具体地址: /.sparkStaging/appId/
    val appStagingDir = getAppStagingDir(appId)
　　//2.创建工作目录，设置工作目录权限，上传运行时所需要的jar包
    val localResources = prepareLocalResources(appStagingDir)
    //3.设置运行时需要的环境变量
    val launchEnv = setupLaunchEnv(localResources, appStagingDir)
　　//4.设置运行时JVM参数，设置SPARK_USE_CONC_INCR_GC为true的话，就使用CMS的垃圾回收机制
    val amContainer = createContainerLaunchContext(newAppResponse, localResources, launchEnv)

    // 设置application submission context. 
    val appContext = newApp.getApplicationSubmissionContext()
    appContext.setApplicationName(args.appName)
    appContext.setQueue(args.amQueue)
    appContext.setAMContainerSpec(amContainer)
    appContext.setApplicationType("SPARK")

    // 设置ApplicationMaster的内存，Resource是表示资源的类，目前有CPU和内存两种.
    val memoryResource = Records.newRecord(classOf[Resource]).asInstanceOf[Resource]
    memoryResource.setMemory(args.amMemory + YarnAllocationHandler.MEMORY_OVERHEAD)
    appContext.setResource(memoryResource)

    // 提交Application.
    submitApp(appContext)
    appId
  }

monitorApplication就不说了，不停的调用getApplicationReport方法获得最新的Report，然后调用getYarnApplicationState获取当前状态，如果状态为FINISHED、FAILED、KILLED就退出。
说到这里，顺便把跟yarn相关的参数也贴出来一下，大家一看就清楚了。

    while (!args.isEmpty) {
      args match {
        case ("--jar") :: value :: tail =>
          userJar = value
          args = tail

        case ("--class") :: value :: tail =>
          userClass = value
          args = tail

        case ("--args" | "--arg") :: value :: tail =>
          if (args(0) == "--args") {
            println("--args is deprecated. Use --arg instead.")
          }
          userArgsBuffer += value
          args = tail

        case ("--master-class" | "--am-class") :: value :: tail =>
          if (args(0) == "--master-class") {
            println("--master-class is deprecated. Use --am-class instead.")
          }
          amClass = value
          args = tail

        case ("--master-memory" | "--driver-memory") :: MemoryParam(value) :: tail =>
          if (args(0) == "--master-memory") {
            println("--master-memory is deprecated. Use --driver-memory instead.")
          }
          amMemory = value
          args = tail

        case ("--num-workers" | "--num-executors") :: IntParam(value) :: tail =>
          if (args(0) == "--num-workers") {
            println("--num-workers is deprecated. Use --num-executors instead.")
          }
          numExecutors = value
          args = tail

        case ("--worker-memory" | "--executor-memory") :: MemoryParam(value) :: tail =>
          if (args(0) == "--worker-memory") {
            println("--worker-memory is deprecated. Use --executor-memory instead.")
          }
          executorMemory = value
          args = tail

        case ("--worker-cores" | "--executor-cores") :: IntParam(value) :: tail =>
          if (args(0) == "--worker-cores") {
            println("--worker-cores is deprecated. Use --executor-cores instead.")
          }
          executorCores = value
          args = tail

        case ("--queue") :: value :: tail =>
          amQueue = value
          args = tail

        case ("--name") :: value :: tail =>
          appName = value
          args = tail

        case ("--addJars") :: value :: tail =>
          addJars = value
          args = tail

        case ("--files") :: value :: tail =>
          files = value
          args = tail

        case ("--archives") :: value :: tail =>
          archives = value
          args = tail

        case Nil =>
          if (userClass == null) {
            printUsageAndExit(1)
          }

        case _ =>
          printUsageAndExit(1, args)
      }
    }

2、ApplicationMaster

直接看run方法就可以了，main函数就干了那么一件事...

  def run() {
    // 设置本地目录，默认是先使用yarn的YARN_LOCAL_DIRS目录，再到LOCAL_DIRS
    System.setProperty("spark.local.dir", getLocalDirs())

    // set the web ui port to be ephemeral for yarn so we don't conflict with
    // other spark processes running on the same box
    System.setProperty("spark.ui.port", "0")

    // when running the AM, the Spark master is always "yarn-cluster"
    System.setProperty("spark.master", "yarn-cluster")

 　　// 设置优先级为30，和mapreduce的优先级一样。它比HDFS的优先级高，因为它的操作是清理该作业在hdfs上面的Staging目录
    ShutdownHookManager.get().addShutdownHook(new AppMasterShutdownHook(this), 30)

    appAttemptId = getApplicationAttemptId()
　　// 通过yarn.resourcemanager.am.max-attempts来设置，默认是2
　　// 目前发现它只在清理Staging目录的时候用
    isLastAMRetry = appAttemptId.getAttemptId() >= maxAppAttempts
    amClient = AMRMClient.createAMRMClient()
    amClient.init(yarnConf)
    amClient.start()

    // setup AmIpFilter for the SparkUI - do this before we start the UI
　　//  方法的介绍说是yarn用来保护ui界面的，我感觉是设置ip代理的
    addAmIpFilter()
　　//  注册ApplicationMaster到内部的列表里
    ApplicationMaster.register(this)

    // 安全认证相关的东西，默认是不开启的，省得给自己找事
    val securityMgr = new SecurityManager(sparkConf)

    // 启动driver程序 
    userThread = startUserClass()

    // 等待SparkContext被实例化，主要是等待spark.driver.port property被使用
　　// 等待结束之后，实例化一个YarnAllocationHandler
    waitForSparkContextInitialized()

    // Do this after Spark master is up and SparkContext is created so that we can register UI Url.
　　// 向yarn注册当前的ApplicationMaster, 这个时候isFinished不能为true，是true就说明程序失败了
    synchronized {
      if (!isFinished) {
        registerApplicationMaster()
        registered = true
      }
    }

    // 申请Container来启动Executor
    allocateExecutors()

    // 等待程序运行结束
    userThread.join()

    System.exit(0)
  }

run方法里面主要干了5项工作：
1、初始化工作

2、启动driver程序

3、注册ApplicationMaster

4、分配Executors

5、等待程序运行结束

我们重点看分配Executor方法。

  private def allocateExecutors() {
    try {
      logInfo("Allocating " + args.numExecutors + " executors.")
      // 分host、rack、任意机器三种类型向ResourceManager提交ContainerRequest
　　　 // 请求的Container数量可能大于需要的数量
      yarnAllocator.addResourceRequests(args.numExecutors)
      // Exits the loop if the user thread exits.
      while (yarnAllocator.getNumExecutorsRunning < args.numExecutors && userThread.isAlive) {
        if (yarnAllocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
          finishApplicationMaster(FinalApplicationStatus.FAILED, "max number of executor failures reached")
        }
　　　　 // 把请求回来的资源进行分配，并释放掉多余的资源
        yarnAllocator.allocateResources()
        ApplicationMaster.incrementAllocatorLoop(1)
        Thread.sleep(100)
      }
    } finally {
      // In case of exceptions, etc - ensure that count is at least ALLOCATOR_LOOP_WAIT_COUNT,
      // so that the loop in ApplicationMaster#sparkContextInitialized() breaks.
      ApplicationMaster.incrementAllocatorLoop(ApplicationMaster.ALLOCATOR_LOOP_WAIT_COUNT)
    }
    logInfo("All executors have launched.")

    // 启动一个线程来状态报告
    if (userThread.isAlive) {
      // Ensure that progress is sent before YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS elapses.
      val timeoutInterval = yarnConf.getInt(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, 120000)

      // we want to be reasonably responsive without causing too many requests to RM.
      val schedulerInterval = sparkConf.getLong("spark.yarn.scheduler.heartbeat.interval-ms", 5000)

      // must be <= timeoutInterval / 2.
      val interval = math.min(timeoutInterval / 2, schedulerInterval)

      launchReporterThread(interval)
    }
  }

这里面我们只需要看addResourceRequests和allocateResources方法即可。

先说addResourceRequests方法，代码就不贴了。

Client向ResourceManager提交Container的请求，分三种类型：优先选择机器、同一个rack的机器、任意机器。

优先选择机器是在RDD里面的getPreferredLocations获得的机器位置，如果没有优先选择机器，也就没有同一个rack之说了，可以是任意机器。

下面我们接着看allocateResources方法。

  def allocateResources() {
    // We have already set the container request. Poll the ResourceManager for a response.
    // This doubles as a heartbeat if there are no pending container requests.
　　// 之前已经提交过Container请求了，现在只需要获取response即可 
    val progressIndicator = 0.1f
    val allocateResponse = amClient.allocate(progressIndicator)

    val allocatedContainers = allocateResponse.getAllocatedContainers()
    if (allocatedContainers.size > 0) {
      var numPendingAllocateNow = numPendingAllocate.addAndGet(-1 * allocatedContainers.size)

      if (numPendingAllocateNow < 0) {
        numPendingAllocateNow = numPendingAllocate.addAndGet(-1 * numPendingAllocateNow)
      }

      val hostToContainers = new HashMap[String, ArrayBuffer[Container]]()

      for (container <- allocatedContainers) {
　　　　 // 内存 > Executor所需内存 + 384
        if (isResourceConstraintSatisfied(container)) {
          // 把container收入名册当中，等待发落
          val host = container.getNodeId.getHost
          val containersForHost = hostToContainers.getOrElseUpdate(host, new ArrayBuffer[Container]())
          containersForHost += container
        } else {
          // 内存不够，释放掉它
          releaseContainer(container)
        }
      }

      // 找到合适的container来使用.
      val dataLocalContainers = new HashMap[String, ArrayBuffer[Container]]()
      val rackLocalContainers = new HashMap[String, ArrayBuffer[Container]]()
      val offRackContainers = new HashMap[String, ArrayBuffer[Container]]()
　　　 // 遍历所有的host
      for (candidateHost <- hostToContainers.keySet) {
        val maxExpectedHostCount = preferredHostToCount.getOrElse(candidateHost, 0)
        val requiredHostCount = maxExpectedHostCount - allocatedContainersOnHost(candidateHost)

        val remainingContainersOpt = hostToContainers.get(candidateHost)
        var remainingContainers = remainingContainersOpt.get
　　　　　　
        if (requiredHostCount >= remainingContainers.size) {
          // 需要的比现有的多，把符合数据本地性的添加到dataLocalContainers映射关系里
          dataLocalContainers.put(candidateHost, remainingContainers)
          // 没有containner剩下的.
          remainingContainers = null
        } else if (requiredHostCount > 0) {
          // 获得的container比所需要的多，把多余的释放掉
          val (dataLocal, remaining) = remainingContainers.splitAt(remainingContainers.size - requiredHostCount)
          dataLocalContainers.put(candidateHost, dataLocal)

          for (container <- remaining) releaseContainer(container)
          remainingContainers = null
        }

        // 数据所在机器已经分配满任务了，只能在同一个rack里面挑选了
        if (remainingContainers != null) {
          val rack = YarnAllocationHandler.lookupRack(conf, candidateHost)
          if (rack != null) {
            val maxExpectedRackCount = preferredRackToCount.getOrElse(rack, 0)
            val requiredRackCount = maxExpectedRackCount - allocatedContainersOnRack(rack) -
              rackLocalContainers.getOrElse(rack, List()).size

            if (requiredRackCount >= remainingContainers.size) {
              // Add all remaining containers to to `dataLocalContainers`.
              dataLocalContainers.put(rack, remainingContainers)
              remainingContainers = null
            } else if (requiredRackCount > 0) {
              // Container list has more containers that we need for data locality.
              val (rackLocal, remaining) = remainingContainers.splitAt(remainingContainers.size - requiredRackCount)
              val existingRackLocal = rackLocalContainers.getOrElseUpdate(rack, new ArrayBuffer[Container]())

              existingRackLocal ++= rackLocal
              remainingContainers = remaining
            }
          }
        }

        if (remainingContainers != null) {
          // 还是不够，只能放到别的rack的机器上运行了
          offRackContainers.put(candidateHost, remainingContainers)
        }
      }

      // 按照数据所在机器、同一个rack、任意机器来排序
      val allocatedContainersToProcess = new ArrayBuffer[Container](allocatedContainers.size)
      allocatedContainersToProcess ++= TaskSchedulerImpl.prioritizeContainers(dataLocalContainers)
      allocatedContainersToProcess ++= TaskSchedulerImpl.prioritizeContainers(rackLocalContainers)
      allocatedContainersToProcess ++= TaskSchedulerImpl.prioritizeContainers(offRackContainers)

      // 遍历选择了的Container，为每个Container启动一个ExecutorRunnable线程专门负责给它发送命令
      for (container <- allocatedContainersToProcess) {
        val numExecutorsRunningNow = numExecutorsRunning.incrementAndGet()
        val executorHostname = container.getNodeId.getHost
        val containerId = container.getId
　　　　　// 内存需要大于Executor的内存 + 384
        val executorMemoryOverhead = (executorMemory + YarnAllocationHandler.MEMORY_OVERHEAD)

        if (numExecutorsRunningNow > maxExecutors) {
          // 正在运行的比需要的多了，释放掉多余的Container
          releaseContainer(container)
          numExecutorsRunning.decrementAndGet()
        } else {
          val executorId = executorIdCounter.incrementAndGet().toString
          val driverUrl = "akka.tcp://spark@%s:%s/user/%s".format(
            sparkConf.get("spark.driver.host"),
            sparkConf.get("spark.driver.port"),
            CoarseGrainedSchedulerBackend.ACTOR_NAME)


          // To be safe, remove the container from `pendingReleaseContainers`.
          pendingReleaseContainers.remove(containerId)
         // 把container记录到已分配的rack的映射关系当中
          val rack = YarnAllocationHandler.lookupRack(conf, executorHostname)
          allocatedHostToContainersMap.synchronized {
            val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
              new HashSet[ContainerId]())

            containerSet += containerId
            allocatedContainerToHostMap.put(containerId, executorHostname)

            if (rack != null) {
              allocatedRackCount.put(rack, allocatedRackCount.getOrElse(rack, 0) + 1)
            }
          }
　　　　　　// 启动一个线程给它进行跟踪服务，给它发送运行Executor的命令
          val executorRunnable = new ExecutorRunnable(
            container,
            conf,
            sparkConf,
            driverUrl,
            executorId,
            executorHostname,
            executorMemory,
            executorCores)
          new Thread(executorRunnable).start()
        }
      }
      
  }

总结：
把作业发布到yarn上面去执行这块涉及到的类不多，主要是涉及到Client、ApplicationMaster、YarnAllocationHandler、ExecutorRunner这四个类。

1、Client作为Yarn的客户端，负责向Yarn发送启动ApplicationMaster的命令。

2、ApplicationMaster就像项目经理一样负责整个项目所需要的工作，包括请求资源，分配资源，启动Driver和Executor，Executor启动失败的错误处理。

3、ApplicationMaster的请求、分配资源是通过YarnAllocationHandler来进行的。

4、Container选择的顺序是：优先选择机器 > 同一个rack的机器 > 任意机器。

5、ExecutorRunner只负责向Container发送启动CoarseGrainedExecutorBackend的命令。

6、Executor的错误处理是在ApplicationMaster的launchReporterThread方法里面，它启动的线程除了报告运行状态，还会监控Executor的运行，一旦发现有丢失的Executor就重新请求。

7、在yarn目录下看到的名称里面带有YarnClient的是属于yarn-client模式的类，实现和前面的也差不多。

posted on 2020-05-27 15:16 大码王阅读(809) 评论(0) 编辑收藏举报

刷新页面返回顶部

登录后才能查看或发表评论，立即登录或者逛逛博客园首页

公告

青青陵上柏，磊磊涧R9-

运行时长：2258天0小时55分51秒

您的浏览器不兼容canvas

昵称：大码王
园龄： 5年8个月
粉丝： 233
关注： 30

+加关注

2025年3月

日

一

二

三

四

五

六

随笔分类 (719)

clickhouse(4)

flink源码分析(2)

Groovy(1)

Java(34)

Linux(3)

office(10)

OpenStack入门(1)

Phoenix+hbase(11)

photoshop(10)

python之绘图(7)

python之爬虫(15)

python之入门到实战(26)

shell大全(1)

SparkCore(14)

sparkGraphx(2)

sparksql(8)

sparkstreaming(17)

spark源码分析(11)

博客园美化(6)

操作系统(1)

随笔档案 (693)

2024年5月(4)

2024年3月(3)

2023年9月(1)

2023年4月(2)

2023年3月(4)

2023年2月(1)

2022年12月(1)

2022年11月(1)

2022年9月(2)

2022年8月(17)

2022年7月(5)

2022年5月(3)

2022年4月(18)

2021年9月(1)

2021年6月(9)

2021年5月(19)

2021年2月(1)

2021年1月(17)

2020年12月(7)

2020年11月(19)

文章分类 (35)

airflow(4)

azkban(1)

canal(1)

Cassandra(1)

datax(1)

druid(1)

Elasticsearch(8)

java(11)

mongodb(2)

redis(3)

scala(2)

文章档案 (40)

2024年4月(2)

2023年5月(2)

2023年4月(1)

2023年1月(1)

2020年6月(9)

2020年5月(25)

1、提交作业

2、ApplicationMaster

公告

搜索

常用链接

最新随笔

积分与排名

随笔分类 (719)

随笔档案 (693)

文章分类 (35)

文章档案 (40)

阅读排行榜

评论排行榜

推荐排行榜

最新评论

喜欢请打赏