Core-related parameter settings for Spark in standalone mode

Posted on 2022-02-18 15:54 Antel

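I recently found that, when running PySpark jobs, limiting the CPU usage of pythonFunction did not work as expected. The root cause turned out to be the SparkConf parameters.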

So I went through how cores are configured among Spark's launch parameters.

Running spark-submit -h prints the description of every launch parameter. Here is the excerpt related to cores:

 Cluster deploy mode only:
  --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                              (Default: 1).

 Spark standalone, Mesos and Kubernetes only:
  --total-executor-cores NUM  Total cores for all executors.

 Spark standalone, YARN and Kubernetes only:
  --executor-cores NUM        Number of cores used by each executor. (Default: 1 in
                              YARN and K8S modes, or all available cores on the worker
                              in standalone mode).

 Spark on YARN and Kubernetes only:
  --num-executors NUM         Number of executors to launch (Default: 2).
                              If dynamic allocation is enabled, the initial number of
                              executors will be at least NUM.

Since we run in standalone mode, the excerpt shows that not all of these parameters actually take effect.

For example, --num-executors applies only to YARN and Kubernetes.
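To make the scoping explicit, the following is a minimal sketch that transcribes the cluster-manager scope of each executor-related flag from the `spark-submit -h` excerpt above into a lookup table. The names `FLAG_SCOPE` and `flags_for` are hypothetical helpers for illustration, not part of Spark.

```python
# Cluster managers each flag applies to, transcribed from the `spark-submit -h` excerpt.
# FLAG_SCOPE and flags_for are hypothetical helpers, not Spark APIs.
FLAG_SCOPE = {
    "--total-executor-cores": {"standalone", "mesos", "kubernetes"},
    "--executor-cores": {"standalone", "yarn", "kubernetes"},
    "--num-executors": {"yarn", "kubernetes"},
}


def flags_for(manager):
    """Return, sorted, the executor flags that take effect under the given cluster manager."""
    return sorted(flag for flag, managers in FLAG_SCOPE.items() if manager in managers)


if __name__ == "__main__":
    print(flags_for("standalone"))
```

Under standalone this leaves only `--executor-cores` and `--total-executor-cores`, which matches the next paragraph.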

In standalone mode, cores are controlled mainly by --total-executor-cores and --executor-cores.

--total-executor-cores: the total number of cores across all executors
--executor-cores: the number of cores per executor (in standalone mode, the default is all available cores on the worker)
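Because the problem traced back to SparkConf, it can help to see these flags as configuration keys. `--total-executor-cores` corresponds to `spark.cores.max`, and `--executor-cores` corresponds to `spark.executor.cores`. The minimal sketch below builds that mapping and renders it as `--conf key=value` arguments for spark-submit. The helper names `core_conf` and `to_submit_args` are hypothetical.

```python
# Hypothetical helpers: map the two core flags to their Spark config keys.
def core_conf(total_executor_cores=None, executor_cores=None):
    """Return a dict of Spark config entries for the core flags that were given."""
    conf = {}
    if total_executor_cores is not None:
        conf["spark.cores.max"] = str(total_executor_cores)       # --total-executor-cores
    if executor_cores is not None:
        conf["spark.executor.cores"] = str(executor_cores)        # --executor-cores
    return conf


def to_submit_args(conf):
    """Render config entries as spark-submit `--conf key=value` arguments (sorted by key)."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(to_submit_args(core_conf(total_executor_cores=6, executor_cores=2)))
```

In a PySpark driver, the same pairs could be passed to `SparkConf().setAll(...)` before the SparkContext is created. Leaving a key out of the dict mirrors leaving the corresponding flag unset.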

These two parameters seem to overlap somewhat. Would setting only --total-executor-cores, without --executor-cores, cause problems?

To see how the two parameters interact, I tested three cases:

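Only --total-executor-cores set: --executor-cores keeps its default, so all idle cores are still used. Usage is not capped at --total-executor-cores cores.
Only --executor-cores set: cores are assigned to each executor according to that value.
Both set: standalone mode has no setting for the number of executors, so the executor count is computed as (--total-executor-cores // --executor-cores). Depending on the workers' available resources, executors with --executor-cores cores each are then launched.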
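The third case can be modelled with a little arithmetic. The following is a simplified sketch of the behaviour described above, not Spark's actual master scheduler. It computes the executor count as `total_executor_cores // executor_cores`. It then assumes that executors are spread round-robin over workers, with one executor per worker per pass, as long as a worker still has enough free cores. The function `plan_executors` and its round-robin placement are assumptions for illustration.

```python
def plan_executors(total_executor_cores, executor_cores, worker_free_cores):
    """Hypothetical, simplified model of the observed behaviour; not Spark's scheduler.

    Returns the number of executors placed on each worker.
    """
    if executor_cores <= 0:
        raise ValueError("executor_cores must be positive")
    # Executor count derived from the two flags; integer division drops leftover cores.
    remaining = total_executor_cores // executor_cores
    free = list(worker_free_cores)      # free cores left on each worker
    per_worker = [0] * len(free)        # executors placed on each worker
    placed_in_pass = True
    # Assumed round-robin placement: one executor per worker per pass while any worker fits one.
    while remaining > 0 and placed_in_pass:
        placed_in_pass = False
        for i in range(len(free)):
            if remaining == 0:
                break
            if free[i] >= executor_cores:
                free[i] -= executor_cores
                per_worker[i] += 1
                remaining -= 1
                placed_in_pass = True
    return per_worker


if __name__ == "__main__":
    # 6 total cores, 2 per executor -> 3 executors over two 4-core workers.
    print(plan_executors(6, 2, [4, 4]))  # [2, 1]
```

With 7 total cores and 2 per executor, the model still yields 3 executors, and the odd core is left unused. If no worker has `executor_cores` free cores, no executor is placed.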

If you are interested, you can dig into the source code.

I tested with pyspark 3.1.2.
