Common spark-submit parameters

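In YARN mode, spark-submit starts 2 executors by default, however many worker nodes the cluster has. Use --num-executors to change this.
In standalone mode, each worker starts one executor per application, and the number of executors cannot be set directly. This describes Spark 1.x at the time of writing; Spark 1.4 and later can run several executors per worker when --executor-cores is set.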
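
The two rules above fit in a small model. The helper below is hypothetical plain Python (`default_executor_count` is not a Spark API). It encodes only those defaults and assumes static allocation with the Spark 1.x standalone behavior. The master URL in the usage lines is a placeholder.

```python
from typing import Optional


def default_executor_count(master: str, num_workers: int,
                           num_executors: Optional[int] = None) -> int:
    """Hypothetical model of the executor-count defaults described above.

    Dynamic allocation and newer standalone behavior are NOT modeled.
    """
    if master.startswith("yarn"):
        # "yarn" (or Spark 1.x's "yarn-client"/"yarn-cluster"):
        # --num-executors if given, otherwise 2, regardless of cluster size.
        return 2 if num_executors is None else num_executors
    if master.startswith("spark://"):
        # Standalone, Spark 1.x default: one executor per worker per application.
        return num_workers
    raise ValueError(f"master not covered by this sketch: {master!r}")


print(default_executor_count("yarn", num_workers=10))                      # 2
print(default_executor_count("yarn", num_workers=10, num_executors=6))     # 6
print(default_executor_count("spark://master-host:7077", num_workers=10))  # 10
```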

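A partition is a logical slice of an RDD's data. Every RDD is split into one or more partitions, and these can be processed in parallel. The default number of partitions depends on how the RDD is created. For example, sc.textFile asks for at least min(sc.defaultParallelism, 2) partitions, which is why small inputs often end up with 2.
Each stage runs one task per partition of the last RDD in that stage, so the partition count determines how many tasks the executors have to run.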
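
The sketch below makes the partition/task relationship concrete. It splits a local list into partitions using the integer index arithmetic that, as far as I know, Spark's Scala `ParallelCollectionRDD` applies for `sc.parallelize(data, numSlices)`. It runs in plain Python without Spark, and `slice_positions` and `partition` are hypothetical helper names. Each resulting partition corresponds to one task in a stage that ends with this RDD.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def slice_positions(length: int, num_slices: int) -> List[Tuple[int, int]]:
    """(start, end) index ranges, modeled on how parallelize slices a collection."""
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    return [((i * length) // num_slices, ((i + 1) * length) // num_slices)
            for i in range(num_slices)]


def partition(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Split data into num_slices contiguous partitions."""
    return [list(data[s:e]) for s, e in slice_positions(len(data), num_slices)]


parts = partition(list(range(10)), 3)
print(parts)       # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
print(len(parts))  # 3 partitions, so 3 tasks in a stage ending with this RDD
```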

Options:

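--master MASTER_URL Where to run: spark://host:port, mesos://host:port, yarn, or local.
--deploy-mode DEPLOY_MODE Whether to launch the driver locally (client) or on one of the worker machines inside the cluster (cluster) (Default: client).
--class CLASS_NAME The application's main class.
--name NAME A name for the application.
--jars JARS Jars needed by both the driver and the executors, as a comma-separated (,) list.
--properties-file FILE File from which to load extra Spark properties (Default: conf/spark-defaults.conf).
--driver-memory MEM Memory for the driver (e.g. 1000M, 2G) (Default: 512M).
--driver-class-path Extra classpath entries for the driver, separated by colons (:).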

--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
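
The general options above can also be assembled programmatically. The sketch below is a hypothetical helper (`build_spark_submit`, the class name and the jar names are made up) that only builds the argument list. It uses the separators given above: commas for --jars and colons for --driver-class-path. Options left unset are omitted so that spark-submit applies its own defaults.

```python
import shlex
from typing import List, Optional, Sequence


def build_spark_submit(app_jar: str, main_class: str, master: str,
                       deploy_mode: str = "client",
                       name: Optional[str] = None,
                       jars: Sequence[str] = (),
                       driver_class_path: Sequence[str] = (),
                       driver_memory: Optional[str] = None,
                       executor_memory: Optional[str] = None,
                       app_args: Sequence[str] = ()) -> List[str]:
    """Assemble a spark-submit argument list from the general options."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode,
           "--class", main_class]
    if name:
        cmd += ["--name", name]
    if jars:
        cmd += ["--jars", ",".join(jars)]  # comma-separated
    if driver_class_path:
        cmd += ["--driver-class-path", ":".join(driver_class_path)]  # colon-separated
    if driver_memory:
        cmd += ["--driver-memory", driver_memory]
    if executor_memory:
        cmd += ["--executor-memory", executor_memory]
    # The application jar comes after all options; anything after it is
    # passed to the main class as program arguments.
    cmd.append(app_jar)
    cmd += list(app_args)
    return cmd


cmd = build_spark_submit("app.jar", "com.example.WordCount", "yarn",
                         name="wordcount", jars=["libA.jar", "libB.jar"],
                         executor_memory="2G", app_args=["hdfs:///input"])
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

On a machine where spark-submit is on the PATH, the list can be passed to subprocess.run. Building a list rather than one string avoids shell-quoting problems.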

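Spark standalone with cluster deploy mode only:
--driver-cores NUM Number of cores used by the driver (Default: 1).
--supervise If given, restarts the driver on failure.
--kill SUBMISSION_ID Kills the specified driver.
--status SUBMISSION_ID Returns the status of the specified driver.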
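
The sketch below shows how the standalone cluster-mode options fit together. It is a hypothetical argument-list builder, like the one for the general options. The master URL and the submission ID are made-up placeholders. The port shown is an assumption: --kill and --status talk to the standalone master's REST submission endpoint, which is typically on port 6066, so check your cluster's configuration.

```python
from typing import List

# Hypothetical URL; the REST submission port (often 6066) is an assumption.
MASTER_URL = "spark://master-host:6066"


def standalone_cluster_submit(master_url: str, app_jar: str, main_class: str,
                              driver_cores: int = 1,
                              supervise: bool = True) -> List[str]:
    """spark-submit args for standalone cluster mode, where the driver
    runs on a worker and --driver-cores / --supervise apply."""
    cmd = ["spark-submit", "--master", master_url, "--deploy-mode", "cluster",
           "--class", main_class, "--driver-cores", str(driver_cores)]
    if supervise:
        cmd.append("--supervise")  # ask the master to restart a failed driver
    cmd.append(app_jar)            # the application jar follows the options
    return cmd


def driver_control(master_url: str, submission_id: str, action: str) -> List[str]:
    """Build a --kill or --status request for a driver's submission ID."""
    if action not in ("kill", "status"):
        raise ValueError("action must be 'kill' or 'status'")
    return ["spark-submit", "--master", master_url, "--" + action, submission_id]


print(standalone_cluster_submit(MASTER_URL, "app.jar", "com.example.Main", driver_cores=2))
print(driver_control(MASTER_URL, "driver-20150511150100-0000", "status"))
```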

Spark standalone and Mesos only:
--total-executor-cores NUM Total number of cores used by all executors.
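
--total-executor-cores sets spark.cores.max. Without it, a standalone application takes all available cores by default. The sketch below is a simplified model of how the standalone master spreads that cap across workers when spark.deploy.spreadOut is true (its default): it hands out cores one at a time, round-robin. It is not the actual Master code, it ignores memory checks and per-executor core limits, and `spread_out_cores` is a hypothetical name.

```python
from typing import List


def spread_out_cores(worker_free_cores: List[int], total_executor_cores: int) -> List[int]:
    """Assign up to total_executor_cores cores, one core at a time,
    round-robin across workers that still have free cores."""
    assigned = [0] * len(worker_free_cores)
    # The application can never get more cores than the cluster has free.
    remaining = min(total_executor_cores, sum(worker_free_cores))
    pos = 0
    while remaining > 0:
        if assigned[pos] < worker_free_cores[pos]:
            assigned[pos] += 1
            remaining -= 1
        pos = (pos + 1) % len(worker_free_cores)
    return assigned


print(spread_out_cores([4, 4, 4], 6))  # [2, 2, 2]
print(spread_out_cores([4, 1, 4], 7))  # [3, 1, 3]
```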

YARN-only:
--driver-cores NUM Number of cores used by the driver, in cluster mode only (Default: 1).
--executor-cores NUM Number of cores per executor (Default: 1).
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").
--num-executors NUM Number of executors to launch (Default: 2).
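
The YARN options multiply together. The sketch below is a hypothetical helper that computes the executor cores and executor heap memory an application asks YARN for. Its defaults are the ones listed above. YARN containers also carry a per-executor memory overhead, which is not modeled here, so the real memory footprint is larger than the heap total.

```python
from typing import Tuple


def yarn_executor_totals(num_executors: int = 2, executor_cores: int = 1,
                         executor_memory_mb: int = 1024) -> Tuple[int, int]:
    """Return (total executor cores, total executor heap in MB)."""
    return num_executors * executor_cores, num_executors * executor_memory_mb


print(yarn_executor_totals())             # (2, 2048)
print(yarn_executor_totals(10, 4, 4096))  # (40, 40960)
```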

posted on 2015-05-11 15:01 by 毛小娃