
HDFS YARN

2021-11-08 18:05  DataBases

Notes on Hive window functions

https://www.cnblogs.com/zz-ksw/p/12917693.html

Hadoop basics: common HDFS API operations

https://www.cnblogs.com/yinzhengjie/p/9906192.html

The three YARN resource schedulers explained

https://www.cnblogs.com/zz-ksw/p/12895909.html

https://winyter.github.io/MyBlog/2020/05/23/yarn-fair-scheduler-guide/

How many tasks YARN can run in parallel

https://cloud.tencent.com/developer/article/1534332

To make it easy to cap the number of running tasks, i.e. the number of applications in the RUNNING state, YARN provides an important configuration parameter:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
  </description>
</property>

The configuration file is hadoop-2.7.4/etc/hadoop/capacity-scheduler.xml.

The meaning of the parameter is straightforward: the total memory occupied by all ApplicationMasters must stay below a given fraction of the total memory managed by YARN; the default is 0.1.

In other words, the number of tasks YARN can run concurrently is bounded by this parameter together with the memory of a single AM.

Two factors determine how many tasks YARN can run at the same time:

1. YARN's minimum memory scheduling unit

2. the yarn.scheduler.capacity.maximum-am-resource-percent parameter in hadoop-2.7.4/etc/hadoop/capacity-scheduler.xml
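Putting the two factors together, a rough back-of-the-envelope estimate of the cap on concurrent applications can be sketched as follows (the function name and the cluster numbers are illustrative, not from the post; each AM is assumed to occupy exactly one minimum scheduling unit):

```python
import math

def max_concurrent_apps(cluster_memory_mb, am_percent, min_allocation_mb):
    # Memory reserved for all ApplicationMasters combined,
    # controlled by yarn.scheduler.capacity.maximum-am-resource-percent.
    am_pool_mb = cluster_memory_mb * am_percent
    # Each AM occupies at least one minimum scheduling unit,
    # so the pool size divided by that unit bounds the RUNNING apps.
    return math.floor(am_pool_mb / min_allocation_mb)

# Example: 100 GB cluster, default 10% AM share, 1 GB minimum allocation.
print(max_concurrent_apps(102400, 0.1, 1024))  # → 10
```

With the defaults, a 100 GB cluster tops out at roughly 10 concurrent applications, which is why small clusters often appear to queue jobs even when plenty of memory is free.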

 


So one option is to keep the minimum scheduling memory at its default of 1 GB (in most cases there is no need to change it) and raise the AM memory fraction, e.g. to 1.
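A sketch of both adjustments (the values are examples, not recommendations; note that yarn.scheduler.minimum-allocation-mb lives in yarn-site.xml rather than capacity-scheduler.xml):

```xml
<!-- yarn-site.xml: minimum memory scheduling unit, default 1024 MB -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<!-- capacity-scheduler.xml: allow AMs to use up to 100% of cluster memory -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>1</value>
</property>
```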

The following setting makes YARN schedule tasks against the memory and CPU resources they actually request (the default calculator considers memory only):

$HADOOP_HOME/etc/hadoop/capacity-scheduler.xml

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

=========================================================================================

spark-submit

spark-submit --master yarn \
  --driver-memory 1G --driver-cores 1 \
  --executor-memory 1G --executor-cores 1 \
  --num-executors 3 \
  --py-files dacomponent.zip taskflow_22_5457.py