hadoop03-yarn
The basic idea of YARN is to split resource management and job scheduling/monitoring into separate daemons: a single global ResourceManager (RM) and one ApplicationMaster (AM) per application. An application can be a single job or a group of jobs.
One ResourceManager and multiple NodeManagers make up the YARN resource-management framework. They are long-running daemons, started with YARN, that provide its core services.
ResourceManager: the ultimate authority that arbitrates resources among all applications in the system, i.e. it manages all resource allocation across the cluster and contains a Scheduler (resource scheduler).
NodeManager: the per-machine resource manager, i.e. the agent for a single node. It launches containers, monitors their resource usage, and reports that usage to the ResourceManager and its Scheduler.
Container: a slice of the resources available on the cluster, including CPU, memory, disk, network, etc.
ApplicationMaster (AM): effectively a framework-specific library. One AM is launched for each application; its job is to negotiate resources with the ResourceManager and to work with the NodeManagers to run and monitor the tasks.
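A quick way to see these roles on a running cluster is sketched below (a minimal sketch, assuming YARN is already started as in section 1 and an application is running; the application and attempt ids are placeholders):
# jps on each host shows the long-running daemons (ResourceManager / NodeManager)
jps
# Each running application has its own ApplicationMaster
yarn application -list -appStates RUNNING
# Containers are listed per application attempt
yarn applicationattempt -list application_1713753771652_0001
yarn container -list appattempt_1713753771652_0001_000001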
Flow of a YARN application from submission to execution (diagram)
1. YARN Configuration
YARN ships with Hadoop, so there is nothing extra to install; it only needs to be enabled in the configuration files.
mapred-site.xml
<!-- Use YARN for resource scheduling when running MapReduce jobs -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/data/tools/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/data/tools/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/data/tools/hadoop</value>
</property>
yarn-site.xml
<!-- Set the ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<!-- Enable YARN's shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
hadoop-env.sh
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Distribute the files to the other two nodes
scp mapred-site.xml yarn-site.xml hadoop-env.sh hadoop02:$PWD
scp mapred-site.xml yarn-site.xml hadoop-env.sh hadoop03:$PWD
Start YARN on the NameNode host
start-yarn.sh
jps shows that the ResourceManager is running on the NameNode host and a NodeManager is running on each DataNode.
Once startup completes, the YARN web UI is available at http://10.12.20.15:8088/
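Besides the web UI, the ResourceManager exposes a REST API that is handy for scripted checks; a minimal sketch, assuming the same RM address:
# Cluster info and metrics from the ResourceManager REST API
curl http://10.12.20.15:8088/ws/v1/cluster/info
curl http://10.12.20.15:8088/ws/v1/cluster/metrics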
Run a word-count job
hadoop jar /data/tools/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar wordcount /input /output1
The submitted application and its state are now visible.
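The job output can also be checked directly in HDFS; a minimal sketch (the part-r-00000 file name assumes the single reducer that the example uses by default):
# List and print the word-count result
hdfs dfs -ls /output1
hdfs dfs -cat /output1/part-r-00000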
2. Enabling History Logs and Log Aggregation
YARN's container logs live under $HADOOP_HOME/logs/userlogs; they are cleared once YARN is restarted, so to look at the logs of finished jobs you need to enable the history server.
Log aggregation means collecting the logs from every node into one place.
mapred-site.xml
vim mapred-site.xml
<!-- Internal RPC address of the job history server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- Web UI address of the job history server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
yarn-site.xml
vim yarn-site.xml
<!-- Whether to enable log aggregation -->
<!-- When enabled, the logs of each container are saved to yarn.nodemanager.remote-app-log-dir -->
<!-- The default location is /tmp/logs -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long aggregated logs are kept in HDFS, in seconds -->
<!-- The default is -1, which means they are kept forever -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop01:19888/jobhistory/logs</value>
</property>
Distribute the configuration files to the other servers
scp mapred-site.xml yarn-site.xml hadoop02:$PWD
scp mapred-site.xml yarn-site.xml hadoop03:$PWD
Restart YARN
stop-yarn.sh
start-yarn.sh
Start the history server
mapred --daemon start historyserver
http://10.12.20.15:19888 is the page for historical (finished) jobs.
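With log aggregation enabled, the aggregated logs of a finished application can also be fetched from the command line; a minimal sketch, where the application id is a placeholder:
# Pull the aggregated container logs of an application
yarn logs -applicationId application_1713753771652_0001
# The raw aggregated files sit under the remote-app-log-dir (default /tmp/logs)
hdfs dfs -ls /tmp/logs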
List applications
yarn application -list
Kill an application
yarn application -kill application_1713753771652_0001
3. Common YARN Commands
Show the resources used by YARN applications
yarn top
Inspect applications
yarn application
-list
# List YARN applications by state, using -appStates to filter
# States: ALL NEW NEW_SAVING SUBMITTED ACCEPTED RUNNING FINISHED FAILED KILLED
# e.g.
# List all running applications
yarn application -list -appStates RUNNING
# List all failed applications
yarn application -list -appStates FAILED
-movetoqueue
# Move an application to the specified queue
yarn application -movetoqueue application_1713753771652_0008 -queue root.small
-kill
# Kill the specified application
yarn application -kill application_1713753771652_0007
List containers; the argument is an application attempt id, not an application id
yarn container -list appattempt_1713776860193_0002_000001
List nodes
yarn node -all -list
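For a single node, yarn node -status prints a detailed report (state, containers, resource usage, and node labels once they are configured in section 5); a minimal sketch, where the node id is a placeholder taken from the yarn node -list output:
# Detailed report for one NodeManager; the node id has the form host:port
yarn node -status hadoop02:45454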
4. YARN Schedulers
YARN's default scheduler is the Capacity Scheduler. It supports multiple queues, each with its own share of the cluster capacity, and a job can be submitted to a specific queue. Out of the box the Capacity Scheduler has a single queue named default.
The other common scheduler is the Fair Scheduler. When several applications run at the same time it shares the resources evenly among them, and when a small job finishes its resources are released back to the applications that are still running.
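Which scheduler is active is controlled by yarn.resourcemanager.scheduler.class in yarn-site.xml, and the Capacity Scheduler is used when it is not set. A minimal sketch for confirming the scheduler and its queue tree through the ResourceManager REST API (assuming the RM runs on hadoop01):
# Scheduler type and queue tree as JSON
curl http://hadoop01:8088/ws/v1/cluster/scheduler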
4.1 Adding a Queue to the Capacity Scheduler
Edit capacity-scheduler.xml and add a small queue for running small jobs.
<configuration>
<!-- No change needed -->
<!-- Maximum number of jobs the Capacity Scheduler can hold -->
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>Maximum number of applications that can be pending and running.</description>
</property>

<!-- No change needed -->
<!-- Percentage of a queue's total resources that MRAppMaster processes may occupy; lower it to limit how many jobs run in a queue at once -->
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>Maximum percent of resources in the cluster which can be used to run application masters i.e. controls number of concurrent running applications.</description>
</property>

<!-- No change needed -->
<!-- Strategy used when allocating resources to jobs -->
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default i.e. DefaultResourceCalculator only uses Memory while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU etc.</description>
</property>

<!-- Modify!!! -->
<!-- The queues in the scheduler; we add a small queue -->
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,small</value>
<description>The queues at the this level (root is the root queue).</description>
</property>

<!-- Modify!!! -->
<!-- Percentage of cluster resources for the default queue -->
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>70</value>
<description>Default queue target capacity.</description>
</property>

<!-- Add!!! -->
<!-- Percentage of cluster resources for the new small queue -->
<!-- The capacity percentages of all queues must add up to 100 -->
<property>
<name>yarn.scheduler.capacity.root.small.capacity</name>
<value>30</value>
<description>Small queue target capacity.</description>
</property>

<!-- No change needed -->
<!-- Maximum share of the default queue that a single user may use -->
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>Default queue user limit a percentage from 0.0 to 1.0.</description>
</property>

<!-- Add -->
<!-- Maximum share of the small queue that a single user may use -->
<property>
<name>yarn.scheduler.capacity.root.small.user-limit-factor</name>
<value>1</value>
<description>small queue user limit a percentage from 0.0 to 1.0.</description>
</property>

<!-- Do not modify -->
<!-- Maximum capacity percentage the default queue can use -->
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>The maximum capacity of the default queue.</description>
</property>

<!-- Add!!! -->
<!-- Maximum capacity percentage the small queue can use -->
<property>
<name>yarn.scheduler.capacity.root.small.maximum-capacity</name>
<value>100</value>
<description>The maximum capacity of the small queue.</description>
</property>

<!-- No change needed -->
<!-- State of the default queue -->
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>The state of the default queue. State can be one of RUNNING or STOPPED.</description>
</property>

<!-- Add!!! -->
<!-- State of the small queue -->
<property>
<name>yarn.scheduler.capacity.root.small.state</name>
<value>RUNNING</value>
<description>The state of the small queue. State can be one of RUNNING or STOPPED.</description>
</property>

<!-- No change needed -->
<!-- Who may submit to the default queue -->
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>The ACL of who can submit jobs to the default queue.</description>
</property>

<!-- Add -->
<!-- Who may submit to the small queue -->
<property>
<name>yarn.scheduler.capacity.root.small.acl_submit_applications</name>
<value>*</value>
<description>The ACL of who can submit jobs to the small queue.</description>
</property>

<!-- No change needed -->
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>The ACL of who can administer jobs on the default queue.</description>
</property>

<!-- Add!!! -->
<property>
<name>yarn.scheduler.capacity.root.small.acl_administer_queue</name>
<value>*</value>
<description>The ACL of who can administer jobs on the small queue.</description>
</property>

<property>
<name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
<value>*</value>
<description>The ACL of who can submit applications with configured priority. For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]</description>
</property>

<property>
<name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
<value>-1</value>
<description>Maximum lifetime of an application which is submitted to a queue in seconds. Any value less than or equal to zero will be considered as disabled. This will be a hard time limit for all applications in this queue. If positive value is configured then any application submitted to this queue will be killed after exceeds the configured lifetime. User can also specify lifetime per application basis in application submission context. But user lifetime will be overridden if it exceeds queue maximum lifetime. It is point-in-time configuration. Note : Configuring too low value will result in killing application sooner. This feature is applicable only for leaf queue.</description>
</property>

<property>
<name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
<value>-1</value>
<description>Default lifetime of an application which is submitted to a queue in seconds. Any value less than or equal to zero will be considered as disabled. If the user has not submitted application with lifetime value then this value will be taken. It is point-in-time configuration. Note : Default lifetime can't exceed maximum lifetime. This feature is applicable only for leaf queue.</description>
</property>

<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. When setting this parameter, the size of the cluster should be taken into account. We use 40 as the default value, which is approximately the number of nodes in one rack. Note, if this value is -1, the locality constraint in the container request will be ignored, which disables the delay scheduling.</description>
</property>

<property>
<name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
<value>-1</value>
<description>Number of additional missed scheduling opportunities over the node-locality-delay ones, after which the CapacityScheduler attempts to schedule off-switch containers, instead of rack-local ones. Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will attempt rack-local assignments after 40 missed opportunities, and off-switch assignments after 40+20=60 missed opportunities. When setting this parameter, the size of the cluster should be taken into account. We use -1 as the default value, which disables this feature. In this case, the number of missed opportunities for assigning off-switch containers is calculated based on the number of containers and unique locations specified in the resource request, as well as the size of the cluster.</description>
</property>

<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value></value>
<description>A list of mappings that will be used to assign jobs to queues The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]* Typically this list will be used to map users to queues, for example, u:%user:%user maps all users to queues with the same name as the user.</description>
</property>

<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
<description>If a queue mapping is present, will it override the value specified by the user? This can be used by administrators to place jobs in queues that are different than the one specified by the user. The default is false.</description>
</property>

<property>
<name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
<value>1</value>
<description>Controls the number of OFF_SWITCH assignments allowed during a node's heartbeat. Increasing this value can improve scheduling rate for OFF_SWITCH containers. Lower values reduce "clumping" of applications on particular nodes. The default is 1. Legal values are 1-MAX_INT. This config is refreshable.</description>
</property>

<property>
<name>yarn.scheduler.capacity.application.fail-fast</name>
<value>false</value>
<description>Whether RM should fail during recovery if previous applications' queue is no longer valid.</description>
</property>

<property>
<name>yarn.scheduler.capacity.workflow-priority-mappings</name>
<value></value>
<description>A list of mappings that will be used to override application priority. The syntax for this list is [workflowId]:[full_queue_name]:[priority][,next mapping]* where an application submitted (or mapped to) queue "full_queue_name" and workflowId "workflowId" (as specified in application submission context) will be given priority "priority".</description>
</property>

<property>
<name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name>
<value>false</value>
<description>If a priority mapping is present, will it override the value specified by the user? This can be used by administrators to give applications a priority that is different than the one specified by the user. The default is false.</description>
</property>
</configuration>
Distribute the file to the other two servers
scp capacity-scheduler.xml hadoop02:$PWD
scp capacity-scheduler.xml hadoop03:$PWD
Restart YARN
stop-yarn.sh
start-yarn.sh
Submit a job to a specific queue with -Dmapreduce.job.queuename=small
hadoop jar /data/tools/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi -Dmapreduce.job.queuename=small 3 3
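To confirm the new queue and watch its usage from the command line, yarn queue -status can be used; as a side note, changes to capacity-scheduler.xml such as adding a queue can usually be applied without a full restart by refreshing the queues on the ResourceManager. A minimal sketch:
# Capacity, current usage and state of the small queue
yarn queue -status small
# Reload capacity-scheduler.xml without restarting YARN
yarn rmadmin -refreshQueues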
The default submission queue can also be configured directly in mapred-site.xml
vim mapred-site.xml
<!-- Default submission queue -->
<property>
<name>mapreduce.job.queuename</name>
<value>default</value>
</property>
No restart is needed after this change; just submit the job.
5. Node Labels
Nodes can be given node labels, which lets you control which nodes a job runs on.
Edit yarn-site.xml
vim yarn-site.xml
<!-- Enable node labels -->
<property>
<name>yarn.node-labels.enabled</name>
<value>true</value>
</property>
<!-- Where the node-label store lives; it can be HDFS or the local file system -->
<!-- For the local file system use a path such as file:///home/yarn/node-label -->
<!-- Either way, the RM must have permission to access the path -->
<property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>hdfs://hadoop01:9820/tmp/yarn/node-labels</value>
</property>
<!-- This is the default; the option can also be left out -->
<property>
<name>yarn.node-labels.configuration-type</name>
<value>centralized</value>
</property>
Distribute to the other nodes
scp yarn-site.xml hadoop02:$PWD
scp yarn-site.xml hadoop03:$PWD
Restart YARN for the change to take effect
stop-yarn.sh
start-yarn.sh
Add labels to the YARN cluster from the command line
# Add a cluster node label
yarn rmadmin -addToClusterNodeLabels "az1(exclusive=true)"
# Remove cluster node labels
yarn rmadmin -removeFromClusterNodeLabels az1,az2
# List the labels already added to the cluster
yarn cluster --list-node-labels
- exclusive controls whether the partition is exclusive, i.e. whether idle resources of nodes in this label partition may be shared with jobs in the default partition.
- true means exclusive: no sharing; the resources are only given to applications that request this label.
- false means non-exclusive: idle resources may be shared with the default partition (see the sketch below).
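For comparison, a non-exclusive label would be created as sketched below; az2 is only an illustrative label name, and idle resources of az2 nodes could then be lent to applications in the DEFAULT partition:
# Hypothetical non-exclusive label: idle az2 resources are shared with the DEFAULT partition
yarn rmadmin -addToClusterNodeLabels "az2(exclusive=false)"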
Associating NodeManager nodes with labels
A node can belong to only one partition, so the cluster is split into several disjoint sub-clusters. By default every compute node is in the DEFAULT partition. How much of each partition's resources a queue may use has to be configured in advance, and an application can only use the partitions its queue has access to.
There are two kinds of node partitions:
Exclusive: containers are allocated only onto nodes with the matching label. A request for partition x goes to x-labeled nodes; a request for DEFAULT goes to DEFAULT nodes.
Non-exclusive: if a partition is non-exclusive, its idle resources are shared with containers that request the DEFAULT partition.
1) Associating labels with nodes via commands
# Associate labels with nodes; the syntax is hostname=label, with nodes separated by spaces
# The hostname must match what the Nodes page of the YARN web UI shows, otherwise the association has no effect
yarn rmadmin -replaceLabelsOnNode "kubernetes-dev-worker-6=az1"
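The association can then be checked per node; a minimal sketch, where the node id is a placeholder taken from the yarn node -list output (the node report should include a Node-Labels field):
# Per-node report, including its labels
yarn node -status kubernetes-dev-worker-6:45454
# The Nodes page of the RM web UI also groups nodes by label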
2) Associating labels with nodes via configuration
Edit the configuration on whichever node needs a given label.
vim yarn-site.xml
<property>
<name>yarn.node-labels.configuration-type</name>
<value>distributed</value>
</property>
<property>
<name>yarn.nodemanager.node-labels.provider</name>
<value>config</value>
</property>
<property>
<name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
<value>az1</value>
</property>
The web UI now shows the nodes partitioned by label.
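After changing a NodeManager's label configuration, that NodeManager has to be restarted so it reports the new label; a minimal sketch assuming the Hadoop 3 daemon scripts:
# Restart only the local NodeManager so it picks up its configured label
yarn --daemon stop nodemanager
yarn --daemon start nodemanager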
Configuration: assigning labels to queues
vim capacity-scheduler.xml
<!-- Specify which labels a queue can access. * means all labels; an empty value means only unlabeled nodes. -->
<!-- Every queue can always access unlabeled nodes, so that does not need to be declared. A queue that does not set this property inherits it from its parent. -->
<!-- root queue -->
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels</name>
<value>*</value>
<description>Node labels accessible to applications in the root queue</description>
</property>
<!--
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>100</value>
<description>Percentage of the default-label nodes available to the root queue</description>
</property>
-->
<property>
<name>yarn.scheduler.capacity.root.accessible-node-labels.az1.capacity</name>
<value>100</value>
<description>Maximum percentage of the az1-labeled nodes available to the root queue</description>
</property>
<!-- small queue -->
<property>
<name>yarn.scheduler.capacity.root.small.accessible-node-labels</name>
<value>az1</value>
<description>Node labels accessible to applications in the small queue</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.small.default-node-label-expression</name>
<value>az1</value>
<description>Default node label for applications in the small queue</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.small.accessible-node-labels.az1.capacity</name>
<value>100</value>
<description>Percentage of the az1-labeled nodes available to the small queue</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.small.accessible-node-labels.az1.maximum-capacity</name>
<value>100</value>
<description>Maximum percentage of the az1-labeled nodes available to the small queue</description>
</property>
Run the pi example:
hadoop jar /data/tools/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi 2 2
Submit the job to a specific queue and label (this run got stuck and could not be verified):
hadoop jar /data/tools/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0.jar pi -Dmapreduce.job.queuename=default -Dmapreduce.job.node-label-expression=az1 2 2
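When a labeled submission hangs like this, it is often stuck in the ACCEPTED state because no accessible partition can satisfy part of the request (for example the ApplicationMaster container). A minimal diagnostic sketch; the mapreduce.job.am.node-label-expression property in the comment is an assumption on my part and should be verified against the documentation of the Hadoop version in use:
# Check whether the job is stuck in ACCEPTED and inspect per-partition queue capacities
yarn application -list -appStates ACCEPTED
curl http://hadoop01:8088/ws/v1/cluster/scheduler
# Possibly also pin the ApplicationMaster to the label (unverified assumption):
# -Dmapreduce.job.am.node-label-expression=az1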