Notes on a ZooKeeper HA issue with the ResourceManager on Hadoop 2.8.4

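Background:

We moved the production Hadoop RM onto ZooKeeper (ZK) for high availability. A ZK znode, however, holds at most about 1 MB by default, and once the stored data grows beyond that it can bring down production workloads.

The fix has two parts:

1. Change the ZK configuration to raise the default storage limit

2. Change the path structure the RM uses to store its data in ZK, so that the split structure can support more data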

 

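Part 1: Change the ZK configuration to raise the default storage limit

This is mainly a matter of changing configuration parameters.

On every ZK node, edit the configuration (raising the limit to about 10 MB):

vi zkServer.sh

Add the following setting to the script:  ZOO_USER_CFG="-Djute.maxbuffer=10240000"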

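Edit zkCli.sh as well (without this, the command-line client cannot read data larger than 1 MB).

Even then, a client in our own code still cannot read data above the default limit. It needs the JVM system property set, as follows: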

System.setProperty("jute.maxbuffer",String.valueOf(10240000));
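
A minimal sketch of where that call has to sit. As far as I know, the ZooKeeper 3.4.x client reads jute.maxbuffer once, when its classes are loaded, so the property should be set before any ZooKeeper class is touched. The class and method names below (JuteBufferSetup, raiseLimit, effectiveLimit) are hypothetical, and the fallback 0xfffff is an assumption meant to mirror ZooKeeper's built-in default of just under 1 MB.

```java
// Hypothetical helper for illustration; not part of ZooKeeper or Hadoop.
class JuteBufferSetup {
    static final String KEY = "jute.maxbuffer";
    // Assumption: mirrors ZooKeeper's built-in default (0xfffff bytes, just under 1 MB).
    static final int ASSUMED_DEFAULT = 0xfffff;

    /** Sets the limit as a JVM system property; call before creating any ZooKeeper client. */
    static void raiseLimit(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + bytes);
        }
        System.setProperty(KEY, String.valueOf(bytes));
    }

    /** Reads the property back, falling back to the assumed default when it is unset. */
    static int effectiveLimit() {
        return Integer.getInteger(KEY, ASSUMED_DEFAULT);
    }

    public static void main(String[] args) {
        raiseLimit(10240000);
        System.out.println("jute.maxbuffer = " + effectiveLimit());
        // Only after this point would the application construct its ZooKeeper client.
    }
}
```

The same effect can be had without code changes by starting the client JVM with -Djute.maxbuffer=10240000, which is the approach the server-side and YARN settings in this post take.
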
The YARN configuration has to be changed in the same way, otherwise none of the above has any effect.
In yarn-env.sh,
add one line:
YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=10240000"

 




Part 2: Optimize the storage structure in ZK

YARN stores its state in ZK as follows:
ROOT_DIR_PATH
      |--- VERSION_INFO
      |--- EPOCH_NODE
      |--- RM_ZK_FENCING_LOCK
      |--- RM_APP_ROOT
      |     |----- (#ApplicationId1)
      |     |        |----- (#ApplicationAttemptIds)
      |     |
      |     |----- (#ApplicationId2)
      |     |       |----- (#ApplicationAttemptIds)
      |     ....
      |
      |--- RM_DT_SECRET_MANAGER_ROOT
      |     |----- RM_DT_SEQUENTIAL_NUMBER_ZNODE_NAME
      |     |----- RM_DELEGATION_TOKENS_ROOT_ZNODE_NAME
      |     |       |----- Token_1
      |     |       |----- Token_2
      |     |       ....
      |     |
      |     |----- RM_DT_MASTER_KEYS_ROOT_ZNODE_NAME
      |     |      |----- Key_1
      |     |      |----- Key_2
      |     ....
      |--- AMRMTOKEN_SECRET_MANAGER_ROOT
            |----- currentMasterKey
            |----- nextMasterKey

After the change, the structure becomes:

 * The znode structure is as follows:
 * ROOT_DIR_PATH
 * |--- VERSION_INFO
 * |--- EPOCH_NODE
 * |--- RM_ZK_FENCING_LOCK
 * |--- RM_APP_ROOT
 * |     |----- HIERARCHIES
 * |     |        |----- 1
 * |     |        |      |----- (#ApplicationId barring last character)
 * |     |        |      |       |----- (#Last character of ApplicationId)
 * |     |        |      |       |       |----- (#ApplicationAttemptIds)
 * |     |        |      ....
 * |     |        |
 * |     |        |----- 2
 * |     |        |      |----- (#ApplicationId barring last 2 characters)
 * |     |        |      |       |----- (#Last 2 characters of ApplicationId)
 * |     |        |      |       |       |----- (#ApplicationAttemptIds)
 * |     |        |      ....
 * |     |        |
 * |     |        |----- 3
 * |     |        |      |----- (#ApplicationId barring last 3 characters)
 * |     |        |      |       |----- (#Last 3 characters of ApplicationId)
 * |     |        |      |       |       |----- (#ApplicationAttemptIds)
 * |     |        |      ....
 * |     |        |
 * |     |        |----- 4
 * |     |        |      |----- (#ApplicationId barring last 4 characters)
 * |     |        |      |       |----- (#Last 4 characters of ApplicationId)
 * |     |        |      |       |       |----- (#ApplicationAttemptIds)
 * |     |        |      ....
 * |     |        |
 * |     |----- (#ApplicationId1)
 * |     |        |----- (#ApplicationAttemptIds)
 * |     |
 * |     |----- (#ApplicationId2)
 * |     |       |----- (#ApplicationAttemptIds)
 * |     ....
 * |
 * |--- RM_DT_SECRET_MANAGER_ROOT
 *        |----- RM_DT_SEQUENTIAL_NUMBER_ZNODE_NAME
 *        |----- RM_DELEGATION_TOKENS_ROOT_ZNODE_NAME
 *        |       |----- 1
 *        |       |      |----- (#TokenId barring last character)
 *        |       |      |       |----- (#Last character of TokenId)
 *        |       |      ....
 *        |       |----- 2
 *        |       |      |----- (#TokenId barring last 2 characters)
 *        |       |      |       |----- (#Last 2 characters of TokenId)
 *        |       |      ....
 *        |       |----- 3
 *        |       |      |----- (#TokenId barring last 3 characters)
 *        |       |      |       |----- (#Last 3 characters of TokenId)
 *        |       |      ....
 *        |       |----- 4
 *        |       |      |----- (#TokenId barring last 4 characters)
 *        |       |      |       |----- (#Last 4 characters of TokenId)
 *        |       |      ....
 *        |       |----- Token_1
 *        |       |----- Token_2
 *        |       ....
 *        |
 *        |----- RM_DT_MASTER_KEYS_ROOT_ZNODE_NAME
 *        |      |----- Key_1
 *        |      |----- Key_2
 *                ....
 * |--- AMRMTOKEN_SECRET_MANAGER_ROOT
 *        |----- currentMasterKey
 *        |----- nextMasterKey
 *
 * |-- RESERVATION_SYSTEM_ROOT
 *        |------PLAN_1
 *        |      |------ RESERVATION_1
 *        |      |------ RESERVATION_2
 *        |      ....
 *        |------PLAN_2
 *        ....

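Add the following property to yarn-site.xml. The value shown, 0, is the default and means no split; set it to a value from 1 to 4 to enable the hierarchical layout.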

<property>

    <description>Index at which last section of application id (with each section
      separated by _ in application id) will be split so that application znode
      stored in zookeeper RM state store will be stored as two different znodes
      (parent-child). Split is done from the end.
      For instance, with no split, appid znode will be of the form
      application_1352994193343_0001. If the value of this config is 1, the
      appid znode will be broken into two parts application_1352994193343_000
      and 1 respectively with former being the parent node.
      application_1352994193343_0002 will then be stored as 2 under the parent
      node application_1352994193343_000. This config can take values from 0 to 4.
      0 means there will be no split. If configuration value is outside this
      range, it will be treated as config value of 0(i.e. no split). A value
      larger than 0 (up to 4) should be configured if you are storing a large number
      of apps in ZK based RM state store and state store operations are failing due to
      LenError in Zookeeper.</description>
    <name>yarn.resourcemanager.zk-appid-node.split-index</name>
    <value>0</value>
  </property>
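
To make the split rule in the description concrete, here is a minimal sketch of how an application id maps to parent and child znode names for a given split index. It is an illustration only, not the Hadoop implementation: the class and method names (AppIdSplit, split, relativePath) are hypothetical, the HIERARCHIES segment is taken from the tree shown earlier, and the code assumes a well-formed id whose last section has at least four digits, so cutting the last N characters is the same as splitting the last section.

```java
// Hypothetical illustration of the split rule described above; not the Hadoop implementation.
class AppIdSplit {
    /** Returns {parent, child} znode names, or {appId} alone when no split applies. */
    static String[] split(String appId, int splitIndex) {
        // Values outside 1..4 mean no split (0 is "no split", out-of-range is treated as 0).
        if (splitIndex < 1 || splitIndex > 4 || appId.length() <= splitIndex) {
            return new String[] {appId};
        }
        int cut = appId.length() - splitIndex;
        return new String[] {appId.substring(0, cut), appId.substring(cut)};
    }

    /** Builds the znode path relative to RM_APP_ROOT, following the tree shown earlier. */
    static String relativePath(String appId, int splitIndex) {
        String[] parts = split(appId, splitIndex);
        if (parts.length == 1) {
            return parts[0];
        }
        return "HIERARCHIES/" + splitIndex + "/" + parts[0] + "/" + parts[1];
    }

    public static void main(String[] args) {
        // prints HIERARCHIES/1/application_1352994193343_000/1
        System.out.println(relativePath("application_1352994193343_0001", 1));
        // prints application_1352994193343_0001
        System.out.println(relativePath("application_1352994193343_0001", 0));
    }
}
```

With a split index of 1, up to ten applications share one parent znode, so each parent has far fewer children than a flat RM_APP_ROOT. That keeps the response for listing any single znode's children well under the buffer limit.
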

  

 

Reference: https://cloud.tencent.com/developer/article/1491079

Reference: https://issues.apache.org/jira/browse/YARN-2368

Reference: https://issues.apache.org/jira/browse/YARN-2962






posted @ 2019-11-11 18:05  songchaolin