Fixing NodeManager Failing to Start on Some Ambari Worker Nodes

Posted on 2023-01-16 14:37 by suyang

1. Problem description

The NodeManager log file shows the following:

2019-07-18 11:20:28,104 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-07-18 11:20:29,069 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:openDatabase(963)) - Using state database at /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2019-07-18 11:20:29,103 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:966)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:953)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:200)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
2019-07-18 11:20:29,106 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)

.......
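
To check quickly whether a node is hitting this same failure, the log can be searched for the LevelDB exception. The following is a minimal sketch; the function name `has_state_store_error` is hypothetical, and the log file path is passed in as an argument because the post does not say where the log lives.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if the given log file contains the LevelDB
# "MANIFEST-...: No such file or directory" error shown above,
# and fails (exit 1) otherwise.
has_state_store_error() {
  local log_file="$1"
  # \$ matches the literal dollar sign in NativeDB$DBException.
  grep -q 'NativeDB\$DBException: IO error: .*MANIFEST-[0-9]*: No such file or directory' "$log_file"
}
```

Usage, with the path to your own NodeManager log substituted: `has_state_store_error /path/to/nodemanager.log && echo "state store error found"`.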

2. Solution 1

i. Delete all files in the yarn-nm-state directory, then start the NodeManager again

[root@zwlbs3 ~]# cd /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/
[root@zwlbs3 yarn-nm-state]# ls
000003.log  CURRENT  LOCK  MANIFEST-000002
[root@zwlbs3 yarn-nm-state]# rm -rf *
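
`rm -rf *` cannot be undone, and the files hold the NodeManager's recovery state. A more cautious variant is to move them into a timestamped backup directory instead of deleting them. This is a sketch under assumptions, not the method used above: the function name `backup_and_clear_state` and the backup location are hypothetical, and it should only be run while the NodeManager is stopped.

```shell
#!/usr/bin/env bash
# Move everything inside a recovery-state directory into a new
# timestamped directory under backup_root, leaving state_dir empty.
backup_and_clear_state() {
  local state_dir="$1"
  local backup_root="$2"
  if [ ! -d "$state_dir" ]; then
    echo "not a directory: $state_dir" >&2
    return 1
  fi
  local backup_dir
  backup_dir="$backup_root/yarn-nm-state.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup_dir" || return 1
  # -mindepth 1 -maxdepth 1 selects direct children only, including
  # dotfiles that a bare '*' glob would skip.
  find "$state_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup_dir"/ \;
}
```

For example, `backup_and_clear_state /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state /root/nm-state-backup`, where `/root/nm-state-backup` is an arbitrary location chosen for illustration.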

ii. Start the NodeManager component again

My cluster is managed by Ambari, so I started it directly from the web UI; it can also be started from the command line.
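
For the command-line route, the exact command depends on the Hadoop version, which the post does not state. The sketch below only prints the stock Apache Hadoop start command for a given major version; the helper name `nm_start_command` is hypothetical, and whether these scripts are on your `PATH` (or under a distribution-specific directory) is an assumption to verify on your own nodes.

```shell
#!/usr/bin/env bash
# Print the stock Apache Hadoop command that starts a NodeManager
# daemon for the given major version (2 or 3).
nm_start_command() {
  case "$1" in
    2) echo "yarn-daemon.sh start nodemanager" ;;  # Hadoop 2.x sbin script
    3) echo "yarn --daemon start nodemanager" ;;   # Hadoop 3.x launcher syntax
    *) echo "unsupported Hadoop major version: $1" >&2
       return 1 ;;
  esac
}
```

The printed command would normally be run as the account that owns the YARN daemons, for example `su - yarn -c "$(nm_start_command 3)"`, assuming that account is named `yarn`. Note that a NodeManager started this way may show a stale status in Ambari until Ambari next polls the host.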

iii. Check whether it started; in this case it started successfully

[root@zwlbs3 ~]# jps
4832 DataNode
11425 Jps
7091 HRegionServer
3894 NodeManager
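
The same check can be scripted so it is easy to repeat a few minutes later, which matters because the process may stop again after a while. A minimal sketch, assuming `jps` prints one `<pid> <MainClass>` pair per line as in the output above; the function name `nodemanager_listed` is hypothetical.

```shell
#!/usr/bin/env bash
# Read jps-style output ("<pid> <MainClass>") on stdin and succeed
# only if one of the lines names a NodeManager process.
nodemanager_listed() {
  awk '$2 == "NodeManager" { found = 1 } END { exit !found }'
}
```

Usage: `jps | nodemanager_listed && echo "NodeManager is running" || echo "NodeManager is not running"`.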

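3. Solution 2 (try this if Solution 1 does not work)

i. If the NodeManager goes back to the stopped state after a while and the log reports the same error as above, as shown here: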

2019-08-07 17:14:01,850 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - registered UNIX signal handlers for [TERM, HUP, INT]
2019-08-07 17:14:02,859 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:openDatabase(963)) - Using state database at /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2019-08-07 17:14:02,891 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
        at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
        at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
        at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:966)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:953)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:200)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
2019-08-07 17:14:02,895 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/MANIFEST-000002: No such file or directory
.......

ii. Delete all files in the yarn-nm-state directory

[root@zwlbs3 ~]# cd /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/
[root@zwlbs3 yarn-nm-state]# ls
000003.log  CURRENT  LOCK  MANIFEST-000002
[root@zwlbs3 yarn-nm-state]# rm -rf *

iii. Restart all Hadoop services on that server