YARN 启动后失败退出——没有请求资源——Invalid resource request, no resources request

在ambari-server中修改了yarn的配置,重新启动服务,结果RM启动失败,错误也很奇怪,“不合理的资源请求,没有请求任何资源”!详细如下:

2018-08-21 16:06:16,639 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1495)) - Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1213)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1254)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1250)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1250)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1301)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1492)
Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:489)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
        at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1464)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:825)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        ... 10 more
2018-08-21 16:06:16,656 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x36546c044dc0113 closed
2018-08-21 16:06:16,656 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-08-21 16:06:16,741 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(49)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at ep-bd01/192.168.58.11

网上多方搜索无解,最后无奈重新启动主机,重启所有服务,结果成功! 再次重启RM,失败,原因同上。

一、配置RM HA,这次启动了,但是配置的两个RM节点都是standby状态! 期间再次修改配置文件无数次,无效,错误信息依然。

二、手工激活一台主机上的RM,失败,错误原因相同

[root@ep-bd01 zookeeper]# yarn rmadmin -transitionToActive --forceactive --forcemanual rm1
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

You may abort safely by answering 'n' or hitting ^C now.

Are you sure you want to continue? (Y or N) y
......
......

 

18/08/29 14:31:10 WARN ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
        ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1213)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1254)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1250)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1250)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
        ... 5 more

 

 

At Last! 经过好几天的网上搜索以及思考,这个错误可能是HDP3.0的新错误信息,和网上搜索到的一个问题有些类似,现象同样是RM启动成功后马上挂掉! 其中提到可能是RM回复application的状态引起的故障,急忙实验一下。

简而言之,使用zookeeper命令删除 /rmstore/ZKRMStateRoot/RMAppRoot 下面的所有子目录。

然后重启RM,没想到困扰几天的问题就这么解决了,具体请看输出吧(容我乐一会儿先)。

[root@ep-bd03 pg_log]# sudo -u zookeeper /usr/hdp/3.0.0.0-1634/zookeeper/bin/zkCli.sh

Connecting to localhost:
2181 2018-08-29 15:04:02,395 - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1634--1, built on 07/12/2018 20:01 GMT 2018-08-29 15:04:02,397 - INFO [main:Environment@100] - Client environment:host.name=ep-bd03 2018-08-29 15:04:02,397 - INFO [main:Environment@100] - Client environment:java.version=1.8.0_181 2018-08-29 15:04:02,398 - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation 2018-08-29 15:04:02,398 - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_181-amd64/jre 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:java.class.path=/usr/hdp/3.0.0.0-1634/zookeeper/bin/../build/classes:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../build/lib/*.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/xercesMinimal-1.9.6.2.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-provider-api-2.4.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-http-shared4-2.4.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-http-shared-1.0-beta-6.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-http-2.4.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/wagon-file-1.0-beta-6.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/plexus-utils-3.0.8.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/plexus-interpolation-1.11.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/nekohtml-1.9.6.2.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-settings-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-repository-metadata-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-project-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-profile-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-plugin-registry-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-model-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-error-diagnostics-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-artifact-manager-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-artifact-2.2.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/maven-ant-tasks-2.1.3.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/jsoup-1.7.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/commons-logging-1.1.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/commons-io-2.2.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/commons-codec-1.6.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/classworlds-1.1-alpha-2.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/backport-util-concurrent-3.1.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/ant-launcher-1.8.0.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../lib/ant-1.8.0.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../zookeeper-3.4.6.3.0.0.0-1634.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../src/java/lib/*.jar:/usr/hdp/3.0.0.0-1634/zookeeper/bin/../conf::/usr/share/zookeeper/* 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:java.compiler=<NA> 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:os.name=Linux 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:os.arch=amd64 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:os.version=3.10.0-862.6.3.el7.x86_64 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:user.name=zookeeper 2018-08-29 15:04:02,399 - INFO [main:Environment@100] - Client environment:user.home=/var/lib/zookeeper 2018-08-29 15:04:02,400 - INFO [main:Environment@100] - Client environment:user.dir=/tmp/hsperfdata_zookeeper 2018-08-29 15:04:02,401 - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@6438a396 Welcome to ZooKeeper! 2018-08-29 15:04:02,417 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1019] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) JLine support is enabled 2018-08-29 15:04:02,461 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@864] - Socket connection established, initiating session, client: /127.0.0.1:7637, server: localhost/127.0.0.1:2181 [zk: localhost:2181(CONNECTING) 0] 2018-08-29 15:04:02,484 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1279] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x3658450e5f202da, negotiated timeout = 30000 WATCHER:: WatchedEvent state:SyncConnected type:None path:null [zk: localhost:2181(CONNECTED) 0] ls /rmstore [ZKRMStateRoot] [zk: localhost:2181(CONNECTED) 1] ls /rmstore/ZKRMStateRoot [ReservationSystemRoot, RMAppRoot, AMRMTokenSecretManagerRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]

[zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot
[application_1534904073745_0001, HIERARCHIES, application_1534904073745_0003, application_1534904073745_0002]

[zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0001
[zk: localhost:2181(CONNECTED) 4] rmr /rmstore/ZKRMStateRoot/RMAppRoot/HIERARCHIES
[zk: localhost:2181(CONNECTED) 5] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0003
[zk: localhost:2181(CONNECTED) 5] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0002

[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot
/RMAppRoot
[]
[zk: localhost:2181(CONNECTED) 8] 

 

posted @ 2018-08-22 10:30  柒零壹  阅读(3124)  评论(0编辑  收藏  举报