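Fixing the YARN exception "YarnException: Failed while publishing entity"
Version: HDP 3.0
When a MapReduce job is submitted, the job itself finishes, but the container cannot shut down and keeps waiting for five minutes, logging the following: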
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
INFO[Thread-100] org.apache.hadoop.yarn.event.AsyncDispatcher:Waiting for AsyncDispatcher to drain.Thread state is :WAITING
After five minutes, the following exception is thrown:
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
...
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
...
Caused by: java.net.SocketTimeoutException: Read timed out
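To confirm that a saved container log shows this symptom, a small check like the one below can help. It is only a sketch: the function names `has_ats_publish_timeout` and `count_drain_waits` are hypothetical helpers, not part of Hadoop, and the log file path is whatever file you saved the output to.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop) for spotting the symptom in a saved log.

# Succeeds (exit 0) only when the log contains both the repeated
# dispatcher wait message and the publish failure.
has_ats_publish_timeout() {
  local log_file="$1"
  grep -q 'Waiting for AsyncDispatcher to drain' "$log_file" &&
    grep -q 'Failed while publishing entity' "$log_file"
}

# Prints how many lines report that the dispatcher is still draining.
count_drain_waits() {
  grep -c 'Waiting for AsyncDispatcher to drain' "$1"
}
```

One way to get such a file is to redirect the output of `yarn logs -applicationId <application id>` to disk and pass that path to `has_ats_publish_timeout`.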
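This happens because the embedded HBase instance used by ATSv2 (YARN Timeline Service v2) has crashed.
The fix is to reset the ATSv2 embedded HBase database: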
1. Stop the YARN service
Ambari -> YARN -> Actions -> Stop
2. Delete the ATSv2 znode in ZooKeeper
zookeeper-client -server zookeeper-quorum-servers
rmr /atsv2-hbase-unsecure or, on a Kerberized cluster, rmr /atsv2-hbase-secure
3. Move the Timeline Service's embedded HBase data directory out of the way in HDFS
hdfs dfs -mv /atsv2/hbase /tmp/
4. Start the YARN service
Ambari -> YARN -> Actions -> Start
After resubmitting the job, it runs normally and the problem is resolved.
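The ZooKeeper and HDFS steps above can be collected into a small dry-run script that only prints the commands for review. This is a sketch under stated assumptions: `atsv2_znode` and `print_reset_plan` are hypothetical helper names, the quorum string is a placeholder you supply, and step 3 is assumed to move `/atsv2/hbase` into `/tmp/`. Stopping and starting YARN is still done through Ambari.

```shell
#!/usr/bin/env bash
# Sketch only: prints the reset commands instead of running them.

# Picks the ATSv2 znode from step 2; "secure" means a Kerberized cluster.
atsv2_znode() {
  if [ "$1" = "secure" ]; then
    echo "/atsv2-hbase-secure"
  else
    echo "/atsv2-hbase-unsecure"
  fi
}

# Prints the commands for steps 2 and 3 in order.
# $1: "secure" or "unsecure"
# $2: ZooKeeper quorum (placeholder value, supply your own)
print_reset_plan() {
  local mode="$1" quorum="$2"
  echo "zookeeper-client -server ${quorum}"
  echo "rmr $(atsv2_znode "$mode")"
  # Assumption: the embedded HBase data lives in /atsv2/hbase and is moved to /tmp/.
  echo "hdfs dfs -mv /atsv2/hbase /tmp/"
}
```

Calling `print_reset_plan unsecure <your quorum>` prints three lines to review before running them by hand. Printing instead of executing is deliberate, since both the znode removal and the HDFS move are hard to undo on a live cluster.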