Uncle's Troubleshooting Notes (30): mesos agent fails to start: Failed to perform recovery: Incompatible agent info detected

The mesos agent fails to start and reports the following error:

Feb 15 22:03:18 server1.bj mesos-slave[1190]: E0215 22:03:18.622994 1192 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: Old agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1"
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: New agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1.bj"
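When the agent info dump is long, the two hostname lines are easy to miss. A minimal sketch of a filter that pulls them out of the log, assuming the log is fed on stdin and keeps the "Old agent info:" / "New agent info:" layout shown above; `agent_hostnames` is a hypothetical helper name, not part of mesos:

```shell
#!/bin/sh
# Print the old and new hostname found in a mesos agent recovery error.
# Reads log lines on stdin; agent_hostnames is a hypothetical helper.
agent_hostnames() {
    awk '
        /Old agent info:/ { section = "old" }
        /New agent info:/ { section = "new" }
        # Only look at hostname lines once we are inside one of the sections.
        section != "" && /hostname: "/ {
            split($0, parts, "\"")   # parts[2] is the text between the quotes
            print section "=" parts[2]
        }
    '
}
```

For example, the saved log could be piped through it with `agent_hostnames < agent.log`, where `agent.log` is whatever file holds the messages.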

The log shows that the hostname changed. The cause was an edit to the hosts file that swapped the order of the two names:

# cat /etc/hosts
192.168.0.1 server1 server1.bj
->
192.168.0.1 server1.bj server1
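In the hosts file format, the first name after the address is the canonical name and the remaining names are aliases, so swapping the order changes what a reverse lookup of 192.168.0.1 returns. Assuming the agent derives its hostname from such a lookup (that is, no explicit `--hostname` flag is set), this accounts for the change from server1 to server1.bj. A minimal sketch that prints the canonical name a hosts file assigns to an address; `canonical_name` is a hypothetical helper:

```shell
#!/bin/sh
# Print the canonical (first) name that a hosts-format file assigns to an IP.
# Usage: canonical_name <ip> [hosts_file]   (hypothetical helper)
canonical_name() {
    ip="$1"
    hosts_file="${2:-/etc/hosts}"
    # Drop comments, then print the second field of the first line
    # whose first field is exactly the requested address.
    sed 's/#.*//' "$hosts_file" | awk -v ip="$ip" '$1 == ip { print $2; exit }'
}
```

Running `canonical_name 192.168.0.1` before and after editing /etc/hosts shows whether the edit would change the name the agent reports.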

The log also suggests how to fix it:

Feb 15 22:03:18 server1.bj mesos-slave[1190]: If recovery failed due to a change in configuration and you want to
Feb 15 22:03:18 server1.bj mesos-slave[1190]: keep the current agent id, you might want to change the
Feb 15 22:03:18 server1.bj mesos-slave[1190]: `--reconfiguration_policy` flag to a more permissive value.
Feb 15 22:03:18 server1.bj mesos-slave[1190]:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: To restart this agent with a new agent id instead, do as follows:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: rm -f /var/lib/mesos/meta/slaves/latest
Feb 15 22:03:18 server1.bj mesos-slave[1190]: This ensures that the agent does not recover old live executors.

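The mesos agent checkpoints a slave.info file that contains the hostname. If the current hostname differs from the one recorded in slave.info, recovery fails with the error above. The file is a binary serialization, so cat prints mostly unreadable bytes, but the old hostname server1 is still visible: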

# cat /var/lib/mesos/meta/slaves/latest/slave.info
¥
server1
cpus @2*
mem ̀2*
disk  ~ᄇ*
ports"
↑2)

Fix

# rm -f /var/lib/mesos/meta/slaves/latest
# service mesos-slave start
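Note that, as the log message says, this makes the agent come back with a new agent id and it will not recover old live executors. A slightly more careful sketch of the same step, which records where the `latest` link pointed before removing it; `reset_agent_id` is a hypothetical helper, and the default path is the one from the log message:

```shell
#!/bin/sh
# Remove the "latest" link so the agent starts with a new agent id.
# Usage: reset_agent_id [meta_dir]   (hypothetical helper)
reset_agent_id() {
    meta_dir="${1:-/var/lib/mesos/meta}"
    latest="$meta_dir/slaves/latest"
    if [ -L "$latest" ]; then
        # Show which checkpoint directory the link pointed to, then drop the link.
        echo "removing $latest -> $(readlink "$latest")"
        rm -f "$latest"
    else
        echo "no symlink at $latest, nothing to do"
    fi
}
```

It would be used as `reset_agent_id && service mesos-slave start`. Only the link is removed; the old checkpoint directory itself is left in place.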

 

posted @ 2019-02-15 22:24  匠人先生