[Original] Dashu's Troubleshooting Notes (30): mesos agent fails to start: Failed to perform recovery: Incompatible agent info detected

The mesos agent failed to start and reported the following error:

Feb 15 22:03:18 server1.bj mesos-slave[1190]: E0215 22:03:18.622994 1192 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: Old agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1"
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: New agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1.bj"

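The log shows that the hostname changed from "server1" to "server1.bj". The cause was an edit to the hosts file that swapped the order of the two names: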

# cat /etc/hosts
192.168.0.1 server1 server1.bj
->
192.168.0.1 server1.bj server1
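
Why the order matters: in the hosts file format, the first name after the address is the canonical hostname and the remaining names are aliases. Assuming the agent is started without an explicit hostname and derives it from name resolution, swapping the two names changes what the agent reports. The sketch below only illustrates that rule; `canonical_name` is a hypothetical helper, not part of Mesos.

```shell
#!/bin/sh
# canonical_name: hypothetical helper, not a Mesos tool.
# Prints the canonical name (the first name after the address)
# for the given IP in a hosts-format file.
canonical_name() {
    ip="$1"
    hosts_file="${2:-/etc/hosts}"
    # Field 1 is the address, field 2 the canonical name,
    # any further fields are aliases.
    awk -v ip="$ip" '$1 == ip { print $2; exit }' "$hosts_file"
}
```

Running `canonical_name 192.168.0.1` before and after the edit would print `server1` and `server1.bj` respectively, which matches the old and new agent info in the log.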

The same log output also suggests how to resolve it:

Feb 15 22:03:18 server1.bj mesos-slave[1190]: If recovery failed due to a change in configuration and you want to
Feb 15 22:03:18 server1.bj mesos-slave[1190]: keep the current agent id, you might want to change the
Feb 15 22:03:18 server1.bj mesos-slave[1190]: `--reconfiguration_policy` flag to a more permissive value.
Feb 15 22:03:18 server1.bj mesos-slave[1190]:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: To restart this agent with a new agent id instead, do as follows:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: rm -f /var/lib/mesos/meta/slaves/latest
Feb 15 22:03:18 server1.bj mesos-slave[1190]: This ensures that the agent does not recover old live executors.

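The mesos agent persists a slave.info file that includes the hostname. If the hostname changes, so that it no longer matches the value stored in slave.info, recovery fails with this error. The file is binary, but the old hostname "server1" is still readable in it: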

# cat /var/lib/mesos/meta/slaves/latest/slave.info

server1
cpus @2*
mem ?2*
disk ~?*
ports"
??2)

Fix

# rm -f /var/lib/mesos/meta/slaves/latest
# service mesos-slave start
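
A slightly more cautious variant of the fix can be sketched as follows. It assumes that `latest` is a symlink to the directory holding the previous agent's state, and it prints the link target before removing the link, so there is a record of which agent id was dropped. `reset_agent_id` is a hypothetical helper name; the default path is the one from the log message.

```shell
#!/bin/sh
# reset_agent_id: hypothetical helper, not a Mesos tool.
# Removes the "latest" link so the agent registers with a new agent id.
reset_agent_id() {
    meta_dir="${1:-/var/lib/mesos/meta}"
    latest="$meta_dir/slaves/latest"
    if [ -L "$latest" ]; then
        # Record which agent directory the link pointed to before dropping it.
        echo "previous agent state: $(readlink "$latest")"
    fi
    # Same removal the log message recommends; the old state directory is kept.
    rm -f "$latest"
}
```

With the agent stopped, calling `reset_agent_id` and then `service mesos-slave start` follows the same steps as above. As the log notes, the agent then starts with a new agent id and does not recover old live executors.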
