[Original] Dashu's Troubleshooting Notes (30): mesos agent fails to start: Failed to perform recovery: Incompatible agent info detected
The mesos agent fails to start with the following error:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: E0215 22:03:18.622994 1192 slave.cpp:7311] EXIT with status 1: Failed to perform recovery: Incompatible agent info detected.
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: Old agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1"
...
Feb 15 22:03:18 server1.bj mesos-slave[1190]: ------------------------------------------------------------
Feb 15 22:03:18 server1.bj mesos-slave[1190]: New agent info:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: hostname: "server1.bj"
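The log shows that the hostname changed: the old agent info recorded "server1", while the agent now reports "server1.bj". The cause was an edit to the hosts file that swapped the order of the two names: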
# cat /etc/hosts
192.168.0.1 server1 server1.bj
->
192.168.0.1 server1.bj server1
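The effect of swapping the two names can be checked with a small sketch. In the hosts file format, the first name after the address is the canonical name and the rest are aliases, so reordering them changes what the machine's name resolves to. The helper name canonical_name below is hypothetical, and the assumption that the agent picks up its hostname through this kind of lookup is inferred from the log above rather than confirmed.

```shell
#!/bin/sh
# canonical_name HOSTS_FILE IP   (hypothetical helper, for illustration only)
# Prints the first name listed for IP in a hosts-format file.
# That first name is the canonical hostname; later names are aliases.
canonical_name() {
    awk -v ip="$2" '
        { sub(/#.*/, "") }                        # strip trailing comments
        $1 == ip && NF >= 2 { print $2; exit }    # first name after the address
    ' "$1"
}
```

Running canonical_name /etc/hosts 192.168.0.1 before and after the edit should print server1 and then server1.bj, which matches the old and new agent info in the log.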
The log also suggests how to fix it:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: If recovery failed due to a change in configuration and you want to
Feb 15 22:03:18 server1.bj mesos-slave[1190]: keep the current agent id, you might want to change the
Feb 15 22:03:18 server1.bj mesos-slave[1190]: `--reconfiguration_policy` flag to a more permissive value.
Feb 15 22:03:18 server1.bj mesos-slave[1190]:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: To restart this agent with a new agent id instead, do as follows:
Feb 15 22:03:18 server1.bj mesos-slave[1190]: rm -f /var/lib/mesos/meta/slaves/latest
Feb 15 22:03:18 server1.bj mesos-slave[1190]: This ensures that the agent does not recover old live executors.
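The mesos agent saves a slave.info file that includes the hostname. If the current hostname differs from the one stored in slave.info, the agent refuses to start with the error above. The file is binary, so cat prints mostly unreadable bytes, but the old hostname server1 is still visible: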
# cat /var/lib/mesos/meta/slaves/latest/slave.info
¥
server1
cpus @2*
mem ?2*
disk ~?*
ports"
??2)
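Since cat mixes the readable fields with raw bytes, a rough way to pull out only the readable parts is sketched below. The helper name printable_fields and the minimum length of 4 characters are arbitrary choices for illustration, not part of any mesos tooling.

```shell
#!/bin/sh
# printable_fields FILE   (hypothetical helper, for illustration only)
# Replaces every non-printable byte with a newline, then keeps only
# runs of at least 4 printable characters.
printable_fields() {
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | awk 'length($0) >= 4'
}
```

For example, printable_fields /var/lib/mesos/meta/slaves/latest/slave.info should list the stored hostname on its own line, which makes it easier to compare with the current one.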
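Fix: remove latest as the log suggests, so that the agent comes up with a new agent id, then start the agent again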
# rm -f /var/lib/mesos/meta/slaves/latest
# service mesos-slave start
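The two commands above can be wrapped in a slightly more defensive sketch that reports what it removes. The function name reset_agent_id is hypothetical, and it assumes latest is a symbolic link under the meta directory, as the path in the log message suggests. The directory it points to is left in place.

```shell
#!/bin/sh
# reset_agent_id META_DIR   (hypothetical helper, for illustration only)
# Removes META_DIR/slaves/latest so the next start uses a new agent id.
# Assumes latest is a symlink; the directory it points to is not touched.
reset_agent_id() {
    latest="$1/slaves/latest"
    if [ -L "$latest" ] || [ -e "$latest" ]; then
        rm -f "$latest" && echo "removed $latest"
    else
        echo "nothing to remove at $latest"
    fi
}

# Usage on the machine from this post (run as root):
#   reset_agent_id /var/lib/mesos/meta && service mesos-slave start
```

As the log message notes, restarting this way means the agent does not recover old live executors, so it is only appropriate when losing the old agent id is acceptable.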