After a power outage in the machine room, Ceph ran into problems starting up:

[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details

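How do you fix it? I don't know. After a round of fiddling it recovered on its own, but I don't know why it recovered, and I had not really changed anything.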

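Here is what I tried:
/var/log/ceph/ceph.log said the OSD had timed out. I checked the port the log said the OSD was connecting to, and the peer on that port did not exist.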

[root@node1 my-cluster]# systemctl restart ceph-osd@0
[root@node1 my-cluster]# systemctl restart ceph-mon@node1

Both commands reported the same error.
Could the restart interval be too short and be causing the problem? I edited the service file:

vim /etc/systemd/system/ceph-mon.target.wants/ceph-mon\@node1.service

I changed StartLimitInterval to 1min.
The service files of the other daemons were changed the same way.
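The file under `ceph-mon.target.wants` is a symlink to the installed unit file, so an edit there changes the packaged unit itself. A drop-in is one alternative, sketched below. The helper name `write_start_limit_dropin` and the file name `override.conf` are made up for illustration, and the sketch assumes an older systemd where `StartLimitInterval` belongs in the `[Service]` section. As the retry showed, this setting did not turn out to be the cause here.

```shell
# Write a systemd drop-in that overrides StartLimitInterval.
# $1: drop-in directory, e.g. /etc/systemd/system/ceph-mon@.service.d
# $2: interval value, e.g. 1min
# (hypothetical helper; the name and override.conf are illustrative)
write_start_limit_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nStartLimitInterval=%s\n' "$2" > "$1/override.conf"
}

# Example (as root), followed by a reload so systemd re-reads its unit files:
#   write_start_limit_dropin /etc/systemd/system/ceph-mon@.service.d 1min
#   systemctl daemon-reload
```

Whichever way the unit is changed, systemd only picks up the change after `systemctl daemon-reload`.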
I tried again, and it still reported "Transaction order is cyclic".
So it was time to track the problem down:

tail -f /var/log/messages
systemctl restart ceph-osd@0

/var/log/messages showed no errors.
I tried once more.

[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details
[root@node1 my-cluster]# journalctl |tail
5月 18 19:32:01 node1 CROND[20494]: (root) CMD (. /root/.bashrc;. ~/.bash_profile;. /etc/profile;/usr/bin/python /usr/local/yfs/yfsagent.py >/dev/null 2>&1 &)
5月 18 19:32:02 node1 polkitd[1120]: Registered Authentication Agent for unix-process:20594:832875 (system bus name :1.947 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8)
5月 18 19:32:02 node1 systemd[1]: Found ordering cycle on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd.target/restart
5月 18 19:32:02 node1 polkitd[1120]: Unregistered Authentication Agent for unix-process:20594:832875 (system bus name :1.947, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnected from bus)
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd@0.service/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-mon.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Unable to break cycle
5月 18 19:32:02 node1 systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.

I noticed that the OSD comes first in the start order, so I tried:

[root@node1 my-cluster]# systemctl restart ceph-osd@0.service

This time the command no longer reported an error.
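The cycle is easier to read once the unrelated journal lines are filtered out. The sketch below pulls only the unit names from the "Found ordering cycle on" and "Found dependency on" messages. The function name `extract_cycle` is made up for illustration, and the pattern assumes the message wording seen in the journal output above.

```shell
# Read journal text on stdin and print the units systemd listed while
# reporting an ordering cycle, one per line, in the order they were logged.
# (hypothetical helper; assumes the "Found ... on UNIT/JOB" message format)
extract_cycle() {
    sed -n -E 's#.*Found (ordering cycle|dependency) on ([^ /]+)/[a-z-]+$#\2#p'
}

# Example:
#   journalctl | tail -n 50 | extract_cycle
```

Applied to the journal lines above, this leaves the chain ceph.target, ceph-osd.target, ceph-osd@0.service, ceph-mon.target, and back to ceph.target.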

All in all, a strange problem.
Next time something similar comes up, I suggest debugging along these lines:
Check the logs:

journalctl -xe
tail -f /var/log/messages
tail -f /var/log/ceph/ceph.log 
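
Besides the logs, the ordering relations between the units named in the cycle can be inspected directly. This is a minimal sketch: the function name `show_ordering` is made up for illustration, and the unit names in the example are the ones from the journal output above.

```shell
# Print the ordering-related properties of each unit given as an argument.
# (hypothetical helper; it only wraps `systemctl show`)
show_ordering() {
    for unit in "$@"; do
        echo "== ${unit}"
        systemctl show -p After -p Before -p Wants -p PartOf "${unit}"
    done
}

# Example:
#   show_ordering ceph.target ceph-osd.target ceph-osd@0.service ceph-mon.target
```

A unit that appears in both the `After` and the `Before` chain of another unit is a likely place to look for the cycle.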

Other references on this problem (none of them matches my situation exactly):
https://tracker.ceph.com/issues/14839
https://github.com/ceph/ceph/pull/15835
https://github.com/ceph/ceph/pull/15051
https://tracker.ceph.com/issues/19910
https://tracker.ceph.com/issues/21035
https://tracker.ceph.com/issues/21477

posted on 2020-05-18 20:34 步孤天