Ceph reports "daemons have recently crashed"
Check the cluster status
$ ceph -s
  cluster:
    id:     b87d2535-406b-442d-8de2-49d86f7dc599
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph02,ceph03,ceph01 (age 2m)
    mgr: ceph02(active, since 3m), standbys: ceph01, ceph03
    mds: cephfs:1 {0=ceph02=up:active} 2 up:standby
    osd: 10 osds: 10 up (since 2m), 10 in (since 2w)
    rgw: 2 daemons active (ceph04, ceph05)

  task status:

  data:
    pools:   8 pools, 217 pgs
    objects: 562 objects, 869 MiB
    usage:   13 GiB used, 627 GiB / 640 GiB avail
    pgs:     217 active+clean
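
The same check can be scripted. The following is a minimal sketch, assuming the warning appears in `ceph health detail` under the health check code `RECENT_CRASH` and that the `ceph` CLI is on PATH with admin credentials. The function name `has_recent_crash` is illustrative only.

```shell
# Return 0 if the cluster currently reports the RECENT_CRASH health check,
# non-zero otherwise. Assumes `ceph` is on PATH with admin credentials.
has_recent_crash() {
    ceph health detail 2>/dev/null | grep -q 'RECENT_CRASH'
}

# Example use (commented out so that sourcing this file has no side effects):
# if has_recent_crash; then ceph crash ls-new; fi
```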
View the crash details
$ ceph crash ls-new
ID ENTITY NEW
2022-07-05T01:21:14.829333Z_94747800-d04b-423d-98a8-d9c815e01cde mon.ceph01 *
$ ceph crash info 2022-07-05T01:21:14.829333Z_94747800-d04b-423d-98a8-d9c815e01cde
{
"archived": "2022-07-06 03:10:59.892192",
"assert_condition": "abort",
"assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.12/rpm/el7/BUILD/ceph-15.2.12/src/kv/RocksDBStore.cc",
"assert_func": "virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)",
"assert_line": 1152,
"assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.12/rpm/el7/BUILD/ceph-15.2.12/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fab7396d340 time 2022-07-05T09:21:14.826144+0800\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.12/rpm/el7/BUILD/ceph-15.2.12/src/kv/RocksDBStore.cc: 1152: ceph_abort_msg(\"Bad table magic number: expected 9863518390377041911, found 183618458615808 in /var/lib/ceph/mon/ceph-ceph01/store.db/005441.sst\")\n",
"assert_thread_name": "ceph-mon",
"backtrace": [
"(()+0xf5d0) [0x7fab6899e5d0]",
"(gsignal()+0x37) [0x7fab67793207]",
"(abort()+0x148) [0x7fab677948f8]",
"(ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x1a7) [0x7fab6ab94eaa]",
"(RocksDBStore::get(std::string const&, std::string const&, ceph::buffer::v15_2_0::list*)+0x3c0) [0x565479428970]",
"(main()+0xfdc) [0x56547919599c]",
"(__libc_start_main()+0xf5) [0x7fab6777f3d5]",
"(()+0x230a20) [0x5654791c7a20]"
],
"ceph_version": "15.2.12",
"crash_id": "2022-07-05T01:21:14.829333Z_94747800-d04b-423d-98a8-d9c815e01cde",
"entity_name": "mon.ceph01",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "7 (Core)",
"os_version_id": "7",
"process_name": "ceph-mon",
"stack_sig": "b6e84a1f780236828ff189c9a70d25ae768d831ec7b575c6fd0a45a5090ff6c2",
"timestamp": "2022-07-05T01:21:14.829333Z",
"utsname_hostname": "ceph01.ecloud.com",
"utsname_machine": "x86_64",
"utsname_release": "3.10.0-957.el7.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Thu Nov 8 23:39:32 UTC 2018"
}
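
When several daemons have crashed, the two commands above can be combined. This sketch assumes that `ceph crash ls-new` prints the `ID ENTITY NEW` header row followed by one crash per line, as in the output shown earlier. The helper names `new_crash_ids` and `show_new_crashes` are hypothetical.

```shell
# Print the IDs of crashes that have not been archived yet, one per line.
# Skips the header row and any blank lines.
new_crash_ids() {
    ceph crash ls-new | awk 'NR > 1 && NF { print $1 }'
}

# Print the full report for every new crash, separated by a marker line.
show_new_crashes() {
    for id in $(new_crash_ids); do
        echo "== $id"
        ceph crash info "$id"
    done
}
```

In the report above, the most useful field is `assert_msg`: its last line names the failing check ("Bad table magic number") and the affected file under `/var/lib/ceph/mon/ceph-ceph01/store.db/`.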
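Once you have confirmed that the crashed daemon has recovered, archive the crash report manually to clear the warning. Archiving only acknowledges the report; it does not repair the daemon.
Clear the warning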
$ ceph crash archive 2022-07-05T01:21:14.829333Z_94747800-d04b-423d-98a8-d9c815e01cde
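
If several reports are pending, `ceph crash archive-all` archives them in one step. Below is a minimal sketch that archives everything and then verifies that `ceph crash ls-new` has no rows left. The function name `archive_all_crashes` is an assumption made for illustration.

```shell
# Archive every new crash report, then confirm that ls-new is empty.
# Returns 0 only if no unarchived crashes remain.
archive_all_crashes() {
    ceph crash archive-all || return 1
    # Count data rows after the header; exit non-zero if any are left.
    ceph crash ls-new | awk 'NR > 1 && NF { n++ } END { exit (n > 0) }'
}
```

Archived reports are not deleted and remain visible through `ceph crash ls`. As far as I know, the warning also expires by itself once the crash is older than the `mgr/crash/warn_recent_interval` setting (two weeks by default).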