Built-in Prometheus Monitoring for a Ceph Cluster (19.2.x, Squid)
Author: 尹正杰
Copyright notice: original work, reproduction prohibited; violators will be held legally responsible.
I. Ceph cluster maintenance commands
1. View the services in the Ceph cluster
[root@ceph141 ~]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 6m ago 3d count:1
ceph-exporter 3/3 6m ago 3d *
crash 3/3 6m ago 3d *
grafana ?:3000 1/1 6m ago 3d count:1
mds.yinzhengjie-cephfs 2/2 6m ago 4h count:2
mgr 2/2 6m ago 3d count:2
mon 3/5 6m ago 3d count:5
node-exporter ?:9100 2/3 6m ago 3d *
osd 7 6m ago - <unmanaged>
prometheus ?:9095 1/1 6m ago 3d count:1
rgw.yinzhengjie ?:80 1/1 6m ago 3h ceph142
[root@ceph141 ~]#
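If the full listing is noisy, ceph orch ls accepts filters and can also dump the service specifications. A few variants that should work on any cephadm-managed cluster (the service name below is the RGW service from this cluster; substitute your own):
ceph orch ls --service_type mgr
ceph orch ls --service_name rgw.yinzhengjie --format yaml
ceph orch ls --export > /tmp/service-specs.yaml    # export all service specs for backup or later re-apply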
2. View the daemons in the Ceph cluster
[root@ceph141 ~]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph141 ceph141 *:9093,9094 running (3d) 7m ago 3d 33.5M - 0.25.0 c8568f914cd2 39916714f5c4
ceph-exporter.ceph141 ceph141 running (3d) 7m ago 3d 15.5M - 19.2.0 37996728e013 e83682a704c5
ceph-exporter.ceph142 ceph142 running (4h) 7m ago 3d 17.5M - 19.2.0 37996728e013 794faf7c83be
ceph-exporter.ceph143 ceph143 running (3d) 3m ago 3d 5816k - 19.2.0 37996728e013 ef35746c2790
crash.ceph141 ceph141 running (3d) 7m ago 3d 6011k - 19.2.0 37996728e013 239196bc2138
crash.ceph142 ceph142 running (4h) 7m ago 3d 10.7M - 19.2.0 37996728e013 c244607c0ac1
crash.ceph143 ceph143 running (3d) 3m ago 3d 7084k - 19.2.0 37996728e013 2c7cf4d86cec
grafana.ceph141 ceph141 *:3000 running (3d) 7m ago 3d 120M - 9.4.12 2bacad6d85d8 fc2aa29756db
mds.yinzhengjie-cephfs.ceph141.xafiwn ceph141 running (4h) 7m ago 4h 28.4M - 19.2.0 37996728e013 07be7955b835
mds.yinzhengjie-cephfs.ceph142.fgxhvp ceph142 running (4h) 7m ago 4h 22.4M - 19.2.0 37996728e013 abbc0cea25b6
mgr.ceph141.bszrgd ceph141 *:9283,8765,8443 running (3d) 7m ago 3d 426M - 19.2.0 37996728e013 a9ffef88ddd6
mgr.ceph143.ihhymg ceph143 *:8443,9283,8765 running (3d) 3m ago 3d 390M - 19.2.0 37996728e013 07c3adf66618
mon.ceph141 ceph141 running (3d) 7m ago 3d 473M 2048M 19.2.0 37996728e013 127756431a52
mon.ceph142 ceph142 running (4h) 7m ago 3d 277M 2048M 19.2.0 37996728e013 5c75c9081b83
mon.ceph143 ceph143 running (3d) 3m ago 3d 461M 2048M 19.2.0 37996728e013 87811f5e96d8
node-exporter.ceph141 ceph141 *:9100 running (3d) 7m ago 3d 18.4M - 1.5.0 0da6a335fe13 143c40f25383
node-exporter.ceph142 ceph142 *:9100 running (4h) 7m ago 3d 8647k - 1.5.0 0da6a335fe13 392ca160b1e1
node-exporter.ceph143 ceph143 *:9100 error 3m ago 3d - - <unknown> <unknown> <unknown>
osd.0 ceph141 running (3d) 7m ago 3d 89.2M 4096M 19.2.0 37996728e013 cb2344cd948b
osd.1 ceph141 running (3d) 7m ago 3d 132M 4096M 19.2.0 37996728e013 4a448a012065
osd.2 ceph142 running (4h) 7m ago 3d 89.7M 4096M 19.2.0 37996728e013 f6d2f5539f9e
osd.3 ceph142 running (4h) 7m ago 3d 79.0M 4096M 19.2.0 37996728e013 0f0f2c9f7ef1
osd.4 ceph142 running (4h) 7m ago 3d 125M 4096M 19.2.0 37996728e013 e0688f657e6b
osd.5 ceph143 running (3d) 3m ago 3d 117M 4096M 19.2.0 37996728e013 7f43160e7730
osd.6 ceph143 running (3d) 3m ago 3d 133M 4096M 19.2.0 37996728e013 30dce89758bf
prometheus.ceph141 ceph141 *:9095 running (3d) 7m ago 3d 112M - 2.43.0 a07b618ecd1d e36977d98fe5
rgw.yinzhengjie.ceph142.qxznra ceph142 *:80 running (3h) 7m ago 3h 156M - 19.2.0 37996728e013 623284cc5e9d
[root@ceph141 ~]#
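ceph orch ps also takes filters, which is usually faster than scanning the full table. For example, to show only the OSD daemons, or to force a refresh of the cached state while looking at the mgr daemons:
ceph orch ps --daemon_type osd
ceph orch ps --daemon_type mgr --refresh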
3. View the daemons on a specific node
[root@ceph141 ~]# ceph orch ps ceph143
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
ceph-exporter.ceph143 ceph143 running (3d) 5m ago 3d 5816k - 19.2.0 37996728e013 ef35746c2790
crash.ceph143 ceph143 running (3d) 5m ago 3d 7084k - 19.2.0 37996728e013 2c7cf4d86cec
mgr.ceph143.ihhymg ceph143 *:8443,9283,8765 running (3d) 5m ago 3d 390M - 19.2.0 37996728e013 07c3adf66618
mon.ceph143 ceph143 running (3d) 5m ago 3d 461M 2048M 19.2.0 37996728e013 87811f5e96d8
node-exporter.ceph143 ceph143 *:9100 error 5m ago 3d - - <unknown> <unknown> <unknown>
osd.5 ceph143 running (3d) 5m ago 3d 117M 4096M 19.2.0 37996728e013 7f43160e7730
osd.6 ceph143 running (3d) 5m ago 3d 133M 4096M 19.2.0 37996728e013 30dce89758bf
[root@ceph141 ~]#
4. Restart a daemon on a specific node
[root@ceph141 ~]# ceph orch daemon restart node-exporter.ceph143
Scheduled to restart node-exporter.ceph143 on host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps ceph143
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
ceph-exporter.ceph143 ceph143 running (3d) 5s ago 3d 5820k - 19.2.0 37996728e013 ef35746c2790
crash.ceph143 ceph143 running (3d) 5s ago 3d 7084k - 19.2.0 37996728e013 2c7cf4d86cec
mgr.ceph143.ihhymg ceph143 *:8443,9283,8765 running (3d) 5s ago 3d 390M - 19.2.0 37996728e013 07c3adf66618
mon.ceph143 ceph143 running (3d) 5s ago 3d 457M 2048M 19.2.0 37996728e013 87811f5e96d8
node-exporter.ceph143 ceph143 *:9100 running 5s ago 3d - - <unknown> <unknown> <unknown>
osd.5 ceph143 running (3d) 5s ago 3d 117M 4096M 19.2.0 37996728e013 7f43160e7730
osd.6 ceph143 running (3d) 5s ago 3d 134M 4096M 19.2.0 37996728e013 30dce89758bf
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps ceph143
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
ceph-exporter.ceph143 ceph143 running (3d) 30s ago 3d 5816k - 19.2.0 37996728e013 ef35746c2790
crash.ceph143 ceph143 running (3d) 30s ago 3d 7063k - 19.2.0 37996728e013 2c7cf4d86cec
mgr.ceph143.ihhymg ceph143 *:8443,9283,8765 running (3d) 30s ago 3d 389M - 19.2.0 37996728e013 07c3adf66618
mon.ceph143 ceph143 running (3d) 30s ago 3d 457M 2048M 19.2.0 37996728e013 87811f5e96d8
node-exporter.ceph143 ceph143 *:9100 running (35s) 30s ago 3d 2515k - 1.5.0 0da6a335fe13 ce23389f20e6
osd.5 ceph143 running (3d) 30s ago 3d 116M 4096M 19.2.0 37996728e013 7f43160e7730
osd.6 ceph143 running (3d) 30s ago 3d 134M 4096M 19.2.0 37996728e013 30dce89758bf
[root@ceph141 ~]#
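Restarting a single daemon is the finest-grained option. The orchestrator can also restart an entire service at once, or redeploy one daemon whose container looks wedged (redeploy recreates the container instead of just restarting it):
ceph orch restart node-exporter                     # restart every node-exporter daemon in the cluster
ceph orch daemon redeploy node-exporter.ceph143     # recreate just this daemon's container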
5. List the devices on each host
[root@ceph141 ~]# ceph orch device ls
HOST PATH TYPE DEVICE ID SIZE AVAILABLE REFRESHED REJECT REASONS
ceph141 /dev/sdb hdd 300G No 18m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph141 /dev/sdc hdd 500G No 18m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph141 /dev/sr0 hdd VMware_Virtual_SATA_CDRW_Drive_01000000000000000001 1023M No 18m ago Failed to determine if device is BlueStore, Insufficient space (<5GB)
ceph142 /dev/sdb hdd 300G No 11m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph142 /dev/sdc hdd 500G No 11m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph142 /dev/sdd hdd 1024G No 11m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph142 /dev/sr0 hdd VMware_Virtual_SATA_CDRW_Drive_01000000000000000001 1023M No 11m ago Failed to determine if device is BlueStore, Insufficient space (<5GB)
ceph143 /dev/sdb hdd 300G No 8m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph143 /dev/sdc hdd 500G No 8m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
ceph143 /dev/sr0 hdd VMware_Virtual_SATA_CDRW_Drive_01000000000000000001 1023M No 8m ago Failed to determine if device is BlueStore, Insufficient space (<5GB)
[root@ceph141 ~]#
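The inventory above is cached, so a forced refresh (and the wider output) helps when deciding whether a disk can be reused. The "Has a FileSystem / LVM detected" rejections are normally cleared by zapping the device, which wipes it completely; the device path below is a placeholder, and you should never zap a disk that still backs an OSD:
ceph orch device ls --wide --refresh
ceph orch device zap ceph143 /dev/sdX --force    # DESTRUCTIVE: wipes /dev/sdX on ceph143 (placeholder path)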
6. List the hosts in the cluster
[root@ceph141 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph141 10.0.0.141 _admin
ceph142 10.0.0.142
ceph143 10.0.0.143
3 hosts in cluster
[root@ceph141 ~]#
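Host labels, such as the _admin label on ceph141 above, are managed through the orchestrator as well and are commonly used as placement targets in service specs. For example, to make ceph142 an additional admin host and later remove the label again:
ceph orch host label add ceph142 _admin
ceph orch host label rm ceph142 _admin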
7. Report the configured backend and its status
[root@ceph141 ~]# ceph orch status
Backend: cephadm
Available: Yes
Paused: No
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch status --detail
Backend: cephadm
Available: Yes
Paused: No
Host Parallelism: 10
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch status --detail --format json
{"available": true, "backend": "cephadm", "paused": false, "workers": 10}
[root@ceph141 ~]#
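The "Paused" field refers to the orchestrator's background activity. It can be toggled when you want cephadm to stop deploying, removing, or reconfiguring daemons for a while, for instance during host maintenance:
ceph orch pause      # suspend orchestrator background work (verify with: ceph orch status)
ceph orch resume     # resume normal operation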
8. Check service versions against the available and target container images
[root@ceph141 ~]# docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/ceph/ceph v19 37996728e013 2 months ago 1.28GB
quay.io/ceph/grafana 9.4.12 2bacad6d85d8 18 months ago 330MB
quay.io/prometheus/prometheus v2.43.0 a07b618ecd1d 20 months ago 234MB
quay.io/prometheus/alertmanager v0.25.0 c8568f914cd2 23 months ago 65.1MB
quay.io/prometheus/node-exporter v1.5.0 0da6a335fe13 2 years ago 22.5MB
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch upgrade check quay.io/ceph/ceph:v19
{
"needs_update": {},
"non_ceph_image_daemons": [
"prometheus.ceph141",
"grafana.ceph141",
"node-exporter.ceph141",
"alertmanager.ceph141",
"node-exporter.ceph142",
"node-exporter.ceph143"
],
"target_digest": "quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a",
"target_id": "37996728e013360fed4bcdfab53aacf63ee07216cc3b2d8def8ee4d3785da829",
"target_name": "quay.io/ceph/ceph:v19",
"target_version": "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)",
"up_to_date": [
"osd.1",
"crash.ceph141",
"mds.yinzhengjie-cephfs.ceph141.xafiwn",
"osd.0",
"mgr.ceph141.bszrgd",
"mon.ceph141",
"ceph-exporter.ceph141",
"osd.4",
"mds.yinzhengjie-cephfs.ceph142.fgxhvp",
"osd.2",
"ceph-exporter.ceph142",
"mon.ceph142",
"crash.ceph142",
"rgw.yinzhengjie.ceph142.qxznra",
"osd.3",
"osd.6",
"mgr.ceph143.ihhymg",
"mon.ceph143",
"ceph-exporter.ceph143",
"crash.ceph143",
"osd.5"
]
}
[root@ceph141 ~]#
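The check above only reports what would change; the upgrade itself is driven by the same orchestrator. A minimal sequence, reusing the image already referenced above (status follows the progress, stop aborts cleanly):
ceph orch upgrade start --image quay.io/ceph/ceph:v19
ceph orch upgrade status
ceph orch upgrade stop     # only if you need to abort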
9. View information about a specific service
[root@ceph141 ~]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 4m ago 3d count:1
ceph-exporter 3/3 4m ago 3d *
crash 3/3 4m ago 3d *
grafana ?:3000 1/1 4m ago 3d count:1
mds.yinzhengjie-cephfs 2/2 4m ago 4h count:2
mgr 2/2 4m ago 3d count:2
mon 3/5 4m ago 3d count:5
node-exporter ?:9100 3/3 4m ago 3d *
osd 7 4m ago - <unmanaged>
prometheus ?:9095 1/1 4m ago 3d count:1
rgw.yinzhengjie ?:80 1/1 4m ago 3h ceph142
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name alertmanager
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph141 ceph141 *:9093,9094 running (3d) 4m ago 3d 33.6M - 0.25.0 c8568f914cd2 39916714f5c4
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name mon
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
mon.ceph141 ceph141 running (3d) 5m ago 3d 473M 2048M 19.2.0 37996728e013 127756431a52
mon.ceph142 ceph142 running (4h) 5m ago 3d 298M 2048M 19.2.0 37996728e013 5c75c9081b83
mon.ceph143 ceph143 running (3d) 3m ago 3d 463M 2048M 19.2.0 37996728e013 87811f5e96d8
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name mon --format json-pretty
[
{
"container_id": "127756431a52",
"container_image_digests": [
"quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a"
],
"container_image_id": "37996728e013360fed4bcdfab53aacf63ee07216cc3b2d8def8ee4d3785da829",
"container_image_name": "quay.io/ceph/ceph:v19",
"cpu_percentage": "0.80%",
"created": "2024-12-03T03:38:36.969252Z",
"daemon_id": "ceph141",
"daemon_name": "mon.ceph141",
"daemon_type": "mon",
"events": [
"2024-12-03T04:26:45.427264Z daemon:mon.ceph141 [INFO] \"Reconfigured mon.ceph141 on host 'ceph141'\""
],
"hostname": "ceph141",
"is_active": false,
"last_refresh": "2024-12-06T07:26:07.589268Z",
"memory_request": 2147483648,
"memory_usage": 496081305,
"ports": [],
"service_name": "mon",
"started": "2024-12-03T03:38:38.668866Z",
"status": 1,
"status_desc": "running",
"version": "19.2.0"
},
{
"container_id": "5c75c9081b83",
"container_image_digests": [
"quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a"
],
"container_image_id": "37996728e013360fed4bcdfab53aacf63ee07216cc3b2d8def8ee4d3785da829",
"container_image_name": "quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a",
"cpu_percentage": "0.41%",
"created": "2024-12-03T04:24:27.453259Z",
"daemon_id": "ceph142",
"daemon_name": "mon.ceph142",
"daemon_type": "mon",
"events": [
"2024-12-03T04:24:27.546563Z daemon:mon.ceph142 [INFO] \"Deployed mon.ceph142 on host 'ceph142'\"",
"2024-12-03T04:26:48.953106Z daemon:mon.ceph142 [INFO] \"Reconfigured mon.ceph142 on host 'ceph142'\""
],
"hostname": "ceph142",
"is_active": false,
"last_refresh": "2024-12-06T07:26:07.024367Z",
"memory_request": 2147483648,
"memory_usage": 313419366,
"ports": [],
"service_name": "mon",
"started": "2024-12-06T02:44:29.769729Z",
"status": 1,
"status_desc": "running",
"version": "19.2.0"
},
{
"container_id": "87811f5e96d8",
"container_image_digests": [
"quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a"
],
"container_image_id": "37996728e013360fed4bcdfab53aacf63ee07216cc3b2d8def8ee4d3785da829",
"container_image_name": "quay.io/ceph/ceph@sha256:200087c35811bf28e8a8073b15fa86c07cce85c575f1ccd62d1d6ddbfdc6770a",
"cpu_percentage": "0.47%",
"created": "2024-12-03T04:26:26.584745Z",
"daemon_id": "ceph143",
"daemon_name": "mon.ceph143",
"daemon_type": "mon",
"events": [
"2024-12-03T04:26:26.653445Z daemon:mon.ceph143 [INFO] \"Deployed mon.ceph143 on host 'ceph143'\"",
"2024-12-03T04:26:51.807015Z daemon:mon.ceph143 [INFO] \"Reconfigured mon.ceph143 on host 'ceph143'\""
],
"hostname": "ceph143",
"is_active": false,
"last_refresh": "2024-12-06T07:28:14.561826Z",
"memory_request": 2147483648,
"memory_usage": 485700403,
"ports": [],
"service_name": "mon",
"started": "2024-12-03T04:26:26.672043Z",
"status": 1,
"status_desc": "running",
"version": "19.2.0"
}
]
[root@ceph141 ~]#
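Because the JSON output is machine-readable, it lends itself to quick ad-hoc reports. A small sketch, assuming jq is installed on the admin node, printing each mon's host and current memory usage in bytes:
ceph orch ps --service_name mon --format json | jq -r '.[] | "\(.hostname) \(.memory_usage)"'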
II. Monitoring the Ceph cluster
1. View the cluster's monitoring architecture
[root@ceph141 ~]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 2m ago 3d count:1
ceph-exporter 3/3 2m ago 3d *
crash 3/3 2m ago 3d *
grafana ?:3000 1/1 2m ago 3d count:1
mds.yinzhengjie-cephfs 2/2 2m ago 5h count:2
mgr 2/2 2m ago 3d count:2
mon 3/5 2m ago 3d count:5
node-exporter ?:9100 3/3 2m ago 3d *
osd 7 2m ago - <unmanaged>
prometheus ?:9095 1/1 2m ago 3d count:1
rgw.yinzhengjie ?:80 1/1 2m ago 3h ceph142
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name alertmanager
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.ceph141 ceph141 *:9093,9094 running (3d) 3m ago 3d 32.7M - 0.25.0 c8568f914cd2 39916714f5c4
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name grafana
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
grafana.ceph141 ceph141 *:3000 running (3d) 3m ago 3d 117M - 9.4.12 2bacad6d85d8 fc2aa29756db
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name node-exporter
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
node-exporter.ceph141 ceph141 *:9100 running (3d) 3m ago 3d 17.8M - 1.5.0 0da6a335fe13 143c40f25383
node-exporter.ceph142 ceph142 *:9100 running (4h) 3m ago 3d 8976k - 1.5.0 0da6a335fe13 392ca160b1e1
node-exporter.ceph143 ceph143 *:9100 running (22m) 84s ago 3d 8064k - 1.5.0 0da6a335fe13 ce23389f20e6
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name prometheus
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
prometheus.ceph141 ceph141 *:9095 running (3d) 3m ago 3d 115M - 2.43.0 a07b618ecd1d e36977d98fe5
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch ps --service_name ceph-exporter
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
ceph-exporter.ceph141 ceph141 running (3d) 3m ago 3d 15.5M - 19.2.0 37996728e013 e83682a704c5
ceph-exporter.ceph142 ceph142 running (4h) 3m ago 3d 17.5M - 19.2.0 37996728e013 794faf7c83be
ceph-exporter.ceph143 ceph143 running (3d) 103s ago 3d 5816k - 19.2.0 37996728e013 ef35746c2790
[root@ceph141 ~]#
Tip:
As the output above shows, components such as Grafana, Alertmanager, ceph-exporter, and Prometheus are all deployed by default; in other words, there is nothing to install by hand.
In a cephadm-deployed environment you can therefore use Prometheus monitoring right away, whereas a cluster deployed with ceph-deploy requires configuring each of these components manually.
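To confirm where the mgr-hosted endpoints live, the mgr can report the URLs of its modules that expose a service (typically dashboard and prometheus); the WebUI addresses used in the following subsections can be cross-checked against this output:
ceph mgr services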
2. View the Prometheus WebUI
http://10.0.0.141:9095/targets?search=
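Prometheus can also be queried from the shell through its HTTP API, which is convenient for scripted health checks. A sketch using one of the cluster-level metrics it scrapes (ceph_health_status, where 0 normally corresponds to HEALTH_OK):
curl -s 'http://10.0.0.141:9095/api/v1/query?query=ceph_health_status'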
3. View the Grafana WebUI
https://10.0.0.141:3000/
4. View node-exporter
http://10.0.0.141:9100/metrics
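The metrics endpoint is plain text, so a quick check from the shell is often enough, for example pulling just the load-average series:
curl -s http://10.0.0.141:9100/metrics | grep '^node_load'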
5. View Alertmanager
http://10.0.0.141:9093/#/status
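Alertmanager 0.25 also exposes a v2 REST API (part of upstream Alertmanager, not anything Ceph-specific); listing the alerts it currently holds looks like this:
curl -s http://10.0.0.141:9093/api/v2/alerts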
III. Reference links for building your own Prometheus monitoring
Recommended reading:
https://github.com/digitalocean/ceph_exporter
https://github.com/blemmenes/radosgw_usage_exporter
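If you run one of these external exporters instead of, or alongside, the built-in stack, your own Prometheus needs a scrape job pointing at it. A minimal sketch, assuming the digitalocean/ceph_exporter default listen port 9128 (check the exporter's README for the port your version actually uses) and that you merge the block under scrape_configs in your prometheus.yml:
cat > /tmp/external-ceph-exporter-scrape.yml <<'EOF'
# merge under scrape_configs: in prometheus.yml
- job_name: 'external-ceph-exporter'
  static_configs:
    - targets: ['10.0.0.141:9128']   # address/port are assumptions; point at wherever the exporter runs
EOF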