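Ceph Lab: Lesson 4: Ceph Monitoring
This lesson demonstrates how to monitor a Ceph cluster using the ceph command-line tool.
Monitoring the overall cluster state
- Health status
The health subcommand of ceph reports the health of the cluster; adding detail lists the cause of each warning.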
# ceph health detail
HEALTH_WARN clock skew detected on mon.ceph-node2; Monitor clock skew detected
mon.ceph-node2 addr 192.168.1.121:6789/0 clock skew 0.194536s > max 0.05s (latency 0.00103766s)
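HEALTH_WARN means the cluster is in a warning state. Here the cause is the clock on mon.ceph-node2, which is off by 0.194536s against an allowed maximum of 0.05s.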
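A monitoring script usually needs only the status word and the list of reasons from this output. The following is a minimal sketch that parses the text shown above; the helper names and the mapping to exit codes (borrowed from the common Nagios plugin convention) are assumptions for illustration, not part of Ceph.

```python
# Sample taken from the `ceph health detail` output above.
SAMPLE = (
    "HEALTH_WARN clock skew detected on mon.ceph-node2; Monitor clock skew detected\n"
    "mon.ceph-node2 addr 192.168.1.121:6789/0 clock skew 0.194536s > max 0.05s "
    "(latency 0.00103766s)"
)

# Assumed mapping: 0 = OK, 1 = warning, 2 = critical, 3 = unknown.
EXIT_CODES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def parse_health(text):
    """Return (status, messages) from the first line of plain-text health output."""
    lines = text.strip().splitlines()
    first = lines[0] if lines else ""
    status, _, rest = first.partition(" ")
    # Individual reasons are separated by semicolons on the summary line.
    messages = [m.strip() for m in rest.split(";") if m.strip()]
    return status, messages


def exit_code(status):
    """Map a health status word to a numeric code; unknown words map to 3."""
    return EXIT_CODES.get(status, 3)


status, messages = parse_health(SAMPLE)
print(status, exit_code(status))
for message in messages:
    print(" -", message)
```

In practice the text would come from running the command (for example via subprocess) rather than from a constant.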
- Cluster event monitoring
The -w option of ceph watches cluster events. It displays all cluster event messages in real time.
# ceph -w
cluster b6e0c604-c9dd-4eac-9b00-45ff246d64fb
health HEALTH_WARN
clock skew detected on mon.ceph-node2
Monitor clock skew detected
monmap e2: 2 mons at {ceph-node1=192.168.1.120:6789/0,ceph-node2=192.168.1.121:6789/0}
election epoch 24, quorum 0,1 ceph-node1,ceph-node2
osdmap e28: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v99: 0 pgs, 0 pools, 0 bytes data, 0 objects
197 MB used, 91895 MB / 92093 MB avail
2017-03-16 12:00:00.000245 mon.0 [INF] HEALTH_WARN; clock skew detected on mon.ceph-node2; Monitor clock skew detected
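- Cluster storage usage statistics
The df subcommand of ceph reports storage usage statistics for the cluster, including the total capacity, the remaining available capacity, and the percentage of capacity used.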
# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
92093M 91895M 197M 0.21
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
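The %RAW USED column is simply RAW USED divided by SIZE. As a rough check, the sketch below recomputes it from the GLOBAL row above; the parse_mb helper is hypothetical and assumes every value is reported with an M suffix, which will not hold on larger clusters.

```python
# GLOBAL row copied from the `ceph df` output above: SIZE, AVAIL, RAW USED, %RAW USED.
GLOBAL_ROW = "92093M 91895M 197M 0.21"


def parse_mb(field):
    """Convert a field such as '92093M' to an integer number of megabytes."""
    if not field.endswith("M"):
        raise ValueError("this sketch only handles values reported in M")
    return int(field[:-1])


size, avail, raw_used = (parse_mb(f) for f in GLOBAL_ROW.split()[:3])

# Percentage of raw capacity in use, rounded to two decimals like the CLI output.
percent = round(100.0 * raw_used / size, 2)
print(percent)
```

With the values shown, 197 / 92093 comes to about 0.21 percent, matching the reported column.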
- Cluster status
The status subcommand, or its short form -s, shows the status of the cluster.
# ceph -s
cluster b6e0c604-c9dd-4eac-9b00-45ff246d64fb
health HEALTH_WARN
clock skew detected on mon.ceph-node2
Monitor clock skew detected
monmap e2: 2 mons at {ceph-node1=192.168.1.120:6789/0,ceph-node2=192.168.1.121:6789/0}
election epoch 24, quorum 0,1 ceph-node1,ceph-node2
osdmap e28: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v99: 0 pgs, 0 pools, 0 bytes data, 0 objects
197 MB used, 91895 MB / 92093 MB avail
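The warning above is caused by clock skew between the monitors. After the system clock is synchronized with an NTP server, the cluster returns to HEALTH_OK:
# ntpdate pool.ntp.org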
17 Mar 04:18:56 ntpdate[20268]: adjust time server 61.216.153.106 offset 0.000257 sec
# ceph -s
cluster b6e0c604-c9dd-4eac-9b00-45ff246d64fb
health HEALTH_OK
monmap e2: 2 mons at {ceph-node1=192.168.1.120:6789/0,ceph-node2=192.168.1.121:6789/0}
election epoch 26, quorum 0,1 ceph-node1,ceph-node2
osdmap e30: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v106: 0 pgs, 0 pools, 0 bytes data, 0 objects
198 MB used, 91895 MB / 92093 MB avail
Monitoring the monitors
- Checking MON status
Use the mon stat or mon dump subcommand of ceph to view the monitor status and the monitor map.
# ceph mon stat
e2: 2 mons at {ceph-node1=192.168.1.120:6789/0,ceph-node2=192.168.1.121:6789/0}, election epoch 26, quorum 0,1 ceph-node1,ceph-node2
# ceph mon dump
dumped monmap epoch 2
epoch 2
fsid b6e0c604-c9dd-4eac-9b00-45ff246d64fb
last_changed 2017-03-16 10:10:30.700537
created 2017-03-16 09:56:25.645057
0: 192.168.1.120:6789/0 mon.ceph-node1
1: 192.168.1.121:6789/0 mon.ceph-node2
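- Checking the MON quorum status
For I/O operations, a client first tries to connect to the leader mon in the quorum. If the leader is unavailable, the client tries the other mons in turn.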
# ceph quorum_status -f json-pretty
{
"election_epoch": 26,
"quorum": [
0,
1
],
"quorum_names": [
"ceph-node1",
"ceph-node2"
],
"quorum_leader_name": "ceph-node1",
"monmap": {
"epoch": 2,
"fsid": "b6e0c604-c9dd-4eac-9b00-45ff246d64fb",
"modified": "2017-03-16 10:10:30.700537",
"created": "2017-03-16 09:56:25.645057",
"mons": [
{
"rank": 0,
"name": "ceph-node1",
"addr": "192.168.1.120:6789\/0"
},
{
"rank": 1,
"name": "ceph-node2",
"addr": "192.168.1.121:6789\/0"
}
]
}
}
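Because the command can emit JSON, the quorum state is easy to check from a script. The sketch below works on a trimmed copy of the output above (the addr fields are omitted); summarize_quorum is a hypothetical helper that reports the leader, any monitors missing from the quorum, and whether the quorum members form a strict majority.

```python
import json

# Trimmed copy of the `ceph quorum_status -f json-pretty` output above.
SAMPLE = """
{
  "election_epoch": 26,
  "quorum": [0, 1],
  "quorum_names": ["ceph-node1", "ceph-node2"],
  "quorum_leader_name": "ceph-node1",
  "monmap": {
    "epoch": 2,
    "mons": [
      {"rank": 0, "name": "ceph-node1"},
      {"rank": 1, "name": "ceph-node2"}
    ]
  }
}
"""


def summarize_quorum(text):
    """Summarize leader, missing monitors, and majority from quorum_status JSON."""
    status = json.loads(text)
    all_mons = {m["name"] for m in status["monmap"]["mons"]}
    in_quorum = set(status["quorum_names"])
    return {
        "leader": status["quorum_leader_name"],
        "missing": sorted(all_mons - in_quorum),
        # A quorum needs strictly more than half of the monitors in the monmap.
        "has_majority": len(in_quorum) > len(all_mons) // 2,
    }


summary = summarize_quorum(SAMPLE)
print(summary)
```

With only two monitors, as in this lab, a majority requires both of them, so losing either one loses the quorum.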
Monitoring OSDs
- OSD tree
The OSD tree shows all OSDs on each node and their positions in the CRUSH map.
# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08752 root default
-2 0.04376 host ceph-node1
0 0.01459 osd.0 up 1.00000 1.00000
1 0.01459 osd.1 up 1.00000 1.00000
2 0.01459 osd.2 up 1.00000 1.00000
-3 0.04376 host ceph-node2
3 0.01459 osd.3 up 1.00000 1.00000
4 0.01459 osd.4 up 1.00000 1.00000
5 0.01459 osd.5 up 1.00000 1.00000
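To spot OSDs that are not up, the plain-text tree can be grouped by host. This is a minimal sketch against the output above; osds_by_host and down_osds are hypothetical helpers, and the parsing assumes the whitespace-separated column layout of this Ceph release.

```python
# `ceph osd tree` output copied from above.
SAMPLE = """\
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08752 root default
-2 0.04376 host ceph-node1
0 0.01459 osd.0 up 1.00000 1.00000
1 0.01459 osd.1 up 1.00000 1.00000
2 0.01459 osd.2 up 1.00000 1.00000
-3 0.04376 host ceph-node2
3 0.01459 osd.3 up 1.00000 1.00000
4 0.01459 osd.4 up 1.00000 1.00000
5 0.01459 osd.5 up 1.00000 1.00000
"""


def osds_by_host(text):
    """Return {host: [(osd_name, state), ...]} from plain-text osd tree output."""
    hosts = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[2] == "host":
            current = fields[3]
            hosts[current] = []
        elif fields[2].startswith("osd.") and current is not None:
            # OSD rows have no TYPE word: ID, WEIGHT, NAME, UP/DOWN, ...
            hosts[current].append((fields[2], fields[3]))
    return hosts


def down_osds(hosts):
    """List the names of all OSDs whose state is anything other than 'up'."""
    return [name for osds in hosts.values() for name, state in osds if state != "up"]


hosts = osds_by_host(SAMPLE)
print(hosts)
print(down_osds(hosts))
```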
- Viewing the CRUSH map
# ceph osd crush rule list
[
"replicated_ruleset"
]
# ceph osd crush rule dump "replicated_ruleset"
{
"rule_id": 0,
"rule_name": "replicated_ruleset",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}
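The steps array is the part of the rule that decides placement: start at the default root, pick leaves under distinct hosts, then emit the result. The sketch below turns the JSON above into one short line per step, roughly mirroring the step syntax of a decompiled CRUSH map; describe_rule is a hypothetical helper and handles only the step kinds that appear in this rule.

```python
import json

# `ceph osd crush rule dump "replicated_ruleset"` output copied from above.
RULE = """
{
  "rule_id": 0,
  "rule_name": "replicated_ruleset",
  "ruleset": 0,
  "type": 1,
  "min_size": 1,
  "max_size": 10,
  "steps": [
    {"op": "take", "item": -1, "item_name": "default"},
    {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
    {"op": "emit"}
  ]
}
"""


def describe_rule(text):
    """Return one readable line per step of a CRUSH rule given as JSON."""
    rule = json.loads(text)
    lines = []
    for step in rule["steps"]:
        op = step["op"]
        if op == "take":
            lines.append("take " + step["item_name"])
        elif op.startswith("choose"):
            lines.append(
                "%s %d type %s" % (op.replace("_", " "), step["num"], step["type"])
            )
        else:
            lines.append(op)
    return lines


steps = describe_rule(RULE)
for line in steps:
    print("step", line)
```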
- Monitoring PGs
# ceph pg stat
v107: 0 pgs: ; 0 bytes data, 198 MB used, 91895 MB / 92093 MB avail