ceph (7): CRUSH map and monitoring Ceph status with Prometheus
1. Editing the Ceph CRUSH map to separate cold and hot data across HDD and SSD disks
1.1 Cluster map overview
A Ceph cluster relies on five maps maintained by the mon servers:
- monitor map
- OSD map
- PG map
- CRUSH map (Controlled Replication Under Scalable Hashing), a controlled, replicated, scalable consistent-hashing placement algorithm
- MDS map (CephFS metadata map)
When a new storage pool is created, CRUSH uses the OSD map to compute a new set of PG-to-OSD mappings that will store the pool's data.
How the CRUSH algorithm chooses target nodes:
There are currently five bucket algorithms for node selection: Uniform, List, Tree, Straw and Straw2. Early versions used the straw algorithm invented by the Ceph project founder; it has since been superseded by the community-optimized straw2.
straw ("drawing straws"):
Each candidate OSD draws a straw whose length is derived from the OSD's weight, and the longest straw wins. When a pool is created and OSDs are assigned to PGs, the straw algorithm iterates over the available OSDs and keeps the winners, so OSDs with higher weights are assigned more PGs and therefore store more data.
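The selection can be pictured with a toy script. The sketch below is a hypothetical illustration of a straw2-style weighted draw, not Ceph's actual implementation (which is C code using the rjenkins1 hash and fixed-point arithmetic); the OSD names, weights and PG id are made up:

```bash
#!/bin/bash
# Hypothetical sketch of a straw2-style weighted draw (NOT Ceph's real code).
# Each OSD computes draw = ln(u)/weight from a pseudo-random u in (0,1];
# the largest draw wins, so higher-weight OSDs win proportionally more often.
weights="osd.0:1 osd.1:2 osd.2:4"   # hypothetical OSDs and weights
pgid=123                            # hypothetical placement-group input

best=""; best_draw=""
for entry in $weights; do
  osd=${entry%%:*}; w=${entry##*:}
  # derive a uniform value in (0,1] from a hash of (pgid, osd)
  h=$(printf '%s-%s' "$pgid" "$osd" | md5sum | cut -c1-8)
  u=$(awk -v h=$((16#$h)) 'BEGIN { printf "%.10f", (h + 1) / 4294967296 }')
  draw=$(awk -v u="$u" -v w="$w" 'BEGIN { printf "%.10f", log(u) / w }')
  if [ -z "$best" ] || awk -v a="$draw" -v b="$best_draw" 'BEGIN { exit !(a > b) }'; then
    best=$osd; best_draw=$draw
  fi
done
echo "pg $pgid mapped to $best"
```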
1.2 CRUSH-based classification
By default, the CRUSH algorithm places the replicas of a PG on OSDs that belong to different hosts, giving host-level high availability. It does not, however, guarantee that the replicas land on hosts in different racks or rooms, which would be required for rack-level or IDC-level availability, and it cannot by itself keep project A's data on SSDs while project B's data stays on HDDs. To build such layouts you export the CRUSH map, edit it by hand, and then import it back to replace the original map.
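As a side note, for a plain hdd/ssd split like the one built below, Ceph can also generate class-aware rules from the device classes it already tracks, without hand-editing the map. A minimal sketch, assuming the default root and host failure domain (the rule and pool names here are just examples):

```bash
# Sketch: rules derived from device classes (alternative to manual map editing)
ceph osd crush rule create-replicated ssd_rule default host ssd
ceph osd crush rule create-replicated hdd_rule default host hdd
# bind a pool (name is hypothetical) to the ssd rule
ceph osd pool set mypool crush_rule ssd_rule
```

The rest of this section follows the manual export/edit/import workflow, which also covers layouts that device classes alone cannot express (racks, rooms, per-project roots, and so on).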
1.3 Add one SSD disk to each existing Ceph node
Check the newly added disk:
```
[root@ceph-node1 ~]#lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0     7:0    0 63.5M  1 loop /snap/core20/2015
loop1     7:1    0   62M  1 loop /snap/core20/1611
loop2     7:2    0 91.9M  1 loop /snap/lxd/24061
loop3     7:3    0 40.9M  1 loop /snap/snapd/20092
loop4     7:4    0 67.8M  1 loop /snap/lxd/22753
sda       8:0    0   20G  0 disk
├─sda1    8:1    0    1M  0 part
├─sda2    8:2    0    1G  0 part /boot
├─sda3    8:3    0    2G  0 part [SWAP]
└─sda4    8:4    0   17G  0 part /
sdb       8:16   0    1T  0 disk
└─ceph--be8feab1--4bc8--44b8--9394--2eb42fca07fe-osd--block--10cd2d83--482f--4687--93ae--fe5e172694a1 253:3 0 1024G 0 lvm
sdc       8:32   0    1T  0 disk
└─ceph--df9cb0c1--18bc--4e35--aee0--62e59bdd540c-osd--block--57a2c73e--2629--4c8c--a9c5--e798d07cd332 253:0 0 1024G 0 lvm
sdd       8:48   0    1T  0 disk
└─ceph--49382021--143c--45a1--99e9--ac718504ec02-osd--block--e282d88f--4937--4f0d--bba8--960dd0f0a26d 253:1 0 1024G 0 lvm
sde       8:64   0    1T  0 disk
└─ceph--e54d4fb9--6f42--4e06--96a3--330813cf9342-osd--block--36287ce5--ed8a--4494--84dd--a8a739139f90 253:2 0 1024G 0 lvm
sdf       8:80   0    1T  0 disk
└─ceph--817f0c8b--611c--44e0--b7eb--c07cf05a7c78-osd--block--444174d7--8cd5--4567--bab3--247506d981a2 253:4 0 1024G 0 lvm
nvme0n1 259:0    0    1T  0 disk
```
Add the OSDs:
```
# Wipe the new disks
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy disk zap ceph-node1 /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy disk zap ceph-node2 /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy disk zap ceph-node3 /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy disk zap ceph-node4 /dev/nvme0n1
# Create the OSDs
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy --overwrite-conf osd create ceph-node1 --data /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy --overwrite-conf osd create ceph-node2 --data /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy --overwrite-conf osd create ceph-node3 --data /dev/nvme0n1
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph-deploy --overwrite-conf osd create ceph-node4 --data /dev/nvme0n1
```
Verify:
```
cephadmin@ceph-deploy:/data/ceph-cluster$ ceph osd df tree
ID  CLASS WEIGHT   REWEIGHT SIZE     RAW USE DATA    OMAP    META    AVAIL    %USE VAR  PGS STATUS TYPE NAME
-1        24.00000 -        24 TiB   8.0 GiB 928 MiB 140 KiB 7.1 GiB 24 TiB   0.03 1.00 -          root default
-7        6.00000  -        6.0 TiB  2.0 GiB 247 MiB 33 KiB  1.8 GiB 6.0 TiB  0.03 1.01 -          host ceph-node1
10  hdd   1.00000  1.00000  1024 GiB 338 MiB 42 MiB  10 KiB  297 MiB 1024 GiB 0.03 1.00 42  up     osd.10
11  hdd   1.00000  1.00000  1024 GiB 345 MiB 49 MiB  5 KiB   296 MiB 1024 GiB 0.03 1.02 40  up     osd.11
12  hdd   1.00000  1.00000  1024 GiB 335 MiB 25 MiB  5 KiB   309 MiB 1024 GiB 0.03 0.99 48  up     osd.12
13  hdd   1.00000  1.00000  1024 GiB 375 MiB 69 MiB  6 KiB   306 MiB 1024 GiB 0.04 1.11 53  up     osd.13
14  hdd   1.00000  1.00000  1024 GiB 334 MiB 29 MiB  7 KiB   305 MiB 1024 GiB 0.03 0.98 43  up     osd.14
20  ssd   1.00000  1.00000  1024 GiB 323 MiB 33 MiB  0 B     290 MiB 1024 GiB 0.03 0.95 48  up     osd.20
-3        6.00000  -        6.0 TiB  2.0 GiB 228 MiB 36 KiB  1.8 GiB 6.0 TiB  0.03 1.00 -          host ceph-node2
0   hdd   1.00000  1.00000  1024 GiB 321 MiB 22 MiB  8 KiB   299 MiB 1024 GiB 0.03 0.94 38  up     osd.0
1   hdd   1.00000  1.00000  1024 GiB 321 MiB 19 MiB  5 KiB   302 MiB 1024 GiB 0.03 0.95 54  up     osd.1
2   hdd   1.00000  1.00000  1024 GiB 319 MiB 21 MiB  5 KiB   298 MiB 1024 GiB 0.03 0.94 40  up     osd.2
3   hdd   1.00000  1.00000  1024 GiB 372 MiB 57 MiB  12 KiB  315 MiB 1024 GiB 0.04 1.10 58  up     osd.3
4   hdd   1.00000  1.00000  1024 GiB 346 MiB 48 MiB  6 KiB   298 MiB 1024 GiB 0.03 1.02 45  up     osd.4
21  ssd   1.00000  1.00000  1024 GiB 351 MiB 61 MiB  0 B     290 MiB 1024 GiB 0.03 1.03 48  up     osd.21
-5        6.00000  -        6.0 TiB  2.0 GiB 223 MiB 32 KiB  1.8 GiB 6.0 TiB  0.03 0.99 -          host ceph-node3
5   hdd   1.00000  1.00000  1024 GiB 345 MiB 37 MiB  6 KiB   308 MiB 1024 GiB 0.03 1.02 50  up     osd.5
6   hdd   1.00000  1.00000  1024 GiB 347 MiB 37 MiB  13 KiB  310 MiB 1024 GiB 0.03 1.02 49  up     osd.6
7   hdd   1.00000  1.00000  1024 GiB 345 MiB 49 MiB  4 KiB   295 MiB 1024 GiB 0.03 1.01 52  up     osd.7
8   hdd   1.00000  1.00000  1024 GiB 326 MiB 25 MiB  6 KiB   301 MiB 1024 GiB 0.03 0.96 44  up     osd.8
9   hdd   1.00000  1.00000  1024 GiB 349 MiB 54 MiB  3 KiB   295 MiB 1024 GiB 0.03 1.03 47  up     osd.9
22  ssd   1.00000  1.00000  1024 GiB 311 MiB 21 MiB  0 B     290 MiB 1024 GiB 0.03 0.92 46  up     osd.22
-9        6.00000  -        6.0 TiB  2.0 GiB 229 MiB 39 KiB  1.8 GiB 6.0 TiB  0.03 1.00 -          host ceph-node4
15  hdd   1.00000  1.00000  1024 GiB 334 MiB 33 MiB  10 KiB  301 MiB 1024 GiB 0.03 0.98 48  up     osd.15
16  hdd   1.00000  1.00000  1024 GiB 344 MiB 37 MiB  5 KiB   307 MiB 1024 GiB 0.03 1.01 60  up     osd.16
17  hdd   1.00000  1.00000  1024 GiB 345 MiB 45 MiB  10 KiB  300 MiB 1024 GiB 0.03 1.02 45  up     osd.17
18  hdd   1.00000  1.00000  1024 GiB 342 MiB 29 MiB  8 KiB   313 MiB 1024 GiB 0.03 1.01 56  up     osd.18
19  hdd   1.00000  1.00000  1024 GiB 346 MiB 40 MiB  6 KiB   306 MiB 1024 GiB 0.03 1.02 53  up     osd.19
23  ssd   1.00000  1.00000  1024 GiB 335 MiB 45 MiB  0 B     290 MiB 1024 GiB 0.03 0.99 48  up     osd.23
                   TOTAL    24 TiB   8.0 GiB 928 MiB 150 KiB 7.1 GiB 24 TiB   0.03
MIN/MAX VAR: 0.92/1.11  STDDEV: 0.00
```
1.4 Export the CRUSH map
The exported CRUSH map is a binary file that cannot be opened directly in a text editor; it must first be decompiled into text with the crushtool utility before it can be viewed and edited.
```
# Install the crushtool utility
[root@ceph-deploy ~]#apt install ceph-base -y
# Export the crush map
[root@ceph-deploy ~]#mkdir -p /data/ceph
[root@ceph-deploy ~]#ceph osd getcrushmap -o /data/ceph/crushmap-v1
82
```
1.5 Convert the map to text
The exported map cannot be edited as-is; convert it to text before viewing and editing.
```
# Decompile the map into a text file
[root@ceph-deploy ~]#crushtool -d /data/ceph/crushmap-v1 > /data/ceph/crushmap-v1.txt
[root@ceph-deploy ~]#file /data/ceph/crushmap-v1.txt
/data/ceph/crushmap-v1.txt: ASCII text
```
View the contents:
```
[root@ceph-deploy ~]#cat /data/ceph/crushmap-v1.txt
# begin crush map
# tunable crush map parameters
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
# current device list
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class ssd
device 21 osd.21 class ssd
device 22 osd.22 class ssd
device 23 osd.23 class ssd

# types
# currently supported bucket types
type 0 osd          # an OSD daemon, mapped to a single disk device
type 1 host         # a host
type 2 chassis      # a blade-server chassis
type 3 rack         # a rack containing several servers
type 4 row          # a row of racks
type 5 pdu          # the power distribution unit feeding a rack
type 6 pod          # a group of racks/rooms inside a machine room
type 7 room         # a room containing several racks; a data center is made up of many such rooms
type 8 datacenter   # a data center / IDC
type 9 zone         # an availability zone
type 10 region      # a region
type 11 root        # the top of the hierarchy, the root

# buckets
host ceph-node2 {                # bucket of type host, named ceph-node2
    id -3        # do not change unnecessarily   # bucket ID generated by ceph; do not change unless necessary
    id -4 class hdd        # do not change unnecessarily
    id -11 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2               # crush bucket algorithm used to select the OSD items
    hash 0    # rjenkins1    # which hash algorithm is used; 0 means rjenkins1
    item osd.0 weight 1.000  # the OSDs in this bucket and their weights; crush normally derives the weight from disk capacity, so disks of different sizes get different weights
    item osd.1 weight 1.000
    item osd.2 weight 1.000
    item osd.3 weight 1.000
    item osd.4 weight 1.000
    item osd.21 weight 1.000
}
host ceph-node3 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    id -12 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.5 weight 1.000
    item osd.6 weight 1.000
    item osd.7 weight 1.000
    item osd.8 weight 1.000
    item osd.9 weight 1.000
    item osd.22 weight 1.000
}
host ceph-node1 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    id -13 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.11 weight 1.000
    item osd.12 weight 1.000
    item osd.13 weight 1.000
    item osd.14 weight 1.000
    item osd.10 weight 1.000
    item osd.20 weight 1.000
}
host ceph-node4 {
    id -9        # do not change unnecessarily
    id -10 class hdd       # do not change unnecessarily
    id -14 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.15 weight 1.000
    item osd.16 weight 1.000
    item osd.17 weight 1.000
    item osd.18 weight 1.000
    item osd.19 weight 1.000
    item osd.23 weight 1.000
}
root default {                   # configuration of the root bucket
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -15 class ssd       # do not change unnecessarily
    # weight 24.000
    alg straw2
    hash 0    # rjenkins1
    item ceph-node2 weight 6.000
    item ceph-node3 weight 6.000
    item ceph-node1 weight 6.000
    item ceph-node4 weight 6.000
}

# rules
rule replicated_rule {                     # default rule for replicated pools
    id 0
    type replicated
    min_size 1
    max_size 10                            # at most 10 replicas by default
    step take default                      # allocate OSDs from the hosts under the default root
    step chooseleaf firstn 0 type host     # choose leaf OSDs with host as the failure domain
    step emit                              # emit the result, i.e. return it to the client
}

# end crush map
```
1.6 Edit the map
Keep the default (cold) data on the HDD disks and place the test project's (hot) data on the SSD disks:
```
[root@ceph-deploy ~]#vim /data/ceph/crushmap-v1.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class ssd
device 21 osd.21 class ssd
device 22 osd.22 class ssd
device 23 osd.23 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph-node2 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    id -11 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
    item osd.2 weight 1.000
    item osd.3 weight 1.000
    item osd.4 weight 1.000
    #item osd.21 weight 1.000    # remove the ssd disk from the default host bucket
}
host ceph-node3 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    id -12 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.5 weight 1.000
    item osd.6 weight 1.000
    item osd.7 weight 1.000
    item osd.8 weight 1.000
    item osd.9 weight 1.000
    #item osd.22 weight 1.000
}
host ceph-node1 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    id -13 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.11 weight 1.000
    item osd.12 weight 1.000
    item osd.13 weight 1.000
    item osd.14 weight 1.000
    item osd.10 weight 1.000
    #item osd.20 weight 1.000
}
host ceph-node4 {
    id -9        # do not change unnecessarily
    id -10 class hdd       # do not change unnecessarily
    id -14 class ssd       # do not change unnecessarily
    # weight 6.000
    alg straw2
    hash 0    # rjenkins1
    item osd.15 weight 1.000
    item osd.16 weight 1.000
    item osd.17 weight 1.000
    item osd.18 weight 1.000
    item osd.19 weight 1.000
    #item osd.23 weight 1.000
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -15 class ssd
    # weight 24.000
    alg straw2
    hash 0    # rjenkins1
    item ceph-node2 weight 6.000
    item ceph-node3 weight 6.000
    item ceph-node1 weight 6.000
    item ceph-node4 weight 6.000
}

# test project: add hosts that contain the ssd disks
host ceph-ssdnode1 {
    id -101
    id -102 class ssd
    # weight 5.000
    alg straw2
    hash 0    # rjenkins1
    item osd.20 weight 1.000
}
host ceph-ssdnode2 {
    id -103
    id -104 class ssd
    # weight 5.000
    alg straw2
    hash 0    # rjenkins1
    item osd.21 weight 1.000
}
host ceph-ssdnode3 {
    id -105        # do not change unnecessarily
    id -106 class ssd      # do not change unnecessarily
    # weight 5.000
    alg straw2
    hash 0    # rjenkins1
    item osd.22 weight 1.000
}
host ceph-ssdnode4 {
    id -107
    id -108 class ssd
    # weight 5.000
    alg straw2
    hash 0    # rjenkins1
    item osd.23 weight 1.000
}
# new bucket made up of the hosts that own the ssd disks
root ssd {
    id -127
    id -11 class ssd
    # weight 20.000
    alg straw2
    hash 0    # rjenkins1
    item ceph-ssdnode1 weight 5.000
    item ceph-ssdnode2 weight 5.000
    item ceph-ssdnode3 weight 5.000
    item ceph-ssdnode4 weight 5.000
}

# new rule for the test project
rule test_ssd_rule {
    id 20
    type replicated
    min_size 1
    max_size 5
    step take ssd                          # allocate OSDs from the hosts under the ssd root
    step chooseleaf firstn 0 type host
    step emit
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
```
1.7 Compile back to the CRUSH binary format
```
[root@ceph-deploy ~]#crushtool -c /data/ceph/crushmap-v1.txt -o /data/ceph/crushmap-v2
```
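Before importing it, the compiled map can be sanity-checked offline with crushtool's test mode; the sketch below simulates placements for rule 20 (the test_ssd_rule defined above) with 3 replicas:

```bash
# Simulate placements for rule 20 with 3 replicas over inputs 0..9
crushtool -i /data/ceph/crushmap-v2 --test --rule 20 --num-rep 3 \
  --min-x 0 --max-x 9 --show-mappings
```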
1.8 Import the new CRUSH map
The imported map overwrites the existing map and takes effect immediately.
```
[root@ceph-deploy ~]#ceph osd setcrushmap -i /data/ceph/crushmap-v2
84
```
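Because the import overwrites the map immediately, keep the previously exported binary; rolling back is the same command pointed at the old file:

```bash
# Roll back to the previous map if the new one misbehaves
ceph osd setcrushmap -i /data/ceph/crushmap-v1
```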
1.9 Verify the effect of the new CRUSH map
```
[root@ceph-deploy ~]#ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 20,            # information about the newly added test rule
        "rule_name": "test_ssd_rule",
        "ruleset": 20,
        "type": 1,
        "min_size": 1,
        "max_size": 5,
        "steps": [
            {
                "op": "take",
                "item": -127,
                "item_name": "ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]
```
Check the mapping between OSDs, hosts and rules:
```
[root@ceph-deploy ~]#ceph osd df tree
ID   CLASS WEIGHT   REWEIGHT SIZE     RAW USE DATA    OMAP    META    AVAIL    %USE VAR  PGS STATUS TYPE NAME
-127       20.00000 -        4.0 TiB  1.2 GiB 38 MiB  0 B     1.1 GiB 4.0 TiB  0.03 0.88 -          root ssd          # the ssd root contains only ssd OSDs
-101       5.00000  -        1024 GiB 304 MiB 9.5 MiB 0 B     294 MiB 1024 GiB 0.03 0.89 -          host ceph-ssdnode1
20   ssd   1.00000  1.00000  1024 GiB 304 MiB 9.5 MiB 0 B     294 MiB 1024 GiB 0.03 0.89 23  up     osd.20
-103       5.00000  -        1024 GiB 304 MiB 9.5 MiB 0 B     294 MiB 1024 GiB 0.03 0.89 -          host ceph-ssdnode2
21   ssd   1.00000  1.00000  1024 GiB 304 MiB 9.5 MiB 0 B     294 MiB 1024 GiB 0.03 0.89 26  up     osd.21
-105       5.00000  -        1024 GiB 300 MiB 9.5 MiB 0 B     290 MiB 1024 GiB 0.03 0.88 -          host ceph-ssdnode3
22   ssd   1.00000  1.00000  1024 GiB 300 MiB 9.5 MiB 0 B     290 MiB 1024 GiB 0.03 0.88 21  up     osd.22
-107       5.00000  -        1024 GiB 300 MiB 9.5 MiB 0 B     290 MiB 1024 GiB 0.03 0.88 -          host ceph-ssdnode4
23   ssd   1.00000  1.00000  1024 GiB 300 MiB 9.5 MiB 0 B     290 MiB 1024 GiB 0.03 0.88 26  up     osd.23
-1         24.00000 -        20 TiB   6.8 GiB 904 MiB 140 KiB 5.9 GiB 20 TiB   0.03 1.02 -          root default      # the default root contains only hdd OSDs
-7         6.00000  -        5.0 TiB  1.7 GiB 241 MiB 33 KiB  1.5 GiB 5.0 TiB  0.03 1.04 -          host ceph-node1
10   hdd   1.00000  1.00000  1024 GiB 339 MiB 42 MiB  10 KiB  297 MiB 1024 GiB 0.03 0.99 51  up     osd.10
11   hdd   1.00000  1.00000  1024 GiB 362 MiB 62 MiB  5 KiB   300 MiB 1024 GiB 0.03 1.06 49  up     osd.11
12   hdd   1.00000  1.00000  1024 GiB 347 MiB 34 MiB  5 KiB   313 MiB 1024 GiB 0.03 1.02 58  up     osd.12
13   hdd   1.00000  1.00000  1024 GiB 384 MiB 74 MiB  6 KiB   310 MiB 1024 GiB 0.04 1.13 62  up     osd.13
14   hdd   1.00000  1.00000  1024 GiB 338 MiB 30 MiB  7 KiB   309 MiB 1024 GiB 0.03 0.99 54  up     osd.14
-3         6.00000  -        5.0 TiB  1.7 GiB 222 MiB 36 KiB  1.5 GiB 5.0 TiB  0.03 1.02 -          host ceph-node2
0    hdd   1.00000  1.00000  1024 GiB 329 MiB 30 MiB  8 KiB   299 MiB 1024 GiB 0.03 0.97 47  up     osd.0
1    hdd   1.00000  1.00000  1024 GiB 330 MiB 24 MiB  5 KiB   306 MiB 1024 GiB 0.03 0.97 62  up     osd.1
2    hdd   1.00000  1.00000  1024 GiB 339 MiB 38 MiB  5 KiB   302 MiB 1024 GiB 0.03 0.99 51  up     osd.2
3    hdd   1.00000  1.00000  1024 GiB 385 MiB 82 MiB  12 KiB  303 MiB 1024 GiB 0.04 1.13 68  up     osd.3
4    hdd   1.00000  1.00000  1024 GiB 351 MiB 49 MiB  6 KiB   302 MiB 1024 GiB 0.03 1.03 55  up     osd.4
-5         6.00000  -        5.0 TiB  1.7 GiB 217 MiB 32 KiB  1.5 GiB 5.0 TiB  0.03 1.01 -          host ceph-node3
5    hdd   1.00000  1.00000  1024 GiB 350 MiB 38 MiB  6 KiB   312 MiB 1024 GiB 0.03 1.03 58  up     osd.5
6    hdd   1.00000  1.00000  1024 GiB 334 MiB 40 MiB  13 KiB  294 MiB 1024 GiB 0.03 0.98 57  up     osd.6
7    hdd   1.00000  1.00000  1024 GiB 348 MiB 53 MiB  4 KiB   295 MiB 1024 GiB 0.03 1.02 67  up     osd.7
8    hdd   1.00000  1.00000  1024 GiB 331 MiB 26 MiB  6 KiB   305 MiB 1024 GiB 0.03 0.97 51  up     osd.8
9    hdd   1.00000  1.00000  1024 GiB 360 MiB 61 MiB  3 KiB   299 MiB 1024 GiB 0.03 1.06 55  up     osd.9
-9         6.00000  -        5.0 TiB  1.7 GiB 224 MiB 39 KiB  1.5 GiB 5.0 TiB  0.03 1.03 -          host ceph-node4
15   hdd   1.00000  1.00000  1024 GiB 346 MiB 42 MiB  10 KiB  305 MiB 1024 GiB 0.03 1.02 62  up     osd.15
16   hdd   1.00000  1.00000  1024 GiB 349 MiB 38 MiB  5 KiB   311 MiB 1024 GiB 0.03 1.02 64  up     osd.16
17   hdd   1.00000  1.00000  1024 GiB 350 MiB 46 MiB  10 KiB  304 MiB 1024 GiB 0.03 1.03 54  up     osd.17
18   hdd   1.00000  1.00000  1024 GiB 335 MiB 34 MiB  8 KiB   301 MiB 1024 GiB 0.03 0.98 66  up     osd.18
19   hdd   1.00000  1.00000  1024 GiB 371 MiB 65 MiB  6 KiB   306 MiB 1024 GiB 0.04 1.09 64  up     osd.19
                    TOTAL    24 TiB   8.0 GiB 942 MiB 150 KiB 7.1 GiB 24 TiB   0.03
MIN/MAX VAR: 0.88/1.13  STDDEV: 0.00
```
1.10 Create cold and hot data pools for testing
```
# Create the cold-data pool based on the default rule
[root@ceph-deploy ~]#ceph osd pool create default-pool 32 32
pool 'default-pool' created
# Create the hot-data pool based on the ssd rule
[root@ceph-deploy ~]#ceph osd pool create test-ssd-pool 32 32 test_ssd_rule
pool 'test-ssd-pool' created
```
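You can also confirm which CRUSH rule each pool is bound to; the rule names should match the dump shown in 1.9:

```bash
# Show the rule assigned to each pool
ceph osd pool get default-pool crush_rule    # expected: crush_rule: replicated_rule
ceph osd pool get test-ssd-pool crush_rule   # expected: crush_rule: test_ssd_rule
```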
1.11 Verify PG placement
- Check the PG distribution of the cold-data pool
```
[root@ceph-deploy ~]#ceph pg ls-by-pool default-pool|awk '{print $1,$2,$15}'
PG OBJECTS ACTING
14.0 0 [13,15,5]p13
14.1 0 [15,2,12]p15
14.2 0 [18,14,9]p18
14.3 0 [14,3,15]p14
14.4 0 [13,16,3]p13
14.5 0 [19,6,11]p19
14.6 0 [0,17,8]p0
14.7 0 [5,16,12]p5
14.8 0 [4,14,16]p4
14.9 0 [5,3,18]p5
14.a 0 [14,5,2]p14
14.b 0 [13,4,8]p13
14.c 0 [0,9,16]p0
14.d 0 [18,5,12]p18
14.e 0 [5,2,16]p5
14.f 0 [9,3,10]p9
14.10 0 [1,7,10]p1
14.11 0 [8,19,2]p8
14.12 0 [2,11,15]p2
14.13 0 [2,5,10]p2
14.14 0 [8,19,10]p8
14.15 0 [7,13,19]p7
14.16 0 [4,14,5]p4
14.17 0 [4,5,16]p4
14.18 0 [16,3,13]p16
14.19 0 [12,18,9]p12
14.1a 0 [17,1,6]p17
14.1b 0 [1,6,10]p1
14.1c 0 [13,0,18]p13
14.1d 0 [19,11,4]p19
14.1e 0 [7,1,15]p7
14.1f 0 [16,0,9]p16
```
All of these PGs are located on HDD-backed OSDs.
- Check the PG distribution of the hot-data pool
```
[root@ceph-deploy ~]#ceph pg ls-by-pool test-ssd-pool|awk '{print $1,$2,$15}'
PG OBJECTS ACTING
13.0 0 [21,23,20]p21
13.1 0 [22,23,20]p22
13.2 0 [21,22,23]p21
13.3 0 [20,23,21]p20
13.4 0 [23,22,20]p23
13.5 0 [23,21,20]p23
13.6 0 [20,21,23]p20
13.7 0 [23,20,21]p23
13.8 0 [21,23,20]p21
13.9 0 [23,20,21]p23
13.a 0 [21,23,20]p21
13.b 0 [22,23,21]p22
13.c 0 [22,21,23]p22
13.d 0 [23,22,21]p23
13.e 0 [23,20,22]p23
13.f 0 [21,23,22]p21
13.10 0 [23,20,22]p23
13.11 0 [21,22,20]p21
13.12 0 [23,20,21]p23
13.13 0 [20,22,21]p20
13.14 0 [20,23,22]p20
13.15 0 [23,22,21]p23
13.16 0 [23,22,20]p23
13.17 0 [21,22,23]p21
13.18 0 [23,20,21]p23
13.19 0 [22,21,20]p22
13.1a 0 [21,20,22]p21
13.1b 0 [20,21,22]p20
13.1c 0 [23,21,22]p23
13.1d 0 [22,21,20]p22
13.1e 0 [21,22,23]p21
13.1f 0 [23,20,21]p23
* NOTE: afterwards
```
The PGs of the test project's pool are all located on the SSD-backed OSDs (osd.20/21/22/23).
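As an extra spot check, you can ask Ceph where a given object would be placed; the object name below is arbitrary:

```bash
# Compute the placement of a test object in each pool
echo "hot data" > /tmp/hot.txt
rados -p test-ssd-pool put hot-obj /tmp/hot.txt
ceph osd map test-ssd-pool hot-obj   # acting set should contain only osd.20-23
ceph osd map default-pool hot-obj    # acting set should contain only hdd OSDs
```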
2. Enabling the Ceph dashboard
https://packages.debian.org/unstable/ceph-mgr-dashboard
The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and performing configuration tasks; it must be installed on the mgr nodes.
ceph-mgr is a modular (plugin-based) component whose modules can be enabled or disabled individually (run the commands from the deploy server).
```
# The dashboard plugin is normally installed on the mgr nodes by default; install it manually if it is missing
[root@ceph-mgr1 opt]#apt-cache madison ceph-mgr-dashboard
[root@ceph-mgr1 opt]#apt install -y ceph-mgr-dashboard
[root@ceph-mgr2 opt]#apt-cache madison ceph-mgr-dashboard
[root@ceph-mgr2 opt]#apt install -y ceph-mgr-dashboard
```
Check the mgr module information:
```
[root@ceph-deploy ceph-cluster]#ceph mgr module ls
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [        # modules currently enabled
        "iostat",
        "nfs",
        "restful"
    ],
    "disabled_modules": [       # modules currently disabled
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
            "module_options": {
                "interval": {
                    "name": "interval",
                    "type": "secs",
                    "level": "advanced",
                    "flags": 1,
                    "default_value": "60",
                    "min": "",
                    "max": "",
                    "enum_allowed": [],
                    "desc": "How frequently to reexamine health status",
                    "long_desc": "",
                    "tags": [],
                    "see_also": []
                },
...
```
2.1 Enable the mgr dashboard module
```
# Enabling the module turns the dashboard on for every mgr node in the cluster
[root@ceph-deploy ceph-cluster]#ceph mgr module enable dashboard
```
Verify the module status:
```
[root@ceph-deploy ceph-cluster]#more module.json
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "dashboard",        # the dashboard module is now enabled
        "iostat",
        "nfs",
        "restful"
    ],
...
```
Note: after enabling the module you still need to configure SSL and set the listen address and port.
2.2 Configure the dashboard
The dashboard settings are applied to the mgr nodes; SSL can be enabled or disabled, and both mgr nodes need to be configured.
```
# Disable SSL
[root@ceph-deploy ceph-cluster]#ceph config set mgr mgr/dashboard/ssl false
# Set the dashboard listen address
[root@ceph-deploy ceph-cluster]#ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 10.0.0.54
# Set the dashboard listen port (the default is 8080)
[root@ceph-deploy ceph-cluster]#ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009
[root@ceph-deploy ceph-cluster]#ceph config set mgr mgr/dashboard/ceph-mgr2/server_port 9009
```
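Before restarting the mgr daemons you can confirm the settings landed in the cluster configuration database:

```bash
# Show the dashboard settings that were just written
ceph config dump | grep mgr/dashboard
```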
Restart the mgr service:
```
[root@ceph-mgr1 opt]#systemctl restart ceph-mgr@ceph-mgr1.service
[root@ceph-mgr2 ~]#systemctl restart ceph-mgr@ceph-mgr2.service
```
Verify the cluster status:
```
[root@ceph-deploy ceph-cluster]#ceph -s
  cluster:
    id:     28820ae5-8747-4c53-827b-219361781ada
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 3d)
    mgr: ceph-mgr2(active, since 56s), standbys: ceph-mgr1
    mds: 2/2 daemons up, 2 standby
    osd: 24 osds: 24 up (since 4h), 24 in (since 4h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 449 pgs
    objects: 455 objects, 269 MiB
    usage:   8.0 GiB used, 24 TiB / 24 TiB avail
    pgs:     449 active+clean
```
The first time the dashboard module is enabled it can take a few minutes to come up; then verify on the node where it was activated.
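Once the active mgr has loaded the module, the dashboard URL is advertised in the mgr service map, so you can read it back instead of guessing:

```bash
# The active mgr advertises the dashboard URL once the module is up
ceph mgr services
# expected output similar to: {"dashboard": "http://10.0.0.55:9009/"}
```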
2.3 Verify the port and process on the mgr nodes
```
[root@ceph-mgr1 opt]#lsof -i:9009
COMMAND    PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 124631 ceph 22u IPv4 504873      0t0  TCP ceph-mgr1:9009 (LISTEN)
[root@ceph-mgr2 ~]#lsof -i:9009
COMMAND    PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 121573 ceph 38u IPv6 526348      0t0  TCP *:9009 (LISTEN)
```
2.4 Verify dashboard access
There is no default username or password; credentials have to be created first.
2.5 Set the dashboard username and password
```
# Create a file holding the password
[root@ceph-deploy ceph-cluster]#echo "123456" > passwd.txt
[root@ceph-deploy ceph-cluster]#cat passwd.txt
123456
# Create the username and password
[root@ceph-deploy ceph-cluster]#ceph dashboard set-login-credentials tom -i passwd.txt
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
```
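Given the deprecation warning, the same result can be achieved with the newer ac-user-* commands; a sketch that creates the user with the administrator role:

```bash
# Non-deprecated equivalent using the account-management commands
ceph dashboard ac-user-create tom -i passwd.txt administrator
```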
2.6 Dashboard pages
Browse the pages to verify.
2.6.1 Login page
2.6.2 Cluster information
- Host information
- Monitor nodes
- OSD status
- Pool status
- Block storage (RBD) image status
- CephFS status
- Object storage
  - RGW service status
  - User information
  - Bucket status
3. Monitoring the Ceph cluster with Prometheus
3.1 Deploy Prometheus
Prometheus server address: 10.0.0.60
- Download
```
mkdir /apps
cd /apps
wget https://github.com/prometheus/prometheus/releases/download/v2.40.7/prometheus-2.40.7.linux-amd64.tar.gz
tar -xvf prometheus-2.40.7.linux-amd64.tar.gz
ln -s /apps/prometheus-2.40.7.linux-amd64 /apps/prometheus
```
- Create the service unit
```
cat >>/etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
EOF
```
- Start the service
```
systemctl daemon-reload
systemctl enable --now prometheus.service
```
- Verify access to Prometheus in a browser; a quick command-line check is sketched below
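A quick check without a browser, assuming the default port 9090:

```bash
# Prometheus liveness and readiness endpoints
curl -s http://10.0.0.60:9090/-/healthy
curl -s http://10.0.0.60:9090/-/ready
```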
3.2 Monitoring the Ceph nodes
3.2.1 Deploy node_exporter
Install node_exporter on the Ceph node servers.
- Download
```
mkdir /apps
cd /apps
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.4.0.linux-amd64.tar.gz
ln -s /apps/node_exporter-1.4.0.linux-amd64 /apps/node_exporter
```
- Create the service unit
```
cat >>/etc/systemd/system/node-exporter.service <<EOF
[Unit]
Description=Prometheus Node Exporter
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
ExecStart=/apps/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
EOF
```
- Start the node-exporter service
```
systemctl daemon-reload
systemctl enable --now node-exporter.service
```
- Verify access; a quick command-line check is sketched below
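A quick check from the command line, using one of the node IPs configured in the next step:

```bash
# Pull a few metrics from one node to confirm the exporter answers
curl -s http://10.0.0.56:9100/metrics | grep -m 5 '^node_'
```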
3.2.2 Configure scraping
- Add the scrape targets
```
[root@prometheus apps]#cat /apps/prometheus/prometheus.yml
...
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  # add the Ceph nodes to be monitored
  - job_name: "ceph-node"
    static_configs:
      - targets: ["10.0.0.56:9100","10.0.0.57:9100","10.0.0.58:9100","10.0.0.59:9100"]
```
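Before restarting, the edited configuration can be validated with promtool, which ships in the same tarball:

```bash
# Validate prometheus.yml before restarting the service
/apps/prometheus/promtool check config /apps/prometheus/prometheus.yml
```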
- Restart the service
```
[root@prometheus apps]#systemctl restart prometheus.service
```
- Verify
3.3 Monitoring the Ceph services
The Ceph manager ships a built-in prometheus module. When enabled, it listens on port 9283 on every manager node and exposes the collected metrics to Prometheus over an HTTP endpoint.
https://docs.ceph.com/en/mimic/mgr/prometheus/
3.3.1 Enable the prometheus module
```
[root@ceph-deploy ceph-cluster]#ceph mgr module enable prometheus
```
3.3.2 Verify the port on the ceph-mgr nodes
```
[root@ceph-mgr1 opt]#lsof -i:9283
COMMAND    PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 124631 ceph 21u IPv6 509936      0t0  TCP *:9283 (LISTEN)
[root@ceph-mgr2 ~]#lsof -i:9283
COMMAND    PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 121573 ceph 34u IPv6 532298      0t0  TCP *:9283 (LISTEN)
```
3.3.3 Verify the manager metrics
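The endpoint can also be queried directly from the command line; ceph_health_status and the ceph_osd_* series are among the exposed metrics:

```bash
# Query the mgr prometheus endpoint directly
curl -s http://10.0.0.54:9283/metrics | grep -E '^ceph_health_status|^ceph_osd_up' | head
```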
3.3.4 Configure scraping
- Add the scrape targets
```
[root@prometheus apps]#cat /apps/prometheus/prometheus.yml
...
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "ceph-node"
    static_configs:
      - targets: ["10.0.0.56:9100","10.0.0.57:9100","10.0.0.58:9100","10.0.0.59:9100"]
  # cluster service status
  - job_name: "ceph-cluster"
    static_configs:
      - targets: ["10.0.0.54:9283","10.0.0.55:9283"]
```
- Restart the service
```
[root@prometheus apps]#systemctl restart prometheus.service
```
- Verify the data in the Prometheus web UI; a command-line check via the HTTP API is sketched below
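The scrape status can also be checked through the Prometheus HTTP API instead of the web UI:

```bash
# List the health of all scrape targets via the HTTP API
curl -s http://10.0.0.60:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
```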
3.4 Displaying the data in Grafana
3.4.1 Deploy Grafana
- Download and install
```
wget https://mirrors.tuna.tsinghua.edu.cn/grafana/apt/pool/main/g/grafana-enterprise/grafana-enterprise_9.3.0_amd64.deb
apt update
apt-get install -y adduser libfontconfig1
dpkg -i grafana-enterprise_9.3.0_amd64.deb
```
- Edit the Grafana configuration file
```
vim /etc/grafana/grafana.ini
......
# configure the protocol, listen address and port
[server]
protocol = http
http_addr = 10.0.0.60
http_port = 3000
```
- Start the service
```
systemctl enable grafana-server.service
systemctl restart grafana-server.service
```
- Log in to verify
3.4.2 Configure the data source
3.4.3 Import dashboard templates
- pool: dashboard ID 7056
- Ceph Cluster: dashboard ID 2842