OSD management basics and OSD node scale-out/scale-in for a Ceph Reef (18.2.X) cluster
Author: 尹正杰
Copyright notice: This is an original work; reproduction is not permitted, and violations will be pursued under the law.
I. Basic OSD operations in a Ceph cluster
1. List the OSD IDs
[root@ceph141 ~]# ceph osd ls
0
1
2
3
4
5
[root@ceph141 ~]#
2. View detailed OSD information
[root@ceph141 ~]# ceph osd dump
epoch 58
fsid c044ff3c-5f05-11ef-9d8b-51db832765d6
created 2024-08-20T15:06:28.128978+0000
modified 2024-08-20T22:48:38.568646+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 16
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client jewel
require_osd_release reef
stretch_mode_enabled false
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 21 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 6.00
max_osd 6
osd.0 up in weight 1 up_from 53 up_thru 24 down_at 52 last_clean_interval [8,48) [v2:10.0.0.141:6808/2198281705,v1:10.0.0.141:6809/2198281705] [v2:10.0.0.141:6810/2198281705,v1:10.0.0.141:6811/2198281705] exists,up aa4d4b47-b9f1-444b-bd36-3b622391ce71
osd.1 up in weight 1 up_from 53 up_thru 27 down_at 52 last_clean_interval [13,48) [v2:10.0.0.141:6800/335450708,v1:10.0.0.141:6801/335450708] [v2:10.0.0.141:6802/335450708,v1:10.0.0.141:6803/335450708] exists,up 212eab65-f6f2-41c7-9d58-2f75f86d84b2
osd.2 up in weight 1 up_from 54 up_thru 0 down_at 53 last_clean_interval [18,48) [v2:10.0.0.142:6800/163080901,v1:10.0.0.142:6801/163080901] [v2:10.0.0.142:6802/163080901,v1:10.0.0.142:6803/163080901] exists,up e0ffb1a9-ca9b-45a1-a95f-42aca94e8f47
osd.3 up in weight 1 up_from 52 up_thru 56 down_at 51 last_clean_interval [25,48) [v2:10.0.0.142:6808/2086272149,v1:10.0.0.142:6809/2086272149] [v2:10.0.0.142:6810/2086272149,v1:10.0.0.142:6811/2086272149] exists,up fda125ef-9776-47d9-baf4-9483966fe183
osd.4 up in weight 1 up_from 56 up_thru 0 down_at 55 last_clean_interval [34,48) [v2:10.0.0.143:6808/1331943799,v1:10.0.0.143:6809/1331943799] [v2:10.0.0.143:6810/1331943799,v1:10.0.0.143:6811/1331943799] exists,up a4f27770-20c9-4a75-b0c2-a212ddc7ab3f
osd.5 up in weight 1 up_from 56 up_thru 0 down_at 55 last_clean_interval [44,48) [v2:10.0.0.143:6800/3466236845,v1:10.0.0.143:6801/3466236845] [v2:10.0.0.143:6802/3466236845,v1:10.0.0.143:6803/3466236845] exists,up c6f8968f-c425-4539-ba9b-39ff08683170
blocklist 10.0.0.141:0/2920906744 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:6801/1616144223 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:6800/1616144223 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/3338469979 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/287245293 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1238275928 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/4254913043 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/4240352034 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:6801/1457839497 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6800/75389737 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1951266866 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:6800/1457839497 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:0/3710270280 expires 2024-08-21T22:48:38.568602+0000
blocklist 10.0.0.141:0/2072915682 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6801/75389737 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1341187958 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:0/1879865485 expires 2024-08-21T15:10:24.386665+0000
blocklist 10.0.0.141:6800/2392661167 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/1999918034 expires 2024-08-21T15:07:14.218755+0000
blocklist 10.0.0.141:6801/2392661167 expires 2024-08-21T15:06:48.971433+0000
blocklist 10.0.0.141:0/4277851589 expires 2024-08-21T15:10:24.386665+0000
[root@ceph141 ~]#
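Note: the full_ratio, backfillfull_ratio and nearfull_ratio fields in the dump above are the cluster-wide utilization thresholds: an OSD above 85% is reported as nearfull, above 90% it stops accepting backfill, and above 95% writes to it are blocked. A minimal sketch of how these thresholds would be adjusted if ever needed (the values shown are simply the defaults):
ceph osd set-nearfull-ratio 0.85        # warning threshold
ceph osd set-backfillfull-ratio 0.90    # stop backfilling into OSDs above this ratio
ceph osd set-full-ratio 0.95            # block writes to OSDs above this ratio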
3. View OSD status
[root@ceph141 ~]# ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 ceph141 27.8M 199G 0 0 0 0 exists,up
1 ceph141 27.2M 299G 0 0 0 0 exists,up
2 ceph142 27.2M 199G 0 0 0 0 exists,up
3 ceph142 27.8M 299G 0 0 0 0 exists,up
4 ceph143 27.2M 299G 0 0 0 0 exists,up
5 ceph143 27.8M 199G 0 0 0 0 exists,up
[root@ceph141 ~]#
4. View OSD summary statistics
[root@ceph141 ~]# ceph osd stat
6 osds: 6 up (since 8m), 6 in (since 7h); epoch: e58
[root@ceph141 ~]#
5. View how the OSDs are laid out across hosts
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 1.00000 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
6. View OSD latency statistics
[root@ceph141 ~]# ceph osd perf
osd commit_latency(ms) apply_latency(ms)
5 0 0
4 0 0
3 0 0
2 0 0
1 0 0
0 0 0
[root@ceph141 ~]#
7. View per-OSD utilization
[root@ceph141 ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 0.19530 1.00000 200 GiB 28 MiB 1.1 MiB 4 KiB 27 MiB 200 GiB 0.01 1.26 1 up
1 hdd 0.29300 1.00000 300 GiB 27 MiB 572 KiB 4 KiB 27 MiB 300 GiB 0.01 0.82 0 up
2 hdd 0.19530 1.00000 200 GiB 27 MiB 572 KiB 4 KiB 27 MiB 200 GiB 0.01 1.24 0 up
3 hdd 0.29300 1.00000 300 GiB 28 MiB 1.1 MiB 4 KiB 27 MiB 300 GiB 0.01 0.84 1 up
4 hdd 0.29300 1.00000 300 GiB 27 MiB 572 KiB 4 KiB 27 MiB 300 GiB 0.01 0.83 0 up
5 hdd 0.19530 1.00000 200 GiB 28 MiB 1.1 MiB 4 KiB 27 MiB 200 GiB 0.01 1.26 1 up
TOTAL 1.5 TiB 165 MiB 5.0 MiB 26 KiB 160 MiB 1.5 TiB 0.01
MIN/MAX VAR: 0.82/1.26 STDDEV: 0.00
[root@ceph141 ~]#
8. Pause the cluster so it stops accepting data
[root@ceph141 ~]# ceph -s
...
services:
mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 14m)
mgr: ceph141.gqogmi(active, since 14m), standbys: ceph142.tisapy
osd: 6 osds: 6 up (since 14m), 6 in (since 7h)
...
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd pause
pauserd,pausewr is set
[root@ceph141 ~]#
[root@ceph141 ~]# ceph -s
...
services:
mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 14m)
mgr: ceph141.gqogmi(active, since 14m), standbys: ceph142.tisapy
osd: 6 osds: 6 up (since 14m), 6 in (since 7h)
flags pauserd,pausewr # Note the extra pause flags that now appear here
...
[root@ceph141 ~]#
9. Resume the cluster so it accepts data again
[root@ceph141 ~]# ceph -s
...
services:
mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 16m)
mgr: ceph141.gqogmi(active, since 16m), standbys: ceph142.tisapy
osd: 6 osds: 6 up (since 16m), 6 in (since 7h)
flags pauserd,pausewr
...
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd unpause
pauserd,pausewr is unset
[root@ceph141 ~]#
[root@ceph141 ~]# ceph -s
...
services:
mon: 3 daemons, quorum ceph141,ceph143,ceph142 (age 16m)
mgr: ceph141.gqogmi(active, since 16m), standbys: ceph142.tisapy
osd: 6 osds: 6 up (since 16m), 6 in (since 7h)
...
[root@ceph141 ~]#
10. OSD weight operations
1. View the default OSD CRUSH weights
[root@ceph141 ~]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0
1 hdd 0.29300 osd.1
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2
3 hdd 0.29300 osd.3
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4
5 hdd 0.19530 osd.5
[root@ceph141 ~]#
2. Modify an OSD's CRUSH weight
[root@ceph141 ~]# ceph osd crush reweight osd.4 0 # Set one disk's weight to 0 so that no more data is placed on it; typically used temporarily when taking a node out of service!
reweighted item id 4 name 'osd.4' to 0 in crush map
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 1.17189 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0
1 hdd 0.29300 osd.1
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2
3 hdd 0.29300 osd.3
-7 0.19530 host ceph143
4 hdd 0 osd.4
5 hdd 0.19530 osd.5
[root@ceph141 ~]#
3. After testing, restore the original weight
[root@ceph141 ~]# ceph osd crush reweight osd.4 0.29300
reweighted item id 4 name 'osd.4' to 0.293 in crush map
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd crush tree
ID CLASS WEIGHT TYPE NAME
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0
1 hdd 0.29300 osd.1
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2
3 hdd 0.29300 osd.3
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4
5 hdd 0.19530 osd.5
[root@ceph141 ~]#
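Note: "ceph osd crush reweight" changes the CRUSH WEIGHT column shown above (the long-term, capacity-based placement weight), while the REWEIGHT column of "ceph osd tree" is a temporary override weight between 0 and 1 that is changed with a separate command. A minimal sketch of the two, assuming osd.4:
ceph osd crush reweight osd.4 0.29300   # CRUSH weight, roughly proportional to device capacity in TiB
ceph osd reweight 4 0.8                 # override weight (0.0-1.0); "ceph osd out/in" sets this to 0/1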
11. Taking OSDs down and bringing them back up
Tips:
- 1. Each OSD is controlled by a dedicated managing daemon, "ceph-osd"; once it notices the OSD has been marked down, it will try to bring it back up.
- 2. If you really want to shut an OSD down permanently, stop the corresponding ceph-osd process, e.g. "ceph-osd@4" (see the sketch below);
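A minimal sketch of both approaches, assuming the target is OSD id 4 (the orchestrator command is the one this cephadm-managed cluster uses later in this article; the systemd unit name applies to package-based, non-cephadm deployments):
# cephadm-managed cluster, run from an admin node:
ceph orch daemon stop osd.4
# package-based deployment, run on the host that carries osd.4:
systemctl stop ceph-osd@4
systemctl disable ceph-osd@4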
1. A temporarily downed OSD is pulled back up automatically
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 1.00000 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd down 4 ; ceph osd tree
marked down osd.4.
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 down 1.00000 1.00000 # Note: this OSD has now been marked down.
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 1.00000 1.00000 # But as you can see, it has come back up automatically!
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
2. Permanently stop an OSD
[root@ceph141 ~]# ceph orch daemon stop osd.3 # Stop the osd.3 daemon directly
Scheduled to stop osd.3 on host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch daemon stop osd.5 # Stop the osd.5 daemon directly
Scheduled to stop osd.5 on host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
-7 0.48830 host ceph143
3 hdd 0.29300 osd.3 down 1.00000 1.00000
5 hdd 0.19530 osd.5 down 1.00000 1.00000
[root@ceph141 ~]#
12. Evict (mark out) an OSD device
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 1.00000 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd out 4 # Evict (mark out) the OSD with ID 4
marked out osd.4.
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 0 1.00000 # Under the hood this simply readjusts the REWEIGHT value used for data placement.
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
13. Bring an OSD device back in
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 0 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd in 4 # Mark the OSD with ID 4 back in.
marked in osd.4.
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
3 hdd 0.29300 osd.3 up 1.00000 1.00000
-7 0.48830 host ceph143
4 hdd 0.29300 osd.4 up 1.00000 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
II. Scaling in OSD nodes in a Ceph cluster
1. Basic workflow for removing OSD devices
When removing OSD devices, the recommended workflow is as follows (a consolidated command sketch follows the tip below):
- 1. On the target node, stop the relevant OSD daemons [optional];
- 2. Purge the OSD data [optional];
- 3. Remove the OSDs and the node from the CRUSH map so the node no longer carries data [required];
- 4. Drain the node being decommissioned [required];
- 5. Take the OSD node offline, i.e. remove the host from the cluster [required];
- 6. On the node itself, release Ceph's hold on the disks [required];
Tip:
The first two steps can be skipped.
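A consolidated sketch of the commands used in the hands-on example below (assumptions: the OSDs being removed are 3 and 5, and the host being decommissioned is ceph143):
ceph orch daemon stop osd.3           # 1. stop the OSD daemons on the node (optional)
ceph orch daemon stop osd.5
ceph osd purge 3 --force              # 2./3. purge the OSDs; this also removes them from the CRUSH map
ceph osd purge 5 --force
ceph osd crush rm ceph143             # 3. remove the now-empty host bucket from the CRUSH map
ceph orch host drain ceph143          # 4. drain the remaining daemons from the host
ceph orch daemon rm osd.3 --force     #    remove any stopped OSD daemons that drain leaves behind
ceph orch daemon rm osd.5 --force
ceph orch host rm ceph143             # 5. remove the host from the cluster
# 6. on ceph143 itself, release Ceph's hold on the disks (see section 2.7 below):
#    dmsetup remove <ceph--...--osd--block--...>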
2. Hands-on example: removing OSDs
2.1 Check the cluster state before removal
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
-7 0.48830 host ceph143
3 hdd 0.29300 osd.3 up 1.00000 1.00000
5 hdd 0.19530 osd.5 up 1.00000 1.00000
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph device ls
DEVICE HOST:DEV DAEMONS WEAR LIFE EXPECTANCY
ATA_VBOX_HARDDISK_VB44d8d962-22b2507e ceph142:sdb osd.2
ATA_VBOX_HARDDISK_VB586591eb-921dc802 ceph143:sdc osd.5
ATA_VBOX_HARDDISK_VB7b0f012c-688b1185 ceph142:sdc osd.4
ATA_VBOX_HARDDISK_VB7ddbae3f-13ea8edd ceph143:sda mon.ceph143
ATA_VBOX_HARDDISK_VB7f99f134-d6f80b2c ceph141:sdb osd.0
ATA_VBOX_HARDDISK_VB8587e457-f6eca36a ceph141:sdc osd.1
ATA_VBOX_HARDDISK_VBab58677d-fb9dc89f ceph143:sdb osd.3
ATA_VBOX_HARDDISK_VBbcff97b3-3bc2fb47 ceph141:sda mon.ceph141
ATA_VBOX_HARDDISK_VBe309cee1-15dd71d4 ceph142:sda mon.ceph142
[root@ceph141 ~]#
[root@ceph141 ~]# ceph -s
cluster:
id: c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4
health: HEALTH_WARN
clock skew detected on mon.ceph142
services:
mon: 3 daemons, quorum ceph141,ceph142,ceph143 (age 2m)
mgr: ceph141.fuztcs(active, since 111s), standbys: ceph142.vdsfzv
osd: 6 osds: 6 up (since 115s), 6 in (since 2h)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 577 KiB
usage: 565 MiB used, 1.5 TiB / 1.5 TiB avail
pgs: 1 active+clean
[root@ceph141 ~]#
2.2 Stop all OSD daemons on the node being decommissioned
[root@ceph141 ~]# ceph orch daemon stop osd.3
Scheduled to stop osd.3 on host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch daemon stop osd.5
Scheduled to stop osd.5 on host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.46489 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
-7 0.48830 host ceph143 # After stopping them, the OSDs below are clearly in the down state
3 hdd 0.29300 osd.3 down 1.00000 1.00000
5 hdd 0.19530 osd.5 down 1.00000 1.00000
[root@ceph141 ~]#
2.3 Purge the OSD data and configuration
[root@ceph141 ~]# ceph osd purge 3 --force
purged osd.3
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd purge 5 --force
purged osd.5
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.97659 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
-7 0 host ceph143
[root@ceph141 ~]#
2.4 After all OSDs are removed, delete the host from the CRUSH map
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.97659 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
-7 0 host ceph143
[root@ceph141 ~]#
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd crush rm ceph143
removed item id -7 name 'ceph143' from crush map
[root@ceph141 ~]#
[root@ceph141 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.97659 root default
-3 0.48830 host ceph141
0 hdd 0.19530 osd.0 up 1.00000 1.00000
1 hdd 0.29300 osd.1 up 1.00000 1.00000
-5 0.48830 host ceph142
2 hdd 0.19530 osd.2 up 1.00000 1.00000
4 hdd 0.29300 osd.4 up 1.00000 1.00000
[root@ceph141 ~]#
2.5 Drain the service daemons from the node being decommissioned
[root@ceph141 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph141 10.0.0.141 _admin
ceph142 10.0.0.142
ceph143 10.0.0.143
3 hosts in cluster
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type id
-------------------- ---------------
node-exporter ceph143
ceph-exporter ceph143
osd 5
osd 3
mon ceph143
crash ceph143
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host drain ceph143 # Running drain again shows the 2 OSD daemons cannot be evicted automatically
Scheduled to remove the following daemons from host 'ceph143'
type id
-------------------- ---------------
osd 5
osd 3
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph141 10.0.0.141 _admin
ceph142 10.0.0.142
ceph143 10.0.0.143 _no_schedule,_no_conf_keyring
3 hosts in cluster
[root@ceph141 ~]#
[root@ceph141 ~]# ceph -s
cluster:
id: c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4
health: HEALTH_OK
services:
mon: 2 daemons, quorum ceph141,ceph142 (age 39s)
mgr: ceph141.fuztcs(active, since 13m), standbys: ceph142.vdsfzv
osd: 4 osds: 4 up (since 9m), 4 in (since 2h); 1 remapped pgs
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 577 KiB
usage: 111 MiB used, 1000 GiB / 1000 GiB avail
pgs: 2/6 objects misplaced (33.333%)
1 active+clean+remapped
[root@ceph141 ~]#
2.6 Take the node offline
1. Manually remove the stopped OSD daemons
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type id
-------------------- ---------------
osd 5
osd 3
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch daemon rm osd.3 --force
Removed osd.3 from host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch daemon rm osd.5 --force
Removed osd.5 from host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host drain ceph143
Scheduled to remove the following daemons from host 'ceph143'
type id
-------------------- ---------------
[root@ceph141 ~]#
[root@ceph141 ~]#
2. Remove the host
[root@ceph141 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph141 10.0.0.141 _admin
ceph142 10.0.0.142
ceph143 10.0.0.143 _no_schedule,_no_conf_keyring
3 hosts in cluster
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host rm ceph143
Removed host 'ceph143'
[root@ceph141 ~]#
[root@ceph141 ~]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph141 10.0.0.141 _admin
ceph142 10.0.0.142
2 hosts in cluster
[root@ceph141 ~]#
3. Check the device information again; clearly, the ceph143 node is gone
[root@ceph141 ~]# ceph device ls
DEVICE HOST:DEV DAEMONS WEAR LIFE EXPECTANCY
ATA_VBOX_HARDDISK_VB44d8d962-22b2507e ceph142:sdb osd.2
ATA_VBOX_HARDDISK_VB7b0f012c-688b1185 ceph142:sdc osd.4
ATA_VBOX_HARDDISK_VB7f99f134-d6f80b2c ceph141:sdb osd.0
ATA_VBOX_HARDDISK_VB8587e457-f6eca36a ceph141:sdc osd.1
ATA_VBOX_HARDDISK_VBbcff97b3-3bc2fb47 ceph141:sda mon.ceph141
ATA_VBOX_HARDDISK_VBe309cee1-15dd71d4 ceph142:sda mon.ceph142
[root@ceph141 ~]#
2.7 Release Ceph's hold on the disks on the removed node
1. View the block devices on the node
[root@ceph143 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
...
sdb 8:16 0 300G 0 disk
└─ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512
253:0 0 300G 0 lvm
sdc 8:32 0 200G 0 disk
└─ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4
253:1 0 200G 0 lvm
...
[root@ceph143 ~]#
2. Map the local OSD IDs to their corresponding disk devices
[root@ceph143 ~]# cat /var/lib/ceph/c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4/osd.3/fsid
0065bf70-6947-4b17-86ed-c1d902120512
[root@ceph143 ~]#
[root@ceph143 ~]# cat /var/lib/ceph/c0ed6ca0-5fbc-11ef-9ff6-cf3a9f02b0d4/osd.5/fsid
60ac4910-12bc-4f21-9d89-2cfd48ed0cb4
[root@ceph143 ~]#
3. View the device-mapper entries that Ceph holds on the disks
[root@ceph143 ~]# dmsetup status
ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4: 0 419422208 linear
ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512: 0 629137408 linear
ubuntu--vg-ubuntu--lv: 0 50323456 linear
[root@ceph143 ~]#
4. Remove the device-mapper entries to release the disks
[root@ceph143 ~]# dmsetup remove ceph--73509ee8--226b--4cb7--b35c--40b163d6aff6-osd--block--60ac4910--12bc--4f21--9d89--2cfd48ed0cb4
[root@ceph143 ~]#
[root@ceph143 ~]# dmsetup remove ceph--f10a734e--4198--4020--ba11--63c4dcb9e62d-osd--block--0065bf70--6947--4b17--86ed--c1d902120512
[root@ceph143 ~]#
5. Check the local disks again to confirm the hold has been released
[root@ceph143 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
...
sdb 8:16 0 300G 0 disk
sdc 8:32 0 200G 0 disk
...
[root@ceph143 ~]#
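Note that dmsetup remove only tears down the device-mapper mappings; the LVM signatures that ceph-volume wrote are still present on the disks. A hedged sketch of wiping them so the disks can be reused for new OSDs (assuming the devices are /dev/sdb and /dev/sdc as above; this is destructive, so double-check the device names first):
# run on ceph143; destroys any remaining signatures/data on these disks
wipefs -a /dev/sdb
wipefs -a /dev/sdc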
2.8 Reflections on the OSD node removal workflow
As the walkthrough above shows, taking a node offline is quite simple, and it can be made even simpler: steps 2.2 and 2.3 can be skipped.
It is recommended to take a snapshot before decommissioning a node; if you did not take one, refer to my in-class demo.
Recommended reading:
https://docs.redhat.com/zh_hans/documentation/red_hat_ceph_storage/4/html/operations_guide/removing-a-ceph-osd-node_ops
https://docs.redhat.com/zh_hans/documentation/red_hat_ceph_storage/4/html-single/operations_guide/index#replacing-a-bluestore-database-disk-using-the-command-line-interface_ops
III. Scaling out OSD nodes in a Ceph cluster
1. Basic workflow for adding OSD devices
The workflow for adding an OSD device is as follows (a command sketch follows the tip below):
- 1. Confirm whether the device on the OSD node is already in use;
- 2. Wipe or format the device [optional];
- 3. Add the OSD to the cluster;
Tip:
When an OSD joins the cluster, a dedicated fsid is created for it on its host, at "/var/lib/ceph/${CLUSTER_FSID}/osd.${OSD_ID}/fsid" (compare the paths shown in section 2.7 above).
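A minimal sketch of the cephadm commands such an expansion would use (assumptions: the cluster's SSH public key has already been pushed to the new host, the host is ceph143 at 10.0.0.143, and the data disk is /dev/sdb; these names are only illustrative):
ceph orch host add ceph143 10.0.0.143            # 1. add the host to the cluster
ceph orch device ls ceph143                      #    confirm the device shows up as available
ceph orch device zap ceph143 /dev/sdb --force    # 2. wipe the device if it still holds old data (optional, destructive)
ceph orch daemon add osd ceph143:/dev/sdb        # 3. create an OSD on the device
ceph osd tree                                    #    verify the new OSD is up and in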
2. Hands-on example: scaling out
Omitted; simply refer to the earlier notes.
Recommended reading:
https://www.cnblogs.com/yinzhengjie/p/18370686#5ceph集群添加或移除主机
When your talent cannot yet support your ambition, calm down and study; when your ability cannot yet carry your goals, settle down and train. Ask yourself what kind of life you want.
Technical exchange is welcome. Personal WeChat: "JasonYin2020" (when adding, please note where you found this and why you are reaching out).
Author: 尹正杰, Blog: https://www.cnblogs.com/yinzhengjie/p/18370804