Ceph (Part 4): Cluster Management and Common PG States
1. Summary of Common Ceph Management Commands
1.1 List only the storage pools
ceph osd pool ls
Example
$ ceph osd pool ls
device_health_metrics
mypool
myrbd1
rbd-data1
1.2 List the storage pools with their IDs
ceph osd lspools
Example
$ ceph osd lspools
1 device_health_metrics
2 mypool
3 myrbd1
4 rbd-data1
1.3 Check PG status
ceph pg stat
Example
$ ceph pg stat
97 pgs: 97 active+clean; 43 MiB data, 5.9 GiB used, 20 TiB / 20 TiB avail
1.4 Check the status of one pool or of all pools
ceph osd pool stats
Example
cephadmin@ceph-deploy:~$ ceph osd pool stats mypool
pool mypool id 2
  nothing is going on

cephadmin@ceph-deploy:~$ ceph osd pool stats
pool device_health_metrics id 1
  nothing is going on

pool mypool id 2
  nothing is going on

pool myrbd1 id 3
  nothing is going on

pool rbd-data1 id 4
  nothing is going on
1.5 Check cluster storage usage
ceph df
Example
cephadmin@ceph-deploy:~$ ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED     RAW USED  %RAW USED
hdd    20 TiB  20 TiB  5.9 GiB   5.9 GiB       0.03
TOTAL  20 TiB  20 TiB  5.9 GiB   5.9 GiB       0.03

--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED    %USED  MAX AVAIL
device_health_metrics   1    1     0 B        0     0 B      0    6.3 TiB
mypool                  2   32     0 B        0     0 B      0    6.3 TiB
myrbd1                  3   32    19 B        3  12 KiB      0    6.3 TiB
rbd-data1               4   32  11 MiB       74  33 MiB      0    6.3 TiB
1.6 Check detailed cluster storage usage
ceph df detail
Example
$ ceph df detail
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED     RAW USED  %RAW USED
hdd    20 TiB  20 TiB  5.9 GiB   5.9 GiB       0.03
TOTAL  20 TiB  20 TiB  5.9 GiB   5.9 GiB       0.03

--- POOLS ---
POOL                   ID  PGS  STORED  (DATA)  (OMAP)  OBJECTS  USED    (DATA)  (OMAP)  %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
device_health_metrics   1    1     0 B     0 B     0 B        0     0 B     0 B     0 B      0    6.3 TiB            N/A          N/A    N/A         0 B          0 B
mypool                  2   32     0 B     0 B     0 B        0     0 B     0 B     0 B      0    6.3 TiB            N/A          N/A    N/A         0 B          0 B
myrbd1                  3   32    19 B    19 B     0 B        3  12 KiB  12 KiB     0 B      0    6.3 TiB            N/A          N/A    N/A         0 B          0 B
rbd-data1               4   32  11 MiB  11 MiB     0 B       74  33 MiB  33 MiB     0 B      0    6.3 TiB            N/A          N/A    N/A         0 B          0 B
1.7 Check OSD status
ceph osd stat
Example
$ ceph osd stat
20 osds: 20 up (since 3h), 20 in (since 2d); epoch: e302
1.8 Show detailed low-level OSD map information
ceph osd dump
Example
$ ceph osd dump
epoch 302
fsid 28820ae5-8747-4c53-827b-219361781ada
created 2023-09-21T02:58:34.034362+0800
modified 2023-09-24T04:18:36.462497+0800
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 62
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release pacific
stretch_mode_enabled false
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 182 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr_devicehealth
pool 2 'mypool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 150 flags hashpspool stripe_width 0
pool 3 'myrbd1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 203 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 4 'rbd-data1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 299 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
max_osd 20
osd.0 up in weight 1 up_from 251 up_thru 291 down_at 248 last_clean_interval [184,250) [v2:10.0.0.57:6800/2618,v1:10.0.0.57:6801/2618] [v2:192.168.10.57:6816/1002618,v1:192.168.10.57:6817/1002618] exists,up 886cced3-abd0-47ba-9ecb-fff72fe44a5a
osd.1 up in weight 1 up_from 218 up_thru 291 down_at 216 last_clean_interval [184,217) [v2:10.0.0.57:6816/2633,v1:10.0.0.57:6817/2633] [v2:192.168.10.57:6804/1002633,v1:192.168.10.57:6805/1002633] exists,up 9774613d-32ca-4250-ab1a-f51db165a56f
osd.2 up in weight 1 up_from 259 up_thru 289 down_at 258 last_clean_interval [184,258) [v2:10.0.0.57:6812/2625,v1:10.0.0.57:6813/2625] [v2:192.168.10.57:6806/1002625,v1:192.168.10.57:6807/1002625] exists,up 15dc3bc8-0c06-410b-ac09-1a81e3cf207c
osd.3 up in weight 1 up_from 255 up_thru 291 down_at 253 last_clean_interval [184,254) [v2:10.0.0.57:6804/2630,v1:10.0.0.57:6806/2630] [v2:192.168.10.57:6800/2002630,v1:192.168.10.57:6801/2002630] exists,up 12accf18-5fb9-4645-b1ac-33d988db6753
osd.4 up in weight 1 up_from 270 up_thru 291 down_at 267 last_clean_interval [184,269) [v2:10.0.0.57:6805/2634,v1:10.0.0.57:6807/2634] [v2:192.168.10.57:6812/2002634,v1:192.168.10.57:6813/2002634] exists,up 7b2cbcbe-c572-4539-85ce-9df729be1e13
osd.5 up in weight 1 up_from 284 up_thru 288 down_at 281 last_clean_interval [187,283) [v2:10.0.0.58:6809/2645,v1:10.0.0.58:6811/2645] [v2:192.168.10.58:6804/3002645,v1:192.168.10.58:6805/3002645] exists,up bc8dfb8a-7906-49dc-a6da-89e626a24e4e
osd.6 up in weight 1 up_from 243 up_thru 291 down_at 237 last_clean_interval [187,242) [v2:10.0.0.58:6806/2647,v1:10.0.0.58:6807/2647] [v2:192.168.10.58:6820/2002647,v1:192.168.10.58:6821/2002647] exists,up f23ec8f1-56da-41c7-b8ed-837100612e26
osd.7 up in weight 1 up_from 243 up_thru 291 down_at 226 last_clean_interval [187,242) [v2:10.0.0.58:6800/2643,v1:10.0.0.58:6801/2643] [v2:192.168.10.58:6816/2002643,v1:192.168.10.58:6817/2002643] exists,up b6e9dfb1-1ab4-40b5-ba3e-fea590753f79
osd.8 up in weight 1 up_from 243 up_thru 291 down_at 234 last_clean_interval [187,242) [v2:10.0.0.58:6804/2651,v1:10.0.0.58:6805/2651] [v2:192.168.10.58:6809/2002651,v1:192.168.10.58:6811/2002651] exists,up fda2c1d3-acab-4c3b-a94c-089a585a76fe
osd.9 up in weight 1 up_from 286 up_thru 291 down_at 281 last_clean_interval [187,285) [v2:10.0.0.58:6816/2648,v1:10.0.0.58:6817/2648] [v2:192.168.10.58:6806/3002648,v1:192.168.10.58:6807/3002648] exists,up 2ca3d89a-2b5c-45ff-94a4-c87960f94533
osd.10 up in weight 1 up_from 271 up_thru 291 down_at 267 last_clean_interval [174,270) [v2:10.0.0.56:6805/1496,v1:10.0.0.56:6806/1496] [v2:192.168.10.56:6816/1001496,v1:192.168.10.56:6817/1001496] exists,up 10cd2d83-482f-4687-93ae-fe5e172694a1
osd.11 up in weight 1 up_from 264 up_thru 294 down_at 261 last_clean_interval [175,263) [v2:10.0.0.56:6807/1493,v1:10.0.0.56:6808/1493] [v2:192.168.10.56:6805/2001493,v1:192.168.10.56:6806/2001493] exists,up 57a2c73e-2629-4c8c-a9c5-e798d07cd332
osd.12 up in weight 1 up_from 256 up_thru 294 down_at 253 last_clean_interval [174,255) [v2:10.0.0.56:6816/1497,v1:10.0.0.56:6817/1497] [v2:192.168.10.56:6809/2001497,v1:192.168.10.56:6811/2001497] exists,up e282d88f-4937-4f0d-bba8-960dd0f0a26d
osd.13 up in weight 1 up_from 259 up_thru 291 down_at 258 last_clean_interval [174,258) [v2:10.0.0.56:6800/1494,v1:10.0.0.56:6809/1494] [v2:192.168.10.56:6820/2001494,v1:192.168.10.56:6821/2001494] exists,up 36287ce5-ed8a-4494-84dd-a8a739139f90
osd.14 up in weight 1 up_from 255 up_thru 291 down_at 253 last_clean_interval [174,254) [v2:10.0.0.56:6801/1495,v1:10.0.0.56:6802/1495] [v2:192.168.10.56:6800/2001495,v1:192.168.10.56:6801/2001495] exists,up 444174d7-8cd5-4567-bab3-247506d981a2
osd.15 up in weight 1 up_from 288 up_thru 291 down_at 276 last_clean_interval [179,287) [v2:10.0.0.59:6804/1462,v1:10.0.0.59:6805/1462] [v2:192.168.10.59:6816/2001462,v1:192.168.10.59:6817/2001462] exists,up c3a323cd-652f-4b70-9de2-e290547a3df0
osd.16 up in weight 1 up_from 289 up_thru 294 down_at 279 last_clean_interval [178,288) [v2:10.0.0.59:6800/1463,v1:10.0.0.59:6801/1463] [v2:192.168.10.59:6804/2001463,v1:192.168.10.59:6805/2001463] exists,up 485600b0-d305-49ef-84e4-112bab8f6cc2
osd.17 up in weight 1 up_from 288 up_thru 291 down_at 279 last_clean_interval [179,287) [v2:10.0.0.59:6808/1461,v1:10.0.0.59:6809/1461] [v2:192.168.10.59:6808/2001461,v1:192.168.10.59:6809/2001461] exists,up d4796af7-b353-4bea-a8a2-5e4f8ff262aa
osd.18 up in weight 1 up_from 288 up_thru 291 down_at 279 last_clean_interval [179,287) [v2:10.0.0.59:6812/1464,v1:10.0.0.59:6813/1464] [v2:192.168.10.59:6820/2001464,v1:192.168.10.59:6821/2001464] exists,up 93c30827-b7a0-4612-93b7-5b04618869a2
osd.19 up in weight 1 up_from 289 up_thru 289 down_at 276 last_clean_interval [179,288) [v2:10.0.0.59:6816/1465,v1:10.0.0.59:6817/1465] [v2:192.168.10.59:6812/2001465,v1:192.168.10.59:6813/2001465] exists,up 218b659a-0082-4760-8997-419d4b4b11c2
pg_upmap_items 4.6 [7,5]
pg_upmap_items 4.b [7,5]
pg_upmap_items 4.1f [1,0]
blocklist 10.0.0.65:0/3630118219 expires 2023-09-24T05:18:35.693210+0800
blocklist 10.0.0.55:6800/990 expires 2023-09-24T23:50:36.409813+0800
blocklist 10.0.0.55:0/4018866837 expires 2023-09-24T23:50:36.409813+0800
blocklist 10.0.0.55:6801/990 expires 2023-09-24T23:50:36.409813+0800
blocklist 10.0.0.55:0/403645778 expires 2023-09-24T23:50:36.409813+0800
blocklist 10.0.0.55:0/1925626704 expires 2023-09-24T23:50:36.409813+0800
1.9 Show the mapping between OSDs and nodes
ceph osd tree
Example:
Find the node (and then the disk) behind an OSD; for example, osd.13 sits on the ceph-node1 host:
$ ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         20.00000  root default
-7          5.00000      host ceph-node1
10    hdd   1.00000          osd.10           up   1.00000  1.00000
11    hdd   1.00000          osd.11           up   1.00000  1.00000
12    hdd   1.00000          osd.12           up   1.00000  1.00000
13    hdd   1.00000          osd.13           up   1.00000  1.00000
14    hdd   1.00000          osd.14           up   1.00000  1.00000
-3          5.00000      host ceph-node2
 0    hdd   1.00000          osd.0            up   1.00000  1.00000
 1    hdd   1.00000          osd.1            up   1.00000  1.00000
 2    hdd   1.00000          osd.2            up   1.00000  1.00000
 3    hdd   1.00000          osd.3            up   1.00000  1.00000
 4    hdd   1.00000          osd.4            up   1.00000  1.00000
-5          5.00000      host ceph-node3
 5    hdd   1.00000          osd.5            up   1.00000  1.00000
 6    hdd   1.00000          osd.6            up   1.00000  1.00000
 7    hdd   1.00000          osd.7            up   1.00000  1.00000
 8    hdd   1.00000          osd.8            up   1.00000  1.00000
 9    hdd   1.00000          osd.9            up   1.00000  1.00000
-9          5.00000      host ceph-node4
15    hdd   1.00000          osd.15           up   1.00000  1.00000
16    hdd   1.00000          osd.16           up   1.00000  1.00000
17    hdd   1.00000          osd.17           up   1.00000  1.00000
18    hdd   1.00000          osd.18           up   1.00000  1.00000
19    hdd   1.00000          osd.19           up   1.00000  1.00000
Then log in to the node that hosts the OSD and find the disk backing it:
# osd.13 turns out to be backed by /dev/sde
[root@ceph-node1 ~]# ll /var/lib/ceph/osd/ceph-13/block
lrwxrwxrwx 1 ceph ceph 93 Sep 23 23:49 /var/lib/ceph/osd/ceph-13/block -> /dev/ceph-e54d4fb9-6f42-4e06-96a3-330813cf9342/osd-block-36287ce5-ed8a-4494-84dd-a8a739139f90
[root@ceph-node1 ~]# lsblk -f | grep -B1 ceph
sdb        LVM2_member       96sQIt-6luz-rxwI-DevT-TTSX-gwPw-0akFDx
└─ceph--be8feab1--4bc8--44b8--9394--2eb42fca07fe-osd--block--10cd2d83--482f--4687--93ae--fe5e172694a1   ceph_bluestore
...
sde        LVM2_member       jGHnbR-WmNT-8mZW-6pXD-8ZR6-xXqL-23ACnR
└─ceph--e54d4fb9--6f42--4e06--96a3--330813cf9342-osd--block--36287ce5--ed8a--4494--84dd--a8a739139f90   ceph_bluestore
1.10 Show OSD storage usage together with the node mapping
ceph osd df tree
Example
$ ceph osd df tree
ID  CLASS  WEIGHT    REWEIGHT  SIZE      RAW USE  DATA     OMAP    META     AVAIL     %USE  VAR   PGS  STATUS  TYPE NAME
-1         20.00000         -    20 TiB  5.9 GiB  117 MiB     0 B  5.8 GiB    20 TiB  0.03  1.00    -          root default
-7          5.00000         -   5.0 TiB  1.5 GiB   32 MiB     0 B  1.5 GiB   5.0 TiB  0.03  1.01    -              host ceph-node1
10    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.2 MiB     0 B  295 MiB  1024 GiB  0.03  0.99    9      up          osd.10
11    hdd   1.00000   1.00000  1024 GiB  303 MiB  8.2 MiB     0 B  295 MiB  1024 GiB  0.03  1.00   11      up          osd.11
12    hdd   1.00000   1.00000  1024 GiB  303 MiB  4.3 MiB     0 B  299 MiB  1024 GiB  0.03  1.00   20      up          osd.12
13    hdd   1.00000   1.00000  1024 GiB  310 MiB  7.0 MiB     0 B  303 MiB  1024 GiB  0.03  1.03   22      up          osd.13
14    hdd   1.00000   1.00000  1024 GiB  303 MiB  8.2 MiB     0 B  295 MiB  1024 GiB  0.03  1.00   12      up          osd.14
-3          5.00000         -   5.0 TiB  1.5 GiB   27 MiB     0 B  1.4 GiB   5.0 TiB  0.03  1.00    -              host ceph-node2
 0    hdd   1.00000   1.00000  1024 GiB  303 MiB  8.2 MiB     0 B  295 MiB  1024 GiB  0.03  1.00    8      up          osd.0
 1    hdd   1.00000   1.00000  1024 GiB  305 MiB  6.3 MiB     0 B  299 MiB  1024 GiB  0.03  1.01   21      up          osd.1
 2    hdd   1.00000   1.00000  1024 GiB  295 MiB  4.2 MiB     0 B  291 MiB  1024 GiB  0.03  0.98    6      up          osd.2
 3    hdd   1.00000   1.00000  1024 GiB  307 MiB  4.3 MiB     0 B  303 MiB  1024 GiB  0.03  1.02   20      up          osd.3
 4    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.3 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   11      up          osd.4
-5          5.00000         -   5.0 TiB  1.5 GiB   28 MiB     0 B  1.4 GiB   5.0 TiB  0.03  1.00    -              host ceph-node3
 5    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.2 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   11      up          osd.5
 6    hdd   1.00000   1.00000  1024 GiB  303 MiB  8.2 MiB     0 B  295 MiB  1024 GiB  0.03  1.00   15      up          osd.6
 7    hdd   1.00000   1.00000  1024 GiB  302 MiB  6.4 MiB     0 B  295 MiB  1024 GiB  0.03  1.00   19      up          osd.7
 8    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.2 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   16      up          osd.8
 9    hdd   1.00000   1.00000  1024 GiB  300 MiB  4.9 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   11      up          osd.9
-9          5.00000         -   5.0 TiB  1.5 GiB   30 MiB     0 B  1.4 GiB   5.0 TiB  0.03  1.00    -              host ceph-node4
15    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.2 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   16      up          osd.15
16    hdd   1.00000   1.00000  1024 GiB  299 MiB  4.2 MiB     0 B  295 MiB  1024 GiB  0.03  0.99   11      up          osd.16
17    hdd   1.00000   1.00000  1024 GiB  303 MiB  8.2 MiB     0 B  295 MiB  1024 GiB  0.03  1.00   19      up          osd.17
18    hdd   1.00000   1.00000  1024 GiB  307 MiB  8.4 MiB     0 B  299 MiB  1024 GiB  0.03  1.02   17      up          osd.18
19    hdd   1.00000   1.00000  1024 GiB  301 MiB  4.9 MiB     0 B  296 MiB  1024 GiB  0.03  1.00   16      up          osd.19
                         TOTAL    20 TiB  5.9 GiB  117 MiB  19 KiB  5.8 GiB    20 TiB  0.03
MIN/MAX VAR: 0.98/1.03  STDDEV: 0
1.11 Check monitor status
ceph mon stat
Example
$ ceph mon stat
e3: 3 mons at {ceph-mon1=[v2:10.0.0.51:3300/0,v1:10.0.0.51:6789/0],ceph-mon2=[v2:10.0.0.52:3300/0,v1:10.0.0.52:6789/0],ceph-mon3=[v2:10.0.0.53:3300/0,v1:10.0.0.53:6789/0]} removed_ranks: {}, election epoch 30, leader 0 ceph-mon1, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
1.12 Dump the monitor map
ceph mon dump
Example
$ ceph mon dump
epoch 3
fsid 28820ae5-8747-4c53-827b-219361781ada
last_changed 2023-09-21T04:46:48.910442+0800
created 2023-09-21T02:58:33.478584+0800
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:10.0.0.51:3300/0,v1:10.0.0.51:6789/0] mon.ceph-mon1
1: [v2:10.0.0.52:3300/0,v1:10.0.0.52:6789/0] mon.ceph-mon2
2: [v2:10.0.0.53:3300/0,v1:10.0.0.53:6789/0] mon.ceph-mon3
dumped monmap epoch 3
2. Stopping or Restarting the Ceph Cluster
Before stopping or restarting, set the noout flag on the cluster so that OSDs are not marked out and kicked from the cluster while the node services are down.
# Set noout before stopping services
cephadmin@ceph-deploy:~$ ceph osd set noout
noout is set
cephadmin@ceph-deploy:~$ ceph osd stat
20 osds: 20 up (since 13h), 20 in (since 3d); epoch: e304
flags noout

# Unset noout after the services are back up
cephadmin@ceph-deploy:~$ ceph osd unset noout
noout is unset
cephadmin@ceph-deploy:~$ ceph osd stat
20 osds: 20 up (since 13h), 20 in (since 3d); epoch: e305
2.1 Shutdown order
- Set noout before stopping services
- Stop the storage clients so that no data is being read or written
- If RGW is in use, stop RGW
- Stop the CephFS metadata servers (MDS)
- Stop the Ceph OSDs (a sketch of the corresponding service commands follows this list)
- Stop the Ceph managers
- Stop the Ceph monitors
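The exact service names depend on how the cluster was deployed; a minimal sketch using the systemd targets shipped with the Ceph packages (run each command on the nodes of the corresponding role) could look like this:
# On the deploy/admin node: keep OSDs from being marked out while they are stopped
ceph osd set noout
# On the RGW nodes (only if RGW is deployed)
systemctl stop ceph-radosgw.target
# On the MDS nodes (only if CephFS is deployed)
systemctl stop ceph-mds.target
# On every OSD node
systemctl stop ceph-osd.target
# On every manager node
systemctl stop ceph-mgr.target
# On every monitor node (last)
systemctl stop ceph-mon.target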
2.2 Startup order
- Start the Ceph monitors
- Start the Ceph managers
- Start the Ceph OSDs (see the sketch after this list)
- Start the CephFS metadata servers (MDS)
- Start RGW
- Start the storage clients
- Unset noout once the services are up
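Startup is the reverse order; again only a sketch built on the stock systemd targets:
# On every monitor node (first)
systemctl start ceph-mon.target
# On every manager node
systemctl start ceph-mgr.target
# On every OSD node
systemctl start ceph-osd.target
# On the MDS nodes (if CephFS is deployed)
systemctl start ceph-mds.target
# On the RGW nodes (if RGW is deployed)
systemctl start ceph-radosgw.target
# On the deploy/admin node, once everything is back up
ceph osd unset noout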
2.3 Adding a server
- Add the Ceph repository source
- Install Ceph on the new node:
  ceph-deploy install --release pacific {ceph-nodeX}
- Zap (erase) the disk:
  ceph-deploy disk zap {ceph-nodeX} {/dev/sdx}
- Create the OSD (a full example sequence is sketched below):
  ceph-deploy osd create {ceph-nodeX} --data {/dev/sdx}
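As referenced above, here is a sketch of the whole sequence for a hypothetical new node ceph-node5 with a new disk /dev/sdb (both names are assumptions for illustration), run from the deploy node:
# Install the Pacific packages on the new node
ceph-deploy install --release pacific ceph-node5
# Wipe the disk that will back the new OSD
ceph-deploy disk zap ceph-node5 /dev/sdb
# Create the OSD on that disk
ceph-deploy osd create ceph-node5 --data /dev/sdb
# Verify that the new OSD appears and wait for rebalancing to finish
ceph osd tree
ceph -s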
2.4 Removing an OSD or a server
To remove a failed OSD from the Ceph cluster:
- Mark the OSD out of the cluster:
  ceph osd out osd.{id}
- Wait a while for the cluster to rebalance the data off that OSD
- Log in to the node that hosts the OSD and stop the osd.{id} process:
  systemctl stop ceph-osd@{id}.service
- Remove the OSD (a full example sequence is sketched below):
  ceph osd rm osd.{id}
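A sketch of the full sequence for a hypothetical osd.13 (the id is only an example). Note that ceph osd rm only deletes the OSD id, so the CRUSH entry and the authentication key are usually cleaned up as well:
# On the deploy/admin node: mark the OSD out and wait for rebalancing
ceph osd out osd.13
ceph -w          # watch until the cluster is back to active+clean
# On the node that hosts the OSD: stop the daemon
systemctl stop ceph-osd@13.service
# Back on the admin node: remove the CRUSH entry, the key, and the OSD id
ceph osd crush rm osd.13
ceph auth del osd.13
ceph osd rm osd.13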
2.5 Removing a server
Before removing a server, stop all of its OSDs and remove them from the Ceph cluster first.
- Mark the OSD out of the cluster
- Wait a while
- Log in to the node and stop the osd.{id} process
- Remove the OSD
- Repeat the steps above for every OSD on the node
- Once all OSDs have been removed, take the host offline
- Remove the ceph-nodeX node from the CRUSH map (see the sketch below):
  ceph osd crush rm ceph-nodeX
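Once the last OSD on the host has been removed, the now-empty host bucket can be dropped from the CRUSH map; a short sketch for a hypothetical ceph-node5:
# Remove the empty host bucket from the CRUSH map
ceph osd crush rm ceph-node5
# Confirm the host no longer appears in the topology
ceph osd tree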
3. Summary of Common PG States
The common PG states are described below.
3.1 peering
Synchronizing: the OSDs that belong to the same PG must bring themselves into agreement about the PG's data, and peering is the state reported while the OSDs carry out that negotiation.
3.2 activating
Peering has completed and the PG is waiting for all of its instances to persist and apply the peering result (Info, Log, etc.).
3.3 clean
Clean: the PG has no objects waiting to be repaired and its number of replicas equals the pool's configured size; in other words, the PG's Acting Set and Up Set are the same group of OSDs and hold identical content.
Acting Set: the PG's current primary OSD plus the replica OSDs that are actively serving it; these OSDs handle the clients' read and write requests for the PG.
Up Set: the ordered set of OSDs that CRUSH currently maps the PG to. When an OSD fails it has to be replaced by an available OSD and the PG's data synchronized to the newcomer. For example, if a PG lives on OSD1, OSD2 and OSD3 and OSD3 fails, CRUSH may select OSD4 as the replacement: OSD1, OSD2, OSD4 then form the Up Set, while the Acting Set keeps serving from the old membership until the data has been copied to OSD4; once that synchronization finishes, the Acting Set matches the Up Set again.
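The Up Set and Acting Set of a specific PG can be checked with ceph pg map. The PG id 4.6 below is taken from the osd dump output earlier; the OSD ids in the output line are illustrative only:
$ ceph pg map 4.6
osdmap e302 pg 4.6 (4.6) -> up [7,12,16] acting [7,12,16]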
3.4 active
Active: the primary OSD and the replica OSDs are working normally and the PG can serve client read and write requests; a healthy PG is in the active+clean state by default.
cephadmin@ceph-deploy:~$ ceph pg stat
97 pgs: 97 active+clean; 43 MiB data, 5.9 GiB used, 20 TiB / 20 TiB avail
3.5 degraded
Degraded: this state appears after an OSD is marked down; every PG mapped to that OSD transitions to degraded.
If the OSD comes back up and completes peering, the PGs that use it return to the clean state.
If the OSD stays down longer than mon_osd_down_out_interval (600 seconds by default), Ceph marks it out of the cluster and starts recovery for the degraded PGs, until every PG that was degraded because of that OSD is clean again.
Data is recovered from the PG's primary OSD; if the primary itself is the one that failed, a new primary is first chosen from the two remaining replica OSDs.
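While recovery is running, the degraded PGs and objects can be inspected with a few read-only commands (a sketch; the degraded filter for dump_stuck assumes a reasonably recent release such as the Pacific cluster used here):
# Overall health, including which PGs are degraded or undersized
ceph health detail
# PGs stuck in the degraded state
ceph pg dump_stuck degraded
# Follow recovery progress live
ceph -w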
3.6 stale
Stale: under normal operation, every primary OSD periodically reports the latest statistics of all PGs it is primary for to the monitors. If for any reason an OSD cannot send those reports, or other OSDs report it as down, every PG that has it as primary is immediately marked stale, meaning the primary's data is no longer known to be current. If it is a replica OSD that goes down, Ceph simply repairs the affected PGs without moving them to the stale state.
3.7 undersized
Undersized: a PG enters this state when its current number of replicas is lower than the pool's configured size. For example, if both replica OSDs of a PG go down, only the primary OSD is left, which is below the minimum requirement of one primary plus one replica (the pool's min_size); the PGs using those OSDs stay undersized until replacement OSDs are added or the failed ones are repaired.
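Whether a PG counts as undersized is determined by the pool's size and min_size; both can be checked per pool, for example for the mypool pool from the earlier examples:
# Configured replica count and the minimum required to keep serving I/O
ceph osd pool get mypool size
ceph osd pool get mypool min_size
# List undersized PGs, if any
ceph pg dump_stuck undersized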
3.8 scrubbing
Scrubbing is Ceph's data-cleaning mechanism for guaranteeing data integrity. Each OSD periodically starts a scrub thread that scans part of its objects and compares them with the other replicas to detect inconsistencies; when an inconsistency is found, an error is raised so that the administrator can resolve it manually. Scrubbing works per PG: for every PG, Ceph builds a digest-like summary of the metadata of all objects in that PG (object size, attributes, and so on), called a scrubmap, and compares the primary's scrubmap with those of the replicas to check whether any object is missing or mismatched. There are two kinds of scans: light scrubs (also called shallow or simple scrubs) and deep scrubs.
A light scrub (daily) compares object sizes and attributes, while a deep scrub (weekly) also reads the object data and verifies consistency with checksums (CRC32). A PG undergoing a deep scrub shows the scrubbing+deep state.
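Scrubs can also be triggered by hand, which is useful when a specific PG looks suspicious; a sketch using the example PG id 4.6:
# Light (shallow) scrub of a single PG
ceph pg scrub 4.6
# Deep scrub: reads the data and verifies checksums
ceph pg deep-scrub 4.6
# Scheduling is controlled by options such as osd_scrub_min_interval,
# osd_scrub_max_interval and osd_deep_scrub_interval, e.g.:
ceph config get osd osd_deep_scrub_interval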
3.9 recovering
Recovering: the cluster is migrating or synchronizing objects and their replicas. This can happen when a new OSD is added to the cluster or an OSD goes down, causing CRUSH to remap PGs to different OSDs; while a PG synchronizes its internal data because of such an OSD change, it is marked recovering.
3.10 backfilling
Backfilling: backfill is a special case of recovery. After peering, if some PG instances in the Up Set cannot be brought up to date by incremental synchronization from the current authoritative log (for example because the OSD hosting them was offline for too long, or because a newly added OSD triggered a wholesale migration of the PG), they are synchronized by copying all of the primary's objects in full; a PG going through this process is in the backfilling state.
3.11 backfill-toofull
A PG that needs to be backfilled resides on an OSD with insufficient free space, so the backfill is suspended; backfill-toofull is the state the PG reports while it waits.
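The threshold that triggers backfill-toofull is the backfillfull_ratio visible in the ceph osd dump output above (0.90 by default). It can be inspected and, as a temporary workaround, raised slightly, although freeing up or adding capacity is the proper fix:
# Current full / backfillfull / nearfull ratios
ceph osd dump | grep ratio
# Temporarily raise the backfill threshold (0.92 is just an example value)
ceph osd set-backfillfull-ratio 0.92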