09 OSD Day-to-Day Management (Reposted)

OSD Day-to-Day Management

OSD Health Status

  • ceph status
[root@m1 ceph]# ceph -s
  cluster:
    id:     17a413b5-f140-441a-8b35-feec8ae29521
    health: HEALTH_WARN
            2 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 7s)
    mgr: a(active, since 7s)
    mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
    osd: 5 osds: 5 up (since 23s), 5 in (since 6d)
    rgw: 2 daemons active (my.store.a, my.store.b)
 
  task status:
 
  data:
    pools:   12 pools, 209 pgs
    objects: 800 objects, 1.3 GiB
    usage:   9.2 GiB used, 241 GiB / 250 GiB avail
    pgs:     209 active+clean
  • ceph osd tree
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.24399  root default                                       
 -5         0.04880      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000
  • ceph osd status
[root@m1 ceph]# ceph osd status
ID  HOST              USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  192.168.100.133  2067M  47.9G      0        0       2      105   exists,up  
 1  192.168.100.134  1814M  48.2G      0        0       1        0   exists,up  
 2  192.168.100.135  1845M  48.1G      0        0       1       15   exists,up  
 3  192.168.100.136  1794M  48.2G      0        0       0        0   exists,up  
 4  192.168.100.137  1893M  48.1G      0        0       1       89   exists,up
  • ceph osd df
[root@m1 ceph]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  0.04880   1.00000   50 GiB  2.0 GiB  1.0 GiB  1.6 MiB  1022 MiB   48 GiB  4.04  1.10  140      up
 1    hdd  0.04880   1.00000   50 GiB  1.8 GiB  790 MiB  2.3 MiB  1022 MiB   48 GiB  3.54  0.96  121      up
 2    hdd  0.04880   1.00000   50 GiB  1.8 GiB  821 MiB  2.2 MiB  1022 MiB   48 GiB  3.60  0.98  111      up
 3    hdd  0.04880   1.00000   50 GiB  1.8 GiB  770 MiB  596 KiB  1023 MiB   48 GiB  3.51  0.95  133      up
 4    hdd  0.04880   1.00000   50 GiB  1.8 GiB  870 MiB  3.7 MiB  1020 MiB   48 GiB  3.70  1.01  122      up
                       TOTAL  250 GiB  9.2 GiB  4.2 GiB   10 MiB   5.0 GiB  241 GiB  3.68                   
MIN/MAX VAR: 0.95/1.10  STDDEV: 0.19
  • ceph osd utilization
[root@m1 ceph]# ceph osd utilization
avg 125.4
stddev 10.0916 (expected baseline 10.016)
min osd.2 with 111 pgs (0.885167 * mean)
max osd.0 with 140 pgs (1.11643 * mean)
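
The figures printed by ceph osd utilization can be reproduced from the PGS column of ceph osd df, which makes it easier to see what they measure. The sketch below is only a worked calculation: the PG counts are copied from the output above, and the "expected baseline" formula is an assumption that happens to match the value shown.

```python
import math

# PG counts per OSD, copied from the PGS column of `ceph osd df` above.
pgs = {"osd.0": 140, "osd.1": 121, "osd.2": 111, "osd.3": 133, "osd.4": 122}

n = len(pgs)
avg = sum(pgs.values()) / n
# Population standard deviation (divide by n, not n - 1).
stddev = math.sqrt(sum((count - avg) ** 2 for count in pgs.values()) / n)
# Assumed formula for the "expected baseline"; it reproduces the value shown above.
baseline = math.sqrt(avg * (1 - 1 / n))

min_osd = min(pgs, key=pgs.get)
max_osd = max(pgs, key=pgs.get)

print(f"avg {avg:.1f}")
print(f"stddev {stddev:.4f} (expected baseline {baseline:.3f})")
print(f"min {min_osd} with {pgs[min_osd]} pgs ({pgs[min_osd] / avg:.6f} * mean)")
print(f"max {max_osd} with {pgs[max_osd]} pgs ({pgs[max_osd] / avg:.5f} * mean)")
```

A stddev close to the baseline, as here, suggests the PG distribution is about as even as random placement would give.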

OSD Horizontal Scaling

When a Ceph cluster runs short of storage space, it needs to be scaled out. Ceph supports dynamic horizontal scaling, usually in one of two ways:

  • Add more OSDs
  • Add additional hosts

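By default, Rook uses all disks on all nodes. Under that default policy, any newly added disk or host is picked up for expansion at the interval set by ROOK_DISCOVER_DEVICES_INTERVAL. During the earlier installation, however, that policy was changed: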

[root@m1 ceph]# vim cluster.yaml

......
  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false

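Both options are turned off, so the nodes list must be defined by hand and every disk to be added must be listed there. For example, to add the sdc disk on m1 to the Ceph cluster: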

[root@m1 ceph]# lsblk 
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   40G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   39G  0 part 
  ├─centos-root 253:0    0   37G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  
sdb               8:16   0   50G  0 disk 
└─ceph--bfc68a00--2c58--4226--bd07--1e1e8eb8d15c-osd--block--846588ac--71f8--4a78--83ee--3d70763e2ab4
                253:2    0   50G  0 lvm  
sdc               8:32   0   50G  0 disk 
sr0              11:0    1  973M  0 rom  
rbd0            252:0    0   10G  0 disk /var/lib/kubelet/pods/c8c29b82-3b17-46c5-b205-0c44009548a2/volumes/kubernetes.io~csi/pvc-af0e2916-e
rbd1            252:16   0   10G  0 disk /var/lib/kubelet/pods/3fd82fb7-97ff-4175-8f4d-88bdb1a7219c/volumes/kubernetes.io~csi/pvc-9d32f5ff-1
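
Spotting the free disk in the lsblk listing can also be done mechanically. The sketch below is a rough heuristic, not Rook's own check (the prepare job relies on ceph-volume inventory). It assumes input captured with `lsblk --pairs --output NAME,TYPE,PKNAME,MOUNTPOINT`, and the function name `unused_disks` is hypothetical.

```python
import shlex

def unused_disks(pairs_output):
    """Return whole disks that have no children and no mountpoint."""
    rows = [
        dict(token.split("=", 1) for token in shlex.split(line))
        for line in pairs_output.splitlines()
        if line.strip()
    ]
    # Any device named as a parent (PKNAME) already carries partitions or LVs.
    parents = {row["PKNAME"] for row in rows if row.get("PKNAME")}
    return [
        row["NAME"]
        for row in rows
        if row.get("TYPE") == "disk"
        and not row.get("MOUNTPOINT")
        and row["NAME"] not in parents
    ]

# Abbreviated sample modelled on the lsblk listing above (LV name shortened).
sample = '''NAME="sda" TYPE="disk" PKNAME="" MOUNTPOINT=""
NAME="sda1" TYPE="part" PKNAME="sda" MOUNTPOINT="/boot"
NAME="sdb" TYPE="disk" PKNAME="" MOUNTPOINT=""
NAME="ceph-osd-block" TYPE="lvm" PKNAME="sdb" MOUNTPOINT=""
NAME="sdc" TYPE="disk" PKNAME="" MOUNTPOINT=""
NAME="sr0" TYPE="rom" PKNAME="" MOUNTPOINT=""
NAME="rbd0" TYPE="disk" PKNAME="" MOUNTPOINT="/var/lib/kubelet/pods/x"
'''
print(unused_disks(sample))
```

On this sample only sdc qualifies: sda and sdb have children, sr0 is not a disk, and rbd0 is mounted.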

New configuration in cluster.yaml

[root@m1 ceph]# vim cluster.yaml

  storage: # cluster level storage configuration and selection
    useAllNodes: false
    useAllDevices: false
    #deviceFilter:
    config:
......
    nodes:
    - name: "192.168.100.133"
      devices:
      - name: "sdb"
        config:
          storeType: bluestore
          journalSizeMB: "4096"
      - name: "sdc"   # newly added entry
        config:
          storeType: bluestore
          journalSizeMB: "4096"

# Re-apply the configuration
[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
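
When the same edit has to be repeated for several nodes, it may help to think of the storage section as plain data. The sketch below models it as a Python dict and adds a device idempotently. The helper `add_device` is hypothetical and does not read or write cluster.yaml itself.

```python
def add_device(storage, node_name, device, config):
    """Add `device` under `node_name`; return True if the spec changed."""
    for node in storage.setdefault("nodes", []):
        if node["name"] == node_name:
            devices = node.setdefault("devices", [])
            if any(entry["name"] == device for entry in devices):
                return False  # already listed, nothing to do
            devices.append({"name": device, "config": dict(config)})
            return True
    # Node not present yet: create it with this single device.
    storage["nodes"].append(
        {"name": node_name, "devices": [{"name": device, "config": dict(config)}]}
    )
    return True

bluestore = {"storeType": "bluestore", "journalSizeMB": "4096"}
storage = {
    "useAllNodes": False,
    "useAllDevices": False,
    "nodes": [
        {"name": "192.168.100.133", "devices": [{"name": "sdb", "config": dict(bluestore)}]}
    ],
}

changed_first = add_device(storage, "192.168.100.133", "sdc", bluestore)
changed_again = add_device(storage, "192.168.100.133", "sdc", bluestore)
print(changed_first, changed_again)
```

The second call changes nothing, which mirrors how re-applying an unchanged cluster.yaml leaves the cluster as it is.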

Troubleshooting a Failed Expansion, Part 1

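After the change, the new OSD did not join the cluster. Why? Chapter 5 defined a node scheduling rule for OSDs: they are scheduled only onto nodes that carry the ceph-osd=enabled label, and a node without that label cannot satisfy the scheduling requirement. Earlier we labeled only node n4, so it was the only node that qualified. The other nodes need the label as well.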

  • View the current OSDs
[root@m1 ceph]# kubectl -n rook-ceph get pods -l app=rook-ceph-osd
NAME                               READY   STATUS    RESTARTS   AGE
rook-ceph-osd-0-654cdc8f98-9sdfg   1/1     Running   21         6d23h
rook-ceph-osd-1-6ff76dc6df-9k9fg   1/1     Running   0          6d23h
rook-ceph-osd-2-f4966d698-hzmcd    1/1     Running   0          6d23h
rook-ceph-osd-3-54cfdb49cb-pb6d7   1/1     Running   0          6d23h
rook-ceph-osd-4-566f4c5b4d-fqf5q   1/1     Running   0          6d22h
  • View the operator log
[root@m1 ceph]# kubectl -n rook-ceph logs rook-ceph-operator-7fdf75bb9d-7mnrc -f

2022-12-01 03:16:41.519366 I | ceph-cluster-controller: CR has changed for "rook-ceph". diff=  v1.ClusterSpec{
        CephVersion: {Image: "ceph/ceph:v15.2.8"},
        DriveGroups: nil,
        Storage: v1.StorageScopeSpec{
                Nodes: []v1.Node{
                        {
                                Name:      "192.168.100.133",
                                Resources: {},
                                Config:    nil,
                                Selection: v1.Selection{
                                        UseAllDevices:    nil,
                                        DeviceFilter:     "",
                                        DevicePathFilter: "",
                                        Devices: []v1.Device{
                                                {Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}},
                                                {
                                                        Name:     "sdc",
                                                        FullPath: "",
                                                        Config: map[string]string{
-                                                               "journalSizeMB": "8192",
+                                                               "journalSizeMB": "4096",
                                                                "storeType":     "bluestore",
                                                        },
                                                },
                                        },
                                        Directories:          nil,
                                        VolumeClaimTemplates: nil,
                                },
                        },
                        {Name: "192.168.100.134", Selection: {Devices: {{Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}}}}},
                        {Name: "192.168.100.135", Selection: {Devices: {{Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}}}}},
                        ... // 2 identical elements
                },
                UseAllNodes: false,
                NodeCount:   0,
                ... // 4 identical fields
        },
        Annotations: nil,
        Labels:      nil,
        ... // 19 identical fields
  }

.......

2022-12-01 03:17:10.448036 I | op-osd: start running osds in namespace rook-ceph
2022-12-01 03:17:10.448049 I | op-osd: start provisioning the osds on pvcs, if needed
2022-12-01 03:17:10.465629 I | op-osd: no volume sources defined to configure OSDs on PVCs.
2022-12-01 03:17:10.465667 I | op-osd: start provisioning the osds on nodes, if needed
2022-12-01 03:17:10.499216 I | op-osd: 1 of the 5 storage nodes are valid                                           ##### only 1 node is usable; the other nodes lack the label
2022-12-01 03:17:10.761098 I | op-mgr: successful modules: prometheus
2022-12-01 03:17:10.911845 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-192.168.100.137 to start a new one
2022-12-01 03:17:11.129678 I | op-k8sutil: batch job rook-ceph-osd-prepare-192.168.100.137 still exists
2022-12-01 03:17:11.568775 I | op-mgr: successful modules: mgr module(s) from the spec
2022-12-01 03:17:11.627933 I | op-mgr: successful modules: balancer
2022-12-01 03:17:12.535916 I | op-mgr: successful modules: http bind settings
2022-12-01 03:17:13.135066 I | op-k8sutil: batch job rook-ceph-osd-prepare-192.168.100.137 deleted
2022-12-01 03:17:13.141790 I | op-osd: osd provision job started for node 192.168.100.137
2022-12-01 03:17:13.141823 I | op-osd: start osds after provisioning is completed, if needed
2022-12-01 03:17:13.214453 I | op-osd: osd orchestration status for node 192.168.100.137 is starting
2022-12-01 03:17:13.214510 I | op-osd: 0/1 node(s) completed osd provisioning, resource version 2400822
2022-12-01 03:17:14.398473 I | op-osd: osd orchestration status for node 192.168.100.137 is computingDiff
2022-12-01 03:17:14.455762 I | op-osd: osd orchestration status for node 192.168.100.137 is orchestrating
2022-12-01 03:17:14.686591 I | op-osd: osd orchestration status for node 192.168.100.137 is completed
2022-12-01 03:17:14.686642 I | op-osd: starting 1 osd daemons on node 192.168.100.137
2022-12-01 03:17:14.690798 I | cephclient: getting or creating ceph auth key "osd.4"
2022-12-01 03:17:14.988322 I | op-k8sutil: deployment "rook-ceph-osd-4" did not change, nothing to update
2022-12-01 03:17:14.997776 I | op-osd: 1/1 node(s) completed osd provisioning
2022-12-01 03:17:15.438705 I | cephclient: successfully disallowed pre-octopus osds and enabled all new octopus-only functionality
2022-12-01 03:17:15.438752 I | op-osd: completed running osds in namespace rook-ceph
  • Add the label to the nodes
[root@m1 ceph]# kubectl label node 192.168.100.133 ceph-osd=enabled
node/192.168.100.133 labeled
[root@m1 ceph]# kubectl label node 192.168.100.134 ceph-osd=enabled
node/192.168.100.134 labeled
[root@m1 ceph]# kubectl label node 192.168.100.135 ceph-osd=enabled
node/192.168.100.135 labeled
[root@m1 ceph]# kubectl label node 192.168.100.136 ceph-osd=enabled
node/192.168.100.136 labeled
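
To confirm which nodes still lack the scheduling label, the output of `kubectl get nodes -o json` can be filtered. This is a minimal sketch under the assumption that the JSON has already been loaded into a dict. `nodes_missing_label` is a hypothetical helper, and the sample data mimics the state before the labels above were applied.

```python
def nodes_missing_label(node_list, key, value):
    """Return sorted names of nodes whose label `key` is not equal to `value`."""
    missing = []
    for item in node_list.get("items", []):
        labels = item["metadata"].get("labels") or {}
        if labels.get(key) != value:
            missing.append(item["metadata"]["name"])
    return sorted(missing)

# Sample shaped like `kubectl get nodes -o json`: only .137 carries the label.
before = {
    "items": [
        {"metadata": {"name": f"192.168.100.{n}", "labels": {}}}
        for n in (133, 134, 135, 136)
    ]
    + [{"metadata": {"name": "192.168.100.137", "labels": {"ceph-osd": "enabled"}}}]
}
print(nodes_missing_label(before, "ceph-osd", "enabled"))
```

An empty result after labeling means every storage node can satisfy the OSD node affinity.
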
  • Modify the operator.yaml configuration file so that the OSD information is re-read, then check the osd-prepare log
[root@m1 ~]# kubectl -n rook-ceph logs rook-ceph-osd-prepare-192.168.100.133-psmg8 
2022-12-01 03:24:07.833336 I | cephcmd: desired devices to configure osds: [{Name:sdb OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:sdc OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2022-12-01 03:24:07.838526 I | rookcmd: starting Rook v1.5.5 with arguments '/rook/rook ceph osd provision'
2022-12-01 03:24:07.838543 I | rookcmd: flag values: --cluster-id=8279c0cb-e44f-4af6-8689-115025bb2940, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"sdb","storeConfig":{"storeType":"bluestore","osdsPerDevice":1}},{"id":"sdc","storeConfig":{"storeType":"bluestore","osdsPerDevice":1}}], --drive-groups=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=DEBUG, --metadata-device=, --node-name=192.168.100.133, --operator-image=, --osd-crush-device-class=, --osd-database-size=0, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2022-12-01 03:24:07.838549 I | op-mon: parsing mon endpoints: a=10.68.231.222:6789,b=10.68.163.216:6789,c=10.68.61.127:6789
2022-12-01 03:24:08.785794 I | op-osd: CRUSH location=root=default host=192-168-100-133
2022-12-01 03:24:08.797646 I | cephcmd: crush location of osd: root=default host=192-168-100-133
2022-12-01 03:24:08.797739 D | exec: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt -- /usr/sbin/lvm --help
2022-12-01 03:24:08.936849 I | cephosd: successfully called nsenter
2022-12-01 03:24:08.936869 I | cephosd: binary "/usr/sbin/lvm" found on the host, proceeding with osd preparation
2022-12-01 03:24:08.936875 D | exec: Running command: dmsetup version
2022-12-01 03:24:08.950594 I | cephosd: Library version:   1.02.171-RHEL8 (2020-05-28)
Driver version:    4.37.1
2022-12-01 03:24:11.286550 D | cephclient: No ceph configuration override to merge as "rook-config-override" configmap is empty
2022-12-01 03:24:11.286593 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-12-01 03:24:11.298967 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-12-01 03:24:11.309503 D | cephosd: config file @ /etc/ceph/ceph.conf: [global]
fsid                = 17a413b5-f140-441a-8b35-feec8ae29521
mon initial members = a b c
mon host            = [v2:10.68.231.222:3300,v1:10.68.231.222:6789],[v2:10.68.163.216:3300,v1:10.68.163.216:6789],[v2:10.68.61.127:3300,v1:10.68.61.127:6789]
public addr         = 172.20.0.116
cluster addr        = 172.20.0.116

[client.admin]
keyring = /var/lib/rook/rook-ceph/client.admin.keyring

2022-12-01 03:24:11.309570 I | cephosd: discovering hardware
2022-12-01 03:24:11.313813 D | exec: Running command: lsblk --all --noheadings --list --output KNAME
2022-12-01 03:24:11.413147 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.444846 D | exec: Running command: sgdisk --print /dev/sda
2022-12-01 03:24:11.463043 D | exec: Running command: udevadm info --query=property /dev/sda
2022-12-01 03:24:11.575745 D | exec: Running command: lsblk --noheadings --pairs /dev/sda
2022-12-01 03:24:11.580658 I | inventory: skipping device "sda" because it has child, considering the child instead.
2022-12-01 03:24:11.580703 D | exec: Running command: lsblk /dev/sda1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.584450 D | exec: Running command: udevadm info --query=property /dev/sda1
2022-12-01 03:24:11.602091 D | exec: Running command: lsblk /dev/sda2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.604729 D | exec: Running command: udevadm info --query=property /dev/sda2
2022-12-01 03:24:11.627506 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.630825 D | exec: Running command: sgdisk --print /dev/sdb
2022-12-01 03:24:11.645321 D | exec: Running command: udevadm info --query=property /dev/sdb
2022-12-01 03:24:11.650973 D | exec: Running command: lsblk --noheadings --pairs /dev/sdb
2022-12-01 03:24:11.656583 I | inventory: skipping device "sdb" because it has child, considering the child instead.
2022-12-01 03:24:11.656599 D | exec: Running command: lsblk /dev/sdc --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.658018 D | exec: Running command: sgdisk --print /dev/sdc
2022-12-01 03:24:11.666084 D | exec: Running command: udevadm info --query=property /dev/sdc
2022-12-01 03:24:11.671168 D | exec: Running command: lsblk --noheadings --pairs /dev/sdc
2022-12-01 03:24:11.676170 D | exec: Running command: lsblk /dev/sr0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.678167 W | inventory: skipping device "sr0". unsupported diskType rom
2022-12-01 03:24:11.678181 W | inventory: skipping rbd device "rbd0"
2022-12-01 03:24:11.678184 W | inventory: skipping rbd device "rbd1"
2022-12-01 03:24:11.678188 D | exec: Running command: lsblk /dev/dm-0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.680000 D | exec: Running command: sgdisk --print /dev/dm-0
2022-12-01 03:24:11.682639 D | exec: Running command: udevadm info --query=property /dev/dm-0
2022-12-01 03:24:11.687596 D | exec: Running command: lsblk /dev/dm-1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.689044 D | exec: Running command: sgdisk --print /dev/dm-1
2022-12-01 03:24:11.692035 D | exec: Running command: udevadm info --query=property /dev/dm-1
2022-12-01 03:24:11.698050 D | exec: Running command: lsblk /dev/dm-2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.699720 D | exec: Running command: sgdisk --print /dev/dm-2
2022-12-01 03:24:11.702599 D | exec: Running command: udevadm info --query=property /dev/dm-2
2022-12-01 03:24:11.714082 D | inventory: discovered disks are [0xc0003699e0 0xc000369e60 0xc00017ea20 0xc00017eea0 0xc00017f0e0 0xc00014a240]
2022-12-01 03:24:11.717932 I | cephosd: creating and starting the osds
2022-12-01 03:24:14.946745 D | cephosd: No Drive Groups configured.
2022-12-01 03:24:14.955043 D | cephosd: desiredDevices are [{Name:sdb OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:sdc OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2022-12-01 03:24:14.955082 D | cephosd: context.Devices are [0xc0003699e0 0xc000369e60 0xc00017ea20 0xc00017eea0 0xc00017f0e0 0xc00014a240]
2022-12-01 03:24:14.955110 I | cephosd: skipping device "sda1" because it contains a filesystem "xfs"
2022-12-01 03:24:14.955115 I | cephosd: skipping device "sda2" because it contains a filesystem "LVM2_member"
2022-12-01 03:24:14.955136 D | exec: Running command: lsblk /dev/sdc --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:15.091745 D | exec: Running command: ceph-volume inventory --format json /dev/sdc
2022-12-01 03:24:17.897499 I | cephosd: device "sdc" is available.
2022-12-01 03:24:17.897546 I | cephosd: "sdc" found in the desired devices
2022-12-01 03:24:17.897552 I | cephosd: device "sdc" is selected by the device filter/name "sdc"
2022-12-01 03:24:17.897565 I | cephosd: skipping 'dm' device "dm-0"
2022-12-01 03:24:17.897567 I | cephosd: skipping 'dm' device "dm-1"
2022-12-01 03:24:17.897569 I | cephosd: skipping 'dm' device "dm-2"
2022-12-01 03:24:17.897756 I | cephosd: configuring osd devices: {"Entries":{"sdc":{"Data":-1,"Metadata":null,"Config":{"Name":"sdc","OSDsPerDevice":1,"MetadataDevice":"","DatabaseSizeMB":0,"DeviceClass":"","IsFilter":false,"IsDevicePathFilter":false},"PersistentDevicePaths":["/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:2:0"]}}}
2022-12-01 03:24:17.901538 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2022-12-01 03:24:17.901858 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/297785192
2022-12-01 03:24:18.987463 I | cephosd: configuring new device sdc
2022-12-01 03:24:18.987533 I | cephosd: Base command - stdbuf
2022-12-01 03:24:18.987575 I | cephosd: immediateExecuteArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc]
2022-12-01 03:24:18.987590 I | cephosd: immediateReportArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc --report]
2022-12-01 03:24:18.987598 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc --report
2022-12-01 03:24:20.389969 D | exec: --> DEPRECATION NOTICE
2022-12-01 03:24:20.403403 D | exec: --> You are using the legacy automatic disk sorting behavior
2022-12-01 03:24:20.403406 D | exec: --> The Pacific release will change the default to --no-auto
2022-12-01 03:24:20.403409 D | exec: --> passed data devices: 1 physical, 0 LVM
2022-12-01 03:24:20.403411 D | exec: --> relative data size: 1.0
2022-12-01 03:24:20.403436 D | exec: 
2022-12-01 03:24:20.403440 D | exec: Total OSDs: 1
2022-12-01 03:24:20.403442 D | exec: 
2022-12-01 03:24:20.403444 D | exec:   Type            Path                                                    LV Size         % of device
2022-12-01 03:24:20.403446 D | exec: ----------------------------------------------------------------------------------------------------
2022-12-01 03:24:20.403447 D | exec:   data            /dev/sdc                                                50.00 GB        100.00%
2022-12-01 03:24:20.574537 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc
2022-12-01 03:24:25.923999 D | exec: --> DEPRECATION NOTICE
2022-12-01 03:24:25.954928 D | exec: --> You are using the legacy automatic disk sorting behavior
2022-12-01 03:24:25.954933 D | exec: --> The Pacific release will change the default to --no-auto
2022-12-01 03:24:25.954937 D | exec: --> passed data devices: 1 physical, 0 LVM
2022-12-01 03:24:25.954938 D | exec: --> relative data size: 1.0
2022-12-01 03:24:25.954940 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2022-12-01 03:24:25.954943 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 58119fd3-2da0-42c2-8abb-7a3931068e2e
2022-12-01 03:24:25.954951 D | exec: Running command: /usr/sbin/vgcreate --force --yes ceph-71fd0a8a-7194-4b79-9987-a46a622743f5 /dev/sdc
2022-12-01 03:24:25.954955 D | exec:  stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
2022-12-01 03:24:25.954957 D | exec:   Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
2022-12-01 03:24:25.954959 D | exec:  stdout: Physical volume "/dev/sdc" successfully created.
2022-12-01 03:24:25.954961 D | exec:  stdout: Volume group "ceph-71fd0a8a-7194-4b79-9987-a46a622743f5" successfully created
2022-12-01 03:24:25.954968 D | exec: Running command: /usr/sbin/lvcreate --yes -l 12799 -n osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e ceph-71fd0a8a-7194-4b79-9987-a46a622743f5
2022-12-01 03:24:25.954970 D | exec:  stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
2022-12-01 03:24:25.954971 D | exec:  stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
2022-12-01 03:24:25.954973 D | exec:  stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
2022-12-01 03:24:25.954974 D | exec:  stdout: Logical volume "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e" created.
2022-12-01 03:24:25.954976 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2022-12-01 03:24:25.954979 D | exec: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-5
2022-12-01 03:24:25.954980 D | exec: Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e
2022-12-01 03:24:25.954982 D | exec: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--71fd0a8a--7194--4b79--9987--a46a622743f5-osd--block--58119fd3--2da0--42c2--8abb--7a3931068e2e
2022-12-01 03:24:25.954983 D | exec: Running command: /usr/bin/ln -s /dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e /var/lib/ceph/osd/ceph-5/block
2022-12-01 03:24:25.954986 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-5/activate.monmap
2022-12-01 03:24:25.954987 D | exec:  stderr: got monmap epoch 3
2022-12-01 03:24:25.954989 D | exec: Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-5/keyring --create-keyring --name osd.5 --add-key AQBlHohjdTH9OBAA3kVn4rwfrPQvgN8gk3bYsQ==
2022-12-01 03:24:25.954990 D | exec:  stdout: creating /var/lib/ceph/osd/ceph-5/keyring
2022-12-01 03:24:25.954991 D | exec: added entity osd.5 auth(key=AQBlHohjdTH9OBAA3kVn4rwfrPQvgN8gk3bYsQ==)
2022-12-01 03:24:25.954993 D | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/keyring
2022-12-01 03:24:25.954995 D | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/
2022-12-01 03:24:25.954998 D | exec: Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 5 --monmap /var/lib/ceph/osd/ceph-5/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-5/ --osd-uuid 58119fd3-2da0-42c2-8abb-7a3931068e2e --setuser ceph --setgroup ceph
2022-12-01 03:24:25.955007 D | exec:  stderr: 2022-12-01T03:24:23.646+0000 7f126ee39f40 -1 bluestore(/var/lib/ceph/osd/ceph-5/) _read_fsid unparsable uuid
2022-12-01 03:24:25.955009 D | exec:  stderr: 2022-12-01T03:24:23.688+0000 7f126ee39f40 -1 freelist read_size_meta_from_db missing size meta in DB
2022-12-01 03:24:25.955010 D | exec: --> ceph-volume lvm prepare successful for: /dev/sdc
2022-12-01 03:24:26.090265 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list  --format json
2022-12-01 03:24:27.445464 D | cephosd: {
    "0": [
        {
            "devices": [
                "/dev/sdb"
            ],
            "lv_name": "osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
            "lv_path": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
            "lv_size": "53682896896",
            "lv_tags": "ceph.block_device=/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4,ceph.block_uuid=6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=17a413b5-f140-441a-8b35-feec8ae29521,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=846588ac-71f8-4a78-83ee-3d70763e2ab4,ceph.osd_id=0,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0",
            "lv_uuid": "6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j",
            "name": "osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
            "path": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
            "tags": {
                "ceph.block_device": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
                "ceph.block_uuid": "6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j",
                "ceph.cephx_lockbox_secret": "",
                "ceph.cluster_fsid": "17a413b5-f140-441a-8b35-feec8ae29521",
                "ceph.cluster_name": "ceph",
                "ceph.crush_device_class": "None",
                "ceph.encrypted": "0",
                "ceph.osd_fsid": "846588ac-71f8-4a78-83ee-3d70763e2ab4",
                "ceph.osd_id": "0",
                "ceph.osdspec_affinity": "",
                "ceph.type": "block",
                "ceph.vdo": "0"
            },
            "type": "block",
            "vg_name": "ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c"
        }
    ],
    "5": [
        {
            "devices": [
                "/dev/sdc"
            ],
            "lv_name": "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
            "lv_path": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
            "lv_size": "53682896896",
            "lv_tags": "ceph.block_device=/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e,ceph.block_uuid=CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=17a413b5-f140-441a-8b35-feec8ae29521,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=58119fd3-2da0-42c2-8abb-7a3931068e2e,ceph.osd_id=5,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0",
            "lv_uuid": "CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ",
            "name": "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
            "path": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
            "tags": {
                "ceph.block_device": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
                "ceph.block_uuid": "CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ",
                "ceph.cephx_lockbox_secret": "",
                "ceph.cluster_fsid": "17a413b5-f140-441a-8b35-feec8ae29521",
                "ceph.cluster_name": "ceph",
                "ceph.crush_device_class": "None",
                "ceph.encrypted": "0",
                "ceph.osd_fsid": "58119fd3-2da0-42c2-8abb-7a3931068e2e",
                "ceph.osd_id": "5",
                "ceph.osdspec_affinity": "",
                "ceph.type": "block",
                "ceph.vdo": "0"
            },
            "type": "block",
            "vg_name": "ceph-71fd0a8a-7194-4b79-9987-a46a622743f5"
        }
    ]
}
2022-12-01 03:24:27.449559 I | cephosd: osdInfo has 1 elements. [{Name:osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 Path:/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 Tags:{OSDFSID:846588ac-71f8-4a78-83ee-3d70763e2ab4 Encrypted:0 ClusterFSID:17a413b5-f140-441a-8b35-feec8ae29521} Type:block}]
2022-12-01 03:24:27.449597 I | cephosd: osdInfo has 1 elements. [{Name:osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e Path:/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e Tags:{OSDFSID:58119fd3-2da0-42c2-8abb-7a3931068e2e Encrypted:0 ClusterFSID:17a413b5-f140-441a-8b35-feec8ae29521} Type:block}]
2022-12-01 03:24:27.449602 I | cephosd: 2 ceph-volume lvm osd devices configured on this node
2022-12-01 03:24:27.453319 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/192.168.100.133 --format json
2022-12-01 03:24:28.820501 D | cephosd: {}
2022-12-01 03:24:28.822962 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-01 03:24:28.823177 I | cephosd: devices = [{ID:0 Cluster:ceph UUID:846588ac-71f8-4a78-83ee-3d70763e2ab4 DevicePartUUID: BlockPath:/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 MetadataPath: WalPath: SkipLVRelease:false Location: LVBackedPV:false CVMode:lvm Store:bluestore} {ID:5 Cluster:ceph UUID:58119fd3-2da0-42c2-8abb-7a3931068e2e DevicePartUUID: BlockPath:/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e MetadataPath: WalPath: SkipLVRelease:false Location: LVBackedPV:false CVMode:lvm Store:bluestore}]
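
The JSON printed by ceph-volume lvm list in the log above is keyed by OSD id, so mapping each OSD to its backing disk takes only a few lines. The sketch below assumes that JSON has been saved as a string; `osd_devices` is a hypothetical helper name.

```python
import json

def osd_devices(lvm_list_json):
    """Map OSD id -> sorted backing devices from `ceph-volume lvm list --format json`."""
    listing = json.loads(lvm_list_json)
    return {
        int(osd_id): sorted({dev for lv in lvs for dev in lv.get("devices", [])})
        for osd_id, lvs in listing.items()
    }

# Abbreviated copy of the structure shown in the log (most fields omitted).
sample = '{"0": [{"devices": ["/dev/sdb"]}], "5": [{"devices": ["/dev/sdc"]}]}'
print(osd_devices(sample))
```

Here osd.0 stays on /dev/sdb and the newly prepared osd.5 sits on /dev/sdc.
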
  • View the pods
[root@m1 ceph]# kubectl -n rook-ceph get pods
......
rook-ceph-osd-0-f4b99b44f-52z52                             0/1     Pending     0          7m20s
rook-ceph-osd-1-6ff76dc6df-9k9fg                            1/1     Running     0          7d
rook-ceph-osd-2-f4966d698-hzmcd                             1/1     Running     0          7d
rook-ceph-osd-3-54cfdb49cb-pb6d7                            1/1     Running     0          7d
rook-ceph-osd-4-566f4c5b4d-fqf5q                            1/1     Running     0          6d23h
rook-ceph-osd-5-7b6589474f-rlsn9                            0/1     Pending     0          1m
rook-ceph-osd-prepare-192.168.100.133-psmg8                 0/1     Completed   0          8m6s
rook-ceph-osd-prepare-192.168.100.134-nmn7l                 0/1     Completed   0          8m4s
rook-ceph-osd-prepare-192.168.100.135-n4k7x                 0/1     Completed   0          7m59s
rook-ceph-osd-prepare-192.168.100.136-j5fkq                 0/1     Completed   0          7m45s
rook-ceph-osd-prepare-192.168.100.137-pb422                 0/1     Completed   0          7m36s

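Troubleshooting a failed expansion (2)

After the label-based scheduling is in place, check the pods again: osd-0 is now stuck in Pending. The pod details report Insufficient memory, meaning the node does not have enough memory to schedule it.

# Error details: 1 node has insufficient memory, and the other 4 nodes do not match the pod's node affinity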
[root@m1 ceph]# kubectl -n rook-ceph describe pods rook-ceph-osd-0-f4b99b44f-52z52
.....
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  5m1s  default-scheduler  0/5 nodes are available: 1 Insufficient memory, 4 node(s) didn't match Pod's node affinity.
  Warning  FailedScheduling  5m1s  default-scheduler  0/5 nodes are available: 1 Insufficient memory, 4 node(s) didn't match Pod's node affinity.

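Why is memory insufficient? m1 also hosts the mgr and monitor, and in chapter 5 we set resource requests for mon, mgr, and osd. Two OSDs are now scheduled on m1, and together they request 4 GiB of memory (2 × 2048Mi), which the node cannot provide.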
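As a rough tally of where the memory goes, the sketch below adds up the requests from the cluster.yaml shown in this article. It assumes m1 runs exactly one mon pod and one mgr pod next to the two OSDs, and it ignores every other pod on the node, so treat the result as a lower bound to compare against the node's allocatable memory (visible in `kubectl describe node`).

```shell
# Memory requests in MiB, taken from the resources section of cluster.yaml.
mon_req=512
mgr_req=512
osd_req=2048
osd_count=2   # osd.0 and osd.5 are both pinned to m1

# Assumption: one mon and one mgr pod share the node with the OSDs.
total_req=$(( mon_req + mgr_req + osd_count * osd_req ))
echo "memory requested on m1: ${total_req}Mi"
```

If the total exceeds what the node can allocate, the scheduler leaves the newest pod in Pending, which matches the FailedScheduling event above.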

  • Check the resource settings for mgr, monitor, and osd
[root@m1 ceph]# vim cluster.yaml

  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    mon:
      limits:
        cpu: "300m"
        memory: "512Mi"
      requests:
        cpu: "300m"
        memory: "512Mi"
    mgr:
      limits:
        cpu: "300m"
        memory: "512Mi"
      requests:
        cpu: "300m"
        memory: "512Mi"
    osd:
      limits:
        cpu: "1000m"
        memory: "2048Mi"
      requests:
        cpu: "1000m"
        memory: "2048Mi"
  • Edit the configuration and comment out the resource settings
[root@m1 ceph]# vim cluster.yaml

  resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
#    mon:
#      limits:
#        cpu: "300m"
#        memory: "512Mi"
#      requests:
#        cpu: "300m"
#        memory: "512Mi"
#    mgr:
#      limits:
#        cpu: "300m"
#        memory: "512Mi"
#      requests:
#        cpu: "300m"
#        memory: "512Mi"
#    osd:
#      limits:
#        cpu: "1000m"
#        memory: "2048Mi"
#      requests:
#        cpu: "1000m"
#        memory: "2048Mi"
  • Check that the pods are running
[root@m1 ceph]# kubectl -n rook-ceph get pods
......
rook-ceph-osd-0-66dd4575f7-c64wh                            1/1     Running     0          14s
rook-ceph-osd-1-5866f9f558-jq994                            1/1     Running     0          11s
rook-ceph-osd-2-f4966d698-hzmcd                             1/1     Running     0          7d2h
rook-ceph-osd-3-54cfdb49cb-pb6d7                            1/1     Running     1          7d2h
rook-ceph-osd-4-566f4c5b4d-fqf5q                            1/1     Running     0          7d1h
rook-ceph-osd-5-7c6ddb8b7c-qrmvb                            1/1     Running     0          12s
  • Check the Ceph cluster and the OSD information
# Check the Ceph cluster
[root@m1 ceph]# ceph -s
  cluster:
    id:     17a413b5-f140-441a-8b35-feec8ae29521
    health: HEALTH_WARN
            2 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, since 5m)
    mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
    osd: 6 osds: 6 up (since 32s), 6 in (since 8m)
    rgw: 2 daemons active (my.store.a, my.store.b)
 
  task status:
 
  data:
    pools:   12 pools, 209 pgs
    objects: 800 objects, 1.3 GiB
    usage:   11 GiB used, 289 GiB / 300 GiB avail
    pgs:     209 active+clean
 
  io:
    client:   7.3 KiB/s rd, 597 B/s wr, 9 op/s rd, 4 op/s wr

# Check the OSD information
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.29279  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000

[root@m1 ceph]# ceph osd status
ID  HOST              USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  192.168.100.133  1652M  48.3G      0        0       2      105   exists,up  
 1  192.168.100.134  1745M  48.2G      0        0       1        0   exists,up  
 2  192.168.100.135  2037M  48.0G      0        0       1       15   exists,up  
 3  192.168.100.136  1769M  48.2G      0        0       0        0   exists,up  
 4  192.168.100.137  1785M  48.2G      0        0       1       89   exists,up  
 5  192.168.100.133  1778M  48.2G      0        0       0        0   exists,up  

[root@m1 ceph]# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA      OMAP     META      AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  0.04880   1.00000   50 GiB  1.8 GiB   770 MiB  645 KiB  1023 MiB   48 GiB  3.50  0.94   86      up
 5    hdd  0.04880   1.00000   50 GiB  1.7 GiB   674 MiB  305 KiB  1024 MiB   48 GiB  3.32  0.89   71      up
 1    hdd  0.04880   1.00000   50 GiB  1.7 GiB   717 MiB  4.2 MiB  1020 MiB   48 GiB  3.40  0.91  116      up
 2    hdd  0.04880   1.00000   50 GiB  2.0 GiB  1008 MiB  2.2 MiB  1022 MiB   48 GiB  3.97  1.07  114      up
 3    hdd  0.04880   1.00000   50 GiB  2.0 GiB   1.0 GiB  1.7 MiB  1022 MiB   48 GiB  4.02  1.08  125      up
 4    hdd  0.04880   1.00000   50 GiB  2.0 GiB   1.0 GiB  3.7 MiB  1020 MiB   48 GiB  4.09  1.10  114      up
                       TOTAL  300 GiB   11 GiB   5.1 GiB   13 MiB   6.0 GiB  289 GiB  3.72                   
MIN/MAX VAR: 0.89/1.10  STDDEV: 0.32

When a pod fails to run, there are two routine ways to investigate:

  • kubectl describe pods: inspect the events
  • kubectl logs: inspect the logs from inside the container
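To find which pods need that attention, a small filter over the `kubectl get pods` listing can help. The sketch below is an illustration only: `unhealthy_pods` is a hypothetical helper name, and it assumes the usual five-column layout (NAME READY STATUS RESTARTS AGE) shown in the listings above.

```shell
# Print the names of pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods` rows on stdin; header and short lines are skipped.
unhealthy_pods() {
  awk 'NF >= 5 && $1 != "NAME" && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example against a live cluster (not executed here):
#   kubectl -n rook-ceph get pods | unhealthy_pods | while read -r pod; do
#     kubectl -n rook-ceph describe pod "$pod"
#   done
```

Running `kubectl describe` on each name it prints surfaces scheduling events such as the Insufficient memory message; `kubectl logs` is the next step once the container has actually started.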

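Configuring BlueStore acceleration

Ceph supports two storage engines:

  • FileStore: an SSD serves as the journal
  • BlueStore: the WAL and DB are placed on an SSD

BlueStore is now the mainstream choice. To accelerate BlueStore, store the WAL and DB on an SSD. The configuration is walked through below; detailed tuning is not covered.

  • Check the host's disks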
[root@n3 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   40G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   39G  0 part 
  ├─centos-root 253:0    0   37G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  
sdb               8:16   0   50G  0 disk 
└─ceph--22b639dc--3c7c--4e8e--96fd--1472725a6f0e-osd--block--9bb28919--ad73--4eff--af5f--e7c36238a7d4
                253:2    0   50G  0 lvm  
sdc               8:32   0   50G  0 disk 
sdd               8:48   0   50G  0 disk 
sr0              11:0    1  973M  0 rom  
rbd0            252:0    0   40G  0 disk /var/lib/kubelet/pods/c001d7c0-0294-4be2-a8c5-10137f870adc/volumes/kubernetes.io~csi/pvc-cfe4a32c-c
rbd2            252:32   0   10G  0 disk /var/lib/kubelet/pods/69f0f693-912f-44ff-9fbc-8d90001438f9/volumes/kubernetes.io~csi/pvc-174fb859-9
  • Edit the configuration
[root@m1 ceph]# vim cluster.yaml 

......
    - name: "192.168.100.136"
      devices:
      - name: "sdb"
        config:
          storeType: bluestore
          journalSizeMB: "4096"
      - name: "sdc"
        config:
          storeType: bluestore
          metadataDevice: "/dev/sdd"
          databaseSizeMB: "4096"
          walSizeMB: "4096"

[root@m1 ceph]# kubectl apply -f cluster.yaml 
cephcluster.ceph.rook.io/rook-ceph configured
  • Check the newly added disk
[root@m1 ceph]# kubectl get pods -n rook-ceph

rook-ceph-osd-0-66dd4575f7-c64wh                            1/1     Running     4          31m
rook-ceph-osd-1-5866f9f558-jq994                            1/1     Running     0          31m
rook-ceph-osd-2-647f9d7fc-v7rnl                             1/1     Running     0          24m
rook-ceph-osd-3-7c6b7cb875-zf4tw                            1/1     Running     2          23m
rook-ceph-osd-4-7699568dc6-hs4z5                            1/1     Running     0          22m
rook-ceph-osd-5-7c6ddb8b7c-qrmvb                            1/1     Running     7          31m
rook-ceph-osd-6-56f5fdb4cb-hhq2c                            1/1     Running     0          5m39s
[root@m1 ceph]# ceph -s
  cluster:
    id:     17a413b5-f140-441a-8b35-feec8ae29521
    health: HEALTH_WARN
            Slow OSD heartbeats on back (longest 1473.860ms)
            Slow OSD heartbeats on front (longest 1462.079ms)
            2 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 5m)
    mgr: a(active, since 64s)
    mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
    osd: 7 osds: 7 up (since 81s), 7 in (since 8m)
    rgw: 2 daemons active (my.store.a, my.store.b)
 
  task status:
 
  data:
    pools:   12 pools, 209 pgs
    objects: 800 objects, 1.3 GiB
    usage:   15 GiB used, 339 GiB / 354 GiB avail
    pgs:     0.478% pgs not active
             208 active+clean
             1   peering
 
  io:
    client:   11 KiB/s rd, 895 B/s wr, 14 op/s rd, 6 op/s wr
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.34547  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.10149      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
  6    hdd  0.05269          osd.6                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000
[root@n3 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   40G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   39G  0 part 
  ├─centos-root 253:0    0   37G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  
sdb               8:16   0   50G  0 disk 
└─ceph--22b639dc--3c7c--4e8e--96fd--1472725a6f0e-osd--block--9bb28919--ad73--4eff--af5f--e7c36238a7d4
                253:2    0   50G  0 lvm  
sdc               8:32   0   50G  0 disk 
└─ceph--766c8c32--7ae9--40ec--817d--7ed72911c4f3-osd--block--18a961fe--90a1--4734--a0f1--88624ad93b88
                253:3    0   50G  0 lvm  
sdd               8:48   0   50G  0 disk 
└─ceph--eefbf3cc--43a9--43ff--a628--83cf6d0e2d15-osd--db--ad701ecb--4320--4e11--a199--6f718ceaff39
                253:4    0    4G  0 lvm  
sr0              11:0    1  973M  0 rom  
rbd0            252:0    0   40G  0 disk /var/lib/kubelet/pods/c001d7c0-0294-4be2-a8c5-10137f870adc/volumes/kubernetes.io~csi/pvc-cfe4a32c-c746-46af-84e8-cd981cdf8706/m
rbd2            252:32   0   10G  0 disk /var/lib/kubelet/pods/69f0f693-912f-44ff-9fbc-8d90001438f9/volumes/kubernetes.io~csi/pvc-174fb859-91ad-49e2-8e44-d7ee64645e7e/m

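Removing an OSD the cloud-native way

If the disk behind an OSD fails, or its configuration needs to change, the OSD has to be removed from the cluster. Keep the following in mind when removing an OSD:

  • Make sure the cluster still has enough capacity after the OSD is removed
  • Make sure the PGs are in a healthy state after the OSD is removed
  • Avoid removing many OSDs in a single pass
  • When removing several OSDs, wait for data synchronization (rebalancing) to finish before removing the next one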
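The capacity check in the first point can be sketched as simple arithmetic. `can_remove_osd` is a hypothetical helper, not a Ceph or Rook command; the 85% threshold is an assumption that mirrors Ceph's default nearfull ratio, and the check looks only at raw totals, so it says nothing about per-OSD imbalance or failure domains.

```shell
# can_remove_osd USED_GIB TOTAL_GIB OSD_GIB [LIMIT_PERCENT]
# Succeeds when the data already stored still fits under LIMIT_PERCENT
# of the capacity that remains once the OSD is gone. Integer GiB only.
can_remove_osd() {
  used=$1 total=$2 osd=$3 limit=${4:-85}
  remaining=$(( total - osd ))
  [ "$remaining" -gt 0 ] && [ $(( used * 100 )) -le $(( remaining * limit )) ]
}

# Figures from the 7-OSD cluster above: 15 GiB used of 354 GiB; osd.6 is about 54 GiB.
if can_remove_osd 15 354 54; then
  echo "enough capacity after removal"
else
  echo "not enough capacity, add OSDs first"
fi
```

With these numbers the remaining 300 GiB easily holds the 15 GiB in use, so removing osd.6 is safe from a capacity point of view.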

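Simulate a failure of osd.6. When an OSD fails, its pod goes into CrashLoopBackOff or Error, and Ceph reports the OSD as down. The failure can be simulated as follows:

  • Manually scale the osd-6 Deployment down to 0 replicas
[root@m1 ceph]# kubectl scale deployment rook-ceph-osd-6 --replicas=0 -n rook-ceph
  • Check whether osd-6 is down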
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.34547  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.10149      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
  6    hdd  0.05269          osd.6               down   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000

Remove the OSD with the manifest provided by Rook

  • Edit the osd-purge.yaml manifest to remove the failed OSD the cloud-native way
[root@m1 ceph]# cat osd-purge.yaml 
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:operator
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    spec:
      serviceAccountName: rook-ceph-system
      containers:
        - name: osd-removal
          image: rook/ceph:v1.5.5
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          #args: ["ceph", "osd", "remove", "--osd-ids", "<OSD-IDs>"] 
          # Replace <OSD-IDs> with 6 here; separate multiple IDs with ","
          args: ["ceph", "osd", "remove", "--osd-ids", "6"]
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
      restartPolicy: Never
  • Apply the manifest
[root@m1 ceph]# kubectl apply -f osd-purge.yaml
job.batch/rook-ceph-purge-osd created

Ceph now synchronizes the data automatically; wait for the synchronization to finish.
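One way to tell when the synchronization has finished is to compare the total PG count with the number of PGs in active+clean. The sketch below parses the `ceph -s` layout shown in this article; `all_pgs_clean` is a hypothetical helper, and the parsing assumes that exact text format, which may differ between Ceph releases.

```shell
# Succeeds when every PG reported by `ceph -s` (read from stdin) is active+clean.
all_pgs_clean() {
  awk '
    /pools,/             { total = $(NF-1) }   # "pools:   12 pools, 209 pgs"
    $NF == "active+clean" { clean = $(NF-1) }  # "pgs:     209 active+clean"
    END { exit !(total > 0 && total == clean) }
  '
}

# Example wait loop against a live cluster (not executed here):
#   until ceph -s | all_pgs_clean; do sleep 10; done
```

Only remove the next OSD once the loop exits, which follows the earlier advice to wait for rebalancing between removals.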

  • Check whether the OSD has been removed
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.29279  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000
[root@m1 ceph]# ceph osd crush dump | grep devices -A 50
    "devices": [
        {
            "id": 0,
            "name": "osd.0",
            "class": "hdd"
        },
        {
            "id": 1,
            "name": "osd.1",
            "class": "hdd"
        },
        {
            "id": 2,
            "name": "osd.2",
            "class": "hdd"
        },
        {
            "id": 3,
            "name": "osd.3",
            "class": "hdd"
        },
        {
            "id": 4,
            "name": "osd.4",
            "class": "hdd"
        },
        {
            "id": 5,
            "name": "osd.5",
            "class": "hdd"
        }
    ],

Delete the Deployment that is no longer needed

[root@m1 ceph]# kubectl -n rook-ceph delete deploy rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted

Because useAllNodes and useAllDevices are disabled in cluster.yaml, the OSD's device entry must also be removed from the nodes list; otherwise the next apply would add it back to the cluster.

  • Edit cluster.yaml
[root@m1 ceph]# vim cluster.yaml

    - name: "192.168.100.136"
      devices:
      - name: "sdb"
        config:
          storeType: bluestore
          journalSizeMB: "4096"
    #  - name: "sdc"
    #    config:
    #      storeType: bluestore
    #      metadataDevice: "/dev/sdd"
    #      databaseSizeMB: "4096"
    #      walSizeMB: "4096"

Re-apply the cluster.yaml manifest

[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured

Removing an OSD manually

Besides the cloud-native approach, an OSD can also be removed the standard Ceph way, as follows.

[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.29279  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5               down   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000

Mark the OSD as out

[root@m1 ceph]# ceph osd out osd.5
marked out osd.5. 
[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.29279  root default                                       
 -5         0.09760      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
  5    hdd  0.04880          osd.5               down         0  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000

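Purge the OSD that was marked out. Ceph then synchronizes the data, that is, it performs backfilling and rebalancing to complete the data migration.

[root@m1 ceph]# ceph osd purge 5 --yes-i-really-mean-it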
purged osd.5

ceph -s shows the data migration in progress

[root@m1 ceph]# ceph -s
  cluster:
    id:     17a413b5-f140-441a-8b35-feec8ae29521
    health: HEALTH_WARN
            Degraded data redundancy: 46/2400 objects degraded (1.917%), 2 pgs degraded, 3 pgs undersized
            2 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum a,b,c (age 5m)
    mgr: a(active, since 10m)
    mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
    osd: 5 osds: 5 up (since 2m), 5 in (since 5m); 6 remapped pgs
    rgw: 2 daemons active (my.store.a, my.store.b)
 
  task status:
 
  data:
    pools:   12 pools, 209 pgs
    objects: 800 objects, 1.3 GiB
    usage:   9.5 GiB used, 241 GiB / 250 GiB avail
    pgs:     46/2400 objects degraded (1.917%)
             14/2400 objects misplaced (0.583%)
             203 active+clean
             3   active+remapped+backfill_wait
             2   active+recovery_wait+undersized+degraded+remapped
             1   active+recovering+undersized+remapped
 
  io:
    client:   1.7 KiB/s rd, 3 op/s rd, 0 op/s wr
    recovery: 0 B/s, 8 objects/s

The OSD tree shows that the OSD has been removed

[root@m1 ceph]# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         0.24399  root default                                       
 -5         0.04880      host 192-168-100-133                           
  0    hdd  0.04880          osd.0                 up   1.00000  1.00000
 -3         0.04880      host 192-168-100-134                           
  1    hdd  0.04880          osd.1                 up   1.00000  1.00000
 -7         0.04880      host 192-168-100-135                           
  2    hdd  0.04880          osd.2                 up   1.00000  1.00000
 -9         0.04880      host 192-168-100-136                           
  3    hdd  0.04880          osd.3                 up   1.00000  1.00000
-11         0.04880      host 192-168-100-137                           
  4    hdd  0.04880          osd.4                 up   1.00000  1.00000
[root@m1 ceph]# ceph osd crush dump | grep devices -A 50
    "devices": [
        {
            "id": 0,
            "name": "osd.0",
            "class": "hdd"
        },
        {
            "id": 1,
            "name": "osd.1",
            "class": "hdd"
        },
        {
            "id": 2,
            "name": "osd.2",
            "class": "hdd"
        },
        {
            "id": 3,
            "name": "osd.3",
            "class": "hdd"
        },
        {
            "id": 4,
            "name": "osd.4",
            "class": "hdd"
        }
    ],

Delete the corresponding Deployment and the matching entry in cluster.yaml

[root@m1 ceph]# kubectl -n rook-ceph delete deploy rook-ceph-osd-5
deployment.apps "rook-ceph-osd-5" deleted
[root@m1 ceph]# vim cluster.yaml

    nodes:
    - name: "192.168.100.133"
      devices:
      - name: "sdb"
        config:
          storeType: bluestore
          journalSizeMB: "4096"
    #  - name: "sdc"
    #    config:
    #      storeType: bluestore
    #      journalSizeMB: "4096"

Re-apply the cluster.yaml manifest

[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
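The manual procedure can be condensed into a short checklist. The sketch below only prints the commands and does not run them; `manual_osd_removal_plan` is a hypothetical helper. Two items go beyond the walkthrough and are assumptions on my part: the `ceph osd safe-to-destroy` check before purging, and the `--yes-i-really-mean-it` confirmation flag that `ceph osd purge` expects.

```shell
# manual_osd_removal_plan OSD_ID
# Prints (does not execute) the commands for removing one OSD by hand.
manual_osd_removal_plan() {
  id=$1
  echo "ceph osd out osd.${id}"
  echo "ceph osd safe-to-destroy osd.${id}"            # extra safety check, not in the walkthrough
  echo "ceph osd purge ${id} --yes-i-really-mean-it"
  echo "kubectl -n rook-ceph delete deploy rook-ceph-osd-${id}"
}

manual_osd_removal_plan 5
```

After these steps, the device entry still has to be removed from cluster.yaml and the manifest re-applied, as shown above.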

Replacing an OSD

Replace an OSD

To replace a disk that has failed:

  1. Run the steps in the previous section to Remove an OSD.
  2. Replace the physical device and verify the new device is attached.
  3. Check if your cluster CR will find the new device. If you are using useAllDevices: true you can skip this step. If your cluster CR lists individual devices or uses a device filter you may need to update the CR.
  4. The operator ideally will automatically create the new OSD within a few minutes of adding the new device or updating the CR. If you don’t see a new OSD automatically created, restart the operator (by deleting the operator pod) to trigger the OSD creation.
  5. Verify if the OSD is created on the node by running ceph osd tree from the toolbox.

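The replacement procedure boils down to:

  • Remove the OSD from the Ceph cluster, using either the cloud-native or the manual method
  • Once the removal is done and the data has finished synchronizing, add the disk back to the cluster through the expansion procedure
  • Before adding it back, remember to delete the corresponding LVM volumes on the disk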
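For the LVM cleanup in the last point, the sketch below prints one possible command sequence without running anything, since these commands destroy data. `lvm_cleanup_plan` is a hypothetical helper; the volume group name in the example is derived from the lsblk output for sdc earlier in this article, and the choice of `vgremove`, `pvremove`, and `sgdisk --zap-all` is my assumption about a reasonable cleanup, not a procedure taken from the original post.

```shell
# lvm_cleanup_plan VG_NAME DISK
# Prints (does not execute) commands that wipe the LVM metadata Ceph left on a disk.
lvm_cleanup_plan() {
  vg=$1 disk=$2
  echo "vgremove -y ${vg}"          # removes the volume group and its logical volumes
  echo "pvremove ${disk}"           # clears the LVM physical volume label
  echo "sgdisk --zap-all ${disk}"   # wipes any remaining partition tables
}

lvm_cleanup_plan ceph-766c8c32-7ae9-40ec-817d-7ed72911c4f3 /dev/sdc
```

Review the printed commands on the node that owns the disk before running them, and repeat the cleanup for any separate DB device such as /dev/sdd.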
posted @ 2022-12-01 15:10  evescn