09 OSD Daily Management (Repost)
OSD Daily Management
OSD Health Status
ceph status
[root@m1 ceph]# ceph -s
cluster:
id: 17a413b5-f140-441a-8b35-feec8ae29521
health: HEALTH_WARN
2 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 7s)
mgr: a(active, since 7s)
mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
osd: 5 osds: 5 up (since 23s), 5 in (since 6d)
rgw: 2 daemons active (my.store.a, my.store.b)
task status:
data:
pools: 12 pools, 209 pgs
objects: 800 objects, 1.3 GiB
usage: 9.2 GiB used, 241 GiB / 250 GiB avail
pgs: 209 active+clean
ceph osd tree
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.24399 root default
-5 0.04880 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
ceph osd status
[root@m1 ceph]# ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 192.168.100.133 2067M 47.9G 0 0 2 105 exists,up
1 192.168.100.134 1814M 48.2G 0 0 1 0 exists,up
2 192.168.100.135 1845M 48.1G 0 0 1 15 exists,up
3 192.168.100.136 1794M 48.2G 0 0 0 0 exists,up
4 192.168.100.137 1893M 48.1G 0 0 1 89 exists,up
ceph osd df
[root@m1 ceph]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 0.04880 1.00000 50 GiB 2.0 GiB 1.0 GiB 1.6 MiB 1022 MiB 48 GiB 4.04 1.10 140 up
1 hdd 0.04880 1.00000 50 GiB 1.8 GiB 790 MiB 2.3 MiB 1022 MiB 48 GiB 3.54 0.96 121 up
2 hdd 0.04880 1.00000 50 GiB 1.8 GiB 821 MiB 2.2 MiB 1022 MiB 48 GiB 3.60 0.98 111 up
3 hdd 0.04880 1.00000 50 GiB 1.8 GiB 770 MiB 596 KiB 1023 MiB 48 GiB 3.51 0.95 133 up
4 hdd 0.04880 1.00000 50 GiB 1.8 GiB 870 MiB 3.7 MiB 1020 MiB 48 GiB 3.70 1.01 122 up
TOTAL 250 GiB 9.2 GiB 4.2 GiB 10 MiB 5.0 GiB 241 GiB 3.68
MIN/MAX VAR: 0.95/1.10 STDDEV: 0.19
ceph osd utilization
[root@m1 ceph]# ceph osd utilization
avg 125.4
stddev 10.0916 (expected baseline 10.016)
min osd.2 with 111 pgs (0.885167 * mean)
max osd.0 with 140 pgs (1.11643 * mean)
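The figures reported by `ceph osd utilization` can be reproduced from the PG counts in the `ceph osd df` output above (140, 121, 111, 133, 122): the average is their mean, and each OSD's multiplier is its PG count divided by that mean. A minimal shell sketch of the calculation (PG counts hard-coded from the output above):

```shell
# Recompute the `ceph osd utilization` figures from the PG counts shown
# in `ceph osd df`: mean = 627/5 = 125.4, and each OSD's multiplier is
# its PG count divided by that mean.
echo "140 121 111 133 122" | awk '{
  n = NF; sum = 0
  for (i = 1; i <= n; i++) sum += $i
  mean = sum / n
  min = $1; max = $1
  for (i = 1; i <= n; i++) { if ($i < min) min = $i; if ($i > max) max = $i }
  printf "avg %.1f  min %d (%.6f * mean)  max %d (%.5f * mean)\n",
         mean, min, min / mean, max, max / mean
}'
# → avg 125.4  min 111 (0.885167 * mean)  max 140 (1.11643 * mean)
```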
OSD Scale-Out
When Ceph runs short of storage space, the cluster needs to be expanded. Ceph supports dynamic horizontal scale-out, usually in one of two ways:
- adding more OSDs
- adding additional hosts
By default, Rook uses "all disks on all nodes". Under that default policy, any newly added disk or host is brought into the cluster automatically, at the interval set by ROOK_DISCOVER_DEVICES_INTERVAL. We changed this policy during installation:
[root@m1 ceph]# vim cluster.yaml
......
217 storage: # cluster level storage configuration and selection
218   useAllNodes: false
219   useAllDevices: false
In other words, auto-discovery is off, so the nodes list must be maintained by hand: every disk to be added has to appear in it. As an example, let's add the sdc disk on m1 to the Ceph cluster.
[root@m1 ceph]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 50G 0 disk
└─ceph--bfc68a00--2c58--4226--bd07--1e1e8eb8d15c-osd--block--846588ac--71f8--4a78--83ee--3d70763e2ab4
253:2 0 50G 0 lvm
sdc 8:32 0 50G 0 disk
sr0 11:0 1 973M 0 rom
rbd0 252:0 0 10G 0 disk /var/lib/kubelet/pods/c8c29b82-3b17-46c5-b205-0c44009548a2/volumes/kubernetes.io~csi/pvc-af0e2916-e
rbd1 252:16 0 10G 0 disk /var/lib/kubelet/pods/3fd82fb7-97ff-4175-8f4d-88bdb1a7219c/volumes/kubernetes.io~csi/pvc-9d32f5ff-1
Add the new device to cluster.yaml:
[root@m1 ceph]# vim cluster.yaml
217 storage: # cluster level storage configuration and selection
218   useAllNodes: false
219   useAllDevices: false
220   #deviceFilter:
221   config:
......
230   nodes:
231   - name: "192.168.100.133"
232     devices:
233     - name: "sdb"
234       config:
235         storeType: bluestore
236         journalSizeMB: "4096"
237     - name: "sdc" # newly added
238       config:
239         storeType: bluestore
240         journalSizeMB: "4096"
# Re-apply the configuration
[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
Troubleshooting the Failed Expansion, Part 1
After applying, the new OSD still has not joined the cluster. Why? Chapter 5 defined a scheduling policy for OSDs: OSD pods are scheduled only onto nodes carrying the ceph-osd=enabled label, and a node without that label cannot satisfy the constraint. Earlier we labeled only the n4 node, so it is the only node that qualifies; the remaining nodes need the label as well.
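For reference, that label-based constraint is typically expressed in cluster.yaml through a placement block like the following. This is a sketch of the usual Rook pattern, not an exact copy of the Chapter 5 file; field names follow the CephCluster CRD:

```yaml
# Only schedule OSD pods onto nodes labeled ceph-osd=enabled
placement:
  osd:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: ceph-osd
            operator: In
            values:
            - enabled
```

A node without the label simply never matches the nodeSelectorTerms, which is exactly the "didn't match Pod's node affinity" symptom seen later.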
- Check the current OSDs
[root@m1 ceph]# kubectl -n rook-ceph get pods -l app=rook-ceph-osd
NAME READY STATUS RESTARTS AGE
rook-ceph-osd-0-654cdc8f98-9sdfg 1/1 Running 21 6d23h
rook-ceph-osd-1-6ff76dc6df-9k9fg 1/1 Running 0 6d23h
rook-ceph-osd-2-f4966d698-hzmcd 1/1 Running 0 6d23h
rook-ceph-osd-3-54cfdb49cb-pb6d7 1/1 Running 0 6d23h
rook-ceph-osd-4-566f4c5b4d-fqf5q 1/1 Running 0 6d22h
- Check the operator logs
[root@m1 ceph]# kubectl -n rook-ceph logs rook-ceph-operator-7fdf75bb9d-7mnrc -f
2022-12-01 03:16:41.519366 I | ceph-cluster-controller: CR has changed for "rook-ceph". diff= v1.ClusterSpec{
CephVersion: {Image: "ceph/ceph:v15.2.8"},
DriveGroups: nil,
Storage: v1.StorageScopeSpec{
Nodes: []v1.Node{
{
Name: "192.168.100.133",
Resources: {},
Config: nil,
Selection: v1.Selection{
UseAllDevices: nil,
DeviceFilter: "",
DevicePathFilter: "",
Devices: []v1.Device{
{Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}},
{
Name: "sdc",
FullPath: "",
Config: map[string]string{
- "journalSizeMB": "8192",
+ "journalSizeMB": "4096",
"storeType": "bluestore",
},
},
},
Directories: nil,
VolumeClaimTemplates: nil,
},
},
{Name: "192.168.100.134", Selection: {Devices: {{Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}}}}},
{Name: "192.168.100.135", Selection: {Devices: {{Name: "sdb", Config: {"journalSizeMB": "4096", "storeType": "bluestore"}}}}},
... // 2 identical elements
},
UseAllNodes: false,
NodeCount: 0,
... // 4 identical fields
},
Annotations: nil,
Labels: nil,
... // 19 identical fields
}
.......
2022-12-01 03:17:10.448036 I | op-osd: start running osds in namespace rook-ceph
2022-12-01 03:17:10.448049 I | op-osd: start provisioning the osds on pvcs, if needed
2022-12-01 03:17:10.465629 I | op-osd: no volume sources defined to configure OSDs on PVCs.
2022-12-01 03:17:10.465667 I | op-osd: start provisioning the osds on nodes, if needed
2022-12-01 03:17:10.499216 I | op-osd: 1 of the 5 storage nodes are valid ##### only 1 node is usable; the other nodes lack the label
2022-12-01 03:17:10.761098 I | op-mgr: successful modules: prometheus
2022-12-01 03:17:10.911845 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-192.168.100.137 to start a new one
2022-12-01 03:17:11.129678 I | op-k8sutil: batch job rook-ceph-osd-prepare-192.168.100.137 still exists
2022-12-01 03:17:11.568775 I | op-mgr: successful modules: mgr module(s) from the spec
2022-12-01 03:17:11.627933 I | op-mgr: successful modules: balancer
2022-12-01 03:17:12.535916 I | op-mgr: successful modules: http bind settings
2022-12-01 03:17:13.135066 I | op-k8sutil: batch job rook-ceph-osd-prepare-192.168.100.137 deleted
2022-12-01 03:17:13.141790 I | op-osd: osd provision job started for node 192.168.100.137
2022-12-01 03:17:13.141823 I | op-osd: start osds after provisioning is completed, if needed
2022-12-01 03:17:13.214453 I | op-osd: osd orchestration status for node 192.168.100.137 is starting
2022-12-01 03:17:13.214510 I | op-osd: 0/1 node(s) completed osd provisioning, resource version 2400822
2022-12-01 03:17:14.398473 I | op-osd: osd orchestration status for node 192.168.100.137 is computingDiff
2022-12-01 03:17:14.455762 I | op-osd: osd orchestration status for node 192.168.100.137 is orchestrating
2022-12-01 03:17:14.686591 I | op-osd: osd orchestration status for node 192.168.100.137 is completed
2022-12-01 03:17:14.686642 I | op-osd: starting 1 osd daemons on node 192.168.100.137
2022-12-01 03:17:14.690798 I | cephclient: getting or creating ceph auth key "osd.4"
2022-12-01 03:17:14.988322 I | op-k8sutil: deployment "rook-ceph-osd-4" did not change, nothing to update
2022-12-01 03:17:14.997776 I | op-osd: 1/1 node(s) completed osd provisioning
2022-12-01 03:17:15.438705 I | cephclient: successfully disallowed pre-octopus osds and enabled all new octopus-only functionality
2022-12-01 03:17:15.438752 I | op-osd: completed running osds in namespace rook-ceph
Add the label to the remaining nodes
[root@m1 ceph]# kubectl label node 192.168.100.133 ceph-osd=enabled
node/192.168.100.133 labeled
[root@m1 ceph]# kubectl label node 192.168.100.134 ceph-osd=enabled
node/192.168.100.134 labeled
[root@m1 ceph]# kubectl label node 192.168.100.135 ceph-osd=enabled
node/192.168.100.135 labeled
[root@m1 ceph]# kubectl label node 192.168.100.136 ceph-osd=enabled
node/192.168.100.136 labeled
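The four commands above can be collapsed into one idempotent loop. This is a sketch shown as a dry run (the `echo` prints each command instead of executing it; remove it to actually apply the labels):

```shell
# Label every storage node in one pass. --overwrite makes the command
# safe to re-run on a node that is already labeled. `echo` keeps this a
# dry run that only prints the kubectl commands.
for node in 192.168.100.133 192.168.100.134 192.168.100.135 192.168.100.136; do
  echo kubectl label node "$node" ceph-osd=enabled --overwrite
done
```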
- After labeling, the operator re-runs the osd-prepare jobs; check the job log on the newly labeled node to confirm the OSD information is re-read
[root@m1 ~]# kubectl -n rook-ceph logs rook-ceph-osd-prepare-192.168.100.133-psmg8
2022-12-01 03:24:07.833336 I | cephcmd: desired devices to configure osds: [{Name:sdb OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:sdc OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2022-12-01 03:24:07.838526 I | rookcmd: starting Rook v1.5.5 with arguments '/rook/rook ceph osd provision'
2022-12-01 03:24:07.838543 I | rookcmd: flag values: --cluster-id=8279c0cb-e44f-4af6-8689-115025bb2940, --data-device-filter=, --data-device-path-filter=, --data-devices=[{"id":"sdb","storeConfig":{"storeType":"bluestore","osdsPerDevice":1}},{"id":"sdc","storeConfig":{"storeType":"bluestore","osdsPerDevice":1}}], --drive-groups=, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=DEBUG, --metadata-device=, --node-name=192.168.100.133, --operator-image=, --osd-crush-device-class=, --osd-database-size=0, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2022-12-01 03:24:07.838549 I | op-mon: parsing mon endpoints: a=10.68.231.222:6789,b=10.68.163.216:6789,c=10.68.61.127:6789
2022-12-01 03:24:08.785794 I | op-osd: CRUSH location=root=default host=192-168-100-133
2022-12-01 03:24:08.797646 I | cephcmd: crush location of osd: root=default host=192-168-100-133
2022-12-01 03:24:08.797739 D | exec: Running command: nsenter --mount=/rootfs/proc/1/ns/mnt -- /usr/sbin/lvm --help
2022-12-01 03:24:08.936849 I | cephosd: successfully called nsenter
2022-12-01 03:24:08.936869 I | cephosd: binary "/usr/sbin/lvm" found on the host, proceeding with osd preparation
2022-12-01 03:24:08.936875 D | exec: Running command: dmsetup version
2022-12-01 03:24:08.950594 I | cephosd: Library version: 1.02.171-RHEL8 (2020-05-28)
Driver version: 4.37.1
2022-12-01 03:24:11.286550 D | cephclient: No ceph configuration override to merge as "rook-config-override" configmap is empty
2022-12-01 03:24:11.286593 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2022-12-01 03:24:11.298967 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2022-12-01 03:24:11.309503 D | cephosd: config file @ /etc/ceph/ceph.conf: [global]
fsid = 17a413b5-f140-441a-8b35-feec8ae29521
mon initial members = a b c
mon host = [v2:10.68.231.222:3300,v1:10.68.231.222:6789],[v2:10.68.163.216:3300,v1:10.68.163.216:6789],[v2:10.68.61.127:3300,v1:10.68.61.127:6789]
public addr = 172.20.0.116
cluster addr = 172.20.0.116
[client.admin]
keyring = /var/lib/rook/rook-ceph/client.admin.keyring
2022-12-01 03:24:11.309570 I | cephosd: discovering hardware
2022-12-01 03:24:11.313813 D | exec: Running command: lsblk --all --noheadings --list --output KNAME
2022-12-01 03:24:11.413147 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.444846 D | exec: Running command: sgdisk --print /dev/sda
2022-12-01 03:24:11.463043 D | exec: Running command: udevadm info --query=property /dev/sda
2022-12-01 03:24:11.575745 D | exec: Running command: lsblk --noheadings --pairs /dev/sda
2022-12-01 03:24:11.580658 I | inventory: skipping device "sda" because it has child, considering the child instead.
2022-12-01 03:24:11.580703 D | exec: Running command: lsblk /dev/sda1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.584450 D | exec: Running command: udevadm info --query=property /dev/sda1
2022-12-01 03:24:11.602091 D | exec: Running command: lsblk /dev/sda2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.604729 D | exec: Running command: udevadm info --query=property /dev/sda2
2022-12-01 03:24:11.627506 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.630825 D | exec: Running command: sgdisk --print /dev/sdb
2022-12-01 03:24:11.645321 D | exec: Running command: udevadm info --query=property /dev/sdb
2022-12-01 03:24:11.650973 D | exec: Running command: lsblk --noheadings --pairs /dev/sdb
2022-12-01 03:24:11.656583 I | inventory: skipping device "sdb" because it has child, considering the child instead.
2022-12-01 03:24:11.656599 D | exec: Running command: lsblk /dev/sdc --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.658018 D | exec: Running command: sgdisk --print /dev/sdc
2022-12-01 03:24:11.666084 D | exec: Running command: udevadm info --query=property /dev/sdc
2022-12-01 03:24:11.671168 D | exec: Running command: lsblk --noheadings --pairs /dev/sdc
2022-12-01 03:24:11.676170 D | exec: Running command: lsblk /dev/sr0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.678167 W | inventory: skipping device "sr0". unsupported diskType rom
2022-12-01 03:24:11.678181 W | inventory: skipping rbd device "rbd0"
2022-12-01 03:24:11.678184 W | inventory: skipping rbd device "rbd1"
2022-12-01 03:24:11.678188 D | exec: Running command: lsblk /dev/dm-0 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.680000 D | exec: Running command: sgdisk --print /dev/dm-0
2022-12-01 03:24:11.682639 D | exec: Running command: udevadm info --query=property /dev/dm-0
2022-12-01 03:24:11.687596 D | exec: Running command: lsblk /dev/dm-1 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.689044 D | exec: Running command: sgdisk --print /dev/dm-1
2022-12-01 03:24:11.692035 D | exec: Running command: udevadm info --query=property /dev/dm-1
2022-12-01 03:24:11.698050 D | exec: Running command: lsblk /dev/dm-2 --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:11.699720 D | exec: Running command: sgdisk --print /dev/dm-2
2022-12-01 03:24:11.702599 D | exec: Running command: udevadm info --query=property /dev/dm-2
2022-12-01 03:24:11.714082 D | inventory: discovered disks are [0xc0003699e0 0xc000369e60 0xc00017ea20 0xc00017eea0 0xc00017f0e0 0xc00014a240]
2022-12-01 03:24:11.717932 I | cephosd: creating and starting the osds
2022-12-01 03:24:14.946745 D | cephosd: No Drive Groups configured.
2022-12-01 03:24:14.955043 D | cephosd: desiredDevices are [{Name:sdb OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false} {Name:sdc OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2022-12-01 03:24:14.955082 D | cephosd: context.Devices are [0xc0003699e0 0xc000369e60 0xc00017ea20 0xc00017eea0 0xc00017f0e0 0xc00014a240]
2022-12-01 03:24:14.955110 I | cephosd: skipping device "sda1" because it contains a filesystem "xfs"
2022-12-01 03:24:14.955115 I | cephosd: skipping device "sda2" because it contains a filesystem "LVM2_member"
2022-12-01 03:24:14.955136 D | exec: Running command: lsblk /dev/sdc --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME
2022-12-01 03:24:15.091745 D | exec: Running command: ceph-volume inventory --format json /dev/sdc
2022-12-01 03:24:17.897499 I | cephosd: device "sdc" is available.
2022-12-01 03:24:17.897546 I | cephosd: "sdc" found in the desired devices
2022-12-01 03:24:17.897552 I | cephosd: device "sdc" is selected by the device filter/name "sdc"
2022-12-01 03:24:17.897565 I | cephosd: skipping 'dm' device "dm-0"
2022-12-01 03:24:17.897567 I | cephosd: skipping 'dm' device "dm-1"
2022-12-01 03:24:17.897569 I | cephosd: skipping 'dm' device "dm-2"
2022-12-01 03:24:17.897756 I | cephosd: configuring osd devices: {"Entries":{"sdc":{"Data":-1,"Metadata":null,"Config":{"Name":"sdc","OSDsPerDevice":1,"MetadataDevice":"","DatabaseSizeMB":0,"DeviceClass":"","IsFilter":false,"IsDevicePathFilter":false},"PersistentDevicePaths":["/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:2:0"]}}}
2022-12-01 03:24:17.901538 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2022-12-01 03:24:17.901858 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/297785192
2022-12-01 03:24:18.987463 I | cephosd: configuring new device sdc
2022-12-01 03:24:18.987533 I | cephosd: Base command - stdbuf
2022-12-01 03:24:18.987575 I | cephosd: immediateExecuteArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc]
2022-12-01 03:24:18.987590 I | cephosd: immediateReportArgs - [-oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc --report]
2022-12-01 03:24:18.987598 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc --report
2022-12-01 03:24:20.389969 D | exec: --> DEPRECATION NOTICE
2022-12-01 03:24:20.403403 D | exec: --> You are using the legacy automatic disk sorting behavior
2022-12-01 03:24:20.403406 D | exec: --> The Pacific release will change the default to --no-auto
2022-12-01 03:24:20.403409 D | exec: --> passed data devices: 1 physical, 0 LVM
2022-12-01 03:24:20.403411 D | exec: --> relative data size: 1.0
2022-12-01 03:24:20.403436 D | exec:
2022-12-01 03:24:20.403440 D | exec: Total OSDs: 1
2022-12-01 03:24:20.403442 D | exec:
2022-12-01 03:24:20.403444 D | exec: Type Path LV Size % of device
2022-12-01 03:24:20.403446 D | exec: ----------------------------------------------------------------------------------------------------
2022-12-01 03:24:20.403447 D | exec: data /dev/sdc 50.00 GB 100.00%
2022-12-01 03:24:20.574537 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sdc
2022-12-01 03:24:25.923999 D | exec: --> DEPRECATION NOTICE
2022-12-01 03:24:25.954928 D | exec: --> You are using the legacy automatic disk sorting behavior
2022-12-01 03:24:25.954933 D | exec: --> The Pacific release will change the default to --no-auto
2022-12-01 03:24:25.954937 D | exec: --> passed data devices: 1 physical, 0 LVM
2022-12-01 03:24:25.954938 D | exec: --> relative data size: 1.0
2022-12-01 03:24:25.954940 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2022-12-01 03:24:25.954943 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 58119fd3-2da0-42c2-8abb-7a3931068e2e
2022-12-01 03:24:25.954951 D | exec: Running command: /usr/sbin/vgcreate --force --yes ceph-71fd0a8a-7194-4b79-9987-a46a622743f5 /dev/sdc
2022-12-01 03:24:25.954955 D | exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
2022-12-01 03:24:25.954957 D | exec: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
2022-12-01 03:24:25.954959 D | exec: stdout: Physical volume "/dev/sdc" successfully created.
2022-12-01 03:24:25.954961 D | exec: stdout: Volume group "ceph-71fd0a8a-7194-4b79-9987-a46a622743f5" successfully created
2022-12-01 03:24:25.954968 D | exec: Running command: /usr/sbin/lvcreate --yes -l 12799 -n osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e ceph-71fd0a8a-7194-4b79-9987-a46a622743f5
2022-12-01 03:24:25.954970 D | exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
2022-12-01 03:24:25.954971 D | exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
2022-12-01 03:24:25.954973 D | exec: stderr: Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
2022-12-01 03:24:25.954974 D | exec: stdout: Logical volume "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e" created.
2022-12-01 03:24:25.954976 D | exec: Running command: /usr/bin/ceph-authtool --gen-print-key
2022-12-01 03:24:25.954979 D | exec: Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-5
2022-12-01 03:24:25.954980 D | exec: Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e
2022-12-01 03:24:25.954982 D | exec: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--71fd0a8a--7194--4b79--9987--a46a622743f5-osd--block--58119fd3--2da0--42c2--8abb--7a3931068e2e
2022-12-01 03:24:25.954983 D | exec: Running command: /usr/bin/ln -s /dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e /var/lib/ceph/osd/ceph-5/block
2022-12-01 03:24:25.954986 D | exec: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-5/activate.monmap
2022-12-01 03:24:25.954987 D | exec: stderr: got monmap epoch 3
2022-12-01 03:24:25.954989 D | exec: Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-5/keyring --create-keyring --name osd.5 --add-key AQBlHohjdTH9OBAA3kVn4rwfrPQvgN8gk3bYsQ==
2022-12-01 03:24:25.954990 D | exec: stdout: creating /var/lib/ceph/osd/ceph-5/keyring
2022-12-01 03:24:25.954991 D | exec: added entity osd.5 auth(key=AQBlHohjdTH9OBAA3kVn4rwfrPQvgN8gk3bYsQ==)
2022-12-01 03:24:25.954993 D | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/keyring
2022-12-01 03:24:25.954995 D | exec: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-5/
2022-12-01 03:24:25.954998 D | exec: Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 5 --monmap /var/lib/ceph/osd/ceph-5/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-5/ --osd-uuid 58119fd3-2da0-42c2-8abb-7a3931068e2e --setuser ceph --setgroup ceph
2022-12-01 03:24:25.955007 D | exec: stderr: 2022-12-01T03:24:23.646+0000 7f126ee39f40 -1 bluestore(/var/lib/ceph/osd/ceph-5/) _read_fsid unparsable uuid
2022-12-01 03:24:25.955009 D | exec: stderr: 2022-12-01T03:24:23.688+0000 7f126ee39f40 -1 freelist read_size_meta_from_db missing size meta in DB
2022-12-01 03:24:25.955010 D | exec: --> ceph-volume lvm prepare successful for: /dev/sdc
2022-12-01 03:24:26.090265 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2022-12-01 03:24:27.445464 D | cephosd: {
"0": [
{
"devices": [
"/dev/sdb"
],
"lv_name": "osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
"lv_path": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
"lv_size": "53682896896",
"lv_tags": "ceph.block_device=/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4,ceph.block_uuid=6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=17a413b5-f140-441a-8b35-feec8ae29521,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=846588ac-71f8-4a78-83ee-3d70763e2ab4,ceph.osd_id=0,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0",
"lv_uuid": "6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j",
"name": "osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
"path": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
"tags": {
"ceph.block_device": "/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4",
"ceph.block_uuid": "6KfR8X-YAJa-KpCB-ftTr-TQnf-R0R9-OTCY6j",
"ceph.cephx_lockbox_secret": "",
"ceph.cluster_fsid": "17a413b5-f140-441a-8b35-feec8ae29521",
"ceph.cluster_name": "ceph",
"ceph.crush_device_class": "None",
"ceph.encrypted": "0",
"ceph.osd_fsid": "846588ac-71f8-4a78-83ee-3d70763e2ab4",
"ceph.osd_id": "0",
"ceph.osdspec_affinity": "",
"ceph.type": "block",
"ceph.vdo": "0"
},
"type": "block",
"vg_name": "ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c"
}
],
"5": [
{
"devices": [
"/dev/sdc"
],
"lv_name": "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
"lv_path": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
"lv_size": "53682896896",
"lv_tags": "ceph.block_device=/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e,ceph.block_uuid=CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=17a413b5-f140-441a-8b35-feec8ae29521,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=58119fd3-2da0-42c2-8abb-7a3931068e2e,ceph.osd_id=5,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0",
"lv_uuid": "CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ",
"name": "osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
"path": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
"tags": {
"ceph.block_device": "/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e",
"ceph.block_uuid": "CIc5aW-HWWT-finr-hEFR-q9Cx-k7tP-RueBNZ",
"ceph.cephx_lockbox_secret": "",
"ceph.cluster_fsid": "17a413b5-f140-441a-8b35-feec8ae29521",
"ceph.cluster_name": "ceph",
"ceph.crush_device_class": "None",
"ceph.encrypted": "0",
"ceph.osd_fsid": "58119fd3-2da0-42c2-8abb-7a3931068e2e",
"ceph.osd_id": "5",
"ceph.osdspec_affinity": "",
"ceph.type": "block",
"ceph.vdo": "0"
},
"type": "block",
"vg_name": "ceph-71fd0a8a-7194-4b79-9987-a46a622743f5"
}
]
}
2022-12-01 03:24:27.449559 I | cephosd: osdInfo has 1 elements. [{Name:osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 Path:/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 Tags:{OSDFSID:846588ac-71f8-4a78-83ee-3d70763e2ab4 Encrypted:0 ClusterFSID:17a413b5-f140-441a-8b35-feec8ae29521} Type:block}]
2022-12-01 03:24:27.449597 I | cephosd: osdInfo has 1 elements. [{Name:osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e Path:/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e Tags:{OSDFSID:58119fd3-2da0-42c2-8abb-7a3931068e2e Encrypted:0 ClusterFSID:17a413b5-f140-441a-8b35-feec8ae29521} Type:block}]
2022-12-01 03:24:27.449602 I | cephosd: 2 ceph-volume lvm osd devices configured on this node
2022-12-01 03:24:27.453319 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/192.168.100.133 --format json
2022-12-01 03:24:28.820501 D | cephosd: {}
2022-12-01 03:24:28.822962 I | cephosd: 0 ceph-volume raw osd devices configured on this node
2022-12-01 03:24:28.823177 I | cephosd: devices = [{ID:0 Cluster:ceph UUID:846588ac-71f8-4a78-83ee-3d70763e2ab4 DevicePartUUID: BlockPath:/dev/ceph-bfc68a00-2c58-4226-bd07-1e1e8eb8d15c/osd-block-846588ac-71f8-4a78-83ee-3d70763e2ab4 MetadataPath: WalPath: SkipLVRelease:false Location: LVBackedPV:false CVMode:lvm Store:bluestore} {ID:5 Cluster:ceph UUID:58119fd3-2da0-42c2-8abb-7a3931068e2e DevicePartUUID: BlockPath:/dev/ceph-71fd0a8a-7194-4b79-9987-a46a622743f5/osd-block-58119fd3-2da0-42c2-8abb-7a3931068e2e MetadataPath: WalPath: SkipLVRelease:false Location: LVBackedPV:false CVMode:lvm Store:bluestore}]
- Check the pods on the nodes
[root@m1 ceph]# kubectl -n rook-ceph get pods
......
rook-ceph-osd-0-f4b99b44f-52z52 0/1 Pending 0 7m20s
rook-ceph-osd-1-6ff76dc6df-9k9fg 1/1 Running 0 7d
rook-ceph-osd-2-f4966d698-hzmcd 1/1 Running 0 7d
rook-ceph-osd-3-54cfdb49cb-pb6d7 1/1 Running 0 7d
rook-ceph-osd-4-566f4c5b4d-fqf5q 1/1 Running 0 6d23h
rook-ceph-osd-5-7b6589474f-rlsn9 0/1 Pending 0 1m
rook-ceph-osd-prepare-192.168.100.133-psmg8 0/1 Completed 0 8m6s
rook-ceph-osd-prepare-192.168.100.134-nmn7l 0/1 Completed 0 8m4s
rook-ceph-osd-prepare-192.168.100.135-n4k7x 0/1 Completed 0 7m59s
rook-ceph-osd-prepare-192.168.100.136-j5fkq 0/1 Completed 0 7m45s
rook-ceph-osd-prepare-192.168.100.137-pb422 0/1 Completed 0 7m36s
Troubleshooting the Failed Expansion, Part 2
With the label-based scheduling sorted out, checking the pods again shows osd-0 stuck in Pending. Describing the pod reveals the scheduler reporting Insufficient memory, i.e. not enough resources.
# Error details: 1 node has insufficient memory, and the other 4 nodes fail the Pod's node-affinity check
[root@m1 ceph]# kubectl -n rook-ceph describe pods rook-ceph-osd-0-f4b99b44f-52z52
.....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 5m1s default-scheduler 0/5 nodes are available: 1 Insufficient memory, 4 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 5m1s default-scheduler 0/5 nodes are available: 1 Insufficient memory, 4 node(s) didn't match Pod's node affinity.
Why is memory insufficient? m1 also hosts the mgr and a monitor, and in Chapter 5 we set resource requests for mon, mgr, and osd. With 2 OSDs now running on m1, the OSDs alone request 4 GiB of memory, which exhausts what the node can offer.
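The scheduler's view is easy to sanity-check by hand: it sums the memory *requests* of the pods placed on the node. With the figures from the resources block in cluster.yaml (2048Mi per OSD, 512Mi each for mon and mgr), two OSDs plus a mon and a mgr on m1 come to:

```shell
# Back-of-the-envelope sum of the memory requests on m1:
# 2 OSDs x 2048Mi + 1 mon x 512Mi + 1 mgr x 512Mi
echo $((2 * 2048 + 512 + 512))Mi   # → 5120Mi, i.e. 5 GiB of requests on one node
```

If the node's allocatable memory is below that sum, the new OSD pod stays Pending with exactly the Insufficient memory event shown above.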
- Check the mgr, monitor, and osd resource settings
[root@m1 ceph]# vim cluster.yaml
180 resources:
181 # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
182   mon:
183     limits:
184       cpu: "300m"
185       memory: "512Mi"
186     requests:
187       cpu: "300m"
188       memory: "512Mi"
189   mgr:
190     limits:
191       cpu: "300m"
192       memory: "512Mi"
193     requests:
194       cpu: "300m"
195       memory: "512Mi"
196   osd:
197     limits:
198       cpu: "1000m"
199       memory: "2048Mi"
200     requests:
201       cpu: "1000m"
202       memory: "2048Mi"
- Edit the configuration, commenting out the resource settings
[root@m1 ceph]# vim cluster.yaml
180 resources:
181 # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
182 #  mon:
183 #    limits:
184 #      cpu: "300m"
185 #      memory: "512Mi"
186 #    requests:
187 #      cpu: "300m"
188 #      memory: "512Mi"
189 #  mgr:
190 #    limits:
191 #      cpu: "300m"
192 #      memory: "512Mi"
193 #    requests:
194 #      cpu: "300m"
195 #      memory: "512Mi"
196 #  osd:
197 #    limits:
198 #      cpu: "1000m"
199 #      memory: "2048Mi"
200 #    requests:
201 #      cpu: "1000m"
202 #      memory: "2048Mi"
- Check that the pods are running
[root@m1 ceph]# kubectl -n rook-ceph get pods
......
rook-ceph-osd-0-66dd4575f7-c64wh 1/1 Running 0 14s
rook-ceph-osd-1-5866f9f558-jq994 1/1 Running 0 11s
rook-ceph-osd-2-f4966d698-hzmcd 1/1 Running 0 7d2h
rook-ceph-osd-3-54cfdb49cb-pb6d7 1/1 Running 1 7d2h
rook-ceph-osd-4-566f4c5b4d-fqf5q 1/1 Running 0 7d1h
rook-ceph-osd-5-7c6ddb8b7c-qrmvb 1/1 Running 0 12s
- Check the Ceph cluster state, OSD information, and so on
# Check the Ceph cluster
[root@m1 ceph]# ceph -s
cluster:
id: 17a413b5-f140-441a-8b35-feec8ae29521
health: HEALTH_WARN
2 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 2m)
mgr: a(active, since 5m)
mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
osd: 6 osds: 6 up (since 32s), 6 in (since 8m)
rgw: 2 daemons active (my.store.a, my.store.b)
task status:
data:
pools: 12 pools, 209 pgs
objects: 800 objects, 1.3 GiB
usage: 11 GiB used, 289 GiB / 300 GiB avail
pgs: 209 active+clean
io:
client: 7.3 KiB/s rd, 597 B/s wr, 9 op/s rd, 4 op/s wr
# Check the OSD information
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
[root@m1 ceph]# ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 192.168.100.133 1652M 48.3G 0 0 2 105 exists,up
1 192.168.100.134 1745M 48.2G 0 0 1 0 exists,up
2 192.168.100.135 2037M 48.0G 0 0 1 15 exists,up
3 192.168.100.136 1769M 48.2G 0 0 0 0 exists,up
4 192.168.100.137 1785M 48.2G 0 0 1 89 exists,up
5 192.168.100.133 1778M 48.2G 0 0 0 0 exists,up
[root@m1 ceph]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 0.04880 1.00000 50 GiB 1.8 GiB 770 MiB 645 KiB 1023 MiB 48 GiB 3.50 0.94 86 up
5 hdd 0.04880 1.00000 50 GiB 1.7 GiB 674 MiB 305 KiB 1024 MiB 48 GiB 3.32 0.89 71 up
1 hdd 0.04880 1.00000 50 GiB 1.7 GiB 717 MiB 4.2 MiB 1020 MiB 48 GiB 3.40 0.91 116 up
2 hdd 0.04880 1.00000 50 GiB 2.0 GiB 1008 MiB 2.2 MiB 1022 MiB 48 GiB 3.97 1.07 114 up
3 hdd 0.04880 1.00000 50 GiB 2.0 GiB 1.0 GiB 1.7 MiB 1022 MiB 48 GiB 4.02 1.08 125 up
4 hdd 0.04880 1.00000 50 GiB 2.0 GiB 1.0 GiB 3.7 MiB 1020 MiB 48 GiB 4.09 1.10 114 up
TOTAL 300 GiB 11 GiB 5.1 GiB 13 MiB 6.0 GiB 289 GiB 3.72
MIN/MAX VAR: 0.89/1.10 STDDEV: 0.32
When a pod fails to run, it can be troubleshot in two ways:
- kubectl describe pods: inspect the Events section
- kubectl logs: view the logs from inside the container
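Scanning for broken pods can be mechanized by filtering on the STATUS column. A minimal sketch, assuming the `rook-ceph` namespace and the column layout of the `kubectl get pods` output shown above; the awk helper is kept separate so it also works on captured output:

```shell
# Filter `kubectl get pods` output down to pods that are not fully Running.
filter_bad_pods() {
  # Reads the pod table on stdin; prints names of pods whose STATUS
  # column ($3) is not Running, skipping the header line.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

troubleshoot() {
  # For each broken pod, dump its events and container logs.
  for pod in $(kubectl -n rook-ceph get pods | filter_bad_pods); do
    kubectl -n rook-ceph describe pod "$pod"   # Events are at the bottom
    kubectl -n rook-ceph logs "$pod"           # container logs
  done
}
```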
Configure BlueStore acceleration
Ceph supports two storage engines:
- Filestore: an SSD serves as the journal
- Bluestore: the WAL and DB are placed on an SSD
BlueStore is now the mainstream engine. Accelerating it means storing the WAL and DB on an SSD, which is what provides the speedup. The configuration is explained below, without going into detailed tuning.
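There is no single mandated WAL/DB size. A common rule of thumb (guidance, not an official Ceph requirement) is a DB of a few percent of the data device; a quick sketch of that arithmetic, which shows where the 4096 MiB figure used in this cluster's cluster.yaml sits for a 50 GiB data disk:

```shell
# Estimate a BlueStore DB/WAL partition size in MiB from the data device
# size and a chosen percentage. The percentage is a tuning choice, not a
# Ceph default.
db_size_mb() {
  # $1 = data device size in GiB, $2 = DB percentage (e.g. 4)
  echo $(( $1 * 1024 * $2 / 100 ))
}
```

For a 50 GiB disk, 4% gives 2048 MiB; the 4096 MiB configured below corresponds to 8%.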
- Check the host's disk information
[root@n3 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 50G 0 disk
└─ceph--22b639dc--3c7c--4e8e--96fd--1472725a6f0e-osd--block--9bb28919--ad73--4eff--af5f--e7c36238a7d4
253:2 0 50G 0 lvm
sdc 8:32 0 50G 0 disk
sdd 8:48 0 50G 0 disk
sr0 11:0 1 973M 0 rom
rbd0 252:0 0 40G 0 disk /var/lib/kubelet/pods/c001d7c0-0294-4be2-a8c5-10137f870adc/volumes/kubernetes.io~csi/pvc-cfe4a32c-c
rbd2 252:32 0 10G 0 disk /var/lib/kubelet/pods/69f0f693-912f-44ff-9fbc-8d90001438f9/volumes/kubernetes.io~csi/pvc-174fb859-9
- Modify the configuration
[root@m1 ceph]# vim cluster.yaml
......
253 - name: "192.168.100.136"
254 devices:
255 - name: "sdb"
256 config:
257 storeType: bluestore
258 journalSizeMB: "4096"
259 devices:
260 - name: "sdc"
261 config:
262 storeType: bluestore
263 metadataDevice: "/dev/sdd"
264 databaseSizeMB: "4096"
265 walSizeMB: "4096"
[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
- Check the newly added disk
[root@m1 ceph]# kubectl get pods -n rook-ceph
rook-ceph-osd-0-66dd4575f7-c64wh 1/1 Running 4 31m
rook-ceph-osd-1-5866f9f558-jq994 1/1 Running 0 31m
rook-ceph-osd-2-647f9d7fc-v7rnl 1/1 Running 0 24m
rook-ceph-osd-3-7c6b7cb875-zf4tw 1/1 Running 2 23m
rook-ceph-osd-4-7699568dc6-hs4z5 1/1 Running 0 22m
rook-ceph-osd-5-7c6ddb8b7c-qrmvb 1/1 Running 7 31m
rook-ceph-osd-6-56f5fdb4cb-hhq2c 1/1 Running 0 5m39s
[root@m1 ceph]# ceph -s
cluster:
id: 17a413b5-f140-441a-8b35-feec8ae29521
health: HEALTH_WARN
Slow OSD heartbeats on back (longest 1473.860ms)
Slow OSD heartbeats on front (longest 1462.079ms)
2 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 5m)
mgr: a(active, since 64s)
mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
osd: 7 osds: 7 up (since 81s), 7 in (since 8m)
rgw: 2 daemons active (my.store.a, my.store.b)
task status:
data:
pools: 12 pools, 209 pgs
objects: 800 objects, 1.3 GiB
usage: 15 GiB used, 339 GiB / 354 GiB avail
pgs: 0.478% pgs not active
208 active+clean
1 peering
io:
client: 11 KiB/s rd, 895 B/s wr, 14 op/s rd, 6 op/s wr
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.34547 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.10149 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
6 hdd 0.05269 osd.6 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
[root@n3 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 39G 0 part
├─centos-root 253:0 0 37G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 50G 0 disk
└─ceph--22b639dc--3c7c--4e8e--96fd--1472725a6f0e-osd--block--9bb28919--ad73--4eff--af5f--e7c36238a7d4
253:2 0 50G 0 lvm
sdc 8:32 0 50G 0 disk
└─ceph--766c8c32--7ae9--40ec--817d--7ed72911c4f3-osd--block--18a961fe--90a1--4734--a0f1--88624ad93b88
253:3 0 50G 0 lvm
sdd 8:48 0 50G 0 disk
└─ceph--eefbf3cc--43a9--43ff--a628--83cf6d0e2d15-osd--db--ad701ecb--4320--4e11--a199--6f718ceaff39
253:4 0 4G 0 lvm
sr0 11:0 1 973M 0 rom
rbd0 252:0 0 40G 0 disk /var/lib/kubelet/pods/c001d7c0-0294-4be2-a8c5-10137f870adc/volumes/kubernetes.io~csi/pvc-cfe4a32c-c746-46af-84e8-cd981cdf8706/m
rbd2 252:32 0 10G 0 disk /var/lib/kubelet/pods/69f0f693-912f-44ff-9fbc-8d90001438f9/volumes/kubernetes.io~csi/pvc-174fb859-91ad-49e2-8e44-d7ee64645e7e/m
Cloud-native OSD removal
When the disk backing an OSD fails, or its configuration needs to change, the OSD must be removed from the cluster. Points to keep in mind when removing an OSD:
- After removing an OSD, make sure the cluster still has enough capacity
- After removing an OSD, make sure the PGs return to a healthy state
- Avoid removing too many OSDs in a single operation
- When removing multiple OSDs, wait for data synchronization (rebalancing) to finish before removing the next one
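The "wait for rebalancing" caution can be automated. A sketch that parses `ceph -s`-style text (field positions are assumed from the outputs shown in this post) and succeeds only when every PG is active+clean:

```shell
# Succeed only when all PGs are active+clean, reading ceph -s text on stdin.
pgs_all_clean() {
  # Expects lines like:
  #   pools: 12 pools, 209 pgs
  #   pgs:   209 active+clean
  awk '
    /pools:/         { total = $(NF-1) }   # total PG count
    /active\+clean$/ { clean = $(NF-1) }   # PGs in active+clean
    END              { exit !(total > 0 && clean == total) }
  '
}

wait_for_clean() {
  # Poll the cluster until every PG is active+clean.
  until ceph -s | pgs_all_clean; do sleep 10; done
}
```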
Simulate a failure of osd.6. When an OSD fails, its pod goes into the CrashLoopBackOff or Error state, and the OSD is also marked down in Ceph. The failure can be simulated as follows:
- Manually scale the osd-6 deployment down to 0 replicas
[root@m1 ceph]# kubectl scale deployment rook-ceph-osd-6 --replicas=0 -n rook-ceph
- Check whether osd.6 is now down
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.34547 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.10149 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
6 hdd 0.05269 osd.6 down 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
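The IDs of down OSDs can be extracted from `ceph osd tree` output mechanically, which is handy when filling in a purge job's `--osd-ids` list. A small sketch; the column positions ($4 = name, $5 = status) are assumed from the tree output above:

```shell
# Print the numeric IDs of down OSDs, reading `ceph osd tree` text on stdin.
down_osds() {
  # Host/root lines have no STATUS column, so only osd rows can match.
  awk '$5 == "down" { sub(/^osd\./, "", $4); print $4 }'
}
```

Usage: `ceph osd tree | down_osds` prints `6` for the tree above.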
Delete the OSD using the job provided by Rook
- Modify the osd-purge.yaml manifest to remove the failed OSD the cloud-native way
[root@m1 ceph]# cat osd-purge.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: rook-ceph-purge-osd
namespace: rook-ceph # namespace:operator
labels:
app: rook-ceph-purge-osd
spec:
template:
spec:
serviceAccountName: rook-ceph-system
containers:
- name: osd-removal
image: rook/ceph:v1.5.5
# TODO: Insert the OSD ID in the last parameter that is to be removed
# The OSD IDs are a comma-separated list. For example: "0" or "0,2".
#args: ["ceph", "osd", "remove", "--osd-ids", "<OSD-IDs>"]
#########################################################
# Change <OSD-IDs> here to 6; for multiple OSDs, separate the IDs with commas #
#########################################################
args: ["ceph", "osd", "remove", "--osd-ids", "6"]
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: ROOK_MON_ENDPOINTS
valueFrom:
configMapKeyRef:
key: data
name: rook-ceph-mon-endpoints
- name: ROOK_CEPH_USERNAME
valueFrom:
secretKeyRef:
key: ceph-username
name: rook-ceph-mon
- name: ROOK_CEPH_SECRET
valueFrom:
secretKeyRef:
key: ceph-secret
name: rook-ceph-mon
- name: ROOK_CONFIG_DIR
value: /var/lib/rook
- name: ROOK_CEPH_CONFIG_OVERRIDE
value: /etc/rook/config/override.conf
- name: ROOK_FSID
valueFrom:
secretKeyRef:
key: fsid
name: rook-ceph-mon
- name: ROOK_LOG_LEVEL
value: DEBUG
volumeMounts:
- mountPath: /etc/ceph
name: ceph-conf-emptydir
volumes:
- emptyDir: {}
name: ceph-conf-emptydir
restartPolicy: Never
- Run the job
[root@m1 ceph]# kubectl apply -f osd-purge.yaml
job.batch/rook-ceph-purge-osd created
Ceph will then resynchronize the data automatically; wait for the synchronization to finish.
- Verify that the OSD has been removed
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
[root@m1 ceph]# ceph osd crush dump | grep devices -A 50
"devices": [
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
{
"id": 1,
"name": "osd.1",
"class": "hdd"
},
{
"id": 2,
"name": "osd.2",
"class": "hdd"
},
{
"id": 3,
"name": "osd.3",
"class": "hdd"
},
{
"id": 4,
"name": "osd.4",
"class": "hdd"
},
{
"id": 5,
"name": "osd.5",
"class": "hdd"
}
],
Delete the now-unneeded Deployment
[root@m1 ceph]# kubectl -n rook-ceph delete deploy rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
Because useAllNodes and useAllDevices are disabled in cluster.yaml, the OSD's entry must also be removed from the nodes list; otherwise the next apply would add it back into the cluster.
- Edit the cluster.yaml configuration
[root@m1 ceph]# vim cluster.yaml
253 - name: "192.168.100.136"
254 devices:
255 - name: "sdb"
256 config:
257 storeType: bluestore
258 journalSizeMB: "4096"
259 # devices:
260 # - name: "sdc"
261 # config:
262 # storeType: bluestore
263 # metadataDevice: "/dev/sdd"
264 # databaseSizeMB: "4096"
265 # walSizeMB: "4096"
Re-apply the cluster.yaml manifest
[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
Manual OSD removal
Besides the cloud-native approach, an OSD can also be removed with the standard Ceph procedure, as follows:
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 down 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
Mark the OSD as out
[root@m1 ceph]# ceph osd out osd.5
marked out osd.5.
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-5 0.09760 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
5 hdd 0.04880 osd.5 down 0 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
Purge the OSD that was marked out. This triggers backfilling and rebalancing to migrate the data off it.
[root@m1 ceph]# ceph osd purge 5
purged osd.5
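The two commands in this procedure generalize into a small script. A sketch that wraps them behind a CEPH variable defaulting to `echo ceph`, so the sequence can be dry-run and inspected without a live cluster; set CEPH=ceph to execute for real from the toolbox pod. Note that `ceph osd purge` (available since Luminous) bundles the older `crush remove` + `auth del` + `osd rm` steps:

```shell
# Dry-run by default: CEPH="echo ceph" prints the commands instead of
# running them.
CEPH="${CEPH:-echo ceph}"

remove_osd() {
  id="$1"
  $CEPH osd out "osd.${id}"                       # stop placing new data on it
  $CEPH osd purge "${id}" --yes-i-really-mean-it  # drop from CRUSH map, auth, and OSD map
}
```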
The data migration can be observed with ceph -s
[root@m1 ceph]# ceph -s
cluster:
id: 17a413b5-f140-441a-8b35-feec8ae29521
health: HEALTH_WARN
Degraded data redundancy: 46/2400 objects degraded (1.917%), 2 pgs degraded, 3 pgs undersized
2 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 5m)
mgr: a(active, since 10m)
mds: myfs:2 {0=myfs-d=up:active,1=myfs-a=up:active} 2 up:standby-replay
osd: 5 osds: 5 up (since 2m), 5 in (since 5m); 6 remapped pgs
rgw: 2 daemons active (my.store.a, my.store.b)
task status:
data:
pools: 12 pools, 209 pgs
objects: 800 objects, 1.3 GiB
usage: 9.5 GiB used, 241 GiB / 250 GiB avail
pgs: 46/2400 objects degraded (1.917%)
14/2400 objects misplaced (0.583%)
203 active+clean
3 active+remapped+backfill_wait
2 active+recovery_wait+undersized+degraded+remapped
1 active+recovering+undersized+remapped
io:
client: 1.7 KiB/s rd, 3 op/s rd, 0 op/s wr
recovery: 0 B/s, 8 objects/s
Check the OSD tree; the OSD has been removed
[root@m1 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.24399 root default
-5 0.04880 host 192-168-100-133
0 hdd 0.04880 osd.0 up 1.00000 1.00000
-3 0.04880 host 192-168-100-134
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-7 0.04880 host 192-168-100-135
2 hdd 0.04880 osd.2 up 1.00000 1.00000
-9 0.04880 host 192-168-100-136
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-11 0.04880 host 192-168-100-137
4 hdd 0.04880 osd.4 up 1.00000 1.00000
[root@m1 ceph]# ceph osd crush dump | grep devices -A 50
"devices": [
{
"id": 0,
"name": "osd.0",
"class": "hdd"
},
{
"id": 1,
"name": "osd.1",
"class": "hdd"
},
{
"id": 2,
"name": "osd.2",
"class": "hdd"
},
{
"id": 3,
"name": "osd.3",
"class": "hdd"
},
{
"id": 4,
"name": "osd.4",
"class": "hdd"
}
],
Delete the corresponding Deployment and remove the entry from cluster.yaml
[root@m1 ceph]# kubectl -n rook-ceph delete deploy rook-ceph-osd-5
deployment.apps "rook-ceph-osd-5" deleted
[root@m1 ceph]# vim cluster.yaml
230 nodes:
231 - name: "192.168.100.133"
232 devices:
233 - name: "sdb"
234 config:
235 storeType: bluestore
236 journalSizeMB: "4096"
237 # - name: "sdc"
238 # config:
239 # storeType: bluestore
240 # journalSizeMB: "4096"
Re-apply the cluster.yaml manifest
[root@m1 ceph]# kubectl apply -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph configured
OSD replacement procedure
Replace an OSD
To replace a disk that has failed:
- Run the steps in the previous section to Remove an OSD.
- Replace the physical device and verify the new device is attached.
- Check if your cluster CR will find the new device. If you are using useAllDevices: true you can skip this step. If your cluster CR lists individual devices or uses a device filter you may need to update the CR.
- The operator ideally will automatically create the new OSD within a few minutes of adding the new device or updating the CR. If you don't see a new OSD automatically created, restart the operator (by deleting the operator pod) to trigger the OSD creation.
- Verify if the OSD is created on the node by running ceph osd tree from the toolbox.
The replacement workflow is:
- Remove the OSD from the Ceph cluster, using either the cloud-native or the manual method
- After removal, wait for data synchronization to finish, then add the new disk back by scaling out
- When adding the disk back, remember to delete the leftover LVM volumes on it first
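That last point, clearing leftover LVM state before re-adding a disk, can be sketched as below. The device path is a placeholder, and the `ceph-` prefix match relies on how ceph-volume names its volume groups (visible in the lsblk output earlier); RUN defaults to echo so this is a dry run until RUN is set to the empty string:

```shell
# Dry-run by default: prepend echo to every destructive command.
# Run with RUN= (empty) to actually wipe the disk.
RUN="${RUN-echo}"

wipe_ceph_disk() {
  dev="$1"
  # Remove any ceph-volume-created volume groups on this device.
  for vg in $(pvs --noheadings -o vg_name "$dev" 2>/dev/null); do
    case "$vg" in ceph-*) $RUN vgremove -f "$vg" ;; esac
  done
  $RUN pvremove -f "$dev"   # drop the LVM physical-volume label
  $RUN wipefs -a "$dev"     # clear remaining filesystem/LVM signatures
}
```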