Building a Cloud-Native Environment, Part 3: Ceph Storage
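The in-tree GlusterFS volume plugin was deprecated in Kubernetes 1.25 and removed in 1.26, which leaves Ceph as the strongest choice among open-source distributed storage systems with an active community. This article uses Rook-Ceph to deploy a Ceph cluster.
Ceph is an open-source storage platform. Its architecture consists of the following layers: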
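- Ceph storage cluster (RADOS): the foundation and core of Ceph. This component is a distributed object store that provides an API and device management.
- Ceph block device (RBD): sits on top of RADOS and provides a distributed block device through the Linux kernel and the KVM/QEMU drivers. It can back virtual disks for virtual machines, or be mapped through the kernel to give physical hosts disk space.
- Ceph file system (CephFS): sits on top of RADOS and provides a POSIX-compatible file system that scales across multiple nodes, with metadata servers managing the file system namespace.
- Ceph object storage (RGW): sits on top of RADOS and provides object storage through a gateway compatible with the S3 and Swift interfaces, so Ceph storage can be accessed through a RESTful API.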
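Ceph's core components are the Monitors, OSDs, MDS, and MGR:
- Ceph Monitors: Monitors watch the Ceph cluster and maintain its health state. They keep the cluster's key information, namely the cluster maps such as the OSD map. When a client wants to store data, it first fetches the latest maps from a Monitor and then computes the final storage location from the maps, the object ID, and related inputs. In a Rook-Ceph environment the Rook Operator deploys and manages the Monitors.
- Ceph OSDs: OSDs are the data storage nodes of a Ceph cluster. They store, replicate, rebalance, and recover data, exchange heartbeats with other OSDs, and report state changes to the Monitors. One OSD usually corresponds to one physical or logical disk. In a Rook-Ceph environment the Rook Operator also deploys and manages the OSDs.
- Ceph MDS: the Ceph Metadata Server stores file system metadata. Object storage and block devices do not need it.
- Ceph MGR: in early Ceph releases the Monitors also stored the cluster's metadata. As Ceph clusters grew, managing that metadata became complex, and the Ceph Manager was introduced to take over this work.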
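The client-side placement step described for the Monitors (fetch the maps, then compute the location from the maps and the object ID) can be illustrated with a deliberately simplified sketch. The function names `pg_for_object` and `osds_for_pg` are hypothetical, and `cksum` plus a round-robin walk are stand-ins for Ceph's real hash and its CRUSH algorithm. Only the idea carries over: the location is computed, not looked up in a central table.

```shell
# Toy placement: hash the object name into one of pg_num placement groups.
# ASSUMPTION: cksum is a stand-in; Ceph uses its own hash function.
pg_for_object() {
  local object_id="$1" pg_num="$2" hash
  hash=$(printf '%s' "$object_id" | cksum | cut -d ' ' -f 1)
  echo $(( hash % pg_num ))
}

# Toy mapping from a placement group to `replicas` distinct OSD ids.
# ASSUMPTION: a round-robin walk replaces CRUSH, which also honours
# failure domains and device weights.
osds_for_pg() {
  local pg="$1" osd_count="$2" replicas="$3" i=0 out=""
  while [ "$i" -lt "$replicas" ]; do
    out="$out${out:+ }$(( (pg + i) % osd_count ))"
    i=$(( i + 1 ))
  done
  printf '%s\n' "$out"
}
```

For example, `osds_for_pg "$(pg_for_object my-object 65)" 3 3` prints three OSD ids for the object. Every client that holds the same map arrives at the same answer without asking a central server.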
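Think of Ceph as a city. The city's infrastructure (RADOS) is made up of many storage nodes (Object Storage Daemons, OSDs) and a few Monitor nodes that track cluster state. These nodes are the city's buildings and road network.
Each OSD can be pictured as a small warehouse that stores data. OSDs replicate and distribute data among themselves, much like the logistics system that connects the city's warehouses.
Monitors are the city's information center. They hold information about the OSDs and the other Monitors and keep the whole system consistent.
The city (Ceph) has three main entrances: the Ceph block device (RBD), the Ceph file system (CephFS), and the Ceph object gateway (RGW). Each entrance corresponds to a different kind of service. RBD resembles self-storage: customers store and retrieve block data directly in the warehouses (OSDs). CephFS is closer to a traditional file storage system in which customers create directories and files. RGW offers an S3- and Swift-compatible interface through which customers read and write data over a RESTful API.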
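Rook-Ceph is an open-source storage orchestrator that can quickly deploy and manage a Ceph cluster on k8s. The Rook Ceph Operator is the core of Rook and handles every task related to the Ceph cluster. It is a custom k8s controller that watches and reconciles the state of the Ceph cluster, and it creates and manages the other Ceph components such as the Monitors and OSDs.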
0. Preparation
0.1 Prepare the k8s cluster
Have a k8s cluster ready. The one used here was built earlier; the setup is described [here](https://www.ytg2097.com/cloud-native/k8s/k8s-cluster-1.27.3.html).
host | hostname | os | role | hardware |
---|---|---|---|---|
10.20.24.51 | 10-20-24-51 | centos7.9 | control-plane | cpu: 4c, memory: 16G, disk: 500G |
10.20.24.54 | 10-20-24-54 | centos7.9 | worker | cpu: 4c, memory: 16G, disk 1: 500G, disk 2: 1T |
10.20.24.55 | 10-20-24-55 | centos7.9 | worker | cpu: 4c, memory: 16G, disk 1: 500G, disk 2: 1T |
10.20.24.56 | 10-20-24-56 | centos7.9 | worker | cpu: 4c, memory: 16G, disk 1: 500G, disk 2: 1T |
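Note: to configure a Ceph storage cluster, at least one of the following types of local storage is required:
- Raw devices (no partitions or formatted filesystem)
- Raw partitions (no formatted filesystem)
- LVM logical volumes (no formatted filesystem)
- Persistent Volumes available from a storage class in block mode
Use the following command to check whether a partition or device is already formatted with a filesystem: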
[root@10-20-24-54 ~]# lsblk -f
NAME            FSTYPE      LABEL UUID                                   MOUNTPOINT
sdb
sr0
sda
├─sda2          LVM2_member       hHkL3N-vxe0-112l-wj0I-gCj1-MpQI-5OdZZe
│ ├─centos-swap swap              e48b0faf-b82b-4063-bbfc-14602d80c7ff
│ └─centos-root xfs               51c645fe-8655-4a1a-b669-27a936380df2   /
└─sda1          xfs               d8580215-c95f-4eed-b255-ae212cb01f6a   /boot
If FSTYPE is not empty, the device already has a filesystem on it. In this example sdb is available to Rook, while sda and its partitions are not.
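The manual check above can be scripted. The following is a minimal sketch, assuming `lsblk` is run in pairs mode (`-P`) with the columns NAME, TYPE, FSTYPE and PKNAME. The helper name `find_raw_disks` is hypothetical. It lists whole disks that have neither a filesystem signature nor child devices, which is a heuristic for "raw device", not a guarantee that Rook will accept the disk.

```shell
# Reads `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output on stdin and prints
# whole disks with an empty FSTYPE and no partitions or LVM volumes on them.
find_raw_disks() {
  awk '
    {
      name = ""; type = ""; fstype = ""; parent = ""
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")          # KEY="value" -> kv[1], kv[2]
        gsub(/"/, "", kv[2])        # drop the surrounding quotes
        if (kv[1] == "NAME")   name = kv[2]
        if (kv[1] == "TYPE")   type = kv[2]
        if (kv[1] == "FSTYPE") fstype = kv[2]
        if (kv[1] == "PKNAME") parent = kv[2]
      }
      if (parent != "") has_child[parent] = 1
      if (type == "disk" && fstype == "") candidates[++n] = name
    }
    END {
      for (j = 1; j <= n; j++)
        if (!(candidates[j] in has_child)) print candidates[j]
    }
  '
}

# Usage on a node:
#   lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | find_raw_disks
```

On the node shown above this should report only sdb, because sda has partitions and sr0 is not of type disk.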
0.2 Prepare the kernel
In some scenarios OSDs depend on LVM: https://rook.github.io/docs/rook/v1.11/Getting-Started/Prerequisites/prerequisites/#lvm-package
sudo yum install -y lvm2
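Ceph requires a Linux kernel built with the rbd module. Check whether it is present; if it is not, the kernel has to be rebuilt. Note that lsmod lists only modules that are currently loaded, so if the command below prints nothing, try `modprobe rbd` before concluding that the module is missing. https://rook.github.io/docs/rook/v1.11/Getting-Started/Prerequisites/prerequisites/#kernel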
[root@10-20-24-54 ~]# lsmod | grep rbd
rbd 118784 10
libceph 483328 1 rbd
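If you plan to create volumes from CephFS, the kernel should be at least version 4.17. On older kernels the size requested by a PVC is not enforced; storage quotas are enforced only on newer kernels.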
[root@10-20-24-54 ~]# uname -r
3.10.0-1160.el7.x86_64
# Start the upgrade
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml-headers kernel-ml -y
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
uname -sr
# After the upgrade
[root@10-20-24-54 ~]# uname -r
6.4.2-1.el7.elrepo.x86_64
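The 4.17 kernel requirement for CephFS quotas can be checked before and after the upgrade with a small helper. This is a sketch that assumes GNU `sort -V` (version sort) is available, as it is on CentOS 7. The function name `kernel_at_least` is hypothetical.

```shell
# Succeeds when the kernel version is >= the required version.
# Usage: kernel_at_least REQUIRED [CURRENT]   (CURRENT defaults to `uname -r`)
kernel_at_least() {
  local required="$1" current="${2:-$(uname -r)}"
  current="${current%%-*}"   # "3.10.0-1160.el7.x86_64" -> "3.10.0"
  # With version sort, the smaller of the two versions comes first.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}

# Usage on a node:
#   kernel_at_least 4.17 && echo "CephFS quotas will be enforced"
```

With the two kernels shown in this section, the check fails for 3.10.0-1160.el7.x86_64 and passes for 6.4.2-1.el7.elrepo.x86_64.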
1. Deploy the cluster
1.1 Clone the git branch
git clone --single-branch --branch v1.11.9 https://github.com/rook/rook.git
cd rook/deploy/examples
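1.2 Replace the images
Most of the images Rook-Ceph needs are hosted on quay.io and registry.k8s.io, which cannot be pulled from networks inside mainland China, and as of the day this article was published the Aliyun mirror did not carry them either. You therefore have to obtain the images yourself, push them to a private registry, and point the manifests at it.
# vi operator.yaml
# Find the following lines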
# ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.8.0"
# ROOK_CSI_REGISTRAR_IMAGE: "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0"
# ROOK_CSI_RESIZER_IMAGE: "registry.k8s.io/sig-storage/csi-resizer:v1.7.0"
# ROOK_CSI_PROVISIONER_IMAGE: "registry.k8s.io/sig-storage/csi-provisioner:v3.4.0"
# ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1"
# ROOK_CSI_ATTACHER_IMAGE: "registry.k8s.io/sig-storage/csi-attacher:v4.1.0"
# Then replace them with the following
ROOK_CSI_CEPH_IMAGE: "10.20.24.50/library/cephcsi/cephcsi:v3.8.0"
ROOK_CSI_REGISTRAR_IMAGE: "10.20.24.50/library/sig-storage/csi-node-driver-registrar:v2.7.0"
ROOK_CSI_RESIZER_IMAGE: "10.20.24.50/library/sig-storage/csi-resizer:v1.7.0"
ROOK_CSI_PROVISIONER_IMAGE: "10.20.24.50/library/sig-storage/csi-provisioner:v3.4.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "10.20.24.50/library/sig-storage/csi-snapshotter:v6.2.1"
ROOK_CSI_ATTACHER_IMAGE: "10.20.24.50/library/sig-storage/csi-attacher:v4.1.0"
# Find the following line and replace it
image: rook/ceph:v1.11.9 # replace with 10.20.24.50/library/rook/ceph:v1.11.9
# vi cluster.yaml
image: quay.io/ceph/ceph:v17.2.6 # replace with image: 10.20.24.50/library/ceph/ceph:v17.2.6
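The naming rule used in the replacements above (drop the upstream registry host, keep the rest of the path, prefix it with `10.20.24.50/library`) can be captured in a helper so that pulling, retagging and pushing is repeatable. This is a sketch under assumptions: `mirror_name` and `mirror_image` are hypothetical names, Docker is assumed to be the client, and the machine running it is assumed to be able to reach both the upstream registries and the private one.

```shell
# Maps an upstream image reference to its name in the private registry.
mirror_name() {
  local image="$1" registry="${2:-10.20.24.50/library}"
  local first="${image%%/*}"
  case "$first" in
    *.*|*:*) image="${image#*/}" ;;   # first component looks like a registry host
  esac
  printf '%s/%s\n' "$registry" "$image"
}

# Pulls one image, retags it for the private registry and pushes it.
mirror_image() {
  local src="$1" dst
  dst=$(mirror_name "$src" "${2:-10.20.24.50/library}")
  docker pull "$src" && docker tag "$src" "$dst" && docker push "$dst"
}

# Usage:
#   mirror_image quay.io/cephcsi/cephcsi:v3.8.0
#   mirror_image rook/ceph:v1.11.9
```

Images without a registry host, such as `rook/ceph:v1.11.9`, keep their full path, which matches the replacement shown for operator.yaml.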
1.3 Deploy the operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# Wait until everything is in the Running state before moving on
kubectl get all -n rook-ceph --watch
1.4 Deploy the Ceph cluster
kubectl create -f cluster.yaml
[root@10-20-24-51 examples]# kubectl get pods -n rook-ceph --watch
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-jzbsj 2/2 Running 0 4d21h
csi-cephfsplugin-p9kt8 2/2 Running 0 4d21h
csi-cephfsplugin-provisioner-c48d85cb7-cj7hm 5/5 Running 0 4d21h
csi-cephfsplugin-provisioner-c48d85cb7-h6n8x 5/5 Running 0 4d21h
csi-cephfsplugin-zkxlb 2/2 Running 0 4d21h
csi-rbdplugin-6w9mc 2/2 Running 0 4d21h
csi-rbdplugin-7wk8z 2/2 Running 0 4d21h
csi-rbdplugin-hfcfs 2/2 Running 0 4d21h
csi-rbdplugin-provisioner-c56b7dc95-nnknk 5/5 Running 0 4d21h
csi-rbdplugin-provisioner-c56b7dc95-x8nfh 5/5 Running 0 4d21h
rook-ceph-crashcollector-10-20-24-54-6bb7f8b7dc-5v9s8 1/1 Running 0 4d21h
rook-ceph-crashcollector-10-20-24-55-7988cb995b-wf9zg 1/1 Running 0 4d21h
rook-ceph-crashcollector-10-20-24-56-6b49f749df-r2sx5 1/1 Running 0 4d21h
rook-ceph-mgr-a-7696fbb478-wbhqk 3/3 Running 0 4d21h
rook-ceph-mgr-b-599cfb7fd5-5pvkn 3/3 Running 0 4d21h
rook-ceph-mon-a-57498df9b-jkswq 2/2 Running 0 4d21h
rook-ceph-mon-b-d59c5cdf8-h6r2p 2/2 Running 0 4d21h
rook-ceph-mon-c-79cf789896-pbrf7 2/2 Running 0 4d21h
rook-ceph-operator-76cc696798-mgpf8 1/1 Running 0 4d21h
rook-ceph-osd-0-6c6b57cd7c-pq4sx 2/2 Running 0 4d21h
rook-ceph-osd-1-5956c4b8b-s2v84 2/2 Running 0 4d21h
rook-ceph-osd-2-7d44fc865c-d7dtz 2/2 Running 0 4d21h
rook-ceph-osd-prepare-10-20-24-54-wlrg7 0/1 Completed 0 115m
rook-ceph-osd-prepare-10-20-24-55-5bpvg 0/1 Completed 0 115m
rook-ceph-osd-prepare-10-20-24-56-4vz2n 0/1 Completed 0 115m
All osd-prepare pods should be Completed, and every other pod should be Running.
1.5 Verify the cluster
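Instead of reading the pod list by eye, the rule above can be applied mechanically. A minimal sketch, assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS as the first three columns); the helper name `unhealthy_pods` is hypothetical.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints every pod
# that is not in its expected state: Completed for osd-prepare jobs,
# Running for everything else.
unhealthy_pods() {
  awk '
    $1 ~ /^rook-ceph-osd-prepare-/ { if ($3 != "Completed") print $1, $3; next }
    $3 != "Running"                { print $1, $3 }
  '
}

# Usage:
#   kubectl get pods -n rook-ceph --no-headers | unhealthy_pods
```

An empty result means the cluster pods match the expectation stated above.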
kubectl create -f toolbox.yaml
ytg@ytgdeMacBook-Pro ~ % kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
...
rook-ceph-tools-5c76bc5fdc-4q27c 1/1 Running 0 4d21h
ytg@ytgdeMacBook-Pro ~ % kubectl exec -it rook-ceph-tools-5c76bc5fdc-4q27c -n rook-ceph -- bash
bash-4.4$ ceph status
  cluster:
    id:     e0272d4c-08f2-4a6b-aa7b-ff5d5f442a55
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 4d)
    mgr: a(active, since 119m), standbys: b
    osd: 3 osds: 3 up (since 4d), 3 in (since 4d)

  data:
    pools:   3 pools, 65 pgs
    objects: 3.84k objects, 14 GiB
    usage:   44 GiB used, 2.9 TiB / 2.9 TiB avail
    pgs:     65 active+clean

  io:
    client:   298 KiB/s wr, 0 op/s rd, 24 op/s wr
bash-4.4$ ceph osd status
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 10-20-24-54 14.5G 985G 13 161k 0 0 exists,up
1 10-20-24-56 14.5G 985G 11 144k 0 0 exists,up
2 10-20-24-55 14.5G 985G 10 100k 0 0 exists,up
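The same verification can be run without an interactive shell and checked by a script. The sketch below assumes the toolbox Deployment is named `rook-ceph-tools` (as in the pod name above) and that the `osd:` line of `ceph status` has the shape shown in this section. The helper name `osds_all_up_in` is hypothetical.

```shell
# Reads `ceph status` text on stdin and succeeds only when the "osd:" line
# reports the same number for total, up and in.
osds_all_up_in() {
  awk '
    $1 == "osd:" {
      total = $2
      for (i = 3; i <= NF; i++) {
        if ($i == "up" || $i == "up,") up = $(i - 1)
        if ($i == "in" || $i == "in,") in_count = $(i - 1)
      }
      found = 1
    }
    END { exit !(found && total == up && total == in_count) }
  '
}

# Usage:
#   kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status | osds_all_up_in \
#     && echo "all OSDs are up and in"
```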
1.6 Deploy a NodePort service to access the Dashboard
kubectl create -f dashboard-external-https.yaml
# Get the initial password (the default user is admin)
ytg@ytgdeMacBook-Pro ~ % kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
g)|D,*dV]|[.(kf`C@VV
# Look up the NodePort
ytg@ytgdeMacBook-Pro ~ % kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.110.199.225 <none> 9283/TCP 4d21h
rook-ceph-mgr-dashboard ClusterIP 10.106.177.242 <none> 8443/TCP 4d21h
rook-ceph-mgr-dashboard-external-https NodePort 10.100.39.43 <none> 8443:31163/TCP 4d21h
rook-ceph-mon-a ClusterIP 10.96.237.66 <none> 6789/TCP,3300/TCP 4d21h
rook-ceph-mon-b ClusterIP 10.109.242.180 <none> 6789/TCP,3300/TCP 4d21h
rook-ceph-mon-c ClusterIP 10.103.138.229 <none> 6789/TCP,3300/TCP 4d21h
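The Dashboard address is any node IP plus the NodePort from the PORT(S) column. A small sketch for extracting it; the helper name `node_port_of` is hypothetical, and the node IP in the usage lines is simply one of the workers from the table in section 0.1.

```shell
# Extracts the NodePort from a PORT(S) value such as "8443:31163/TCP".
node_port_of() {
  local ports="$1"
  ports="${ports#*:}"              # "31163/TCP"
  printf '%s\n' "${ports%%/*}"     # "31163"
}

# Usage:
#   port=$(node_port_of "8443:31163/TCP")
#   echo "https://10.20.24.54:${port}"
#
# Alternatively, ask the API server directly:
#   kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https \
#     -o jsonpath='{.spec.ports[0].nodePort}'
```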
2. Deploy the StorageClass
# vi ceph-storage-class.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool
  # (optional) mapOptions is a comma-separated list of map options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # mapOptions: lock_on_read,queue_depth=1024
  # (optional) unmapOptions is a comma-separated list of unmap options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # unmapOptions: force
  # RBD image format. Defaults to "2".
  imageFormat: "2"
  # RBD image features
  # Available for imageFormat: "2". Older releases of CSI RBD
  # support only the `layering` feature. The Linux kernel (KRBD) supports the
  # full complement of features as of 5.4
  # `layering` alone corresponds to Ceph's bitfield value of "2" ;
  # `layering` + `fast-diff` + `object-map` + `deep-flatten` + `exclusive-lock` together
  # correspond to Ceph's OR'd bitfield value of "63". Here we use
  # a symbolic, comma-separated format:
  # For 5.4 or later kernels:
  #imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  # For 5.3 or earlier kernels:
  imageFeatures: layering
  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
  # in hyperconverged settings where the volume is mounted on the same node as the osds.
  csi.storage.k8s.io/fstype: ext4
# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
# Optional, if you want to add dynamic resize for PVC.
# For now only ext3, ext4, xfs resize support provided, like in Kubernetes itself.
allowVolumeExpansion: true
Verify
[root@10-20-24-51 examples]# kubectl get crd | grep cephblockpools.ceph.rook.io
cephblockpools.ceph.rook.io 2023-07-13T05:06:37Z
[root@10-20-24-51 examples]# kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 4d21h
[root@10-20-24-51 examples]# kubectl get cephblockpool -n rook-ceph
NAME PHASE
replicapool Ready
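Listing the StorageClass and the pool shows that the objects exist, but not that provisioning works. A throwaway PVC is one way to confirm it. The sketch below assumes the manifest from this section was saved as ceph-storage-class.yaml and applied with kubectl. The helper `render_test_pvc` and the default PVC name `rbd-test-pvc` are hypothetical.

```shell
# Prints a minimal PVC manifest that requests a volume from rook-ceph-block.
# Usage: render_test_pvc [NAME] [SIZE]
render_test_pvc() {
  local name="${1:-rbd-test-pvc}" size="${2:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-ceph-block
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (requires cluster access):
#   kubectl create -f ceph-storage-class.yaml   # if not applied yet
#   render_test_pvc | kubectl apply -f -
#   kubectl get pvc rbd-test-pvc                # STATUS should become Bound
#   kubectl delete pvc rbd-test-pvc
```

Because the StorageClass uses the Immediate binding mode shown in the listing above, the claim should reach Bound without a pod consuming it.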
Set it as the default StorageClass
[root@10-20-24-51 examples]# kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
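To confirm that the patch took effect, look for the `(default)` marker that kubectl prints next to the name of a default StorageClass. A minimal sketch; the helper name `default_storage_classes` is hypothetical.

```shell
# Reads `kubectl get storageclass` output on stdin and prints the names of
# the classes that kubectl marks with "(default)".
default_storage_classes() {
  awk '$2 == "(default)" { print $1 }'
}

# Usage:
#   kubectl get storageclass | default_storage_classes
```

If more than one name is printed, several classes carry the default annotation, and it is safer to clear the annotation on all but one of them.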
Done.