Ceph 003: OSD operations, pool operations, pool quotas, pool snapshots, PGP
When a host is added to the cluster, it is automatically assigned roles (mon, mgr, and so on) until the cluster reaches its default layout.
To go beyond the defaults, set the placement explicitly.
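A sketch of pinning daemon placement with the cephadm orchestrator (host names follow this cluster; the counts are illustrative):
ceph orch apply mon --placement="3 ceph01.example.com ceph02.example.com ceph03.example.com"   # run exactly 3 mons on these hosts
ceph orch apply mgr --placement=2                                                              # keep 2 mgr daemons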
Ceph containers and clients
A client of a Ceph cluster needs:
the ceph-common package
the ceph.conf configuration file
the ceph.client.admin.keyring keyring file
(a client setup sketch follows the file listing below)
[root@ceph01 ceph]# ls
ceph.client.admin.keyring ceph.conf ceph.pub rbdmap
[root@ceph01 ceph]# pwd
/etc/ceph
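A sketch of preparing a separate client host, assuming a hypothetical host client01 with the Ceph repository configured and SSH access from the admin node:
dnf install -y ceph-common            # on client01: install the client tools
ceph config generate-minimal-conf     # on the admin node: print a minimal ceph.conf for distribution
scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring client01:/etc/ceph/   # copy config and keyring over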
The container also has its own configuration file and keyring:
[root@ceph01 ceph]# cephadm shell
Inferring fsid cb8f4abe-14a7-11ed-a76d-000c2939fb75
Inferring config /var/lib/ceph/cb8f4abe-14a7-11ed-a76d-000c2939fb75/mon.ceph01.example.com/config
Using recent ceph image quay.io/ceph/ceph@sha256:c3336a5b10b069b127d1a66ef97d489867fc9c2e4f379100e5a06f99f137a420
[ceph: root@ceph01 /]#
[ceph: root@ceph01 /]# cd /etc/ceph/
[ceph: root@ceph01 ceph]# ls
ceph.conf ceph.keyring rbdmap
[ceph: root@ceph01 ceph]# pwd
/etc/ceph
cephadm shell starts a temporary container and drops you into it.
The configuration files inside the container come from the host: host paths are bind-mounted into the container.
For example:
[root@ceph01 ceph]# docker inspect f90bf6876862 | tail -n 7
{
"Type": "bind",
"Source": "/etc/ceph/ceph.client.admin.keyring",
"Destination": "/etc/ceph/ceph.keyring",
"Mode": "z",
"RW": true,
"Propagation": "rprivate"
},
Other nodes such as ceph03 do not have ceph.client.admin.keyring,
so they cannot talk to the cluster as client.admin; they do have ceph.conf, though, and can still enter the temporary container.
[ceph: root@ceph03 /]# ceph -s
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 AuthRegistry(0x7f525405eba0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 AuthRegistry(0x7f5258b76f90) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
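One way to fix this (a sketch, assuming root SSH between the nodes) is simply to copy the admin keyring over from ceph01:
scp /etc/ceph/ceph.client.admin.keyring ceph03.example.com:/etc/ceph/   # afterwards ceph -s also works on ceph03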
[root@ceph01 ceph]# ceph -s
cluster:
id: cb8f4abe-14a7-11ed-a76d-000c2939fb75
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph01.example.com,ceph02,ceph03 (age 41m)
mgr: ceph02.alqzfq(active, since 41m), standbys: ceph01.example.com.wvuoii
osd: 9 osds: 9 up (since 41m), 9 in (since 25h)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 9.1 GiB used, 171 GiB / 180 GiB avail
pgs: 1 active+clean
quorum: by default more than half of the mons must be alive for the cluster to keep working.
Check the command help for how to add roles such as mon.
'in' means the OSD is part of the cluster; 'up' means the daemon is running.
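With 3 mons the cluster tolerates losing one (2 of 3 is still a majority); if two are down, the monitors lose quorum and the cluster stops serving requests. Quorum can be inspected with:
ceph quorum_status --format json-pretty    # quorum members and the current leader
ceph mon stat                              # one-line mon summary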
[root@ceph01 ceph]# ceph orch -h | grep mon
orch daemon add [mon|mgr|rbd-mirror|crash|alertmanager| Add daemon(s)
Add an mgr:
[root@ceph01 ceph]# ceph orch daemon add mgr --placement=ceph03.example.com
Deployed mgr.ceph03.rujrrk on host 'ceph03.example.com'
OSDs
ceph orch ps lists every Ceph daemon.
The dashboard shows precisely which OSD sits on which disk and which service it belongs to:
dashboard: inventory page
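ceph orch ps also accepts filters; a couple of illustrative calls (host name taken from this cluster):
ceph orch ps ceph03.example.com    # only the daemons on ceph03
ceph orch ls                       # one line per service instead of per daemon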
[root@ceph01 ceph]# ceph orch daemon stop osd.8
Scheduled to stop osd.8 on host 'ceph03.example.com'
The daemon then shows up in an unknown state:
osd.8 ceph03.example.com stopped 3s ago 55s <unknown> quay.io/ceph/ceph:v15 <unknown> <unknown>
[root@ceph01 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.17537 root default
-3 0.05846 host ceph01
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 up 1.00000 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
-5 0.05846 host ceph02
3 hdd 0.01949 osd.3 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-7 0.05846 host ceph03
6 hdd 0.01949 osd.6 up 1.00000 1.00000
7 hdd 0.01949 osd.7 up 1.00000 1.00000
8 hdd 0.01949 osd.8 down 1.00000 1.00000
[root@ceph01 ceph]# ceph orch daemon rm osd.8 --force
Removed osd.8 from host 'ceph03.example.com'
Removing the daemon like this can lose data, so it is dangerous.
The corresponding container is deleted.
[root@ceph01 ceph]# ceph osd rm osd.8
removed osd.8
Remove the OSD from the cluster map.
ceph osd tree now shows one OSD fewer.
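For reference, a sketch of the classic manual removal sequence for the same OSD (osd.8): mark it out, drop it from CRUSH, delete its key, then remove the id.
ceph osd out osd.8            # stop placing new data on it; triggers rebalancing
ceph osd crush remove osd.8   # remove it from the CRUSH map
ceph auth del osd.8           # delete its cephx key
ceph osd rm osd.8             # remove the OSD id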
The storage backend has changed from FileStore to BlueStore.
With FileStore you could still see the objects: they lived as files in directories, grouped by PG.
With BlueStore the objects are no longer visible as files; they are written straight to the raw device.
BlueStore sits on top of LVM.
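To see that layout on an OSD host (a sketch, assuming the LVM-backed BlueStore OSDs deployed here):
lvs                     # the ceph-* volume groups contain one logical volume per OSD
ceph-volume lvm list    # run inside 'cephadm shell'; maps each OSD id to its LV and device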
Wipe Ceph's logical volume and fully reclaim the disk (here /dev/sdd on ceph03):
[root@ceph01 ceph]# ceph orch device zap ceph03.example.com /dev/sdd --force
If ceph -s reports 'Degraded data redundancy',
consider also removing the OSD from the CRUSH map:
ceph osd crush remove osd.6
If you remove three disks and all three replicas of some object happen to live on exactly those disks, that data is lost.
Do not add or remove many disks at once.
Say you already have 100 disks and add another 100 new ones:
the new disks cannot just sit idle, so a data rebalance is triggered; it consumes a lot of resources and other clients' reads become slow. See the sketch below for ways to limit the impact.
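If a big rebalance is unavoidable, it can be paused or throttled with standard flags and options, for example:
ceph osd set norebalance                 # pause rebalancing while adding/removing OSDs
ceph osd unset norebalance               # resume afterwards
ceph config set osd osd_max_backfills 1  # limit concurrent backfills per OSD to protect client I/O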
[ceph: root@ceph03 /]# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
ceph01.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph01.example.com /dev/sdc hdd 21.4G Unknown N/A N/A No
ceph01.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdc hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
ceph03.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph03.example.com /dev/sdc hdd 21.4G Unknown N/A N/A YES
ceph03.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
[ceph: root@ceph03 /]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
This turns every available disk into an OSD (personally I find adding disks one at a time more reassuring).
Ceph storage pools
A pool provides isolation: you decide who may access which pool, and each pool can have its own policies.
A pool contains PGs.
Pools come in two kinds: replicated pools and erasure-coded pools.
A 3-replica pool has a usable capacity of only 1/3 of the raw space (like RAID 1).
The point of erasure coding is to save space (parity-based, like RAID 3).
[root@ceph01 ceph]# ceph osd pool create pool1
pool 'pool1' created
[root@ceph01 ceph]# ceph osd pool ls
device_health_metrics
pool1
[root@ceph01 ceph]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94 flags hashpspool stripe_width 0
replicated: a replica pool; min_size 2: at least 2 replicas must be available.
If fewer than 2 replicas are up, the pool cannot serve I/O.
crush_rule 0 and object_hash rjenkins are the placement rule and the hash algorithm.
pg_num 32 is the default; with fewer PGs than that you barely see the benefit of a distributed system.
osd pool create <pool> [<pg_num:int>] [<pgp_num:int>] [replicated|erasure] [<erasure_code_profile>]
erasure creates an erasure-coded pool; erasure_code_profile selects the coding profile.
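A minimal sketch of creating an erasure-coded pool (the profile name ec-21 and the k/m values are illustrative):
ceph osd erasure-code-profile set ec-21 k=2 m=1 crush-failure-domain=host   # 2 data chunks + 1 coding chunk, spread across hosts
ceph osd pool create ecpool1 32 32 erasure ec-21                            # pool that uses the profile
ceph osd erasure-code-profile ls                                            # list defined profiles
With k=2, m=1 the usable ratio is 2/3 instead of the 1/3 of a 3-replica pool, and the pool still survives losing one host.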
[root@ceph01 ceph]# ceph pg dump pgs_brief
dumped pgs_brief
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
2.1f active+clean [0,3,8] 0 [0,3,8] 0
2.1e active+clean [2,7,5] 2 [2,7,5] 2
2.1d active+clean [7,3,2] 7 [7,3,2] 7
2.1c active+clean [8,4,1] 8 [8,4,1] 8
2.1b active+clean [6,5,2] 6 [6,5,2] 6
2.1a active+clean [3,8,2] 3 [3,8,2] 3
2.19 active+clean [0,4,7] 0 [0,4,7] 0
2.18 active+clean [4,7,2] 4 [4,7,2] 4
2.17 active+clean [6,2,4] 6 [6,2,4] 6
2.16 active+clean [5,7,1] 5 [5,7,1] 5
2.15 active+clean [7,1,3] 7 [7,1,3] 7
2.14 active+clean [8,4,0] 8 [8,4,0] 8
2.13 active+clean [7,4,2] 7 [7,4,2] 7
2.12 active+clean [7,1,3] 7 [7,1,3] 7
2.11 active+clean [6,3,1] 6 [6,3,1] 6
2.10 active+clean [8,1,5] 8 [8,1,5] 8
2.f active+clean [8,4,0] 8 [8,4,0] 8
2.4 active+clean [1,7,3] 1 [1,7,3] 1
2.2 active+clean [5,1,8] 5 [5,1,8] 5
2.1 active+clean [2,6,3] 2 [2,6,3] 2
2.3 active+clean [5,7,1] 5 [5,7,1] 5
1.0 active+clean [3,7,2] 3 [3,7,2] 3
2.0 active+clean [3,6,0] 3 [3,6,0] 3
2.5 active+clean [8,0,4] 8 [8,0,4] 8
2.6 active+clean [1,6,3] 1 [1,6,3] 1
2.7 active+clean [3,7,2] 3 [3,7,2] 3
2.8 active+clean [3,7,0] 3 [3,7,0] 3
2.9 active+clean [1,4,8] 1 [1,4,8] 1
2.a active+clean [6,1,3] 6 [6,1,3] 6
2.b active+clean [8,5,2] 8 [8,5,2] 8
2.c active+clean [6,0,5] 6 [6,0,5] 6
2.d active+clean [3,2,7] 3 [3,2,7] 3
2.e active+clean [2,3,7] 2 [2,3,7] 2
An OSD can serve many PGs; as a rule of thumb one OSD should carry at most around 200 PGs.
The PG count determines the pool's performance.
The more PGs, the more evenly data is spread across the cluster.
Say a pool has two PGs: with 3 replicas and no overlap that is at most 6 OSDs, so the data only ever lives on those 6 OSDs.
Add one more PG and you are up to 9; the more OSDs involved, the more the data spreads out.
Reads can then be served in parallel.
A failed disk is also easier to recover from, because recovery work is spread out too.
For a 100 GB OSD it is best to use only about 80 GB.
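A common sizing rule of thumb, which the pg_autoscaler can also apply automatically:
target pg_num is roughly (number of OSDs x 100) / replica count, rounded to a power of two.
For this cluster: 9 OSDs x 100 / 3 replicas = 300, so 256.
ceph osd pool autoscale-status    # shows what the autoscaler recommends per pool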
[root@ceph01 ceph]# ceph pg dump pgs_brief 8
Specify eight PGs.
Failure domains
host: the three replicas are placed on three different hosts.
osd: the three replicas are placed on three different OSDs (which may all sit on a single node).
PGP
PGP is the number of placement permutations. If you increase pg_num but leave pgp_num alone, the data stays on the original OSDs; once pgp_num is increased as well, data migrates onto the new OSD combinations.
With pg_num 16 and pgp_num 8,
eight of the PGs reuse existing placements, so no new OSDs are brought into play.
On older versions, raising only pg_num achieves nothing, because the number of permutations does not grow. Only raising pgp_num adds permutations, so more OSD combinations can hold your data.
[root@ceph01 ceph]# ceph osd pool set pool1 pg_num 32
pg_num can also be shrunk; the cluster first reduces the placement permutations and then gradually merges PGs.
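On older releases both values therefore had to be raised by hand; with the autoscaler on (as in this cluster) pgp_num normally follows pg_num. A sketch of doing it explicitly and verifying:
ceph osd pool set pool1 pgp_num 32    # match pgp_num to pg_num so data actually remaps
ceph osd pool get pool1 pg_num        # verify
ceph osd pool get pool1 pgp_num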
In theory a single pool can occupy the entire cluster,
but a quota can be set.
What a Ceph pool can be used for:
rbd: block storage
rgw: object storage
cephfs: file system storage
[root@ceph01 ceph]# ceph osd pool application enable pool1 rgw
enabled application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 258 lfor 0/256/254 flags hashpspool stripe_width 0 application rgw
[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw
Error EPERM: Are you SURE? Disabling an application within a pool might result in loss of application functionality; pass --yes-i-really-mean-it to proceed anyway
[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw --yes-i-really-mean-it
disable application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]#
Setting and clearing the application tag: once data has been stored in the pool, don't clear the tag.
RGW itself is not installed, so clients cannot upload or download over HTTP, but we can still operate on the pool from inside the cluster.
[ceph: root@ceph03 /]# dd if=/dev/zero of=file bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0050853 s, 2.1 GB/s
[ceph: root@ceph03 /]# rados -p pool1 ls
[ceph: root@ceph03 /]# rados -p pool1 put file1 file
[ceph: root@ceph03 /]# rados -p pool1 ls
file1
[ceph: root@ceph03 /]#
We created the object by hand; this is not an ordinary file upload. It skips the striping step (striping is normally done on the client side) and uploads the whole thing as a single object.
[ceph: root@ceph03 /]# ceph osd map pool1 file1
osdmap e260 pool 'pool1' (2) object 'file1' -> pg 2.a086551 (2.11) -> up ([6,3,1], p6) acting ([6,3,1], p6)
Show which PG and which OSDs an object maps to.
The PG-to-OSD mapping exists in advance; when an object is stored, its name is hashed to a PG, and that PG's mapping decides which OSDs receive it.
The rados command is for debugging and troubleshooting; objects uploaded this way are not usable by the higher-level services.
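The PG-to-OSD mapping can also be queried directly, without naming an object (PG 2.11 taken from the output above):
ceph pg map 2.11    # prints the up and acting OSD sets for that PG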
Summary
Viewing pool information
ceph osd pool ls          # list pool names
ceph osd pool ls detail   # show detailed pool information
ceph osd tree             # show OSD information
ceph df                   # show pool usage
Note: every pool can in principle use the whole cluster's space; how much data actually fits depends on the replica count.
ceph osd df               # show weights and the size of each OSD
weight: the CRUSH weight
reweight: all disks here are identical, so every value is 1.00000
Weight convention: 1 TB of capacity corresponds to weight 1. Ideally all server disks have the same size and performance, so they work well together.
ceph -s                                      # show cluster status
ceph osd pool create pool1                   # create a pool
ceph osd pool application enable pool1 rgw   # tag the pool with an application type
ceph pg dump pgs_brief                       # show PG and PGP placement
ceph osd pool get pool1 all                  # show all pool parameters
ceph osd pool set pool1 size 2               # change a parameter value (with no data in the pool, changing the replica count is quick)
ceph osd pool get pool1 size                 # show a single parameter's value
Object operations in a pool
(for testing)
rados -p pool1 put file1 /root/file      # upload an object
rados -p pool1 get file1 /tmp/file.bak   # download an object
rados -p pool1 rm file1                  # delete an object
Setting pool quotas (important)
[ceph: root@ceph03 /]# ceph -h | grep quota
osd pool get-quota <pool> obtain object or byte limits for pool
osd pool set-quota <pool> max_objects|max_bytes <val> set object or byte limit on pool
[ceph: root@ceph03 /]#
[ceph: root@ceph03 tmp]# ceph osd pool set-quota pool1 max_objects 4
set-quota max_objects = 4 for pool pool1
[ceph: root@ceph03 tmp]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects (current num objects: 1 objects)
max bytes : N/A
[ceph: root@ceph03 tmp]#
Change the pool's allowed object-count quota.
[ceph: root@ceph03 /]# rados -p pool1 put test4 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects (current num objects: 3 objects)
max bytes : N/A
Once the quota is exceeded, writes to the pool simply hang, and any further operation on the pool blocks too,
so the quota has to be raised.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 5
set-quota max_objects = 5 for pool pool1
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 0
set-quota max_objects = 0 for pool pool1
Setting the value to 0 removes the limit.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 120M
set-quota max_bytes = 125829120 for pool pool1
[ceph: root@ceph03 /]# rados -p pool1 put test11 file
[ceph: root@ceph03 /]# rados -p pool1 put test12 file
[ceph: root@ceph03 /]# rados -p pool1 put test13 file
[ceph: root@ceph03 /]# rados -p pool1 put test14 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: N/A
max bytes : 120 MiB (current num bytes: 136314884 bytes)
[ceph: root@ceph03 /]# rados -p pool1 put test15 file
Set a byte quota.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 0
Remove the byte limit.
[ceph: root@ceph03 /]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 180 GiB 170 GiB 742 MiB 9.7 GiB 5.40
TOTAL 180 GiB 170 GiB 742 MiB 9.7 GiB 5.40
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 60 GiB
pool1 2 32 140 MiB 15 420 MiB 0.23 60 GiB
[ceph: root@ceph03 /]#
Note that the quota is not multiplied by 3: it applies to the logical (single-copy) size, so a 20 GiB quota under three replicas effectively caps about 60 GiB of raw space,
because the raw usage really is three times the stored data (140 MiB stored shows up as 420 MiB used above).
Renaming a pool
ceph osd pool rename mqy supermao
Show which PG an object maps to:
ceph osd map poolname objectname
[ceph: root@ceph03 /]# ceph osd map pool1 test1
osdmap e285 pool 'pool1' (2) object 'test1' -> pg 2.bddbf0b9 (2.19) -> up ([0,4,7], p0) acting ([0,4,7], p0)
Pool snapshots
[ceph: root@ceph03 /]# ceph osd pool mksnap pool1 snap1
created pool pool1 snap snap1
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 286 lfor 0/256/254 flags hashpspool,pool_snaps stripe_width 0 application rgw
snap 1 'snap1' 2022-08-07T15:12:45.396812+0000
List the objects as seen from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 ls
selected snap 1 'snap1'
test15
test12
test13
test6
test5
test10
test2
test9
test
test8
test7
test14
test11
test4
test3
Download an object from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 get test2 iamsanp
selected snap 1 'snap1'
If the original object is lost, you can push the snapshot copy back, as sketched below.
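A sketch of that path, reusing the copy downloaded above (the file iamsanp):
rados -p pool1 put test2 iamsanp    # re-create the object from the snapshot copy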
Restoring via snapshot rollback:
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]# rados -p pool1 rm test2
error removing pool1>test2: (2) No such file or directory
[ceph: root@ceph03 /]# rados -p pool1 rollback test2 snap1
rolled back pool pool1 to snapshot snap1
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]#
Rollback only restores the specified, missing object.
This guards against mistakes: you may have created new objects since the snapshot, and a blanket rollback would wipe that new data.
So only objects that no longer exist can be restored this way.
Multiple snapshots can be created.
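Related snapshot housekeeping, using the names from above:
rados -p pool1 lssnap              # list the pool's snapshots
ceph osd pool rmsnap pool1 snap1   # delete a snapshot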
Deleting a pool
ceph osd pool rm pool1
[ceph: root@ceph03 /]# ceph osd pool create pool2
pool 'pool2' created
[ceph: root@ceph03 /]# ceph osd pool ls
device_health_metrics
pool1
pool2
[ceph: root@ceph03 /]# ceph osd pool rm pool2
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool pool2. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[ceph: root@ceph03 /]#
In the dashboard configuration page, pick the mon_allow_pool_delete entry, click edit and set the global value to true; after that the pool can be deleted.
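The same change can be made from the CLI (the option name comes from the error message above):
ceph config set mon mon_allow_pool_delete true   # allow pool deletion
ceph config get mon mon_allow_pool_delete        # verify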
[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
pool 'pool2' removed
You can also give a pool a safety lock so it cannot be deleted at all, unless this flag is cleared first:
[ceph: root@ceph03 /]# ceph osd pool set pool1 nodelete true
set pool 2 nodelete to true
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[ceph: root@ceph03 /]# ceph osd pool set pool1 nodelete false
set pool 2 nodelete to false
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
pool 'pool1' removed