ceph 003: OSD operations, pool operations, pool quotas, pool snapshots, pgp

When a host is added to the cluster, it is automatically assigned roles (mon, mgr, and so on) to bring the cluster to its default state.
To go beyond the defaults, placement can be configured explicitly.

Ceph containers and clients

A client of a Ceph cluster needs (see the sketch after the listing below):
the ceph-common package
the ceph.conf configuration file
the ceph.client.admin.keyring key file

[root@ceph01 ceph]# ls
ceph.client.admin.keyring  ceph.conf  ceph.pub  rbdmap
[root@ceph01 ceph]# pwd
/etc/ceph
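
A rough sketch of preparing an external client: the hostname client1 is made up, and it assumes the commands are run from ceph01, which already holds the config and the admin keyring, with a package repository available on the client.

ssh root@client1 yum install -y ceph-common                               # client needs the Ceph CLI tools
scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring root@client1:/etc/ceph/
ssh root@client1 ceph -s                                                  # the client should now reach the cluster

For anything beyond a lab, a restricted keyring created with ceph auth get-or-create is safer than handing out the admin key.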

The container also has its own copy of the configuration file and keyring:

[root@ceph01 ceph]# cephadm shell
Inferring fsid cb8f4abe-14a7-11ed-a76d-000c2939fb75
Inferring config /var/lib/ceph/cb8f4abe-14a7-11ed-a76d-000c2939fb75/mon.ceph01.example.com/config
Using recent ceph image quay.io/ceph/ceph@sha256:c3336a5b10b069b127d1a66ef97d489867fc9c2e4f379100e5a06f99f137a420
[ceph: root@ceph01 /]# 
[ceph: root@ceph01 /]# cd /etc/ceph/
[ceph: root@ceph01 ceph]# ls
ceph.conf  ceph.keyring  rbdmap
[ceph: root@ceph01 ceph]# pwd
/etc/ceph

cephadm shell creates and enters a temporary container.
The configuration files inside the container come from the host: the host paths are bind-mounted into the container.
For example:

[root@ceph01 ceph]# docker inspect f90bf6876862 | tail -n  7
            {
                "Type": "bind",
                "Source": "/etc/ceph/ceph.client.admin.keyring",
                "Destination": "/etc/ceph/ceph.keyring",
                "Mode": "z",
                "RW": true,
                "Propagation": "rprivate"
            },

The other nodes, such as ceph03, do not have ceph.client.admin.keyring, so they cannot talk to the cluster
(copying the keyring over, as in the client sketch above, would fix that); they do have ceph.conf and can still enter the temporary container:

[ceph: root@ceph03 /]# ceph -s
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 AuthRegistry(0x7f525405eba0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 AuthRegistry(0x7f5258b76f90) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx




[root@ceph01 ceph]# ceph -s
cluster:
    id:     cb8f4abe-14a7-11ed-a76d-000c2939fb75
    health: HEALTH_OK

services:
    mon: 3 daemons, quorum ceph01.example.com,ceph02,ceph03 (age 41m)
    mgr: ceph02.alqzfq(active, since 41m), standbys: ceph01.example.com.wvuoii
    osd: 9 osds: 9 up (since 41m), 9 in (since 25h)

data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   9.1 GiB used, 171 GiB / 180 GiB avail
    pgs:     1 active+clean

quorum: by default, more than half of the mons must be alive for the cluster to keep working.
Check the help output to add roles such as mon.
'in' means the OSD is part of the cluster; 'up' means the daemon is running.

[root@ceph01 ceph]# ceph orch -h | grep mon
orch daemon add [mon|mgr|rbd-mirror|crash|alertmanager|  Add daemon(s)
Add an mgr:
[root@ceph01 ceph]# ceph orch  daemon add mgr --placement=ceph03.example.com
Deployed mgr.ceph03.rujrrk on host 'ceph03.example.com'
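
A mon can be added the same way; the host ceph04.example.com below is hypothetical, and the declarative form is a sketch of the documented placement syntax.

ceph orch daemon add mon --placement=ceph04.example.com                              # pin one extra mon to a (new) host
ceph orch apply mon --placement="3 ceph01.example.com ceph02.example.com ceph03.example.com"   # or declare the desired mon set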

osd

ceph orch ps lists every Ceph daemon.

In the dashboard (Inventory page) you can see precisely which OSD sits on which disk and which service it belongs to.
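
ceph orch ps can also be narrowed down; a few variants as a sketch (host name taken from this cluster, flags as in the orchestrator CLI):

ceph orch ps                              # every daemon on every host
ceph orch ps --daemon-type osd            # only the OSD daemons
ceph orch ps ceph03.example.com           # only the daemons on one host
ceph orch ls                              # one summary line per service instead of per daemon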

[root@ceph01 ceph]# ceph orch  daemon stop osd.8
Scheduled to stop osd.8 on host 'ceph03.example.com'
The daemon now shows an unknown state:
osd.8                          ceph03.example.com  stopped        3s ago     55s  <unknown>  quay.io/ceph/ceph:v15                     <unknown>     <unknown>     


[root@ceph01 ceph]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         0.17537  root default                              
-3         0.05846      host ceph01                           
0    hdd  0.01949          osd.0        up   1.00000  1.00000
1    hdd  0.01949          osd.1        up   1.00000  1.00000
2    hdd  0.01949          osd.2        up   1.00000  1.00000
-5         0.05846      host ceph02                           
3    hdd  0.01949          osd.3        up   1.00000  1.00000
4    hdd  0.01949          osd.4        up   1.00000  1.00000
5    hdd  0.01949          osd.5        up   1.00000  1.00000
-7         0.05846      host ceph03                           
6    hdd  0.01949          osd.6        up   1.00000  1.00000
7    hdd  0.01949          osd.7        up   1.00000  1.00000
8    hdd  0.01949          osd.8      down   1.00000  1.00000



[root@ceph01 ceph]# ceph orch daemon rm osd.8 --force
Removed osd.8 from host 'ceph03.example.com'
This removes the daemon; the data on it is lost, which is dangerous.
The corresponding container is deleted as well.

[root@ceph01 ceph]# ceph osd rm osd.8
removed osd.8
Remove the OSD from the cluster map; the OSD count immediately drops by one.

The storage backend has changed from FileStore to BlueStore.
With FileStore you could see the objects, stored in directories organised by PG.
With BlueStore the objects are no longer visible as files; they are written straight to the raw device.
BlueStore sits on top of LVM.

Wipe Ceph's logical volume and fully reclaim the disk sdd:
[root@ceph01 ceph]# ceph orch device zap ceph03.example.com /dev/sdd --force


If ceph -s reports Degraded data redundancy,
consider removing the OSD from the CRUSH map as well:
ceph osd crush remove osd.6
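
Pulling the commands above together, a sketch of retiring one OSD completely (osd.8 and the device from the transcript; treat it as a checklist, details can differ per release):

ceph orch daemon stop osd.8                                # stop the daemon
ceph orch daemon rm osd.8 --force                          # remove the daemon and its container
ceph osd crush remove osd.8                                # drop it from the CRUSH map
ceph auth del osd.8                                        # delete its cephx key
ceph osd rm osd.8                                          # remove it from the OSD map
ceph orch device zap ceph03.example.com /dev/sdd --force   # wipe the LVM so the disk is reusable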

If you remove three disks and all three replicas of some data happen to live on exactly those disks, that data is gone.
Do not add or remove many disks at once.
Say the cluster has 100 disks and you add another 100 new ones:
the new disks cannot sit idle, so a data rebalance is triggered; it consumes a lot of resources, and other clients' reads become sluggish.

[ceph: root@ceph03 /]# ceph orch device ls
Hostname            Path      Type  Serial  Size   Health   Ident  Fault  Available  
ceph01.example.com  /dev/sdb  hdd           21.4G  Unknown  N/A    N/A    No         
ceph01.example.com  /dev/sdc  hdd           21.4G  Unknown  N/A    N/A    No         
ceph01.example.com  /dev/sdd  hdd           21.4G  Unknown  N/A    N/A    No         
ceph02.example.com  /dev/sdb  hdd           21.4G  Unknown  N/A    N/A    No         
ceph02.example.com  /dev/sdc  hdd           21.4G  Unknown  N/A    N/A    No         
ceph02.example.com  /dev/sdd  hdd           21.4G  Unknown  N/A    N/A    No         
ceph03.example.com  /dev/sdb  hdd           21.4G  Unknown  N/A    N/A    No         
ceph03.example.com  /dev/sdc  hdd           21.4G  Unknown  N/A    N/A    YES         
ceph03.example.com  /dev/sdd  hdd           21.4G  Unknown  N/A    N/A    No 



[ceph: root@ceph03 /]# ceph orch apply osd --all-available-devices 
Scheduled osd.all-available-devices update...
This consumes every available disk as an OSD. (Personally, adding disks one at a time feels safer.)

Ceph pools

Pools provide isolation: you control who may access each pool, and each pool can have its own policies.
A pool contains PGs.
Pools come in two flavours: replicated pools and erasure-coded pools.
A 3-replica pool has a usable capacity of only 1/3 (like RAID 1);
the point of erasure coding is to save space (parity-based, like RAID 3/5).

[root@ceph01 ceph]# ceph osd pool create pool1 
pool 'pool1' created
[root@ceph01 ceph]# ceph osd pool ls
device_health_metrics
pool1
[root@ceph01 ceph]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94 flags hashpspool stripe_width 0

replicated: a replica pool. min_size 2 means at least 2 replicas must be available;
if fewer than 2 replicas are up, the pool cannot be accessed.
crush_rule 0 object_hash rjenkins: the placement rule and the hash algorithm.
pg_num 32 is the default; with fewer PGs than this the benefits of a distributed store barely show.
osd pool create <pool> [<pg_num:int>] [<pgp_num:int>] [replicated|erasure] [<erasure_code_profile>]
erasure creates an erasure-coded pool; erasure_code_profile selects the coding scheme.
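
A sketch of both create forms; the pool names, PG counts and the EC profile below are made-up examples, not taken from this cluster.

ceph osd pool create rep_pool 64 64 replicated                             # replicated pool with explicit pg_num/pgp_num
ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host   # 2 data + 1 coding chunk fits three hosts
ceph osd pool create ec_pool 32 32 erasure ec21                            # erasure-coded pool using that profile
ceph osd pool ls detail                                                    # check the result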

[root@ceph01 ceph]# ceph pg dump pgs_brief
dumped pgs_brief
PG_STAT  STATE         UP       UP_PRIMARY  ACTING   ACTING_PRIMARY
2.1f     active+clean  [0,3,8]           0  [0,3,8]               0
2.1e     active+clean  [2,7,5]           2  [2,7,5]               2
2.1d     active+clean  [7,3,2]           7  [7,3,2]               7
2.1c     active+clean  [8,4,1]           8  [8,4,1]               8
2.1b     active+clean  [6,5,2]           6  [6,5,2]               6
2.1a     active+clean  [3,8,2]           3  [3,8,2]               3
2.19     active+clean  [0,4,7]           0  [0,4,7]               0
2.18     active+clean  [4,7,2]           4  [4,7,2]               4
2.17     active+clean  [6,2,4]           6  [6,2,4]               6
2.16     active+clean  [5,7,1]           5  [5,7,1]               5
2.15     active+clean  [7,1,3]           7  [7,1,3]               7
2.14     active+clean  [8,4,0]           8  [8,4,0]               8
2.13     active+clean  [7,4,2]           7  [7,4,2]               7
2.12     active+clean  [7,1,3]           7  [7,1,3]               7
2.11     active+clean  [6,3,1]           6  [6,3,1]               6
2.10     active+clean  [8,1,5]           8  [8,1,5]               8
2.f      active+clean  [8,4,0]           8  [8,4,0]               8
2.4      active+clean  [1,7,3]           1  [1,7,3]               1
2.2      active+clean  [5,1,8]           5  [5,1,8]               5
2.1      active+clean  [2,6,3]           2  [2,6,3]               2
2.3      active+clean  [5,7,1]           5  [5,7,1]               5
1.0      active+clean  [3,7,2]           3  [3,7,2]               3
2.0      active+clean  [3,6,0]           3  [3,6,0]               3
2.5      active+clean  [8,0,4]           8  [8,0,4]               8
2.6      active+clean  [1,6,3]           1  [1,6,3]               1
2.7      active+clean  [3,7,2]           3  [3,7,2]               3
2.8      active+clean  [3,7,0]           3  [3,7,0]               3
2.9      active+clean  [1,4,8]           1  [1,4,8]               1
2.a      active+clean  [6,1,3]           6  [6,1,3]               6
2.b      active+clean  [8,5,2]           8  [8,5,2]               8
2.c      active+clean  [6,0,5]           6  [6,0,5]               6
2.d      active+clean  [3,2,7]           3  [3,2,7]               3
2.e      active+clean  [2,3,7]           2  [2,3,7]               2

An OSD can serve many PGs; as a rule of thumb an OSD should carry at most around 200 PGs.
The PG count determines the pool's performance:
the more PGs, the more widely the data is spread across the cluster.
If a pool has only two PGs, at most 6 OSDs are involved (none repeated), so the data lives on just those 6 OSDs;
add one more PG and you are up to 9 OSDs. The more OSDs involved, the more spread out the data.
Reads can then be served from many OSDs in parallel,
and the impact of a failed disk is easier to absorb.
For a 100 GB OSD it is best to use only about 80 GB.
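
A common sizing guideline, as a sketch (an assumption based on the usual ~100-PGs-per-OSD rule of thumb, not a hard Ceph limit):

# total_pgs ≈ osds * 100 / replica_size, rounded to a power of two, then split across pools
OSDS=9; REPLICAS=3; TARGET_PER_OSD=100
echo $(( OSDS * TARGET_PER_OSD / REPLICAS ))    # 300 -> round to 256 for this 9-OSD cluster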

[root@ceph01 ceph]# ceph pg dump pgs_brief 8
Specify eight PGs.

Failure domain:
with host as the failure domain, the three replicas go to three different hosts;
with osd as the failure domain, the three replicas go to three different OSDs (which could all sit on one node).
pgp

pgp controls the placement permutations. If you increase pg_num without increasing pgp_num, the data stays on the original OSDs; only when pgp_num grows does data actually move onto new OSD sets.
With pg_num 16 and pgp_num 8,
eight of the PGs reuse existing placements and no new OSDs are brought in.
In older releases, raising pg_num alone achieves nothing because the number of placement combinations does not grow; raising pgp_num adds combinations, which gives you more OSDs to store data on.

[root@ceph01 ceph]# ceph osd pool set  pool1 pg_num 32
pg_num can also be shrunk; Ceph first reduces the placement combinations and then gradually merges the PGs.
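
A sketch of growing a pool's PGs by hand (with the autoscaler on, as in this cluster, pgp_num may be adjusted for you; this is the manual pair):

ceph osd pool set pool1 pg_num 64
ceph osd pool set pool1 pgp_num 64     # this is what actually triggers the remapping
ceph osd pool get pool1 pg_num
ceph osd pool get pool1 pgp_num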

In theory a single pool can consume the entire cluster,
but quotas can be set to prevent that.

Where Ceph pools are used:
rbd block storage
rgw object storage
cephfs file system storage

[root@ceph01 ceph]# ceph osd pool application enable pool1 rgw
enabled application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 258 lfor 0/256/254 flags hashpspool stripe_width 0 application rgw

[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw
Error EPERM: Are you SURE? Disabling an application within a pool might result in loss of application functionality; pass --yes-i-really-mean-it to proceed anyway
[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw --yes-i-really-mean-it
disable application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]# 
That is how you enable and disable an application tag; once data has been stored, do not disable the tag.

RGW is not actually installed, so clients cannot upload or download over HTTP, but we can still operate from inside the cluster:

[ceph: root@ceph03 /]# dd if=/dev/zero of=file   bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0050853 s, 2.1 GB/s
[ceph: root@ceph03 /]# rados -p pool1 ls
[ceph: root@ceph03 /]# rados -p pool1 put file1 file
[ceph: root@ceph03 /]# rados -p pool1 ls
file1
[ceph: root@ceph03 /]# 
We created the object ourselves; this is not the usual file path. It skips the striping step (striping is normally done on the client side) and simply uploads the whole file as a single object.
[ceph: root@ceph03 /]# ceph osd map pool1 file1
osdmap e260 pool 'pool1' (2) object 'file1' -> pg 2.a086551 (2.11) -> up ([6,3,1], p6) acting ([6,3,1], p6)
This shows which PG and which OSDs the object maps to.
The PG-to-OSD mapping is established beforehand; when an object is stored, its name is hashed to a PG, and that mapping then decides which OSDs hold it.


The rados command is for debugging and troubleshooting; objects uploaded this way are not usable by higher-level clients.

Summary

Viewing pool information
ceph osd pool ls           list pool names
ceph osd pool ls detail    show detailed pool information
ceph osd tree              show OSD information
ceph df                    show pool usage
Note: each pool can in principle use the whole cluster's space; how much data actually fits depends on the replica count.

ceph osd df    show the weight, size and usage of each OSD
weight: the CRUSH weight
reweight: all disks here are identical, so each is 1.00000
Weight convention: 1 TiB of capacity corresponds to a weight of 1. Server disks should ideally have the same size and performance so they cooperate well.
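
Weights can also be adjusted per OSD; the values below are purely illustrative:

ceph osd df                            # size, weight, reweight and utilisation per OSD
ceph osd crush reweight osd.0 0.02     # permanent CRUSH weight, roughly 1 per TiB of capacity
ceph osd reweight 0 0.9                # temporary 0-1 override, e.g. to drain a busy OSD a little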

ceph -s                                       show cluster status
ceph osd pool create pool1                    create a pool
ceph osd pool application enable pool1 rgw    tag the pool with an application type
ceph pg dump pgs_brief                        show PG and PGP information
ceph osd pool get pool1 all                   show all pool parameters
ceph osd pool set pool1 size 2                change a parameter (with no data in the pool, changing the replica count is fast)
ceph osd pool get pool1 size                  show a single parameter

Operations on objects in a pool

(for testing only; see the sketch below)
rados -p pool1 put file1 /root/file       upload an object
rados -p pool1 get file1 /tmp/file.bak    download an object
rados -p pool1 rm file1                   delete an object
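
A quick round-trip check with rados; the paths and the object name are arbitrary:

dd if=/dev/zero of=/root/file bs=1M count=10     # create a test payload
rados -p pool1 put file1 /root/file              # upload
rados -p pool1 get file1 /tmp/file.bak           # download
md5sum /root/file /tmp/file.bak                  # checksums should match
rados -p pool1 rm file1                          # clean up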

Setting pool quotas (important)

[ceph: root@ceph03 /]# ceph  -h  |  grep  quota
osd pool get-quota <pool>                                                                                obtain object or byte limits for pool
osd pool set-quota <pool> max_objects|max_bytes <val>                                                    set object or byte limit on pool
[ceph: root@ceph03 /]# 

[ceph: root@ceph03 tmp]# ceph osd pool set-quota pool1 max_objects 4
set-quota max_objects = 4 for pool pool1
[ceph: root@ceph03 tmp]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects  (current num objects: 1 objects)
max bytes  : N/A
[ceph: root@ceph03 tmp]# 
Change the number of objects the pool allows.

[ceph: root@ceph03 /]# rados -p pool1 put test4 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects  (current num objects: 3 objects)
max bytes  : N/A
Once the maximum is exceeded, writes to the pool simply hang, and every further operation on the pool blocks as well,
so the quota has to be raised.

[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 5
set-quota max_objects = 5 for pool pool1
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 0
set-quota max_objects = 0 for pool pool1
Remove the limit.

[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 120M
set-quota max_bytes = 125829120 for pool pool1
[ceph: root@ceph03 /]# rados -p pool1 put test11 file
[ceph: root@ceph03 /]# rados -p pool1 put test12 file
[ceph: root@ceph03 /]# rados -p pool1 put test13 file
[ceph: root@ceph03 /]# rados -p pool1 put test14 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: N/A
max bytes  : 120 MiB  (current num bytes: 136314884 bytes)
[ceph: root@ceph03 /]# rados -p pool1 put test15 file
That was setting a byte quota (the last put hangs because the quota is already exceeded).
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 0  
Remove the limit.

[ceph: root@ceph03 /]# ceph df 
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    180 GiB  170 GiB  742 MiB   9.7 GiB       5.40
TOTAL  180 GiB  170 GiB  742 MiB   9.7 GiB       5.40

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1      0 B        0      0 B      0     60 GiB
pool1                   2   32  140 MiB       15  420 MiB   0.23     60 GiB
[ceph: root@ceph03 /]# 
Note that the quota is not multiplied by 3 (it limits the size of a single copy, so a 20 G quota under three replicas effectively caps 60 G of raw space),
because 420 MiB of raw space really was consumed for the 140 MiB stored.

Rename a pool
ceph osd pool rename mqy supermao

Show which PG an object maps to
ceph osd map poolname objectname
[ceph: root@ceph03 /]# ceph osd map pool1 test1
osdmap e285 pool 'pool1' (2) object 'test1' -> pg 2.bddbf0b9 (2.19) -> up ([0,4,7], p0) acting ([0,4,7], p0)

Pool snapshots

[ceph: root@ceph03 /]# ceph osd pool mksnap pool1 snap1
created pool pool1 snap snap1
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 286 lfor 0/256/254 flags hashpspool,pool_snaps stripe_width 0 application rgw
    snap 1 'snap1' 2022-08-07T15:12:45.396812+0000


View the snapshot (list objects as of snap1):
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 ls
selected snap 1 'snap1'
test15
test12
test13
test6
test5
test10
test2
test9
test
test8
test7
test14
test11
test4
test3


Download from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 get test2 iamsanp
selected snap 1 'snap1'
If the original object is lost, it can be copied back from the snapshot.


Restore from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]# rados -p pool1 rm test2
error removing pool1>test2: (2) No such file or directory
[ceph: root@ceph03 /]# rados -p pool1 rollback test2 snap1 
rolled back pool pool1 to snapshot snap1
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]# 


You can only roll back a specified object that has been lost.
This guards against mistakes: you may have created new objects since the snapshot, and a wholesale rollback would wipe that new data out,
so only lost objects can be restored.
Multiple snapshots can be created.
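
Snapshot housekeeping, as a sketch (snap2 is a made-up name):

rados -p pool1 lssnap                  # list the pool's snapshots
ceph osd pool mksnap pool1 snap2       # take another snapshot; several can coexist
ceph osd pool rmsnap pool1 snap2       # remove a snapshot that is no longer needed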


Deleting a pool
ceph osd pool rm pool1
[ceph: root@ceph03 /]# ceph osd pool create pool2
pool 'pool2' created
[ceph: root@ceph03 /]# ceph osd pool ls
device_health_metrics
pool1
pool2
[ceph: root@ceph03 /]# ceph osd pool rm pool2
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool pool2.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[ceph: root@ceph03 /]#                            


In the dashboard's configuration page, click the second entry (mon_allow_pool_delete), click Edit and set the global value to true; after that the pool can be deleted.
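
The same thing can be done from the CLI instead of the dashboard, and turned back off afterwards:

ceph config set mon mon_allow_pool_delete true     # allow pool deletion cluster-wide
ceph config get mon mon_allow_pool_delete          # verify
ceph config set mon mon_allow_pool_delete false    # restore the safety net when done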

[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
pool 'pool2' removed

You can also give a pool an extra layer of protection so it cannot be deleted at all, no matter what, until this flag is changed:

[ceph: root@ceph03 /]# ceph osd pool set pool1  nodelete true
set pool 2 nodelete to true
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[ceph: root@ceph03 /]# ceph osd pool set pool1  nodelete false
set pool 2 nodelete to false
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
pool 'pool1' removed