Ceph 003: OSD operations, pool operations, pool quotas, pool snapshots, PGP
When a host is added to the cluster, it is automatically assigned roles (mon, mgr, and so on) until the cluster reaches its default layout.
To go beyond the defaults, set the placement explicitly.
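A sketch of pinning daemon placement with the cephadm orchestrator (host names follow this cluster; the counts are illustrative):
ceph orch apply mon --placement="3 ceph01.example.com ceph02.example.com ceph03.example.com"   # run exactly 3 mons on these hosts
ceph orch apply mgr --placement=2                                                              # keep 2 mgr daemons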
Ceph containers and clients
A client of a Ceph cluster needs:
the ceph-common package
the ceph.conf configuration file
the ceph.client.admin.keyring keyring file
(a client setup sketch follows the file listing below)
[root@ceph01 ceph]# ls
ceph.client.admin.keyring ceph.conf ceph.pub rbdmap
[root@ceph01 ceph]# pwd
/etc/ceph
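A sketch of preparing a separate client host, assuming a hypothetical host client01 with the Ceph repository configured and SSH access from the admin node:
dnf install -y ceph-common            # on client01: install the client tools
ceph config generate-minimal-conf     # on the admin node: print a minimal ceph.conf for distribution
scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring client01:/etc/ceph/   # copy config and keyring over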
The container also has its own configuration file and keyring:
[root@ceph01 ceph]# cephadm shell
Inferring fsid cb8f4abe-14a7-11ed-a76d-000c2939fb75
Inferring config /var/lib/ceph/cb8f4abe-14a7-11ed-a76d-000c2939fb75/mon.ceph01.example.com/config
Using recent ceph image quay.io/ceph/ceph@sha256:c3336a5b10b069b127d1a66ef97d489867fc9c2e4f379100e5a06f99f137a420
[ceph: root@ceph01 /]#
[ceph: root@ceph01 /]# cd /etc/ceph/
[ceph: root@ceph01 ceph]# ls
ceph.conf ceph.keyring rbdmap
[ceph: root@ceph01 ceph]# pwd
/etc/ceph
cephadm shell starts a temporary container and drops you into it.
The configuration files inside the container come from the host: host paths are bind-mounted into the container.
For example:
[root@ceph01 ceph]# docker inspect f90bf6876862 | tail -n 7
{
"Type": "bind",
"Source": "/etc/ceph/ceph.client.admin.keyring",
"Destination": "/etc/ceph/ceph.keyring",
"Mode": "z",
"RW": true,
"Propagation": "rprivate"
},
Other nodes such as ceph03 do not have ceph.client.admin.keyring,
so they cannot talk to the cluster as client.admin; they do have ceph.conf, though, and can still enter the temporary container.
[ceph: root@ceph03 /]# ceph -s
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.743+0000 7f5258b78700 -1 AuthRegistry(0x7f525405eba0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2022-08-06T12:05:05.746+0000 7f5258b78700 -1 AuthRegistry(0x7f5258b76f90) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
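One way to fix this (a sketch, assuming root SSH between the nodes) is simply to copy the admin keyring over from ceph01:
scp /etc/ceph/ceph.client.admin.keyring ceph03.example.com:/etc/ceph/   # afterwards ceph -s also works on ceph03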
[root@ceph01 ceph]# ceph -s
cluster:
id: cb8f4abe-14a7-11ed-a76d-000c2939fb75
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph01.example.com,ceph02,ceph03 (age 41m)
mgr: ceph02.alqzfq(active, since 41m), standbys: ceph01.example.com.wvuoii
osd: 9 osds: 9 up (since 41m), 9 in (since 25h)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 9.1 GiB used, 171 GiB / 180 GiB avail
pgs: 1 active+clean
quorum: by default more than half of the mons must be alive for the cluster to keep working.
Check the command help for how to add roles such as mon.
'in' means the OSD is part of the cluster; 'up' means the daemon is running.
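With 3 mons the cluster tolerates losing one (2 of 3 is still a majority); if two are down, the monitors lose quorum and the cluster stops serving requests. Quorum can be inspected with:
ceph quorum_status --format json-pretty    # quorum members and the current leader
ceph mon stat                              # one-line mon summary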
[root@ceph01 ceph]# ceph orch -h | grep mon
orch daemon add [mon|mgr|rbd-mirror|crash|alertmanager| Add daemon(s)
Add an mgr:
[root@ceph01 ceph]# ceph orch daemon add mgr --placement=ceph03.example.com
Deployed mgr.ceph03.rujrrk on host 'ceph03.example.com'
OSDs
ceph orch ps lists every Ceph daemon.
The dashboard shows precisely which OSD sits on which disk and which service it belongs to:
dashboard: inventory page
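ceph orch ps also accepts filters; a couple of illustrative calls (host name taken from this cluster):
ceph orch ps ceph03.example.com    # only the daemons on ceph03
ceph orch ls                       # one line per service instead of per daemon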
[root@ceph01 ceph]# ceph orch daemon stop osd.8
Scheduled to stop osd.8 on host 'ceph03.example.com'
The daemon then shows up in an unknown state:
osd.8 ceph03.example.com stopped 3s ago 55s <unknown> quay.io/ceph/ceph:v15 <unknown> <unknown>
[root@ceph01 ceph]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.17537 root default
-3 0.05846 host ceph01
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 up 1.00000 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
-5 0.05846 host ceph02
3 hdd 0.01949 osd.3 up 1.00000 1.00000
4 hdd 0.01949 osd.4 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-7 0.05846 host ceph03
6 hdd 0.01949 osd.6 up 1.00000 1.00000
7 hdd 0.01949 osd.7 up 1.00000 1.00000
8 hdd 0.01949 osd.8 down 1.00000 1.00000
[root@ceph01 ceph]# ceph orch daemon rm osd.8 --force
Removed osd.8 from host 'ceph03.example.com'
Removing the daemon like this can lose data, so it is dangerous.
The corresponding container is deleted.
[root@ceph01 ceph]# ceph osd rm osd.8
removed osd.8
Remove the OSD from the cluster map.
ceph osd tree now shows one OSD fewer.
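For reference, a sketch of the classic manual removal sequence for the same OSD (osd.8): mark it out, drop it from CRUSH, delete its key, then remove the id.
ceph osd out osd.8            # stop placing new data on it; triggers rebalancing
ceph osd crush remove osd.8   # remove it from the CRUSH map
ceph auth del osd.8           # delete its cephx key
ceph osd rm osd.8             # remove the OSD id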
The storage backend has changed from FileStore to BlueStore.
With FileStore you could still see the objects: they lived as files in directories, grouped by PG.
With BlueStore the objects are no longer visible as files; they are written straight to the raw device.
BlueStore sits on top of LVM.
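To see that layout on an OSD host (a sketch, assuming the LVM-backed BlueStore OSDs deployed here):
lvs                     # the ceph-* volume groups contain one logical volume per OSD
ceph-volume lvm list    # run inside 'cephadm shell'; maps each OSD id to its LV and device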
Wipe Ceph's logical volume and fully reclaim the disk (here /dev/sdd on ceph03):
[root@ceph01 ceph]# ceph orch device zap ceph03.example.com /dev/sdd --force
If ceph -s reports 'Degraded data redundancy',
consider also removing the OSD from the CRUSH map:
ceph osd crush remove osd.6
If you remove three disks and all three replicas of some object happen to live on exactly those disks, that data is lost.
Do not add or remove many disks at once.
Say you already have 100 disks and add another 100 new ones:
the new disks cannot just sit idle, so a data rebalance is triggered; it consumes a lot of resources and other clients' reads become slow. See the sketch below for ways to limit the impact.
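If a big rebalance is unavoidable, it can be paused or throttled with standard flags and options, for example:
ceph osd set norebalance                 # pause rebalancing while adding/removing OSDs
ceph osd unset norebalance               # resume afterwards
ceph config set osd osd_max_backfills 1  # limit concurrent backfills per OSD to protect client I/O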
[ceph: root@ceph03 /]# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
ceph01.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph01.example.com /dev/sdc hdd 21.4G Unknown N/A N/A No
ceph01.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdc hdd 21.4G Unknown N/A N/A No
ceph02.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
ceph03.example.com /dev/sdb hdd 21.4G Unknown N/A N/A No
ceph03.example.com /dev/sdc hdd 21.4G Unknown N/A N/A YES
ceph03.example.com /dev/sdd hdd 21.4G Unknown N/A N/A No
[ceph: root@ceph03 /]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
This turns every available disk into an OSD (personally I find adding disks one at a time more reassuring).
Ceph storage pools
A pool provides isolation: you decide who may access which pool, and each pool can have its own policies.
A pool contains PGs.
Pools come in two kinds: replicated pools and erasure-coded pools.
A 3-replica pool has a usable capacity of only 1/3 of the raw space (like RAID 1).
The point of erasure coding is to save space (parity-based, like RAID 3).
[root@ceph01 ceph]# ceph osd pool create pool1
pool 'pool1' created
[root@ceph01 ceph]# ceph osd pool ls
device_health_metrics
pool1
[root@ceph01 ceph]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 94 flags hashpspool stripe_width 0
replicated: a replica pool; min_size 2: at least 2 replicas must be available.
If fewer than 2 replicas are up, the pool cannot serve I/O.
crush_rule 0 and object_hash rjenkins are the placement rule and the hash algorithm.
pg_num 32 is the default; with fewer PGs than that you barely see the benefit of a distributed system.
osd pool create <pool> [<pg_num:int>] [<pgp_num:int>] [replicated|erasure] [<erasure_code_profile>]
erasure creates an erasure-coded pool; erasure_code_profile selects the coding profile.
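A minimal sketch of creating an erasure-coded pool (the profile name ec-21 and the k/m values are illustrative):
ceph osd erasure-code-profile set ec-21 k=2 m=1 crush-failure-domain=host   # 2 data chunks + 1 coding chunk, spread across hosts
ceph osd pool create ecpool1 32 32 erasure ec-21                            # pool that uses the profile
ceph osd erasure-code-profile ls                                            # list defined profiles
With k=2, m=1 the usable ratio is 2/3 instead of the 1/3 of a 3-replica pool, and the pool still survives losing one host.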
[root@ceph01 ceph]# ceph pg dump pgs_brief
dumped pgs_brief
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
2.1f active+clean [0,3,8] 0 [0,3,8] 0
2.1e active+clean [2,7,5] 2 [2,7,5] 2
2.1d active+clean [7,3,2] 7 [7,3,2] 7
2.1c active+clean [8,4,1] 8 [8,4,1] 8
2.1b active+clean [6,5,2] 6 [6,5,2] 6
2.1a active+clean [3,8,2] 3 [3,8,2] 3
2.19 active+clean [0,4,7] 0 [0,4,7] 0
2.18 active+clean [4,7,2] 4 [4,7,2] 4
2.17 active+clean [6,2,4] 6 [6,2,4] 6
2.16 active+clean [5,7,1] 5 [5,7,1] 5
2.15 active+clean [7,1,3] 7 [7,1,3] 7
2.14 active+clean [8,4,0] 8 [8,4,0] 8
2.13 active+clean [7,4,2] 7 [7,4,2] 7
2.12 active+clean [7,1,3] 7 [7,1,3] 7
2.11 active+clean [6,3,1] 6 [6,3,1] 6
2.10 active+clean [8,1,5] 8 [8,1,5] 8
2.f active+clean [8,4,0] 8 [8,4,0] 8
2.4 active+clean [1,7,3] 1 [1,7,3] 1
2.2 active+clean [5,1,8] 5 [5,1,8] 5
2.1 active+clean [2,6,3] 2 [2,6,3] 2
2.3 active+clean [5,7,1] 5 [5,7,1] 5
1.0 active+clean [3,7,2] 3 [3,7,2] 3
2.0 active+clean [3,6,0] 3 [3,6,0] 3
2.5 active+clean [8,0,4] 8 [8,0,4] 8
2.6 active+clean [1,6,3] 1 [1,6,3] 1
2.7 active+clean [3,7,2] 3 [3,7,2] 3
2.8 active+clean [3,7,0] 3 [3,7,0] 3
2.9 active+clean [1,4,8] 1 [1,4,8] 1
2.a active+clean [6,1,3] 6 [6,1,3] 6
2.b active+clean [8,5,2] 8 [8,5,2] 8
2.c active+clean [6,0,5] 6 [6,0,5] 6
2.d active+clean [3,2,7] 3 [3,2,7] 3
2.e active+clean [2,3,7] 2 [2,3,7] 2
An OSD can serve many PGs; as a rule of thumb one OSD should carry at most around 200 PGs.
The PG count determines the pool's performance.
The more PGs, the more evenly data is spread across the cluster.
Say a pool has two PGs: with 3 replicas and no overlap that is at most 6 OSDs, so the data only ever lives on those 6 OSDs.
Add one more PG and you are up to 9; the more OSDs involved, the more the data spreads out.
Reads can then be served in parallel.
A failed disk is also easier to recover from, because recovery work is spread out too.
For a 100 GB OSD it is best to use only about 80 GB.
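A common sizing rule of thumb, which the pg_autoscaler can also apply automatically:
target pg_num is roughly (number of OSDs x 100) / replica count, rounded to a power of two.
For this cluster: 9 OSDs x 100 / 3 replicas = 300, so 256.
ceph osd pool autoscale-status    # shows what the autoscaler recommends per pool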
[root@ceph01 ceph]# ceph pg dump pgs_brief 8
Specify eight PGs.
Failure domains
host: the three replicas are placed on three different hosts.
osd: the three replicas are placed on three different OSDs (which may all sit on a single node).
PGP
PGP is the number of placement permutations. If you increase pg_num but leave pgp_num alone, the data stays on the original OSDs; once pgp_num is increased as well, data migrates onto the new OSD combinations.
With pg_num 16 and pgp_num 8,
eight of the PGs reuse existing placements, so no new OSDs are brought into play.
On older versions, raising only pg_num achieves nothing, because the number of permutations does not grow. Only raising pgp_num adds permutations, so more OSD combinations can hold your data.
[root@ceph01 ceph]# ceph osd pool set pool1 pg_num 32
pg_num can also be shrunk; the cluster first reduces the placement permutations and then gradually merges PGs.
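On older releases both values therefore had to be raised by hand; with the autoscaler on (as in this cluster) pgp_num normally follows pg_num. A sketch of doing it explicitly and verifying:
ceph osd pool set pool1 pgp_num 32    # match pgp_num to pg_num so data actually remaps
ceph osd pool get pool1 pg_num        # verify
ceph osd pool get pool1 pgp_num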
In theory a single pool can occupy the entire cluster,
but a quota can be set.
What a Ceph pool can be used for:
rbd: block storage
rgw: object storage
cephfs: file system storage
[root@ceph01 ceph]# ceph osd pool application enable pool1 rgw
enabled application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 258 lfor 0/256/254 flags hashpspool stripe_width 0 application rgw
[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw
Error EPERM: Are you SURE? Disabling an application within a pool might result in loss of application functionality; pass --yes-i-really-mean-it to proceed anyway
[ceph: root@ceph03 /]# ceph osd pool application disable pool1 rgw --yes-i-really-mean-it
disable application 'rgw' on pool 'pool1'
[ceph: root@ceph03 /]#
Setting and clearing the application tag: once data has been stored in the pool, don't clear the tag.
RGW itself is not installed, so clients cannot upload or download over HTTP, but we can still operate on the pool from inside the cluster.
[ceph: root@ceph03 /]# dd if=/dev/zero of=file bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0050853 s, 2.1 GB/s
[ceph: root@ceph03 /]# rados -p pool1 ls
[ceph: root@ceph03 /]# rados -p pool1 put file1 file
[ceph: root@ceph03 /]# rados -p pool1 ls
file1
[ceph: root@ceph03 /]#
We created the object by hand; this is not an ordinary file upload. It skips the striping step (striping is normally done on the client side) and uploads the whole thing as a single object.
[ceph: root@ceph03 /]# ceph osd map pool1 file1
osdmap e260 pool 'pool1' (2) object 'file1' -> pg 2.a086551 (2.11) -> up ([6,3,1], p6) acting ([6,3,1], p6)
Show which PG and which OSDs an object maps to.
The PG-to-OSD mapping exists in advance; when an object is stored, its name is hashed to a PG, and that PG's mapping decides which OSDs receive it.
The rados command is for debugging and troubleshooting; objects uploaded this way are not usable by the higher-level services.
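The PG-to-OSD mapping can also be queried directly, without naming an object (PG 2.11 taken from the output above):
ceph pg map 2.11    # prints the up and acting OSD sets for that PG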
Summary
Viewing pool information
ceph osd pool ls          # list pool names
ceph osd pool ls detail   # show detailed pool information
ceph osd tree             # show OSD information
ceph df                   # show pool usage
Note: every pool can in principle use the whole cluster's space; how much data actually fits depends on the replica count.
ceph osd df               # show weights and the size of each OSD
weight: the CRUSH weight
reweight: all disks here are identical, so every value is 1.00000
Weight convention: 1 TB of capacity corresponds to weight 1. Ideally all server disks have the same size and performance, so they work well together.
ceph -s                                      # show cluster status
ceph osd pool create pool1                   # create a pool
ceph osd pool application enable pool1 rgw   # tag the pool with an application type
ceph pg dump pgs_brief                       # show PG and PGP placement
ceph osd pool get pool1 all                  # show all pool parameters
ceph osd pool set pool1 size 2               # change a parameter value (with no data in the pool, changing the replica count is quick)
ceph osd pool get pool1 size                 # show a single parameter's value
Object operations in a pool
(for testing)
rados -p pool1 put file1 /root/file      # upload an object
rados -p pool1 get file1 /tmp/file.bak   # download an object
rados -p pool1 rm file1                  # delete an object
Setting pool quotas (important)
[ceph: root@ceph03 /]# ceph -h | grep quota
osd pool get-quota <pool> obtain object or byte limits for pool
osd pool set-quota <pool> max_objects|max_bytes <val> set object or byte limit on pool
[ceph: root@ceph03 /]#
[ceph: root@ceph03 tmp]# ceph osd pool set-quota pool1 max_objects 4
set-quota max_objects = 4 for pool pool1
[ceph: root@ceph03 tmp]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects (current num objects: 1 objects)
max bytes : N/A
[ceph: root@ceph03 tmp]#
Change the pool's allowed object-count quota.
[ceph: root@ceph03 /]# rados -p pool1 put test4 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: 4 objects (current num objects: 3 objects)
max bytes : N/A
Once the quota is exceeded, writes to the pool simply hang, and any further operation on the pool blocks too,
so the quota has to be raised.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 5
set-quota max_objects = 5 for pool pool1
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_objects 0
set-quota max_objects = 0 for pool pool1
Setting the value to 0 removes the limit.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 120M
set-quota max_bytes = 125829120 for pool pool1
[ceph: root@ceph03 /]# rados -p pool1 put test11 file
[ceph: root@ceph03 /]# rados -p pool1 put test12 file
[ceph: root@ceph03 /]# rados -p pool1 put test13 file
[ceph: root@ceph03 /]# rados -p pool1 put test14 file
[ceph: root@ceph03 /]# ceph osd pool get-quota pool1
quotas for pool 'pool1':
max objects: N/A
max bytes : 120 MiB (current num bytes: 136314884 bytes)
[ceph: root@ceph03 /]# rados -p pool1 put test15 file
Set a byte quota.
[ceph: root@ceph03 /]# ceph osd pool set-quota pool1 max_bytes 0
Remove the byte limit.
[ceph: root@ceph03 /]# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 180 GiB 170 GiB 742 MiB 9.7 GiB 5.40
TOTAL 180 GiB 170 GiB 742 MiB 9.7 GiB 5.40
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 60 GiB
pool1 2 32 140 MiB 15 420 MiB 0.23 60 GiB
[ceph: root@ceph03 /]#
Note that the quota is not multiplied by 3: it applies to the logical (single-copy) size, so a 20 GiB quota under three replicas effectively caps about 60 GiB of raw space,
because the raw usage really is three times the stored data (140 MiB stored shows up as 420 MiB used above).
Renaming a pool
ceph osd pool rename mqy supermao
Show which PG an object maps to:
ceph osd map poolname objectname
[ceph: root@ceph03 /]# ceph osd map pool1 test1
osdmap e285 pool 'pool1' (2) object 'test1' -> pg 2.bddbf0b9 (2.19) -> up ([0,4,7], p0) acting ([0,4,7], p0)
Pool snapshots
[ceph: root@ceph03 /]# ceph osd pool mksnap pool1 snap1
created pool pool1 snap snap1
[ceph: root@ceph03 /]# ceph osd pool ls detail
pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 63 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth
pool 2 'pool1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 286 lfor 0/256/254 flags hashpspool,pool_snaps stripe_width 0 application rgw
snap 1 'snap1' 2022-08-07T15:12:45.396812+0000
List the objects as seen from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 ls
selected snap 1 'snap1'
test15
test12
test13
test6
test5
test10
test2
test9
test
test8
test7
test14
test11
test4
test3
Download an object from the snapshot:
[ceph: root@ceph03 /]# rados -p pool1 -s snap1 get test2 iamsanp
selected snap 1 'snap1'
If the original object is lost, you can push the snapshot copy back, as sketched below.
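A sketch of that path, reusing the copy downloaded above (the file iamsanp):
rados -p pool1 put test2 iamsanp    # re-create the object from the snapshot copy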
Restoring via snapshot rollback:
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]# rados -p pool1 rm test2
error removing pool1>test2: (2) No such file or directory
[ceph: root@ceph03 /]# rados -p pool1 rollback test2 snap1
rolled back pool pool1 to snapshot snap1
[ceph: root@ceph03 /]# rados -p pool1 rm test2
[ceph: root@ceph03 /]#
Rollback only restores the specified, missing object.
This guards against mistakes: you may have created new objects since the snapshot, and a blanket rollback would wipe that new data.
So only objects that no longer exist can be restored this way.
Multiple snapshots can be created.
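Related snapshot housekeeping, using the names from above:
rados -p pool1 lssnap              # list the pool's snapshots
ceph osd pool rmsnap pool1 snap1   # delete a snapshot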
Deleting a pool
ceph osd pool rm pool1
[ceph: root@ceph03 /]# ceph osd pool create pool2
pool 'pool2' created
[ceph: root@ceph03 /]# ceph osd pool ls
device_health_metrics
pool1
pool2
[ceph: root@ceph03 /]# ceph osd pool rm pool2
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool pool2. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[ceph: root@ceph03 /]#
In the dashboard configuration page, pick the mon_allow_pool_delete entry, click edit and set the global value to true; after that the pool can be deleted.
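The same change can be made from the CLI (the option name comes from the error message above):
ceph config set mon mon_allow_pool_delete true   # allow pool deletion
ceph config get mon mon_allow_pool_delete        # verify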
[ceph: root@ceph03 /]# ceph osd pool rm pool2 pool2 --yes-i-really-really-mean-it
pool 'pool2' removed
You can also give a pool a safety lock so it cannot be deleted at all, unless this flag is cleared first:
[ceph: root@ceph03 /]# ceph osd pool set pool1 nodelete true
set pool 2 nodelete to true
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must unset nodelete flag for the pool first
[ceph: root@ceph03 /]# ceph osd pool set pool1 nodelete false
set pool 2 nodelete to false
[ceph: root@ceph03 /]# ceph osd pool rm pool1 pool1 --yes-i-really-really-mean-it
pool 'pool1' removed