Ceph CRUSH Map and Rules Explained
1. CRUSH map hierarchy (example)
ceph osd crush add-bucket datacenter0 datacenter
ceph osd crush add-bucket room0 room
ceph osd crush add-bucket rack0 rack
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack
ceph osd crush move room0 datacenter=datacenter0
ceph osd crush move rack0 room=room0
ceph osd crush move rack1 room=room0
ceph osd crush move rack2 room=room0
ceph osd crush link node01 rack=rack0
ceph osd crush link node02 rack=rack1
ceph osd crush link node03 rack=rack2
View the resulting tree with ceph osd tree:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 0.05867 datacenter datacenter0
-10 0.05867 room room0
-11 0.02928 rack rack0
-3 0.02928 host node01
0 hdd 0.00490 osd.0 up 1.00000 1.00000
3 hdd 0.00490 osd.3 up 1.00000 1.00000
6 hdd 0.01949 osd.6 up 1.00000 1.00000
-12 0.01469 rack rack1
-5 0.01469 host node02
1 hdd 0.00490 osd.1 up 1.00000 1.00000
4 hdd 0.00490 osd.4 up 1.00000 1.00000
7 hdd 0.00490 osd.7 up 1.00000 1.00000
-13 0.01469 rack rack2
-7 0.01469 host node03
2 hdd 0.00490 osd.2 up 1.00000 1.00000
5 hdd 0.00490 osd.5 up 1.00000 1.00000
8 hdd 0.00490 osd.8 up 1.00000 1.00000
-1 0.05867 root default
-3 0.02928 host node01
0 hdd 0.00490 osd.0 up 1.00000 1.00000
3 hdd 0.00490 osd.3 up 1.00000 1.00000
6 hdd 0.01949 osd.6 up 1.00000 1.00000
-5 0.01469 host node02
1 hdd 0.00490 osd.1 up 1.00000 1.00000
4 hdd 0.00490 osd.4 up 1.00000 1.00000
7 hdd 0.00490 osd.7 up 1.00000 1.00000
-7 0.01469 host node03
2 hdd 0.00490 osd.2 up 1.00000 1.00000
5 hdd 0.00490 osd.5 up 1.00000 1.00000
8 hdd 0.00490 osd.8 up 1.00000 1.00000
2. CRUSH rules
List rules:
ceph osd crush rule ls
Dump a rule:
ceph osd crush rule dump {name}
Add a simple rule:
# ceph osd crush rule create-simple {rulename} {root} {bucket-type} {firstn|indep}
ceph osd crush rule create-simple deleteme default host firstn
Add a replicated rule:
# ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
ceph osd crush rule create-replicated fast default host ssd
Add an erasure-code rule:
ceph osd crush rule create-erasure {rulename} {profilename}
Delete a rule:
ceph osd crush rule rm {name}
3. CRUSH storage policies (example)
Set the device class (an OSD's existing class must be removed before a new one can be set):
# ceph osd crush set-device-class <class> <osdId> [<osdId>]
ceph osd crush rm-device-class osd.8 osd.7 osd.6
ceph osd crush set-device-class ssd osd.8 osd.7 osd.6
Create rules:
# ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <device-class>
ceph osd crush rule create-replicated slow default host hdd
ceph osd crush rule create-replicated fast default host ssd
Have pools use the rules:
# ceph osd pool set <poolname> crush_rule <rule-name>
ceph osd pool set mypool1 crush_rule slow
ceph osd pool set mypool2 crush_rule fast
Excerpt from the resulting CRUSH map:
...
rule slow {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule fast {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
...
4. Editing the CRUSH map
Get (export) the compiled CRUSH map:
ceph osd getcrushmap -o {compiled-crushmap-filename}
Decompile the CRUSH map:
crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
Recompile the CRUSH map:
crushtool -c {decompiled-crush-map-filename} -o {compiled-crush-map-filename}
Set (inject) the CRUSH map:
ceph osd setcrushmap -i {compiled-crushmap-filename}
(A compiled map can also be dry-run before injection with crushtool --test.)
5. Rule parameter reference
rule <rulename> {
    ruleset <ruleset>
    type [replicated|erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name>
    step [choose|chooseleaf] [firstn|indep] <num> type <bucket-type>
    step emit
}
- type: the rule type; currently only replicated and erasure are supported, with replicated as the default.
- min_size: the minimum number of replicas a pool may have for this rule to be selected.
- max_size: the maximum number of replicas a pool may have for this rule to be selected.
- step take <bucket-name>: selects the starting bucket and begins iterating down the tree.
- step choose firstn {num} type {bucket-type}: selects {num} buckets of the given type; this is usually the pool's replica count. If {num} == 0, select pool-num-replicas buckets; if 0 < {num} < pool-num-replicas, select {num} buckets; if {num} < 0, select pool-num-replicas - |{num}| buckets.
- step chooseleaf firstn {num} type {bucket-type}: selects a set of buckets of the given type and picks a leaf node (an OSD) from the subtree of each; the number of buckets follows the same {num} rules as above.
- step emit: outputs the current selections and clears the stack. It normally appears at the end of a rule, but can also be used to pick from different subtrees within the same rule.
- choose: stops as soon as buckets of the requested type have been selected, then hands them to the next step.
- chooseleaf: after selecting buckets of the requested type, keeps recursing down until it reaches OSDs.
- firstn and indep: both are depth-first traversals; they differ in how a shortfall is reported. If {num} is 4 but only three results can be found, firstn returns a compacted list such as [1,2,4], while indep returns [1,2,CRUSH_ITEM_NONE,4], keeping slot positions stable. firstn is the common choice (replicated pools); indep is typically used for erasure-coded pools, where slot position matters.
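The {num} semantics and the firstn/indep difference above can be sketched as a toy model (plain Python, not the real CRUSH implementation; the slot values below are made up for illustration):

```python
# Toy model of two CRUSH details described above; NOT the real algorithm.

CRUSH_ITEM_NONE = None  # stand-in for CRUSH's "empty slot" sentinel


def effective_num(num, pool_num_replicas):
    """How many items a 'step choose/chooseleaf ... {num} ...' asks for."""
    if num == 0:
        return pool_num_replicas       # 0 means "the pool's replica count"
    if num > 0:
        return num                     # positive: taken literally
    return pool_num_replicas + num     # negative: pool size minus |num|


def report(slot_results, mode):
    """How firstn vs. indep report slots that could not be filled."""
    if mode == "firstn":
        # firstn compacts the result: failed slots simply disappear.
        return [x for x in slot_results if x is not CRUSH_ITEM_NONE]
    # indep keeps slot positions stable, leaving the sentinel in place.
    return list(slot_results)


print(effective_num(0, 3))   # -> 3
print(effective_num(-1, 3))  # -> 2
print(report([1, 2, CRUSH_ITEM_NONE, 4], "firstn"))  # -> [1, 2, 4]
print(report([1, 2, CRUSH_ITEM_NONE, 4], "indep"))   # -> [1, 2, None, 4]
```

This is why erasure-coded pools favor indep: each slot maps to a specific shard, so a failed slot must stay in place rather than shift later shards forward.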