Ceph CRUSH Map and Rules Explained

1. CRUSH Map Hierarchy (Example)

Create buckets for the datacenter, room, and rack levels:

ceph osd crush add-bucket datacenter0 datacenter
ceph osd crush add-bucket room0 room
ceph osd crush add-bucket rack0 rack
ceph osd crush add-bucket rack1 rack
ceph osd crush add-bucket rack2 rack

Move room0 under the datacenter and the racks under the room:

ceph osd crush move room0 datacenter=datacenter0
ceph osd crush move rack0 room=room0
ceph osd crush move rack1 room=room0
ceph osd crush move rack2 room=room0

Link the existing host buckets under the racks (link adds the hosts here while leaving them in place under the default root, which is why they appear twice in the tree below):

ceph osd crush link node01 rack=rack0
ceph osd crush link node02 rack=rack1
ceph osd crush link node03 rack=rack2

The resulting tree, as shown by ceph osd tree:

ID   CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
 -9         0.05867  datacenter datacenter0                            
-10         0.05867      room room0                                    
-11         0.02928          rack rack0                                
 -3         0.02928              host node01                           
  0    hdd  0.00490                  osd.0        up   1.00000  1.00000
  3    hdd  0.00490                  osd.3        up   1.00000  1.00000
  6    hdd  0.01949                  osd.6        up   1.00000  1.00000
-12         0.01469          rack rack1                                
 -5         0.01469              host node02                           
  1    hdd  0.00490                  osd.1        up   1.00000  1.00000
  4    hdd  0.00490                  osd.4        up   1.00000  1.00000
  7    hdd  0.00490                  osd.7        up   1.00000  1.00000
-13         0.01469          rack rack2                                
 -7         0.01469              host node03                           
  2    hdd  0.00490                  osd.2        up   1.00000  1.00000
  5    hdd  0.00490                  osd.5        up   1.00000  1.00000
  8    hdd  0.00490                  osd.8        up   1.00000  1.00000
 -1         0.05867  root default                                      
 -3         0.02928      host node01                                   
  0    hdd  0.00490          osd.0                up   1.00000  1.00000
  3    hdd  0.00490          osd.3                up   1.00000  1.00000
  6    hdd  0.01949          osd.6                up   1.00000  1.00000
 -5         0.01469      host node02                                   
  1    hdd  0.00490          osd.1                up   1.00000  1.00000
  4    hdd  0.00490          osd.4                up   1.00000  1.00000
  7    hdd  0.00490          osd.7                up   1.00000  1.00000
 -7         0.01469      host node03                                   
  2    hdd  0.00490          osd.2                up   1.00000  1.00000
  5    hdd  0.00490          osd.5                up   1.00000  1.00000
  8    hdd  0.00490          osd.8                up   1.00000  1.00000
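
With the racks in place, a replication rule can use rack as its failure domain. A minimal sketch (the rule name replicated_rack is illustrative):

ceph osd crush rule create-replicated replicated_rack default rack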

2. CRUSH Rules

List rules

ceph osd crush rule ls
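
On a fresh cluster this typically prints only the default rule (sample output; rule names vary):

replicated_rule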

Dump a rule

ceph osd crush rule dump {name}
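
Abridged sample output for the default rule (exact fields vary by release; min_size and max_size were dropped in newer versions):

{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        { "op": "take", "item": -1, "item_name": "default" },
        { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
        { "op": "emit" }
    ]
}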

Add a simple rule

# ceph osd crush rule create-simple {rulename} {root} {bucket-type} {firstn|indep}
ceph osd crush rule create-simple deleteme default host firstn

Add a replicated rule

# ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
ceph osd crush rule create-replicated fast default host ssd

Add an erasure-code rule

ceph osd crush rule create-erasure {rulename} {profilename}
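
A sketch that first defines an erasure-code profile and then derives a rule from it (the profile name myprofile and the k/m values are illustrative):

ceph osd erasure-code-profile set myprofile k=2 m=1 crush-failure-domain=host
ceph osd crush rule create-erasure ec-rule myprofile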

Delete a rule

ceph osd crush rule rm {name}

3. CRUSH Storage Policy (Example)

Set device classes

An OSD's existing class must be removed before a new one can be set, since set-device-class will not overwrite it:

# ceph osd crush set-device-class <class> <osdId> [<osdId>]
ceph osd crush rm-device-class osd.8 osd.7 osd.6
ceph osd crush set-device-class ssd osd.8 osd.7 osd.6
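
To verify the assignment, a sketch using the class listing commands:

ceph osd crush class ls
ceph osd crush class ls-osd ssd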

Create rules

# ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <device-class>
ceph osd crush rule create-replicated slow default host hdd
ceph osd crush rule create-replicated fast default host ssd

Assign rules to pools

# ceph osd pool set <poolname> crush_rule <rule-name>
ceph osd pool set mypool1 crush_rule slow
ceph osd pool set mypool2 crush_rule fast
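
The assignment can be checked per pool (a sketch; expected output shown as a comment):

ceph osd pool get mypool1 crush_rule
# crush_rule: slow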

Excerpt from the resulting CRUSH map

...
rule slow {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}
rule fast {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
...

4. Editing the CRUSH Map

Get the CRUSH map

ceph osd getcrushmap -o {compiled-crushmap-filename}

Decompile the CRUSH map

crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}

Compile the CRUSH map

crushtool -c {decompiled-crush-map-filename} -o {compiled-crush-map-filename}

Set the CRUSH map

ceph osd setcrushmap -i {compiled-crushmap-filename}
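
Putting the four steps together, with an optional dry run before injecting the new map (file names are illustrative):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt ...
crushtool -c crushmap.txt -o crushmap-new.bin
# dry-run the mapping for rule id 1 with 3 replicas
crushtool -i crushmap-new.bin --test --show-statistics --rule 1 --num-rep 3
ceph osd setcrushmap -i crushmap-new.bin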

5. Rule Parameter Reference

rule <rulename> {
    id <id>
    type [replicated|erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name> [class <device-class>]
    step [choose|chooseleaf] [firstn|indep] <num> type <bucket-type>
    step emit
}
  • type: the rule type; currently only replicated and erasure are supported, with replicated as the default.

  • min_size: the minimum number of replicas a pool may have for this rule to be selected.

  • max_size: the maximum number of replicas a pool may have for this rule to be selected.

  • step take: selects the starting bucket (optionally restricted to a device class, as in step take default class hdd above) and begins descending the tree from there.

  • step choose firstn {num} type {bucket-type}: selects {num} buckets of the given type; the value is usually tied to the pool's replica count. If {num} == 0, select pool-num-replicas buckets; if 0 < {num} < pool-num-replicas, select {num} buckets; if {num} < 0, select pool-num-replicas - |{num}| buckets.

  • step chooseleaf firstn {num} type {bucket-type}: selects a set of buckets of the given type, then picks one leaf (OSD) from the subtree of each bucket; the number of buckets is usually the pool's replica count. {num} is interpreted the same way as for step choose.

  • step emit: outputs the current selections and clears the stack. It usually appears at the end of a rule, but can also be used to draw from different trees within a single rule.

  • choose: stops as soon as buckets of the requested type have been selected; the next step continues from those buckets.

  • chooseleaf: after selecting buckets of the requested type, keeps descending recursively until it reaches an OSD (leaf).

  • firstn and indep: both are depth-first traversals; they differ in how they report gaps. With {num} = 4, if only three results can be found, firstn returns [1,2,4] (compacting the result), while indep returns [1,2,CRUSH_ITEM_NONE,4] (holding the missing slot's position). firstn is the usual choice for replicated pools; indep is used for erasure-coded pools, where position matters. See the sketch after this list for how choose and chooseleaf compose.
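
A hedged sketch of a rule that combines the two steps: pick two racks, then two hosts in each rack (the rule name and id are illustrative):

rule two-racks {
        id 3
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type rack
        step chooseleaf firstn 2 type host
        step emit
}

Here step choose firstn 2 type rack selects two rack buckets, and step chooseleaf firstn 2 type host then descends into each rack to pick one OSD under each of two distinct hosts, yielding up to four OSDs in total.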
