Understanding how the Ring is built, starting from the swift-ring-builder command

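Anyone who knows a little about Swift knows that the Ring is one of its core components: it determines how data is distributed across the cluster. Swift uses the configured partition_power to fix the number of partitions in the cluster (2 to the power of partition_power), assigns those partitions to nodes using consistent hashing, and places each piece of data in its corresponding partition.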
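
To make the "2 to the power of partition_power" idea concrete, here is a minimal sketch of mapping a path to a partition. It assumes an MD5 hash whose top partition_power bits select the partition; the function name `partition_for` is made up for this post, and a real Swift cluster also mixes a per-cluster secret into the hash, which is left out here.

```python
import hashlib
import struct


def partition_for(path, part_power):
    """Map a path to a partition index in the range [0, 2**part_power)."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes as a big-endian unsigned int and keep
    # only its top part_power bits.
    return struct.unpack(">I", digest[:4])[0] >> (32 - part_power)


part_power = 10
total_partitions = 2 ** part_power
print(total_partitions)
print(partition_for("/account/container/object", part_power))
```

The same path always lands in the same partition, and raising partition_power by one doubles the number of partitions.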

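Building the Ring is therefore a required step when initializing Swift. In brief:

  • Creating a new Ring:
  1. ring-builder computes, from each device's weight, how many partitions that device should be assigned. (The total number of partitions is 2 to the power of partition_power; they are then shared out according to the weights and the number of devices.)
  2. ring-builder assigns each replica of each partition to a device.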
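
The two steps above can be sketched as follows. This is only an illustration under simple assumptions: shares are strictly proportional to weight, and each replica goes greedily to the device that still wants the most partitions and does not already hold that partition. The names `parts_wanted` and `assign` are invented for the sketch; the real RingBuilder takes more into account (zones, for example).

```python
def parts_wanted(weights, part_power, replicas):
    """Step 1: each device's share of all partition replicas, by weight."""
    total = 2 ** part_power * replicas
    total_weight = sum(weights.values())
    return {dev: total * w / total_weight for dev, w in weights.items()}


def assign(weights, part_power, replicas):
    """Step 2: place every replica of every partition on a device."""
    remaining = parts_wanted(weights, part_power, replicas)
    parts = 2 ** part_power
    replica2part2dev = [[None] * parts for _ in range(replicas)]
    for part in range(parts):
        used = set()  # devices already holding a replica of this partition
        for replica in range(replicas):
            dev = max((d for d in remaining if d not in used),
                      key=lambda d: remaining[d])
            replica2part2dev[replica][part] = dev
            remaining[dev] -= 1
            used.add(dev)
    return replica2part2dev


weights = {"a": 100.0, "b": 100.0, "c": 200.0}
wanted = parts_wanted(weights, part_power=3, replicas=2)
table = assign(weights, part_power=3, replicas=2)
counts = {dev: sum(row.count(dev) for row in table) for dev in weights}
print(wanted, counts)
```

With these weights, device c is asked to hold twice as many partition replicas as a or b, and no partition has two replicas on the same device.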

 

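  • Creating a new ring from an old ring:
  1. Recompute the number of partitions each device should hold;
  2. Gather the partitions that need to be reassigned:
    • 1) add every partition on a device that is being removed to the gathered list;
    • 2) add the partitions that must be handed over because new devices were added to the gathered list;
    • 3) add the partitions that any device holds beyond its recomputed share to the gathered list.
  3. Assign the partitions in the gathered list to devices, using the same method as in "Creating a new Ring" above.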
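
The gathering step can be sketched like this. It is a simplification, not RingBuilder's code: cases 2) and 3) are treated the same way, as "a device holds more than its recomputed share", and the minimum time between moves is ignored. The function name `gather` and the sample data are made up for illustration.

```python
def gather(replica2part2dev, wanted, removed):
    """Return the (replica, partition) slots that must be reassigned."""
    held = {}
    gathered = []
    for replica, part2dev in enumerate(replica2part2dev):
        for part, dev in enumerate(part2dev):
            if dev in removed:
                gathered.append((replica, part))  # case 1: device removed
                continue
            held[dev] = held.get(dev, 0) + 1
            if held[dev] > wanted.get(dev, 0):
                gathered.append((replica, part))  # cases 2 and 3: surplus
    return gathered


# Old ring: devices a, b, c. Device c is removed and a new device d is added,
# so the recomputed shares below no longer match what a currently holds.
old_table = [["a", "a", "b", "b"],
             ["b", "c", "c", "a"]]
new_wanted = {"a": 2, "b": 3, "d": 3}
gathered = gather(old_table, new_wanted, removed={"c"})
print(gathered)
```

Everything in the gathered list would then be placed again with the same procedure used for a brand-new ring.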

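So how does the swift-ring-builder command actually work? This post only aims to introduce the command itself. Reading the source shows that almost everything swift-ring-builder does is implemented by methods on a RingBuilder instance, so the underlying principles and finer details will be summarized in a later post, after reading the RingBuilder source. So please don't flame me for the bait-and-switch title ^_~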

 

1. What does swift-ring-builder do?

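Rings are built manually with the swift-ring-builder tool. swift-ring-builder associates partitions with devices and writes that data into an optimized Python data structure, which is compressed, serialized, and written to disk so that the ring data can be loaded by the servers. The mechanism for updating rings is very simple: a server checks the last-modified time of the ring file to decide whether the file or its in-memory copy is newer, and therefore whether it needs to reload the ring data. For the builder itself, the "Python data structure" that swift-ring-builder pickles to disk is the dict returned by the following method: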

def to_dict(self):
    """
    Returns a dict that can be used later with copy_from to
    restore a RingBuilder. swift-ring-builder uses this to
    pickle.dump the dict to a file and later load that dict into
    copy_from.
    """
    return {'part_power': self.part_power,
            'replicas': self.replicas,
            'min_part_hours': self.min_part_hours,
            'parts': self.parts,
            'devs': self.devs,
            'devs_changed': self.devs_changed,
            'version': self.version,
            '_replica2part2dev': self._replica2part2dev,
            '_last_part_moves_epoch': self._last_part_moves_epoch,
            '_last_part_moves': self._last_part_moves,
            '_last_part_gather_start': self._last_part_gather_start,
            '_remove_devs': self._remove_devs}
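
The "reload only when the file on disk has changed" mechanism described above can be sketched as follows. This is a minimal illustration, not Swift's Ring class: the class name `ReloadingFile`, the file name `demo.builder`, and the use of a plain pickled dict are all assumptions made for the example.

```python
import os
import pickle
import tempfile


class ReloadingFile:
    """Keep a pickled dict in memory; reload it when the file's mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.data = None
        self.loads = 0  # how many times the file was actually read

    def get(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path, "rb") as f:
                self.data = pickle.load(f)
            self._mtime = mtime
            self.loads += 1
        return self.data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.builder")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump({"part_power": 10, "replicas": 3}, f)
    cache = ReloadingFile(path)
    first = cache.get()
    cache.get()  # file unchanged, so no second read
    loads_before = cache.loads
    with open(path, "wb") as f:
        pickle.dump({"part_power": 10, "replicas": 4}, f)
    # Force a visibly newer modification time for the demo.
    os.utime(path, (cache._mtime + 10, cache._mtime + 10))
    second = cache.get()
    loads_after = cache.loads
```

The second `get()` costs only a stat call; the file is read again only after it has been rewritten.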

The basic form of the swift-ring-builder command is:

swift-ring-builder <builder_file> <action> [params]

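swift-ring-builder performs the operation named by <action>, saves the builder state in the file given by <builder_file>, and generates the ring file xxx.ring.gz that the servers load. Before doing so, it backs up the existing <builder_file> and xxx.ring.gz into the backups directory.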
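
The "back up first, then overwrite" behaviour can be sketched in a few lines. This is an illustration only: the helper name `backup_then_write` and the timestamp-prefixed backup file name are assumptions for the example, not a statement of how swift-ring-builder names its backups.

```python
import os
import shutil
import tempfile
import time


def backup_then_write(path, data, backup_dir_name="backups"):
    """Copy the existing file into a backups directory, then overwrite it."""
    backup_dir = os.path.join(os.path.dirname(os.path.abspath(path)),
                              backup_dir_name)
    os.makedirs(backup_dir, exist_ok=True)
    backup_path = None
    if os.path.exists(path):
        # Assumed naming scheme: <unix timestamp>.<original file name>
        name = "%d.%s" % (time.time(), os.path.basename(path))
        backup_path = os.path.join(backup_dir, name)
        shutil.copy(path, backup_path)
    with open(path, "wb") as f:
        f.write(data)
    return backup_path


with tempfile.TemporaryDirectory() as tmp:
    builder = os.path.join(tmp, "demo.builder")  # hypothetical file name
    first_backup = backup_then_write(builder, b"version 1")
    second_backup = backup_then_write(builder, b"version 2")
    backup_dir = os.path.basename(os.path.dirname(second_backup))
    with open(second_backup, "rb") as f:
        backed_up = f.read()
    with open(builder, "rb") as f:
        current = f.read()
```

The first write has nothing to back up; the second leaves the previous contents in the backups directory.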

Figure 1: the builder file and ring.gz created by swift-ring-builder

Figure 2: the builder file and ring.gz backed up by swift-ring-builder

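Keeping <builder_file> safe is very important, so you should store multiple copies of it. If the builder file is lost completely, the ring has to be rebuilt from scratch. Almost every partition would then be assigned to a different device, so the data replicas would have to move to new locations as well, causing massive data migration and leaving the system unavailable for a period of time.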
 
 
2. swift-ring-builder commands

swift-ring-builder provides the following commands:
add 
create 
list_parts 
rebalance 
remove 
search 
set_info
set_min_part_hours
set_weight
set_replicas
validate
write_ring
The commands are listed and explained below. The English help text can be obtained by running the "swift-ring-builder" command on its own.
 
swift-ring-builder <builder_file>
    Shows information about the ring and the devices within.
   Displays information about the ring and its devices. Note that swift-1.8.0 added a region attribute to devices.

swift-ring-builder <builder_file> add z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
    [z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...
    Adds devices to the ring with the given information. No partitions will be
    assigned to the new device until after running 'rebalance'. This is so you
    can make multiple device changes and rebalance them all just once.
   add registers new devices in the ring using the given information, but it does not assign any partitions to them; that only happens when 'rebalance' is run.
   This lets you add several devices and then run rebalance just once to assign partitions to all of them.

swift-ring-builder <builder_file> create <part_power> <replicas> <min_part_hours>
    Creates <builder_file> with 2^<part_power> partitions and <replicas>.
    <min_part_hours> is number of hours to restrict moving a partition more
    than once.
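   Creates <builder_file> with 2 to the power of <part_power> partitions and <replicas> replicas. <min_part_hours> is the minimum interval, in hours, between two consecutive moves of the same partition.

swift-ring-builder <builder_file> list_parts <search-value> [<search-value>] ..
    Returns a 2 column list of all the partitions that are assigned to any of
    the devices matching the search values given. The first column is the
    assigned partition number and the second column is the number of device
    matches for that partition. The list is ordered from most number of
    matches to least. If there are a lot of devices to match against, this
    command could take a while to run.
   Returns a two-column list of every partition assigned to any device that matches the search values.
   The first column is the partition number.
   The second column is the number of matching devices that hold that partition (a count, not a device id).
   The list is sorted from the most matches to the fewest. If many devices match, the command can take a while to run.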
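
What list_parts reports can be sketched as a count over a replica-to-partition-to-device table. This is an illustration rather than the command's implementation; the function name, the sample table, and the tie-break by partition number are assumptions.

```python
from collections import Counter


def list_parts(replica2part2dev, matching_devs):
    """Return (partition, number of matching devices) pairs, most matches first."""
    matches = Counter()
    for part2dev in replica2part2dev:
        for part, dev in enumerate(part2dev):
            if dev in matching_devs:
                matches[part] += 1
    # Most matches first; ties are broken by partition number (an assumption).
    return sorted(matches.items(), key=lambda item: (-item[1], item[0]))


# Two replicas of four partitions; the values are device ids.
table = [[0, 1, 2, 0],
         [1, 2, 0, 3]]
rows = list_parts(table, matching_devs={0, 1})
for part, count in rows:
    print(part, count)
```

Partition 0 comes first because both of its replicas live on matching devices.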

swift-ring-builder <builder_file> rebalance
    Attempts to rebalance the ring by reassigning partitions that haven't been
    recently reassigned.
   The rebalance command attempts to rebalance the ring by reassigning partitions that have not been reassigned recently.

swift-ring-builder <builder_file> remove <search-value> [search-value ...]
    Removes the device(s) from the ring. This should normally just be used for
    a device that has failed. For a device you wish to decommission, it's best
    to set its weight to 0, wait for it to drain all its data, then use this
    remove command. This will not take effect until after running 'rebalance'.
    This is so you can make multiple device changes and rebalance them all just
    once.
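   The remove command removes devices from the ring. Normally it should only be used for devices that have failed.
   To decommission a device, the best approach is to set its weight to 0, wait until all of its data has drained away, and then remove it with this command.
   remove does not reassign partitions by itself; that only happens when 'rebalance' is run. This lets you add or remove several devices and then run rebalance just once.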

swift-ring-builder <builder_file> search <search-value>
    Shows information about matching devices.
   Displays information about the devices that match.

swift-ring-builder <builder_file> set_info <search-value> <ip>:<port>/<device_name>_<meta>
    [<search-value> <ip>:<port>/<device_name>_<meta>] ...
    For each search-value, resets the matched device's information. This
    information isn't used to assign partitions, so you can use 'write_ring'
    afterward to rewrite the current ring with the newer device information.
    Any of the parts are optional in the final
    <ip>:<port>/<device_name>_<meta> parameter; just give what you want to
    change. For instance set_info d74 _"snet: 5.6.7.8" would just update the
    meta data for device id 74.
   set_info resets the information of every device that matches <search-value>. This information is not used to assign partitions, so you can follow it with 'write_ring' to rewrite the current ring directly.
   Every part of the <ip>:<port>/<device_name>_<meta> parameter is optional; give only the parts you want to change.
   For example, set_info d74 _"snet: 5.6.7.8" only updates the meta data of the device with id 74 to "snet: 5.6.7.8".

swift-ring-builder <builder_file> set_min_part_hours <hours>
    Changes the <min_part_hours> to the given <hours>. This should be set to
    however long a full replication/update cycle takes. We're working on a way
    to determine this more easily than scanning logs.
  
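   set_min_part_hours sets <min_part_hours> to the given <hours>.
   The value should be at least as long as one full replication/update cycle. The help text notes that an easier way to determine this than scanning logs is still being worked on.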
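
The effect of min_part_hours can be sketched as a simple filter. The assumption here is that the builder tracks, for each partition, how many hours have passed since it was last moved; the function name and sample numbers are made up for illustration.

```python
def movable_parts(hours_since_last_move, min_part_hours):
    """Partitions whose last move is at least min_part_hours ago."""
    return [part for part, hours in enumerate(hours_since_last_move)
            if hours >= min_part_hours]


# Partition 0 was just moved, partition 2 has not moved for a day.
hours_since_last_move = [0, 5, 24, 1]
print(movable_parts(hours_since_last_move, min_part_hours=2))
```

Only partitions 1 and 2 are old enough to be moved again under this rule.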

swift-ring-builder <builder_file> set_weight <search-value> <weight>
    [<search-value> <weight>] ...
    Resets the devices' weights. No partitions will be reassigned to or from
    the device until after running 'rebalance'. This is so you can make
    multiple device changes and rebalance them all just once.
  
   Resets the devices' weights. After set_weight, no partitions are moved to or from the device until 'rebalance' is run.
   This lets you make several device changes and then run rebalance just once to apply them all.

 swift-ring-builder <builder_file> set_replicas <replicas>
    Changes the replica count to the given <replicas>. <replicas> may
    be a floating-point value, in which case some partitions will have
    floor(<replicas>) replicas and some will have ceiling(<replicas>)
    in the correct proportions. A rebalance is needed to make the change take effect.

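    set_replicas sets the replica count to the given <replicas>.
    <replicas> may be a floating-point value, in which case some partitions will have floor(<replicas>) replicas and the others ceiling(<replicas>), in the proportions needed to reach the requested average.

    A rebalance is required for the new replica count to take effect. This command was added in swift-1.8.0.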
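
    The arithmetic behind a fractional replica count can be sketched as follows. The function name is made up, and the sketch only computes how many partitions fall on each side; it says nothing about which partitions RingBuilder picks for the extra replica.

```python
import math


def replica_split(parts, replicas):
    """Return (partitions with floor(replicas), partitions with ceiling(replicas))."""
    low = math.floor(replicas)
    # The fractional part decides what share of partitions gets one extra replica.
    with_ceiling = round(parts * (replicas - low))
    return parts - with_ceiling, with_ceiling


with_floor, with_ceiling = replica_split(parts=1024, replicas=3.25)
print(with_floor, with_ceiling)
```

    For 1024 partitions and 3.25 replicas, 768 partitions keep 3 replicas and 256 get 4, which averages out to 3.25.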


swift-ring-builder <builder_file> validate
    Just runs the validation routines on the ring.
    Only runs the builder's validate method to check the ring; it does not put the ring into effect.

swift-ring-builder <builder_file> write_ring
    Just rewrites the distributable ring file. This is done automatically after
    a successful rebalance, so really this is only useful after one or more
    'set_info' calls when no rebalance is needed but you want to send out the
    new device information.
   write_ring only rewrites the distributable ring file. It is run automatically after a successful rebalance.
   It is therefore only useful after one or more 'set_info' calls, when you do not need a rebalance but still want to send out the new device information.

 

3. Parameter format

When searching for devices, <search-value> has the following format:

d<device_id>z<zone>-<ip>:<port>/<device_name>_<meta>

Every part of this format is optional. For example:

z1               Matches devices in zone 1
z1-1.2.3.4       Matches devices in zone 1 with the ip 1.2.3.4
1.2.3.4          Matches devices in any zone with the ip 1.2.3.4
z1:5678          Matches devices in zone 1 using port 5678
:5678            Matches devices that use port 5678
/sdb1            Matches devices with the device name sdb1
_shiny           Matches devices with shiny in the meta data
_"snet: 5.6.7.8" Matches devices with snet: 5.6.7.8 in the meta data
[::1]            Matches devices in any zone with the ip ::1
z1-[::1]:5678    Matches devices in zone 1 with ip ::1 and port 5678

The most specific form looks like this:

d74z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8"
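
To see how the pieces of a search value fit together, here is a small parser sketch built on a regular expression. It is not the parser swift-ring-builder uses: the name `parse_search_value` is invented, only dotted IPv4 and bracketed IPv6 addresses are recognised, device names containing an underscore are not handled, and the input is taken as the shell delivers it (with the quotes around the meta data already stripped).

```python
import re

SEARCH_RE = re.compile(
    r"^(?:d(?P<id>\d+))?"                               # d<device_id>
    r"(?:z(?P<zone>\d+))?"                              # z<zone>
    r"-?"
    r"(?P<ip>\[[0-9a-fA-F:]+\]|\d+\.\d+\.\d+\.\d+)?"    # [ipv6] or ipv4
    r"(?::(?P<port>\d+))?"                              # :<port>
    r"(?:/(?P<device>[^_]+))?"                          # /<device_name>
    r"(?:_(?P<meta>.*))?$"                              # _<meta>
)


def parse_search_value(value):
    """Split a search value into the parts that were actually given."""
    match = SEARCH_RE.match(value)
    if match is None:
        raise ValueError("invalid search value: %r" % value)
    parts = {k: v for k, v in match.groupdict().items() if v is not None}
    for key in ("id", "zone", "port"):
        if key in parts:
            parts[key] = int(parts[key])
    if "ip" in parts:
        parts["ip"] = parts["ip"].strip("[]")
    return parts


print(parse_search_value("d74z1-1.2.3.4:5678/sdb1_snet: 5.6.7.8"))
print(parse_search_value("z1-[::1]:5678"))
```

Parts that are left out simply do not appear in the result, which mirrors the "every part is optional" rule.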

 

4. Meaning of the return codes

0 = operation successful
1 = operation completed with warnings
2 = error
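
In a script, the return codes above can be checked like this. It is a minimal sketch: the helper names are invented, and `run_ring_builder` only works on a machine where swift-ring-builder is installed, so the sketch defines it without calling it.

```python
import subprocess

EXIT_MEANINGS = {
    0: "operation successful",
    1: "operation completed with warnings",
    2: "error",
}


def describe_exit(code):
    """Translate a swift-ring-builder return code into its meaning."""
    return EXIT_MEANINGS.get(code, "unknown return code %d" % code)


def run_ring_builder(*args):
    """Run swift-ring-builder with the given arguments.

    Returns (return_code, meaning). Requires swift-ring-builder on PATH.
    """
    result = subprocess.run(["swift-ring-builder", *args])
    return result.returncode, describe_exit(result.returncode)


print(describe_exit(1))
```

A call such as `run_ring_builder("object.builder", "rebalance")` (the builder file name is just an example) lets a deployment script treat 1 as a warning instead of a failure.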

 

 

posted @ 2013-05-10 18:55 by YUKI小糖