LVM snapshot

LVM itself has no backup feature, only snapshots; volume backup is built on top of snapshots.

Goal: understand how LVM snapshots work, and see along the way how Cinder uses them.

Reading 1

http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html

This allows the administrator to create a new block device which presents an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don’t want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device.

  1. Does it create a new block device?
    Yes, but the new block device can be much smaller than the origin; how much space it needs depends on how much data is modified on the original volume after the snapshot is created (copy-on-write).

    If the snapshot logical volume becomes full it will be dropped (become unusable) so it is vitally important to allocate enough space. The amount of space necessary is dependent on the usage of the snapshot, so there is no set recipe to follow for this. If the snapshot size equals the origin size, it will never overflow.

  2. Where does this device live?
    In the same VG as the original LV.

LVM1 vs. LVM2:

LVM1 has read-only snapshots. Read-only snapshots work by creating an exception table, which is used to keep track of which blocks have been changed. If a block is to be changed on the origin, it is first copied to the snapshot, marked as copied in the exception table, and then the new data is written to the original volume.

  1. LVM1 snapshots are read-only.
  2. The exception table records which blocks have been changed.
  3. LVM1 snapshots are copy-on-write: only when data on the original volume is about to be modified is the corresponding block first copied into the snapshot.

In LVM2, snapshots are read/write by default. Read/write snapshots work like read-only snapshots, with the additional feature that if data is written to the snapshot, that block is marked in the exception table as used, and never gets copied from the original volume.

This opens up many new possibilities that were not possible with LVM1’s read-only snapshots.
One example is to snapshot a volume, mount the snapshot, and try an experimental program that change files on that volume. If you don’t like what it did, you can unmount the snapshot, remove it, and mount the original filesystem in its place.
It is also useful for creating volumes for use with Xen. You can create a disk image, then snapshot it and modify the snapshot for a particular domU instance. You can then create another snapshot of the original volume, and modify that one for a different domU instance. Since the only storage used by a snapshot is blocks that were changed on the origin or the snapshot, the majority of the volume is shared by the domU’s.

  1. LVM2 snapshots are read/write by default.
  2. When data is written to the snapshot, the exception table marks that block as used, and reads of that block are then served from the snapshot rather than from the original volume (a quick example of this workflow is sketched below).
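
The "try something and throw it away" workflow from the quote, as a minimal shell sketch (the VG/LV names, mount point and size here are made up, and a filesystem is assumed to already exist on the origin LV):

    # create a writable snapshot of mylv and work only on the snapshot
    sudo lvcreate -n trial_snap -L 1G -s /dev/myvg/mylv
    sudo mkdir -p /mnt/trial
    sudo mount /dev/myvg/trial_snap /mnt/trial
    # ...run the experimental program against /mnt/trial; only the snapshot's COW area changes...
    sudo umount /mnt/trial
    # don't like the result? just drop the snapshot; mylv was never touched
    sudo lvremove -f /dev/myvg/trial_snap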

Reading 2

http://www.tutonics.com/2012/12/lvm-guide-part-2-snapshots.html
This page has a detailed walkthrough with explanations.

Following the example:

There are 3 VGs; the experiment uses stack-volumes-default:

[admin@maqi-openstack ~]$ sudo vgs
  VG                        #PV #LV #SN Attr   VSize   VFree
  cinder-new-volume           1  12   3 wz--n- 100.00g 69.00g
  stack-volumes-default       1   0   0 wz--n-  10.01g 10.01g
  stack-volumes-lvmdriver-1   1   0   0 wz--n-  10.01g 10.01g

  1. Create an LV named origin_lv with a size of 2G

    [admin@maqi-openstack ~]$ sudo lvcreate -n origin_lv --size 2G stack-volumes-default
    Logical volume "origin_lv" created.
    [admin@maqi-openstack ~]$ sudo vgs
    VG                        #PV #LV #SN Attr   VSize   VFree
    cinder-new-volume           1  12   3 wz--n- 100.00g 69.00g
    stack-volumes-default       1   1   0 wz--n-  10.01g  8.01g
    stack-volumes-lvmdriver-1   1   0   0 wz--n-  10.01g 10.01g
    [admin@maqi-openstack ~]$ sudo lvdisplay /dev/stack-volumes-default/origin_lv
    --- Logical volume ---
    LV Path                /dev/stack-volumes-default/origin_lv
    LV Name                origin_lv
    VG Name                stack-volumes-default
    LV UUID                zqTSrZ-DxoM-N0cf-x8JT-t8nS-fGyG-mIe29r
    LV Write Access        read/write
    LV Creation host, time maqi-openstack.novalocal, 2015-10-10 06:36:53 +0000
    LV Status              available
    # open                 0
    LV Size                2.00 GiB
    Current LE             512
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     8192
    Block device           252:20

  2. Create a snapshot of origin_lv named lv_snapshot with a size of 1G; --snapshot (or -s) marks it as a snapshot

    Note: lv_snapshot is in the same VG as origin_lv.

    [admin@maqi-openstack ~]$ sudo lvcreate -n lv_snapshot --size 1G --snapshot /dev/stack-volumes-default/origin_lv
    Logical volume "lv_snapshot" created.
    
    [admin@maqi-openstack ~]$ sudo lvdisplay /dev/stack-volumes-default/lv_snapshot
    --- Logical volume ---
    LV Path                /dev/stack-volumes-default/lv_snapshot
    LV Name                lv_snapshot
    VG Name                stack-volumes-default
    LV UUID                5UOUvl-kHYD-or6W-IZuq-0j9R-se85-HE1tGS
    LV Write Access        read/write
    LV Creation host, time maqi-openstack.novalocal, 2015-10-10 06:47:01 +0000
    LV snapshot status     active destination for origin_lv
    LV Status              available
    # open                 0
    LV Size                2.00 GiB
    Current LE             512
    COW-table size         1.00 GiB
    COW-table LE           256
    Allocated to snapshot  0.00%
    Snapshot chunk size    4.00 KiB
    Segments               1
    Allocation             inherit
    Read ahead sectors     auto
    - currently set to     8192
    Block device           252:23
    
    [admin@maqi-openstack ~]$ sudo lvdisplay /dev/stack-volumes-default/lv_snapshot -C
    LV          VG                    Attr       LSize Pool Origin    Data%  Meta%  Move Log Cpy%Sync Convert
    lv_snapshot stack-volumes-default swi-a-s--- 1.00g      origin_lv 0.00

    Note that the size you need to use will depend on what you’re doing. Bear in mind that snapshots use “copy on write”, so data is only copied to the snapshot when it deviates from the original data. So when you access data in the snapshot volume, you effectively access the original logical volume’s data unless: (a) the original volume’s data has changed, in which case you access a copy of the original which is stored in the snapshot volume, (b) you have written to the snapshot, in which case the modified data resides in the snapshot volume.
    Thus, you can create a snapshot of say, 100GiB using only a few GiB, because most of the “snapshot” data still resides on the original volume.

    lv_snapshot can also be mounted and then read from and written to.
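
    A minimal sketch of that (assuming a filesystem, e.g. ext4, was created on origin_lv before the snapshot was taken; the mount point is made up):

    sudo mkdir -p /mnt/lv_snapshot
    sudo mount /dev/stack-volumes-default/lv_snapshot /mnt/lv_snapshot
    # reads of unchanged blocks come from origin_lv; writes land in the snapshot's COW area
    sudo umount /mnt/lv_snapshot
    # after data changes on either side, "Data%" / "Allocated to snapshot" grows:
    sudo lvs stack-volumes-default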

  3. The original origin_lv can be rolled back to its state at snapshot time with lvconvert --merge:

    [admin@maqi-openstack ~]$ sudo lvconvert --merge /dev/stack-volumes-default/lv_snapshot
    Merging of volume lv_snapshot started.
    origin_lv: Merged: 100.0%
    Merge of snapshot into logical volume origin_lv has finished.
    Logical volume "lv_snapshot" successfully removed

    After the merge, lv_snapshot is removed.

    The blog post linked above explains this well:

    Have you noticed that a snapshot still relies on the unchanged original data being OK?
    In other words, you can only use a snapshot to revert an original logical volume if the original still exists. So snapshots are a fantastic feature as you’ll see below, but they are not backups in the conventional sense of having a complete copy of all the original data.
    If you want a completely independent separate copy of your data at the time of a snapshot, you can take a copy in the same way you would for any other device, e.g. using the cp or dd commands.
    Note that this way of backing up data ensures a consistent view of the data is quickly taken at a specific point in time. This is important when backing up databases for example. Obviously creation of a consistent view requires tables to be locked, but the beauty of lvm is that they need only be locked for a very small amount of time (which makes it viable when compared to locking tables and waiting hours for data to be physically copied).
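
    Following that advice, a minimal sketch of turning a snapshot into a real, independent copy with dd (the snapshot name and destination path are made up):

    sudo lvcreate -n backup_snap -L 1G -s /dev/stack-volumes-default/origin_lv
    sudo dd if=/dev/stack-volumes-default/backup_snap of=/backup/origin_lv.img bs=1M
    sudo lvremove -f /dev/stack-volumes-default/backup_snap

    This snapshot-then-copy-then-drop pattern is essentially what the Cinder code below does as well.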

LVM-related code in OpenStack Cinder

Snapshot
Creating a snapshot:

cinder/volume/drivers/lvm.py

def create_snapshot(self, snapshot):
    """Creates a snapshot."""

    self.vg.create_lv_snapshot(self._escape_snapshot(snapshot['name']),
                               snapshot['volume_name'],
                               self.configuration.lvm_type)

create_lv_snapshot is in cinder/brick/local_dev/lvm.py:

@utils.retry(putils.ProcessExecutionError)
def create_lv_snapshot(self, name, source_lv_name, lv_type='default'):
    """Creates a snapshot of a logical volume.

    :param name: Name to assign to new snapshot
    :param source_lv_name: Name of Logical Volume to snapshot
    :param lv_type: Type of LV (default or thin)

    """
    source_lvref = self.get_volume(source_lv_name)
    if source_lvref is None:
        LOG.error(_LE("Trying to create snapshot by non-existent LV: %s"),
                  source_lv_name)
        raise exception.VolumeDeviceNotFound(device=source_lv_name)
    cmd = ['lvcreate', '--name', name,
           '--snapshot', '%s/%s' % (self.vg_name, source_lv_name)]
    if lv_type != 'thin':
        size = source_lvref['size']
        cmd.extend(['-L', '%sg' % (size)])

    try:
        self._execute(*cmd,
                      root_helper=self._root_helper,
                      run_as_root=True)
    except putils.ProcessExecutionError as err:
        LOG.exception(_LE('Error creating snapshot'))
        LOG.error(_LE('Cmd     :%s'), err.cmd)
        LOG.error(_LE('StdOut  :%s'), err.stdout)
        LOG.error(_LE('StdErr  :%s'), err.stderr)

So this too just runs lvcreate --name <snapshot_name> --snapshot <vg_name>/<source_lv_name>, adding -L <size>g when the LV is not thin-provisioned.
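
Spelled out with Cinder's naming scheme (the UUIDs are placeholders and 2g is just an example size for a non-thin LV):

    lvcreate --name _snapshot-<snapshot-uuid> --snapshot stack-volumes-lvmdriver-1/volume-<volume-uuid> -L 2g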

Creating a volume from a snapshot:

cinder/volume/drivers/lvm.py

def create_volume_from_snapshot(self, volume, snapshot):
    """Creates a volume from a snapshot."""
    self._create_volume(volume['name'],
                        self._sizestr(volume['size']),
                        self.configuration.lvm_type,
                        self.configuration.lvm_mirrors)

    # Some configurations of LVM do not automatically activate
    # ThinLVM snapshot LVs.
    self.vg.activate_lv(snapshot['name'], is_snapshot=True)

    # copy_volume expects sizes in MiB, we store integer GiB
    # be sure to convert before passing in
    volutils.copy_volume(self.local_path(snapshot),
                         self.local_path(volume),
                         snapshot['volume_size'] * units.Ki,
                         self.configuration.volume_dd_blocksize,
                         execute=self._execute,
                         sparse=self.sparse_copy_volume)

volutils.copy_volume eventually calls _copy_volume in cinder/volume/utils.py:

def _copy_volume(prefix, srcstr, deststr, size_in_m, blocksize, sync=False,
                 execute=utils.execute, ionice=None, sparse=False):
    # Use O_DIRECT to avoid thrashing the system buffer cache
    extra_flags = []
    if check_for_odirect_support(srcstr, deststr, 'iflag=direct'):
        extra_flags.append('iflag=direct')

    if check_for_odirect_support(srcstr, deststr, 'oflag=direct'):
        extra_flags.append('oflag=direct')

    # If the volume is being unprovisioned then
    # request the data is persisted before returning,
    # so that it's not discarded from the cache.
    conv = []
    if sync and not extra_flags:
        conv.append('fdatasync')
    if sparse:
        conv.append('sparse')
    if conv:
        conv_options = 'conv=' + ",".join(conv)
        extra_flags.append(conv_options)

    blocksize, count = _calculate_count(size_in_m, blocksize)

    cmd = ['dd', 'if=%s' % srcstr, 'of=%s' % deststr,
           'count=%d' % count, 'bs=%s' % blocksize]
    cmd.extend(extra_flags)

    if ionice is not None:
        cmd = ['ionice', ionice] + cmd

    cmd = prefix + cmd

    # Perform the copy
    start_time = timeutils.utcnow()
    execute(*cmd, run_as_root=True)
    duration = timeutils.delta_seconds(start_time, timeutils.utcnow())

    # NOTE(jdg): use a default of 1, mostly for unit test, but in
    # some incredible event this is 0 (cirros image?) don't barf
    if duration < 1:
        duration = 1
    mbps = (size_in_m / duration)
    LOG.debug("Volume copy details: src %(src)s, dest %(dest)s, "
              "size %(sz).2f MB, duration %(duration).2f sec",
              {"src": srcstr,
               "dest": deststr,
               "sz": size_in_m,
               "duration": duration})
    LOG.info(_LI("Volume copy %(size_in_m).2f MB at %(mbps).2f MB/s"),
             {'size_in_m': size_in_m, 'mbps': mbps})

In the end a dd command is executed; printing it out gives, for example:

dd command: ['dd', 'if=/dev/mapper/stack--volumes--lvmdriver--1-_snapshot--258c44c8--175e--4da8--8976--c32b0b4fe8f3', u'of=/dev/mapper/stack--volumes--lvmdriver--1-volume--48d14b2d--b880--42e8--a0a9--7198a5ec0521', 'count=1024', 'bs=1M', 'iflag=direct', 'oflag=direct']

iflag/oflag are dd's input/output flags (see the dd man page); direct means bypassing the system page cache (O_DIRECT).
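
check_for_odirect_support presumably decides whether to add these flags by test-running dd; something conceptually similar by hand (the probe path is made up; on filesystems that reject O_DIRECT, e.g. tmpfs, the test fails and the flag would be skipped):

    if dd if=/dev/zero of=/tmp/odirect_probe bs=4096 count=0 oflag=direct 2>/dev/null; then
        echo "O_DIRECT writes supported here"
    fi
    rm -f /tmp/odirect_probe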

Backup

Requests to create a volume snapshot are handled by cinder-volume, while requests to create a volume backup are handled by cinder-backup.

cinder-backup is not configured in my environment, so I looked at another one, where Ceph is the backend for both cinder-volume and cinder-backup, and backed up a volume there.

The request is first handled by cinder-api (the create method in cinder/api/contrib/backup.py), which calls the cinder-backup API (the create method in cinder/backup/api.py); that in turn calls the cinder-backup rpcapi (create_backup in cinder/backup/rpcapi.py), which sends an RPC request to the RabbitMQ server. The request is finally handled by create_backup in cinder/backup/manager.py. Skipping the details, it ends up doing:

backup_service = self.service.get_backup_driver(context)
self._get_driver(backend).backup_volume(context, backup,
                                        backup_service)

Printing the relevant objects in pdb:

(Pdb) backup_service
<cinder.backup.drivers.ceph.CephBackupDriver object at 0x7f636521d4d0>

(Pdb) self._get_driver(backend)
2015-10-11 21:00:02.930 20943 DEBUG cinder.backup.manager [req-a8d8d3b9-cad0-4f9e-95cc-88b559b41dbc 2c0e6c89ac1643ee9777196e567834da bbb00d8dd4194c638b0323e86cd1d03d - - -] Driver requested for volume_backend 'ceph'. _get_driver /home/openstack/workspace/cinder/cinder/backup/manager.py:142
2015-10-11 21:00:02.930 20943 DEBUG cinder.backup.manager [req-a8d8d3b9-cad0-4f9e-95cc-88b559b41dbc 2c0e6c89ac1643ee9777196e567834da bbb00d8dd4194c638b0323e86cd1d03d - - -] Manager requested for volume_backend 'ceph'. _get_manager /home/openstack/workspace/cinder/cinder/backup/manager.py:130
<cinder.volume.drivers.rbd.RBDDriver object at 0x7f636b23afd0>

(Pdb) self._get_driver(backend).backup_volume
2015-10-11 21:00:09.778 20943 DEBUG cinder.backup.manager [req-a8d8d3b9-cad0-4f9e-95cc-88b559b41dbc 2c0e6c89ac1643ee9777196e567834da bbb00d8dd4194c638b0323e86cd1d03d - - -] Driver requested for volume_backend 'ceph'. _get_driver /home/openstack/workspace/cinder/cinder/backup/manager.py:142
2015-10-11 21:00:09.778 20943 DEBUG cinder.backup.manager [req-a8d8d3b9-cad0-4f9e-95cc-88b559b41dbc 2c0e6c89ac1643ee9777196e567834da bbb00d8dd4194c638b0323e86cd1d03d - - -] Manager requested for volume_backend 'ceph'. _get_manager /home/openstack/workspace/cinder/cinder/backup/manager.py:130
<function backup_volume at 0x7f636a1f9320>

The last two don't reveal much by themselves, but grepping for the definition of backup_volume shows that it actually lives under cinder/volume:

op@ubuntu-op:/home/openstack/workspace/cinder/cinder$ grep -r 'backup_volume' . | grep -v test | grep def
./volume/drivers/ibm/gpfs.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/glusterfs.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/sheepdog.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/lvm.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/vmware/vmdk.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/rbd.py:    def backup_volume(self, context, backup, backup_service):
./volume/drivers/scality.py:    def backup_volume(self, context, backup, backup_service):
./volume/driver.py:    def backup_volume(self, context, backup, backup_service):

So although a volume backup is handled by cinder-backup, the backup_volume method actually lives in cinder-volume, and that method in turn calls back into cinder-backup. (A bit convoluted... because the result ultimately has to be stored on an object-storage backend.)

Taking LVM as the example again, here is backup_volume:
cinder/volume/drivers/lvm.py

def backup_volume(self, context, backup, backup_service):
    """Create a new backup from an existing volume."""
    volume = self.db.volume_get(context, backup.volume_id)
    temp_snapshot = None
    previous_status = volume['previous_status']
    if previous_status == 'in-use':
        temp_snapshot = self._create_temp_snapshot(context, volume)
        backup.temp_snapshot_id = temp_snapshot.id
        backup.save()
        volume_path = self.local_path(temp_snapshot)
    else:
        volume_path = self.local_path(volume)

    try:
        with utils.temporary_chown(volume_path):
            with fileutils.file_open(volume_path) as volume_file:
                backup_service.backup(backup, volume_file)
    finally:
        if temp_snapshot:
            self._delete_snapshot(context, temp_snapshot)
            backup.temp_snapshot_id = None
            backup.save()

  1. If the volume is in-use, create a temp_snapshot, using the same code path described in the snapshot-creation section above.
  2. Determine volume_path.
  3. Temporarily change the owner and open the volume device as volume_file.
  4. Call backup_service.backup(backup, volume_file).
  5. Delete the temp_snapshot.

So step 4 calls back into cinder-backup:

  • For Ceph, this is the backup method in cinder/backup/drivers/ceph.py.
  • For LVM there is no cinder/backup/drivers/lvm.py; the base class's backup method (in cinder/backup/driver.py) is abstract and empty:

    @abc.abstractmethod
    def backup(self, backup, volume_file, backup_metadata=False):
        """Start a backup of a specified volume."""
        return

(Figure: Cinder code layout, omitted.)
cinder/backup/drivers/ is really the set of object-storage drivers used by cinder-backup. For example, if the backup objects are to be stored in Swift, step 4 ends up calling the backup method in cinder/backup/drivers/swift.py.

So an LVM volume can be backed up to any of these object-storage backends.
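
Which backend is used is chosen at deployment time; as far as I know this is the backup_driver option in cinder.conf, for example (module-path form, which I assume is the one used around this Cinder release):

    [DEFAULT]
    backup_driver = cinder.backup.drivers.swift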

Other

Is this how LVM snapshots work?

LVM snapshots are an example of a copy-on-write snapshot solution, as Evan said. How it works is a bit different from what Evan implied, but not by a whole lot.

When you have an LVM volume with no snapshots, writes to the volume happen as you’d expect. A block is changed, and that’s it.

As soon as you create a snapshot, LVM creates a pool of blocks. This pool also contains a full copy of the LVM metadata of the volume. When writes happen to the main volume such as updating an inode, the block being overwritten is copied to this new pool and the new block is written to the main volume. This is the ‘copy-on-write’. Because of this, the more data that gets changed between when a snapshot was taken and the current state of the main volume, the more space will get consumed by that snapshot pool.

When you mount the snapshot, the meta-data written when the snapshot was taken allows the mapping of snapshot pool blocks over changed blocks in the volume (or higher level snapshot). This way when an access comes for a specific block, LVM knows which block to access. As far as the filesystem on that volume is concerned, there are no snapshots.

James pointed out one of the faults of this system. When you have multiple snapshots of the same volume, every time you write to a block in the main volume you potentially trigger writes in every single snapshot. This is because each snapshot maintains its own pool of changed blocks. Also, for long snapshot trees, accessing a snapshot can cause quite a bit of computation on the server to figure out which exact block needs to be served for an access.

When you dispose of a snapshot, LVM just drops the snapshot pool and updates the snapshot tree as needed. If the dropped snapshot is part of a snapshot tree, some blocks will be copied to lower level snapshot. If it is the lowest snapshot (or the only one), the pool just gets dropped and the operation is very fast.
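
The pool and mapping described above can be inspected through device-mapper. Assuming a snapshot such as lv_snapshot from Reading 2 still exists (note that device-mapper doubles the dashes inside VG/LV names, so the exact device names below are an assumption):

    sudo dmsetup ls | grep -E 'origin_lv|lv_snapshot'
    # the snapshot maps to a "snapshot" target; the origin is remapped to "snapshot-origin"
    sudo dmsetup table stack--volumes--default-lv_snapshot
    sudo dmsetup table stack--volumes--default-origin_lv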

posted on 2016-01-24 19:21 by 七里山塘边