About RAID1 in Btrfs: understanding btrfs raid1

https://btrfs.wiki.kernel.org/index.php?title=FAQ
https://btrfs.wiki.kernel.org/index.php?title=SysadminGuide
http://marc.info/?l=linux-btrfs&m=132575546926358&w=2

There is an interesting thread on the mailing list:
      Why does Btrfs allow raid1 with mismatched drives?

Below is a summary of that discussion and of the btrfs wiki's explanation of RAID1:

    * Creation

          mkfs.btrfs -m raid1 -d raid1 <small-disk> <large-disk> or
          mkfs.btrfs -m raid1 -d raid1 <large-disk> <small-disk>

       Example:

          # mkfs.btrfs -m raid1 -d raid1 /dev/mapper/mpathb /dev/mapper/mpathc

          WARNING! - Btrfs v0.20-rc1-37-g91d9eec IS EXPERIMENTAL
          WARNING! - see http://btrfs.wiki.kernel.org before using

          adding device /dev/mapper/mpathc id 2                        # the 2nd disk gets devid 2; the 1st disk is devid 1
          fs created label (null) on /dev/mapper/mpathb                   # the label is placed on the 1st disk
              nodesize 65536 leafsize 65536 sectorsize 65536 size 20.00GB
          Btrfs v0.20-rc1-37-g91d9eec

      The fs is created with the sum of the sizes of the ***two*** disks, though
      btrfs fi df shows RAID1 for metadata, system and data.

    * Checking space usage

        btrfs fi df <mountpoint>

        # btrfs fi show --all-devices /dev/dm-4
            Label: none  uuid: cc6f9a3f-c13f-4f83-b152-c2bb5978cee1
            Total devices 2 FS bytes used 8.44MB
                devid    1 size 10.00GB used 2.03GB path /dev/dm-4    # the two disks report different "used" values
                devid    2 size 10.00GB used 2.01GB path /dev/dm-5    # devid 1 consumes slightly more space than devid 2
                                                           # with more disks, it is always devid 1 that consumes the extra space
        # mount /dev/mapper/mpathb /mnt
        # btrfs fi df /mnt                    ---------
        Data, RAID1: total=1.00GB, used=8.00MB            |
        Data: total=8.00MB, used=0.00                    |
        System, RAID1: total=8.00MB, used=64.00KB         |
        System: total=4.00MB, used=0.00                  |------ these two outputs are identical; it is btrfs fi show that reports differing per-device values
        Metadata, RAID1: total=1.00GB, used=384.00KB      |
        Metadata: total=8.00MB, used=0.00                |
        # mount /dev/mapper/mpathc /mnt                  |
        # btrfs fi df /mnt                    ---------
        Data, RAID1: total=1.00GB, used=8.00MB
        Data: total=8.00MB, used=0.00
        System, RAID1: total=8.00MB, used=64.00KB
        System: total=4.00MB, used=0.00
        Metadata, RAID1: total=1.00GB, used=384.00KB
        Metadata: total=8.00MB, used=0.00

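    * Is there anything like "cat /proc/mdstat" for checking the status of the RAID devices?
          For now the main tool is the btrfs fi df command; to get more of the related information you have to do some arithmetic on the values it reports and compare them.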
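As an illustration of the arithmetic mentioned above, here is a minimal Python sketch that parses the `btrfs fi df` output quoted earlier and adds up the raw space it implies. The function names, the regular expression, and the assumption that "GB"/"MB"/"KB" are binary multiples are mine and are not taken from btrfs-progs. Only the two profiles visible in that output (RAID1 and the unnamed single profile) are handled.

```python
import re

# Assumption: this btrfs-progs version prints binary multiples.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Raw bytes consumed per logical byte, for the profiles seen in the output above.
COPIES = {"single": 1, "RAID1": 2}

# Matches lines such as "Data, RAID1: total=1.00GB, used=8.00MB" or "Data: total=8.00MB, used=0.00".
LINE = re.compile(r"^(\w+)(?:, (\w+))?: total=([\d.]+)([A-Z]+)?, used=([\d.]+)([A-Z]+)?$")


def parse_size(number, unit):
    """Convert a printed size such as ('1.00', 'GB') to bytes."""
    return float(number) * UNITS[unit] if unit else float(number)


def raw_allocated(fi_df_output):
    """Sum the raw bytes allocated on all devices, as implied by the 'total=' fields."""
    raw = 0.0
    for line in fi_df_output.strip().splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        profile = match.group(2) or "single"
        raw += parse_size(match.group(3), match.group(4)) * COPIES[profile]
    return raw


SAMPLE = """
Data, RAID1: total=1.00GB, used=8.00MB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.00GB, used=384.00KB
Metadata: total=8.00MB, used=0.00
"""

print(f"{raw_allocated(SAMPLE) / 1024**3:.2f} GB raw")  # 4.04 GB raw
```

The result agrees with the sum of the two "used" values from `btrfs fi show` (2.03GB + 2.01GB). The non-RAID1 chunks add up to 8MB + 4MB + 8MB = 20MB, which is also the difference between devid 1 and devid 2. This suggests, but does not prove, that those single chunks sit on devid 1.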

    * Why doesn't df work? Why is free space so complicated?

        You might think, "My whole disk is RAID-1, so why can't you just divide 
        everything by 2 and give me a sensible value in df?".

        If everything is RAID-1 (or RAID-0, or in general all the same RAID level), 
        then yes, we could give a sane and consistent value from df. However, we 
        have plans to allow per-subvolume and per-file RAID levels. In this case, 
        it becomes impossible to give a sensible estimate as to how much space there is left.

        For example, if you have one subvolume as "single", and one as RAID-1, then 
        the first subvolume will consume raw storage at the rate of one byte for each 
        byte of data written. The second subvolume will take two bytes of raw data for 
        each byte of data written. So, if we have 30GiB of raw space available, we could 
        store 30GiB of data on the first subvolume, or 15GiB of data on the second, and 
        there is no way of knowing which it will be until the user writes that data.

        So, in general, it is impossible to give an accurate estimate of the amount of 
        free space on any btrfs filesystem. Yes, this sucks. If you have a really good 
        idea for how to make it simple for users to understand how much space they've 
        got left, please do let us know, but also please be aware that the finest minds 
        in btrfs development have been thinking about this problem for at least a couple 
        of years, and we haven't found a simple solution yet.
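The 30GiB example from the wiki text can be written down as a small Python sketch. The names `max_writable` and `raw_left` are hypothetical helpers, and the model ignores metadata and chunk granularity. It only shows that the same raw free space maps to different amounts of writable data depending on the profile.

```python
GIB = 1024**3

# Raw bytes consumed per byte of data written, per profile.
COPIES = {"single": 1, "raid1": 2}


def max_writable(raw_free, profile):
    """Data that fits if everything from now on is written with one profile."""
    return raw_free // COPIES[profile]


def raw_left(raw_free, writes):
    """Raw space remaining after a list of (data_bytes, profile) writes."""
    for size, profile in writes:
        raw_free -= size * COPIES[profile]
    return raw_free


raw = 30 * GIB
print(max_writable(raw, "single") // GIB)  # 30
print(max_writable(raw, "raid1") // GIB)   # 15

# A mixed workload: 10GiB written as single plus 5GiB written as RAID-1.
print(raw_left(raw, [(10 * GIB, "single"), (5 * GIB, "raid1")]) // GIB)  # 10
```

In the mixed case, the remaining 10GiB of raw space again stands for either 10GiB or 5GiB of future data, which is why no single df figure can be accurate.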


    * RAID-1 space utilization: How much space do I get with unequal devices in RAID-1 mode?

        If your largest device is bigger than all of the others put together, then you 
        will get as much space as all the smaller devices added together. Otherwise, 
        you get half of the space of all of your devices added together.

        For example, if you have disks of size 3TB, 1TB, 1TB, your largest disk is 3TB 
        and the sum of the rest is 2TB. In this case, your largest disk is bigger than 
        the sum of the rest, and you will get 2TB of usable space.

        If you have disks of size 3TB, 2TB, 2TB, then your largest disk is 3TB and the 
        sum of the rest is 4TB. In this case, your largest disk is smaller than the sum 
        of the rest, and you will get (3+2+2)/2 = 3.5TB of usable space. 
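The rule quoted from the wiki can be expressed as a short Python function. `raid1_usable` is a name chosen here for illustration. It takes device sizes in any consistent unit and ignores metadata overhead and chunk granularity, so treat the result as an upper bound.

```python
def raid1_usable(sizes):
    """Usable RAID-1 capacity for devices of the given sizes (wiki rule of thumb)."""
    if len(sizes) < 2:
        raise ValueError("RAID-1 needs at least two devices")
    largest = max(sizes)
    rest = sum(sizes) - largest
    if largest > rest:
        # The largest device can only mirror against what the others provide.
        return rest
    # Otherwise every chunk can find a partner, so half of the raw total is usable.
    return sum(sizes) / 2


print(raid1_usable([3, 1, 1]))  # 2
print(raid1_usable([3, 2, 2]))  # 3.5
```

Both branches amount to taking the smaller of "sum of the other devices" and "half of the total".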

        However, the "usable space" here is not the same thing as the usable space of RAID1:

        <Fabian Zeindl <fabian.zeindl@gmail.com>>
              > (assuming 1GB chunksize):
              >
              > if i create a raid-1, btrfs with a 3GB and a 7GB device, it will show me ~10GB free space,
              > after saving a 1GB file, i will have 8GB left (-1GB on each device)
              > after saving another 1GB, i will have 6GB left (--- " ----)
              > after saving another 1GB, it's "suddenly" full?
        <from Roman Kapusta <roman.kapusta@gmail.com>>
              you have still 4GB free of non RAID-1 (single) space, which is
            currently unavailable, but it is planned that BTRFS will support mixed
            storage:
                some files can be RAID-1, some files can be RAID-0 and rest is basic    (note to self: need to read the code carefully)
                (single) storage
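The 3GB + 7GB exchange above can be replayed with a small simulation. This is a sketch under stated assumptions: 1GB chunks as in the mail, each RAID-1 chunk pair placed on the two devices with the most unallocated space, and "free space" reported as the plain sum of unallocated raw space. The function name `trace_raw_free` is hypothetical and the code is not derived from the btrfs allocator source.

```python
def trace_raw_free(sizes_gb, chunk_gb=1):
    """Write RAID-1 chunks until none fits; return raw free space after each step."""
    if len(sizes_gb) < 2:
        raise ValueError("RAID-1 needs at least two devices")
    free = list(sizes_gb)
    trace = [sum(free)]
    while True:
        # Pick the two devices with the most unallocated space.
        order = sorted(range(len(free)), key=lambda i: free[i], reverse=True)
        first, second = order[0], order[1]
        if free[second] < chunk_gb:
            # No second device can hold the mirror copy: the filesystem is full.
            break
        free[first] -= chunk_gb
        free[second] -= chunk_gb
        trace.append(sum(free))
    return trace


print(trace_raw_free([3, 7]))  # [10, 8, 6, 4]
```

The trace ends at 4: the 3GB device is exhausted after three chunk pairs, and the 4GB left on the 7GB device cannot be used for RAID-1, which matches the "suddenly full" observation and Roman Kapusta's reply.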


    * About the implementation

        btrfs doesn't actually do "RAID-1" (in the sense that blocks with the
        same address on the two disks have identical contents).    
        Btrfs's "RAID" implementation bears only passing resemblance to traditional
        RAID implementations. Instead, btrfs replicates data on a per-chunk basis.
        If the filesystem is configured to use "RAID-1", for example, chunks are
        allocated in pairs, with each chunk of the pair being taken from a different
        block device. Data written to such a chunk pair will be duplicated across both chunks.

        <from Martin Steigerwald <Martin@lichtvoll.de>>
        "Allocating as many chunks as can fit across the drives" is also pretty
        clear to me. So if BTRFS can't allocate a new chunk on two devices, it's
        full. To me it seems obvious that BTRFS will not break the RAID-1
        redundancy guarantee unless a drive fails.
        Thus when using a RAID-1 with two devices, the smaller one should define
        the maximum capacity of the device. But when you use a RAID-1 with one 500
        GB and two 250 GB drives, BTRFS can replicate each chunk on the 500 GB
        drive on *one* of the two 250 GB drives.                       (note: consistent with the space utilization rule above)
        Thus it makes perfect sense to support differently sized drives in a BTRFS
        pool.                                                    (note: devices of different sizes are supported)
        My own observations with a RAID-10 across 4 devices support this. I echoed
        "1" > /sys/block/sdX/delete to remove one harddisk while a dd was running
        to the RAID. BTRFS used the remaining disks. On next reboot all disks
        were available again. While BTRFS didn't start rebalancing the RAID
        automatically, a btrfs filesystem balance made it fill up the previously
        failed device until all devices had the same usage. This is also described
        in the sysadmin guide: So this is what you have to care for manually. If a
        drive failed, you have to balance the filesystem so that it creates
        replicas where they are missing.



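* Open questions
    * What disk space is consumed when the RAID1 is created, such that about 2GB out of 10GB is already used before any data has been written?
    * In RAID 1, what extra information is written to the devid 1 disk compared with the other disks?
    * When several disks of different sizes form a RAID 1, how is each disk initialized? What profile does the leftover space get?
          This needs to be checked against mkfs.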
       

posted on 2012-12-05 13:47 by refrag
