RAID1 in Btrfs: understanding btrfs raid1
https://btrfs.wiki.kernel.org/index.php?title=FAQ
https://btrfs.wiki.kernel.org/index.php?title=SysadminGuide
http://marc.info/?l=linux-btrfs&m=132575546926358&w=2
There is an interesting thread on the mailing list:
Why does Btrfs allow raid1 with mismatched drives?
Below is a summary of that discussion together with the btrfs wiki's explanation of RAID1:
* Creation
mkfs.btrfs -m raid1 -d raid1 <small-disk> <large-disk> or
mkfs.btrfs -m raid1 -d raid1 <large-disk> <small-disk>
Example:
# mkfs.btrfs -m raid1 -d raid1 /dev/mapper/mpathb /dev/mapper/mpathc

WARNING! - Btrfs v0.20-rc1-37-g91d9eec IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/mapper/mpathc id 2           # the 2nd disk gets devid 2; the 1st disk is devid 1
fs created label (null) on /dev/mapper/mpathb   # the label lives on the 1st disk
nodesize 65536 leafsize 65536 sectorsize 65536 size 20.00GB
Btrfs v0.20-rc1-37-g91d9eec

The fs is created with the sum of the sizes of the ***two*** disks, though btrfs fi df shows RAID1 for metadata, system and data.
* Checking space usage
btrfs fi df <mountpoint>

# btrfs fi show --all-devices /dev/dm-4
Label: none  uuid: cc6f9a3f-c13f-4f83-b152-c2bb5978cee1
        Total devices 2 FS bytes used 8.44MB
        devid    1 size 10.00GB used 2.03GB path /dev/dm-4   # the two disks report different "used" values:
        devid    2 size 10.00GB used 2.01GB path /dev/dm-5   # the devid 1 disk consumes a bit more than the 2nd;
                                                             # with more disks, it is always the devid 1 disk that consumes more

# mount /dev/mapper/mpathb /mnt
# btrfs fi df /mnt
Data, RAID1: total=1.00GB, used=8.00MB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.00GB, used=384.00KB
Metadata: total=8.00MB, used=0.00

# mount /dev/mapper/mpathc /mnt
# btrfs fi df /mnt
Data, RAID1: total=1.00GB, used=8.00MB
Data: total=8.00MB, used=0.00
System, RAID1: total=8.00MB, used=64.00KB
System: total=4.00MB, used=0.00
Metadata, RAID1: total=1.00GB, used=384.00KB
Metadata: total=8.00MB, used=0.00

# The two btrfs fi df outputs are identical no matter which device is
# mounted; what btrfs fi df reports differs from what btrfs fi show reports.
* Is there something like "cat /proc/mdstat" for checking the state of the RAID devices?
At the moment this is mostly done with the btrfs fi df command; of course, you have to do some arithmetic and comparison on the reported values yourself to derive the related information. A rough sketch of that follows.
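As an illustration, here is a small Python sketch (not an official tool; it just scrapes the v0.20-era text output shown above, and the /mnt mountpoint is an assumption) that collects the per-profile totals out of btrfs fi df:

import re
import subprocess

def btrfs_df(mountpoint):
    # Run `btrfs fi df` and parse lines such as
    # "Data, RAID1: total=1.00GB, used=8.00MB"
    out = subprocess.check_output(["btrfs", "fi", "df", mountpoint], text=True)
    rows = []
    for line in out.splitlines():
        m = re.match(r"(\w+)(?:, (\w+))?: total=([^,]+), used=(\S+)", line.strip())
        if m:
            kind, profile, total, used = m.groups()
            rows.append((kind, profile or "single", total, used))
    return rows

for kind, profile, total, used in btrfs_df("/mnt"):
    print(f"{kind:10} {profile:8} total={total:10} used={used}")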
* Why doesn't plain df apply? Why is free space so complicated?
You might think, "My whole disk is RAID-1, so why can't you just divide everything by 2 and give me a sensible value in df?". If everything is RAID-1 (or RAID-0, or in general all the same RAID level), then yes, we could give a sane and consistent value from df.

However, we have plans to allow per-subvolume and per-file RAID levels. In this case, it becomes impossible to give a sensible estimate as to how much space there is left. For example, if you have one subvolume as "single", and one as RAID-1, then the first subvolume will consume raw storage at the rate of one byte for each byte of data written. The second subvolume will take two bytes of raw data for each byte of data written. So, if we have 30GiB of raw space available, we could store 30GiB of data on the first subvolume, or 15GiB of data on the second, and there is no way of knowing which it will be until the user writes that data.

So, in general, it is impossible to give an accurate estimate of the amount of free space on any btrfs filesystem. Yes, this sucks. If you have a really good idea for how to make it simple for users to understand how much space they've got left, please do let us know, but also please be aware that the finest minds in btrfs development have been thinking about this problem for at least a couple of years, and we haven't found a simple solution yet.
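To make the arithmetic in that answer concrete, a tiny sketch of the 30GiB example (the raw-consumption factors per profile come straight from the quote):

# GiB of file data that 30GiB of raw space can hold, per profile:
raw_free = 30
for profile, factor in {"single": 1, "RAID-1": 2}.items():
    print(f"{profile}: {raw_free / factor:.0f} GiB")   # single: 30, RAID-1: 15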
* RAID1 space utilization: How much space do I get with unequal devices in RAID-1 mode?
If your largest device is bigger than all of the others put together, then you will get as much space as all the smaller devices added together. Otherwise, you get half of the space of all of your devices added together.

For example, if you have disks of size 3TB, 1TB, 1TB, your largest disk is 3TB and the sum of the rest is 2TB. In this case, your largest disk is bigger than the sum of the rest, and you will get 2TB of usable space. If you have disks of size 3TB, 2TB, 2TB, then your largest disk is 3TB and the sum of the rest is 4TB. In this case, your largest disk is smaller than the sum of the rest, and you will get (3+2+2)/2 = 3.5TB of usable space.
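The rule quoted above is easy to express in code. A small sketch (a hypothetical helper, not part of btrfs-progs), checked against the FAQ's two examples:

def raid1_usable(sizes):
    # RAID-1 usable space with unequal devices, per the FAQ rule above.
    largest = max(sizes)
    rest = sum(sizes) - largest
    if largest >= rest:
        # The largest device can mirror everything on the others.
        return rest
    # Otherwise every chunk can find a partner somewhere.
    return sum(sizes) / 2

print(raid1_usable([3, 1, 1]))   # -> 2    (TB)
print(raid1_usable([3, 2, 2]))   # -> 3.5  (TB)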
Note, however, that this usable space is not the same as the free space a RAID1 filesystem actually reports:
<Fabian Zeindl <fabian.zeindl@gmail.com>>
> (assuming 1GB chunksize):
> if I create a raid-1 btrfs with a 3GB and a 7GB device, it will show me ~10GB free space,
> after saving a 1GB file, I will have 8GB left (-1GB on each device)
> after saving another 1GB, I will have 6GB left (--- " ---)
> after saving another 1GB, it's "suddenly" full?

<from Roman Kapusta <roman.kapusta@gmail.com>>
you still have 4GB free of non-RAID-1 (single) space, which is currently unavailable, but it is planned that BTRFS will support mixed storage: some files can be RAID-1, some files can be RAID-0 and the rest is basic (single) storage

(I still need to read the code carefully here; see also the allocation sketch in the implementation section below.)
* About the implementation
btrfs doesn't actually do "RAID-1" (in the sense that blocks with the same address on the two disks have identical contents). Btrfs's "RAID" implementation bears only a passing resemblance to traditional RAID implementations. Instead, btrfs replicates data on a per-chunk basis. If the filesystem is configured to use "RAID-1", for example, chunks are allocated in pairs, with each chunk of the pair being taken from a different block device. Data written to such a chunk pair will be duplicated across both chunks.

<from Martin Steigerwald <Martin@lichtvoll.de>>
"Allocating as many chunks as can fit across the drives" is also pretty clear to me. So if BTRFS can't allocate a new chunk on two devices, it's full. To me it seems obvious that BTRFS will not break the RAID-1 redundancy guarantee unless a drive fails. Thus when using a RAID-1 with two devices, the smaller one should define the maximum capacity of the filesystem. But when you use a RAID-1 with one 500 GB and two 250 GB drives, BTRFS can replicate each chunk on the 500 GB drive on *one* of the two 250 GB drives. (This is consistent with the utilization rule above.) Thus it makes perfect sense to support differently sized drives in a BTRFS pool. (So differently sized devices are supported.)

My own observations with a RAID-10 across 4 devices support this. I echoed "1" > /sys/block/sdX/delete to remove one hard disk while a dd was running to the RAID. BTRFS used the remaining disks. On the next reboot all disks were available again. While BTRFS didn't start rebalancing the RAID automatically, a btrfs filesystem balance made it fill up the previously failed device until all devices had the same usage.

This is also described in the sysadmin guide: this is what you have to take care of manually. If a drive fails, you have to balance the filesystem so that it creates the replicas where they are missing.
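Putting Fabian's 3GB+7GB question together with the per-chunk description above, here is a toy simulation (my own sketch; it assumes 1GB chunks and a greedy "pick the two devices with the most free space" policy, which only approximates the real allocator):

def simulate_raid1(free_gb):
    # Each 1GB chunk needs a partner chunk on a *different* device, so
    # allocation stops as soon as fewer than two devices have room left,
    # however much raw space remains.
    free = list(free_gb)
    written = 0
    while True:
        a, b = sorted(range(len(free)), key=free.__getitem__, reverse=True)[:2]
        if free[b] < 1:
            return written, free   # the filesystem is "suddenly" full
        free[a] -= 1
        free[b] -= 1
        written += 1

print(simulate_raid1([3, 7]))      # -> (3, [0, 4]): full after 3GB, 4GB stranded as single
print(simulate_raid1([3, 1, 1]))   # -> (2, [1, 0, 0]): matches the 2TB answer from the FAQ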
* Open questions
* When a RAID1 filesystem is created, what disk space gets consumed, so that roughly 2GB of the 10GB is already used before any data has been written? (Judging from the btrfs fi df output above, this looks like the initially allocated chunks: 1GB Data + 1GB Metadata + 8MB System in RAID1 on each device, plus the small single-profile chunks, adding up to about 2.03GB.)
* In RAID1, what extra information is written to the devid 1 disk compared with the other disks? (Possibly the single-profile chunks, which would match the 0.02GB difference between 2.03GB and 2.01GB above.)
* With several disks of different sizes in RAID1, how is each disk initialized? What profile does the leftover space get?
These need to be looked at together with the mkfs code.