KVM 核心功能:磁盘虚拟化
1 磁盘虚拟化简介
QEMU-KVM 提供磁盘虚拟化,从虚拟机角度看其自身拥有的磁盘即是实际的物理磁盘。实际上,虚拟机读写的磁盘数据保存在 host 上的物理磁盘。
QEMU-KVM 主要有如下几种方式虚拟磁盘:
-
本地存储虚拟机镜像文件。
-
host 上物理磁盘或磁盘分区。
-
LVM(Logical Volume Management),逻辑分区。
-
NFS(Network File System),网络文件系统。
-
GFS(Gluster File System),分布式文件系统。
2 磁盘虚拟化配置
本节针对常用的虚拟磁盘方式进行介绍,包括本地存储虚拟机镜像文件和 LVM 逻辑分区:
2.1 本地存储镜像
本地存储镜像文件,首先需要在本地创建镜像文件,然后指定本地镜像文件为虚拟机的磁盘。
通过 qemu-img 命令创建镜像文件,qemu-img 是编译安装完 QEMU 即默认自带的软件程序,常用的 qemu-img 选项有 create 和 info,create 用来创建镜像 img,info 用来查看镜像信息:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo.img -o preallocation=off 10G Formatting 'lianhua_demo.img', fmt=raw size=10737418240 preallocation=off real 0m0.040s user 0m0.015s sys 0m0.015s [lianhua@host ~]$ qemu-img info lianhua_demo.img image: lianhua_demo.img file format: raw virtual size: 10G (10737418240 bytes) disk size: 0
如上所示,创建了一个名为 lianhua_demo.img 的格式为 raw 的镜像,该镜像大小为 10G。但是,使用 info 查看镜像信息时,镜像的实际大小为 0(disk size)。这是由于 raw 格式的镜像可以指定自己为稀疏文件,如果是稀疏文件,那么只在写数据到镜像的时候才会真正为其分配空间。
qemu-img 的 preallocation 选项可以指定是否预分配空间,它有三个值,off/full 和 falloc。off 表示禁止预分配空间;full 表示为镜像预分配空间,预分配的方式是给镜像逐字节写 0;falloc 表示预分配磁盘空间给镜像文件,但不往镜像文件中写数据。比较上述三种分配方式,如下:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_full.img -o preallocation=full 10G Formatting 'lianhua_demo_on.img', fmt=raw size=10737418240 preallocation=full real 0m22.955s user 0m0.013s sys 0m8.930s
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_falloc.img -o preallocation=falloc 10G Formatting 'lianhua_demo_falloc.img', fmt=raw size=10737418240 preallocation=falloc real 0m8.256s user 0m0.008s sys 0m8.114s [lianhua@host ~]$ du -h lianhua_demo*.img 11G lianhua_demo_falloc.img 0 lianhua_demo.img 11G lianhua_demo_full.img
镜像格式有多种,除了 raw 外,还有常用的磁盘格式 qcow2,vdi 等。
镜像分配完磁盘空间后,使用 qemu-kvm 创建虚拟机,并且将镜像文件作为虚拟机的磁盘,命令如下:
[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 -hda lianhua_demo_falloc.img -monitor stdio WARNING: Image format was not specified for 'lianhua_demo_falloc.img' and probing guessed raw. Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted. Specify the 'raw' format explicitly to remove the restrictions. QEMU 2.6.0 monitor - type 'help' for more information (qemu) VNC server running on '::1;5900' (qemu) info pci Bus 0, device 0, function 0: Host bridge: PCI device 8086:1237 id "" Bus 0, device 1, function 1: IDE controller: PCI device 8086:7010 BAR4: I/O at 0xc040 [0xc04f]. id "" Bus 0, device 1, function 3: Bridge: PCI device 8086:7113 IRQ 9. id "" Bus 0, device 3, function 0: Ethernet controller: PCI device 8086:100e IRQ 11. BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff]. BAR1: I/O at 0xc000 [0xc03f]. BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe]. id ""
可以看出,镜像在虚拟机中的 pci 号为 00:01:1,且该设备为 IDE 设备。qemu-kvm 的 -hda 选项将镜像文件作为虚拟机的第一个 IDE 设备,在虚拟机中表现为 /dev/hda 设备或 /dev/sda 设备(驱动不同,表现的设备名称不同),更多磁盘选项配置可以查看 qemu-kvm 的 man 文档。
2.2 LVM 逻辑分区
LVM 逻辑分区,首先需要使用 LVM 创建 volume,然后将此 volume attach 到虚拟机上作为虚拟机的磁盘。
在 OpenStack 平台上创建 LVM volume,如下:
[root@host ~]# pvdisplay --- Physical volume --- PV Name /dev/loop2 VG Name cinder-volumes PV Size 602.34 GiB / not usable 4.00 MiB Allocatable yes PE Size 4.00 MiB Total PE 154199 Free PE 146519 Allocated PE 7680 PV UUID pTkQ5Z-zNdc-LRrn-qWAX-13D6-bhbG-DdcGFD [root@host ~]# vgdisplay --- Volume group --- VG Name cinder-volumes System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 1447 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 1 Act PV 1 VG Size 602.34 GiB PE Size 4.00 MiB Total PE 154199 Alloc PE / Size 7680 / 30.00 GiB Free PE / Size 146519 / 572.34 GiB VG UUID Mrrh1r-qKQw-bCgW-0WXi-d5Bd-OiVV-cTBrg5 [root@host ~]# lvdisplay --- Logical volume --- LV Path /dev/cinder-volumes/volume-c34555f0-fd26-42fe-a3b2-86098b590be2 LV Name volume-c34555f0-fd26-42fe-a3b2-86098b590be2 VG Name cinder-volumes LV UUID n81Af6-cWEe-LvAm-wjA3-vgKD-RxgV-qMtIq6 LV Write Access read/write LV Creation host, time host.localdomain, 2020-08-02 00:47:30 +0800 LV Status available # open 1 LV Size 26.00 GiB Current LE 6656 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 8192 Block device 253:0
volume 创建成功后将 volume attach 到虚拟机上作为虚拟机的磁盘:
[root@host ~]# openstack volume list +--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+ | ID | Display Name | Status | Size | Attached to | +--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+ | c34555f0-fd26-42fe-a3b2-86098b590be2 | lianhua-vm1-vol | in-use | 26 | Attached to lianhua-vm1-vol on /dev/vdb | +--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
volume attach 到虚拟机上,在虚拟机中的磁盘设备名为 /dev/vdb。进入虚拟机,查看该磁盘设备的详细信息:
[root@lianhua-vm1:/home/robot] # fdisk -l | grep vdb Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors [root@lianhua-vm1:/home/robot] # lspci ... 00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device [root@lianhua-vm1:/home/robot] # lspci -s 00:0b.0 -vvv 00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device Subsystem: Red Hat, Inc. Device 0002 Physical Slot: 11 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 10 Region 0: I/O ports at 1000 [size=64] Region 1: Memory at c0004000 (32-bit, non-prefetchable) [size=4K] Region 4: Memory at c0000000 (64-bit, prefetchable) [size=16K] Capabilities: [98] MSI-X: Enable+ Count=2 Masked- Vector table: BAR=1 offset=00000000 PBA: BAR=1 offset=00000800 Capabilities: [84] Vendor Specific Information: VirtIO: <unknown> BAR=0 offset=00000000 size=00000000 Capabilities: [70] Vendor Specific Information: VirtIO: Notify BAR=4 offset=00003000 size=00001000 multiplier=00000004 Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg BAR=4 offset=00002000 size=00001000 Capabilities: [50] Vendor Specific Information: VirtIO: ISR BAR=4 offset=00001000 size=00001000 Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg BAR=4 offset=00000000 size=00001000 Kernel driver in use: virtio-pci
不同于 -hda 指定的磁盘设备,这里的磁盘设备名以 vd 开头,这是因为它们是通过 virtio 半虚拟化方式分配的磁盘设备,从上例可以看出磁盘设备 vdb 的 pci 号为 00:0b:0,其使用的驱动为 virtio-pci。
在 libvirt XML 文件的 devices 标签下定义 disk, 实现使用 virtio 半虚拟化方式分配磁盘设备:
<disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none' io='native'/> <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/> <target dev='vdb' bus='virtio'/> <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial> <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/> </disk>
同理,也可以在 qemu-kvm 的 device 和 drive 选项下指定 virtio-blk-pci 参数实现 virtio 半虚拟化方式分配磁盘设备:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# /usr/libexec/qemu-kvm -m 1024 -smp 2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk.config,format=raw,if=none,id=drive-ide0-0-0,readonly=on,cache=none -monitor stdio
3 磁盘虚拟化环境部署
根据上两节的描述,部署一个简单的环境实现磁盘虚拟化及磁盘文件共享,部署环境如下:
-
使用 virtio 半虚拟化方式指定镜像文件实现磁盘虚拟化,虚拟出的磁盘设备名为 vda。
-
使用 virtio 半虚拟化方式指定 volume 实现磁盘虚拟化,虚拟出的磁盘设备名为 vdb。
-
在虚拟机内部使用 LVM 分割磁盘设备 vdb 为 lv volume,并将 volume 指定为文件系统。
-
使用 NFS 方式共享虚拟机的文件系统。
示意图如下:
1) 查看 virtio 的 libvirt XML 配置:
<devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2' cache='none'/> <source file='/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none' io='native'/> <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/> <target dev='vdb' bus='virtio'/> <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial> <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/> </disk> </devices>
vda 磁盘设备所使用的镜像文件为 /var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk,它在虚拟机的磁盘设备名为 /dev/vda,且 pci 号为 00:06:0。
使用 qemu-info 查看 disk 镜像文件:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# qemu-img info disk image: disk file format: qcow2 virtual size: 40G (42949672960 bytes) disk size: 1.7G cluster_size: 65536 backing file: /var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false
可以看出,disk 分配的虚拟磁盘容量为 40G,而它现在占的磁盘空间是 1.7G。
vdb 为 volume 分配的磁盘设备,在虚拟机中的磁盘设备名为 /dev/vdb,且 pci 号为 00:0b:0。
进入虚拟机查看磁盘设备是否分配:
[root@lianhua-vm1:/home/robot] # fdisk -l | grep vd Disk /dev/vda: 40 GiB, 42949672960 bytes, 83886080 sectors # 这里 vda 的磁盘容量为 40G /dev/vda1 * 2048 83886046 83883999 40G 83 Linux Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors [root@lianhua-vm1:/home/robot] # lspci | grep block 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device 00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
虚拟机中成功分配磁盘设备,且从 vda 中分出磁盘分区 vda1 给操作系统的文件系统使用。
2) 查看虚拟机内磁盘设备 vdb 分割的 lv volume:
[root@lianhua-vm1:/home/robot] # pvdisplay --- Physical volume --- PV Name /dev/vdb VG Name lianhua-vm1-vol PV Size 26.00 GiB / not usable 4.00 MiB Allocatable yes PE Size 4.00 MiB Total PE 6655 Free PE 1405 Allocated PE 5250 PV UUID OqdKmO-PspN-0ZKe-M0l4-0vGD-cY7k-VjZvTJ [root@lianhua-vm1:/home/robot] # vgdisplay --- Volume group --- VG Name lianhua-vm1-vol System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 7 VG Access read/write VG Status resizable MAX LV 0 Cur LV 6 Open LV 6 Max PV 0 Cur PV 1 Act PV 1 VG Size <26.00 GiB PE Size 4.00 MiB Total PE 6655 Alloc PE / Size 5250 / <20.51 GiB Free PE / Size 1405 / <5.49 GiB VG UUID JcVrao-YnJ7-mRpK-8Rxc-i07i-WVH4-aVgAoD [root@lianhua-vm1:/home/robot] # lvdisplay --- Logical volume --- LV Path /dev/lianhua-vm1-vol/provider_sys LV Name provider_sys VG Name lianhua-vm1-vol LV UUID C6byt7-5cby-h2RT-xcLg-OJU0-Qq1E-27G6jB LV Write Access read/write LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800 LV Status available # open 1 LV Size <9.77 GiB Current LE 2500 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 252:0 --- Logical volume --- LV Path /dev/lianhua-vm1-vol/provider_lianhua LV Name provider_lianhua VG Name lianhua-vm1-vol LV UUID vfeZg8-PKVR-kKxv-yidf-rQqp-A7De-CvXqws LV Write Access read/write LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800 LV Status available # open 1 LV Size 4.88 GiB Current LE 1250 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 252:1 --- Logical volume --- LV Path /dev/lianhua-vm1-vol/provider_log LV Name provider_log VG Name lianhua-vm1-vol LV UUID mHdD60-QjSy-sRlz-GLmK-CFIM-l42c-QGthXa LV Write Access read/write LV Creation host, time lianhua-vm1, 2020-08-02 00:50:12 +0800 LV Status available # open 1 LV Size 1000.00 MiB Current LE 250 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 252:2
3) 指定 lv 的文件系统为 log/sys/lianhua,并且通过 NFS 的方式共享文件系统:
[root@lianhua-vm1:/home/robot] # df -h Filesystem Size Used Avail Use% Mounted on /dev/vda1 40G 7.0G 31G 19% / /dev/mapper/lianhua-vm1-vol-provider_sys 9.1G 37M 8.6G 1% /mnt/sys /dev/mapper/lianhua-vm1-vol-provider_lianhua 4.6G 20M 4.3G 1% /mnt/lianhua /dev/mapper/lianhua-vm1-vol-provider_log 922M 18M 838M 3% /mnt/log
(看这里详细了解 NFS)
进入 VM2 查看文件系统是否共享成功:
[root@lianhua-vm2:/mnt/log] # ls [root@lianhua-vm2:/mnt/log] # mkdir lianhua [root@lianhua-vm2:/mnt/log] # ls lianhua [root@lianhua-vm1:/mnt/log] # ls lianhua
文件共享成功,环境部署完毕。
芝兰生于空谷,不以无人而不芳。