Set up vhost-user over OVS-DPDK on SUSE
This article shows how to set up a vhost-user port backed by a virtio device in your VM, which gives your virtual machines a zero-copy packet path.
1. Background
2. Prerequisites
3. Host Setup
3.1 cpu core mask
3.2 numa
3.3 huge page
3.4 vfio
4. OVS+DPDK
4.1 dpdk
4.2 ovs
4.3 Setup for VM
5. Guest VM
5.1 QEMU usage
5.2 Libvirt usage
5.3 Guest VM setup
6. Common issues
7. Reference
1. Background
DPDK, the Data Plane Development Kit, is a set of libraries that accelerates packet processing by running it in userspace. It also relies on a number of other optimizations, such as CPU affinity, NUMA awareness, and huge pages.
This document is a tutorial that guides you through installing DPDK and Open vSwitch from packages and then setting up vhost-user ports for a VM started from the qemu command line or through libvirt. It also records some common issues.
2. Prerequisites
There are a few things to check before running DPDK.
2.1 Make sure your NIC supports DPDK.
2.2 Make sure the host supports CPU pinning, NUMA, and huge pages.
2.3 Software requirements
kernel >= 3.2
glibc >= 2.7
3. Host Setup
3.1 cpu core mask
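A core mask is a hexadecimal bitmap in which bit N selects CPU core N. As a minimal sketch (the helper name cores_to_mask is hypothetical and not part of DPDK or OVS), the mask for a list of core IDs can be computed like this:

```shell
# cores_to_mask: hypothetical helper, not part of DPDK or OVS.
# Prints the hex bitmap in which bit N is set for every core ID given.
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

cores_to_mask 1 2        # cores 1 and 2 -> 0x6
cores_to_mask 0 2 4 6    # cores 0, 2, 4 and 6 -> 0x55
```

The value 0x6 is the one used for dpdk-lcore-mask in section 4.2, so that setting selects cores 1 and 2.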
3.2. numa
NUMA is not required; a UMA (single-node) machine also works.
On a NUMA machine, make sure the VM runs on the same NUMA node as ovs-vswitchd and the backing NIC.
To determine which NUMA node a PCI device (NIC) is on, cat /sys/class/net/eth<#>/device/numa_node. It prints the node number, for example 0 or 1 on a two-node machine, or -1 when the platform reports no NUMA affinity.
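The lookup above can be wrapped in a small script that lists every interface. This is only a sketch: the helper name nic_numa_node and the SYSFS_NET override (used to point the helper at a test directory) are assumptions made for illustration.

```shell
# nic_numa_node: hypothetical helper. Prints the NUMA node of a network
# interface, or "unknown" for interfaces without a backing PCI device.
# SYSFS_NET may be overridden for testing; it defaults to the real sysfs.
nic_numa_node() {
    node_file="${SYSFS_NET:-/sys/class/net}/$1/device/numa_node"
    if [ -r "$node_file" ]; then
        cat "$node_file"
    else
        echo "unknown"
        return 1
    fi
}

# Print "<interface>: <node>" for every interface on this host.
for path in "${SYSFS_NET:-/sys/class/net}"/*; do
    [ -e "$path" ] || continue
    name=${path##*/}
    printf '%s: %s\n' "$name" "$(nic_numa_node "$name")"
done
```

Virtual interfaces such as lo or bridges have no device/numa_node file and are reported as unknown.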
3.3. huge page setup
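3.3.1 grub command line
Add the following parameters to the kernel command line (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub):
default_hugepagesz=1G hugepagesz=1G hugepages=4
Then regenerate the grub configuration and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg
3.3.2 dynamic allocation at runtime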
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
On a NUMA machine, pages should be allocated explicitly on separate nodes:
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
# Other examples (1 GB pages on node1, 2 MB pages on node3):
# echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# echo 1024 > /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages
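The page counts written to nr_hugepages follow from the amount of memory to reserve divided by the page size. A minimal sketch of that arithmetic, with a quick look at the current state, follows; the helper name hugepages_for is hypothetical.

```shell
# hugepages_for: hypothetical helper. Prints how many huge pages of
# PAGE_KB kilobytes are needed to back SIZE_MB megabytes (rounded up).
hugepages_for() {
    size_mb=$1
    page_kb=$2
    echo $(( (size_mb * 1024 + page_kb - 1) / page_kb ))
}

hugepages_for 2048 2048       # 2 GB in 2 MB pages -> 1024
hugepages_for 4096 1048576    # 4 GB in 1 GB pages -> 4

# Show what the kernel has actually reserved, if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    grep -E '^(HugePages_Total|HugePages_Free|Hugepagesize)' /proc/meminfo || true
fi
```

If HugePages_Total is lower than the value written, the kernel could not find enough contiguous memory; allocating earlier (for example at boot, as in 3.3.1) usually avoids this.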
3.3.3. mount huge pages before use
mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
# Alternatively, use a separate mount point per page size:
# mkdir /dev/hugepages1G
# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
# mkdir /dev/hugepages2M
# mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M
The mount can also be made permanent through /etc/fstab:
nodev /mnt/huge hugetlbfs defaults 0 0
For 1 GB pages:
nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0
Make sure to restart libvirtd afterwards so that it picks up the new hugetlbfs mounts.
3.4 VFIO
Make sure your CPU supports VT-d (IOMMU) first.
VFIO is preferred for recent DPDK releases because it is more robust and secure than UIO, relying on IOMMU protection.
Once your hardware supports it, enable the IOMMU on the kernel command line in grub.
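The original text does not list the exact parameters. On Intel platforms a commonly used combination is intel_iommu=on iommu=pt; treat that as an assumption and check it against your platform documentation. The sketch below only tests whether intel_iommu=on is present on the running kernel's command line; the helper name has_intel_iommu is hypothetical.

```shell
# has_intel_iommu: hypothetical helper. Succeeds when the given kernel
# command line contains the intel_iommu=on parameter (Intel VT-d only).
has_intel_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null)
if has_intel_iommu "$cmdline"; then
    echo "intel_iommu=on is set"
else
    echo "intel_iommu=on is not set"
fi
```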
4. OVS+DPDK
DPDK works as a library for OVS. The current SLES build of OVS disables it, whereas openSUSE ships OVS with DPDK included.
The reason why SLE disables it still needs to be discussed with the Network team.
To verify, run the command below; a DPDK-enabled build also reports the DPDK version:
linux-3txe:~ # ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.10.1
DPDK 18.02.2
4.1 DPDK setup
Bind the device to a DPDK-compatible driver. Make sure the driver module is already loaded in the kernel:
modprobe igb_uio
dpdk_devbind --status
dpdk_devbind --bind=igb_uio 0000:02:00.0
To unbind the device again:
dpdk_devbind --unbind 0000:02:00.0
For vfio-pci, load these modules instead and bind with --bind=vfio-pci:
modprobe vfio
modprobe vfio_pci
Sometimes a vendor-specific driver must be bound instead.
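After binding, it helps to confirm which kernel driver actually owns the device. In sysfs, each PCI device directory has a driver symlink pointing at the bound driver. The sketch below reads it; the helper name pci_driver and the SYSFS_PCI override (for testing) are assumptions made for illustration.

```shell
# pci_driver: hypothetical helper. Prints the kernel driver currently
# bound to a PCI device, or "none" when no driver is bound.
# SYSFS_PCI may be overridden for testing; it defaults to the real sysfs.
pci_driver() {
    link="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Expect igb_uio or vfio-pci here once the bind step has succeeded.
pci_driver 0000:02:00.0
```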
4.2 OVS setup
If your OVS build does not include DPDK, you need to build OVS from source.
For builds that include DPDK:
ovs-vsctl --no-wait init
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x6
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
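dpdk-socket-mem accepts a comma-separated list of megabytes, one entry per NUMA node, so a single value such as 1024 reserves memory on node 0 only. As a sketch for multi-node hosts, the list can be generated and the resulting command printed for review rather than executed; the helper name socket_mem_list is hypothetical.

```shell
# socket_mem_list: hypothetical helper. Prints MB repeated NODES times,
# separated by commas, in the format expected by dpdk-socket-mem.
socket_mem_list() {
    mb=$1
    nodes=$2
    list=""
    i=0
    while [ "$i" -lt "$nodes" ]; do
        list="${list:+$list,}$mb"
        i=$(( i + 1 ))
    done
    echo "$list"
}

# Dry run: print the command for a two-node host instead of running it.
echo "ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=\"$(socket_mem_list 1024 2)\""
```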
4.3 Setup for VM
4.3.1 Add a bridge
ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev
4.3.2 Add a dpdk port
ovs-vsctl add-port ovs-br0 dpdk-p0 \
-- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:02:00.0
Some NICs (e.g. Mellanox ConnectX-3) have only one PCI address associated with multiple ports. Using a PCI address as above won't work for them. Instead, the usage below is suggested:
ovs-vsctl add-port ovs-br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
4.3.3 Add a vhost-user port (requires QEMU >= 2.2)
Open vSwitch provides two types of vHost User ports:
- vhost-user (dpdkvhostuser)
- vhost-user-client (dpdkvhostuserclient)
vHost User uses a client-server model. The server manages the vHost User sockets, and the client connects to the server. Depending on which port type you use, dpdkvhostuser or dpdkvhostuserclient, a different configuration of the client-server model is used.
For vhost-user ports, Open vSwitch acts as the server and QEMU the client. This means if OVS dies, all VMs must be restarted. On the other hand, for vhost-user-client ports, OVS acts as the client and QEMU the server. This means OVS can die and be restarted without issue, and it is also possible to restart an instance itself. For this reason, vhost-user-client ports are the preferred type for all known use cases; the only limitation is that vhost-user-client ports require QEMU version 2.7 or later. Ports of type vhost-user are currently deprecated and will be removed in a future release.
For vhost-user:
ovs-vsctl add-port ovs-br0 vhost-user1 \
-- set Interface vhost-user1 type=dpdkvhostuser
For vhost-user-client:
ovs-vsctl add-port ovs-br0 dpdkvhostclient0 \
-- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
options:vhost-server-path=/tmp/dpdkvhostclient0
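The port type decides which side creates the socket, so the QEMU -chardev argument differs between the two. With a dpdkvhostuserclient port QEMU must act as the server and needs the server flag on the chardev. The sketch below prints the matching argument; the helper name vhost_chardev and the chardev id char0 are chosen for illustration, and newer QEMU releases prefer the spelling server=on over the bare server flag.

```shell
# vhost_chardev: hypothetical helper. Prints the QEMU -chardev argument
# that matches an OVS vhost-user port.
#   $1 = OVS port type (dpdkvhostuser or dpdkvhostuserclient)
#   $2 = socket path
vhost_chardev() {
    case "$1" in
        # OVS is the server; QEMU connects as a client.
        dpdkvhostuser)       echo "-chardev socket,id=char0,path=$2" ;;
        # OVS is the client; QEMU creates the socket and listens on it.
        dpdkvhostuserclient) echo "-chardev socket,id=char0,path=$2,server" ;;
        *) echo "unknown port type: $1" >&2; return 1 ;;
    esac
}

vhost_chardev dpdkvhostuserclient /tmp/dpdkvhostclient0
```

The path passed to QEMU must be the same one given in options:vhost-server-path on the OVS side.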
5. Guest VM
5.1 qemu command line
qemu-system-x86_64 -name KVM-VPX -cpu host -enable-kvm -m 4096M \
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
-drive file=<absolute-path-to-disc-image-file>,if=none,id=drive-ide0-0-0,format=<disc-image-format> \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-netdev type=tap,id=hostnet0,script=no,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:d1:ae,bus=pci.0,addr=0x3 \
-chardev socket,id=char0,path=/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
-nographic
5.2. libvirt
Two main parts need to be configured: huge pages and the vhost-user interface.
5.2.1 set up huge page
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB'/>
</hugepages>
</memoryBacking>
5.2.2 insert vhost user
<interface type='vhostuser'>
<mac address='52:54:00:55:55:56'/>
<source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
5.2.3 other optimizations
<vcpu placement='static'>6</vcpu>
<cputune>
<shares>4096</shares>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='2'/>
<vcpupin vcpu='2' cpuset='4'/>
<vcpupin vcpu='3' cpuset='6'/>
<emulatorpin cpuset='0,2,4,6'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
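The vcpupin entries map guest vCPUs, numbered from 0, onto host CPUs one by one. When the host CPU list changes, the lines can be regenerated instead of edited by hand. This is a minimal sketch; the helper name vcpupin_lines is hypothetical.

```shell
# vcpupin_lines: hypothetical helper. Prints one <vcpupin> element per
# host CPU given, assigning guest vCPU numbers 0, 1, 2, ... in order.
vcpupin_lines() {
    vcpu=0
    for cpu in "$@"; do
        printf "<vcpupin vcpu='%d' cpuset='%d'/>\n" "$vcpu" "$cpu"
        vcpu=$(( vcpu + 1 ))
    done
}

# Reproduces the four pinning lines used in the example above.
vcpupin_lines 0 2 4 6
```

Pick host CPUs on the same NUMA node as the nodeset in numatune, so that vCPUs and guest memory stay local to each other.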
5.3 Guest VM setup
Besides the vhost-user port, the guest also needs isolated CPUs and huge pages.
Here is the VM kernel command line, set in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"
6. Common issues
6.1 Huge pages must be set up and mounted.
6.2 A dpdk port is necessary for OVS.
6.3 OVS must be built with the DPDK library.
6.4 The rte driver for your NIC must be included in DPDK, for example mlx or ixgbe.
6.5 The vhost socket may hit permission issues.
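For the socket permission issue, a quick check is whether the path is a UNIX socket and whether the user running QEMU can read and write it. The sketch below should be run as that user; the helper name check_vhost_socket is hypothetical and the socket path is the one used in the libvirt example above.

```shell
# check_vhost_socket: hypothetical helper. Reports whether the given path
# is a UNIX socket that the current user can read and write.
# Returns 0 when accessible, 1 when not a socket, 2 on missing permissions.
check_vhost_socket() {
    sock=$1
    if [ ! -S "$sock" ]; then
        echo "$sock: not a socket (has the vhost-user port been created?)"
        return 1
    fi
    if [ -r "$sock" ] && [ -w "$sock" ]; then
        echo "$sock: accessible"
    else
        echo "$sock: permission denied for user $(id -un)"
        return 2
    fi
}

check_vhost_socket /var/run/openvswitch/vhost-user1 || true
```

When the check reports missing permissions, compare the owner and mode of the socket (ls -l) with the user and group that QEMU runs as.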
7. Reference:
https://wiki.qemu.org/Documentation/vhost-user-ovs-dpdk
http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/
https://github.com/qemu/qemu/blob/master/tests/vhost-user-test.c
https://github.com/openvswitch/ovs/blob/master/Documentation/intro/install/dpdk.rst