Setup vhost-user over OVS-DPDK in SUSE

 

This article shows how to set up a vhost-user port for a VM that uses a virtio network device, giving your virtual machines a zero-copy packet path.

1. Background

2. Prerequisites

3. Host Setup

3.1 cpu pin

3.2 numa

3.3 huge page

3.4 vfio

4. OVS+DPDK

4.1 dpdk

4.2 ovs

5 VM setup

5.1 QEMU usage

5.2 Libvirt usage

6. Common issues

7. Reference

 

1. Background

DPDK (Data Plane Development Kit) is a set of libraries that accelerates packet processing by running it in user space. It also relies on a number of other optimizations, such as CPU affinity, NUMA awareness, and huge pages.

This document is a tutorial that guides you through installing DPDK and Open vSwitch from packages and then setting up vhost-user ports for a VM started either from the QEMU command line or through libvirt. It also records some common issues.

 

2. Prerequisites

There are a few things to check before running DPDK.

2.1 Make sure your NIC is supported by DPDK:

https://core.dpdk.org/supported/

2.2 The host should support CPU pinning, NUMA, and huge pages.

 

2.3 Software requirements

kernel >= 3.2

glibc >= 2.7

3. Host Setup

3.1 cpu core mask

3.2. numa

NUMA is not required; a UMA (single-node) machine also works.

We want to make sure to run the VM on the same NUMA node as ovs-vswitchd and as the backing NIC.

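To determine which NUMA node a PCI device (NIC) is on, run cat /sys/class/net/eth<#>/device/numa_node. It prints the node number (for example 0 or 1 on a two-node host), or -1 if the platform reports no NUMA affinity for the device.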
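The sysfs lookup above can be wrapped in a small helper. This is only a sketch: the function name nic_numa_node and its optional second argument (an alternative sysfs root, there purely so the function can be tried against a fake directory tree) are assumptions made for illustration.

```shell
# nic_numa_node IFACE [SYSFS_ROOT]
# Prints the NUMA node a NIC is attached to, based on its sysfs entry.
# SYSFS_ROOT defaults to /sys; it is a hypothetical parameter for testing.
nic_numa_node() {
    local iface=$1
    local sysfs=${2:-/sys}
    local file="$sysfs/class/net/$iface/device/numa_node"
    local node

    if [ ! -r "$file" ]; then
        echo "no numa_node entry for $iface" >&2
        return 1
    fi

    node=$(cat "$file")
    if [ "$node" -lt 0 ]; then
        # The kernel reports -1 when the device has no NUMA affinity.
        echo "$iface: no NUMA affinity reported (value $node)"
    else
        echo "$iface: NUMA node $node"
    fi
}
```

On a real host you would call it as nic_numa_node eth0 and then place ovs-vswitchd and the VM on the node it reports.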

3.3. huge page setup

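3.3.1 grub command line

Add the following to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

default_hugepagesz=1G hugepagesz=1G hugepages=4

Then regenerate the grub configuration and reboot:

grub2-mkconfig -o /boot/grub2/grub.cfg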

3.3.2 dynamic

echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

On a NUMA machine, pages should be allocated explicitly on separate nodes:

echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

# echo 4 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

# echo 1024 > /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages
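The page counts written above follow from simple arithmetic: 1024 pages of 2048 kB give 2 GB, and 4 pages of 1048576 kB give 4 GB. A minimal sketch of that calculation, with hypothetical helper names hugepage_count and hugepage_sysfs_path, could look like this:

```shell
# hugepage_count MEM_MB PAGE_KB
# Number of huge pages of PAGE_KB kilobytes needed to cover MEM_MB megabytes,
# rounded up to a whole page.
hugepage_count() {
    local mem_mb=$1
    local page_kb=$2
    echo $(( (mem_mb * 1024 + page_kb - 1) / page_kb ))
}

# hugepage_sysfs_path NODE PAGE_KB
# Builds the per-node nr_hugepages path used in the echo commands above.
hugepage_sysfs_path() {
    local node=$1
    local page_kb=$2
    echo "/sys/devices/system/node/node${node}/hugepages/hugepages-${page_kb}kB/nr_hugepages"
}
```

For example, hugepage_count 4096 1048576 gives the number of 1 GB pages needed to back a 4096 MB guest, and the result can be written to the path printed by hugepage_sysfs_path 1 1048576.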

3.3.3. mount huge pages before use

mkdir /mnt/huge

mount -t hugetlbfs nodev /mnt/huge

# mkdir /dev/hugepages1G

# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G

but since I did not bind the right dpdk driver,

# mkdir /dev/hugepages2M

# mount -t hugetlbfs -o pagesize=2M none /dev/hugepages2M

The mount can also be made permanent through /etc/fstab:

nodev /mnt/huge hugetlbfs defaults 0 0

For 1 GB pages:

nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0

Restart libvirtd afterwards so that it picks up the new hugetlbfs mounts.
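To confirm that the huge pages are really mounted, the mount table can be filtered for hugetlbfs entries. The helper below is a sketch; the name hugetlbfs_mounts and its optional argument (a file in /proc/mounts format, defaulting to /proc/mounts itself) are assumptions for illustration.

```shell
# hugetlbfs_mounts [MOUNTS_FILE]
# Prints the mount point of every hugetlbfs file system, one per line.
# MOUNTS_FILE defaults to /proc/mounts, where field 2 is the mount point
# and field 3 is the file system type.
hugetlbfs_mounts() {
    local mounts=${1:-/proc/mounts}
    awk '$3 == "hugetlbfs" { print $2 }' "$mounts"
}
```

An empty result means nothing is mounted yet, which is one of the common causes of OVS-DPDK or QEMU failing to start.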

3.4 VFIO

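Make sure your CPU and chipset support VT-d (IOMMU) first, and that it is enabled in the firmware.

VFIO is the preferred driver for recent DPDK releases because it is more robust and secure than UIO, relying on IOMMU protection.

Once the hardware supports it, enable the IOMMU on the kernel command line in grub. On Intel hardware the usual parameters are intel_iommu=on iommu=pt.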
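After rebooting, it is worth checking that the IOMMU parameter actually reached the kernel. The sketch below assumes Intel hardware, where the parameter is intel_iommu=on; the function name cmdline_has_intel_iommu is made up for this example, and it takes the command line as a string so it can be tried without touching the host.

```shell
# cmdline_has_intel_iommu CMDLINE
# Succeeds if the given kernel command line contains intel_iommu=on
# as a separate word. Assumption: Intel platform; other platforms use
# different parameters or enable the IOMMU by default.
cmdline_has_intel_iommu() {
    local cmdline=$1
    case " $cmdline " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a running host:
#   cmdline_has_intel_iommu "$(cat /proc/cmdline)" && echo "IOMMU requested"
```

A populated /sys/kernel/iommu_groups directory is a further sign that the IOMMU is active and vfio-pci can be used.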

 

4. OVS+DPDK

OVS uses DPDK as a library. The current SLES packages of OVS are built without it, whereas the openSUSE packages include DPDK support.

The reason SLE disables it is still to be discussed with the Network team.

 

To verify, run the command below; a DPDK-enabled build also prints the DPDK version:

linux-3txe:~ # ovs-vswitchd --version

ovs-vswitchd (Open vSwitch) 2.10.1

DPDK 18.02.2    

4.1 DPDK setup

Load the kernel module first, then bind the device to it:

modprobe igb_uio

dpdk_devbind --status

dpdk_devbind --bind=igb_uio 0000:02:00.0

To release the device again:

dpdk_devbind --unbind 0000:02:00.0

For vfio-pci, load these modules instead:

modprobe vfio

modprobe vfio_pci

Some NICs need the vendor's own driver to be bound directly.

4.2 OVS setup

If your OVS build does not include DPDK, you need to build OVS from source.

For builds that include DPDK:

ovs-vsctl --no-wait init

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x6

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
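The value of dpdk-lcore-mask is a hexadecimal bitmask in which bit N selects CPU core N, so 0x6 (binary 110) selects cores 1 and 2. A small sketch for deriving such a mask from a list of core numbers follows; the function name cores_to_mask is hypothetical.

```shell
# cores_to_mask CORE...
# Prints the hexadecimal CPU mask with one bit set per listed core number.
cores_to_mask() {
    local mask=0
    local core
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, cores_to_mask 1 2 yields the mask used above. Pick cores on the same NUMA node as the NIC where possible.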

4.3 Setup for VM

4.3.1 Add a bridge

ovs-vsctl add-br ovs-br0 -- set bridge ovs-br0 datapath_type=netdev

4.3.2  Add a dpdk port

ovs-vsctl add-port ovs-br0 dpdk-p0 \
   -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:01:00.0

Some NICs (e.g. Mellanox ConnectX-3) have only one PCI address associated with multiple ports, so addressing the device by PCI address as above won't work. Instead, the usage below is suggested:

$ ovs-vsctl add-port ovs-br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
    options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"

4.3.3 Add a vhost-user port (requires QEMU 2.2 or later)

Open vSwitch provides two types of vHost User ports:

  • vhost-user (dpdkvhostuser)
  • vhost-user-client (dpdkvhostuserclient)

vHost User uses a client-server model. The server manages the vHost User sockets, and the client connects to the server. Depending on which port type you use, dpdkvhostuser or dpdkvhostuserclient, a different configuration of the client-server model is used.

For vhost-user ports, Open vSwitch acts as the server and QEMU the client. This means if OVS dies, all VMs must be restarted. On the other hand, for vhost-user-client ports, OVS acts as the client and QEMU the server. This means OVS can die and be restarted without issue, and it is also possible to restart an instance itself. For this reason, vhost-user-client ports are the preferred type for all known use cases; the only limitation is that vhost-user client mode ports require QEMU version 2.7. Ports of type vhost-user are currently deprecated and will be removed in a future release.

For vhost-user

ovs-vsctl add-port ovs-br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser

For vhost-user-client

ovs-vsctl add-port ovs-br0 dpdkvhostclient0 \
    -- set Interface dpdkvhostclient0 type=dpdkvhostuserclient \
       options:vhost-server-path=/tmp/dpdkvhostclient0
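Because the two port types swap the client and server roles, the QEMU side has to match: with dpdkvhostuser QEMU connects as a client, while with dpdkvhostuserclient QEMU must create the socket itself, which the OVS documentation does by adding the server option to the chardev. The sketch below prints the matching -chardev argument; the function name qemu_vhost_chardev is made up for illustration, and newer QEMU releases may expect the spelling server=on instead of the bare server keyword.

```shell
# qemu_vhost_chardev ID SOCKET_PATH OVS_PORT_TYPE
# Prints the QEMU -chardev argument that matches the given OVS port type.
qemu_vhost_chardev() {
    local id=$1
    local path=$2
    local ovs_port_type=$3
    case "$ovs_port_type" in
        dpdkvhostuser)
            # OVS is the server and owns the socket; QEMU connects to it.
            echo "-chardev socket,id=${id},path=${path}"
            ;;
        dpdkvhostuserclient)
            # QEMU is the server and creates the socket; OVS connects to it.
            echo "-chardev socket,id=${id},path=${path},server"
            ;;
        *)
            echo "unknown port type: $ovs_port_type" >&2
            return 1
            ;;
    esac
}
```

For the client port created above, the socket path passed here must be the same as vhost-server-path.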

5 Guest VM 

5.1 QEMU command line

qemu-system-x86_64 -name KVM-VPX -cpu host -enable-kvm -m 4096M \
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
-mem-prealloc -smp sockets=1,cores=2 -drive file=<absolute-path-to-disc-image-file>,if=none,id=drive-ide0-0-0,format=<disc-image-format> \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-netdev type=tap,id=hostnet0,script=no,downscript=no,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:d1:ae,bus=pci.0,addr=0x3 \
-chardev socket,id=char0,path=/usr/local/var/run/openvswitch/vhost-user1 \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
-nographic

5.2. libvirt

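Two parts of the domain XML need to be set up: huge-page-backed memory (which must also be shared with the host, for example through memAccess='shared' on a guest NUMA cell) and the vhost-user interface.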

5.2.1 set up huge page

<memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>

5.2.2 insert vhost user 

<interface type='vhostuser'>
      <mac address='52:54:00:55:55:56'/>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

5.2.3 other optimization:

 <vcpu placement='static'>6</vcpu>
  <cputune>
    <shares>4096</shares>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <emulatorpin cpuset='0,2,4,6'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>

5.3 Guest VM setup

Inside the guest, besides the vhost-user port, you also need isolated CPUs and huge pages.

Here is the VM kernel command line, set in /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=2 isolcpus=1,2,3"

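6. Common issues

6.1 Huge pages must be allocated and mounted before use.

6.2 The OVS bridge needs a dpdk port.

6.3 OVS must be built with the DPDK library.

6.4 The DPDK build must include the poll mode driver for your NIC (mlx, ixgbe, ...).

6.5 Permission problems on the vhost socket (QEMU and OVS must both be able to access it).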

 

7. Reference:

https://wiki.qemu.org/Documentation/vhost-user-ovs-dpdk

http://docs.openvswitch.org/en/latest/topics/dpdk/vhost-user/

https://github.com/qemu/qemu/blob/master/tests/vhost-user-test.c

https://github.com/openvswitch/ovs/blob/master/Documentation/intro/install/dpdk.rst


posted @ 2019-01-15 03:34  lyan_tech