virsh - brctl - common networking configurations used by libvirt


http://wiki.libvirt.org/page/Networking


This page provides an introduction to the common networking configurations used by libvirt based applications. This information applies to all hypervisors, whether Xen, KVM or another. For additional information consult the libvirt network architecture docs.

The two common setups are "virtual network" or "shared physical device". The former is identical across all distributions and available out-of-the-box. The latter needs distribution specific manual configuration.

NAT forwarding (aka "virtual networks")

Host configuration

Every standard libvirt installation provides NAT based connectivity to virtual machines out of the box. This is the so called 'default virtual network'. You can verify that it is available with

# virsh net-list --all
Name                 State      Autostart 
-----------------------------------------
default              active     yes

If it is missing, then the example XML config can be reloaded & activated


# virsh net-define /usr/share/libvirt/networks/default.xml
Network default defined from /usr/share/libvirt/networks/default.xml
# virsh net-autostart default
Network default marked as autostarted
# virsh net-start default
Network default started

When the libvirt default network is running, you will see an isolated bridge device. This device explicitly does *NOT* have any physical interfaces added, since it uses NAT + forwarding to connect to outside world. Do not add interfaces

# brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.000000000000	yes

Libvirt will add iptables rules to allow traffic to/from guests attached to the virbr0 device in the INPUT, FORWARD, OUTPUT and POSTROUTING chains. It will also attempt to enable ip_forward. Some other applications may disable it, so the best option is to add the following to /etc/sysctl.conf

 net.ipv4.ip_forward = 1

If you are already running dnsmasq on your machine, please see libvirtd and dnsmasq.

Guest configuration

Once the host configuration is complete, a guest can be connected to the virtual network based on the network name. E.g. to connect a guest to the 'default' virtual network, you need to edit the domain configuration file for this guest:

  virsh edit <guest>

where <guest> is the name or uuid of the guest. Add the following snippet of XML to the config file:

  <interface type='network'>
     <source network='default'/>
     <mac address='00:16:3e:1a:b3:4a'/>
  </interface>

N.B. the MAC address is optional and will be automatically generated if omitted.

Applying modifications to the network

Sometimes, one needs to edit the network definition and apply the changes on the fly. The most common scenario for this is adding new static MAC+IP mappings for the network's DHCP server. If you edit the network with "virsh net-edit", any changes you make won't take effect until the network is destroyed and re-started, which unfortunately will cause a all guests to lose network connectivity with the host until their network interfaces are explicitly re-attached.

virsh net-update

Fortunately, many changes to the network configuration (including the aforementioned addition of a static MAC+IP mapping for DHCP) can be done with "virsh net-update", which can be told to enact the changes immediately. For example, to add a DHCP static host entry to the network named "default" mapping MAC address 53:54:00:00:01 to IP address 192.168.122.45 and hostname "bob", you could use this command:

    virsh net-update default add ip-dhcp-host \
          "<host mac='52:54:00:00:00:01' \
           name='bob' ip='192.168.122.45' />" \
           --live --config

Along with the "add" subcommand, virsh net-update also has a "delete" sub-command as well as "modify" (for some items), "add-first", and "add-last".

The config items in a network that can be changed with virsh net-update are:

   ip-dhcp-host
   ip-dhcp-range (add/delete only, no modify)
   forward-interface (add/delete only)
   portgroup
   dns-host
   dns-txt
   dns-srv

In each case, the final argument on the commandline (aside from "--live --config") should be the XML section that you want to add/modify or delete. For example, the proper XML for "virsh net-update default add forward-interface" would be something like "<interface dev='eth20'/>" (note the careful use of quotes - due to the XML containing spaces and shell redirection characters, you must put quotes around the entire XML snippet, but this means that any quotes within the XML must either be single quotes, or be escaped with a backslash.)

Arbitrary changes to the network

Although the most common cases of changing network config can be handled with "virsh net-update", there are some parts of the config that can't be modified in this way, and in those cases you will be left with all running guests detached from the network after it is restarted. In order to solve this problem, one possible approach would be to use a script to re-attach all interfaces on all machines after the network has been started.

An example of such script (which worked at one time in the past, and may still work, but has been reported to *not* work by at least one user) is available here. Links to an updated/verified operational script are welcome.

Forwarding Incoming Connections

By default, guests that are connected via a virtual network with <forward mode='nat'/> can make any outgoing network connection they like. Incoming connections are allowed from the host, and from other guests connected to the same libvirt network, but all other incoming connections are blocked by iptables rules.

If you would like to make a service that is on a guest behind a NATed virtual network publicly available, you can setup libvirt's "hook" script for qemu to install the necessary iptables rules to forward incoming connections to the host on any given port HP to port GP on the guest GNAME:

1) Determine a) the name of the guest "G" (as defined in the libvirt domain XML), b) the IP address of the guest "I", c) the port on the guest that will receive the connections "GP", and d) the port on the host that will be forwarded to the guest "HP".

(To assure that the guest's IP address remains unchanged, you can either configure the guest OS with static ip information, or add a <host> element inside the <dhcp> element of the network that is used by your guest. See the libvirt network XML documentation address section for defails and an example.)

2) Stop the guest if it's running.

3) Create the file /etc/libvirt/hooks/qemu (or add the following to an already existing hook script), with contents similar to the following (replace GNAME, IP, GP, and HP appropriately for your setup):

Use the basic script below or see an "advanced" version, which can handle several different machines and port mappings here (improvements are welcome) or here's a python script which does a similar thing and is easy to understand and configure (improvements are welcome):

#!/bin/bash
# used some from advanced script to have multiple ports: use an equal number of guest and host ports

# Update the following variables to fit your setup
Guest_name=GUEST_NAME
Guest_ipaddr=GUEST_IP
Host_ipaddr=HOST_IP
Host_port=(  'HOST_PORT1' 'HOST_PORT2' )
Guest_port=( 'GUEST_PORT1' 'GUEST_PORT2' )

length=$(( ${#Host_port[@]} - 1 ))
if [ "${1}" = "${Guest_name}" ]; then
   if [ "${2}" = "stopped" ] || [ "${2}" = "reconnect" ]; then
       for i in `seq 0 $length`; do
               iptables -t nat -D PREROUTING -d ${Host_ipaddr} -p tcp --dport ${Host_port[$i]} -j DNAT --to ${Guest_ipaddr}:${Guest_port[$i]}
               iptables -D FORWARD -d ${Guest_ipaddr}/32 -p tcp -m state --state NEW -m tcp --dport ${Guest_port[$i]} -j ACCEPT
       done
   fi
   if [ "${2}" = "start" ] || [ "${2}" = "reconnect" ]; then
       for i in `seq 0 $length`; do
               iptables -t nat -A PREROUTING -d ${Host_ipaddr} -p tcp --dport ${Host_port[$i]} -j DNAT --to ${Guest_ipaddr}:${Guest_port[$i]}
               iptables -I FORWARD -d ${Guest_ipaddr}/32 -p tcp -m state --state NEW -m tcp --dport ${Guest_port[$i]} -j ACCEPT
       done
   fi
fi

4) chmod +x /etc/libvirt/hooks/qemu

5) Restart the libvirtd service.

6) Start the guest.

(NB: This method is a hack, and has one annoying flaw in versions of libvirt prior to 0.9.13 - if libvirtd is restarted while the guest is running, all of the standard iptables rules to support virtual networks that were added by libvirtd will be reloaded, thus changing the order of the above FORWARD rule relative to a reject rule for the network, hence rendering this setup non-working until the guest is stopped and restarted. Thanks to the new "reconnect" hook in libvirt-0.9.13 and newer (which is used by the above script if available), this flaw is not present in newer versions of libvirt (however, this hook script should still be considered a hack).

Bridged networking (aka "shared physical device")

Host configuration

The NAT based connectivity is useful for quick & easy deployments, or on machines with dynamic/sporadic networking connectivity. More advanced users will want to use full bridging, where the guest is connected directly to the LAN. The instructions for setting this up vary by distribution, and even by release.

Important Note: Unfortunately, wireless interfaces cannot be attached to a Linux host bridge, so if your connection to the external network is via a wireless interface ("wlanX"), you will not be able to use this mode of networking for your guests.

Important Note: If, after trying to use the bridge interface, you find your network link becomes dead and refuses to work again, it might be that the router/switch upstream is blocking "unauthorized switches" in the network (for example, by detecting BPDU packets). You'll have to change its configuration to explicitly allow the host machine/network port as a "switch".

Fedora/RHEL Bridging

This outlines how to setup briding using standard network initscripts

Disabling Xen's network scripts

If using Xen it is recommended to disable its network munging by editing /etc/xen/xend-config.sxp and changing the line

 (network-script network-bridge)

To be

 (network-script /bin/true)
Disabling NetworkManager

As of the time of writing (Fedora 12), NetworkManager still does not support bridging, so it is necessary to use "classic" network initscripts for the bridge, and to explicitly mark them as independent from NetworkManager (the "NM_CONTROLLED=no" lines in the scripts below).

If desired, you can also completely disable the NetworkManager:

# chkconfig NetworkManager off
# chkconfig network on
# service NetworkManager stop
# service network start
Creating network initscripts

In the /etc/sysconfig/network-scripts directory it is neccessary to create 2 config files. The first (ifcfg-eth0) defines your physical network interface, and says that it will be part of a bridge:

# cat > ifcfg-eth0 <<EOF
DEVICE=eth0
HWADDR=00:16:76:D6:C9:45
ONBOOT=yes
BRIDGE=br0
NM_CONTROLLED=no
EOF

Obviously change the HWADDR to match your actual NIC's address. You may also wish to configure the device's MTU here using e.g. MTU=9000.

The second config file (ifcfg-br0) defines the bridge device:

# cat > ifcfg-br0 <<EOF
DEVICE=br0
TYPE=Bridge
BOOTPROTO=dhcp
ONBOOT=yes
DELAY=0
NM_CONTROLLED=no
EOF

WARNING: The line TYPE=Bridge is case-sensitive - it must have uppercase 'B' and lower case 'ridge'

After changing this restart networking (or simply reboot)

 # service network restart

The final step is to disable netfilter on the bridge:

 # cat >> /etc/sysctl.conf <<EOF
 net.bridge.bridge-nf-call-ip6tables = 0
 net.bridge.bridge-nf-call-iptables = 0
 net.bridge.bridge-nf-call-arptables = 0
 EOF
 # sysctl -p /etc/sysctl.conf

It is recommended to do this for performance and security reasons. See Fedora bug #512206. Alternatively you can configure iptables to allow all traffic to be forwarded across the bridge:

# echo "-I FORWARD -m physdev --physdev-is-bridged -j ACCEPT" > /etc/sysconfig/iptables-forward-bridged
# lokkit --custom-rules=ipv4:filter:/etc/sysconfig/iptables-forward-bridged
# service libvirtd reload

You should now have a "shared physical device", to which guests can be attached and have full LAN access

 # brctl show
 bridge name     bridge id               STP enabled     interfaces
 virbr0          8000.000000000000       yes
 br0             8000.000e0cb30550       yes             eth0

Note how this bridge is completely independant of the virbr0. Do *NOT* attempt to attach a physical device to 'virbr0' - this is only for NAT connectivity

Debian/Ubuntu Bridging

This outlines how to setup bridging using standard network interface config files

Disabling NetworkManager

Stop network manager

 sudo stop network-manager

Create an override file for the upstart job:

 echo "manual" | sudo tee /etc/init/network-manager.override

from https://help.ubuntu.com/community/NetworkManager#Disabling_NetworkManager

Altering the interface config

First take down the interface you wish to bridge

 ifdown eth0

Edit /etc/network/interfaces and find the config for the physical interface, which looks something like

 allow-hotplug eth0
 iface eth0 inet static
        address 192.168.2.4
        netmask 255.255.255.0
        network 192.168.2.0
        broadcast 192.168.2.255
        gateway 192.168.2.2

Remove the 'allow-hotplug eth0' line, replacing it with 'auto br0', and change the next line with iface name to 'br0', so it now starts with

 auto br0
 iface br0 inet static

And then define the interface as being a bridge and specify its ports

       bridge_ports eth0
       bridge_stp on
       bridge_maxwait 0
       bridge_fd 0

Note: bridge_stp may cause issues with passing DHCP information to/from guest machines. If you expirence problems, try omitting the "bridge_stp on" line and restart the interface.

The complete config should now look like

 auto br0
 iface br0 inet static
         address 192.168.2.4
         netmask 255.255.255.0
         network 192.168.2.0
         broadcast 192.168.2.255
         gateway 192.168.2.2
         bridge_ports eth0
         bridge_stp on
         bridge_maxwait 0

The interface can now be started with

 ifup br0

Finally add the '/etc/sysctl.conf' settings

net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0

And then load the settings with

 sysctl -p /etc/sysctl.conf  

1) To ensure that the bridge sysctl settings get loaded on boot, add this line to '/etc/rc.local' just before the 'exit 0' line. This is a work around for Ubuntu bug #50093. 2) Also to stop Circumventing Path MTU Discovery issues with MSS Clamping

 *** Sample rc.local file ***
 /sbin/sysctl -p /etc/sysctl.conf
 iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS  --clamp-mss-to-pmtu
 exit 0

To verify that the changes have taken affect, please run

 tail /proc/sys/net/bridge/*
 iptables -L  FORWARD

You should now have a "shared physical device", to which guests can be attached and have full LAN access

 # brctl show
 bridge name     bridge id               STP enabled     interfaces
 virbr0          8000.000000000000       yes
 br0             8000.000e0cb30550       yes             eth0

Note how this bridge is completely independant of the virbr0. Do *NOT* attempt to attach a physical device to 'virbr0' - this is only for NAT connectivity

Guest configuration

In order to let your virtual machines use this bridge, their configuration should include the interface definition as described in Bridge to LAN. In essence you are specifying the bridge name to connect to. Assuming a shared physical device where the bridge is called "br0", the following guest XML would be used:

 <interface type='bridge'>
    <source bridge='br0'/>
    <mac address='00:16:3e:1a:b3:4a'/>
    <model type='virtio'/>   # try this if you experience problems with VLANs
 </interface>

NB, the mac address is optional and will be automatically generated if omitted.

To edit the virtual machine's configuration, use:

virsh edit <VM name>

For more information, see the FAQ entry at:

http://wiki.libvirt.org/page/FAQ#Where_are_VM_config_files_stored.3F_How_do_I_edit_a_VM.27s_XML_config.3F

PCI Passthrough of host network devices

It is possible to directly assign a host's PCI network device to a guest. One pre-requisite for doing this assignment is that the host must support either the Intel VT-d or AMD IOMMU extensions. There are two methods of setting up assignment of a PCI device to a guest:

Assignment with <hostdev>

This is the traditional method of assigning any generic PCI device to a guest. It's covered well in the following guide:

libvirt PCI Device Assignment

Assignment with <interface type='hostdev'> (SRIOV devices only)

SRIOV network cards provide multiple "Virtual Functions" (VF) that can each be individually assigned to a guest using PCI device assignment, and each will behave as a full physical network device. This permits many guests to gain the performance advantage of direct PCI device assignment, while only using a single slot on the physical machine.

These VFs can be assigned to guests in the traditional manner using <hostdev>, however that method ends up being problematic because (unlike regular network devices) SRIOV VF network devices do not have permanent unique MAC addresses, but are instead given a new and different random MAC address each time the host OS is rebooted. The result will be that even if the guest is assigned the same VF each time, any time the host is rebooted the guest will see that its network adapter has a new MAC address, which will lead to the guest believing there is new hardware connected, requiring re-configuration of the guest's network settings.

It is possible for the host to set the MAC address prior to assigning the VF to the guest, but there is no provision for this in the <hostdev> settings (since <hostdev> is for a generic PCI device, it knows nothing of function-specific items like MAC address). In order to solve this problem, libvirt-0.9.10 added a new <interface type='hostdev'> (documented here). This new type of interface device behaves as a hybrid of an <interface> and a <hostdev> - libvirt will first do any network-specific hardware/switch initialization indicated (such as setting the MAC address, and/or associating with an 802.1Qbh switch), then perform the PCI device assignment to the guest.

In order to use <interface type='hostdev'>, you must have an SRIOV-capable network card, host hardware that supports either the Intel VT-d or AMD IOMMU extensions, and you must learn the PCI address of the VF that you wish to assign (see this document for instructions on how to do that).

Once you have verified/learned the above information, you can edit your guest's domain configuration to have a device entry like the following:

 ...
 <devices>
   ...
   <interface type='hostdev' managed='yes'>
     <source>
       <address type='pci' domain='0x0' bus='0x00' slot='0x07' function='0x0'/>
     </source>
     <mac address='52:54:00:6d:90:02'>
     <virtualport type='802.1Qbh'>
       <parameters profileid='finance'/>
     </virtualport>
   </interface>
   ...
 </devices>

(Note that if you do not provide a mac address, one will be automatically generated, just as with any other type of interface device. Also, the <virtualport> element is only used if you are connecting to an 802.11Qgh hardware switch (802.11Qbg (a.k.a. "VEPA") switches are currently not supported in this mode)).

When the guest starts, it should see a network device of the type provided by the physical adapter, with the configured MAC address. This MAC address will remain unchanged across guest and host reboots.

Assignment from a pool of SRIOV VFs in a libvirt <network> definition

Hard coding the PCI address of a particular VF into a guest's configuration has two serious limitations:

1) The specified VF must be available any time the guest is started, implying that the administrator must permanently assign each VF to a single guest (or modify the configuration of a guest to specify a currently unused VF's PCI address each time the guest is started).

2) If the guest is moved to another host, that host must have exactly the same hardware in the same location on the PCI bus (or, again, the guest configuration must be modified prior to start).

Starting with libvirt 0.10.0, it is possible to avoid both of these problems by creating a libvirt network with a device pool containing all the VFs of an SR-IOV device, then configuring the guest to reference this network; each time the guest is started, a single VF will be allocated from the pool and assigned to the guest; when the guest is stopped, the VF will be returned to the pool for use by another guest.

The following is an example network definition that will make available a pool of all VFs for the SR-IOV adapter with its PF (Physical Function) at "eth3' on the host:

 <network>
   <name>passthrough</name>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth3'/>
   </forward>
 </network>

To use this network, place the above text in, e.g., /tmp/passthrough.xml (replaceing "eth3" with the netdev name of your own SR-IOV device's PF), then execute the following commands:

  virsh net-define /tmp/passthrough.xml
  virsh net-autostart passthrough
  virsh net-start passthrough.

Although only a single device is shown, libvirt will automatically derive the list of all VFs associated with that PF the first time a guest is started with an interface definition like the following:

  <interface type='network'>
    <source network='passthrough'>
  </interface>

You can verify this by running "virsh net-dumpxml passthrough" after starting the first guest that uses the network; you will get output similar to the following:

 <network connections='1'>
   <name>passthrough</name>
   <uuid>a6b49429-d353-d7ad-3185-4451cc786437</uuid>
   <forward mode='hostdev' managed='yes'>
     <pf dev='eth3'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x1'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x3'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x5'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x10' function='0x7'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x1'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x3'/>
     <address type='pci' domain='0x0000' bus='0x02' slot='0x11' function='0x5'/>
   </forward>
 </network>

Other networking docs/links



posted @ 2016-07-25 15:41  张同光  阅读(162)  评论(0编辑  收藏  举报