Problem
VLAN devices do not support scatter-gather
This means that each skb must be linearised, and therefore cloned, if it is transmitted on a VLAN device
Cloning results in the original fragments being released
This breaks Xen's netfront/netback flow-control
Result
A guest can flood dom0 with packets
Very effective DoS attack on dom0 and other domUs
Work-Around
Use the credit scheduler to limit the rate of a domU's virtual interface to something close to the rate of the physical interface:
vif = [ "mac=00:16:36:6c:81:ae,bridge=eth4.100,script=vif-bridge,rate=950Mb/s" ]
Still uses quite a lot of dom0 CPU if domU sends a lot of packets
But the DoS is mitigated
Partial Solution
scatter-gather enabled VLAN interfaces
Problem is resolved for VLANs on physical devices that support scatter-gather
Still a problem for any other device that doesn't support scatter-gather
Patches
Included in v2.6.26-rc4
"Propagate selected feature bits to VLAN devices" and
"Use bitmask of feature flags instead of separate feature bit" by Patrick McHardy.
"igb: allow vlan devices to use TSO and TCP CSUM offload" by Jeff Kirsher
Patches for other drivers have also been merged
Problem 2: Bonding and Lack of Queues
Problem
The default queue on bond devices is no queue
This is because it is a software device, and generally queuing doesn't make sense on software devices
The queue length of a qdisc defaults to the txqueuelen of its device
Result
It was observed that netperf TCP_STREAM achieves only 45-50 Mbit/s when controlled by a class with a ceiling of 450 Mbit/s
A 10x degradation!
Solution 1a
Set the queue length of the bonding device before adding qdiscs
ip link set txqueuelen 1000 dev bond0
Solution 1b
Set the queue length of the qdisc explicitly
tc qdisc add dev bond0 parent 1:100 handle 1100: pfifo limit 1000
Problem 3: TSO and Lack of Accounting Accuracy
Problem
If a packet is significantly larger than the MTU of the class, it is accounted as being approximately the size of the MTU.
And the giants counter for the class is incremented
For HTB classes, the default MTU is 2047 bytes
But TCP Segmentation Offload (TSO) packets can be much larger: up to 64 kbytes
By default Xen domUs will use TSO
Result
The result is similar to having no bandwidth control for TCP traffic
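The scale of the error follows from simple arithmetic on the sizes above:

```python
# With the default 2047-byte class MTU, an oversized packet is accounted
# as roughly MTU-sized, so a maximal TSO packet is heavily undercharged.
mtu = 2047          # default HTB class MTU (bytes)
tso = 64 * 1024     # maximal TSO packet (bytes)

undercharge = tso / mtu
print(f"each 64 KB TSO packet is billed at ~1/{round(undercharge)} of its real size")
```

A class can therefore pass roughly 32 times its configured ceiling before HTB believes the ceiling has been reached.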
Workaround 1
Disable TSO in the guest, but the guest can re-enable it
# ethtool -k eth0 | grep "tcp segmentation offload"
tcp segmentation offload: on
# ethtool -K eth0 tso off
# ethtool -k eth0 | grep "tcp segmentation offload"
tcp segmentation offload: off
Workaround 2
Set the MTU of classes to 40000
Large enough to give sufficient accuracy
Larger values will result in a loss of accuracy when accounting smaller packets
# tc class add dev peth2 parent 1:1 classid 1:101 rate 10Mbit ceil 950Mbit mtu 40000
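Why 40000 is a reasonable compromise can be sketched with a small model of how tc picks the rate-table granularity (cell_log) from the class MTU. The function names and details below are illustrative assumptions, not the actual tc/kernel source:

```python
def cell_log_for_mtu(mtu):
    """Pick the smallest cell_log such that mtu >> cell_log fits in the
    256-entry rate table (illustrative model)."""
    cell_log = 0
    while (mtu >> cell_log) > 255:
        cell_log += 1
    return cell_log

def charged_size(size, cell_log):
    """A packet is billed as the top of its cell; the table index is
    capped at 255, so anything past the MTU is truncated."""
    index = min((size - 1) >> cell_log, 255)
    return (index + 1) << cell_log

# Default MTU 2047 -> cell_log 3: a 64 KB TSO packet is billed as 2 KB
assert charged_size(64 * 1024, cell_log_for_mtu(2047)) == 2048

# MTU 40000 -> cell_log 8: the table now reaches 256 * 256 = 64 KB,
# so maximal TSO packets are billed at their true size...
assert charged_size(64 * 1024, cell_log_for_mtu(40000)) == 64 * 1024

# ...at the cost of rounding every small packet up to 256-byte granularity
assert charged_size(64, cell_log_for_mtu(40000)) == 256
```

This is the accuracy trade-off mentioned above: a larger MTU covers TSO packets, but small packets are rounded up more coarsely.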
Solution
Account for large packets
Instead of truncating the index, use rtab values multiple times
rtab[255] * (index >> 8) + rtab[index & 0xFF]
"Make HTB scheduler work with TSO" by Ranjit Manomohan was included in 2.6.23-rc1
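The fixed lookup can be sketched with a simplified model of HTB's 256-entry rate table (rtab). The names follow the kernel, but the table construction here is illustrative, not the kernel's exact code:

```python
# Simplified model of an HTB rate table: rtab[i] approximates the time
# (in usecs) to send a packet of size (i + 1) << cell_log at the given
# rate. Illustrative only.

CELL_LOG = 3                 # default 2047-byte MTU: 2047 >> 3 == 255
RATE = 950_000_000 // 8      # 950 Mbit/s in bytes per second

rtab = [((i + 1) << CELL_LOG) * 1_000_000 // RATE for i in range(256)]

def l2t_truncated(size):
    """Old behaviour: the index is capped at 255, so a 64 KB TSO packet
    costs the same as a ~2 KB packet."""
    index = min((size - 1) >> CELL_LOG, 255)
    return rtab[index]

def l2t_fixed(size):
    """Fixed behaviour: reuse rtab values for oversized packets:
    rtab[255] * (index >> 8) + rtab[index & 0xFF]."""
    index = (size - 1) >> CELL_LOG
    return rtab[255] * (index >> 8) + rtab[index & 0xFF]
```

For a 64 KB TSO packet the fixed lookup charges 32 times what the truncated lookup does, restoring linear accounting; for packets within the MTU the two agree.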