nvgre
GRE RFC 2784: How It Works
Structure of a GRE Encapsulated Packet
A GRE encapsulated packet has the form:
---------------------------------
|                               |
|       Delivery Header         |
|                               |
---------------------------------
|                               |
|       GRE Header              |
|                               |
---------------------------------
|                               |
|       Payload packet          |
|                               |
---------------------------------
This specification is generally concerned with the structure of the
GRE header, although special consideration is given to some of the
issues surrounding IPv4 payloads.
GRE Header
The GRE packet header has the form:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|       Reserved0       | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Reserved1 (Optional)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
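The layout above can be exercised with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `pack_gre` and `unpack_gre` are made up for illustration, and the checksum value is supplied by the caller rather than computed over the packet.

```python
import struct
from typing import Optional

C_BIT = 0x8000  # bit 0 of the header is the most significant bit on the wire


def pack_gre(protocol_type: int, checksum: Optional[int] = None) -> bytes:
    """Build an RFC 2784 GRE header; the caller supplies the checksum value."""
    flags_and_version = C_BIT if checksum is not None else 0  # Ver stays 0
    header = struct.pack("!HH", flags_and_version, protocol_type)
    if checksum is not None:
        header += struct.pack("!HH", checksum, 0)  # Checksum, then Reserved1 = 0
    return header


def unpack_gre(data: bytes) -> dict:
    """Decode the fixed 4 bytes and, if the C bit is set, the checksum word."""
    flags_and_version, protocol_type = struct.unpack_from("!HH", data, 0)
    fields = {
        "checksum_present": bool(flags_and_version & C_BIT),
        "version": flags_and_version & 0x0007,
        "protocol_type": protocol_type,
        "header_length": 4,
        "checksum": None,
    }
    if fields["checksum_present"]:
        fields["checksum"] = struct.unpack_from("!H", data, 4)[0]
        fields["header_length"] = 8
    return fields


print(pack_gre(0x0800).hex())
```

Here 0x0800 is the EtherType for an IPv4 payload. Without the C bit the header is 4 bytes long; with it, the Checksum and Reserved1 fields bring it to 8 bytes.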
Key and Sequence Number Extensions to GRE RFC 2890
Extensions to GRE Header
The GRE packet header[1] has the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|       Reserved0       | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Reserved1 (Optional)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The proposed GRE header will have the following format:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C| |K|S| Reserved0       | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Reserved1 (Optional)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Key (optional)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Sequence Number (Optional)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Key Present (bit 2)
If the Key Present bit is set to 1, then it indicates that the
Key field is present in the GRE header. Otherwise, the Key
field is not present in the GRE header.
Sequence Number Present (bit 3)
If the Sequence Number Present bit is set to 1, then it
indicates that the Sequence Number field is present.
Otherwise, the Sequence Number field is not present in the GRE
header.
The Key and the Sequence Present bits are chosen to be
compatible with RFC 1701 [2].
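Because each optional field is present only when its flag bit is set, a parser has to walk the header in order: Checksum, then Key, then Sequence Number. The following is a rough sketch of that walk; `parse_gre_2890` is an illustrative name, not an API from any library.

```python
import struct

# Flag masks within the first 16 bits; bit 0 is the most significant bit.
C_BIT = 0x8000  # Checksum Present (bit 0)
K_BIT = 0x2000  # Key Present (bit 2)
S_BIT = 0x1000  # Sequence Number Present (bit 3)


def parse_gre_2890(data: bytes) -> dict:
    """Decode a GRE header with the optional RFC 2890 Key and Sequence fields."""
    flags, protocol_type = struct.unpack_from("!HH", data, 0)
    offset = 4
    result = {"protocol_type": protocol_type,
              "checksum": None, "key": None, "sequence": None}
    if flags & C_BIT:
        result["checksum"] = struct.unpack_from("!H", data, offset)[0]
        offset += 4  # Checksum (2 bytes) plus Reserved1 (2 bytes)
    if flags & K_BIT:
        result["key"] = struct.unpack_from("!I", data, offset)[0]
        offset += 4
    if flags & S_BIT:
        result["sequence"] = struct.unpack_from("!I", data, offset)[0]
        offset += 4
    result["header_length"] = offset  # the payload starts here
    return result


# Example: K and S set, IPv4 payload, key 42, sequence number 7.
sample = struct.pack("!HHII", K_BIT | S_BIT, 0x0800, 42, 7)
print(parse_gre_2890(sample))
```

The header length therefore ranges from 4 bytes (no flags) to 16 bytes (C, K and S all set).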
NVGRE RFC 7637
NVGRE is a tunneling protocol that uses the GRE encapsulation defined in RFC 2784 and extended by RFC 2890. See also Microsoft's blog.
Outer Ethernet Header:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                (Outer) Destination MAC Address                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(Outer)Destination MAC Address |  (Outer)Source MAC Address    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (Outer) Source MAC Address                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Ethertype 0x0800        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Outer IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  HL   |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol 0x2F |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    (Outer) Source Address                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (Outer) Destination Address                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

GRE Header:
The Key Present (K) bit is set to 1.
The Protocol Type field in the GRE header is set to 0x6558.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|   Reserved0     | Ver |     Protocol Type 0x6558      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Virtual Subnet ID (VSID)            |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Inner Ethernet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                (Inner) Destination MAC Address                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(Inner)Destination MAC Address |  (Inner)Source MAC Address    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (Inner) Source MAC Address                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Ethertype 0x0800        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Inner IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  HL   |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Source Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Original IP Payload                      |
|                                                               |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: GRE Encapsulation Frame Format
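In the NVGRE frame, the 32-bit GRE Key field is split into a 24-bit Virtual Subnet ID followed by an 8-bit FlowID, with K = 1, C = 0, S = 0 and Protocol Type 0x6558. A small sketch of packing and unpacking just that 8-byte GRE header follows; `pack_nvgre` and `unpack_nvgre` are hypothetical helper names.

```python
import struct
from typing import Tuple

NVGRE_FLAGS = 0x2000          # C = 0, K = 1, S = 0, Ver = 0
NVGRE_PROTOCOL_TYPE = 0x6558  # Transparent Ethernet Bridging


def pack_nvgre(vsid: int, flow_id: int = 0) -> bytes:
    """Build the 8-byte NVGRE GRE header from a 24-bit VSID and 8-bit FlowID."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", NVGRE_FLAGS, NVGRE_PROTOCOL_TYPE, key)


def unpack_nvgre(header: bytes) -> Tuple[int, int]:
    """Return (vsid, flow_id), rejecting headers that are not NVGRE-shaped."""
    flags, protocol_type, key = struct.unpack_from("!HHI", header, 0)
    # Check only the C, K, S bits and the version; reserved bits are ignored.
    if (flags & 0xB007) != NVGRE_FLAGS or protocol_type != NVGRE_PROTOCOL_TYPE:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF


header = pack_nvgre(5001, flow_id=3)
print(header.hex(), unpack_nvgre(header))
```

VSID 5001 is 0x001389, so the key word on the wire is 00 13 89 followed by the FlowID byte.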
The best starting place is to first lay out the addressing scheme for IP addresses and subnets that you'd like to virtualize. When configuring Hyper-V Network Virtualization, there are two types of IP Addresses that you'll be interacting with:
- Provider Addresses (PA) - these are unique IP addresses assigned to each Hyper-V host that are routable across the physical network infrastructure. I like to think of "PA" addresses as "Physical Addresses", because they are assigned to physical Hyper-V hosts. Each Hyper-V host requires at least one PA to be assigned.
- Customer Addresses (CA) - these are unique IP addresses assigned to each Virtual Machine that will be participating on a virtualized network. I like to think of "CA" addresses as "Container Addresses", because they are the IP Addresses assigned to each VM "container" for use by the guest operating system running inside that VM. Using NVGRE, multiple CA's for VMs running on a Hyper-V host can be tunneled using a single PA on that Hyper-V host. CA's must be unique across all VMs on the same virtualized network, but CA's do not need to be unique across virtualized networks (such as in multi-tenant scenarios where each customer's VMs are isolated on separate virtualized networks).
Let's look at a simple example of NVGRE with two Hyper-V hosts using PA's and CA's:
In this example, you'll note that each Hyper-V host is assigned one PA address (e.g., 192.168.x.x) used for tunneling NVGRE traffic across two physical subnets (e.g., 192.168.1.x/24 and 192.168.2.x/24) on the physical network. In addition, each VM is assigned a CA address (e.g., 10.x.x.x) that is unique within each virtualized network and is tunneled inside the NVGRE tunnel between hosts.
To separate the traffic between the two virtualized networks, the GRE headers on the tunneled packets include a GRE Key that provides a unique Virtual Subnet ID (e.g., 5001 and 6001) for each virtualized network.
Based on this configuration, we have two virtualized networks (e.g., the "Red" network and the "Blue" network) that are isolated from one another as separate IP networks and extended across two physical Hyper-V hosts located on two different physical subnets.
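One way to picture the isolation described here is as a lookup keyed by (Virtual Subnet ID, CA) that yields the PA of the host to tunnel to. The sketch below is only a mental model, not how Hyper-V stores its policy; the concrete addresses are invented to match the 192.168.1.x, 192.168.2.x and 10.x.x.x patterns in the example.

```python
RED_VSID = 5001
BLUE_VSID = 6001

# (VSID, customer address) -> provider address of the host running that VM.
# All concrete addresses here are hypothetical.
LOOKUP = {
    (RED_VSID, "10.0.0.5"): "192.168.1.10",
    (RED_VSID, "10.0.0.7"): "192.168.2.20",
    (BLUE_VSID, "10.0.0.5"): "192.168.1.10",
    (BLUE_VSID, "10.0.0.7"): "192.168.2.20",
}


def encapsulation_target(vsid: int, destination_ca: str) -> dict:
    """Return the outer destination PA and the GRE key VSID for a packet."""
    provider_address = LOOKUP[(vsid, destination_ca)]
    return {"outer_destination": provider_address, "vsid": vsid}


print(encapsulation_target(RED_VSID, "10.0.0.7"))
print(encapsulation_target(BLUE_VSID, "10.0.0.7"))
```

The same CA (10.0.0.7) appears in both the Red and Blue networks without conflict because the VSID is part of the key, and a single PA per host carries traffic for both networks.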
Once you have the following defined for your environment in a worksheet, you're ready to move on to the next steps in configuring Hyper-V Network Virtualization:
- PA's for each Hyper-V Host
- CA's for each Virtual Machine
- Virtual Subnet ID's for each subnet to be virtualized
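Before moving on, the worksheet can be sanity-checked against the rule that CAs must be unique within a virtualized network. A small sketch, assuming the worksheet is kept as rows of (VM name, Virtual Subnet ID, CA, host PA); the row layout and all values are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int, str, str]  # (vm_name, vsid, ca, host_pa)


def duplicate_cas(worksheet: List[Row]) -> List[Tuple[int, str]]:
    """Return every (VSID, CA) pair assigned to more than one VM."""
    counts = Counter((vsid, ca) for _, vsid, ca, _ in worksheet)
    return sorted(pair for pair, n in counts.items() if n > 1)


worksheet = [
    ("red-vm1", 5001, "10.0.0.5", "192.168.1.10"),
    ("red-vm2", 5001, "10.0.0.7", "192.168.2.20"),
    ("blue-vm1", 6001, "10.0.0.5", "192.168.1.10"),  # same CA, other VSID: fine
    ("red-vm3", 5001, "10.0.0.5", "192.168.2.20"),   # clashes with red-vm1
]
print(duplicate_cas(worksheet))
```

Only the pair (5001, "10.0.0.5") is reported; reusing 10.0.0.5 under VSID 6001 is allowed.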
Understanding Neutron (3): Open vSwitch + GRE/VxLAN Networking [Neutron Open vSwitch + GRE/VxLAN Virtual Network]
Tunneling And Network Virtualization: NVGRE, VXLAN
Demo:
Using GRE Tunnels with Open vSwitch
Does plain GRE need an ARP proxy?
script: gre.sh
#!/bin/bash
# Prerequisite: sudo apt install bridge-utils
REMOTE_IP=$1
SUBNET=$2     # HOST1: 192.168.0.1, HOST2: 192.169.0.1
GREIP=$3      # HOST1: 10.10.10.1, HOST2: 10.10.10.2
R_GREIP=$4    # HOST1: 10.10.10.2, HOST2: 10.10.10.1
DEV=$5
# IPv4 address of the underlay device, without the prefix length
LOCAL_IP=$(ip addr show "$DEV" | awk '/inet /{split($2,a,"/"); print a[1]}')

sudo ip tunnel add gre1 mode gre remote "$REMOTE_IP" local "$LOCAL_IP" ttl 255
sudo ip link set gre1 up
sudo ip addr add "$GREIP/24" dev gre1    # assign an IP address to gre1
# sudo ip route add ${SUBNET%.*}.0/24 via $R_GREIP dev gre1    # does not work
sudo ip route add "${SUBNET%.*}.0/24" dev gre1    # route the subnet through gre1
# Enable forwarding. "sudo echo 1 > /proc/sys/net/ipv4/ip_forward" fails for
# non-root users because the redirect is performed by the calling shell.
sudo sysctl -w net.ipv4.ip_forward=1

# HOST1
# Without this rule the ${SUBNET%.*}.0/24 network is unreachable.
sudo iptables -t nat -A POSTROUTING -d "${SUBNET%.*}.0/24" -j SNAT --to "$GREIP"
# HOST2
# Without this rule hosts such as 192.168.1.x cannot reach the 10.1.1.x network.
sudo iptables -t nat -A POSTROUTING -s "$GREIP" -d "${SUBNET%.*}.0/24" -j SNAT --to "$SUBNET"
# Block direct access to the production port 3306 to protect the internal network.
sudo iptables -A FORWARD -s "$GREIP" -m state --state NEW -m tcp -p tcp --dport 3306 -j DROP

sudo brctl addbr br1
# sudo ifconfig br1 192.169.0.7/24
sudo ip link set br1 up
# sudo brctl addif br1 gre1    # does not work
sudo ip link add type veth
sudo ifconfig veth0 "${SUBNET%.*}.7/24" up
sudo ifconfig veth0 mtu 1450
sudo ifconfig veth1 up
sudo ifconfig veth1 mtu 1450
sudo brctl addif br1 veth1
ip route show
on host 1: $ ./gre.sh 10.0.0.52 192.168.0.1 10.10.10.1 10.10.10.2 ens3
on host 2: $ ./gre.sh 10.0.0.32 192.169.0.1 10.10.10.2 10.10.10.1 ens3
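The script derives a /24 network from its SUBNET argument with the shell expansion `${SUBNET%.*}.0/24`, and assigns the GRE addresses with a /24 prefix. The same derivation can be checked offline with Python's `ipaddress` module before running the script on either host. This is an illustrative sketch; `subnet_of` and `same_tunnel_subnet` are made-up helper names.

```python
import ipaddress


def subnet_of(address: str, prefix_len: int = 24) -> ipaddress.IPv4Network:
    """Network containing `address`, like ${SUBNET%.*}.0/24 for prefix 24."""
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def same_tunnel_subnet(local_gre_ip: str, remote_gre_ip: str,
                       prefix_len: int = 24) -> bool:
    """True if both tunnel endpoint addresses fall in the same network."""
    return ipaddress.ip_address(remote_gre_ip) in subnet_of(local_gre_ip, prefix_len)


# Arguments from the two invocations above.
print(subnet_of("192.168.0.1"))   # host 1 SUBNET argument
print(subnet_of("192.169.0.1"))   # host 2 SUBNET argument
print(same_tunnel_subnet("10.10.10.1", "10.10.10.2"))
```

With the arguments shown, the two hosts route different /24 networks (192.168.0.0/24 and 192.169.0.0/24) through gre1, while both GRE endpoint addresses sit in 10.10.10.0/24.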
on host 1
sudo ovs-vsctl add-br br0
sudo ovs-vsctl add-port br0 tep0 -- set interface tep0 type=internal
sudo ifconfig tep0 192.168.200.20 netmask 255.255.255.0
sudo ovs-vsctl add-br br2
sudo ovs-vsctl add-port br2 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.200.21
route
# ip link add br0 type bridge
sudo ip tuntap add mode tap
sudo ifconfig tap0 192.168.200.20 netmask 255.255.255.0
sudo ip link set tap0 up
sudo ip link set br1 up
sudo brctl addif br1 tap0
sudo brctl addif br1 ens3 # this command breaks network connectivity
sudo ip link add type veth
sudo ifconfig veth0 192.167.0.6/24 up
sudo ifconfig veth0 mtu 1450
sudo ifconfig veth1 up
sudo ifconfig veth1 mtu 1450
sudo ovs-vsctl add-port br2 veth1
$ sudo ovs-vsctl add-port br0 ens3 # this command breaks network connectivity
on host 2
sudo ovs-vsctl add-br br0
sudo ovs-vsctl add-port br0 tep0 -- set interface tep0 type=internal
sudo ifconfig tep0 192.168.200.21 netmask 255.255.255.0
sudo ovs-vsctl add-br br2
sudo ovs-vsctl add-port br2 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.200.20
route
$ sudo ovs-vsctl show
ffb98c3f-a7a4-4287-b84a-c7c2b2616c72
    Bridge "br0"
        Port "tep0"
            Interface "tep0"
                type: internal
        Port "br0"
            Interface "br0"
                type: internal
    Bridge "br2"
        Port "br2"
            Interface "br2"
                type: internal
        Port "gre0"
            Interface "gre0"
                type: gre
                options: {remote_ip="192.168.200.21"}
    ovs_version: "2.5.2"
$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         localhost       0.0.0.0         UG    0      0        0 ens3
10.0.0.0        *               255.255.255.0   U     0      0        0 ens3
169.254.169.254 localhost       255.255.255.255 UGH   0      0        0 ens3
192.168.200.0   *               255.255.255.0   U     0      0        0 tep0
$ sudo ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:88:b0:29 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether b6:98:ba:ee:7d:b6 brd ff:ff:ff:ff:ff:ff
4: br2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether a2:58:66:5a:94:4a brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/ether 3e:2f:8d:26:56:47 brd ff:ff:ff:ff:ff:ff
6: tep0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/ether 62:32:8c:1d:2b:99 brd ff:ff:ff:ff:ff:ff
7: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
8: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether a6:ee:6f:a2:0e:22 brd ff:ff:ff:ff:ff:ff
$ sudo ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:88:b0:29 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.54/24 brd 10.0.0.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe88:b029/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1
    link/ether b6:98:ba:ee:7d:b6 brd ff:ff:ff:ff:ff:ff
4: br2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1
    link/ether a2:58:66:5a:94:4a brd ff:ff:ff:ff:ff:ff
5: br0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1
    link/ether 3e:2f:8d:26:56:47 brd ff:ff:ff:ff:ff:ff
6: tep0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
    link/ether 62:32:8c:1d:2b:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.20/24 brd 192.168.200.255 scope global tep0
       valid_lft forever preferred_lft forever
    inet6 fe80::6032:8cff:fe1d:2b99/64 scope link
       valid_lft forever preferred_lft forever
7: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
8: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: gre_sys@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 1000
    link/ether a6:ee:6f:a2:0e:22 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a4ee:6fff:fea2:e22/64 scope link
       valid_lft forever preferred_lft forever
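The MTU values reported for gre0 (1476) and gretap0 (1462) are consistent with subtracting the encapsulation overhead from a 1500-byte underlay: 20 bytes of outer IPv4 header plus 4 bytes of base GRE header, and a further 14 bytes of inner Ethernet header for gretap. The arithmetic can be sketched as follows; the helper names are illustrative, and the assumption is an outer IPv4 header without options.

```python
IPV4_HEADER = 20      # outer IPv4 header without options
GRE_BASE_HEADER = 4   # flags/version + protocol type
GRE_KEY_FIELD = 4     # present only when the K bit is set
ETHERNET_HEADER = 14  # inner Ethernet header carried by gretap (no VLAN tag)


def gre_mtu(underlay_mtu: int, key: bool = False) -> int:
    """Largest inner IP packet that fits in one outer packet (layer 3 GRE)."""
    overhead = IPV4_HEADER + GRE_BASE_HEADER + (GRE_KEY_FIELD if key else 0)
    return underlay_mtu - overhead


def gretap_mtu(underlay_mtu: int, key: bool = False) -> int:
    """Same for gretap, which also carries the inner Ethernet header."""
    return gre_mtu(underlay_mtu, key) - ETHERNET_HEADER


print(gre_mtu(1500), gretap_mtu(1500))   # matches gre0 and gretap0 above
print(gre_mtu(1450), gretap_mtu(1450))   # ens3 in this setup has MTU 1450
```

Since ens3 has an MTU of 1450 here, a 1450-byte packet from a veth interface does not fit into a single outer packet once the tunnel headers are added; by this arithmetic the largest inner packet that fits is 1426 bytes for plain GRE without a key.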
$ sudo ovs-vsctl add-port br0 ens3 # this command breaks network connectivity
sudo ip link add type veth
sudo ifconfig veth0 192.167.0.6/24 up
sudo ifconfig veth0 mtu 1450
sudo ifconfig veth1 up
sudo ifconfig veth1 mtu 1450
sudo ovs-vsctl add-port br2 veth1
$ ip link help
...
TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | macvtap |
          bridge | bond | ipoib | ip6tnl | ipip | sit | vxlan |
          gre | gretap | ip6gre | ip6gretap | vti | nlmon |
          bond_slave | ipvlan | geneve | bridge_slave | vrf }
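Understanding GRE tunnels in depth: the difference between GRE and IPIP. An IPIP tunnel is end-to-end, so communication can only be point-to-point, whereas a GRE tunnel can also carry multicast traffic.
The slide deck includes captured GRE and IPIP packets that you can analyze.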
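When looking at such captures, the two tunnel types can be told apart from the Protocol field of the outer IPv4 header: 47 (0x2F) for GRE, as in the NVGRE frame format, and 4 for IPIP (IP-in-IP). A minimal sketch, assuming the bytes start at the outer IPv4 header with any Ethernet header already stripped; the function names and the `ipv4_header` test helper are made up for illustration.

```python
import struct

TUNNEL_PROTOCOLS = {4: "IPIP", 47: "GRE"}  # IANA IP protocol numbers


def classify_tunnel(packet: bytes) -> str:
    """Name the tunnel type from the outer IPv4 header's Protocol field."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return TUNNEL_PROTOCOLS.get(packet[9], "other")


def payload_offset(packet: bytes) -> int:
    """Offset where the outer IPv4 payload begins (IHL is in 32-bit words)."""
    return (packet[0] & 0x0F) * 4


def ipv4_header(protocol: int) -> bytes:
    """Hypothetical helper: a minimal 20-byte header with zeroed addresses."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64,
                       protocol, 0, bytes(4), bytes(4))


print(classify_tunnel(ipv4_header(47)), classify_tunnel(ipv4_header(4)))
```

For IPIP the inner IP header starts directly at `payload_offset`; for GRE the GRE header sits at that offset and the inner packet follows it.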