GRE Network Details
I. Naming conventions of OpenStack network devices:
1. TenantA's router and the name of the Linux network namespace qrouter
root@controller:~# neutron --os-tenant-name TenantA --os-username UserA --os-password password --os-auth-url=http://localhost:5000/v2.0 router-list --field id --field name
+--------------------------------------+-----------+
| id                                   | name      |
+--------------------------------------+-----------+
| 680944ad-679c-4fe8-ae4b-258cd8ac337f | tenant-R1 |
+--------------------------------------+-----------+
root@network:~# ip netns
qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518
qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f
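That is, the ID of the tenant's virtual router matches the suffix of the qrouter namespace name.
2. TenantA's network and the name of the Linux network namespace qdhcp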
root@controller:~# neutron --os-tenant-name TenantA --os-username UserA --os-password password --os-auth-url=http://localhost:5000/v2.0 net-list --field id --field name
+--------------------------------------+-------------+
| id                                   | name        |
+--------------------------------------+-------------+
| 7c22bbd9-166c-4610-9a3d-3b8b92c77518 | tenantA-Net |
| c8699820-7c6d-4441-9602-3425f2c630ec | Ext-Net     |
+--------------------------------------+-------------+
root@network:~# ip netns
qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518
qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f
The ID of the tenant's virtual network matches the suffix of the qdhcp namespace name.
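The two naming rules above can be sketched in a few lines of Python. This is only an illustration of the pattern seen in the `ip netns` output; the function names are hypothetical and are not part of any Neutron API.

```python
def qrouter_namespace(router_id: str) -> str:
    """Namespace name for a router, following the pattern seen in `ip netns`."""
    return "qrouter-" + router_id


def qdhcp_namespace(network_id: str) -> str:
    """Namespace name for a network's DHCP service, same pattern."""
    return "qdhcp-" + network_id


def parse_namespace(name: str) -> tuple:
    """Split a namespace name into (kind, resource ID) at the first hyphen."""
    kind, _, resource_id = name.partition("-")
    return kind, resource_id


if __name__ == "__main__":
    # IDs taken from the router-list and net-list output in this article.
    print(qrouter_namespace("680944ad-679c-4fe8-ae4b-258cd8ac337f"))
    print(qdhcp_namespace("7c22bbd9-166c-4610-9a3d-3b8b92c77518"))
    print(parse_namespace("qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518"))
```

Going the other way with `parse_namespace` is handy when starting from a namespace on the network node and looking for the router or network it belongs to.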
3. TenantA's network ports and the names of the other network devices
root@controller:~# neutron --os-tenant-name TenantA --os-username UserA --os-password password --os-auth-url=http://localhost:5000/v2.0 port-list
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                        |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------------+
| 1653ec91-ad7d-40d9-b777-f74aec697026 |      | fa:16:3e:51:a2:97 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.9"}  |
| 2df7c3ed-dfbb-480d-9cd3-fdefa079e66a |      | fa:16:3e:da:41:49 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.3"}  |
| 81388454-30e0-45e4-b3dd-b7b2e8dbf067 |      | fa:16:3e:f7:e6:9c | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.1"}  |
| d7233b80-9d4b-4ef6-a60d-19b3be661069 |      | fa:16:3e:75:e0:5a | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.10"} |
+--------------------------------------+------+-------------------+----------------------------------------------------------------------------------+
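The virtual machine with IP address 10.0.0.9 (ID bec0b963-99c0-4a56-ae04-936d47e173eb) uses port 1653ec91-ad7d-40d9-b777-f74aec697026, so the network devices attached to it (tap, qbr, qvb, and qvo) are each named by taking the device-type prefix and appending the first 11 characters of the port ID.
Verification:
The libvirt XML definition file /var/lib/nova/instances/<instance-id>/libvirt.xml shows the qbr and tap devices.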
<interface type="bridge">
  <mac address="fa:16:3e:51:a2:97"/>
  <model type="virtio"/>
  <driver name="qemu"/>
  <source bridge="qbr1653ec91-ad"/>   //the Linux bridge that the VM's TAP device is attached to
  <target dev="tap1653ec91-ad"/>      //the interface the VM is connected to
</interface>
Use brctl show to see that qbr connects qvb and tap:
root@compute1:~# brctl show
bridge name       bridge id            STP enabled    interfaces
qbr1653ec91-ad    8000.22ca68904e2f    no             qvb1653ec91-ad
                                                      tap1653ec91-ad
qbrd7233b80-9d    8000.964cf783c9e1    no             qvbd7233b80-9d
                                                      tapd7233b80-9d
virbr0            8000.000000000000    yes
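Likewise, qr- followed by the port ID prefix of the internal gateway port (10.0.0.1) gives the name of the qr device inside qrouter.
qg- followed by the port ID prefix of the router's external gateway port (10.1.101.80) gives the name of the qg device inside qrouter.
tap followed by the port ID prefix of the internal DHCP port (10.0.0.3) gives the name of the device inside qdhcp.
The following commands can be used to verify this: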
root@controller:~# neutron port-list +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+ | id | name | mac_address | fixed_ips | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+ | 1653ec91-ad7d-40d9-b777-f74aec697026 | | fa:16:3e:51:a2:97 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.9"} | | 2df7c3ed-dfbb-480d-9cd3-fdefa079e66a | | fa:16:3e:da:41:49 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.3"} | | 81388454-30e0-45e4-b3dd-b7b2e8dbf067 | | fa:16:3e:f7:e6:9c | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.1"} | | accd8dbf-0f16-4aec-b797-bbb33abcdc83 | | fa:16:3e:97:ee:cb | {"subnet_id": "ef86e785-8cec-486a-b67f-dcbba5311293", "ip_address": "10.100.0.103"} | | bfe7eaa4-26bc-4fe9-9da2-550abf44beaa | | fa:16:3e:e1:00:41 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.83"} | | d7233b80-9d4b-4ef6-a60d-19b3be661069 | | fa:16:3e:75:e0:5a | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.10"} | | eb60f9c4-2ddb-49ee-8b78-2fc2564a7600 | | fa:16:3e:78:39:e9 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.80"} | | f6812a11-c4ce-4880-8566-2206afcc612a | | fa:16:3e:9e:75:a2 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.82"} | +--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------+
root@network:~# ip netns exec qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f ifconfig lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) qg-eb60f9c4-2d Link encap:Ethernet HWaddr fa:16:3e:78:39:e9 inet addr:10.1.101.80 Bcast:10.1.101.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe78:39e9/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:31953 errors:0 dropped:0 overruns:0 frame:0 TX packets:372 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4158911 (4.1 MB) TX bytes:40876 (40.8 KB) qr-81388454-30 Link encap:Ethernet HWaddr fa:16:3e:f7:e6:9c inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fef7:e69c/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:882 errors:0 dropped:0 overruns:0 frame:0 TX packets:832 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:93440 (93.4 KB) TX bytes:96206 (96.2 KB)
root@network:~# ip netns exec qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518 ifconfig lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:6 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:3456 (3.4 KB) TX bytes:3456 (3.4 KB) tap2df7c3ed-df Link encap:Ethernet HWaddr fa:16:3e:da:41:49 inet addr:10.0.0.3 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:feda:4149/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:117 errors:0 dropped:0 overruns:0 frame:0 TX packets:48 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:11176 (11.1 KB) TX bytes:5865 (5.8 KB)
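The device-naming rule can be checked with a short Python sketch. It simply applies the "prefix plus first 11 characters of the port ID" pattern observed above; `port_device_names` is a hypothetical helper written for this article, not a Neutron function.

```python
PREFIX_LEN = 11  # number of port-ID characters kept in the device name

# Device-type prefixes as they appear in the command output above.
DEVICE_PREFIXES = {
    "tap": "tap",   # VM NIC backend, or the DHCP port inside qdhcp
    "qbr": "qbr",   # per-VM Linux bridge
    "qvb": "qvb",   # veth end on the Linux bridge
    "qvo": "qvo",   # veth end on br-int
    "qr": "qr-",    # router internal interface inside qrouter
    "qg": "qg-",    # router gateway interface inside qrouter
}


def port_device_names(port_id: str) -> dict:
    """Return every candidate device name derived from one Neutron port ID."""
    short = port_id[:PREFIX_LEN]
    return {kind: prefix + short for kind, prefix in DEVICE_PREFIXES.items()}


if __name__ == "__main__":
    vm_port = port_device_names("1653ec91-ad7d-40d9-b777-f74aec697026")
    print(vm_port["tap"], vm_port["qbr"], vm_port["qvb"], vm_port["qvo"])
    print(port_device_names("81388454-30e0-45e4-b3dd-b7b2e8dbf067")["qr"])
    print(port_device_names("eb60f9c4-2ddb-49ee-8b78-2fc2564a7600")["qg"])
```

Which of the names actually exists depends on what the port is used for: a VM port has tap/qbr/qvb/qvo devices, a router port has a qr- or qg- device, and a DHCP port has only a tap device.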
II. System environment
For the environment, see "OpenStack three-node Icehouse deployment in GRE mode".
1. Network devices in the system:
root@controller:~# nova list --all-tenant
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks              |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
| f467ba96-09c4-4eb7-b79c-5391f326c7d1 | vm001 | ACTIVE | -          | Running     | tenantA-Net=10.0.0.10 |
| bec0b963-99c0-4a56-ae04-936d47e173eb | vm002 | ACTIVE | -          | Running     | tenantA-Net=10.0.0.9  |
+--------------------------------------+-------+--------+------------+-------------+-----------------------+
root@controller:~# neutron net-list
+--------------------------------------+-------------+----------------------------------------------------+
| id                                   | name        | subnets                                            |
+--------------------------------------+-------------+----------------------------------------------------+
| 7c22bbd9-166c-4610-9a3d-3b8b92c77518 | tenantA-Net | c37d8ed0-372e-4b24-9ba2-897c38c6ddbf 10.0.0.0/24   |
| c8699820-7c6d-4441-9602-3425f2c630ec | Ext-Net     | 2c4155c9-5a2e-471c-a4d8-40a86b45ab0a 10.1.101.0/24 |
+--------------------------------------+-------------+----------------------------------------------------+
root@controller:~# neutron subnet-list
+--------------------------------------+------+---------------+-------------------------------------------------+
| id                                   | name | cidr          | allocation_pools                                |
+--------------------------------------+------+---------------+-------------------------------------------------+
| 2c4155c9-5a2e-471c-a4d8-40a86b45ab0a |      | 10.1.101.0/24 | {"start": "10.1.101.80", "end": "10.1.101.100"} |
| c37d8ed0-372e-4b24-9ba2-897c38c6ddbf |      | 10.0.0.0/24   | {"start": "10.0.0.2", "end": "10.0.0.254"}      |
+--------------------------------------+------+---------------+-------------------------------------------------+
root@controller:~# neutron router-list
+--------------------------------------+-----------+-----------------------------------------------------------------------------+
| id                                   | name      | external_gateway_info                                                       |
+--------------------------------------+-----------+-----------------------------------------------------------------------------+
| 680944ad-679c-4fe8-ae4b-258cd8ac337f | tenant-R1 | {"network_id": "c8699820-7c6d-4441-9602-3425f2c630ec", "enable_snat": true} |
+--------------------------------------+-----------+-----------------------------------------------------------------------------+
root@controller:~# neutron port-list
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                          |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| 1653ec91-ad7d-40d9-b777-f74aec697026 |      | fa:16:3e:51:a2:97 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.9"}    |
| 2df7c3ed-dfbb-480d-9cd3-fdefa079e66a |      | fa:16:3e:da:41:49 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.3"}    |
| 81388454-30e0-45e4-b3dd-b7b2e8dbf067 |      | fa:16:3e:f7:e6:9c | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.1"}    |
| bfe7eaa4-26bc-4fe9-9da2-550abf44beaa |      | fa:16:3e:e1:00:41 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.83"} |
| d7233b80-9d4b-4ef6-a60d-19b3be661069 |      | fa:16:3e:75:e0:5a | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.10"}   |
| eb60f9c4-2ddb-49ee-8b78-2fc2564a7600 |      | fa:16:3e:78:39:e9 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.80"} |
| f6812a11-c4ce-4880-8566-2206afcc612a |      | fa:16:3e:9e:75:a2 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.82"} |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
root@network:~# ip netns
qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518
qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f
There is one external network, Ext-Net. Its subnet is 2c4155c9-5a2e-471c-a4d8-40a86b45ab0a, with CIDR 10.1.101.0/24 and an allocation pool from 10.1.101.80 to 10.1.101.100.
There is one tenant network, tenantA-Net (TenantA's network, ID 7c22bbd9-166c-4610-9a3d-3b8b92c77518, corresponding to qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518). Its subnet is c37d8ed0-372e-4b24-9ba2-897c38c6ddbf, with CIDR 10.0.0.0/24 and an allocation pool from 10.0.0.2 to 10.0.0.254.
TenantA has one private router, tenant-R1 (ID 680944ad-679c-4fe8-ae4b-258cd8ac337f, corresponding to qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f).
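As a small aid for reading the subnet listing, the sketch below uses Python's standard `ipaddress` module to test whether an address falls inside a CIDR or an allocation pool. The helper names are hypothetical; the addresses come from the output above.

```python
import ipaddress


def in_cidr(ip: str, cidr: str) -> bool:
    """True if ip belongs to the subnet written in CIDR notation."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def in_pool(ip: str, start: str, end: str) -> bool:
    """True if ip lies within the inclusive allocation pool [start, end]."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


if __name__ == "__main__":
    # vm002 received its address from the tenant subnet's pool.
    print(in_pool("10.0.0.9", "10.0.0.2", "10.0.0.254"))
    # The router's internal address 10.0.0.1 is in the subnet but outside the pool.
    print(in_cidr("10.0.0.1", "10.0.0.0/24"), in_pool("10.0.0.1", "10.0.0.2", "10.0.0.254"))
    # The router's external address comes from the Ext-Net pool.
    print(in_pool("10.1.101.80", "10.1.101.80", "10.1.101.100"))
```

This makes it easy to see why 10.0.0.1 never shows up as a VM address: it sits inside 10.0.0.0/24 but just below the start of the pool.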
2. Ports in the system
root@controller:~# neutron port-list +--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+ | id | name | mac_address | fixed_ips | +--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+ | 1653ec91-ad7d-40d9-b777-f74aec697026 | | fa:16:3e:51:a2:97 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.9"} | | 2df7c3ed-dfbb-480d-9cd3-fdefa079e66a | | fa:16:3e:da:41:49 | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.3"} | | 81388454-30e0-45e4-b3dd-b7b2e8dbf067 | | fa:16:3e:f7:e6:9c | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.1"} | | bfe7eaa4-26bc-4fe9-9da2-550abf44beaa | | fa:16:3e:e1:00:41 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.83"} | | d7233b80-9d4b-4ef6-a60d-19b3be661069 | | fa:16:3e:75:e0:5a | {"subnet_id": "c37d8ed0-372e-4b24-9ba2-897c38c6ddbf", "ip_address": "10.0.0.10"} | | eb60f9c4-2ddb-49ee-8b78-2fc2564a7600 | | fa:16:3e:78:39:e9 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.80"} | | f6812a11-c4ce-4880-8566-2206afcc612a | | fa:16:3e:9e:75:a2 | {"subnet_id": "2c4155c9-5a2e-471c-a4d8-40a86b45ab0a", "ip_address": "10.1.101.82"} | +--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
root@network:~# ip netns qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518 qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f root@network:~# ip netns exec qrouter-680944ad-679c-4fe8-ae4b-258cd8ac337f ifconfig lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) qg-eb60f9c4-2d Link encap:Ethernet HWaddr fa:16:3e:78:39:e9 inet addr:10.1.101.80 Bcast:10.1.101.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fe78:39e9/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:32619 errors:0 dropped:0 overruns:0 frame:0 TX packets:374 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4280629 (4.2 MB) TX bytes:40960 (40.9 KB) qr-81388454-30 Link encap:Ethernet HWaddr fa:16:3e:f7:e6:9c inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:fef7:e69c/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:1012 errors:0 dropped:0 overruns:0 frame:0 TX packets:914 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:106266 (106.2 KB) TX bytes:108626 (108.6 KB) root@network:~# ip netns exec qdhcp-7c22bbd9-166c-4610-9a3d-3b8b92c77518 ifconfig lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:6 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:3456 (3.4 KB) TX bytes:3456 (3.4 KB) tap2df7c3ed-df Link encap:Ethernet HWaddr fa:16:3e:da:41:49 inet addr:10.0.0.3 Bcast:10.0.0.255 Mask:255.255.255.0 inet6 addr: fe80::f816:3eff:feda:4149/64 Scope:Link UP BROADCAST RUNNING MTU:1500 Metric:1 RX packets:126 errors:0 dropped:0 overruns:0 frame:0 TX packets:50 errors:0 dropped:0 overruns:0 carrier:0 
collisions:0 txqueuelen:0 RX bytes:12344 (12.3 KB) TX bytes:6595 (6.5 KB)
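neutron port-list returns 7 ports in total.
Clearly, 1653ec91-ad7d-40d9-b777-f74aec697026 (10.0.0.9) and d7233b80-9d4b-4ef6-a60d-19b3be661069 (10.0.0.10) are the private-IP ports of virtual machines vm002 and vm001 (the ports behind the VMs' tap devices).
Ports f6812a11-c4ce-4880-8566-2206afcc612a (10.1.101.82) and bfe7eaa4-26bc-4fe9-9da2-550abf44beaa (10.1.101.83) are two floating IPs.
Port 81388454-30e0-45e4-b3dd-b7b2e8dbf067 (10.0.0.1) and port eb60f9c4-2ddb-49ee-8b78-2fc2564a7600 (10.1.101.80) are the network ports on qrouter. In TenantA's network environment they serve, respectively, as the gateway qr-81388454-30 of the subnet (c37d8ed0-372e-4b24-9ba2-897c38c6ddbf, 10.0.0.0/24) and as the channel to the external network, qg-eb60f9c4-2d. [Each router has its own qrouter namespace; every subnet attached to it adds a qr device, and its external gateway adds a qg device.]
Port 2df7c3ed-dfbb-480d-9cd3-fdefa079e66a (10.0.0.3) is the network port tap2df7c3ed-df on qdhcp. It provides the DHCP service for the subnet (c37d8ed0-372e-4b24-9ba2-897c38c6ddbf, 10.0.0.0/24) in TenantA's network environment, dynamically assigning private IP addresses. [Each DHCP-enabled network has its own qdhcp namespace, and therefore its own tap device.]
3. Linux bridges and OVS bridges on the network node: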
root@network:~# brctl show
bridge name bridge id STP enabled interfaces
root@network:~# ovs-vsctl show
1c921779-83ff-4493-8def-df53783ebae2
    Bridge br-ex
        Port "qg-eb60f9c4-2d"
            Interface "qg-eb60f9c4-2d"
                type: internal
        Port "eth2"
            Interface "eth2"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tap2df7c3ed-df"
            tag: 10
            Interface "tap2df7c3ed-df"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qr-81388454-30"
            tag: 10
            Interface "qr-81388454-30"
                type: internal
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-0a00011f"
            Interface "gre-0a00011f"
                type: gre
                options: {in_key=flow, local_ip="10.0.1.21", out_key=flow, remote_ip="10.0.1.31"}
        Port "gre-0a000129"
            Interface "gre-0a000129"
                type: gre
                options: {in_key=flow, local_ip="10.0.1.21", out_key=flow, remote_ip="10.0.1.41"}
        Port br-tun
            Interface br-tun
                type: internal
    ovs_version: "2.0.2"
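As the output shows, the network node runs no virtual machines, so it has no Linux bridges.
The OVS bridge br-int carries qrouter's qr port and qdhcp's tap port.
The OVS bridge br-ex carries qrouter's qg port and is attached to the physical NIC eth2.
The OVS bridge br-tun does only two things: it connects to br-int through a patch port, and it builds the tunnel plane.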
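The GRE port names on br-tun are not arbitrary: in the output above, the eight hex digits after `gre-` match the tunnel's `remote_ip` (`gre-0a00011f` has `remote_ip="10.0.1.31"`). A minimal sketch of that decoding, assuming the naming pattern holds in general; `gre_port_remote_ip` is a hypothetical helper.

```python
import ipaddress


def gre_port_remote_ip(port_name: str) -> str:
    """Decode the remote tunnel endpoint from a port name such as 'gre-0a00011f'."""
    prefix, _, hex_part = port_name.partition("-")
    if prefix != "gre" or len(hex_part) != 8:
        raise ValueError("unexpected GRE port name: " + port_name)
    # The hex digits are the IPv4 address written as one 32-bit number.
    return str(ipaddress.IPv4Address(int(hex_part, 16)))


if __name__ == "__main__":
    for name in ("gre-0a00011f", "gre-0a000129"):
        print(name, "->", gre_port_remote_ip(name))
```

With this, a port name alone is enough to tell which node sits at the far end of a tunnel.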
4. Linux bridges and OVS bridges on the compute node:
root@compute1:~# virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-00000029              running
 3     instance-00000028              running
root@compute1:~# brctl show
bridge name       bridge id            STP enabled    interfaces
qbr1653ec91-ad    8000.22ca68904e2f    no             qvb1653ec91-ad
                                                      tap1653ec91-ad
qbrd7233b80-9d    8000.964cf783c9e1    no             qvbd7233b80-9d
                                                      tapd7233b80-9d
virbr0            8000.000000000000    yes
root@compute1:~# ovs-vsctl show    //ovs-vsctl queries and updates the ovs-vswitchd configuration
14b9e1b3-2d80-4380-92b0-f585cf9f74f7
    Bridge br-tun                        //the OVS tunnel bridge br-tun
        Port "gre-0a000129"              //port that connects to a GRE tunnel
            Interface "gre-0a000129"
                type: gre
                options: {in_key=flow, local_ip="10.0.1.31", out_key=flow, remote_ip="10.0.1.41"}    //a GRE tunnel is point-to-point; the local end is 10.0.1.31 and the remote end is 10.0.1.41
        Port "gre-0a000115"              //port that connects to a GRE tunnel
            Interface "gre-0a000115"
                type: gre
                options: {in_key=flow, local_ip="10.0.1.31", out_key=flow, remote_ip="10.0.1.21"}    //a GRE tunnel is point-to-point; the local end is 10.0.1.31 and the remote end is 10.0.1.21
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int                   //port patch-int, used to connect to the bridge br-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-int                        //the OVS integration bridge br-int
        fail_mode: secure
        Port "qvod7233b80-9d"            //port that connects to the Linux bridge to which a VM NIC's TAP device is attached
            tag: 1
            Interface "qvod7233b80-9d"
        Port "qvo1653ec91-ad"            //port that connects to the Linux bridge to which a VM NIC's TAP device is attached
            tag: 1
            Interface "qvo1653ec91-ad"
        Port patch-tun                   //port used to connect to br-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}    //peer of the patch-int port on the bridge br-tun
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.0.2"
root@compute1:~# ovs-ofctl show br-tun    //ovs-ofctl queries and updates OpenFlow switches and controllers
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d63ebd331948
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(patch-int): addr:9a:0f:cb:ab:46:7a    //the ID of port patch-int is 1
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(gre-0a000115): addr:e2:01:f1:7d:a5:af    //the ID of port gre-0a000115 is 2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(gre-0a000129): addr:8e:b1:ce:5f:51:9b
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-tun): addr:d6:3e:bd:33:19:48
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
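As the output shows, the compute node runs two virtual machines.
The Linux bridge qbr1653ec91-ad is visible, together with its ports qvb1653ec91-ad and tap1653ec91-ad.
The OVS bridge br-int carries the qvo ports.
The OVS bridge br-tun does only two things: it connects to br-int through a patch port, and it builds the tunnel plane.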
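III. Data flow of virtual machines
The figure below shows a typical Neutron-OVS-GRE network layout.
It has two compute nodes, Compute-01 and Compute-02, and one network node.
1. Overview of the network devices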
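tap: the interface that connects the VM to qbr. It sits on qbr and corresponds to the virtual NIC inside the VM.
qbr: a Linux bridge.
qvb: veth pair, bridge side. qvb and qvo are a pair of interfaces that link qbr and OVS; qvb is the end on the qbr side.
qvo: veth pair, Open vSwitch side. qvb and qvo are a pair of interfaces that link qbr and OVS; qvo is the end on the OVS side.
qr: a port managed by the L3 agent; it is the router-side port.
qg: a port managed by the L3 agent; it is the gateway-side port.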
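2. VM access to the external network (data travels from the compute node to the network node and on to the external network)
Suppose the NIC eth0 of virtual machine VM-003, running on the physical compute node Compute-02, sends a packet to the external physical router gateway 10.1.101.254. The data then flows as follows:
The data passes in turn through the tap device, the Linux bridge qbr, and the qvb/qvo veth pair, and reaches the OVS bridge br-int on the compute node, where it is tagged with a VLAN ID. br-int hands the packet to the OVS bridge br-tun on Compute-02, which converts the VLAN ID into a tunnel ID. The packet then crosses the tunnel formed between br-tun on Compute-02 and br-tun on the network node Network-node (travelling over the physical NICs). On the network node the tunnel ID is converted back into a VLAN ID and the packet is delivered to br-int. br-int on the network node passes it through the qr device into the Linux network namespace qrouter, which connects to the qg device on br-ex (at this step the router's NAT table translates the fixed IP address into the floating IP address), so the packet arrives on the OVS bridge br-ex. Finally, br-ex sends the packet through the network node's external physical NIC eth2 to the external router gateway.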
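3. Data flow between virtual machines on compute nodes
(1) Communication between VMs in the same subnet on the same host:
Because br-int is a virtual layer-2 switch, two VMs such as TenantA's vm001 and vm002 can communicate directly through br-int without going through br-tun.
(2) Communication between VMs in the same subnet on different hosts:
A packet sent by a VM on Compute1 passes through qbr to br-int, where it is tagged with a VLAN ID. It then reaches br-tun, which converts the VLAN ID into a tunnel ID and sends the packet out through the GRE tunnel to the compute2 node.
(3) A VM sends a DHCP request
On the compute node the packet goes from br-int to br-tun and through the GRE tunnel to br-tun on the network node, then through br-int to qdhcp. qdhcp replies with the VM's fixed IP address, and the reply returns along the same path.
4. The network devices on the compute node and on the network node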
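Compute node:
(1) The tap device attached to the VM
Each VM has a virtual NIC eth0, which is connected to a TAP device on the host. The TAP device is attached directly to a Linux bridge, qbr, and qbr is connected to br-int. Ideally the tap device would connect straight to br-int, as the green box in the figure shows. However, OpenStack relies on iptables rules on the TAP device to implement security groups, and br-int is not compatible with iptables rules, so attaching the TAP device directly to the OVS bridge br-int would not work. OpenStack therefore adopted a workaround and inserted an extra Linux bridge. As a result, OVS br-int and the Linux bridge, both layer-2 bridges, appear side by side.
Neutron uses iptables rules on the tap device to implement security groups.
View the iptables rules for the tap device of virtual machine vm002: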
root@compute1:~# iptables -S |grep tap1653ec91-ad
-A neutron-openvswi-FORWARD -m physdev --physdev-out tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-sg-chain
-A neutron-openvswi-FORWARD -m physdev --physdev-in tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-sg-chain
-A neutron-openvswi-INPUT -m physdev --physdev-in tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-o1653ec91-a
-A neutron-openvswi-sg-chain -m physdev --physdev-out tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-i1653ec91-a
-A neutron-openvswi-sg-chain -m physdev --physdev-in tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-o1653ec91-a
OpenStack Neutron implements security groups in the neutron-openvswi-sg-chain chain.
With the default security group:
neutron-openvswi-i1653ec91-a controls the traffic entering the VM
root@compute1:~# iptables -S |grep neutron-openvswi-i1653ec91-a
-N neutron-openvswi-i1653ec91-a
-A neutron-openvswi-i1653ec91-a -m state --state INVALID -j DROP
-A neutron-openvswi-i1653ec91-a -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-i1653ec91-a -p udp -m udp -m multiport --dports 1:65535 -j RETURN
-A neutron-openvswi-i1653ec91-a -s 10.0.0.10/32 -j RETURN
-A neutron-openvswi-i1653ec91-a -p icmp -j RETURN
-A neutron-openvswi-i1653ec91-a -p tcp -m tcp -m multiport --dports 1:65535 -j RETURN
-A neutron-openvswi-i1653ec91-a -s 10.0.0.3/32 -p udp -m udp --sport 67 --dport 68 -j RETURN
-A neutron-openvswi-i1653ec91-a -j neutron-openvswi-sg-fallback
-A neutron-openvswi-sg-chain -m physdev --physdev-out tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-i1653ec91-a
neutron-openvswi-o1653ec91-a controls the traffic leaving the VM
root@compute1:~# iptables -S |grep neutron-openvswi-o1653ec91-a
-N neutron-openvswi-o1653ec91-a
-A neutron-openvswi-INPUT -m physdev --physdev-in tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-o1653ec91-a
-A neutron-openvswi-o1653ec91-a -p udp -m udp --sport 68 --dport 67 -j RETURN
-A neutron-openvswi-o1653ec91-a -j neutron-openvswi-s1653ec91-a
-A neutron-openvswi-o1653ec91-a -p udp -m udp --sport 67 --dport 68 -j DROP
-A neutron-openvswi-o1653ec91-a -m state --state INVALID -j DROP
-A neutron-openvswi-o1653ec91-a -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-o1653ec91-a -j RETURN
-A neutron-openvswi-o1653ec91-a -j neutron-openvswi-sg-fallback
-A neutron-openvswi-sg-chain -m physdev --physdev-in tap1653ec91-ad --physdev-is-bridged -j neutron-openvswi-o1653ec91-a
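The per-port chain names in the iptables output follow a pattern similar to the device names, except that only the first 10 characters of the port ID are kept. The sketch below reproduces the names seen above; `security_group_chains` is a hypothetical helper, and the description of the `s` chain is an assumption, since this article only shows it as a jump target of the egress chain.

```python
CHAIN_PREFIX = "neutron-openvswi-"  # as it appears in the iptables output
PORT_ID_CHARS = 10                  # port-ID characters kept in a chain name


def security_group_chains(port_id: str) -> dict:
    """Return the per-port chain names derived from one Neutron port ID."""
    short = port_id[:PORT_ID_CHARS]
    return {
        "ingress": CHAIN_PREFIX + "i" + short,  # traffic entering the VM
        "egress": CHAIN_PREFIX + "o" + short,   # traffic leaving the VM
        # Jumped to from the egress chain; assumed here to be a source check.
        "source_check": CHAIN_PREFIX + "s" + short,
    }


if __name__ == "__main__":
    for role, chain in security_group_chains("1653ec91-ad7d-40d9-b777-f74aec697026").items():
        print(role, chain)
```

Given a port ID from neutron port-list, the resulting names can be passed straight to `iptables -S | grep` on the compute node that hosts the VM.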
Add a security group rule that allows TCP port 22:
root@controller:~# neutron --os-tenant-name TenantA --os-username UserA --os-password password --os-auth-url=http://localhost:5000/v2.0 security-group-rule-create --protocol tcp --port-range-min 22 --port-range-max 22 --direction ingress default
Created a new security_group_rule:
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| direction         | ingress                              |
| ethertype         | IPv4                                 |
| id                | be3d6a06-be6b-4f51-b1a5-294ad2a0a261 |
| port_range_max    | 22                                   |
| port_range_min    | 22                                   |
| protocol          | tcp                                  |
| remote_group_id   |                                      |
| remote_ip_prefix  |                                      |
| security_group_id | 8bd8fb6b-7141-4900-8321-390cc1a5d999 |
| tenant_id         | 60a10cd7a61b493d910eabd353c07567     |
+-------------------+--------------------------------------+
The iptables rules for the tap devices then change as follows:
root@compute1:~# iptables -S | grep 22
-A FORWARD -d 192.168.122.0/24 -o virbr0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 192.168.122.0/24 -i virbr0 -j ACCEPT
-A neutron-openvswi-i1653ec91-a -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-id7233b80-9 -p tcp -m tcp --dport 22 -j RETURN
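(2) The OVS integration bridge br-int
br-int is a virtual bridge created by Open vSwitch; in practice it acts as a virtual switch. Its ports connect the VMs on the host to a common switching layer. Through the tunnels of the local OVS bridge br-tun, the br-int bridges of all nodes in the OpenStack deployment are then joined into one larger virtual switch, BR-INT {compute-01-br-int + compute-02-br-int ...}.
Each network created with neutron net-create is given its own VLAN ID, which is local to each node (for tenantA-Net, tag 10 on the network node and tag 1 on compute1). See the tag value of each Port in the output of ovs-vsctl show.
br-int handles the VLAN ID of traffic entering and leaving the VMs.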
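The sketch below illustrates why the VLAN tag only has meaning inside one node. The local tags (10 on the network node, 1 on compute1) are the `tag:` values from the ovs-vsctl show output in this article, and tunnel ID 0x2 is the value that appears in compute1's br-tun flows. That the network node maps the same tunnel ID to its tag 10 is an assumption, as its flows are not shown. All names are hypothetical.

```python
NETWORK_ID = "7c22bbd9-166c-4610-9a3d-3b8b92c77518"  # tenantA-Net

# Node-local VLAN tag of each network, per node (from `ovs-vsctl show`).
LOCAL_VLANS = {
    "network": {NETWORK_ID: 10},
    "compute1": {NETWORK_ID: 1},
}

# Tunnel ID of each network, shared by all nodes (assumed for the network node).
TUNNEL_IDS = {NETWORK_ID: 0x2}


def vlan_to_tunnel(node: str, vlan: int) -> int:
    """Sending side: replace the node-local VLAN tag with the tunnel ID."""
    for network_id, tag in LOCAL_VLANS[node].items():
        if tag == vlan:
            return TUNNEL_IDS[network_id]
    raise KeyError("no network uses VLAN %d on %s" % (vlan, node))


def tunnel_to_vlan(node: str, tunnel_id: int) -> int:
    """Receiving side: replace the tunnel ID with this node's own VLAN tag."""
    for network_id, tid in TUNNEL_IDS.items():
        if tid == tunnel_id:
            return LOCAL_VLANS[node][network_id]
    raise KeyError("unknown tunnel ID %#x" % tunnel_id)


if __name__ == "__main__":
    tun = vlan_to_tunnel("compute1", 1)    # frame leaves compute1 tagged VLAN 1
    print(hex(tun))
    print(tunnel_to_vlan("network", tun))  # arrives on the network node as VLAN 10
```

The tag changes from 1 to 10 along the way, but the frame stays in the same tenant network because the tunnel ID is what identifies the network between nodes.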
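(3) The OVS tunnel bridge br-tun
br-tun is a virtual bridge created by OVS. Downward it connects directly to br-int and serves as the entry and exit point for network data; upward it connects, over a tunnelling protocol (GRE here), to the br-tun on each of the other nodes, forming a flat communication/tunnel layer. If the abstraction built from all the br-int bridges is regarded as the virtual layer-2 network, then the abstraction built from all the br-tun bridges can be regarded as the virtual layer-3 network.
br-tun uses OpenFlow rules to translate between VLAN IDs and tunnel IDs.
The OpenFlow rule tables below show how the two IDs are converted: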
root@compute1:~# ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d63ebd331948
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
1(patch-int): addr:9a:0f:cb:ab:46:7a //the ID of port patch-int is 1
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(gre-0a000115): addr:e2:01:f1:7d:a5:af //the ID of port gre-0a000115 is 2
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(gre-0a000129): addr:8e:b1:ce:5f:51:9b //the ID of port gre-0a000129 is 3
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-tun): addr:d6:3e:bd:33:19:48
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
root@compute1:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=99058.105s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=1,in_port=3 actions=resubmit(,2) //traffic arriving on port 3 (gre-0a000129) is resubmitted to the rules of table 2
 cookie=0x0, duration=164986.43s, table=0, n_packets=303, n_bytes=29712, idle_age=7626, hard_age=65534, priority=1,in_port=1 actions=resubmit(,1) //traffic arriving on port 1 (patch-int) is resubmitted to table 1
 cookie=0x0, duration=164981.72s, table=0, n_packets=188, n_bytes=28694, idle_age=7626, hard_age=65534, priority=1,in_port=2 actions=resubmit(,2) //traffic arriving on port 2 (gre-0a000115) is resubmitted to table 2
 cookie=0x0, duration=164986.109s, table=0, n_packets=4, n_bytes=320, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=164985.783s, table=1, n_packets=257, n_bytes=25328, idle_age=7626, hard_age=65534, priority=1,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20) //unicast frames are resubmitted to the rules of table 20
 cookie=0x0, duration=164985.31s, table=1, n_packets=46, n_bytes=4384, idle_age=7631, hard_age=65534, priority=1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
 cookie=0x0, duration=164979.109s, table=2, n_packets=188, n_bytes=28694, idle_age=7626, hard_age=65534, priority=1,tun_id=0x2 actions=mod_vlan_vid:1,resubmit(,10) //traffic arriving from a tunnel with tunnel ID 0x2 is tagged with VLAN ID 1 and resubmitted to the rules of table 10
 cookie=0x0, duration=164984.991s, table=2, n_packets=8, n_bytes=648, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=164984.676s, table=3, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=164984.395s, table=10, n_packets=188, n_bytes=28694, idle_age=7626, hard_age=65534, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1 //learns a rule into table 20, then sends the frame out of port 1 (patch-int)
 cookie=0x0, duration=164984.067s, table=20, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=resubmit(,21) //resubmit to the rules of table 21
 cookie=0x0, duration=164979.293s, table=21, n_packets=36, n_bytes=3576, idle_age=7631, hard_age=65534, dl_vlan=1 actions=strip_vlan,set_tunnel:0x2,output:3,output:2 //strips the VLAN ID, sets tunnel ID 0x2, and sends the frame out of the GRE ports 3 and 2
 cookie=0x0, duration=164983.75s, table=21, n_packets=10, n_bytes=808, idle_age=65534, hard_age=65534, priority=0 actions=drop
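To make the table-by-table behaviour easier to follow, here is a simplified Python model of the br-tun pipeline dumped above. It is a sketch, not an OVS implementation: the class and field names are hypothetical, the port numbers and the VLAN 1 / tunnel 0x2 mapping come from the dump, and the `hard_timeout=300` expiry of learned entries is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PATCH_INT = 1                # port 1 = patch-int
GRE_PORTS = (3, 2)           # gre-0a000129, gre-0a000115 (order of the table 21 flow)
VLAN_TO_TUNNEL = {1: 0x2}    # table 21: dl_vlan=1 -> set_tunnel:0x2
TUNNEL_TO_VLAN = {0x2: 1}    # table 2: tun_id=0x2 -> mod_vlan_vid:1


@dataclass
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None
    tun_id: Optional[int] = None


def is_multicast(mac: str) -> bool:
    # The lowest bit of the first octet marks multicast/broadcast (table 1 match).
    return int(mac.split(":")[0], 16) & 1 == 1


@dataclass
class BrTun:
    # Entries learned by table 10 into table 20: (vlan, mac) -> (tunnel ID, GRE port)
    learned: Dict[Tuple[int, str], Tuple[int, int]] = field(default_factory=dict)

    def receive(self, in_port: int, frame: Frame) -> List[Tuple[int, Frame]]:
        """Return the (output port, frame) pairs produced for one incoming frame."""
        if in_port == PATCH_INT:        # table 0 -> table 1
            return self._from_br_int(frame)
        if in_port in GRE_PORTS:        # table 0 -> table 2
            return self._from_tunnel(in_port, frame)
        return []                       # table 0 default: drop

    def _from_br_int(self, frame: Frame) -> List[Tuple[int, Frame]]:
        if not is_multicast(frame.dst):  # table 1 -> table 20 (unicast)
            hit = self.learned.get((frame.vlan, frame.dst))
            if hit is not None:
                tun_id, port = hit
                return [(port, Frame(frame.src, frame.dst, None, tun_id))]
        # Table 21: strip the VLAN tag, set the tunnel ID, flood to all GRE ports.
        tun_id = VLAN_TO_TUNNEL.get(frame.vlan)
        if tun_id is None:
            return []                    # table 21 default: drop
        return [(p, Frame(frame.src, frame.dst, None, tun_id)) for p in GRE_PORTS]

    def _from_tunnel(self, in_port: int, frame: Frame) -> List[Tuple[int, Frame]]:
        vlan = TUNNEL_TO_VLAN.get(frame.tun_id)
        if vlan is None:
            return []                    # table 2 default: drop
        # Table 10: remember where the source MAC lives, then deliver to br-int.
        self.learned[(vlan, frame.src)] = (frame.tun_id, in_port)
        return [(PATCH_INT, Frame(frame.src, frame.dst, vlan, None))]
```

A short usage example: the first frame from vm002 to the router's qr interface is flooded to both tunnels; once a reply has arrived through port 2, later frames to that MAC use only that tunnel.

```python
sw = BrTun()
vm, gw = "fa:16:3e:51:a2:97", "fa:16:3e:f7:e6:9c"
print([port for port, _ in sw.receive(PATCH_INT, Frame(vm, gw, vlan=1))])
sw.receive(2, Frame(gw, vm, tun_id=0x2))
print([port for port, _ in sw.receive(PATCH_INT, Frame(vm, gw, vlan=1))])
```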
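Network node:
(1) The OVS tunnel bridge br-tun
It plays the same role as br-tun on the compute node: it serves purely as the tunnel layer that connects to the other physical nodes. The only difference is that this br-tun is connected to the network node's br-int, and the network node's br-int differs considerably from the one on a compute node.
(2) The OVS integration bridge br-int
br-int is a virtual bridge created by OVS and also acts as a virtual switch. It mainly carries two kinds of device: tap devices and qr devices.
The Linux network namespaces qdhcp and qrouter are created by the DHCP agent and the L3 agent respectively, and are used to isolate and manage each tenant's virtual networks and routing.
The tap device on br-int, with an IP address such as xxx.xxx.xxx.3 (10.0.0.3 in this environment), works together with a dnsmasq process to provide DHCP, dynamically assigning private IP addresses to newly created VMs.
The qr device on br-int, whose IP address is usually xxx.xxx.xxx.1, together with the qg device on br-ex forms qrouter, which routes traffic for the tenant network. Through qg, the tenant's internal virtual network is linked to the external physical network.
(3) The OVS external bridge br-ex
br-ex is a virtual bridge created by OVS. It carries the qg device port and is the key channel linking the tenant network with the external network. br-ex is also attached to a physical NIC (eth2 in the figure), which leads to the Internet.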
http://docs.openstack.org/admin-guide-cloud/content/under_the_hood_openvswitch.html