k8s single-host container networking (20250216)
The "network stack" a Linux container can see is actually isolated inside its own Network Namespace.
This "network stack" includes the network interface (NIC), the loopback device, the routing table, and the iptables rules.
Veth Pair devices
The defining property of a Veth Pair is that, once created, it always appears as a pair of virtual NICs (veth peers), and a packet sent out of one "NIC" shows up directly on the other, even when the two ends live in different Network Namespaces.
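As an aside, here is a minimal sketch (not part of the session below; it assumes a root shell with iproute2, and the names ns1/ns2/veth1/veth2 are made up for illustration) that demonstrates this behavior by hand:

ip netns add ns1
ip netns add ns2
ip link add veth1 type veth peer name veth2     # the pair is created in one shot
ip link set veth1 netns ns1                     # move one end into each namespace
ip link set veth2 netns ns2
ip netns exec ns1 ip addr add 10.0.0.1/24 dev veth1
ip netns exec ns2 ip addr add 10.0.0.2/24 dev veth2
ip netns exec ns1 ip link set veth1 up
ip netns exec ns2 ip link set veth2 up
ip netns exec ns1 ping -c 1 10.0.0.2            # a packet sent from veth1 arrives on veth2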
C:\Users\admin>ssh root@192.168.117.207
root@192.168.117.207's password:
Last login: Mon Feb 17 08:19:32 2025
[root@k8s-master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1ee1263b4193 cbb01a7bd410 "/coredns -conf /etc…" 1 second ago Up 1 second k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 3 seconds ago Up 2 seconds k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e 9344fce2372f "/usr/local/bin/kube…" 7 seconds ago Up 6 seconds k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 8 seconds ago Up 7 seconds k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 9 seconds ago Up 8 seconds k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a a0eed15eed44 "etcd --advertise-cl…" 23 seconds ago Up 23 seconds k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef 6fc5e6b7218c "kube-scheduler --au…" 23 seconds ago Up 23 seconds k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62 8a9000f98a52 "kube-apiserver --ad…" 24 seconds ago Up 23 seconds k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4 138fb5a3a2e3 "kube-controller-man…" 24 seconds ago Up 23 seconds k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 25 seconds ago Up 25 seconds k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 26 seconds ago Up 25 seconds k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
[root@k8s-master ~]# docker start nginx-1
nginx-1
[root@k8s-master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
063af5a1782b 17e960f4e39c "start_runit" 12 seconds ago Up 12 seconds k8s_calico-node_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_66
133fda8d5c2f cbb01a7bd410 "/coredns -conf /etc…" 22 seconds ago Up 21 seconds k8s_coredns_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_12
2cad37aaa64d 08c1b67c88ce "/usr/bin/kube-contr…" 22 seconds ago Up 22 seconds k8s_calico-kube-controllers_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_11
245ed185ea4a registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 27 seconds ago Up 26 seconds k8s_POD_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_8
60a93585eea1 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 27 seconds ago Up 27 seconds k8s_POD_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_9
1ee1263b4193 cbb01a7bd410 "/coredns -conf /etc…" 45 seconds ago Up 45 seconds k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 47 seconds ago Up 46 seconds k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e 9344fce2372f "/usr/local/bin/kube…" 51 seconds ago Up 50 seconds k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 52 seconds ago Up 51 seconds k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb registry.aliyuncs.com/google_containers/pause:3.8 "/pause" 53 seconds ago Up 52 seconds k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a a0eed15eed44 "etcd --advertise-cl…" About a minute ago Up About a minute k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef 6fc5e6b7218c "kube-scheduler --au…" About a minute ago Up About a minute k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62 8a9000f98a52 "kube-apiserver --ad…" About a minute ago Up About a minute k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4 138fb5a3a2e3 "kube-controller-man…" About a minute ago Up About a minute k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042 registry.aliyuncs.com/google_containers/pause:3.8 "/pause" About a minute ago Up About a minute k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
d85077c98a69 nginx "/docker-entrypoint.…" 18 hours ago Up 12 seconds 80/tcp nginx-1
[root@k8s-master ~]# docker exec -it nginx-1 /bin/bash
root@d85077c98a69:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.2 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ac:11:00:02 txqueuelen 0 (Ethernet)
RX packets 14 bytes 1252 (1.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@d85077c98a69:/# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 172.17.0.1 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
# back on the host
root@d85077c98a69:/# exit
exit
[root@k8s-master ~]# ifconfig
cali6632e2eedff: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
RX packets 3 bytes 125 (125.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 770 (770.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cali7b6489f2f47: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
RX packets 3 bytes 125 (125.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 770 (770.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
calieaec58fb34e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 1000 (Ethernet)
RX packets 3 bytes 125 (125.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 770 (770.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:5fff:fe05:698c prefixlen 64 scopeid 0x20<link>
ether 02:42:5f:05:69:8c txqueuelen 0 (Ethernet)
RX packets 3 bytes 125 (125.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 770 (770.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.117.207 netmask 255.255.255.0 broadcast 192.168.117.255
inet6 fe80::20c:29ff:fe96:278c prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:96:27:8c txqueuelen 1000 (Ethernet)
RX packets 554 bytes 64561 (63.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 596 bytes 65850 (64.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 49719 bytes 16290594 (15.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49719 bytes 16290594 (15.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth6881202: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::408f:cdff:fe98:623a prefixlen 64 scopeid 0x20<link>
ether 42:8f:cd:98:62:3a txqueuelen 0 (Ethernet)
RX packets 3 bytes 167 (167.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18 bytes 1566 (1.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@k8s-master ~]#
[root@k8s-master ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02425f05698c no veth6881202
[root@k8s-master ~]#
This is what makes a Veth Pair so useful as a "network cable" connecting two different Network Namespaces.
Above, we started a container named nginx-1.
Inside this container there is a NIC named eth0, which is exactly the container-side end of a Veth Pair.
Looking at nginx-1's routing table with the route command, we can see that eth0 is the container's default route device, and that all traffic to the 172.17.0.0/16 subnet is also handed to eth0 (the second rule, for 172.17.0.0).
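To double-check which rule a given destination would hit, you could ask the kernel from inside the container (a sketch only; it assumes iproute2 is available in the container, which a slim image may not include):

ip route get 172.17.0.3      # should report the packet leaving via dev eth0, i.e. the direct route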
From the host's ifconfig output, you can see that the other end of nginx-1's Veth Pair shows up on the host as a virtual NIC named veth6881202.
And the brctl show output confirms that this NIC is "plugged into" the docker0 bridge.
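A common trick (an aside, not from the session above) to confirm which host-side veth belongs to a given container: read the peer's ifindex from inside the container, then look that index up on the host.

docker exec nginx-1 cat /sys/class/net/eth0/iflink      # prints the peer's ifindex, some number N
ip -o link | grep "^N:"                                 # on the host, replace N with that number to find the vethXXXX name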
If we start another Docker container on the same host, say nginx-2:
[root@k8s-master ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02425f05698c no veth6881202
[root@k8s-master ~]# docker run -d --name nginx-2 nginx
e3b1a33fa82952f99bdf47e1451d05d83a9686cb006798744d2e593f02cf65c8
[root@k8s-master ~]# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02425f05698c no veth40408f3
veth6881202
[root@k8s-master ~]#
Check the container's IP:
[root@k8s-master ~]#
[root@k8s-master ~]# docker inspect nginx-1
[
{
"Id": "d85077c98a69846efe9bf17c4b1b4efb2152ec2078f5de483edc524c674eed76",
"Created": "2025-02-16T06:21:15.681636573Z",
"Path": "/docker-entrypoint.sh",
----------
"Links": null,
"Aliases": null,
"MacAddress": "02:42:ac:11:00:02",
"DriverOpts": null,
"NetworkID": "5ce1ccec1789844b6a4712acd0c8d6f0ef9fba840c00f53be667a0dd6fbae39c",
"EndpointID": "786e7d287ca79fda20dc3895bb64b9830a99f1989538fd503e9f877e4ad574f3",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": null
}
}
}
}
]
The container's IP is given by the "IPAddress" field: "172.17.0.2".
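As an aside, the same field can be pulled out directly with docker inspect's Go-template flag (for a container on the default bridge network):

docker inspect -f '{{.NetworkSettings.IPAddress}}' nginx-1      # prints 172.17.0.2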
Enter nginx-2 and ping nginx-1 (curl works too):
[root@k8s-master ~]# docker exec -it nginx-2 /bin/bash
root@e3b1a33fa829:/# ping 172.17.0.2
bash: ping: command not found
root@e3b1a33fa829:/# yum -y install ping
bash: yum: command not found
root@e3b1a33fa829:/# apt-get install -y iputils-ping
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package iputils-ping
root@e3b1a33fa829:/# curl http://172.17.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@e3b1a33fa829:/#
When you access nginx-2's IP address from inside nginx-1 (for example ping 172.17.0.3; in the session above we actually went the other direction, curling nginx-1 from nginx-2, but the mechanism is symmetric), the destination IP matches the second routing rule in nginx-1's routing table. That rule's gateway is 0.0.0.0, which makes it a direct route: any IP packet matching it should go out the container's eth0 NIC and be delivered to the destination directly over layer 2.
This eth0 NIC is one end of a Veth Pair: that end lives in nginx-1's Network Namespace, while the other end sits on the host (in the host namespace) and is "plugged into" the host's docker0 bridge.
Once a virtual NIC is "plugged into" a bridge, it becomes a slave device of that bridge. A slave device is "stripped" of the right to call into the network stack to process packets itself, and is "demoted" to a port on the bridge. That port's only job is to receive incoming packets and hand the decision over their fate (forward or drop) entirely to the bridge.
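For reference, "plugging" a virtual NIC into a bridge is just an enslave operation. Docker does it automatically; done by hand it would look roughly like this (vethX is a hypothetical interface name):

ip link set vethX master docker0       # modern iproute2 way to enslave a device to a bridge
brctl addif docker0 vethX              # or, equivalently, with bridge-utils
bridge link show                       # list which ports are attached to which bridge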
Before the packet can go out over layer 2, nginx-1 first needs the MAC address corresponding to 172.17.0.3, so its eth0 broadcasts an ARP request, which immediately appears on the host-side veth. On receiving this ARP request, the docker0 bridge plays the role of a layer-2 switch and forwards the broadcast to the other virtual NICs plugged into docker0. The network stack of nginx-2, which is also attached to docker0, receives the request and replies to nginx-1 with the MAC address for 172.17.0.3.
With that destination MAC address in hand, nginx-1's eth0 can send the data packet out.
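This ARP exchange can be watched from the host with tcpdump (a sketch; it assumes tcpdump is installed, and you run it while curling or pinging between the containers in another terminal):

tcpdump -i docker0 -nn arp                    # ARP broadcasts and replies crossing the bridge
tcpdump -i veth6881202 -nn 'arp or icmp'      # or watch nginx-1's host-side veth directly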
So a container process confined to its Network Namespace actually exchanges data with other containers through the combination of a Veth Pair device and a bridge on the host.
When a container tries to reach another host, say ping 10.168.0.3, the request packet first passes through the docker0 bridge and appears on the host. Then, following the direct route in the host's routing table (10.168.0.0/24 via eth0), the request to 10.168.0.3 is handed to the host's eth0.
The packet is then forwarded out of the host's eth0 onto the host network and finally reaches the host at 10.168.0.3. Of course, this only works if the two hosts themselves are reachable from each other.
Whenever a container cannot reach the "outside world", first check whether the docker0 bridge can be pinged from inside the container, then look at whether the iptables rules related to docker0 and the Veth Pair devices are in order; that usually pinpoints the problem.
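A rough checklist for that case (hedged; exact rule names vary by Docker version):

ping -c 1 172.17.0.1                                   # from inside the container: can docker0 itself be reached?
sysctl net.ipv4.ip_forward                             # on the host: must be 1, or forwarded traffic is dropped
iptables -t nat -S POSTROUTING | grep -i masquerade    # is the MASQUERADE rule for the docker0 subnet present?
iptables -S FORWARD                                    # do the FORWARD rules/policy let docker0 traffic through?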
In short: veth pair = virtual NIC 1 - docker0 - virtual NIC 2. Each has its own address, but virtual NICs 1 and 2 do not process packets themselves; the docker0 bridge does the processing and the forwarding.
The "cross-host communication" problem
If there is also a Docker container on another host (say 10.168.0.3), how does our nginx-1 container reach it?
In Docker's default configuration, the docker0 bridge on one host has no relationship with the docker0 bridges on other hosts and no way to reach them. So the containers attached to these bridges naturally cannot talk to each other either.
But what if we used software to create a bridge "shared" by the whole cluster and attached every container in the cluster to it? Then they could all communicate.
The core of building such a container network is this: on top of the existing host network, we build in software another virtual network that overlays it and connects all the containers together. That is why this technique is called an Overlay Network.
The Overlay Network itself can be made up of a "special bridge" on each host. For example, when Container 1 on Node 1 wants to reach Container 3 on Node 2, the "special bridge" on Node 1 must, on receiving the packet, somehow send it to the correct host (Node 2), and the "special bridge" on Node 2 must, on receiving it, somehow forward it to the correct container (Container 3).
In fact, you do not even need such a special bridge on every host; simply configuring each host's routing table appropriately is enough to forward packets to the correct host (a sketch follows at the end of this section).
The key point is that for a container to communicate with the outside world, the IP packets it sends must first leave its Network Namespace and arrive on the host. The way to achieve that is to give the container a Veth Pair whose container-side end acts as its default NIC while the other end sits on the host.
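To make the "just configure host routes" idea concrete, here is a minimal host-gw style sketch (all subnets and addresses are hypothetical; it requires each host's container subnet to be unique and the hosts to be reachable on the same layer-2 network):

# On Node 1: suppose Node 2 (10.168.0.3) hosts the 172.17.1.0/24 container subnet,
# so send that whole subnet to Node 2 over the underlay NIC:
ip route add 172.17.1.0/24 via 10.168.0.3 dev ens33
# Node 2 then needs the mirror-image route pointing back at Node 1's container subnet.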