k8s single-node container networking (20250216)

The "network stack" that a Linux container can see is in fact isolated inside the container's own Network Namespace.

This "network stack" includes the network interface (Network Interface), the loopback device (Loopback Device), the routing table (Routing Table), and the iptables rules.

Veth Pair devices

A Veth Pair device always comes into existence as a pair of virtual network interfaces (Veth Peers). A packet sent out of one of these "interfaces" appears directly on the other one, even when the two ends live in different Network Namespaces.

[root@k8s-master ~]# docker ps
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS          PORTS     NAMES
1ee1263b4193   cbb01a7bd410                                        "/coredns -conf /etc…"   1 second ago     Up 1 second               k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 3 seconds ago    Up 2 seconds              k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e   9344fce2372f                                        "/usr/local/bin/kube…"   7 seconds ago    Up 6 seconds              k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 8 seconds ago    Up 7 seconds              k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 9 seconds ago    Up 8 seconds              k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a   a0eed15eed44                                        "etcd --advertise-cl…"   23 seconds ago   Up 23 seconds             k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef   6fc5e6b7218c                                        "kube-scheduler --au…"   23 seconds ago   Up 23 seconds             k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62   8a9000f98a52                                        "kube-apiserver --ad…"   24 seconds ago   Up 23 seconds             k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4   138fb5a3a2e3                                        "kube-controller-man…"   24 seconds ago   Up 23 seconds             k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 25 seconds ago   Up 25 seconds             k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 25 seconds ago   Up 25 seconds             k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 25 seconds ago   Up 25 seconds             k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 26 seconds ago   Up 25 seconds             k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
[root@k8s-master ~]# docker start nginx-1
nginx-1
[root@k8s-master ~]# docker ps
CONTAINER ID   IMAGE                                               COMMAND                  CREATED              STATUS              PORTS     NAMES
063af5a1782b   17e960f4e39c                                        "start_runit"            12 seconds ago       Up 12 seconds                 k8s_calico-node_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_66
133fda8d5c2f   cbb01a7bd410                                        "/coredns -conf /etc…"   22 seconds ago       Up 21 seconds                 k8s_coredns_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_12
2cad37aaa64d   08c1b67c88ce                                        "/usr/bin/kube-contr…"   22 seconds ago       Up 22 seconds                 k8s_calico-kube-controllers_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_11
245ed185ea4a   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 27 seconds ago       Up 26 seconds                 k8s_POD_coredns-857d9ff4c9-ntrmg_kube-system_9a07dc52-b60a-4376-add2-5a128335c9df_8
60a93585eea1   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 27 seconds ago       Up 27 seconds                 k8s_POD_calico-kube-controllers-558d465845-x59c8_kube-system_1586cb4f-6051-4cf2-bcbc-7a05f93739ee_9
1ee1263b4193   cbb01a7bd410                                        "/coredns -conf /etc…"   45 seconds ago       Up 45 seconds                 k8s_coredns_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_13
829516e501fa   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 47 seconds ago       Up 46 seconds                 k8s_POD_coredns-857d9ff4c9-29ldj_kube-system_9ee2e5e5-d728-4c02-a87e-8dcaab82fbd7_8
e0c8a6330d0e   9344fce2372f                                        "/usr/local/bin/kube…"   51 seconds ago       Up 50 seconds                 k8s_kube-proxy_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_9
255fea7d86a5   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 52 seconds ago       Up 51 seconds                 k8s_POD_calico-node-9fhpq_kube-system_92a3a119-8007-48a9-8743-0afdf65f592c_7
36c5922e79eb   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 53 seconds ago       Up 52 seconds                 k8s_POD_kube-proxy-nq4x2_kube-system_a3ee7cb5-f97d-4339-8f9e-01e0e15874ba_7
1cfe981dc26a   a0eed15eed44                                        "etcd --advertise-cl…"   About a minute ago   Up About a minute             k8s_etcd_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_9
17717a8530ef   6fc5e6b7218c                                        "kube-scheduler --au…"   About a minute ago   Up About a minute             k8s_kube-scheduler_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_16
e0df13dfff62   8a9000f98a52                                        "kube-apiserver --ad…"   About a minute ago   Up About a minute             k8s_kube-apiserver_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_10
6a21496a57a4   138fb5a3a2e3                                        "kube-controller-man…"   About a minute ago   Up About a minute             k8s_kube-controller-manager_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_15
5631104357a5   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 About a minute ago   Up About a minute             k8s_POD_kube-apiserver-k8s-master_kube-system_bc05f019b265f704d6a2ffb204a2c88f_7
562543f7a8d6   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 About a minute ago   Up About a minute             k8s_POD_kube-controller-manager-k8s-master_kube-system_51eafc84967051e22b58cf0ebce14e35_7
16dbdd75513f   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 About a minute ago   Up About a minute             k8s_POD_kube-scheduler-k8s-master_kube-system_299cca9182c20d90f643981b13c43213_7
5bfab6a1a042   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 About a minute ago   Up About a minute             k8s_POD_etcd-k8s-master_kube-system_e4b42e5b51c6629d934233cc43f26a22_7
d85077c98a69   nginx                                               "/docker-entrypoint.…"   18 hours ago         Up 12 seconds       80/tcp    nginx-1
[root@k8s-master ~]# docker exec -it nginx-1 /bin/bash
root@d85077c98a69:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.2  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:ac:11:00:02  txqueuelen 0  (Ethernet)
        RX packets 14  bytes 1252 (1.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@d85077c98a69:/# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.17.0.1      0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0
# back on the host
root@d85077c98a69:/# exit
exit
[root@k8s-master ~]# ifconfig
cali6632e2eedff: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 3  bytes 125 (125.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 770 (770.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cali7b6489f2f47: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 3  bytes 125 (125.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 770 (770.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calieaec58fb34e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 1000  (Ethernet)
        RX packets 3  bytes 125 (125.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 770 (770.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:5fff:fe05:698c  prefixlen 64  scopeid 0x20<link>
        ether 02:42:5f:05:69:8c  txqueuelen 0  (Ethernet)
        RX packets 3  bytes 125 (125.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 770 (770.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.117.207  netmask 255.255.255.0  broadcast 192.168.117.255
        inet6 fe80::20c:29ff:fe96:278c  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:96:27:8c  txqueuelen 1000  (Ethernet)
        RX packets 554  bytes 64561 (63.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 596  bytes 65850 (64.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 49719  bytes 16290594 (15.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 49719  bytes 16290594 (15.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth6881202: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::408f:cdff:fe98:623a  prefixlen 64  scopeid 0x20<link>
        ether 42:8f:cd:98:62:3a  txqueuelen 0  (Ethernet)
        RX packets 3  bytes 167 (167.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18  bytes 1566 (1.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@k8s-master ~]#
[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth6881202
[root@k8s-master ~]#

This is why a Veth Pair is so often used as the "network cable" connecting two different Network Namespaces.
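To see this "network cable" behavior on its own, here is a minimal sketch using plain iproute2 commands. It assumes iproute2 is available on the host; the names ns1, veth0, veth1 and the 10.200.1.0/24 addresses are made up purely for illustration:

# Create a network namespace and a Veth Pair
ip netns add ns1
ip link add veth0 type veth peer name veth1

# Move one end into the namespace and configure both ends
ip link set veth1 netns ns1
ip addr add 10.200.1.1/24 dev veth0
ip link set veth0 up
ip netns exec ns1 ip addr add 10.200.1.2/24 dev veth1
ip netns exec ns1 ip link set veth1 up
ip netns exec ns1 ip link set lo up

# A packet sent from veth0 should appear directly on veth1 inside ns1
ping -c 1 10.200.1.2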

Above, we started a container named nginx-1.

Inside this container there is a network interface called eth0, and it is exactly the container-side end of a Veth Pair device.

Looking at nginx-1's routing table with the route command, you can see that eth0 is the container's default routing device, and that all requests for the 172.17.0.0/16 subnet are also handed to eth0 (the second routing rule, the one for 172.17.0.0).
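If the iproute2 tools happen to be installed in the container, you can ask the kernel directly which of these rules a given destination would match; 172.17.0.3 below is just an example address on the same subnet:

# Inside the nginx-1 container (requires the iproute2 package)
ip route get 172.17.0.3    # expected to resolve to the direct route on eth0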

From the host's ifconfig output you can see that the Veth Pair device corresponding to the nginx-1 container shows up on the host as a virtual network interface named veth6881202.

And the brctl show output confirms that this interface is "plugged into" the docker0 bridge.

If we now start another Docker container on the same host, say nginx-2:

[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth6881202
[root@k8s-master ~]# docker run -d --name nginx-2 nginx
e3b1a33fa82952f99bdf47e1451d05d83a9686cb006798744d2e593f02cf65c8
[root@k8s-master ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02425f05698c       no              veth40408f3
                                                        veth6881202
[root@k8s-master ~]#

Check the container's IP:

[root@k8s-master ~]#
[root@k8s-master ~]# docker inspect nginx-1
[
    {
        "Id": "d85077c98a69846efe9bf17c4b1b4efb2152ec2078f5de483edc524c674eed76",
        "Created": "2025-02-16T06:21:15.681636573Z",
        "Path": "/docker-entrypoint.sh",
----------
    
                   "Links": null,
                    "Aliases": null,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null,
                    "NetworkID": "5ce1ccec1789844b6a4712acd0c8d6f0ef9fba840c00f53be667a0dd6fbae39c",
                    "EndpointID": "786e7d287ca79fda20dc3895bb64b9830a99f1989538fd503e9f877e4ad574f3",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "DNSNames": null
                }
            }
        }
    }
]

The container's IP is the "IPAddress" field: "172.17.0.2".
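Instead of scrolling through the full docker inspect output, the IP can be pulled out directly with a Go-template format string (a small convenience, not part of the original walkthrough):

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nginx-1
docker inspect -f '{{.NetworkSettings.IPAddress}}' nginx-1   # older shortcut, default bridge network only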

Enter nginx-2 and ping nginx-1 (curl works just as well):

[root@k8s-master ~]# docker exec -it nginx-2 /bin/bash
root@e3b1a33fa829:/# ping 172.17.0.2
bash: ping: command not found
root@e3b1a33fa829:/# yum -y install ping
bash: yum: command not found
root@e3b1a33fa829:/# apt-get install -y iputils-ping
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package iputils-ping
root@e3b1a33fa829:/# curl http://172.17.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@e3b1a33fa829:/#
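As the transcript shows, the stock nginx image ships neither ping nor yum, and apt-get cannot locate iputils-ping because the package index has not been fetched yet. If you do want ping inside the container, the usual fix (assuming the container has outbound network access) is:

apt-get update && apt-get install -y iputils-ping
ping -c 3 172.17.0.2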

When you access nginx-2's IP address from inside the nginx-1 container (for example ping 172.17.0.3), the destination IP matches the second routing rule in nginx-1's table. That rule's gateway (Gateway) is 0.0.0.0, which marks it as a direct route: any IP packet matching it should leave through the container's eth0 interface and be delivered to the destination directly over the layer-2 network.

This eth0 interface is one end of a Veth Pair: that end sits in nginx-1's Network Namespace, while the other end sits on the host (in the host namespace) and is "plugged into" the host's docker0 bridge.

Once a virtual interface is "plugged into" a bridge, it becomes a slave device of that bridge. A slave is stripped of the right to hand packets to the network protocol stack itself and is "demoted" to a mere port of the bridge. The port's only job is to accept incoming frames and leave their fate (forward or drop) entirely to the bridge.
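A minimal sketch of this "plugging in" step, using made-up names (br-test for the bridge, veth-host for an existing host-side veth end): once the veth is given a master, the bridge, not the veth itself, decides what happens to every frame it receives.

# Create a bridge and attach the host end of a Veth Pair to it
ip link add br-test type bridge
ip link set br-test up
ip link set veth-host master br-test   # veth-host becomes a slave (port) of br-test

# Inspect the port; "master br-test" marks it as a bridge slave
ip -d link show veth-host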

To deliver the packet over layer 2, nginx-1 first needs the MAC address belonging to 172.17.0.3, so its protocol stack sends an ARP request out of eth0, and that request appears on the host-side end of the Veth Pair, i.e. on docker0. On receiving such ARP requests, the docker0 bridge plays the role of a layer-2 switch and forwards the ARP broadcast to the other virtual interfaces "plugged into" docker0. The network stack of nginx-2, which is attached to the same docker0, therefore receives the request and replies to nginx-1 with the MAC address corresponding to 172.17.0.3.
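You can watch this ARP exchange on the bridge itself; a rough sketch, assuming tcpdump is installed on the host and the request is triggered from the other container while the capture runs:

# On the host: capture ARP traffic crossing docker0
tcpdump -ni docker0 arp

# In another terminal, trigger traffic from nginx-2 toward nginx-1
docker exec nginx-2 curl -s -o /dev/null http://172.17.0.2
# The capture is expected to show a who-has request for 172.17.0.2 and the reply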

With that destination MAC address in hand, nginx-1's eth0 interface can send the packet on its way.

So a container process confined to its Network Namespace actually exchanges data with other containers through the combination of a Veth Pair device and a bridge on the host.

When a container tries to reach another host, for example ping 10.168.0.3, the request packet first leaves through the docker0 bridge and appears on the host. The host's routing table then matches a direct route such as 10.168.0.0/24 via eth0, so the request for 10.168.0.3 is handed to the host's eth0 interface.

The packet is then forwarded out of the host's eth0 onto the host network and eventually reaches the machine that owns 10.168.0.3. Naturally, this only works if the two hosts themselves can reach each other.
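On the host you can ask the kernel which route a packet for the other host would take; 10.168.0.3 is the example address used above, and the exact interface names depend on your machine:

ip route                  # list the host routing table (docker0, ens33, default, ...)
ip route get 10.168.0.3   # show the route a packet to the other host would use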

So whenever a container cannot reach the "outside world", first check whether the docker0 bridge can be pinged, and then check whether the iptables rules related to docker0 and the Veth Pair devices look abnormal; that is usually enough to find the answer.
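A short checklist for that situation, assuming the default bridge setup shown above (172.17.0.1 is the docker0 address and 172.17.0.2 the container address on this host):

# 1. On the host: is the docker0 bridge itself reachable?
ping -c 2 172.17.0.1
# 2. On the host: can we reach the container across docker0?
ping -c 2 172.17.0.2
# 3. Is IP forwarding enabled?
sysctl net.ipv4.ip_forward             # expected: net.ipv4.ip_forward = 1
# 4. Are Docker's iptables rules still present?
iptables -t nat -S POSTROUTING | grep -i masquerade
iptables -S FORWARD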

In short: veth pair = virtual NIC 1 - docker0 - virtual NIC 2, each with its own address. Virtual NICs 1 and 2 do not need to parse the packets themselves; the docker0 bridge does the parsing and the forwarding.

The "cross-host communication" problem

Suppose there is also a Docker container on another host (say 10.168.0.3). How would our nginx-1 container reach it?

In Docker's default configuration, the docker0 bridge on one host has no relationship whatsoever with the docker0 bridges on other hosts, and there is no way to connect them. Consequently, the containers attached to those bridges cannot communicate with each other either.

But what if, in software, we created a single bridge shared by the whole cluster and attached every container in the cluster to it? Then they could all talk to one another.

The core of building such a container network is this: on top of the existing host network, we use software to construct a virtual network that overlays the host network and connects all containers together. This technique is therefore called an Overlay Network.

The Overlay Network itself can be made up of a "special bridge" on every host. For example, when Container 1 on Node 1 wants to reach Container 3 on Node 2, the "special bridge" on Node 1 must, upon receiving the packet, somehow send it to the correct host, Node 2; and the "special bridge" on Node 2, upon receiving the packet, must somehow forward it to the correct container, Container 3.

In fact, you do not even need such a special bridge on every host: merely configuring each host's routing table in the right way is enough to forward packets to the correct host.
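A sketch of that pure-routing idea, with made-up addresses: suppose the containers on Node 2 (host address 10.168.0.3) use the subnet 172.17.1.0/24; then Node 1 only needs one extra routing-table entry and no tunnel device at all (this is essentially what "host-gw"-style container networking does):

# On Node 1: send the whole container subnet of Node 2 to Node 2's host address
ip route add 172.17.1.0/24 via 10.168.0.3 dev ens33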

The key point is that for a container to talk to the outside world, the IP packets it sends must get out of its Network Namespace and onto the host. The way to achieve this is to create a Veth Pair for the container, with one end acting as the container's default network interface and the other end sitting on the host.
