Cilium Native Routing with kubeProxy Mode
1. Environment Information
Host | IP |
---|---|
ubuntu | 172.16.94.141 |

Software | Version |
---|---|
docker | 26.1.4 |
helm | v3.15.0-rc.2 |
kind | 0.18.0 |
kubernetes | 1.23.4 |
ubuntu os | Ubuntu 20.04.6 LTS |
kernel | 5.11.5 (upgrade steps below) |
Kernel upgrade:
# wget https://raw.githubusercontent.com/pimlie/ubuntu-mainline-kernel.sh/master/ubuntu-mainline-kernel.sh
# sudo install ubuntu-mainline-kernel.sh /usr/local/bin/
# sudo ubuntu-mainline-kernel.sh -i 5.11.5
## install a specific version: sudo ubuntu-mainline-kernel.sh -i <kernel version>
# sudo reboot
# uname -r
2. Installing the Services
kind
Configuration file
$ cat install.sh
#!/bin/bash
date
set -v
# 1.prep noCNI env
cat <<EOF | kind create cluster --name=cilium-kubeproxy --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # kind's default CNI is disabled; we will install the CNI (cilium) ourselves
  disableDefaultCNI: true
  #kubeProxyMode: "none" # Enable KubeProxy
nodes:
- role: control-plane
- role: worker
- role: worker
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
    endpoint = ["https://harbor.evescn.com"]
EOF
# 2.remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide
# 3.install cni
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1
# Direct Routing Options(--set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8")
helm install cilium cilium/cilium \
--set k8sServiceHost=$controller_node_ip \
--set k8sServicePort=6443 \
--version 1.13.0-rc5 \
--namespace kube-system \
--set debug.enabled=true \
--set debug.verbose=datapath \
--set monitorAggregation=none \
--set ipam.mode=cluster-pool \
--set cluster.name=cilium-kubeproxy \
--set tunnel=disabled \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR="10.0.0.0/8"
# 4.install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium)
do
echo $i
docker cp /usr/bin/ping $i:/usr/bin/ping
docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done
Explanation of the `--set` parameters (a values-file equivalent is sketched right after this list):

- `--set tunnel=disabled`
  - Meaning: disable tunnel mode.
  - Purpose: with tunneling disabled, Cilium does not encapsulate traffic with VXLAN; packets are routed directly between hosts, i.e. direct-routing mode.
- `--set autoDirectNodeRoutes=true`
  - Meaning: enable automatic direct node routes.
  - Purpose: Cilium automatically installs direct routes between nodes, optimizing network traffic.
- `--set ipv4NativeRoutingCIDR="10.0.0.0/8"`
  - Meaning: the CIDR range used for IPv4 native routing, here `10.0.0.0/8`.
  - Purpose: tells Cilium which destination IP ranges are handled by native routing and must not be SNATed; by default Cilium masquerades all addresses.
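For reference, a minimal sketch of the same three options expressed as a Helm values file instead of `--set` flags (the file name is illustrative; the keys are exactly the ones used above):

```bash
# Sketch: write the native-routing options into a values file and pass it to
# helm with -f; equivalent to the three --set flags in install.sh.
cat > values-native-routing.yaml <<'EOF'
tunnel: disabled
autoDirectNodeRoutes: true
ipv4NativeRoutingCIDR: "10.0.0.0/8"
EOF
# helm upgrade --install cilium cilium/cilium --version 1.13.0-rc5 \
#   --namespace kube-system -f values-native-routing.yaml
```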
- Install the `k8s` cluster and the `cilium` service
# ./install.sh
Creating cluster "cilium-kubeproxy" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-cilium-kubeproxy"
You can now use your cluster with:
kubectl cluster-info --context kind-cilium-kubeproxy
Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
- Check the installed services
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-47qhb 1/1 Running 0 9m15s
kube-system cilium-fk2nr 1/1 Running 0 9m15s
kube-system cilium-nmxnh 1/1 Running 0 9m15s
kube-system cilium-operator-dd757785c-k47m5 1/1 Running 0 9m15s
kube-system cilium-operator-dd757785c-s7jk9 1/1 Running 0 9m15s
kube-system coredns-64897985d-l7q5n 1/1 Running 0 10m
kube-system coredns-64897985d-ljwh4 1/1 Running 0 10m
kube-system etcd-cilium-kubeproxy-control-plane 1/1 Running 0 10m
kube-system kube-apiserver-cilium-kubeproxy-control-plane 1/1 Running 0 10m
kube-system kube-controller-manager-cilium-kubeproxy-control-plane 1/1 Running 0 10m
kube-system kube-proxy-hdf27 1/1 Running 0 10m
kube-system kube-proxy-jb95q 1/1 Running 0 10m
kube-system kube-proxy-v7pqb 1/1 Running 0 10m
kube-system kube-scheduler-cilium-kubeproxy-control-plane 1/1 Running 0 10m
local-path-storage local-path-provisioner-5ddd94ff66-sv7l4 1/1 Running 0 10m
cilium
Configuration
# kubectl -n kube-system exec -it ds/cilium -- cilium status
KVStore: Ok Disabled
Kubernetes: Ok 1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Disabled
Host firewall: Disabled
CNI Chaining: none
CNI Config file: CNI configuration file management disabled
Cilium: Ok 1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor: Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 6/254 allocated from 10.0.0.0/24,
IPv6 BIG TCP: Disabled
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 34/34 healthy
Proxy Status: OK, ip 10.0.0.63, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 10.11 Metrics: Disabled
Encryption: Disabled
Cluster health: 3/3 reachable (2024-06-25T08:37:57Z)
`KubeProxyReplacement: Disabled`
- kube-proxy replacement is disabled: Cilium has not taken over kube-proxy's functionality, so the Kubernetes cluster keeps using the default kube-proxy for Service load balancing.

`Host Routing: Legacy`
- legacy host routing is used (the kernel network stack rather than eBPF host routing).

`Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]`
- iptables is used for IP masquerading (NAT); IPv4 masquerading is enabled, IPv6 masquerading is disabled.
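These fields can also be read back from the running agent; a small sketch, assuming the standard Cilium DaemonSet and its in-pod `cilium` CLI:

```bash
# Sketch: grep the routing/masquerading related fields from cilium status,
# and list the full daemon configuration for the underlying option values.
kubectl -n kube-system exec ds/cilium -- cilium status | \
  grep -E 'KubeProxyReplacement|Host Routing|Masquerading'
kubectl -n kube-system exec ds/cilium -- cilium config
```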
Deploy Pods in the `k8s` cluster to test the network
# cat cni.yaml
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cilium-with-kubeproxy
  name: cilium-with-kubeproxy
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cilium-with-kubeproxy
  template:
    metadata:
      labels:
        app: cilium-with-kubeproxy
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true
---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cilium-with-kubeproxy
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
# kubectl apply -f cni.yaml
daemonset.apps/cilium-with-kubeproxy created
service/serversvc created
# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
- Check the deployed services
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-with-kubeproxy-dqw5b 1/1 Running 0 109s 10.0.1.60 cilium-kubeproxy-control-plane <none> <none>
cilium-with-kubeproxy-jlx85 1/1 Running 0 109s 10.0.2.76 cilium-kubeproxy-worker <none> <none>
cilium-with-kubeproxy-zxtpj 1/1 Running 0 109s 10.0.0.53 cilium-kubeproxy-worker2 <none> <none>
net 1/1 Running 0 24s 10.0.2.153 cilium-kubeproxy-worker <none> <none>
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 14m
serversvc NodePort 10.96.0.36 <none> 80:32000/TCP 2m4s
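The NodePort Service is backed by the DaemonSet Pods above; a quick sketch to confirm which Pod IPs actually sit behind it:

```bash
# Sketch: the endpoints of serversvc should be the three
# cilium-with-kubeproxy Pod IPs on port 80.
kubectl get endpoints serversvc -o wide
```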
3. Network Testing
Same-node Pod communication
Pod information
## IP information
# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:72:2c:1e:1a:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
# /32 netmask
inet 10.0.2.153/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6872:2cff:fe1e:1a82/64 scope link
valid_lft forever preferred_lft forever
## Routing information
# kubectl exec -it net -- ip r s
default via 10.0.2.199 dev eth0 mtu 1500
10.0.2.199 dev eth0 scope link
Looking at the Pod we can see that with cilium the Pod's IP address carries a 32-bit (/32) mask, which means the address identifies a single host rather than a subnet. Any traffic from this Pod to any other IP therefore has to go through the routing table.
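A small sketch to confirm this from inside the Pod (assuming the nettool image ships iproute2, which the `ip a l` output above suggests): even an address in the same /24 resolves to the default route via 10.0.2.199.

```bash
# Sketch: route lookup from inside the net Pod for a same-node Pod IP and
# for an arbitrary external address; both resolve via 10.0.2.199 on eth0.
kubectl exec -it net -- ip route get 10.0.2.76
kubectl exec -it net -- ip route get 1.1.1.1
```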
Information of the Node hosting the Pod
# docker exec -it cilium-kubeproxy-worker bash
## IP information
root@cilium-kubeproxy-worker:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether f2:46:e3:8c:bf:2d brd ff:ff:ff:ff:ff:ff
inet6 fe80::f046:e3ff:fe8c:bf2d/64 scope link
valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 4e:67:23:df:e2:0c brd ff:ff:ff:ff:ff:ff
inet 10.0.2.199/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::4c67:23ff:fedf:e20c/64 scope link
valid_lft forever preferred_lft forever
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:2/64 scope link
valid_lft forever preferred_lft forever
6: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 2e:65:1c:bf:79:64 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::2c65:1cff:febf:7964/64 scope link
valid_lft forever preferred_lft forever
8: lxc6be7bdb14e12@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether f6:5a:f9:5b:1f:fd brd ff:ff:ff:ff:ff:ff link-netns cni-5d3ffe4d-51ee-28da-a9d2-a60afab1c45e
inet6 fe80::f45a:f9ff:fe5b:1ffd/64 scope link
valid_lft forever preferred_lft forever
12: lxcb6cad2eb0861@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fe:d3:e6:00:d9:6a brd ff:ff:ff:ff:ff:ff link-netns cni-f260ef4f-7cd4-a12f-8924-dc0caa5ff691
inet6 fe80::fcd3:e6ff:fe00:d96a/64 scope link
valid_lft forever preferred_lft forever
## Routing information
root@cilium-kubeproxy-worker:/# ip r s
default via 172.18.0.1 dev eth0
10.0.0.0/24 via 172.18.0.3 dev eth0
10.0.1.0/24 via 172.18.0.4 dev eth0
10.0.2.0/24 via 10.0.2.199 dev cilium_host src 10.0.2.199
10.0.2.199 dev cilium_host scope link
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
Run a ping test from the Pod; according to the host routing table, the packet should be forwarded by the route `10.0.2.0/24 via 10.0.2.199 dev cilium_host src 10.0.2.199`.
root@kind:~# kubectl exec -it net -- ping 10.0.2.76 -c 1
PING 10.0.2.76 (10.0.2.76): 56 data bytes
64 bytes from 10.0.2.76: seq=0 ttl=63 time=0.411 ms
--- 10.0.2.76 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.411/0.411/0.411 ms
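A sketch to double-check which route the worker's kernel would pick for this destination (names taken from the outputs above):

```bash
# Sketch: route lookup on the node hosting the net Pod; the kernel resolves
# 10.0.2.76 through the 10.0.2.0/24 route via cilium_host.
docker exec -it cilium-kubeproxy-worker ip route get 10.0.2.76
```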
Packet capture on the Pod's eth0 interface
net~$ tcpdump -pne -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078180 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078547 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64
Packet capture on the Node's cilium_host interface
root@cilium-kubeproxy-worker:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
The capture shows that no packets can be seen on the cilium_host interface, even though same-node Pod communication works. Looking back at the capture on the Pod's eth0, the next-hop MAC address is fe:d3:e6:00:d9:6a; comparing this with the Node's interfaces, it is the MAC of lxcb6cad2eb0861, and the interface indexes show that it and the Pod's eth0 form a veth pair.
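A sketch for confirming the pairing by interface index; the `@ifN` suffixes on the two ends must point at each other:

```bash
# Sketch: the Pod side shows "11: eth0@if12", the node side shows
# "12: lxcb6cad2eb0861@if11" -- matching peer indexes, i.e. a veth pair.
kubectl exec -it net -- ip -o link show eth0
docker exec -it cilium-kubeproxy-worker ip -o link show lxcb6cad2eb0861
```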
Packet capture on the Node's lxcb6cad2eb0861 interface
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxcb6cad2eb0861
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lxcb6cad2eb0861, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078183 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078546 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64
Packets are captured on lxcb6cad2eb0861, so the traffic does reach the Node, but it is not delivered to the cilium_host interface as the routing table would suggest. In reality Cilium's datapath intercepts the packet: when the destination is on the same node, the packet is forwarded to the destination Pod's veth-pair interface before the routing table is ever consulted.
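This interception is implemented by the eBPF program Cilium attaches to the lxc device with tc; a sketch to look at it, assuming the `tc` tool is available in the kind node image:

```bash
# Sketch: list the tc ingress filters on the Pod-facing lxc device; a bpf
# filter (Cilium's from-container program) should be attached there.
docker exec -it cilium-kubeproxy-worker tc filter show dev lxcb6cad2eb0861 ingress
```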
- Check the Pod whose IP is 10.0.2.76
# kubectl exec -it cilium-with-kubeproxy-jlx85 -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6e:58:19:c1:9c:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.2.76/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6c58:19ff:fec1:9cdf/64 scope link
valid_lft forever preferred_lft forever
From this eth0 information (7: eth0@if8) we can tell that the host-side veth-pair peer of the Pod's eth0 is the interface with index 8, which in the earlier Node output is: 8: lxc6be7bdb14e12@if7.
Packet capture on the Node's lxc6be7bdb14e12 interface
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxc6be7bdb14e12
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lxc6be7bdb14e12, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078384 f6:5a:f9:5b:1f:fd > 6e:58:19:c1:9c:df, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078396 6e:58:19:c1:9c:df > f6:5a:f9:5b:1f:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64
Packets are captured on this interface too; the destination MAC 6e:58:19:c1:9c:df is the MAC of eth0 in the 10.0.2.76 Pod, i.e. this is the traffic being delivered to the target IP.
Summary: for same-node Pod communication, once the packet has left the Pod through its veth pair and reached the Node, the Cilium datapath running on the Node intercepts it; because the destination is on the same node, the packet is forwarded to the destination Pod's veth-pair interface before the routing table is consulted.
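The endpoint map that makes this local redirect possible can be dumped from the Cilium agent on that node; a sketch, assuming the standard `k8s-app=cilium` label on the agent Pods:

```bash
# Sketch: find the cilium agent Pod on cilium-kubeproxy-worker and dump its
# BPF endpoint map (local Pod IPs with their interface indexes and MACs).
CILIUM_POD=$(kubectl -n kube-system get pod -l k8s-app=cilium \
  --field-selector spec.nodeName=cilium-kubeproxy-worker \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec -it "$CILIUM_POD" -- cilium bpf endpoint list
```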
Cross-node Pod communication
- Pod information: see "Same-node Pod communication" above.
- Information of the Node hosting the Pod: see "Same-node Pod communication" above.
- Run a ping test from the Pod; according to the host routing table, the packet should be forwarded by the route `10.0.0.0/24 via 172.18.0.3 dev eth0`.
root@kind:~# kubectl exec -it net -- ping 10.0.0.53 -c 1
PING 10.0.0.53 (10.0.0.53): 56 data bytes
64 bytes from 10.0.0.53: seq=0 ttl=63 time=0.411 ms
--- 10.0.0.53 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.411/0.411/0.411 ms
Packet capture on the lxcb6cad2eb0861 interface of Node cilium-kubeproxy-worker
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxcb6cad2eb0861
09:57:05.768885 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
09:57:05.769549 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64
After leaving the Pod, the traffic again arrives first at the veth-pair interface.
Packet capture on the eth0 interface of Node cilium-kubeproxy-worker
root@cilium-kubeproxy-worker:/# tcpdump -pne -i eth0 icmp
09:57:05.769062 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
09:57:05.769422 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64
In the capture on eth0 the next-hop MAC is 02:42:ac:12:00:03; based on the routing table above, this should be the MAC of the eth0 interface of node 172.18.0.3.
Check the MAC (ARP) table on node cilium-kubeproxy-worker
root@cilium-kubeproxy-worker:/# arp -n
Address HWtype HWaddress Flags Mask Iface
172.18.0.4 ether 02:42:ac:12:00:04 C eth0
172.18.0.3 ether 02:42:ac:12:00:03 C eth0
172.18.0.1 ether 02:42:2f:fe:43:35 C eth0
Packet capture on the eth0 interface of Node cilium-kubeproxy-worker2
root@cilium-kubeproxy-worker2:/# tcpdump -pne -i eth0
08:01:58.623752 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
08:01:58.624110 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64
In this capture the source MAC 02:42:ac:12:00:02 is the MAC of eth0 on 172.18.0.2, and the destination MAC 02:42:ac:12:00:03 is the MAC of this node's own eth0.
Check the IP information of node cilium-kubeproxy-worker2
root@cilium-kubeproxy-worker2:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 96:0e:53:42:98:b2 brd ff:ff:ff:ff:ff:ff
inet6 fe80::940e:53ff:fe42:98b2/64 scope link
valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ce:1e:f9:4f:4f:b1 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.99/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::cc1e:f9ff:fe4f:4fb1/64 scope link
valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0e:74:87:76:e7:93 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::c74:87ff:fe76:e793/64 scope link
valid_lft forever preferred_lft forever
7: lxc362604ac4ff4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 3a:44:a3:48:94:e1 brd ff:ff:ff:ff:ff:ff link-netns cni-901d361e-7ed3-e13f-0d1a-9beefa724748
inet6 fe80::3844:a3ff:fe48:94e1/64 scope link
valid_lft forever preferred_lft forever
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::3/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:3/64 scope link
valid_lft forever preferred_lft forever
Check the routing table of node cilium-kubeproxy-worker2
## Routing information
root@cilium-kubeproxy-worker2:/# ip r s
default via 172.18.0.1 dev eth0
10.0.0.0/24 via 10.0.0.99 dev cilium_host src 10.0.0.99
10.0.0.99 dev cilium_host scope link
10.0.1.0/24 via 172.18.0.4 dev eth0
10.0.2.0/24 via 172.18.0.2 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
According to the routing table, traffic to the destination address 10.0.0.53 should be forwarded by the route `10.0.0.0/24 via 10.0.0.99 dev cilium_host src 10.0.0.99`, so capture on the cilium_host interface:
root@cilium-kubeproxy-worker2:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
Again, no packets can be captured on the cilium_host interface. Just as in the same-node Pod case, nothing shows up on cilium_host, yet communication works fine.
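Since the traffic never traverses cilium_host, tcpdump there stays empty; the datapath events can instead be watched with the agent's monitor. A sketch, selecting the agent Pod on worker2 the same way as before:

```bash
# Sketch: stream trace events from the Cilium datapath on worker2 while the
# ping runs; the forwarded ICMP packets show up here even though tcpdump on
# cilium_host captures nothing.
CILIUM_POD=$(kubectl -n kube-system get pod -l k8s-app=cilium \
  --field-selector spec.nodeName=cilium-kubeproxy-worker2 \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec -it "$CILIUM_POD" -- cilium monitor --type trace
```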
- Packet capture on the destination Pod's eth0 interface
cilium-with-kubeproxy-zxtpj~$ tcpdump -pne -i eth0
08:01:58.623944 3a:44:a3:48:94:e1 > 42:6c:de:59:dd:36, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
08:01:58.623981 42:6c:de:59:dd:36 > 3a:44:a3:48:94:e1, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64
The source MAC in this capture, 3a:44:a3:48:94:e1, turns out to be the MAC of the lxc362604ac4ff4 interface on node cilium-kubeproxy-worker2. Once again Cilium's datapath intercepts the packet: when it arrives at the Node, it is forwarded to the destination Pod's veth-pair interface before the Node's routing table is consulted. lxc362604ac4ff4 and the destination Pod's eth0 form a veth pair.
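What makes the cross-node hop possible is `autoDirectNodeRoutes=true`: every node carries a route to each peer node's PodCIDR via that peer's node IP. A sketch to check this on all three kind nodes at once:

```bash
# Sketch: print the PodCIDR routes on every kind node; each node should list
# the other nodes' 10.0.x.0/24 ranges via the peers' 172.18.0.x addresses.
for n in cilium-kubeproxy-control-plane cilium-kubeproxy-worker cilium-kubeproxy-worker2; do
  echo "== $n =="
  docker exec "$n" ip route show | grep '^10\.0\.'
done
```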
Service communication
- Check the Service information
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 138m
serversvc NodePort 10.96.142.173 <none> 80:32000/TCP 74m
From the net Pod, request port 32000 on the Node hosting the Pod
# kubectl exec -ti net -- curl 172.18.0.2:32000
PodName: cilium-with-kubeproxy-dqw5b | PodIP: eth0 10.0.1.60/32
At the same time, capture on the net Pod's eth0 interface
net~$ tcpdump -pne -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:47:11.587710 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 74: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [S], seq 1116502993, win 64240, options [mss 1460,sackOK,TS val 3054730101 ecr 0,nop,wscale 7], length 0
08:47:11.588352 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 74: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [S.], seq 2686464703, ack 1116502994, win 65160, options [mss 1460,sackOK,TS val 584638172 ecr 3054730101,nop,wscale 7], length 0
08:47:11.588362 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 1, win 502, options [nop,nop,TS val 3054730101 ecr 584638172], length 0
08:47:11.589016 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 146: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 3054730102 ecr 584638172], length 80
08:47:11.589561 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 66: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [.], ack 81, win 509, options [nop,nop,TS val 584638173 ecr 3054730102], length 0
08:47:11.589780 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 302: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 584638173 ecr 3054730102], length 236
08:47:11.590091 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 131: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [P.], seq 237:302, ack 81, win 509, options [nop,nop,TS val 584638174 ecr 3054730102], length 65
08:47:11.590362 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 237, win 501, options [nop,nop,TS val 3054730103 ecr 584638173], length 0
08:47:11.590861 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 302, win 501, options [nop,nop,TS val 3054730104 ecr 584638174], length 0
08:47:11.592648 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [F.], seq 81, ack 302, win 501, options [nop,nop,TS val 3054730105 ecr 584638174], length 0
08:47:11.593241 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 66: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [F.], seq 302, ack 82, win 509, options [nop,nop,TS val 584638177 ecr 3054730105], length 0
08:47:11.593248 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 303, win 501, options [nop,nop,TS val 3054730106 ecr 584638177], length 0
The capture shows that the net Pod uses port 36788 to talk TCP to 172.18.0.2 on port 32000.
* `KubeProxyReplacement: Disabled`
  * kube-proxy replacement is disabled: Cilium has not taken over kube-proxy's functionality, and the cluster keeps using the default kube-proxy for Service load balancing.

The cilium configuration shows `KubeProxyReplacement: Disabled`, confirming that cilium does not handle Service translation. kube-proxy therefore performs Service forwarding with iptables or ipvs; this kind cluster uses iptables. We can verify this with the conntrack connection-tracking table and the iptables rules.
conntrack information
root@cilium-kubeproxy-worker:/# conntrack -L | grep 32000
conntrack v1.4.6 (conntrack-tools): 38 flow entries have been shown.
tcp 6 91 TIME_WAIT src=10.0.2.153 dst=172.18.0.2 sport=59298 dport=32000 src=10.0.1.60 dst=172.18.0.2 sport=80 dport=56944 [ASSURED] mark=0 use=1
iptables information
root@cilium-kubeproxy-worker:/# iptables-save | grep 32000
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-SVC-CU7F3MNN62CF4ANP
-A KUBE-SVC-CU7F3MNN62CF4ANP -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-MARK-MASQ
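To follow the NodePort translation all the way to a Pod, the service chain can be expanded further on the same node; a sketch (the per-endpoint KUBE-SEP chain names are environment specific):

```bash
# Sketch: expand the KUBE-SVC chain matched above; it jumps to KUBE-SEP-*
# chains whose DNAT rules rewrite the destination to one of the serversvc
# Pod IPs on port 80.
iptables -t nat -nL KUBE-SVC-CU7F3MNN62CF4ANP
iptables-save -t nat | grep -E 'KUBE-SEP.*serversvc'
```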