Cilium Native Routing with KubeProxyReplacement 模式
一、环境信息
| 主机 | IP |
| --- | --- |
| ubuntu | 172.16.94.141 |

| 软件 | 版本 |
| --- | --- |
| docker | 26.1.4 |
| helm | v3.15.0-rc.2 |
| kind | 0.18.0 |
| kubernetes | 1.23.4 |
| ubuntu os | Ubuntu 20.04.6 LTS |
| kernel | 5.11.5 内核升级文档 |
二、安装服务
`kind` 配置文件信息
root@kind:~# cat install.sh
#!/bin/bash
date
set -v
# 1.prep noCNI env
cat <<EOF | kind create cluster --name=cilium-kubeproxy-replacement --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # kind 默认会安装自带的 cni(kindnetd),这里禁用,cni 我们需要自己创建
  disableDefaultCNI: true
  # kind 安装 k8s 集群需要禁用 kube-proxy 安装,使用 cilium 代替 kube-proxy 功能
  kubeProxyMode: "none"
nodes:
  - role: control-plane
  - role: worker
  - role: worker
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
      endpoint = ["https://harbor.evescn.com"]
EOF
# 2.remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide
# 3.install cni
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1
# Direct Routing Options(--set kubeProxyReplacement=strict --set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8")
helm install cilium cilium/cilium \
--set k8sServiceHost=$controller_node_ip \
--set k8sServicePort=6443 \
--version 1.13.0-rc5 \
--namespace kube-system \
--set debug.enabled=true \
--set debug.verbose=datapath \
--set monitorAggregation=none \
--set ipam.mode=cluster-pool \
--set cluster.name=cilium-kubeproxy-replacement \
--set kubeProxyReplacement=strict \
--set tunnel=disabled \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR="10.0.0.0/8"
# 4.install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium)
do
echo $i
docker cp /usr/bin/ping $i:/usr/bin/ping
docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done
`--set` 参数解释

- `--set kubeProxyReplacement=strict`
  - 含义: 启用 kube-proxy 替代功能,并以严格模式运行。
  - 用途: Cilium 将完全替代 kube-proxy 实现服务负载均衡,提供更高效的流量转发和网络策略管理。
- `--set tunnel=disabled`
  - 含义: 禁用隧道模式。
  - 用途: 禁用后,Cilium 将不使用 vxlan 技术,直接在主机之间路由数据包,即 direct-routing 模式。
- `--set autoDirectNodeRoutes=true`
  - 含义: 启用自动直接节点路由。
  - 用途: 使 Cilium 自动设置直接节点路由,优化网络流量。
- `--set ipv4NativeRoutingCIDR="10.0.0.0/8"`
  - 含义: 指定用于 IPv4 本地路由的 CIDR 范围,这里是 10.0.0.0/8。
  - 用途: 配置 Cilium 使其知道哪些 IP 地址范围应该通过本地路由进行处理,不做 snat,Cilium 默认会对所有地址做 snat。
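这些 helm 参数最终会写入 kube-system 命名空间下的 cilium-config ConfigMap。下面是一个核对参数是否生效的示例命令(键名以集群中实际的 ConfigMap 内容为准):

```bash
# 核对上述 --set 参数是否写入 cilium-config(键名以实际 ConfigMap 为准)
kubectl -n kube-system get configmap cilium-config -o yaml | \
  grep -E "kube-proxy-replacement|tunnel|auto-direct-node-routes|ipv4-native-routing-cidr"
```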
- 安装 `k8s` 集群和 `cilium` 服务
# ./install.sh
Creating cluster "cilium-kubeproxy-replacement" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-cilium-kubeproxy-replacement"
You can now use your cluster with:
kubectl cluster-info --context kind-cilium-kubeproxy-replacement
Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
- 查看安装的服务
root@kind:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-2jwcw 1/1 Running 0 4m15s
kube-system cilium-2xvsw 1/1 Running 0 4m15s
kube-system cilium-operator-dd757785c-q8rnw 1/1 Running 0 4m15s
kube-system cilium-operator-dd757785c-wtf4w 1/1 Running 0 4m15s
kube-system cilium-q4h4z 1/1 Running 0 4m15s
kube-system coredns-64897985d-2tmk6 1/1 Running 0 6m9s
kube-system coredns-64897985d-bjgfx 1/1 Running 0 6m10s
kube-system etcd-cilium-kubeproxy-replacement-control-plane 1/1 Running 0 6m25s
kube-system kube-apiserver-cilium-kubeproxy-replacement-control-plane 1/1 Running 0 6m27s
kube-system kube-controller-manager-cilium-kubeproxy-replacement-control-plane 1/1 Running 0 6m25s
kube-system kube-scheduler-cilium-kubeproxy-replacement-control-plane 1/1 Running 0 6m25s
local-path-storage local-path-provisioner-5ddd94ff66-k8d66 1/1 Running 0 6m10s
查看 `Pod` 服务信息,发现没有 `kube-proxy` 服务,因为我们设置了 `kubeProxyReplacement=strict`,`cilium` 将完全替代 `kube-proxy` 实现服务负载均衡。并且在 `kind` 安装 `k8s` 集群的时候也需要设置禁用 `kube-proxy` 安装: `kubeProxyMode: "none"`。
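也可以用下面的示例命令再确认一下集群中确实没有部署 kube-proxy(`k8s-app=kube-proxy` 是社区 kube-proxy 的默认标签,这里预期输出为空):

```bash
# 确认集群中没有 kube-proxy DaemonSet 和 Pod(预期无输出)
kubectl -n kube-system get daemonset,pods -l k8s-app=kube-proxy
```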
`cilium` 配置信息
root@kind:~# kubectl -n kube-system exec -it ds/cilium -- cilium status
KVStore: Ok Disabled
Kubernetes: Ok 1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [eth0 172.18.0.4 (Direct Routing)]
Host firewall: Disabled
CNI Chaining: none
CNI Config file: CNI configuration file management disabled
Cilium: Ok 1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor: Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 5/254 allocated from 10.0.0.0/24,
IPv6 BIG TCP: Disabled
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 31/31 healthy
Proxy Status: OK, ip 10.0.0.84, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 9.15 Metrics: Disabled
Encryption: Disabled
Cluster health: 3/3 reachable (2024-06-26T09:52:50Z)
- `KubeProxyReplacement: Strict [eth0 172.18.0.4 (Direct Routing)]`
  - Cilium 完全接管所有 kube-proxy 功能,包括服务负载均衡、NodePort 和其他网络策略管理。这种配置适用于你希望最大限度利用 Cilium 的高级网络功能,并完全替代 kube-proxy 的场景。此模式提供更高效的流量转发和更强大的网络策略管理。
- `Host Routing: Legacy`
  - 使用传统的主机路由。
- `Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]`
  - 使用 iptables 进行 IP 伪装(NAT),IPv4 伪装启用,IPv6 伪装禁用。
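如果想进一步查看 KubeProxyReplacement 具体接管了哪些能力(NodePort、Socket LB、使用的设备等),可以参考下面的示例命令(输出字段名随 cilium 版本可能略有不同):

```bash
# 查看 kube-proxy 替代功能的详细信息(字段随版本而异)
kubectl -n kube-system exec -it ds/cilium -- cilium status --verbose | \
  grep -A 15 "KubeProxyReplacement Details"
```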
`k8s` 集群安装 `Pod` 测试网络
root@kind:~# cat cni.yaml
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cilium-with-replacement
  name: cilium-with-replacement
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cilium-with-replacement
  template:
    metadata:
      labels:
        app: cilium-with-replacement
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true
---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cilium-with-replacement
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
root@kind:~# kubectl apply -f cni.yaml
daemonset.apps/cilium-with-replacement created
service/serversvc created
root@kind:~# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
- 查看安装服务信息
root@kind:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-with-replacement-hbvkm 1/1 Running 0 62s 10.0.2.204 cilium-kubeproxy-replacement-worker2 <none> <none>
cilium-with-replacement-vhhzl 1/1 Running 0 62s 10.0.0.142 cilium-kubeproxy-replacement-control-plane <none> <none>
cilium-with-replacement-xtzwx 1/1 Running 0 62s 10.0.1.15 cilium-kubeproxy-replacement-worker <none> <none>
net 1/1 Running 0 2s 10.0.2.32 cilium-kubeproxy-replacement-worker2 <none> <none>
root@kind:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m
serversvc NodePort 10.96.18.68 <none> 80:32000/TCP 76s
三、测试网络

同节点 Pod 网络通讯

- `Pod` 节点信息
## ip 信息
# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:38:09:24:7b:fd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.2.32/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6038:9ff:fe24:7bfd/64 scope link
valid_lft forever preferred_lft forever
## 路由信息
# kubectl exec -it net -- ip r s
default via 10.0.2.184 dev eth0 mtu 1500
10.0.2.184 dev eth0 scope link
查看 `Pod` 信息发现在 `cilium` 中主机的 `IP` 地址为 `32` 位掩码,意味着该 `IP` 地址是单个主机的唯一标识,而不是一个子网。这个主机访问其他 `IP` 均会走路由到达。
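可以在 net Pod 内用 `ip route get` 验证这一点:无论目的地址是同节点还是跨节点 Pod,下一跳都是 cilium_host 的地址 10.0.2.184(示例命令,输出以实际环境为准):

```bash
# 在 net Pod 内查看到同节点 / 跨节点 Pod 的路由决策,下一跳均为 10.0.2.184
kubectl exec -it net -- ip route get 10.0.2.204
kubectl exec -it net -- ip route get 10.0.1.15
```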
- `Pod` 节点所在 `Node` 节点信息
# docker exec -it cilium-kubeproxy-replacement-worker2 bash
## ip 信息
root@cilium-kubeproxy-replacement-worker2:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether da:dc:10:5e:63:bd brd ff:ff:ff:ff:ff:ff
inet6 fe80::d8dc:10ff:fe5e:63bd/64 scope link
valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether de:18:2a:12:05:3d brd ff:ff:ff:ff:ff:ff
inet 10.0.2.184/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::dc18:2aff:fe12:53d/64 scope link
valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether c6:60:02:0a:62:2f brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::c460:2ff:fe0a:622f/64 scope link
valid_lft forever preferred_lft forever
13: lxc3d8b8c4b9039@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 1e:0e:95:4f:d1:32 brd ff:ff:ff:ff:ff:ff link-netns cni-4fde611b-ba69-ea8c-256a-2655cf743623
inet6 fe80::1c0e:95ff:fe4f:d132/64 scope link
valid_lft forever preferred_lft forever
15: lxc89eb07782005@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 8e:b6:63:bb:41:3e brd ff:ff:ff:ff:ff:ff link-netns cni-1d4bc46a-77cd-aa33-cd3e-5df5335d2f00
inet6 fe80::8cb6:63ff:febb:413e/64 scope link
valid_lft forever preferred_lft forever
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::2/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:2/64 scope link
valid_lft forever preferred_lft forever
## 路由信息
root@cilium-kubeproxy-replacement-worker2:/# ip r s
default via 172.18.0.1 dev eth0
10.0.0.0/24 via 172.18.0.4 dev eth0
10.0.1.0/24 via 172.18.0.3 dev eth0
10.0.2.0/24 via 10.0.2.184 dev cilium_host src 10.0.2.184
10.0.2.184 dev cilium_host scope link
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
- `Pod` 节点进行 `ping` 包测试,查看宿主机路由信息,按路由表数据包应该会通过 `10.0.2.0/24 via 10.0.2.184 dev cilium_host src 10.0.2.184` 这条路由转发
root@kind:~# kubectl exec -it net -- ping 10.0.2.204 -c 1
PING 10.0.2.204 (10.0.2.204): 56 data bytes
64 bytes from 10.0.2.204: seq=0 ttl=63 time=0.787 ms
--- 10.0.2.204 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.787/0.787/0.787 ms
- `Pod` 节点 `eth0` 网卡抓包
net~$ tcpdump -pne -i eth0
10:12:51.034229 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 47, seq 0, length 64
10:12:51.034733 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 47, seq 0, length 64
- `Node` 节点 `cilium_host` 网卡抓包
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
通过抓包发现,我们无法在 `cilium_host` 网卡上抓到数据包信息,但是同节点的 `Pod` 通讯正常。查看前面 `Pod` `eth0` 网卡的抓包信息,发现下一跳 mac 地址为 `8e:b6:63:bb:41:3e`,对比 Node 节点信息,发现是 `lxc89eb07782005` 网卡的 mac 地址,并且通过网卡 id 信息,可以确定它们互为 veth pair 网卡。
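veth pair 的对应关系也可以不依赖抓包,直接通过接口索引确认:Pod 内 eth0 的 iflink 就是宿主机上对端 lxc 网卡的 ifindex(示例命令,节点名与索引以实际环境为准):

```bash
# 在 net Pod 内查看 eth0 对端网卡的 ifindex(本例中为 15)
kubectl exec -it net -- cat /sys/class/net/eth0/iflink

# 在 Node 上按该 ifindex 找到对应的 lxc 网卡(本例中为 lxc89eb07782005)
docker exec -it cilium-kubeproxy-replacement-worker2 ip -o link show | grep "^15:"
```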
- `Node` 节点 `lxc89eb07782005` 网卡抓包
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc89eb07782005
10:12:51.034229 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 47, seq 0, length 64
10:12:51.034732 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 47, seq 0, length 64
在 `lxc89eb07782005` 网卡上抓到了数据包传输信息,表示数据包送到了 `Node` 节点,但是却没有按照路由信息送到 `cilium_host` 网卡。实质上是 `cilium` 底层的 eBPF 代码实现了数据包劫持:当数据包的目的地址是同节点 Pod 时,还未走到查询 routing 路由表这一步,就完成了数据包的转发,直接送往了目的 `Pod` 的 veth pair 网卡。
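这里的劫持依赖 cilium 在每个 lxc 网卡上挂载的 eBPF 程序:它根据本地 endpoint 映射判断目的 IP 是否为同节点 Pod,是则直接 redirect 到对端 veth。可以用下面的示例命令查看该映射(注意 `ds/cilium` 会随机选择一个 cilium Pod,排查时请换成目标节点上的那个 Pod):

```bash
# 查看 cilium 维护的本地 endpoint 映射(IP -> 本地网卡 / 安全身份)
kubectl -n kube-system exec -it ds/cilium -- cilium bpf endpoint list

# 查看 endpoint 的详细列表(包含 endpoint ID、安全身份、Pod IP 等信息)
kubectl -n kube-system exec -it ds/cilium -- cilium endpoint list
```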
- 查看 `10.0.2.204` `Pod` 主机信息
root@kind:~# kubectl exec -it cilium-with-replacement-hbvkm -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 76:51:a5:5f:2b:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.2.204/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::7451:a5ff:fe5f:2b1b/64 scope link
valid_lft forever preferred_lft forever
查看 `eth0` 信息,可以确定 `Pod` 的 `eth0` 网卡在宿主机上对应的 veth pair 网卡为 id = 13 的网卡,查看前面 `Node` 节点信息发现为: `13: lxc3d8b8c4b9039@if12`
- `Node` 节点 `lxc3d8b8c4b9039` 网卡抓包
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc3d8b8c4b9039
10:14:14.584364 1e:0e:95:4f:d1:32 > 76:51:a5:5f:2b:1b, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 53, seq 0, length 64
10:14:14.584377 76:51:a5:5f:2b:1b > 1e:0e:95:4f:d1:32, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 53, seq 0, length 64
在此网卡上抓到了数据包信息,目的 mac `76:51:a5:5f:2b:1b` 为 `10.0.2.204` `Pod` 主机 `eth0` 网卡的 mac 地址,即此数据包是送往目标 `IP` 的数据包。
总结:同节点 `Pod` 通讯,数据包通过 veth pair 送往 `Node` 节点后,`Node` 节点上运行的 `cilium` eBPF 代码实现了数据包劫持,当数据包目的地址是同节点 Pod 时,还未走到查询 `Node` 节点 routing 路由表这一步,就完成了数据包的转发,直接送往了目的 `Pod` 的 veth pair 网卡。
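如果想直接观察这一转发过程,可以在目的 Pod 所在节点的 cilium Pod 中运行 cilium monitor,再重新发起 ping(示例命令;`ds/cilium` 会随机选择一个 cilium Pod,实际排查时请换成 worker2 节点上的那个 Pod):

```bash
# 终端 1:观察 cilium 数据面的 trace 事件
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type trace

# 终端 2:重新发起 ping,monitor 中可以看到这次 ICMP 对应的 trace 事件
kubectl exec -it net -- ping 10.0.2.204 -c 1
```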
不同节点 Pod 网络通讯
- `Pod` 节点信息,查看前面: 同节点 `Pod` 网络通讯
- `Pod` 节点所在 `Node` 节点信息,查看前面: 同节点 `Pod` 网络通讯
- `Pod` 节点进行 `ping` 包测试,查看宿主机路由信息,发现数据包会通过 `10.0.1.0/24 via 172.18.0.3 dev eth0` 这条路由转发
root@kind:~# kubectl exec -it net -- ping 10.0.1.15 -c 1
PING 10.0.1.15 (10.0.1.15): 56 data bytes
64 bytes from 10.0.1.15: seq=0 ttl=60 time=1.153 ms
--- 10.0.1.15 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.153/1.153/1.153 ms
- `Node` 节点 `cilium-kubeproxy-replacement-worker2` `lxc89eb07782005` 网卡抓包
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc89eb07782005
10:40:08.611228 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 82, seq 0, length 64
10:40:08.611928 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 82, seq 0, length 64
`Pod` 节点数据出来后,还是先到达了 veth pair 网卡。
- `Node` 节点 `cilium-kubeproxy-replacement-worker2` `eth0` 网卡抓包
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i eth0 icmp
10:40:18.228122 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 88, seq 0, length 64
10:40:18.228499 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 88, seq 0, length 64
查看 `eth0` 网卡抓包信息,发现数据包下一跳 mac 为 `02:42:ac:12:00:03`,按照之前的路由信息,分析此地址应该为 `172.18.0.3` 节点 `eth0` 网卡的 mac 地址。
- 查看 `cilium-kubeproxy-replacement-worker2` 节点 mac 信息
root@cilium-kubeproxy-replacement-worker2:/# arp -n
Address HWtype HWaddress Flags Mask Iface
172.18.0.3 ether 02:42:ac:12:00:03 C eth0
172.18.0.1 ether 02:42:2f:fe:43:35 C eth0
172.18.0.4 ether 02:42:ac:12:00:04 C eth0
- `Node` 节点 `cilium-kubeproxy-replacement-worker` `eth0` 网卡抓包
root@cilium-kubeproxy-replacement-worker:/# tcpdump -pne -i eth0
10:43:02.648720 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 100, seq 0, length 64
10:43:02.649112 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 100, seq 0, length 64
查看 `eth0` 网卡抓包信息,数据包源 mac `02:42:ac:12:00:02` 为 `172.18.0.2` 节点 `eth0` 网卡的 mac 地址,目的 mac `02:42:ac:12:00:03` 为本机 `eth0` 网卡的 mac 地址。
- 查看 `cilium-kubeproxy-replacement-worker` 节点 ip 信息
root@cilium-kubeproxy-replacement-worker:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 66:a1:f4:d3:f1:ab brd ff:ff:ff:ff:ff:ff
inet6 fe80::64a1:f4ff:fed3:f1ab/64 scope link
valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 02:89:4e:78:94:3c brd ff:ff:ff:ff:ff:ff
inet 10.0.1.152/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::89:4eff:fe78:943c/64 scope link
valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ea:d9:f2:a9:ae:5a brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::e8d9:f2ff:fea9:ae5a/64 scope link
valid_lft forever preferred_lft forever
15: lxcd0c238daf9fe@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether be:06:69:82:a0:bc brd ff:ff:ff:ff:ff:ff link-netns cni-c70b15f3-69c3-a69f-969f-91a1f4b4686a
inet6 fe80::bc06:69ff:fe82:a0bc/64 scope link
valid_lft forever preferred_lft forever
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::3/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:3/64 scope link
valid_lft forever preferred_lft forever
- 查看 `cilium-kubeproxy-replacement-worker` 节点 route 信息
## 路由信息
root@cilium-kubeproxy-replacement-worker:/# ip r s
default via 172.18.0.1 dev eth0
10.0.0.0/24 via 172.18.0.4 dev eth0
10.0.1.0/24 via 10.0.1.152 dev cilium_host src 10.0.1.152
10.0.1.152 dev cilium_host scope link
10.0.2.0/24 via 172.18.0.2 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
查看路由信息,数据包目的地址 `10.0.1.15` 会通过 `10.0.1.0/24 via 10.0.1.152 dev cilium_host src 10.0.1.152` 这条路由转发,在 `cilium_host` 网卡抓包
root@cilium-kubeproxy-replacement-worker:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
通过抓包发现,我们无法在 `cilium_host` 网卡上抓到数据包信息。类似同节点的 `Pod` 通讯情况,也无法在 `cilium_host` 网卡上抓到包,但是通讯正常。
- 目标 `Pod` `eth0` 网卡抓包
cilium-with-replacement-xtzwx~$ tcpdump -pne -i eth0
10:45:19.070446 be:06:69:82:a0:bc > 6a:25:e6:a2:a8:ec, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 107, seq 0, length 64
10:45:19.070458 6a:25:e6:a2:a8:ec > be:06:69:82:a0:bc, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 107, seq 0, length 64
- 目标 `Pod` ip 信息
cilium-with-replacement-xtzwx~$ ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:25:e6:a2:a8:ec brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.1.15/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6825:e6ff:fea2:a8ec/64 scope link
valid_lft forever preferred_lft forever
查看数据包信息中的源 mac `be:06:69:82:a0:bc`,发现是 `cilium-kubeproxy-replacement-worker` 节点 `lxcd0c238daf9fe` 网卡的 mac 地址。实质上还是 `cilium` 底层的 eBPF 代码实现了数据包劫持:当数据包到达目的 `Node` 节点时,还未走到查询 `Node` 节点 routing 路由表这一步,就完成了数据包的转发,送往了目的 `Pod` 的 veth pair 网卡。`lxcd0c238daf9fe` 网卡和目标 `Pod` 的 `eth0` 网卡互为 veth pair 网卡。
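数据包被劫持的位置就是挂在这些网卡 tc ingress 上的 eBPF 程序,可以在 Node 上直接列出(示例命令,网卡名以实际环境为准;若节点内没有 tc 命令,需要先安装 iproute2):

```bash
# 查看挂载在 lxc 网卡和 eth0 上的 tc eBPF 程序(即 cilium 数据面的劫持点)
docker exec -it cilium-kubeproxy-replacement-worker tc filter show dev lxcd0c238daf9fe ingress
docker exec -it cilium-kubeproxy-replacement-worker tc filter show dev eth0 ingress
```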
Service 网络通讯
- 查看 `Service` 信息
root@kind:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 63m
serversvc NodePort 10.96.18.68 <none> 80:32000/TCP 42m
- `net` 服务上请求 `Pod` 所在 `Node` 节点 `32000` 端口
root@kind:~# kubectl exec -ti net -- curl 172.18.0.2:32000
PodName: cilium-with-replacement-xtzwx | PodIP: eth0 10.0.1.15/32
- 并在 `net` 服务 `eth0` 网卡抓包查看
net~$ tcpdump -pne -i eth0
10:49:06.504713 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 74: 10.0.2.32.39204 > 10.0.1.15.80: Flags [S], seq 3323102457, win 64240, options [mss 1460,sackOK,TS val 3254307702 ecr 0,nop,wscale 7], length 0
10:49:06.505340 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 74: 10.0.1.15.80 > 10.0.2.32.39204: Flags [S.], seq 292579920, ack 3323102458, win 65160, options [mss 1460,sackOK,TS val 554604858 ecr 3254307702,nop,wscale 7], length 0
10:49:06.505351 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 3254307702 ecr 554604858], length 0
10:49:06.506761 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 146: 10.0.2.32.39204 > 10.0.1.15.80: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 3254307704 ecr 554604858], length 80: HTTP: GET / HTTP/1.1
10:49:06.507358 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 66: 10.0.1.15.80 > 10.0.2.32.39204: Flags [.], ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 0
10:49:06.507518 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 302: 10.0.1.15.80 > 10.0.2.32.39204: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 236: HTTP: HTTP/1.1 200 OK
10:49:06.507866 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 132: 10.0.1.15.80 > 10.0.2.32.39204: Flags [P.], seq 237:303, ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 66: HTTP
10:49:06.508241 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 303, win 500, options [nop,nop,TS val 3254307705 ecr 554604860], length 0
10:49:06.510714 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [F.], seq 81, ack 303, win 501, options [nop,nop,TS val 3254307708 ecr 554604860], length 0
10:49:06.511460 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 66: 10.0.1.15.80 > 10.0.2.32.39204: Flags [F.], seq 303, ack 82, win 509, options [nop,nop,TS val 554604864 ecr 3254307708], length 0
10:49:06.511469 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 304, win 501, options [nop,nop,TS val 3254307708 ecr 554604864], length 0
抓包数据显示,`net` 服务使用 `39204` 端口和 `10.0.1.15` 的 `80` 端口进行 tcp 通讯。注意抓到的目的地址已经是后端 `Pod` 的 `10.0.1.15:80`,而不是请求的 `172.18.0.2:32000`,说明地址转换在数据包离开 `net` Pod 之前就已经完成。
- `KubeProxyReplacement: Strict [eth0 172.18.0.3 (Direct Routing)]`
  - Cilium 完全接管所有 kube-proxy 功能,包括服务负载均衡、NodePort 和其他网络策略管理。这种配置适用于你希望最大限度利用 Cilium 的高级网络功能,并完全替代 kube-proxy 的场景。此模式提供更高效的流量转发和更强大的网络策略管理。

`cilium` 配置了 `KubeProxyReplacement: Strict [eth0 172.18.0.3 (Direct Routing)]`,通过配置信息可以确定 `cilium` 接管了 `kube-proxy` 的功能,使用 `cilium` 实现 `service` 转发。我们可以先检查下 `kube-proxy` 模式下通常会存在的 conntrack 信息和 iptables 规则。
- 先检查下 conntrack 信息,发现没有相关链路信息
root@cilium-kubeproxy-replacement-worker2:/# conntrack -L | grep 32000
## 没有数据信息
- `iptables` 信息,也没有 `iptables` 规则信息
root@cilium-kubeproxy-replacement-worker2:/# iptables-save | grep 32000
## 没有数据信息
那么 `cilium` 是如何查询 `service` 信息,并把后端 `Pod` 的 ip 地址返回给请求方的?其实 `cilium` 把这些数据保存在自身内部,使用 `cilium` 子命令可以查询到 `service` 信息。
- `cilium` 查询 `service` 信息
root@kind:~# kubectl -n kube-system exec cilium-2xvsw -- cilium service list
ID Frontend Service Type Backend
1 10.96.0.1:443 ClusterIP 1 => 172.18.0.4:6443 (active)
2 10.96.0.10:53 ClusterIP 1 => 10.0.0.221:53 (active)
2 => 10.0.0.200:53 (active)
3 10.96.0.10:9153 ClusterIP 1 => 10.0.0.221:9153 (active)
2 => 10.0.0.200:9153 (active)
4 10.96.134.23:443 ClusterIP 1 => 172.18.0.2:4244 (active)
11 10.96.18.68:80 ClusterIP 1 => 10.0.1.15:80 (active)
2 => 10.0.2.204:80 (active)
3 => 10.0.0.142:80 (active)
12 172.18.0.2:32000 NodePort 1 => 10.0.1.15:80 (active)
2 => 10.0.2.204:80 (active)
3 => 10.0.0.142:80 (active)
13 0.0.0.0:32000 NodePort 1 => 10.0.1.15:80 (active)
2 => 10.0.2.204:80 (active)
3 => 10.0.0.142:80 (active)
查看上面的 `service` 信息可以得到: `172.18.0.2:32000` 后端有 `3` 个 ip 地址,并且后端端口为 `80`。`cilium` 劫持到 `Pod` 访问 `service` 的请求后,会查询该 `service` 对应的后端 `Pod` 地址和端口并返回给客户端,让客户端使用此地址发起 http 请求。
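这些 service 条目实际保存在 cilium agent 维护的 eBPF 负载均衡 map 中,也可以直接查看底层 map 的内容(示例命令,输出格式随版本而异):

```bash
# 查看 cilium 底层 eBPF LB map 中的 service / backend 条目
kubectl -n kube-system exec -it cilium-2xvsw -- cilium bpf lb list
```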