Cilium Native Routing with KubeProxyReplacement Mode

1. Environment Information

Host          IP
ubuntu        172.16.94.141

Software      Version
docker        26.1.4
helm          v3.15.0-rc.2
kind          0.18.0
kubernetes    1.23.4
ubuntu os     Ubuntu 20.04.6 LTS
kernel        5.11.5 (see the kernel upgrade guide)

2. Installing the Services

kind configuration and install script

root@kind:~# cat install.sh

#!/bin/bash
date
set -v

# 1.prep noCNI env
cat <<EOF | kind create cluster --name=cilium-kubeproxy-replacement --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # disable the default CNI shipped with kind; we will install Cilium ourselves
  disableDefaultCNI: true
  # disable kube-proxy when kind creates the cluster; Cilium will replace kube-proxy's functionality
  kubeProxyMode: "none"

nodes:
  - role: control-plane
  - role: worker
  - role: worker

containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
    endpoint = ["https://harbor.evescn.com"]
EOF

# 2.remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide

# 3.install cni
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1

# Direct Routing Options(--set kubeProxyReplacement=strict --set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8")
helm install cilium cilium/cilium \
    --set k8sServiceHost=$controller_node_ip \
    --set k8sServicePort=6443 \
    --version 1.13.0-rc5 \
    --namespace kube-system \
    --set debug.enabled=true \
    --set debug.verbose=datapath \
    --set monitorAggregation=none \
    --set ipam.mode=cluster-pool \
    --set cluster.name=cilium-kubeproxy-replacement \
    --set kubeProxyReplacement=strict \
    --set tunnel=disabled \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR="10.0.0.0/8"

# 4.install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium) 
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
    docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done

Explanation of the --set options

  1. --set kubeProxyReplacement=strict

    • Meaning: enable the kube-proxy replacement in strict mode.
    • Purpose: Cilium fully replaces kube-proxy for service load balancing, giving more efficient traffic forwarding and network policy handling.
  2. --set tunnel=disabled

    • Meaning: disable tunnel mode.
    • Purpose: with tunneling off, Cilium does not use vxlan encapsulation and routes packets between hosts directly, i.e. direct-routing mode.
  3. --set autoDirectNodeRoutes=true

    • Meaning: enable automatic direct node routes.
    • Purpose: Cilium automatically installs routes to the Pod CIDRs of the other nodes, optimizing traffic paths.
  4. --set ipv4NativeRoutingCIDR="10.0.0.0/8"

    • Meaning: the CIDR used for IPv4 native routing, here 10.0.0.0/8.
    • Purpose: tells Cilium which destination ranges can be routed natively without SNAT; by default Cilium masquerades traffic to all other addresses.
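The SNAT decision described in point 4 can be sketched as a small function (an illustrative Python model, not Cilium's actual code; the node IP used for masquerading is taken from this lab's addressing):

```python
import ipaddress

# ipv4NativeRoutingCIDR tells Cilium which destinations are natively
# routable; traffic to them leaves the node with the Pod IP untouched,
# everything else is SNATed to the node address.
NATIVE_ROUTING_CIDR = ipaddress.ip_network("10.0.0.0/8")
NODE_IP = "172.18.0.2"  # node address used when masquerading

def egress_source(src_pod_ip: str, dst_ip: str) -> str:
    """Return the source IP a packet leaves the node with."""
    if ipaddress.ip_address(dst_ip) in NATIVE_ROUTING_CIDR:
        return src_pod_ip          # native routing: keep the Pod IP
    return NODE_IP                 # outside the CIDR: SNAT to node IP

print(egress_source("10.0.2.32", "10.0.1.15"))   # Pod-to-Pod: 10.0.2.32
print(egress_source("10.0.2.32", "8.8.8.8"))     # external: 172.18.0.2
```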
  • Install the k8s cluster and Cilium
# ./install.sh

Creating cluster "cilium-kubeproxy-replacement" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼
 ✓ Preparing nodes 📦 📦 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing StorageClass 💾 
 ✓ Joining worker nodes 🚜 
Set kubectl context to "kind-cilium-kubeproxy-replacement"
You can now use your cluster with:

kubectl cluster-info --context kind-cilium-kubeproxy-replacement

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/
  • Check the installed services
root@kind:~# kubectl get pods -A

NAMESPACE            NAME                                                                 READY   STATUS    RESTARTS   AGE
kube-system          cilium-2jwcw                                                         1/1     Running   0          4m15s
kube-system          cilium-2xvsw                                                         1/1     Running   0          4m15s
kube-system          cilium-operator-dd757785c-q8rnw                                      1/1     Running   0          4m15s
kube-system          cilium-operator-dd757785c-wtf4w                                      1/1     Running   0          4m15s
kube-system          cilium-q4h4z                                                         1/1     Running   0          4m15s
kube-system          coredns-64897985d-2tmk6                                              1/1     Running   0          6m9s
kube-system          coredns-64897985d-bjgfx                                              1/1     Running   0          6m10s
kube-system          etcd-cilium-kubeproxy-replacement-control-plane                      1/1     Running   0          6m25s
kube-system          kube-apiserver-cilium-kubeproxy-replacement-control-plane            1/1     Running   0          6m27s
kube-system          kube-controller-manager-cilium-kubeproxy-replacement-control-plane   1/1     Running   0          6m25s
kube-system          kube-scheduler-cilium-kubeproxy-replacement-control-plane            1/1     Running   0          6m25s
local-path-storage   local-path-provisioner-5ddd94ff66-k8d66                              1/1     Running   0          6m10s

Looking at the Pod list, there is no kube-proxy service: since we set kubeProxyReplacement=strict, Cilium fully replaces kube-proxy for service load balancing. Accordingly, kube-proxy installation was also disabled when kind created the cluster, via kubeProxyMode: "none".

Cilium configuration

root@kind:~# kubectl -n kube-system exec -it ds/cilium -- cilium status

KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Strict   [eth0 172.18.0.4 (Direct Routing)]
Host firewall:           Disabled
CNI Chaining:            none
CNI Config file:         CNI configuration file management disabled
Cilium:                  Ok   1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor:             Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok   
IPAM:                    IPv4: 5/254 allocated from 10.0.0.0/24, 
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       31/31 healthy
Proxy Status:            OK, ip 10.0.0.84, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 9.15   Metrics: Disabled
Encryption:              Disabled
Cluster health:          3/3 reachable   (2024-06-26T09:52:50Z)
  • KubeProxyReplacement: Strict [eth0 172.18.0.3 (Direct Routing)]
    • Cilium fully takes over all kube-proxy functionality, including service load balancing, NodePort, and other network policy handling. This configuration suits scenarios where you want to make the most of Cilium's advanced networking features and replace kube-proxy entirely; it offers more efficient traffic forwarding and stronger network policy management.
  • Host Routing: Legacy
    • The legacy (kernel network stack) host routing path is used.
  • Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
    • iptables performs IP masquerading (SNAT); enabled for IPv4, disabled for IPv6.

Deploy Pods in the k8s cluster to test the network

root@kind:~# cat cni.yaml

apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cilium-with-replacement
  name: cilium-with-replacement
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cilium-with-replacement
  template:
    metadata:
      labels:
        app: cilium-with-replacement
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true

---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cilium-with-replacement
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
root@kind:~# kubectl apply -f cni.yaml
daemonset.apps/cilium-with-replacement created
service/serversvc created

root@kind:~# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
  • Check the deployed services
root@kind:~# kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE                                         NOMINATED NODE   READINESS GATES
cilium-with-replacement-hbvkm   1/1     Running   0          62s   10.0.2.204   cilium-kubeproxy-replacement-worker2         <none>           <none>
cilium-with-replacement-vhhzl   1/1     Running   0          62s   10.0.0.142   cilium-kubeproxy-replacement-control-plane   <none>           <none>
cilium-with-replacement-xtzwx   1/1     Running   0          62s   10.0.1.15    cilium-kubeproxy-replacement-worker          <none>           <none>
net                             1/1     Running   0          2s    10.0.2.32    cilium-kubeproxy-replacement-worker2         <none>           <none>

root@kind:~# kubectl get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1     <none>        443/TCP        22m
serversvc    NodePort    10.96.18.68   <none>        80:32000/TCP   76s

3. Testing the Network

Pod-to-Pod traffic on the same node


  • Pod information
## IP info
# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:38:09:24:7b:fd brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.2.32/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6038:9ff:fe24:7bfd/64 scope link 
       valid_lft forever preferred_lft forever

## routes
# kubectl exec -it net -- ip r s
default via 10.0.2.184 dev eth0 mtu 1500 
10.0.2.184 dev eth0 scope link

Note that under Cilium the Pod's IP address carries a /32 mask, meaning the address identifies a single host rather than a subnet. Any traffic from this Pod to any other IP therefore has to go through routing.
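The effect of the /32 address can be modeled in a few lines (an illustrative sketch; the gateway 10.0.2.184 comes from the Pod route output above):

```python
import ipaddress

# With a /32 on eth0, the Pod's "on-link" network contains only the Pod
# itself, so any other destination, even one on the same node, needs
# the default route via 10.0.2.184.
iface_net = ipaddress.ip_network("10.0.2.32/32")

def next_hop(dst: str) -> str:
    if ipaddress.ip_address(dst) in iface_net:
        return "local"
    return "via 10.0.2.184 (default route)"

print(next_hop("10.0.2.32"))    # local
print(next_hop("10.0.2.204"))   # via 10.0.2.184 (default route)
```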

  • Node hosting the Pod
# docker exec -it cilium-kubeproxy-replacement-worker2 bash

## IP info
root@cilium-kubeproxy-replacement-worker2:/# ip a l 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether da:dc:10:5e:63:bd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d8dc:10ff:fe5e:63bd/64 scope link 
       valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether de:18:2a:12:05:3d brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.184/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::dc18:2aff:fe12:53d/64 scope link 
       valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c6:60:02:0a:62:2f brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::c460:2ff:fe0a:622f/64 scope link 
       valid_lft forever preferred_lft forever
13: lxc3d8b8c4b9039@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1e:0e:95:4f:d1:32 brd ff:ff:ff:ff:ff:ff link-netns cni-4fde611b-ba69-ea8c-256a-2655cf743623
    inet6 fe80::1c0e:95ff:fe4f:d132/64 scope link 
       valid_lft forever preferred_lft forever
15: lxc89eb07782005@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8e:b6:63:bb:41:3e brd ff:ff:ff:ff:ff:ff link-netns cni-1d4bc46a-77cd-aa33-cd3e-5df5335d2f00
    inet6 fe80::8cb6:63ff:febb:413e/64 scope link 
       valid_lft forever preferred_lft forever
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::2/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link 
       valid_lft forever preferred_lft forever

## routes
root@cilium-kubeproxy-replacement-worker2:/# ip r s
default via 172.18.0.1 dev eth0 
10.0.0.0/24 via 172.18.0.4 dev eth0 
10.0.1.0/24 via 172.18.0.3 dev eth0 
10.0.2.0/24 via 10.0.2.184 dev cilium_host src 10.0.2.184 
10.0.2.184 dev cilium_host scope link 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2 
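The kernel chooses among these entries by longest-prefix match. A minimal model of that lookup over the table above (illustrative, not the kernel's FIB code):

```python
import ipaddress

# worker2's routing table, as (prefix, description) pairs
ROUTES = [
    ("0.0.0.0/0",     "default via 172.18.0.1 dev eth0"),
    ("10.0.0.0/24",   "via 172.18.0.4 dev eth0"),
    ("10.0.1.0/24",   "via 172.18.0.3 dev eth0"),
    ("10.0.2.0/24",   "via 10.0.2.184 dev cilium_host"),
    ("10.0.2.184/32", "dev cilium_host scope link"),
    ("172.18.0.0/16", "dev eth0 proto kernel"),
]

def lookup(dst: str) -> str:
    """Return the most specific matching route for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), d) for p, d in ROUTES
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.0.2.204"))  # via 10.0.2.184 dev cilium_host
print(lookup("10.0.1.15"))   # via 172.18.0.3 dev eth0
```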
  • Run a ping from the Pod. Judging from the host routing table, the packet should be forwarded by the route 10.0.2.0/24 via 10.0.2.184 dev cilium_host src 10.0.2.184
root@kind:~# kubectl exec -it net -- ping 10.0.2.204 -c 1
PING 10.0.2.204 (10.0.2.204): 56 data bytes
64 bytes from 10.0.2.204: seq=0 ttl=63 time=0.787 ms

--- 10.0.2.204 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.787/0.787/0.787 ms
  • Capture on the Pod's eth0 interface
net~$ tcpdump -pne -i eth0
10:12:51.034229 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 47, seq 0, length 64
10:12:51.034733 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 47, seq 0, length 64
  • Capture on the node's cilium_host interface
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i cilium_host    
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

The capture shows nothing on the cilium_host interface, yet same-node Pod communication works fine. Looking back at the Pod's eth0 capture, the next-hop MAC is 8e:b6:63:bb:41:3e; comparing against the node's interfaces, that is the MAC of lxc89eb07782005, and the interface index numbers confirm the two interfaces are a veth pair.

  • Capture on the node's lxc89eb07782005 interface
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc89eb07782005
10:12:51.034229 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 47, seq 0, length 64
10:12:51.034732 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 47, seq 0, length 64

lxc89eb07782005 does capture the traffic, so the packets reach the node, yet they are never forwarded through cilium_host as the routing table would suggest. In reality Cilium's datapath code intercepts the packet: when the destination is on the same node, the packet is forwarded to the destination Pod's veth-pair interface before the routing table is ever consulted.
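The veth-pair identification used here relies on the `@ifN` suffix: each endpoint's interface name carries its peer's ifindex (e.g. `14: eth0@if15` pairs with `15: lxc89eb07782005@if14`). A small model of that matching, with ifindex data copied from the `ip a` listings in this section:

```python
# ifindex -> (name, peer ifindex), taken from the `ip a` output
pod_ifaces = {
    14: ("eth0", 15),   # net Pod's eth0
    12: ("eth0", 13),   # cilium-with-replacement-hbvkm Pod's eth0
}
host_ifaces = {
    13: ("lxc3d8b8c4b9039", 12),
    15: ("lxc89eb07782005", 14),
}

def veth_peer(pod_ifindex: int) -> str:
    """Return the host-side veth peer of a Pod interface."""
    _name, peer = pod_ifaces[pod_ifindex]
    host_name, host_peer = host_ifaces[peer]
    assert host_peer == pod_ifindex   # the pairing is symmetric
    return host_name

print(veth_peer(14))   # lxc89eb07782005
print(veth_peer(12))   # lxc3d8b8c4b9039
```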

  • Check Pod 10.0.2.204
root@kind:~# kubectl exec -it cilium-with-replacement-hbvkm -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 76:51:a5:5f:2b:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.2.204/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7451:a5ff:fe5f:2b1b/64 scope link 
       valid_lft forever preferred_lft forever

From the eth0 listing we can tell that the host-side veth peer of this Pod's eth0 is the interface with id = 13; in the node listing above that is 13: lxc3d8b8c4b9039@if12.

  • Capture on the node's lxc3d8b8c4b9039 interface
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc3d8b8c4b9039
10:14:14.584364 1e:0e:95:4f:d1:32 > 76:51:a5:5f:2b:1b, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.2.204: ICMP echo request, id 53, seq 0, length 64
10:14:14.584377 76:51:a5:5f:2b:1b > 1e:0e:95:4f:d1:32, ethertype IPv4 (0x0800), length 98: 10.0.2.204 > 10.0.2.32: ICMP echo reply, id 53, seq 0, length 64

This capture does show the traffic: the destination MAC 76:51:a5:5f:2b:1b is the eth0 MAC of Pod 10.0.2.204, i.e. these are the packets being delivered to the target IP.

Summary: for same-node Pod communication, once the packet enters the node through the veth pair, the Cilium code running on the node intercepts it; because the destination is on the same node, the packet is forwarded to the destination Pod's veth-pair interface before the node's routing table is ever consulted.

Pod-to-Pod traffic across nodes


  • Pod information: see the same-node section above.

  • Node information: see the same-node section above.

  • Run a ping from the Pod. Judging from the host routing table, the packet should be forwarded by the route 10.0.1.0/24 via 172.18.0.3 dev eth0

root@kind:~# kubectl exec -it net -- ping 10.0.1.15 -c 1
PING 10.0.1.15 (10.0.1.15): 56 data bytes
64 bytes from 10.0.1.15: seq=0 ttl=60 time=1.153 ms

--- 10.0.1.15 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.153/1.153/1.153 ms
  • Capture on node cilium-kubeproxy-replacement-worker2, lxc89eb07782005 interface
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i lxc89eb07782005
10:40:08.611228 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 82, seq 0, length 64
10:40:08.611928 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 82, seq 0, length 64

After leaving the Pod, the packets again arrive first at the veth-pair interface.

  • Capture on node cilium-kubeproxy-replacement-worker2, eth0 interface
root@cilium-kubeproxy-replacement-worker2:/# tcpdump -pne -i eth0 icmp
10:40:18.228122 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 88, seq 0, length 64
10:40:18.228499 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 88, seq 0, length 64

The eth0 capture shows the next-hop MAC 02:42:ac:12:00:03; based on the routing table above, this should be the eth0 MAC of node 172.18.0.3.

Check the ARP table on cilium-kubeproxy-replacement-worker2

root@cilium-kubeproxy-replacement-worker2:/# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
172.18.0.3               ether   02:42:ac:12:00:03   C                     eth0
172.18.0.1               ether   02:42:2f:fe:43:35   C                     eth0
172.18.0.4               ether   02:42:ac:12:00:04   C                     eth0
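Building the Ethernet header for the next hop is a plain neighbour-table lookup. A tiny sketch using the ARP entries above (the local MAC is worker2's eth0 from earlier output; illustrative only):

```python
# neighbour (ARP) table on worker2: IP -> MAC
ARP = {
    "172.18.0.1": "02:42:2f:fe:43:35",
    "172.18.0.3": "02:42:ac:12:00:03",
    "172.18.0.4": "02:42:ac:12:00:04",
}
LOCAL_MAC = "02:42:ac:12:00:02"   # worker2's own eth0 MAC

def l2_header(next_hop_ip: str) -> tuple:
    """(src MAC, dst MAC) for a frame sent towards next_hop_ip."""
    return LOCAL_MAC, ARP[next_hop_ip]

print(l2_header("172.18.0.3"))
# ('02:42:ac:12:00:02', '02:42:ac:12:00:03') -- matches the capture
```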
  • Capture on node cilium-kubeproxy-replacement-worker, eth0 interface
root@cilium-kubeproxy-replacement-worker:/# tcpdump -pne -i eth0
10:43:02.648720 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 100, seq 0, length 64
10:43:02.649112 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 100, seq 0, length 64

In this eth0 capture the source MAC 02:42:ac:12:00:02 is the eth0 MAC of 172.18.0.2, and the destination MAC 02:42:ac:12:00:03 is this node's own eth0 MAC.

Check the IP addresses on cilium-kubeproxy-replacement-worker

root@cilium-kubeproxy-replacement-worker:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 66:a1:f4:d3:f1:ab brd ff:ff:ff:ff:ff:ff
    inet6 fe80::64a1:f4ff:fed3:f1ab/64 scope link 
       valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:89:4e:78:94:3c brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.152/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::89:4eff:fe78:943c/64 scope link 
       valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ea:d9:f2:a9:ae:5a brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::e8d9:f2ff:fea9:ae5a/64 scope link 
       valid_lft forever preferred_lft forever
15: lxcd0c238daf9fe@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether be:06:69:82:a0:bc brd ff:ff:ff:ff:ff:ff link-netns cni-c70b15f3-69c3-a69f-969f-91a1f4b4686a
    inet6 fe80::bc06:69ff:fe82:a0bc/64 scope link 
       valid_lft forever preferred_lft forever
19: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link 
       valid_lft forever preferred_lft forever

Check the routes on cilium-kubeproxy-replacement-worker

## routes
root@cilium-kubeproxy-replacement-worker:/# ip r s
default via 172.18.0.1 dev eth0 
10.0.0.0/24 via 172.18.0.4 dev eth0 
10.0.1.0/24 via 10.0.1.152 dev cilium_host src 10.0.1.152 
10.0.1.152 dev cilium_host scope link 
10.0.2.0/24 via 172.18.0.2 dev eth0 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3 

Per the routing table, a packet to destination 10.0.1.15 should be forwarded by the route 10.0.1.0/24 via 10.0.1.152 dev cilium_host src 10.0.1.152. Capture on cilium_host:

root@cilium-kubeproxy-replacement-worker:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

Again the capture shows nothing on cilium_host. Just like the same-node case, no packets can be captured on cilium_host, yet communication works.

  • Capture on the destination Pod's eth0 interface
cilium-with-replacement-xtzwx~$ tcpdump -pne -i eth0
10:45:19.070446 be:06:69:82:a0:bc > 6a:25:e6:a2:a8:ec, ethertype IPv4 (0x0800), length 98: 10.0.2.32 > 10.0.1.15: ICMP echo request, id 107, seq 0, length 64
10:45:19.070458 6a:25:e6:a2:a8:ec > be:06:69:82:a0:bc, ethertype IPv4 (0x0800), length 98: 10.0.1.15 > 10.0.2.32: ICMP echo reply, id 107, seq 0, length 64
  • Destination Pod IP addresses
cilium-with-replacement-xtzwx~$ ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:25:e6:a2:a8:ec brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.15/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6825:e6ff:fea2:a8ec/64 scope link 
       valid_lft forever preferred_lft forever

The source MAC in this capture, be:06:69:82:a0:bc, is the MAC of lxcd0c238daf9fe on node cilium-kubeproxy-replacement-worker. Once again Cilium's datapath code intercepts the packet: when it arrives at the node, it is forwarded to the destination Pod's veth-pair interface before the node's routing table is consulted. lxcd0c238daf9fe and the destination Pod's eth0 are a veth pair.

Service traffic

  • Check the Service
root@kind:~# kubectl get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1     <none>        443/TCP        63m
serversvc    NodePort    10.96.18.68   <none>        80:32000/TCP   42m
  • From the net Pod, request port 32000 on the Pod's node
root@kind:~# kubectl exec -ti net -- curl 172.18.0.2:32000
PodName: cilium-with-replacement-xtzwx | PodIP: eth0 10.0.1.15/32

Meanwhile, capture on the net Pod's eth0:

net~$ tcpdump -pne -i eth0
10:49:06.504713 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 74: 10.0.2.32.39204 > 10.0.1.15.80: Flags [S], seq 3323102457, win 64240, options [mss 1460,sackOK,TS val 3254307702 ecr 0,nop,wscale 7], length 0
10:49:06.505340 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 74: 10.0.1.15.80 > 10.0.2.32.39204: Flags [S.], seq 292579920, ack 3323102458, win 65160, options [mss 1460,sackOK,TS val 554604858 ecr 3254307702,nop,wscale 7], length 0
10:49:06.505351 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 1, win 502, options [nop,nop,TS val 3254307702 ecr 554604858], length 0
10:49:06.506761 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 146: 10.0.2.32.39204 > 10.0.1.15.80: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 3254307704 ecr 554604858], length 80: HTTP: GET / HTTP/1.1
10:49:06.507358 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 66: 10.0.1.15.80 > 10.0.2.32.39204: Flags [.], ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 0
10:49:06.507518 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 302: 10.0.1.15.80 > 10.0.2.32.39204: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 236: HTTP: HTTP/1.1 200 OK
10:49:06.507866 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 132: 10.0.1.15.80 > 10.0.2.32.39204: Flags [P.], seq 237:303, ack 81, win 509, options [nop,nop,TS val 554604860 ecr 3254307704], length 66: HTTP
10:49:06.508241 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 303, win 500, options [nop,nop,TS val 3254307705 ecr 554604860], length 0
10:49:06.510714 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [F.], seq 81, ack 303, win 501, options [nop,nop,TS val 3254307708 ecr 554604860], length 0
10:49:06.511460 8e:b6:63:bb:41:3e > 62:38:09:24:7b:fd, ethertype IPv4 (0x0800), length 66: 10.0.1.15.80 > 10.0.2.32.39204: Flags [F.], seq 303, ack 82, win 509, options [nop,nop,TS val 554604864 ecr 3254307708], length 0
10:49:06.511469 62:38:09:24:7b:fd > 8e:b6:63:bb:41:3e, ethertype IPv4 (0x0800), length 66: 10.0.2.32.39204 > 10.0.1.15.80: Flags [.], ack 304, win 501, options [nop,nop,TS val 3254307708 ecr 554604864], length 0

The capture shows the net Pod using source port 39204 to talk TCP directly to 10.0.1.15 port 80; the Service address never appears on the wire.

* `KubeProxyReplacement:    Strict   [eth0 172.18.0.3 (Direct Routing)]`
  * Cilium fully takes over all kube-proxy functionality, including service load balancing, NodePort, and other network policy handling. This configuration suits scenarios where you want to make the most of Cilium's advanced networking features and replace kube-proxy entirely; it offers more efficient traffic forwarding and stronger network policy management.

The cilium status line KubeProxyReplacement: Strict [eth0 172.18.0.3 (Direct Routing)] confirms that Cilium has taken over kube-proxy's functionality and implements Service forwarding itself. Let's first check the conntrack and iptables state that kube-proxy would normally create.

  • First check conntrack: there are no entries
root@cilium-kubeproxy-replacement-worker2:/# conntrack -L | grep 32000
## no output
  • Check iptables: no rules either
root@cilium-kubeproxy-replacement-worker2:/# iptables-save | grep 32000
## no output

So how does Cilium look up a Service and steer the request to a backend Pod? Cilium keeps this data internally, and the Service table can be inspected with the cilium subcommands.
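A toy model of the lookup behind that table (the frontend and backend data mirror the NodePort entry for serversvc; real backend selection happens per-flow inside the eBPF datapath, not with random.choice):

```python
import random

# Cilium stores Service -> backend mappings in its own eBPF maps
# rather than in iptables rules.
SERVICES = {
    ("172.18.0.2", 32000): [("10.0.1.15", 80),
                            ("10.0.2.204", 80),
                            ("10.0.0.142", 80)],
}

def select_backend(frontend: tuple) -> tuple:
    """Pick one backend (ip, port) for a new flow to the frontend."""
    return random.choice(SERVICES[frontend])

ip, port = select_backend(("172.18.0.2", 32000))
print(ip, port)   # one of the three backends, always port 80
```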

  • Query the Service table with cilium
root@kind:~# kubectl -n kube-system exec  cilium-2xvsw -- cilium service list
ID   Frontend           Service Type   Backend                         
1    10.96.0.1:443      ClusterIP      1 => 172.18.0.4:6443 (active)   
2    10.96.0.10:53      ClusterIP      1 => 10.0.0.221:53 (active)     
                                       2 => 10.0.0.200:53 (active)     
3    10.96.0.10:9153    ClusterIP      1 => 10.0.0.221:9153 (active)   
                                       2 => 10.0.0.200:9153 (active)   
4    10.96.134.23:443   ClusterIP      1 => 172.18.0.2:4244 (active)   
11   10.96.18.68:80     ClusterIP      1 => 10.0.1.15:80 (active)      
                                       2 => 10.0.2.204:80 (active)     
                                       3 => 10.0.0.142:80 (active)     
12   172.18.0.2:32000   NodePort       1 => 10.0.1.15:80 (active)      
                                       2 => 10.0.2.204:80 (active)     
                                       3 => 10.0.0.142:80 (active)     
13   0.0.0.0:32000      NodePort       1 => 10.0.1.15:80 (active)      
                                       2 => 10.0.2.204:80 (active)     
                                       3 => 10.0.0.142:80 (active)

The Service listing above shows that 172.18.0.2:32000 has 3 backend IP addresses, all serving port 80. When Cilium intercepts a Pod's access to a Service, it looks up the Service's backend Pod addresses and ports and translates the connection to one of them, so the client ends up talking to the backend Pod directly.
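That translation step can be sketched as a 4-tuple rewrite (a minimal illustrative model; Cilium actually performs it at the socket/eBPF layer before the packet leaves the Pod, which is why tcpdump saw the backend address directly):

```python
# backend already chosen for this flow: service frontend -> backend
BACKENDS = {("172.18.0.2", 32000): ("10.0.1.15", 80)}

def translate(tuple4):
    """Rewrite (src, sport, dst, dport) if dst:dport is a Service."""
    src, sport, dst, dport = tuple4
    if (dst, dport) in BACKENDS:
        dst, dport = BACKENDS[(dst, dport)]
    return (src, sport, dst, dport)

print(translate(("10.0.2.32", 39204, "172.18.0.2", 32000)))
# ('10.0.2.32', 39204, '10.0.1.15', 80) -- what the capture showed
```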

posted @ 2024-06-27 16:38  evescn