Cilium Native Routing with kubeProxy Mode


1. Environment Information

Host      IP
ubuntu    172.16.94.141

Software     Version
docker       26.1.4
helm         v3.15.0-rc.2
kind         0.18.0
kubernetes   1.23.4
ubuntu os    Ubuntu 20.04.6 LTS
kernel       5.11.5 (see the kernel upgrade steps below)

Kernel upgrade

# wget https://raw.githubusercontent.com/pimlie/ubuntu-mainline-kernel.sh/master/ubuntu-mainline-kernel.sh

# sudo install ubuntu-mainline-kernel.sh /usr/local/bin/

# sudo ubuntu-mainline-kernel.sh -i 5.11.5
## To install a specific version: sudo ubuntu-mainline-kernel.sh -i <kernel-version>

# sudo reboot
# uname -r

2. Installing the Services

kind configuration

$ cat install.sh

#!/bin/bash
date
set -v

# 1.prep noCNI env
cat <<EOF | kind create cluster --name=cilium-kubeproxy --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # kind ships a default CNI (kindnet); disable it so we can install our own CNI
  disableDefaultCNI: true
  #kubeProxyMode: "none" # left commented so kube-proxy stays enabled

nodes:
  - role: control-plane
  - role: worker
  - role: worker

containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
    endpoint = ["https://harbor.evescn.com"]
EOF

# 2.remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide

# 3.install cni
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1

# Direct Routing Options(--set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8")
helm install cilium cilium/cilium \
    --set k8sServiceHost=$controller_node_ip \
    --set k8sServicePort=6443 \
    --version 1.13.0-rc5 \
    --namespace kube-system \
    --set debug.enabled=true \
    --set debug.verbose=datapath \
    --set monitorAggregation=none \
    --set ipam.mode=cluster-pool \
    --set cluster.name=cilium-kubeproxy \
    --set tunnel=disabled \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR="10.0.0.0/8"

# 4.install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium) 
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
    docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done

--set flag explanations

  1. --set tunnel=disabled

    • Meaning: disable tunnel mode.
    • Effect: Cilium no longer uses VXLAN encapsulation and instead routes packets directly between hosts, i.e. direct-routing mode.
  2. --set autoDirectNodeRoutes=true

    • Meaning: enable automatic direct node routes.
    • Effect: Cilium automatically installs routes to the other nodes' Pod CIDRs, so cross-node Pod traffic is routed natively.
  3. --set ipv4NativeRoutingCIDR="10.0.0.0/8"

    • Meaning: the CIDR range treated as natively routable for IPv4, here 10.0.0.0/8.
    • Effect: tells Cilium which destination ranges can be reached without SNAT; by default Cilium masquerades all traffic leaving the node.
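The effect of ipv4NativeRoutingCIDR can be sketched as a single membership check, under the simplifying assumption that the masquerading decision reduces to "is the destination inside the native-routing CIDR" (function name is illustrative):

```python
import ipaddress

NATIVE_ROUTING_CIDR = ipaddress.ip_network("10.0.0.0/8")

def needs_masquerade(dst_ip: str) -> bool:
    # Rough model: destinations inside the native-routing CIDR are
    # routed as-is; everything else leaving the node is SNATed.
    return ipaddress.ip_address(dst_ip) not in NATIVE_ROUTING_CIDR

print(needs_masquerade("10.0.2.76"))  # False: Pod CIDR, routed natively
print(needs_masquerade("8.8.8.8"))    # True: external, masqueraded
```

This matches the "Masquerading: IPTables [IPv4: Enabled, ...]" status shown later: SNAT still exists, it just skips the 10.0.0.0/8 range.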
  • Install the k8s cluster and cilium
# ./install.sh

Creating cluster "cilium-kubeproxy" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✓ Preparing nodes 📦 📦 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing StorageClass 💾 
 ✓ Joining worker nodes 🚜 
Set kubectl context to "kind-cilium-kubeproxy"
You can now use your cluster with:

kubectl cluster-info --context kind-cilium-kubeproxy

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/
  • Check the installed services
# kubectl get pods -A
NAMESPACE            NAME                                                     READY   STATUS    RESTARTS   AGE
kube-system          cilium-47qhb                                             1/1     Running   0          9m15s
kube-system          cilium-fk2nr                                             1/1     Running   0          9m15s
kube-system          cilium-nmxnh                                             1/1     Running   0          9m15s
kube-system          cilium-operator-dd757785c-k47m5                          1/1     Running   0          9m15s
kube-system          cilium-operator-dd757785c-s7jk9                          1/1     Running   0          9m15s
kube-system          coredns-64897985d-l7q5n                                  1/1     Running   0          10m
kube-system          coredns-64897985d-ljwh4                                  1/1     Running   0          10m
kube-system          etcd-cilium-kubeproxy-control-plane                      1/1     Running   0          10m
kube-system          kube-apiserver-cilium-kubeproxy-control-plane            1/1     Running   0          10m
kube-system          kube-controller-manager-cilium-kubeproxy-control-plane   1/1     Running   0          10m
kube-system          kube-proxy-hdf27                                         1/1     Running   0          10m
kube-system          kube-proxy-jb95q                                         1/1     Running   0          10m
kube-system          kube-proxy-v7pqb                                         1/1     Running   0          10m
kube-system          kube-scheduler-cilium-kubeproxy-control-plane            1/1     Running   0          10m
local-path-storage   local-path-provisioner-5ddd94ff66-sv7l4                  1/1     Running   0          10m

cilium configuration

# kubectl -n kube-system exec -it ds/cilium -- cilium status

KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Disabled   
Host firewall:           Disabled
CNI Chaining:            none
CNI Config file:         CNI configuration file management disabled
Cilium:                  Ok   1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor:             Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok   
IPAM:                    IPv4: 6/254 allocated from 10.0.0.0/24, 
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       34/34 healthy
Proxy Status:            OK, ip 10.0.0.63, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 10.11   Metrics: Disabled
Encryption:              Disabled
Cluster health:          3/3 reachable   (2024-06-25T08:37:57Z)
  • KubeProxyReplacement: Disabled
    • The kube-proxy replacement is disabled: Cilium has not taken over kube-proxy's duties, and the cluster keeps using the stock kube-proxy for Service load balancing.
  • Host Routing: Legacy
    • Legacy host routing, i.e. the regular kernel forwarding path, is in use.
  • Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
    • IP masquerading (SNAT) is done with iptables; enabled for IPv4, disabled for IPv6.

Deploy test Pods in the k8s cluster

# cat cni.yaml

apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cilium-with-kubeproxy
  name: cilium-with-kubeproxy
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cilium-with-kubeproxy
  template:
    metadata:
      labels:
        app: cilium-with-kubeproxy
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true

---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cilium-with-kubeproxy
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
# kubectl apply -f cni.yaml
daemonset.apps/cilium-with-kubeproxy created
service/serversvc created

# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
  • Check the deployed Pods and Service
# kubectl get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE    IP           NODE                             NOMINATED NODE   READINESS GATES
cilium-with-kubeproxy-dqw5b   1/1     Running   0          109s   10.0.1.60    cilium-kubeproxy-control-plane   <none>           <none>
cilium-with-kubeproxy-jlx85   1/1     Running   0          109s   10.0.2.76    cilium-kubeproxy-worker          <none>           <none>
cilium-with-kubeproxy-zxtpj   1/1     Running   0          109s   10.0.0.53    cilium-kubeproxy-worker2         <none>           <none>
net                           1/1     Running   0          24s    10.0.2.153   cilium-kubeproxy-worker          <none>           <none>

# kubectl get svc 
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP        14m
serversvc    NodePort    10.96.0.36   <none>        80:32000/TCP   2m4s

3. Testing the Network

Pod-to-Pod traffic on the same node


  • Pod information
## IP addresses
# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:72:2c:1e:1a:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    # /32 netmask
    inet 10.0.2.153/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6872:2cff:fe1e:1a82/64 scope link 
       valid_lft forever preferred_lft forever

## Routes
#  kubectl exec -it net -- ip r s
default via 10.0.2.199 dev eth0 mtu 1500 
10.0.2.199 dev eth0 scope link 

Note that under Cilium the Pod's IP carries a /32 mask, meaning the address identifies a single host rather than a subnet. Any traffic this Pod sends to any other IP must therefore go through a route.
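Why a /32 forces everything through the route can be sketched with Python's ipaddress module (addresses taken from the listings above):

```python
import ipaddress

pod_if = ipaddress.ip_interface("10.0.2.153/32")

# A /32 "network" contains exactly one address -- the Pod itself --
# so no other destination is ever considered on-link.
print(pod_if.network.num_addresses)                         # 1
print(ipaddress.ip_address("10.0.2.76") in pod_if.network)  # False

# With a /24, a same-subnet peer would be on-link instead.
print(ipaddress.ip_address("10.0.2.76") in ipaddress.ip_network("10.0.2.0/24"))  # True
```

Hence the Pod's only usable path is the default route via 10.0.2.199 (cilium_host).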

  • The Node hosting the Pod
# docker exec -it cilium-kubeproxy-worker bash

## IP addresses
root@cilium-kubeproxy-worker:/# ip a l 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:46:e3:8c:bf:2d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f046:e3ff:fe8c:bf2d/64 scope link 
       valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 4e:67:23:df:e2:0c brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.199/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::4c67:23ff:fedf:e20c/64 scope link 
       valid_lft forever preferred_lft forever
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::2/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link 
       valid_lft forever preferred_lft forever
6: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2e:65:1c:bf:79:64 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::2c65:1cff:febf:7964/64 scope link 
       valid_lft forever preferred_lft forever
8: lxc6be7bdb14e12@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:5a:f9:5b:1f:fd brd ff:ff:ff:ff:ff:ff link-netns cni-5d3ffe4d-51ee-28da-a9d2-a60afab1c45e
    inet6 fe80::f45a:f9ff:fe5b:1ffd/64 scope link 
       valid_lft forever preferred_lft forever
12: lxcb6cad2eb0861@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:d3:e6:00:d9:6a brd ff:ff:ff:ff:ff:ff link-netns cni-f260ef4f-7cd4-a12f-8924-dc0caa5ff691
    inet6 fe80::fcd3:e6ff:fe00:d96a/64 scope link 
       valid_lft forever preferred_lft forever

## Routes
root@cilium-kubeproxy-worker:/# ip r s
default via 172.18.0.1 dev eth0 
10.0.0.0/24 via 172.18.0.3 dev eth0 
10.0.1.0/24 via 172.18.0.4 dev eth0 
10.0.2.0/24 via 10.0.2.199 dev cilium_host src 10.0.2.199 
10.0.2.199 dev cilium_host scope link 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2 
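The route the node picks for a given destination is a longest-prefix match over the table above; a minimal sketch (table simplified, /32 host routes omitted):

```python
import ipaddress

# Simplified copy of the worker's routing table: (prefix, next hop / device)
routes = [
    ("0.0.0.0/0",     "via 172.18.0.1 dev eth0"),
    ("10.0.0.0/24",   "via 172.18.0.3 dev eth0"),
    ("10.0.1.0/24",   "via 172.18.0.4 dev eth0"),
    ("10.0.2.0/24",   "via 10.0.2.199 dev cilium_host"),
    ("172.18.0.0/16", "dev eth0"),
]

def lookup(dst: str) -> str:
    """Longest-prefix match, as the kernel FIB would do."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), hop) for p, hop in routes
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.0.2.76"))  # same-node Pod  -> via 10.0.2.199 dev cilium_host
print(lookup("10.0.0.53"))  # cross-node Pod -> via 172.18.0.3 dev eth0
```

This is the routing-table expectation that the captures below will test against.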
  • Ping from the Pod; per the host routing table, the packet should be forwarded by the route 10.0.2.0/24 via 10.0.2.199 dev cilium_host src 10.0.2.199
root@kind:~# kubectl exec -it net -- ping 10.0.2.76 -c 1
PING 10.0.2.76 (10.0.2.76): 56 data bytes
64 bytes from 10.0.2.76: seq=0 ttl=63 time=0.411 ms

--- 10.0.2.76 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.411/0.411/0.411 ms
  • Capture on the Pod's eth0
net~$ tcpdump -pne -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078180 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078547 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64
  • Capture on the Node's cilium_host
root@cilium-kubeproxy-worker:/# tcpdump -pne -i cilium_host    
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

The capture shows nothing on cilium_host, yet same-node Pod communication works. Looking back at the Pod eth0 capture, the next-hop MAC is fe:d3:e6:00:d9:6a; comparing against the Node's interfaces, that is the MAC of lxcb6cad2eb0861, and the interface index pairing (eth0@if12 in the Pod, 12: lxcb6cad2eb0861@if11 on the Node) confirms the two interfaces are a veth pair.
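Pairing a Pod interface with its host-side veth can be mechanized by matching the @ifN indexes; a small sketch, with the tuples transcribed from the `ip a l` listings above:

```python
# Each veth end reports its own ifindex and its peer's ifindex ("@ifN").
# (name, own_index, peer_index) transcribed from the listings above:
pod_ifaces  = [("eth0", 11, 12)]
node_ifaces = [("lxc6be7bdb14e12", 8, 7), ("lxcb6cad2eb0861", 12, 11)]

def host_peer(pod_if, host_list):
    """The host veth whose index equals the Pod side's peer index
    (and whose peer index points back) is the other end of the pair."""
    _, idx, peer = pod_if
    for name, hidx, hpeer in host_list:
        if hidx == peer and hpeer == idx:
            return name
    return None

print(host_peer(pod_ifaces[0], node_ifaces))  # lxcb6cad2eb0861
```

The same matching identifies lxc6be7bdb14e12 as the peer of the 10.0.2.76 Pod's eth0@if8 later on.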

  • Capture on the Node's lxcb6cad2eb0861
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxcb6cad2eb0861
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lxcb6cad2eb0861, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078183 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078546 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64

The capture on lxcb6cad2eb0861 does show the packets, so they reach the Node, but they are never delivered to cilium_host as the routing table would suggest. In reality Cilium's underlying datapath intercepts the packet: when the destination is on the same node, the packet is redirected to the destination Pod's veth peer before the routing table is ever consulted.

  • Inspect the 10.0.2.76 Pod
# kubectl exec -it cilium-with-kubeproxy-jlx85 -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6e:58:19:c1:9c:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.2.76/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6c58:19ff:fec1:9cdf/64 scope link 
       valid_lft forever preferred_lft forever

From eth0@if8 we can tell that the host-side veth peer of this Pod's eth0 is the interface with ifindex 8; in the Node listing above that is 8: lxc6be7bdb14e12@if7.

  • Capture on the Node's lxc6be7bdb14e12
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxc6be7bdb14e12
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lxc6be7bdb14e12, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:13:30.078384 f6:5a:f9:5b:1f:fd > 6e:58:19:c1:9c:df, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.2.76: ICMP echo request, id 83, seq 0, length 64
09:13:30.078396 6e:58:19:c1:9c:df > f6:5a:f9:5b:1f:fd, ethertype IPv4 (0x0800), length 98: 10.0.2.76 > 10.0.2.153: ICMP echo reply, id 83, seq 0, length 64

Packets are captured on this interface as well; the destination MAC 6e:58:19:c1:9c:df is the eth0 MAC of the 10.0.2.76 Pod, i.e. these are the packets being delivered to the destination IP.

Summary: for same-node Pod traffic, once the packet crosses the veth pair onto the Node, Cilium's datapath running there intercepts it and forwards it to the destination Pod's veth peer without ever consulting the routing table.

Pod-to-Pod traffic across nodes


  • Pod information: see "Pod-to-Pod traffic on the same node" above

  • Node information: see "Pod-to-Pod traffic on the same node" above

  • Ping from the Pod; per the host routing table, the packet should be forwarded by the route 10.0.0.0/24 via 172.18.0.3 dev eth0

root@kind:~# kubectl exec -it net -- ping 10.0.0.53 -c 1
PING 10.0.0.53 (10.0.0.53): 56 data bytes
64 bytes from 10.0.0.53: seq=0 ttl=63 time=0.411 ms

--- 10.0.0.53 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.411/0.411/0.411 ms
  • Capture on lxcb6cad2eb0861 of Node cilium-kubeproxy-worker
root@cilium-kubeproxy-worker:/# tcpdump -pne -i lxcb6cad2eb0861
09:57:05.768885 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
09:57:05.769549 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64

After leaving the Pod, the packets again arrive at the veth pair first.

  • Capture on eth0 of Node cilium-kubeproxy-worker
root@cilium-kubeproxy-worker:/# tcpdump -pne -i eth0 icmp
09:57:05.769062 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
09:57:05.769422 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64

In the eth0 capture the next-hop MAC is 02:42:ac:12:00:03; per the routing table above, this should be the eth0 MAC of node 172.18.0.3.

Check the ARP table on cilium-kubeproxy-worker

root@cilium-kubeproxy-worker:/# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
172.18.0.4               ether   02:42:ac:12:00:04   C                     eth0
172.18.0.3               ether   02:42:ac:12:00:03   C                     eth0
172.18.0.1               ether   02:42:2f:fe:43:35   C                     eth0
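Building the Ethernet header for a routed packet is a route lookup followed by an ARP lookup; a toy version over the cache above shows why the frame carries the gateway's MAC rather than the final destination's:

```python
# ARP cache transcribed from the node above: next-hop IP -> MAC
arp_cache = {
    "172.18.0.4": "02:42:ac:12:00:04",
    "172.18.0.3": "02:42:ac:12:00:03",
    "172.18.0.1": "02:42:2f:fe:43:35",
}

def frame_dst_mac(next_hop_ip: str) -> str:
    """The Ethernet destination of a routed packet is the MAC of the
    next-hop gateway, not of the final destination IP."""
    return arp_cache[next_hop_ip]

# Route 10.0.0.0/24 via 172.18.0.3: frames for 10.0.0.53 carry worker2's MAC
print(frame_dst_mac("172.18.0.3"))  # 02:42:ac:12:00:03
```

This matches the destination MAC seen in the eth0 capture above.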
  • Capture on eth0 of Node cilium-kubeproxy-worker2
root@cilium-kubeproxy-worker2:/# tcpdump -pne -i eth0
08:01:58.623752 02:42:ac:12:00:02 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
08:01:58.624110 02:42:ac:12:00:03 > 02:42:ac:12:00:02, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64

In this capture the source MAC 02:42:ac:12:00:02 is the eth0 MAC of 172.18.0.2, and the destination MAC 02:42:ac:12:00:03 is this node's own eth0 MAC.

Check the IP addresses on cilium-kubeproxy-worker2

root@cilium-kubeproxy-worker2:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 96:0e:53:42:98:b2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::940e:53ff:fe42:98b2/64 scope link 
       valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:1e:f9:4f:4f:b1 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.99/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::cc1e:f9ff:fe4f:4fb1/64 scope link 
       valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0e:74:87:76:e7:93 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::c74:87ff:fe76:e793/64 scope link 
       valid_lft forever preferred_lft forever
7: lxc362604ac4ff4@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 3a:44:a3:48:94:e1 brd ff:ff:ff:ff:ff:ff link-netns cni-901d361e-7ed3-e13f-0d1a-9beefa724748
    inet6 fe80::3844:a3ff:fe48:94e1/64 scope link 
       valid_lft forever preferred_lft forever
13: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link 
       valid_lft forever preferred_lft forever

Check the routes on cilium-kubeproxy-worker2

## Routes
root@cilium-kubeproxy-worker2:/# ip r s
default via 172.18.0.1 dev eth0 
10.0.0.0/24 via 10.0.0.99 dev cilium_host src 10.0.0.99 
10.0.0.99 dev cilium_host scope link 
10.0.1.0/24 via 172.18.0.4 dev eth0 
10.0.2.0/24 via 172.18.0.2 dev eth0 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3 

Per this routing table, a packet destined for 10.0.0.53 should be forwarded by the route 10.0.0.0/24 via 10.0.0.99 dev cilium_host src 10.0.0.99. Capture on cilium_host:

root@cilium-kubeproxy-worker2:/# tcpdump -pne -i cilium_host
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_host, link-type EN10MB (Ethernet), snapshot length 262144 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel

Again, no packets can be captured on cilium_host. Just as with same-node Pod traffic, nothing crosses cilium_host even though communication works fine.

  • Capture on the target Pod's eth0
cilium-with-kubeproxy-zxtpj~$ tcpdump -pne -i eth0
08:01:58.623944 3a:44:a3:48:94:e1 > 42:6c:de:59:dd:36, ethertype IPv4 (0x0800), length 98: 10.0.2.153 > 10.0.0.53: ICMP echo request, id 128, seq 0, length 64
08:01:58.623981 42:6c:de:59:dd:36 > 3a:44:a3:48:94:e1, ethertype IPv4 (0x0800), length 98: 10.0.0.53 > 10.0.2.153: ICMP echo reply, id 128, seq 0, length 64

The source MAC 3a:44:a3:48:94:e1 in these packets turns out to be the MAC of lxc362604ac4ff4 on node cilium-kubeproxy-worker2. Once again Cilium's datapath intercepts the packet: when it arrives at the Node, it is redirected to the destination Pod's veth peer before the Node's routing table is consulted. lxc362604ac4ff4 and the target Pod's eth0 are the two ends of a veth pair.

Service traffic

  • Check the Service
# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        138m
serversvc    NodePort    10.96.142.173   <none>        80:32000/TCP   74m
  • From the net Pod, request port 32000 on the Node hosting the Pods
# kubectl exec -ti net -- curl 172.18.0.2:32000
PodName: cilium-with-kubeproxy-dqw5b | PodIP: eth0 10.0.1.60/32

Meanwhile, capture on the net Pod's eth0

net~$ tcpdump -pne -i eth0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:47:11.587710 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 74: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [S], seq 1116502993, win 64240, options [mss 1460,sackOK,TS val 3054730101 ecr 0,nop,wscale 7], length 0
08:47:11.588352 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 74: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [S.], seq 2686464703, ack 1116502994, win 65160, options [mss 1460,sackOK,TS val 584638172 ecr 3054730101,nop,wscale 7], length 0
08:47:11.588362 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 1, win 502, options [nop,nop,TS val 3054730101 ecr 584638172], length 0
08:47:11.589016 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 146: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [P.], seq 1:81, ack 1, win 502, options [nop,nop,TS val 3054730102 ecr 584638172], length 80
08:47:11.589561 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 66: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [.], ack 81, win 509, options [nop,nop,TS val 584638173 ecr 3054730102], length 0
08:47:11.589780 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 302: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [P.], seq 1:237, ack 81, win 509, options [nop,nop,TS val 584638173 ecr 3054730102], length 236
08:47:11.590091 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 131: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [P.], seq 237:302, ack 81, win 509, options [nop,nop,TS val 584638174 ecr 3054730102], length 65
08:47:11.590362 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 237, win 501, options [nop,nop,TS val 3054730103 ecr 584638173], length 0
08:47:11.590861 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 302, win 501, options [nop,nop,TS val 3054730104 ecr 584638174], length 0
08:47:11.592648 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [F.], seq 81, ack 302, win 501, options [nop,nop,TS val 3054730105 ecr 584638174], length 0
08:47:11.593241 fe:d3:e6:00:d9:6a > 6a:72:2c:1e:1a:82, ethertype IPv4 (0x0800), length 66: 172.18.0.2.32000 > 10.0.2.153.36788: Flags [F.], seq 302, ack 82, win 509, options [nop,nop,TS val 584638177 ecr 3054730105], length 0
08:47:11.593248 6a:72:2c:1e:1a:82 > fe:d3:e6:00:d9:6a, ethertype IPv4 (0x0800), length 66: 10.0.2.153.36788 > 172.18.0.2.32000: Flags [.], ack 303, win 501, options [nop,nop,TS val 3054730106 ecr 584638177], length 0

The capture shows the net Pod talking TCP from port 36788 to 172.18.0.2:32000; per the curl output, the reply was served by backend Pod 10.0.1.60.

With cilium reporting KubeProxyReplacement: Disabled, Cilium has not taken over kube-proxy's duties, so kube-proxy performs the Service forwarding using iptables or ipvs; kind uses iptables here. Verify with the conntrack table and the iptables rules:
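The NodePort path kube-proxy programs can be modeled as a short chain walk: match the destination port in KUBE-NODEPORTS, jump to the service chain, pick an endpoint, DNAT. A toy model, with the endpoint IPs transcribed from the Pod listing above (the selection and function name are illustrative):

```python
import random

# default/serversvc:cni  NodePort 32000 -> backend Pods on port 80
# (endpoint IPs transcribed from `kubectl get pods -o wide` above)
endpoints = ["10.0.1.60", "10.0.2.76", "10.0.0.53"]

def nodeport_dnat(dst_port: int):
    """KUBE-NODEPORTS: match --dport 32000, jump to the KUBE-SVC-...
    chain, which picks one endpoint (random here, roughly like
    iptables' statistic mode) and rewrites the destination (DNAT)."""
    if dst_port != 32000:
        return None                     # not a NodePort we handle
    backend = random.choice(endpoints)  # KUBE-SVC-... -> KUBE-SEP-...
    return (backend, 80)                # DNAT target
print(nodeport_dnat(32000))  # e.g. ('10.0.1.60', 80)
print(nodeport_dnat(8080))   # None
```

The reverse translation for reply packets is what conntrack records, as the entry below shows.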

  • conntrack entries
root@cilium-kubeproxy-worker:/# conntrack -L | grep 32000
conntrack v1.4.6 (conntrack-tools): 38 flow entries have been shown.
tcp      6 91 TIME_WAIT src=10.0.2.153 dst=172.18.0.2 sport=59298 dport=32000 src=10.0.1.60 dst=172.18.0.2 sport=80 dport=56944 [ASSURED] mark=0 use=1
  • iptables rules
root@cilium-kubeproxy-worker:/# iptables-save | grep 32000
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-SVC-CU7F3MNN62CF4ANP
-A KUBE-SVC-CU7F3MNN62CF4ANP -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-MARK-MASQ
posted @ 2024-06-26 16:58  evescn