Cilium IPSec with kubeProxy Mode

1. Environment Information

Host        IP
ubuntu      172.16.94.141

Software    Version
docker      26.1.4
helm        v3.15.0-rc.2
kind        0.18.0
kubernetes  1.23.4
ubuntu os   Ubuntu 20.04.6 LTS
kernel      5.11.5 (see the kernel upgrade doc)

2. Installing the Services

kind configuration file

$ cat install.sh

#!/bin/bash
date
set -v

# 1.prep noCNI env
cat <<EOF | kind create cluster --name=cilium-ipsec-native-routing --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # disable kind's default CNI; we will deploy the CNI (Cilium) ourselves
  disableDefaultCNI: true
  #kubeProxyMode: "none" # left commented out, so kube-proxy stays enabled

nodes:
  - role: control-plane
  - role: worker
  - role: worker

containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
    endpoint = ["https://harbor.evescn.com"]
EOF

# 2.remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
# kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide

# 3.install cni
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1

# create the IPsec key secret
kubectl create -n kube-system secret generic cilium-ipsec-keys \
            --from-literal=keys="3 rfc4106(gcm(aes)) $(echo $(dd if=/dev/urandom count=20 bs=1 2> /dev/null | xxd -p -c 64)) 128"

# IPSec Options(--set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8" --set encryption.enabled=true --set encryption.type=ipsec)
helm install cilium cilium/cilium \
    --set k8sServiceHost=$controller_node_ip \
    --set k8sServicePort=6443 \
    --version 1.13.0-rc5 \
    --namespace kube-system \
    --set debug.enabled=true \
    --set debug.verbose=datapath \
    --set monitorAggregation=none \
    --set ipam.mode=cluster-pool \
    --set cluster.name=cilium-ipsec-native-routing \
    --set tunnel=disabled \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR="10.0.0.0/8" \
    --set encryption.enabled=true \
    --set encryption.type=ipsec

# 4.install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium) 
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
    docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done
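
The keys string stored in the secret has the form "<key-id> <algorithm> <key> <key-size>"; the leading number (3 above) is the IPsec key ID. Below is a quick way to inspect it, plus a sketch of the usual key-rotation step (re-create the secret with an incremented key ID and fresh key material; the agents pick up the new key). This assumes the secret name used in the script; see the Cilium docs for the authoritative rotation procedure.

# inspect the generated key material (key ID, algorithm, key, key size)
kubectl -n kube-system get secret cilium-ipsec-keys -o jsonpath='{.data.keys}' | base64 -d; echo

# rotate: bump the key ID (3 -> 4) and replace the secret
NEW_KEY="4 rfc4106(gcm(aes)) $(dd if=/dev/urandom count=20 bs=1 2>/dev/null | xxd -p -c 64) 128"
kubectl -n kube-system create secret generic cilium-ipsec-keys \
    --from-literal=keys="$NEW_KEY" --dry-run=client -o yaml | kubectl apply -f -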

Explanation of the --set flags

  1. --set tunnel=disabled

    • Meaning: disable tunnel mode.
    • Purpose: with tunneling disabled, Cilium does not use vxlan encapsulation and instead routes packets directly between hosts, i.e. direct-routing mode.
  2. --set autoDirectNodeRoutes=true

    • Meaning: enable automatic direct node routes.
    • Purpose: Cilium automatically installs routes to the other nodes' PodCIDRs, so cross-node pod traffic is routed natively.
  3. --set ipv4NativeRoutingCIDR="10.0.0.0/8"

    • Meaning: the CIDR range used for IPv4 native routing, here 10.0.0.0/8.
    • Purpose: tells Cilium which destination ranges are natively routable and therefore must not be SNATed; by default Cilium masquerades all traffic leaving the node.
  4. encryption.enabled and encryption.type:

    • --set encryption.enabled=true: enable transparent encryption.
    • --set encryption.type=ipsec: use IPsec for the encryption.
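
Once the install completes, it is worth checking that these flags actually landed in the agent configuration. A minimal sanity check against the cilium-config ConfigMap (key names as rendered by the 1.13 chart; they may differ in other versions):

# confirm the rendered agent configuration
kubectl -n kube-system get cm cilium-config -o yaml | \
    grep -E 'tunnel|auto-direct-node-routes|ipv4-native-routing-cidr|enable-ipsec'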
  • Install the k8s cluster and the cilium service
# ./install.sh

Creating cluster "cilium-ipsec-native-routing" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼
 ✓ Preparing nodes 📦 📦 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing StorageClass 💾 
 ✓ Joining worker nodes 🚜 
Set kubectl context to "kind-cilium-ipsec-native-routing"
You can now use your cluster with:

kubectl cluster-info --context kind-cilium-ipsec-native-routing

Thanks for using kind! 😊
  • Check the installed services
root@kind:~# kubectl get pods -A
NAMESPACE            NAME                                                                READY   STATUS     RESTARTS   AGE
kube-system          cilium-dctx6                                                        1/1     Running    0          4m24s
kube-system          cilium-g7kc6                                                        1/1     Running    0          4m24s
kube-system          cilium-operator-68d8dcd5dc-24zkr                                    1/1     Running    0          4m24s
kube-system          cilium-operator-68d8dcd5dc-76rk9                                    1/1     Running    0          4m24s
kube-system          cilium-pnbd2                                                        0/1     Init:0/5   0          4m24s
kube-system          coredns-64897985d-djvrw                                             1/1     Running    0          5m45s
kube-system          coredns-64897985d-dm7zx                                             1/1     Running    0          5m45s
kube-system          etcd-cilium-ipsec-native-routing-control-plane                      1/1     Running    0          5m58s
kube-system          kube-apiserver-cilium-ipsec-native-routing-control-plane            1/1     Running    0          5m57s
kube-system          kube-controller-manager-cilium-ipsec-native-routing-control-plane   1/1     Running    0          5m58s
kube-system          kube-proxy-b74kc                                                    1/1     Running    0          5m30s
kube-system          kube-proxy-jt52k                                                    1/1     Running    0          5m45s
kube-system          kube-proxy-shtx2                                                    1/1     Running    0          5m29s
kube-system          kube-scheduler-cilium-ipsec-native-routing-control-plane            1/1     Running    0          5m58s
local-path-storage   local-path-provisioner-5ddd94ff66-bn5kk                             1/1     Running    0          5m45s

cilium Configuration

root@kind:~# kubectl -n kube-system exec -it ds/cilium -- cilium status

KVStore:                 Ok   Disabled
Kubernetes:              Ok   1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Disabled   
Host firewall:           Disabled
CNI Chaining:            none
CNI Config file:         CNI configuration file management disabled
Cilium:                  Ok   1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor:             Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok   
IPAM:                    IPv4: 5/254 allocated from 10.0.0.0/24, 
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       30/30 healthy
Proxy Status:            OK, ip 10.0.0.40, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 17.34   Metrics: Disabled
Encryption:              IPsec
Cluster health:          2/2 reachable   (2024-07-03T09:05:43Z)
  • KubeProxyReplacement: Disabled
    • The kube-proxy replacement is disabled: Cilium has not taken over kube-proxy's functionality, and the cluster keeps using the default kube-proxy for service load balancing.
  • Host Routing: Legacy
    • The legacy host routing path (kernel network stack) is used rather than eBPF host routing.
  • Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
    • IP masquerading (SNAT) is done with iptables: enabled for IPv4, disabled for IPv6.
  • Encryption
    • IPsec encryption is enabled.
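
The kernel side can be sanity-checked as well: with IPsec enabled, every node should carry xfrm states and policies programmed by the agent. A minimal check (the counts vary with cluster size):

# count the xfrm states/policies Cilium programmed on a node
docker exec -it cilium-ipsec-native-routing-worker ip xfrm state count
docker exec -it cilium-ipsec-native-routing-worker ip xfrm policy count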

Deploying Test Pods in the Cluster

# cat cni.yaml

apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cni
  name: cni
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cni
  template:
    metadata:
      labels:
        app: cni
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true

---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cni
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
# kubectl apply -f cni.yaml
daemonset.apps/cni created
service/serversvc created

# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
  • Check the deployed workloads
# kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP           NODE                                  NOMINATED NODE   READINESS GATES
cni-fcnh6   1/1     Running   0          15s   10.0.0.249   cilium-ipsec-native-routing-worker2   <none>           <none>
cni-s22px   1/1     Running   0          15s   10.0.1.103   cilium-ipsec-native-routing-worker    <none>           <none>
net         1/1     Running   0          10s   10.0.1.52    cilium-ipsec-native-routing-worker    <none>           <none>

# kubectl get svc 
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        20m
serversvc    NodePort    10.96.169.246   <none>        80:32000/TCP   29s

3. Testing the Network

Pod-to-Pod Traffic on the Same Node

img

The forwarding path for same-node pod traffic is identical to the one described in the Cilium Native Routing with kubeProxy Mode document; refer to that write-up.

Pod-to-Pod Traffic Across Nodes

img

  • Pod information
## IP addresses
root@kind:~# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6e:7b:6a:37:31:ab brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.1.52/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6c7b:6aff:fe37:31ab/64 scope link 
       valid_lft forever preferred_lft forever

## routes
root@kind:~# kubectl exec -it net -- ip r s
default via 10.0.1.138 dev eth0 mtu 1423 
10.0.1.138 dev eth0 scope link 
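
Note the mtu 1423 on the default route: Cilium lowers the pod route MTU below the 1500-byte link MTU to leave headroom for the ESP encapsulation added on the node, so encrypted packets do not have to be fragmented.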
  • Information on the node hosting the pod
root@kind:~# docker exec -it cilium-ipsec-native-routing-worker bash

## IP addresses
root@cilium-ipsec-native-routing-worker:/# ip a l 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:43:07:d8:3b:01 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::843:7ff:fed8:3b01/64 scope link 
       valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 12:3d:af:49:19:f5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.138/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::103d:afff:fe49:19f5/64 scope link 
       valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d6:15:3e:8c:b1:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::d415:3eff:fe8c:b1b0/64 scope link 
       valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::3/64 scope global nodad 
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:3/64 scope link 
       valid_lft forever preferred_lft forever
8: lxca872fa140b51@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:86:19:2c:28:f0 brd ff:ff:ff:ff:ff:ff link-netns cni-c2e97fe2-5e76-8220-9aaa-6b4c9bef8d59
    inet6 fe80::6886:19ff:fe2c:28f0/64 scope link 
       valid_lft forever preferred_lft forever
10: lxcda1dd0af135e@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 4e:0d:10:48:00:05 brd ff:ff:ff:ff:ff:ff link-netns cni-67b56f17-88fa-ca36-906f-0217343aeb95
    inet6 fe80::4c0d:10ff:fe48:5/64 scope link 
       valid_lft forever preferred_lft forever

## routes
root@cilium-ipsec-native-routing-worker:/# ip r s
default via 172.18.0.1 dev eth0 
10.0.0.0/24 via 172.18.0.4 dev eth0 
10.0.1.0/24 via 10.0.1.138 dev cilium_host src 10.0.1.138 
10.0.1.138 dev cilium_host scope link 
10.0.2.0/24 via 172.18.0.2 dev eth0 
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3 
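
The 10.0.0.0/24 via 172.18.0.4 and 10.0.2.0/24 via 172.18.0.2 entries are the PodCIDR routes toward the other two nodes, installed automatically because of autoDirectNodeRoutes=true; they are what makes native routing work here without a tunnel.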
  • Ping test from the pod
root@kind:~# kubectl exec -it net -- ping -c 1 10.0.0.249
PING 10.0.0.249 (10.0.0.249): 56 data bytes
64 bytes from 10.0.0.249: seq=0 ttl=61 time=18.400 ms

--- 10.0.0.249 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 18.400/18.400/18.400 ms
  • Packet capture on the pod's eth0
net~$ tcpdump -pne -i eth0
03:12:21.916697 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 98: 10.0.1.52 > 10.0.0.249: ICMP echo request, id 87, seq 0, length 64
03:12:21.918070 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 98: 10.0.0.249 > 10.0.1.52: ICMP echo reply, id 87, seq 0, length 64
  • Capture on all interfaces of node cilium-ipsec-native-routing-worker and analyze with wireshark
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i any -w /tmp/all.cap
root@cilium-ipsec-native-routing-worker:/# sz /tmp/all.cap

img

Searching for esp packets: in ipsec mode the traffic on the wire consists of encrypted ESP packets, which must be decrypted before they can be inspected. The key can be found in the ipsec state of node cilium-ipsec-native-routing-worker:

## ipsec state
root@cilium-ipsec-native-routing-worker:/# ip x s
src 10.0.1.138 dst 10.0.2.52
        proto esp spi 0x00000003 reqid 1 mode tunnel
        replay-window 0 
        # output-mark is new relative to generic ipsec; used for the source-based-routing (SBR) lookup
        mark 0x3e00/0xff00 output-mark 0xe00/0xf00
        aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
        sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.0.1.138 dst 10.0.0.40
        proto esp spi 0x00000003 reqid 1 mode tunnel
        replay-window 0 
        mark 0x3e00/0xff00 output-mark 0xe00/0xf00
        aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
        sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 0.0.0.0 dst 10.0.1.138
        proto esp spi 0x00000003 reqid 1 mode tunnel
        replay-window 0 
        mark 0xd00/0xf00 output-mark 0xd00/0xf00
        aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
        sel src 0.0.0.0/0 dst 0.0.0.0/0 
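
There are three states: two outbound, one per peer node (dst 10.0.2.52 and dst 10.0.0.40, the peers' cilium_host router IPs), and one inbound (dst 10.0.1.138, this node's cilium_host IP). The corresponding policies: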

root@cilium-ipsec-native-routing-worker:/# ip x p
src 10.0.1.0/24 dst 10.0.2.0/24 
        dir out priority 0 
        # mark added relative to generic ipsec
        mark 0x3e00/0xff00 
        tmpl src 10.0.1.138 dst 10.0.2.52
                proto esp spi 0x00000003 reqid 1 mode tunnel
src 10.0.1.0/24 dst 10.0.0.0/24 
        dir out priority 0 
        mark 0x3e00/0xff00 
        tmpl src 10.0.1.138 dst 10.0.0.40
                proto esp spi 0x00000003 reqid 1 mode tunnel
src 0.0.0.0/0 dst 10.0.1.0/24 
        dir fwd priority 2975 
        mark 0/0xf00 
        tmpl src 0.0.0.0 dst 10.0.1.138
                proto esp reqid 1 mode tunnel
                level use 
src 0.0.0.0/0 dst 10.0.1.0/24 
        dir in priority 0 
        mark 0xd00/0xf00 
        tmpl src 0.0.0.0 dst 10.0.1.138
                proto esp reqid 1 mode tunnel
src 0.0.0.0/0 dst 10.0.1.0/24 
        dir in priority 0 
        mark 0x200/0xf00 
        tmpl src 0.0.0.0 dst 10.0.1.138
                proto esp reqid 1 mode tunnel
                level use 

# source-based (policy) routing rules
root@cilium-ipsec-native-routing-worker:/# ip rule show
1:      from all fwmark 0xd00/0xf00 lookup 200
1:      from all fwmark 0xe00/0xf00 lookup 200
9:      from all fwmark 0x200/0xf00 lookup 2004
10:     from all fwmark 0xa00/0xf00 lookup 2005
100:    from all lookup local
32766:  from all lookup main
32767:  from all lookup default
root@cilium-ipsec-native-routing-worker:/# ip rule show t 200
root@cilium-ipsec-native-routing-worker:/# ip r s t 200
10.0.0.0/24 dev cilium_host mtu 1500 
local 10.0.1.0/24 dev eth0 proto 50 scope host 
10.0.2.0/24 dev cilium_host mtu 1500 
# The host's main routing table also has routes for the 10.0.0.0/24 and 10.0.2.0/24 subnets,
# but the policy (source) routing rules take precedence over the main table,
# so marked packets are steered to the cilium_host interface.
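
This can be confirmed with ip route get: a lookup carrying the output-mark (0xe00, from the ip x s output above) should resolve via dev cilium_host in table 200, while an unmarked lookup falls through to the main-table route via 172.18.0.4 dev eth0:

root@cilium-ipsec-native-routing-worker:/# ip route get 10.0.0.249 mark 0xe00
root@cilium-ipsec-native-routing-worker:/# ip route get 10.0.0.249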

Decrypting the esp packets yields the data shown in the figure below (a sketch of the Wireshark setup follows the figure)

img
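
For reference, the decryption was done with Wireshark's ESP dissector; roughly (menu paths from recent Wireshark versions, SA values taken from the ip x s output above):

Edit -> Preferences -> Protocols -> ESP -> enable "Attempt to detect/decode encrypted ESP payloads", then add an SA:
    Protocol: IPv4 | Src IP: 10.0.1.138 | Dest IP: 10.0.0.40 | SPI: 0x00000003
    Encryption: AES-GCM with 16 octet ICV [RFC4106] | Authentication: NULL
    Encryption Key: 0xd528e76e15ded4e1b417ea56f6c4633b7942b344

For rfc4106(gcm(aes)), the 20-byte value shown by ip x s is the 16-byte AES key plus a 4-byte salt; Wireshark expects the full hex string as the encryption key.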

  • The inner packet contains only the IP layer and the ICMP payload, with no MAC layer. This resembles ipip mode, whose inner packets also carry no MAC header; the difference is that ipip packets are not encrypted, so they can be inspected without decryption.
  • The outer IP header does not use the node IP 172.18.0.3 but 10.0.1.138; checking the node's interfaces shows this is the address of 3: cilium_host@cilium_net. This differs from traditional ipsec, which uses the node IP as the outer address.
    • How does the packet get to the cilium_host interface?
    • Destination route in the main table: 10.0.0.0/24 via 172.18.0.4 dev eth0
    • Source (policy) route in table 200: 10.0.0.0/24 dev cilium_host mtu 1500
    • The policy routing rules have higher priority than the main table, so after matching the fwmark rule the packet is sent to cilium_host.
    • Does that make the destination route useless?
    • No: after ESP encapsulation the outer source and destination addresses are the cilium_host IPs rather than the nodes' eth0 IPs. To deliver the encrypted packet to the peer node, the kernel then uses the destination route 10.0.0.0/24 via 172.18.0.4 dev eth0, fills the Ethernet header with eth0's MAC addresses, and sends the frame out eth0.
  • The all-interface capture contains 3 copies of each encrypted ESP request, which should correspond to the following interfaces:
    • 3: cilium_host@cilium_net: one end of a veth pair with cilium_net
    • 2: cilium_net@cilium_host: the other end of that veth pair
    • eth0: the node's real egress interface
    • Per-interface captures:
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i cilium_host esp
06:57:02.956575 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x5), length 120
06:57:03.957696 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x6), length 120
06:57:04.958607 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x7), length 120
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i cilium_net esp
06:57:14.973219 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x11), length 120
06:57:15.974173 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x12), length 120
06:57:16.975254 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x13), length 120
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel

root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i eth0 esp
06:57:23.984250 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1a), length 120
06:57:23.984770 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1a), length 120
06:57:24.985496 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1b), length 120
06:57:24.986230 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1b), length 120
06:57:25.986848 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1c), length 120
06:57:25.987313 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1c), length 120
06:57:26.987852 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1d), length 120
06:57:26.988388 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1d), length 120
  • The eth0 capture shows ESP packets in both directions, 10.0.1.138 > 10.0.0.40 and 10.0.0.40 > 10.0.1.138; the Ethernet header carries the eth0 MAC addresses, while the IP header carries the cilium_host IP addresses.

  • On 2: cilium_net@cilium_host and 3: cilium_host@cilium_net, only 10.0.1.138 > 10.0.0.40 packets appear, with no return traffic. Presumably the returning ESP packets are decrypted in the kernel after arriving on eth0 and are then delivered directly to 10: lxcda1dd0af135e@if9, the veth peer of the pod's eth0.

  • Capture on eth0 of node cilium-ipsec-native-routing-worker and analyze with wireshark

root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i eth0 -w /tmp/ipsec.cap
root@cilium-ipsec-native-routing-worker:/# sz /tmp/ipsec.cap

img

  • The inner packet contains only the IP layer and the ICMP payload, with no MAC layer, again similar to ipip mode; unlike ipip, though, the payload is encrypted and must be decrypted before it can be read.
  • The outer IP header uses 10.0.1.138 instead of the node IP 172.18.0.3; this is the address of 3: cilium_host@cilium_net, unlike traditional ipsec which uses the node IP in the outer header.
  • In the outer Ethernet header, the source MAC 02:42:ac:12:00:03 is the eth0 MAC of cilium-ipsec-native-routing-worker, and the destination MAC 02:42:ac:12:00:04 is the eth0 MAC of the peer node cilium-ipsec-native-routing-worker2.
  • The inner packet carries only the IP and ICMP layers.

img

  • The packet leaves the net pod; the pod's routing table sends it to the node via default via 10.0.1.138 dev eth0 mtu 1423.
  • The node receives the packet, sees that the destination is not local, matches the source (policy) routing rules, and steers the packet to the cilium_host interface.
  • On cilium_host, the rule 1: from all fwmark 0xe00/0xf00 lookup 200 ties in the xfrm entry src 10.0.1.138 dst 10.0.0.40 ... mark 0x3e00/0xff00 output-mark 0xe00/0xf00: the payload is encrypted and the outer IP header is added.
  • Once encapsulated, the packet is handed to eth0, gets its Ethernet header, and is sent to the peer node.
  • The peer node receives the packet, recognizes it as an ipsec packet, and hands it to the kernel xfrm module for processing.
  • After decapsulation, the inner destination 10.0.0.249 is found to be a local pod address, and the packet is delivered directly to lxc92161d5a7c99, the veth peer of the target pod's eth0.
  • The packet finally reaches the destination pod.
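
The same flow can also be watched from Cilium's side with the monitor tool (a sketch; run it on the agent hosting the source pod while the ping is running):

# trace datapath forwarding events while the ping runs
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type trace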

Service Traffic

  • Check the Service
root@kind:~# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        138m
serversvc    NodePort    10.96.142.173   <none>        80:32000/TCP   74m
  • From the net pod, request port 32000 on the node hosting the backend pod
root@kind:~# kubectl exec -ti net -- curl 172.18.0.3:32000
PodName: cni-fcnh6 | PodIP: eth0 10.0.0.249/32

Meanwhile, capture on the net pod's eth0:

net~$ tcpdump -pne -i eth0

07:22:39.257030 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 74: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [S], seq 1069156627, win 65001, options [mss 1383,sackOK,TS val 3849413747 ecr 0,nop,wscale 7], length 0
07:22:39.258581 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 74: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [S.], seq 3267497765, ack 1069156628, win 64437, options [mss 1383,sackOK,TS val 3355341573 ecr 3849413747,nop,wscale 7], length 0
07:22:39.258598 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 1, win 508, options [nop,nop,TS val 3849413748 ecr 3355341573], length 0
07:22:39.261595 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 146: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [P.], seq 1:81, ack 1, win 508, options [nop,nop,TS val 3849413751 ecr 3355341573], length 80
07:22:39.262098 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 66: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [.], ack 81, win 503, options [nop,nop,TS val 3355341577 ecr 3849413751], length 0
07:22:39.320140 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 302: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [P.], seq 1:237, ack 81, win 503, options [nop,nop,TS val 3355341635 ecr 3849413751], length 236
07:22:39.320152 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 237, win 507, options [nop,nop,TS val 3849413810 ecr 3355341635], length 0
07:22:39.321635 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 113: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [P.], seq 237:284, ack 81, win 503, options [nop,nop,TS val 3355341636 ecr 3849413810], length 47
07:22:39.321642 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 284, win 507, options [nop,nop,TS val 3849413811 ecr 3355341636], length 0
07:22:39.322130 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [F.], seq 81, ack 284, win 507, options [nop,nop,TS val 3849413812 ecr 3355341636], length 0
07:22:39.330074 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 66: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [F.], seq 284, ack 82, win 503, options [nop,nop,TS val 3355341645 ecr 3849413812], length 0
07:22:39.330085 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 285, win 507, options [nop,nop,TS val 3849413820 ecr 3355341645], length 0

The capture shows the net pod using port 48858 to talk TCP to 172.18.0.3 on port 32000.

  • KubeProxyReplacement: Disabled
    • The kube-proxy replacement is disabled: Cilium has not taken over kube-proxy's functionality, and the cluster keeps using the default kube-proxy for service load balancing.

The cilium configuration shows KubeProxyReplacement: Disabled, confirming that cilium has not taken over kube-proxy's functionality. kube-proxy therefore forwards service traffic with iptables or ipvs; this kind cluster uses iptables. Verify via the conntrack connection-tracking table and the iptables rules:

  • conntrack entries
root@cilium-ipsec-native-routing-worker:/# conntrack -L | grep 32000
conntrack v1.4.6 (conntrack-tools): 44 flow entries have been shown.
tcp      6 118 TIME_WAIT src=10.0.1.52 dst=172.18.0.3 sport=45404 dport=32000 src=10.0.0.249 dst=172.18.0.3 sport=80 dport=50772 [ASSURED] mark=0 use=1
  • iptables rules
root@cilium-ipsec-native-routing-worker:/# iptables-save | grep 32000
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-SVC-CU7F3MNN62CF4ANP
-A KUBE-SVC-CU7F3MNN62CF4ANP -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-MARK-MASQ
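
The KUBE-SVC chain then jumps to per-endpoint KUBE-SEP chains that perform the DNAT to the backend pod. The rest of the path can be followed with the chain names from the output above; the KUBE-SEP rules should show -j DNAT --to-destination 10.0.0.249:80:

root@cilium-ipsec-native-routing-worker:/# iptables-save -t nat | grep KUBE-SVC-CU7F3MNN62CF4ANP
root@cilium-ipsec-native-routing-worker:/# iptables-save -t nat | grep KUBE-SEP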