Cilium IPSec with kubeProxy Mode
1. Environment Information

| Host | IP |
|---|---|
| ubuntu | 172.16.94.141 |

| Software | Version |
|---|---|
| docker | 26.1.4 |
| helm | v3.15.0-rc.2 |
| kind | 0.18.0 |
| kubernetes | 1.23.4 |
| ubuntu os | Ubuntu 20.04.6 LTS |
| kernel | 5.11.5 (see the kernel upgrade doc) |
2. Installing the Services

kind

Configuration file:
$ cat install.sh
#!/bin/bash
date
set -v
# 1. Prep the no-CNI environment
cat <<EOF | kind create cluster --name=cilium-ipsec-native-routing --image=kindest/node:v1.23.4 --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # kind installs its own default CNI; disable it since we install the CNI ourselves
  disableDefaultCNI: true
  #kubeProxyMode: "none" # left commented out so kube-proxy stays enabled
nodes:
- role: control-plane
- role: worker
- role: worker
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.evescn.com"]
    endpoint = ["https://harbor.evescn.com"]
EOF

# 2. Remove taints
controller_node_ip=`kubectl get node -o wide --no-headers | grep -E "control-plane|bpf1" | awk -F " " '{print $6}'`
# kubectl taint nodes $(kubectl get nodes -o name | grep control-plane) node-role.kubernetes.io/master:NoSchedule-
kubectl get nodes -o wide

# 3. Install the CNI
helm repo add cilium https://helm.cilium.io > /dev/null 2>&1
helm repo update > /dev/null 2>&1

# Create the IPsec key secret
kubectl create -n kube-system secret generic cilium-ipsec-keys \
    --from-literal=keys="3 rfc4106(gcm(aes)) $(echo $(dd if=/dev/urandom count=20 bs=1 2> /dev/null | xxd -p -c 64)) 128"

# IPsec options (--set tunnel=disabled --set autoDirectNodeRoutes=true --set ipv4NativeRoutingCIDR="10.0.0.0/8" --set encryption.enabled=true --set encryption.type=ipsec)
helm install cilium cilium/cilium \
    --set k8sServiceHost=$controller_node_ip \
    --set k8sServicePort=6443 \
    --version 1.13.0-rc5 \
    --namespace kube-system \
    --set debug.enabled=true \
    --set debug.verbose=datapath \
    --set monitorAggregation=none \
    --set ipam.mode=cluster-pool \
    --set cluster.name=cilium-ipsec-native-routing \
    --set tunnel=disabled \
    --set autoDirectNodeRoutes=true \
    --set ipv4NativeRoutingCIDR="10.0.0.0/8" \
    --set encryption.enabled=true \
    --set encryption.type=ipsec

# 4. Install necessary tools
for i in $(docker ps -a --format "table {{.Names}}" | grep cilium)
do
    echo $i
    docker cp /usr/bin/ping $i:/usr/bin/ping
    docker exec -it $i bash -c "sed -i -e 's/jp.archive.ubuntu.com\|archive.ubuntu.com\|security.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list"
    docker exec -it $i bash -c "apt-get -y update >/dev/null && apt-get -y install net-tools tcpdump lrzsz bridge-utils >/dev/null 2>&1"
done
`--set` parameter explanation

- `--set tunnel=disabled`
  - Meaning: disable tunnel mode.
  - Purpose: with tunneling disabled, Cilium does not use vxlan encapsulation and routes packets directly between hosts, i.e. direct-routing mode.
- `--set autoDirectNodeRoutes=true`
  - Meaning: enable automatic direct node routes.
  - Purpose: Cilium automatically installs routes to the other nodes' Pod CIDRs, so cross-node traffic takes the direct path.
- `--set ipv4NativeRoutingCIDR="10.0.0.0/8"`
  - Meaning: the CIDR range used for IPv4 native routing, here `10.0.0.0/8`.
  - Purpose: tells Cilium which destination ranges are natively routable and must not be SNATed; by default Cilium masquerades traffic to all other destinations.
- `encryption.enabled` and `encryption.type`
  - `--set encryption.enabled=true`: enable transparent encryption.
  - `--set encryption.type=ipsec`: use IPsec for the encryption.
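Once the install completes, the effect of these options can be cross-checked against the agent configuration. A minimal sketch, assuming the helm values above map to the usual keys in the `cilium-config` ConfigMap:

```bash
# Sanity-check that direct routing and IPsec encryption were actually applied
# (key names are assumed from Cilium 1.13's cilium-config ConfigMap; adjust if they differ).
kubectl -n kube-system get configmap cilium-config -o yaml \
  | grep -E 'tunnel|auto-direct-node-routes|ipv4-native-routing-cidr|enable-ipsec'
```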
- Install the `k8s` cluster and the `cilium` service
# ./install.sh
Creating cluster "cilium-ipsec-native-routing" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-cilium-ipsec-native-routing"
You can now use your cluster with:
kubectl cluster-info --context kind-cilium-ipsec-native-routing
Thanks for using kind! 😊
- View the installed services
root@kind:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-dctx6 1/1 Running 0 4m24s
kube-system cilium-g7kc6 1/1 Running 0 4m24s
kube-system cilium-operator-68d8dcd5dc-24zkr 1/1 Running 0 4m24s
kube-system cilium-operator-68d8dcd5dc-76rk9 1/1 Running 0 4m24s
kube-system cilium-pnbd2 0/1 Init:0/5 0 4m24s
kube-system coredns-64897985d-djvrw 1/1 Running 0 5m45s
kube-system coredns-64897985d-dm7zx 1/1 Running 0 5m45s
kube-system etcd-cilium-ipsec-native-routing-control-plane 1/1 Running 0 5m58s
kube-system kube-apiserver-cilium-ipsec-native-routing-control-plane 1/1 Running 0 5m57s
kube-system kube-controller-manager-cilium-ipsec-native-routing-control-plane 1/1 Running 0 5m58s
kube-system kube-proxy-b74kc 1/1 Running 0 5m30s
kube-system kube-proxy-jt52k 1/1 Running 0 5m45s
kube-system kube-proxy-shtx2 1/1 Running 0 5m29s
kube-system kube-scheduler-cilium-ipsec-native-routing-control-plane 1/1 Running 0 5m58s
local-path-storage local-path-provisioner-5ddd94ff66-bn5kk 1/1 Running 0 5m45s
`cilium` configuration
root@kind:~# kubectl -n kube-system exec -it ds/cilium -- cilium status
KVStore: Ok Disabled
Kubernetes: Ok 1.23 (v1.23.4) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Disabled
Host firewall: Disabled
CNI Chaining: none
CNI Config file: CNI configuration file management disabled
Cilium: Ok 1.13.0-rc5 (v1.13.0-rc5-dc22a46f)
NodeMonitor: Listening for events on 128 CPUs with 64x4096 of shared memory
Cilium health daemon: Ok
IPAM: IPv4: 5/254 allocated from 10.0.0.0/24,
IPv6 BIG TCP: Disabled
BandwidthManager: Disabled
Host Routing: Legacy
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 30/30 healthy
Proxy Status: OK, ip 10.0.0.40, 0 redirects active on ports 10000-20000
Global Identity Range: min 256, max 65535
Hubble: Ok Current/Max Flows: 4095/4095 (100.00%), Flows/s: 17.34 Metrics: Disabled
Encryption: IPsec
Cluster health: 2/2 reachable (2024-07-03T09:05:43Z)
- `KubeProxyReplacement: Disabled`
  - kube-proxy replacement is disabled; Cilium has not taken over kube-proxy, so the cluster keeps using the default kube-proxy for Service load balancing.
- `Host Routing: Legacy`
  - Legacy (iptables/netfilter-based) host routing is used.
- `Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]`
  - IP masquerading (NAT) is done with iptables; enabled for IPv4, disabled for IPv6.
- `Encryption: IPsec`
  - IPsec encryption is enabled.
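The `Encryption: IPsec` line corresponds to the `cilium-ipsec-keys` secret created in `install.sh`. A small sketch to print the key material the agents were given (key id, AEAD algorithm, hex key, ICV length):

```bash
# Dump the IPsec key generated by install.sh; this is what the agents program into
# the kernel xfrm state (visible later via `ip x s` on the nodes).
kubectl -n kube-system get secret cilium-ipsec-keys \
  -o jsonpath='{.data.keys}' | base64 -d; echo
```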
Deploy test Pods on the `k8s` cluster to exercise the network
# cat cni.yaml
apiVersion: apps/v1
kind: DaemonSet
#kind: Deployment
metadata:
  labels:
    app: cni
  name: cni
spec:
  #replicas: 1
  selector:
    matchLabels:
      app: cni
  template:
    metadata:
      labels:
        app: cni
    spec:
      containers:
      - image: harbor.dayuan1997.com/devops/nettool:0.9
        name: nettoolbox
        securityContext:
          privileged: true
---
apiVersion: v1
kind: Service
metadata:
  name: serversvc
spec:
  type: NodePort
  selector:
    app: cni
  ports:
  - name: cni
    port: 80
    targetPort: 80
    nodePort: 32000
# kubectl apply -f cni.yaml
daemonset.apps/cni created
service/serversvc created
# kubectl run net --image=harbor.dayuan1997.com/devops/nettool:0.9
pod/net created
- View the deployed Pods and Service
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cni-fcnh6 1/1 Running 0 15s 10.0.0.249 cilium-ipsec-native-routing-worker2 <none> <none>
cni-s22px 1/1 Running 0 15s 10.0.1.103 cilium-ipsec-native-routing-worker <none> <none>
net 1/1 Running 0 10s 10.0.1.52 cilium-ipsec-native-routing-worker <none> <none>
# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20m
serversvc NodePort 10.96.169.246 <none> 80:32000/TCP 29s
3. Testing the Network

Same-node Pod communication

See the document "Cilium Native Routing with kubeProxy mode": the packet forwarding flow for same-node communication is identical.
Cross-node Pod communication

Pod information
## IP information
root@kind:~# kubectl exec -it net -- ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6e:7b:6a:37:31:ab brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.1.52/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::6c7b:6aff:fe37:31ab/64 scope link
valid_lft forever preferred_lft forever
## route information
root@kind:~# kubectl exec -it net -- ip r s
default via 10.0.1.138 dev eth0 mtu 1423
10.0.1.138 dev eth0 scope link
Information for the `Node` hosting the `Pod`
root@kind:~# docker exec -it cilium-ipsec-native-routing-worker bash
## IP information
root@cilium-ipsec-native-routing-worker:/# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0a:43:07:d8:3b:01 brd ff:ff:ff:ff:ff:ff
inet6 fe80::843:7ff:fed8:3b01/64 scope link
valid_lft forever preferred_lft forever
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 12:3d:af:49:19:f5 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.138/32 scope link cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::103d:afff:fe49:19f5/64 scope link
valid_lft forever preferred_lft forever
5: lxc_health@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether d6:15:3e:8c:b1:b0 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::d415:3eff:fe8c:b1b0/64 scope link
valid_lft forever preferred_lft forever
7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.3/16 brd 172.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::3/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:3/64 scope link
valid_lft forever preferred_lft forever
8: lxca872fa140b51@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:86:19:2c:28:f0 brd ff:ff:ff:ff:ff:ff link-netns cni-c2e97fe2-5e76-8220-9aaa-6b4c9bef8d59
inet6 fe80::6886:19ff:fe2c:28f0/64 scope link
valid_lft forever preferred_lft forever
10: lxcda1dd0af135e@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 4e:0d:10:48:00:05 brd ff:ff:ff:ff:ff:ff link-netns cni-67b56f17-88fa-ca36-906f-0217343aeb95
inet6 fe80::4c0d:10ff:fe48:5/64 scope link
valid_lft forever preferred_lft forever
## route information
root@cilium-ipsec-native-routing-worker:/# ip r s
default via 172.18.0.1 dev eth0
10.0.0.0/24 via 172.18.0.4 dev eth0
10.0.1.0/24 via 10.0.1.138 dev cilium_host src 10.0.1.138
10.0.1.138 dev cilium_host scope link
10.0.2.0/24 via 172.18.0.2 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
Ping test from the `Pod`
root@kind:~# kubectl exec -it net -- ping -c 1 10.0.0.249
PING 10.0.0.249 (10.0.0.249): 56 data bytes
64 bytes from 10.0.0.249: seq=0 ttl=61 time=18.400 ms
--- 10.0.0.249 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 18.400/18.400/18.400 ms
Packet capture on the `Pod`'s `eth0` interface
net~$ tcpdump -pne -i eth0
03:12:21.916697 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 98: 10.0.1.52 > 10.0.0.249: ICMP echo request, id 87, seq 0, length 64
03:12:21.918070 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 98: 10.0.0.249 > 10.0.1.52: ICMP echo reply, id 87, seq 0, length 64
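Before capturing everything on the node, a quick filter on the node's egress interface already confirms that the ICMP payload never crosses the wire in cleartext, only ESP does. A sketch (run on the node hosting the `net` Pod while the ping is running):

```bash
# Only ESP frames should match here; a plaintext ICMP hit would mean the
# cross-node pod traffic is leaving the node unencrypted.
docker exec -it cilium-ipsec-native-routing-worker tcpdump -pni eth0 'esp or icmp'
```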
Capture on all interfaces of Node `cilium-ipsec-native-routing-worker` and analyze with Wireshark
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i any -w /tmp/all.cap
root@cilium-ipsec-native-routing-worker:/# sz /tmp/all.cap
Searching for `esp` packets: in `ipsec` mode the packets on the wire are encrypted ESP packets and must be decrypted before they can be inspected. The key can be looked up from the `ipsec` state on Node `cilium-ipsec-native-routing-worker`:
## ipsec state
root@cilium-ipsec-native-routing-worker:/# ip x s
src 10.0.1.138 dst 10.0.2.52
proto esp spi 0x00000003 reqid 1 mode tunnel
replay-window 0
# Note the added output-mark, as in generic ipsec setups; it is used for the SBR (source-based routing) lookup
mark 0x3e00/0xff00 output-mark 0xe00/0xf00
aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
src 10.0.1.138 dst 10.0.0.40
proto esp spi 0x00000003 reqid 1 mode tunnel
replay-window 0
mark 0x3e00/0xff00 output-mark 0xe00/0xf00
aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
src 0.0.0.0 dst 10.0.1.138
proto esp spi 0x00000003 reqid 1 mode tunnel
replay-window 0
mark 0xd00/0xf00 output-mark 0xd00/0xf00
aead rfc4106(gcm(aes)) 0xd528e76e15ded4e1b417ea56f6c4633b7942b344 128
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src 0.0.0.0/0 dst 0.0.0.0/0
root@cilium-ipsec-native-routing-worker:/# ip x p
src 10.0.1.0/24 dst 10.0.2.0/24
dir out priority 0
# Note the added mark, as in generic ipsec setups
mark 0x3e00/0xff00
tmpl src 10.0.1.138 dst 10.0.2.52
proto esp spi 0x00000003 reqid 1 mode tunnel
src 10.0.1.0/24 dst 10.0.0.0/24
dir out priority 0
mark 0x3e00/0xff00
tmpl src 10.0.1.138 dst 10.0.0.40
proto esp spi 0x00000003 reqid 1 mode tunnel
src 0.0.0.0/0 dst 10.0.1.0/24
dir fwd priority 2975
mark 0/0xf00
tmpl src 0.0.0.0 dst 10.0.1.138
proto esp reqid 1 mode tunnel
level use
src 0.0.0.0/0 dst 10.0.1.0/24
dir in priority 0
mark 0xd00/0xf00
tmpl src 0.0.0.0 dst 10.0.1.138
proto esp reqid 1 mode tunnel
src 0.0.0.0/0 dst 10.0.1.0/24
dir in priority 0
mark 0x200/0xf00
tmpl src 0.0.0.0 dst 10.0.1.138
proto esp reqid 1 mode tunnel
level use
# Source-based (policy) routing rules
root@cilium-ipsec-native-routing-worker:/# ip rule show
1: from all fwmark 0xd00/0xf00 lookup 200
1: from all fwmark 0xe00/0xf00 lookup 200
9: from all fwmark 0x200/0xf00 lookup 2004
10: from all fwmark 0xa00/0xf00 lookup 2005
100: from all lookup local
32766: from all lookup main
32767: from all lookup default
root@cilium-ipsec-native-routing-worker:/# ip rule show t 200
root@cilium-ipsec-native-routing-worker:/# ip r s t 200
10.0.0.0/24 dev cilium_host mtu 1500
local 10.0.1.0/24 dev eth0 proto 50 scope host
10.0.2.0/24 dev cilium_host mtu 1500
# The host's main routing table also has routes for the 10.0.0.0/24 and 10.0.2.0/24 subnets, but source-based routing has higher priority than destination-based routing,
# so based on the source routing table above the packets are sent to the cilium_host interface
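That the policy routes win over the main table can be verified directly with `ip route get`. A minimal sketch, assuming `0xe00` is the output-mark Cilium sets on traffic to be encrypted (as seen in the xfrm state above):

```bash
# With the encryption output-mark the lookup hits table 200 and picks cilium_host;
# without any mark the main table would send the packet straight out eth0.
ip route get 10.0.0.249 mark 0xe00
ip route get 10.0.0.249
```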
After decrypting the `esp` packets, the data shown in the figure below is obtained.
- In the inner packet there is only IP-layer and ICMP information and no MAC layer. This is similar to `ipip` mode, where the inner packet also carries no MAC header; the difference is that `ipip` packets are not encrypted and can be read without any decryption.
- The outer IP header does not use the node IP `172.18.0.3` but `10.0.1.138`. Checking the node's interfaces shows that this is the address of `3: cilium_host@cilium_net`, unlike traditional `ipsec`, which uses the node IP directly as the outer IP address.
  - How does the packet get delivered to the `cilium_host` interface?
    - The host's destination-based route: `10.0.0.0/24 via 172.18.0.4 dev eth0`
    - The host's source-based (policy) route: `10.0.0.0/24 dev cilium_host mtu 1500`
    - Source-based routing has higher priority than destination-based routing, so once the packet matches the source-based route it is sent to the `cilium_host` interface.
  - Does that make the destination-based route useless?
    - No. Once ESP encryption has been applied, the outer source and destination IPs are both `cilium_host` addresses rather than the nodes' `eth0` IPs. To deliver the packet to the destination `node`, the destination-based route `10.0.0.0/24 via 172.18.0.4 dev eth0` is used; the MAC header is then built with the `eth0` MAC address and the packet is sent out through `eth0` to the peer host.
- Capturing on all interfaces shows each encrypted ESP request 3 times, which can be attributed to the following interfaces:
  - `3: cilium_host@cilium_net`: forms a `veth pair` with `2: cilium_net@cilium_host`
  - `2: cilium_net@cilium_host`: forms a `veth pair` with `3: cilium_host@cilium_net`
  - `eth0`: the host's real egress interface
- The per-interface captures are as follows:
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i cilium_host esp
06:57:02.956575 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x5), length 120
06:57:03.957696 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x6), length 120
06:57:04.958607 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x7), length 120
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i cilium_net esp
06:57:14.973219 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x11), length 120
06:57:15.974173 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x12), length 120
06:57:16.975254 12:3d:af:49:19:f5 > 12:3d:af:49:19:f5, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x13), length 120
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i eth0 esp
06:57:23.984250 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1a), length 120
06:57:23.984770 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1a), length 120
06:57:24.985496 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1b), length 120
06:57:24.986230 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1b), length 120
06:57:25.986848 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1c), length 120
06:57:25.987313 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1c), length 120
06:57:26.987852 02:42:ac:12:00:03 > 02:42:ac:12:00:04, ethertype IPv4 (0x0800), length 154: 10.0.1.138 > 10.0.0.40: ESP(spi=0x00000003,seq=0x1d), length 120
06:57:26.988388 02:42:ac:12:00:04 > 02:42:ac:12:00:03, ethertype IPv4 (0x0800), length 154: 10.0.0.40 > 10.0.1.138: ESP(spi=0x00000003,seq=0x1d), length 120
- On `eth0` there are packets in both directions, `10.0.1.138 > 10.0.0.40` and `10.0.0.40 > 10.0.1.138`; the MAC layer carries the `eth0` MAC addresses, while the IP layer carries the `cilium_host` interface IPs.
- On `2: cilium_net@cilium_host` and `3: cilium_host@cilium_net` only `10.0.1.138 > 10.0.0.40` ESP packets appear, with no return traffic. The returning ESP packets are presumably decrypted in the kernel right after arriving on `eth0` and are then sent directly to `10: lxcda1dd0af135e@if9`, the `veth pair` peer of the Pod's `eth0`.
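To decrypt these captures in Wireshark, an SA entry with the AEAD key has to be added under the ESP protocol preferences. A sketch for pulling the key from the node, assuming the key printed by `ip x s` is the one derived from the `cilium-ipsec-keys` secret:

```bash
# Print the unique rfc4106(gcm(aes)) keys programmed into the kernel; paste the hex value
# into Wireshark's ESP SA table together with the outer src/dst IPs and SPI 0x00000003.
docker exec -it cilium-ipsec-native-routing-worker ip xfrm state \
  | awk '/aead/ {print $3}' | sort -u
```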
- Capture on the `eth0` interface of Node `cilium-ipsec-native-routing-worker` and analyze it with Wireshark
root@cilium-ipsec-native-routing-worker:/# tcpdump -pne -i eth0 -w /tmp/ipsec.cap
root@cilium-ipsec-native-routing-worker:/# sz /tmp/ipsec.cap
- In the inner packet there is only IP-layer and ICMP information and no MAC layer; as noted above, this is similar to `ipip` mode, except that `ipip` packets are not encrypted and can be read without decryption.
- The outer IP header does not use the node IP `172.18.0.3` but `10.0.1.138`; checking the node's interfaces shows that this is the address of `3: cilium_host@cilium_net`, unlike traditional `ipsec`, which uses the node IP directly as the outer IP address.
- In the outer MAC header, the source MAC `02:42:ac:12:00:03` is the `eth0` MAC of `cilium-ipsec-native-routing-worker`, and the destination MAC `02:42:ac:12:00:04` is the `eth0` MAC of the peer Pod's host, `cilium-ipsec-native-routing-worker2`.
Packet forwarding flow:

- Traffic leaves the `net` Pod and, per the Pod's routing table (`default via 10.0.1.138 dev eth0 mtu 1423`), is sent to the `node`.
- When the `node` receives the packet, the routing lookup shows it is not destined for this node; the packet then matches the source-based (policy) routing table and is delivered to the node's `cilium_host` interface.
- Once `cilium_host` receives the packet, the policy routing rule `1: from all fwmark 0xe00/0xf00 lookup 200` ties it to the `ipsec` state `src 10.0.1.138 dst 10.0.0.40 ... mark 0x3e00/0xff00 output-mark 0xe00/0xf00`; the payload is encrypted, wrapped in a new outer IP header, and handed to `eth0`.
- After encapsulation, the packet reaches `eth0`, where the MAC header is added, and it is sent to the peer `node`.
- The peer `node` receives the packet, recognizes it as an `ipsec` packet, and hands it to the kernel IPsec modules for processing.
- After decapsulation, the inner packet's destination `10.0.0.249` is found to be in this node's `Pod` address range, so it is sent directly to `lxc92161d5a7c99`, the `veth pair` peer of the target `Pod`'s `eth0`.
- The packet is finally delivered to the destination `Pod`.
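The last hop can be confirmed with a capture on the destination node. A sketch, assuming the target Pod's `veth` peer on `cilium-ipsec-native-routing-worker2` is `lxc92161d5a7c99` as in the flow above (look it up with `ip a` on that node if it differs):

```bash
# The ICMP request should arrive here already decrypted, as 10.0.1.52 -> 10.0.0.249.
docker exec -it cilium-ipsec-native-routing-worker2 tcpdump -pne -i lxc92161d5a7c99 icmp
```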
Service communication

- View the `Service` information
root@kind:~# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 138m
serversvc NodePort 10.96.142.173 <none> 80:32000/TCP 74m
From the `net` Pod, request port `32000` on the Node hosting the target `Pod`:
root@kind:~# kubectl exec -ti net -- curl 172.18.0.3:32000
PodName: cni-fcnh6 | PodIP: eth0 10.0.0.249/32
Then capture on the `net` Pod's `eth0` interface:
net~$tcpdump -pne -i eth0
07:22:39.257030 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 74: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [S], seq 1069156627, win 65001, options [mss 1383,sackOK,TS val 3849413747 ecr 0,nop,wscale 7], length 0
07:22:39.258581 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 74: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [S.], seq 3267497765, ack 1069156628, win 64437, options [mss 1383,sackOK,TS val 3355341573 ecr 3849413747,nop,wscale 7], length 0
07:22:39.258598 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 1, win 508, options [nop,nop,TS val 3849413748 ecr 3355341573], length 0
07:22:39.261595 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 146: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [P.], seq 1:81, ack 1, win 508, options [nop,nop,TS val 3849413751 ecr 3355341573], length 80
07:22:39.262098 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 66: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [.], ack 81, win 503, options [nop,nop,TS val 3355341577 ecr 3849413751], length 0
07:22:39.320140 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 302: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [P.], seq 1:237, ack 81, win 503, options [nop,nop,TS val 3355341635 ecr 3849413751], length 236
07:22:39.320152 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 237, win 507, options [nop,nop,TS val 3849413810 ecr 3355341635], length 0
07:22:39.321635 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 113: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [P.], seq 237:284, ack 81, win 503, options [nop,nop,TS val 3355341636 ecr 3849413810], length 47
07:22:39.321642 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 284, win 507, options [nop,nop,TS val 3849413811 ecr 3355341636], length 0
07:22:39.322130 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [F.], seq 81, ack 284, win 507, options [nop,nop,TS val 3849413812 ecr 3355341636], length 0
07:22:39.330074 4e:0d:10:48:00:05 > 6e:7b:6a:37:31:ab, ethertype IPv4 (0x0800), length 66: 172.18.0.3.32000 > 10.0.1.52.48858: Flags [F.], seq 284, ack 82, win 503, options [nop,nop,TS val 3355341645 ecr 3849413812], length 0
07:22:39.330085 6e:7b:6a:37:31:ab > 4e:0d:10:48:00:05, ethertype IPv4 (0x0800), length 66: 10.0.1.52.48858 > 172.18.0.3.32000: Flags [.], ack 285, win 507, options [nop,nop,TS val 3849413820 ecr 3355341645], length 0
The capture shows that the `net` Pod uses source port `48858` to talk `tcp` with `172.18.0.3` on port `32000`.
* `KubeProxyReplacement: Disabled`
  * kube-proxy replacement is disabled; Cilium has not taken over kube-proxy, so the cluster keeps using the default kube-proxy for Service load balancing.
The `cilium` configuration reports `KubeProxyReplacement: Disabled`, confirming that `cilium` has not taken over `kube-proxy`. `kube-proxy` therefore handles `service` forwarding with `iptables` or `ipvs`; this `kind` cluster uses `iptables`, which can be verified from the `conntrack` table and the `iptables` rules.
`conntrack` information
root@cilium-ipsec-native-routing-worker:/# conntrack -L | grep 32000
conntrack v1.4.6 (conntrack-tools): 44 flow entries have been shown.
tcp 6 118 TIME_WAIT src=10.0.1.52 dst=172.18.0.3 sport=45404 dport=32000 src=10.0.0.249 dst=172.18.0.3 sport=80 dport=50772 [ASSURED] mark=0 use=1
`iptables` information
root@cilium-ipsec-native-routing-worker:/# iptables-save | grep 32000
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-SVC-CU7F3MNN62CF4ANP
-A KUBE-SVC-CU7F3MNN62CF4ANP -p tcp -m comment --comment "default/serversvc:cni" -m tcp --dport 32000 -j KUBE-MARK-MASQ
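The `KUBE-SVC-CU7F3MNN62CF4ANP` chain in turn jumps to a `KUBE-SEP-*` chain that DNATs the traffic to the backend Pod. A sketch for following it one level further (the `KUBE-SEP-<hash>` name is illustrative; the real one is whatever the first command prints):

```bash
# List the service chain, then list the endpoint chain it jumps to in order to see
# the final DNAT target (the backend Pod IP:port, e.g. 10.0.0.249:80).
iptables -t nat -S KUBE-SVC-CU7F3MNN62CF4ANP
# iptables -t nat -S KUBE-SEP-<hash>   # run with the chain name printed above
```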