Flannel网络组件

34.Flannel网络组件

GitHub - flannel-io/flannel:flannel 是一种用于容器的网络结构,专为 Kubernetes 设计

  • 概述
有CoreOS开源的针对k8s的网络服务,其目的为解决k8s集群中各主机上的pod相互通信的问题,其借助与etcd维护网络IP地址分配,并为每一个node服务器分配一个不同的IP地址段

流量从eth0走出,是UDP协议,端口是8472

34.1 flannel网络模式

Flannel 网络模式(后端模型)Flannel目前有三种实现方式 UDP/VXLAN/HOST-GW

UDP:早期版本的Flannel使用UDP封装完成报文的跨主机转发,其安全性及性能略有不足

#主流VXLAN
VXLAN:Linux内核在2012年底的v3.7.0之后加入了VXLAN协议支持,因此新版本的Flannel也有UDP转换为VXLAN,VXLAN本质上是一种tunnel隧道协议,用来基于3层网络实现虚拟的2层网络,目前flannel的网络模型已经是基于VXLAN的叠加覆盖网络,目前推荐使用vxlan作为其网络模型

Host-gw:也就是Host GateWay通过在node节点上创建到达各目标容器地址的路由表而完成报文的转发,因此这种方式要求各node节点本身必须处于同一个局域网二层网络中,因此不适用于网络变动频繁或比较大型的网络环境,但是其性能较好。
  • Flannel 组件解释
Cni0:网桥设备,每创建一个pod都会创建一对veth pair,其中一端是pod中的eth0,另一端是Cni0网桥中的端口网卡,Pod中从网卡eth0发出的流量都会发送到Cni0网桥设备的端口网卡上,Cni0设备获得的IP地址是该节点分配到的网段的第一个地址
Flannel.1:overlay网络的设备,用来进行vxlan报文的处理(封包和解包),不同node之间的pod数据流量都从overlay设备以隧道的形式发送到对端

34.2 flannel实验

  • 网卡信息
[root@master1 ~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:32:6d:f0:44  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.50  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fe80::173a:af71:a4cd:ea46  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:98:fe:21  txqueuelen 1000  (Ethernet)
        RX packets 482096  bytes 550753313 (525.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 175661  bytes 15344201 (14.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:0c:29:98:fe:2b  txqueuelen 1000  (Ethernet)
        RX packets 184  bytes 56160 (54.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 247  bytes 40550 (39.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::8b6:74ff:fea9:8bd5  prefixlen 64  scopeid 0x20<link>
        ether 0a:b6:74:a9:8b:d5  txqueuelen 0  (Ethernet)
        RX packets 9  bytes 1971 (1.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11  bytes 1350 (1.3 KiB)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 161328  bytes 41041153 (39.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 161328  bytes 41041153 (39.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

  • flannel 端口信息
[root@master1 ~]# ss -lntup | grep 8472
udp    UNCONN     0      0         *:8472                  *:*  
  • 路由表
[root@master1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        0.0.0.0         UG    100    0        0 eth0
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 eth0
10.244.1.0      10.244.1.0      255.255.255.0   UG    0      0        0 flannel.1
10.244.2.0      10.244.2.0      255.255.255.0   UG    0      0        0 flannel.1
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
  • 部署一个简单服务演示
[root@master1 ~]# cat nginx-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

[root@master1 ~]# cat nginx-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  ports:
  - port: 88
    targetPort: 80
  selector:
    app: nginx
  type: NodePort
  • 抓包
[root@master1 ~]# tcpdump udp -i eth0 port 8472
#浏览器访问这个端口
[root@master1 ~]# kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP   10.254.0.1       <none>        443/TCP        14m
nginx-service   NodePort    10.254.112.145   <none>        88:32095/TCP   3m52s

  • flannel子网信息
[root@master1 ~]# find / -name flannel
/run/flannel
/var/lib/docker/overlay2/37cdd3a3925063f7467587ceff307c24851cf8e2f6f477b0713d7e27d903ab7c/diff/flannel
/var/lib/docker/overlay2/b6fedea8403fe56e2d8fe9806cb0bf56fbe1f2f07b989d010fe0de8edf1b0865/diff/run/flannel
/var/lib/docker/overlay2/b6fedea8403fe56e2d8fe9806cb0bf56fbe1f2f07b989d010fe0de8edf1b0865/merged/run/flannel
/opt/cni/bin/flannel
[root@master1 ~]# cat /run/flannel/subnet.env 
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

34.3 flannel vxlan配置

[root@master1 ~]# kubectl edit configmap kube-flannel-cfg  -n kube-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n  \"name\": \"cbr0\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n","net-conf.json":"{\n  \"Network\": \"10.244.0.0/16\",\n  \"Backend\": {\n    \"Type\": \"vxlan\"\n  }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-system"}}
  creationTimestamp: "2022-12-31T07:41:36Z"
  labels:
    app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-system
  resourceVersion: "663"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-flannel-cfg
  uid: d31a75e8-dc0e-45a6-8c46-df5778344cb5

34.5 flannel vxlan架构图

1.首先node1的一个pod想去访问node2的pod,在容器都会有一个eth0的网卡,在宿主机都会有一个veth的网卡,他们都指向cni0网桥在这里进行报文的封装,然后再通过flannel网卡overlay隧道网络到宿主机的eth0,一个UDP的端口8473出去,下一跳是目的主机的宿主机网卡

2.到了对端的eth0网卡进行解包查看IP地址是去往哪个pod的容器,在通过flannel网卡cni0网桥,进入对应的veth到达容器

pod跨节点通信

通信流程:

  1. pod-a访问pod-b 因为两者IP不在同一个子网,首先数据会先到默认网关也就是cni010.64.0.1
  2. 节点上面的静态路由可以看出来10.64.1.0/24 是指向flannel.1的
  3. flannel.1 会使用vxlan协议把原始IP包加上目的mac地址封装成二层数据帧
  4. 原始vxlan数据帧无法在物理二层网络中通信,flannel.1 (linux 内核支持vxlan,此步骤由内核完成)会把数据帧封装成UDP报文经过物理网络发送到node-02
  5. node-02收到UDP报文中带有vxlan头信息,会转交给flannel.1解封装得到原始数据
  6. flannel.1 根据直连路由转发到cni0 上面
  7. cni0 转发给pod-2

34.6 flannel通信流程

1.源pod发起请求,此时报文中源IP为pod的eth0的IP,源mac为pod的eth0的mac,目的pod为目的pod的IP,目的的mac为网关(cni0)的MAC

tcpdump -nn -vvv -i veth91d6855 -nn -vvv ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53

#可以保存成一个pcap文件在使用客户端来分析
tcpdump -nn -vvv -i vethe2908f0c -nn -vvv ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53 -w aa.pcap

2.数据报文通过veth peer发送给网关cni0,检查目的mac就是发给自己的,cni0进行目标IP检查,如果是同一个网桥的报文就直接转发,不是的话就发送给flannel.1此时报文被修改如下

源IP:Pod IP 
目的IP:Pod IP
源MAC: 源Pod MAC
目的MAC: cni MAC

tcpdump -nn -vvv -i cni0 -nn -vvv ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! pot 53

3.到达flannel.1检查目的mac就是发给自己的,开始匹配路由表,先实现overlay报文的内层封装(主要是修改目的的Pod的对端flannel.1的MAC、源MAC为当前宿主机flannel.1的MAC)

#会生成一个永久的链路
[root@master1 ~]# bridge fdb show dev flannel.1
0e:91:58:e5:59:c0 dst 10.0.0.52 self permanent
2a:d7:30:a8:8c:27 dst 10.0.0.51 self permanent
  • 实验 node2 ping node1 容器
[root@master1 ~]# kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP           NODE    NOMINATED NODE   READINESS GATES
nginx-deployment-585449566-drqvg   1/1     Running   0          113m   10.244.2.4   node2   <none>           <none>
nginx-deployment-585449566-j99kt   1/1     Running   0          113m   10.244.2.3   node2   <none>           <none>
nginx-deployment-585449566-r966g   1/1     Running   0          113m   10.244.1.3   node1   <none>           <none>
[root@master1 ~]# kubectl exec -it nginx-deployment-585449566-drqvg bash
root@nginx-deployment-585449566-drqvg:/# apt install ethtool
#查看网卡是9编号
root@nginx-deployment-585449566-drqvg:/# ethtool -S eth0
NIC statistics:
     peer_ifindex: 9
#连接的容器是node2所以在node2宿主机上查看这个网卡 是有这个编号9的
[root@node2 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:bd:8e:52 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:bd:8e:5c brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:79:c9:c4:0d brd ff:ff:ff:ff:ff:ff
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/ether 0e:91:58:e5:59:c0 brd ff:ff:ff:ff:ff:ff
6: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 16:d1:46:3a:de:ee brd ff:ff:ff:ff:ff:ff
7: vetha18db6f3@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default 
    link/ether 4e:fb:54:9c:0c:43 brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: vethe2c60553@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default 
    link/ether 12:e3:9e:78:38:ef brd ff:ff:ff:ff:ff:ff link-netnsid 1
9: vethe2908f0c@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP mode DEFAULT group default 
    link/ether 0a:e9:7b:e1:f6:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 2

#凡是在这个网卡出去的包就可以看到了
tcpdump -nn -vvv -i vethe2908f0c -nn -vvv ! port 22 and ! port 2379 and ! port 6443 and ! port 10250 and ! arp and ! port 53

然后node2容器里ping百度域名查看是否可以抓取到

root@nginx-deployment-585449566-drqvg:/# ping baidu.com

node2抓取的

posted @ 2023-01-02 11:57  YIDADA-SRE  阅读(170)  评论(0编辑  收藏  举报