NodePort implementation of Kubernetes Services in IPVS mode

Deploy a NodePort Service and a StatefulSet

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
  selector:
    app: nginx
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  podManagementPolicy: Parallel
  serviceName: nginx
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1

Enable iptables TRACE debugging

iptables -t raw -A PREROUTING -p tcp --sport 20000 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 20000 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 20000 -j TRACE
iptables -t raw -A OUTPUT -p tcp --dport 20000 -j TRACE
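
The TRACE rules above only match TCP packets whose source or destination port is 20000, so the test request has to leave the client from source port 20000 (the trace shows SPT=20000 and DPT=31531). How the original request was sent is not stated. The following is a minimal sketch of one way to do it, assuming a Python client; `connect_from_port` is a hypothetical helper name, not part of any tool used in the post.

```python
import socket


def connect_from_port(host: str, port: int, src_port: int,
                      timeout: float = 3.0) -> socket.socket:
    """Open a TCP connection to (host, port) from a fixed local source port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR allows rebinding the same source port while an earlier
    # connection from it is still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    try:
        sock.bind(("", src_port))   # pin the source port before connecting
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

Calling `connect_from_port("192.168.0.104", 31531, 20000)` from the client at 192.168.0.105 would produce a SYN with the same addresses and ports as the first trace line.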
Nov 18 09:37:20 slave kernel: TRACE: raw:PREROUTING:policy:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: mangle:PREROUTING:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-SERVICES:rule:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-NODE-PORT:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-MARK-MASQ:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-MARK-MASQ:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-NODE-PORT:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-SERVICES:return:4 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:rule:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:DOCKER:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:policy:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: mangle:INPUT:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:INPUT:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:KUBE-FIREWALL:return:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:INPUT:policy:6 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:INPUT:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=80 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: raw:OUTPUT:policy:5 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: mangle:OUTPUT:policy:1 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:OUTPUT:rule:1 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:KUBE-FIREWALL:return:3 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:OUTPUT:policy:3 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: mangle:POSTROUTING:policy:2 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
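
The raw trace is hard to scan because every line repeats the full packet header. As a reading aid, here is a small sketch that reduces each TRACE line to its `table:chain:type:rulenum` hop plus the fields that change along the path (SPT, DPT, MARK). The function names are hypothetical, and the regular expressions assume the log format shown above.

```python
import re

# Matches the "TRACE: table:chain:type:rulenum" part of a kernel log line.
TRACE_RE = re.compile(
    r"TRACE: (?P<table>[\w-]+):(?P<chain>[\w-]+):(?P<kind>\w+):(?P<num>\d+)"
)
# Picks out the fields worth following from hop to hop.
FIELD_RE = re.compile(r"\b(SPT|DPT|MARK)=(\S+)")


def parse_trace_line(line):
    """Return a dict describing one hop, or None if this is not a TRACE line."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    hop = m.groupdict()
    hop["num"] = int(hop["num"])
    hop.update({key.lower(): value for key, value in FIELD_RE.findall(line)})
    return hop


def summarize(lines):
    """Yield one compact string per TRACE line, skipping everything else."""
    for line in lines:
        hop = parse_trace_line(line)
        if hop is None:
            continue
        yield (
            f"{hop['table']}:{hop['chain']}:{hop['kind']}:{hop['num']} "
            f"spt={hop.get('spt', '-')} dpt={hop.get('dpt', '-')} "
            f"mark={hop.get('mark', '-')}"
        )
```

Run over the log above, the summary makes two transitions easy to spot: MARK=0x4000 first appears on the return from nat:KUBE-MARK-MASQ, and DPT changes from 31531 to 80 at nat:INPUT.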

Forwarding path analysis

The container network uses flannel in VXLAN mode.

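For access through the NodePort, the first Kubernetes node that receives the traffic applies SNAT, so that replies are not lost when the backend pod runs on a different node.
When the host accesses a cluster IP, the source IP is the container gateway IP, so no SNAT is needed; when a container accesses a cluster IP, the source IP is the pod IP, so no SNAT is needed either.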
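
To make the return-path argument concrete, here is a deliberately simplified sketch that models a packet as a (source, destination) pair. It is not how the kernel implements NAT. The client address comes from the trace; `POD_IP` and `MASQ_IP` are hypothetical values (the address actually chosen by masquerading depends on the outgoing interface).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


CLIENT_IP = "192.168.0.105"   # from the trace
POD_IP = "10.244.1.2"         # hypothetical pod on another node
MASQ_IP = "10.244.0.0"        # hypothetical address owned by the ingress node


def forward_on_ingress_node(pkt: Packet, snat: bool) -> Packet:
    """DNAT the request to the pod; optionally SNAT it to the ingress node."""
    return Packet(src=MASQ_IP if snat else pkt.src, dst=POD_IP)


def reply_from_pod(request: Packet) -> Packet:
    """The pod answers whoever appears as the source of the request."""
    return Packet(src=request.dst, dst=request.src)


request = Packet(src=CLIENT_IP, dst="192.168.0.104")

# With SNAT the reply is addressed to the ingress node, which can undo the NAT.
with_snat = reply_from_pod(forward_on_ingress_node(request, snat=True))

# Without SNAT the reply is addressed to the client directly and carries the
# pod IP as its source, which the client never contacted.
without_snat = reply_from_pod(forward_on_ingress_node(request, snat=False))
```

In the two cluster IP cases described above, the source address is already one the backend pod can answer on the container network, which is why the same problem does not arise there.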

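PREROUTING chain
In the nat table, the KUBE-NODE-PORT chain matches the packet against the KUBE-NODE-PORT-TCP ipset, which contains NodePort 31531, and the packet is marked 0x4000 (in KUBE-MARK-MASQ, as the trace shows). The mark is there so that the packet goes through IPVS DNAT and then iptables SNAT.
SNAT is needed so that return traffic first goes back to the node that originally received the NodePort request.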
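
The 0x4000 mark is a single bit in the 32-bit packet mark. The sketch below reproduces the arithmetic of the iptables `MARK --set-xmark value/mask` target and the `-m mark --mark value/mask` match. That the mark here is set with `--set-xmark 0x4000/0x4000` is an assumption; the trace only shows the resulting MARK=0x4000. The function names are hypothetical.

```python
MASQ_MARK = 0x4000  # value seen in the trace as MARK=0x4000


def set_xmark(mark: int, value: int, mask: int) -> int:
    """MARK --set-xmark value/mask: clear the mask bits, then XOR in value."""
    return ((mark & ~mask) ^ value) & 0xFFFFFFFF


def mark_matches(mark: int, value: int, mask: int) -> bool:
    """-m mark --mark value/mask: AND the mark with mask, compare to value."""
    return (mark & mask) == value


# An unmarked packet gets the bit set; other bits are left untouched.
fresh = set_xmark(0x0, MASQ_MARK, MASQ_MARK)
mixed = set_xmark(0x8000, MASQ_MARK, MASQ_MARK)
```

Because only one bit is touched, a later rule can test for this mark without caring about other bits that may be set on the same packet.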

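IPVS does its work at the INPUT hook: it matches the destination IP and destination port and load-balances the connection to a backend. In the trace, DPT is still 31531 at filter:INPUT and has become 80 by nat:INPUT.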
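
The lookup described above can be pictured as a table keyed by protocol, destination IP and destination port, with a scheduler choosing among the backends. This is a toy model, not the kernel data structure. Round-robin scheduling is assumed, and the backend addresses are hypothetical (the StatefulSet above has a single replica; two are used here only to show the rotation).

```python
from itertools import cycle


class VirtualServerTable:
    """Toy model of IPVS virtual services with round-robin scheduling."""

    def __init__(self):
        self._services = {}

    def add_service(self, proto, vip, vport, real_servers):
        if not real_servers:
            raise ValueError("a virtual service needs at least one real server")
        # cycle() hands out the real servers in order, forever.
        self._services[(proto, vip, vport)] = cycle(list(real_servers))

    def schedule(self, proto, dst_ip, dst_port):
        """Return the next (ip, port) backend, or None if nothing matches."""
        backends = self._services.get((proto, dst_ip, dst_port))
        if backends is None:
            return None  # not a virtual service: the packet is left alone
        return next(backends)


table = VirtualServerTable()
table.add_service(
    "tcp", "192.168.0.104", 31531,
    [("10.244.1.2", 80), ("10.244.2.3", 80)],  # hypothetical pod endpoints
)
```

Each call to `table.schedule("tcp", "192.168.0.104", 31531)` returns the next backend in turn, and the chosen (ip, port) becomes the new destination of the connection.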

posted on 2023-11-11 13:09 by 王景迁
