NodePort implementation for k8s Services in IPVS mode
Deploying a NodePort Service + StatefulSet
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    # nodePort is not set, so it is allocated automatically (31531 in the trace below)
  selector:
    app: nginx
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  podManagementPolicy: Parallel
  serviceName: nginx
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.9.1
```
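Enabling iptables TRACE for debugging
The rules below trace every TCP packet whose source or destination port is 20000, which is the source port the client (192.168.0.105) uses in this test.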
```shell
iptables -t raw -A PREROUTING -p tcp --sport 20000 -j TRACE
iptables -t raw -A PREROUTING -p tcp --dport 20000 -j TRACE
iptables -t raw -A OUTPUT -p tcp --sport 20000 -j TRACE
iptables -t raw -A OUTPUT -p tcp --dport 20000 -j TRACE
```
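Adding and later removing these four rules by hand is error-prone, so a small wrapper can generate them for a given port. This is only a sketch: the function name `trace_rules` and the `IPT` override (set `IPT=echo` for a dry run) are hypothetical conventions of my own, not part of iptables. To actually hit the rules, the client can pin its source port, for example with `curl --local-port 20000 http://192.168.0.104:31531/`. With the legacy iptables backend used here, the TRACE lines end up in the kernel log (`dmesg` or `journalctl -k`); with an nftables-based iptables, `xtables-monitor --trace` is usually needed instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add (-A) or delete (-D) the four raw-table TRACE rules above.
# IPT defaults to iptables; IPT=echo only prints the rules (dry run).
IPT=${IPT:-iptables}

trace_rules() {
  local op=$1 port=$2 chain dir
  for chain in PREROUTING OUTPUT; do
    for dir in --sport --dport; do
      # Same order as the manual commands: PREROUTING sport/dport, then OUTPUT sport/dport.
      $IPT -t raw "$op" "$chain" -p tcp "$dir" "$port" -j TRACE
    done
  done
}

# Usage: trace_rules -A 20000   # start tracing
#        trace_rules -D 20000   # remove the rules afterwards
```

Generating both the `-A` and the `-D` variants from one function guarantees that cleanup removes exactly the rules that were added.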
The resulting kernel log, for one SYN from the client 192.168.0.105:20000 to NodePort 31531 on node 192.168.0.104, followed by the SYN-ACK reply:
```text
Nov 18 09:37:20 slave kernel: TRACE: raw:PREROUTING:policy:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: mangle:PREROUTING:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-SERVICES:rule:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-NODE-PORT:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-MARK-MASQ:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307)
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-MARK-MASQ:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-NODE-PORT:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:KUBE-SERVICES:return:4 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:rule:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:DOCKER:return:2 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:PREROUTING:policy:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: mangle:INPUT:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:INPUT:rule:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:KUBE-FIREWALL:return:3 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: filter:INPUT:policy:6 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=31531 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: nat:INPUT:policy:1 IN=ens33 OUT= MAC=00:0c:29:d0:ff:9b:00:0c:29:9e:20:65:08:00 SRC=192.168.0.105 DST=192.168.0.104 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35247 DF PROTO=TCP SPT=20000 DPT=80 SEQ=2222333540 ACK=0 WINDOW=29200 RES=0x00 SYN URGP=0 OPT (020405B40402080A00049BB40000000001030307) MARK=0x4000
Nov 18 09:37:20 slave kernel: TRACE: raw:OUTPUT:policy:5 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: mangle:OUTPUT:policy:1 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=80 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:OUTPUT:rule:1 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:KUBE-FIREWALL:return:3 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: filter:OUTPUT:policy:3 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
Nov 18 09:37:20 slave kernel: TRACE: mangle:POSTROUTING:policy:2 IN= OUT=ens33 SRC=192.168.0.104 DST=192.168.0.105 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=31531 DPT=20000 SEQ=154367509 ACK=2222333541 WINDOW=28960 RES=0x00 ACK SYN URGP=0 OPT (020405B40402080A0005604F00049BB401030307) UID=0 GID=0
```
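The packet's path is hard to read from such long lines. A sketch that reduces each line to its `table:chain:type:rulenum` position plus the ports, assuming the legacy log format shown above (the `TRACE:` tag followed by the rule position); the function name `trace_path` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce kernel TRACE lines to "table:chain:type:rulenum SPT=... DPT=...".
# Reads log lines on stdin, e.g. from: journalctl -k | grep 'TRACE:'
trace_path() {
  awk '/TRACE:/ {
    pos = ""; spt = ""; dpt = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "TRACE:") { pos = $(i + 1) }   # field right after the tag is the rule position
      else if ($i ~ /^SPT=/) { spt = $i }
      else if ($i ~ /^DPT=/) { dpt = $i }
    }
    print pos, spt, dpt
  }'
}
```

Applied to the trace above, the inbound SYN still has DPT=31531 at filter:INPUT:policy:6 but DPT=80 at nat:INPUT:policy:1, and the SYN-ACK has SPT=80 at mangle:OUTPUT but SPT=31531 from filter:OUTPUT on. That is consistent with IPVS rewriting the packet between those hooks.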
Forwarding path analysis
The container network uses flannel in VXLAN mode.
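When a Service is accessed through a NodePort, the first k8s node that receives the traffic performs SNAT, so that replies can still find their way back when the backend pod is not on that node.
When a host accesses a ClusterIP, the source IP is the container gateway IP, so no SNAT is needed; when a container accesses a ClusterIP, the source IP is the pod IP, so no SNAT is needed either.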
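The reply-path argument becomes concrete when following the address/port tuples. The sketch below models each packet as a `src -> dst` string; the pod address 10.244.1.2:80 and the SNAT address 10.244.0.0 (as a flannel.1 address on the receiving node might look) are hypothetical values, and the helpers only mimic what IPVS DNAT and MASQUERADE do to the tuple, they do not call them.

```shell
#!/usr/bin/env bash
# A packet is modelled as "SRC_IP:PORT -> DST_IP:PORT".
src_of()  { echo "${1%% -> *}"; }
dst_of()  { echo "${1##* -> }"; }
dnat()    { echo "$(src_of "$1") -> $2"; }   # rewrite destination (what IPVS does)
snat()    { echo "$2 -> $(dst_of "$1")"; }   # rewrite source (what MASQUERADE does)
reverse() { echo "$(dst_of "$1") -> $(src_of "$1")"; }

client_req="192.168.0.105:20000 -> 192.168.0.104:31531"
pod="10.244.1.2:80"             # hypothetical backend on another node
masq_src="10.244.0.0:20000"     # hypothetical SNAT source; the port is usually kept

to_pod=$(dnat "$client_req" "$pod")          # request as the pod sees it without SNAT
to_pod_masq=$(snat "$to_pod" "$masq_src")    # request as the pod sees it with SNAT

# The client only accepts replies that match the reverse of its own request.
expected=$(reverse "$client_req")
echo "client expects:   $expected"
echo "reply, no SNAT:   $(reverse "$to_pod")"       # sent straight to the client, tuple mismatch
echo "reply, with SNAT: $(reverse "$to_pod_masq")"  # sent back to the receiving node,
                                                    # where conntrack undoes both rewrites
```

Without SNAT the pod answers from 10.244.1.2:80 directly to 192.168.0.105:20000, which the client never asked to talk to; with SNAT the reply is addressed to the receiving node, whose conntrack entry restores 192.168.0.104:31531 as the source.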
PREROUTING chain
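In the nat table, the KUBE-NODE-PORT chain matches packets whose destination port is in the ipset KUBE-NODE-PORT-TCP (which contains NodePort 31531) and jumps to KUBE-MARK-MASQ, which sets mark 0x4000 (visible as MARK=0x4000 in the trace). The mark flags the packet for SNAT: after IPVS has DNATed it to a backend, iptables masquerades packets carrying this mark.
SNAT is needed to make sure the return traffic first goes back to the node that originally received the NodePort request.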
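The mark mechanism is a plain bit operation: KUBE-MARK-MASQ sets bit 0x4000 in the packet mark, and kube-proxy's KUBE-POSTROUTING chain later masquerades packets whose mark has that bit set (a `--mark 0x4000/0x4000` style match). A minimal sketch of both steps in shell arithmetic; the function names are hypothetical. On a node, the real rules and set can be listed with `iptables -t nat -S KUBE-NODE-PORT`, `iptables -t nat -S KUBE-POSTROUTING` and `ipset list KUBE-NODE-PORT-TCP`.

```shell
#!/usr/bin/env bash
MASQ_BIT=0x4000

# Like KUBE-MARK-MASQ (--set-xmark 0x4000/0x4000): set the bit, keep the other bits.
set_masq_mark() {
  printf '0x%x\n' $(( $1 | MASQ_BIT ))
}

# Like a "-m mark --mark 0x4000/0x4000" match: masked bits must equal the value.
needs_masquerade() {
  (( ($1 & MASQ_BIT) == MASQ_BIT ))
}

mark=$(set_masq_mark 0x0)
needs_masquerade "$mark" && echo "mark=$mark -> MASQUERADE"
needs_masquerade 0x0 || echo "mark=0x0 -> no SNAT"
```

Using a single bit with a mask lets other components keep their own bits in the same 32-bit mark without interfering with kube-proxy.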
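IPVS hooks in at the INPUT stage, matches on destination IP + destination port, and performs the load-balanced forwarding. In the trace, DPT changes from 31531 to 80 between filter:INPUT and nat:INPUT.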
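IPVS keeps a table of virtual services keyed by protocol, destination IP and destination port, each with a list of real servers (the pod endpoints); on a node it can be inspected with `ipvsadm -Ln`, for example `ipvsadm -Ln -t 192.168.0.104:31531`. The sketch below is only a toy model of that lookup with round-robin scheduling. The two real-server addresses are hypothetical (this StatefulSet has a single replica), and the real IPVS connection table, which keeps later packets of a connection on the same server, is not modelled.

```shell
#!/usr/bin/env bash
# Toy model of an IPVS virtual-service lookup (bash 4+ for associative arrays).
# Key: "PROTO VIP:VPORT"; value: space-separated real servers (hypothetical pod endpoints).
declare -A VS=(
  ["TCP 192.168.0.104:31531"]="10.244.1.2:80 10.244.2.3:80"
)
declare -A RR_NEXT=()   # round-robin cursor per virtual service

# Sets PICKED to the chosen real server; returns 1 if no virtual service matches.
ipvs_pick() {
  local key="$1 $2"
  [[ -n ${VS[$key]+set} ]] || return 1
  local -a rs
  read -r -a rs <<< "${VS[$key]}"
  local i=${RR_NEXT[$key]:-0}
  PICKED=${rs[i % ${#rs[@]}]}
  RR_NEXT[$key]=$(( i + 1 ))
}

ipvs_pick TCP 192.168.0.104:31531 && echo "$PICKED"   # 10.244.1.2:80
ipvs_pick TCP 192.168.0.104:31531 && echo "$PICKED"   # 10.244.2.3:80
ipvs_pick TCP 192.168.0.104:8080 || echo "no virtual service"
```

The result is returned through the global `PICKED` rather than stdout so that the round-robin cursor update is not lost in a command-substitution subshell.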