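Pitfalls when deploying k8s across network segments with multiple NICs
Deployment background
The company had one server in Hangzhou on which the k8s master node was already deployed. A second server was later added in Shanghai, and this deployment joins it to the k8s cluster as a worker node.
After the node joined the cluster, the calico-node pod on it stayed in the Running state but never became Ready.
Describing the pod showed the following errors: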
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 45s default-scheduler Successfully assigned kube-system/calico-node-pkbkv to k8s-node2
Normal Started 45s kubelet Started container install-cni
Normal Pulled 45s kubelet Container image "docker.io/calico/cni:v3.20.0" already present on machine
Normal Started 45s kubelet Started container upgrade-ipam
Normal Pulled 45s kubelet Container image "docker.io/calico/cni:v3.20.0" already present on machine
Normal Created 45s kubelet Created container install-cni
Normal Created 45s kubelet Created container upgrade-ipam
Normal Started 44s kubelet Started container flexvol-driver
Normal Pulled 44s kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.20.0" already present on machine
Normal Created 44s kubelet Created container flexvol-driver
Normal Pulled 43s kubelet Container image "docker.io/calico/node:v3.20.0" already present on machine
Normal Created 43s kubelet Created container calico-node
Normal Started 43s kubelet Started container calico-node
Warning Unhealthy 40s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 30s kubelet Readiness probe failed: 2021-09-15 02:36:49.282 [INFO][417] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122
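After some searching online, the cause turned out to be calico's IP autodetection. When no autodetection method is configured, calico defaults to first-found; on a server with two NICs and two IPs it can pick the address of the wrong NIC, which leaves the node unreachable from its BGP peers.
The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode calico uses the first valid interface it finds. It does exclude the docker network, localhost and the like, but it can still choose wrongly in a complex network environment. In this incident, calico on master1 picked the other NIC, enp13s0f1, and the IP configured on that NIC is an internal address.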
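For orientation, the fragment below sketches a few of the values that IP_AUTODETECTION_METHOD accepts. It is an illustration only, not a recommendation for this cluster: the interface names and the address are hypothetical placeholders, and exactly one value should be active at a time.

```yaml
# Illustrative values for IP_AUTODETECTION_METHOD; use exactly one.
# The interface names and the address below are hypothetical placeholders.
- name: IP_AUTODETECTION_METHOD
  value: "first-found"                # default: first valid interface found
# value: "interface=eth0"             # pick the interface whose name matches this regex
# value: "skip-interface=enp13s0f1"   # ignore interfaces whose name matches this regex
# value: "can-reach=203.0.113.10"     # pick the interface used to route to this IP or domain
```

Of these, can-reach tends to be the least tied to interface naming, since it depends only on the routing table of each node.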
Having found the cause, edit calico's yaml file and add an env entry to the calico-node container. The location is spec.template.spec.containers[0].env (the container named calico-node):
- name: IP_AUTODETECTION_METHOD
  value: can-reach=www.baidu.com
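To make the placement concrete, here is a minimal sketch of where that entry sits inside the calico-node DaemonSet. It assumes the standard calico.yaml layout, in which calico-node is the first entry under containers; all unrelated fields are omitted, so this is not a complete manifest on its own.

```yaml
# Partial DaemonSet fragment showing only the path to the env list.
# Unrelated fields (selector, volumes, initContainers, image, ...) are omitted.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: calico-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: calico-node
          env:
            # ...existing env entries stay as they are...
            - name: IP_AUTODETECTION_METHOD
              value: "can-reach=www.baidu.com"
```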
After editing the yaml file, recreate the calico resources with the following command:
kubectl replace -f calico.yaml
After the change, the network worked normally.