Pitfalls when deploying k8s across network segments with multiple NICs

Deployment background

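The company has one server in Hangzhou, on which the k8s master node had already been deployed. A second server was later added in Shanghai, and this deployment joins it to the k8s cluster as a node.
After it joined the cluster, the calico-node pod stayed in the Running state but never became Ready.
The pod details showed the following events: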

  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  45s   default-scheduler  Successfully assigned kube-system/calico-node-pkbkv to k8s-node2
  Normal   Started    45s   kubelet            Started container install-cni
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Started    45s   kubelet            Started container upgrade-ipam
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Created    45s   kubelet            Created container install-cni
  Normal   Created    45s   kubelet            Created container upgrade-ipam
  Normal   Started    44s   kubelet            Started container flexvol-driver
  Normal   Pulled     44s   kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.20.0" already present on machine
  Normal   Created    44s   kubelet            Created container flexvol-driver
  Normal   Pulled     43s   kubelet            Container image "docker.io/calico/node:v3.20.0" already present on machine
  Normal   Created    43s   kubelet            Created container calico-node
  Normal   Started    43s   kubelet            Started container calico-node
  Warning  Unhealthy  40s   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  30s   kubelet            Readiness probe failed: 2021-09-15 02:36:49.282 [INFO][417] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122

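After some searching online, I found the cause: when no IP autodetection method is configured, calico defaults to first-found. On a server with two NICs and two IPs, calico can end up using the IP address of the other NIC, which leaves the node unreachable over the network.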

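The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode calico uses the first valid NIC it finds. It does exclude the docker network, localhost, and the like, but it can still choose wrongly in a complex network environment. In this incident, calico on master1 chose the other NIC, enp13s0f1, and the IP configured on that NIC is an internal-network address.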
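
To make the failure mode concrete, here is a minimal Python sketch of a first-found style selection. It is only an illustration, not calico's actual code: the exclusion patterns, the enumeration order, the second NIC name enp13s0f0, and all IP addresses are assumptions made up for the example. Only the NIC name enp13s0f1 comes from the incident above.

```python
import re

# Hypothetical exclusion patterns, for illustration only;
# calico's real exclusion list is different and longer.
EXCLUDED = [r"^lo$", r"^docker.*", r"^cali.*", r"^tunl.*"]


def first_found(interfaces):
    """Return (name, ip) of the first interface that is not excluded and has an IP."""
    for name, ip in interfaces:
        if any(re.match(pattern, name) for pattern in EXCLUDED):
            continue  # skip loopback, docker bridge, and similar local interfaces
        if ip:
            return name, ip
    return None


# Hypothetical host with two physical NICs. Addresses are invented.
host = [
    ("lo", "127.0.0.1"),
    ("docker0", "172.18.0.1"),
    ("enp13s0f1", "10.0.0.5"),      # internal-only address, enumerated first
    ("enp13s0f0", "203.0.113.10"),  # address the other site can actually reach
]

print(first_found(host))  # ('enp13s0f1', '10.0.0.5')
```

The selection depends only on enumeration order and the exclusion list, never on whether the chosen address is reachable from the other nodes, which is why a node in another network segment can fail to establish BGP peering.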

Having found the cause, I edited calico's yaml file and added an env entry for the calico-node container, under spec.template.spec.containers[0].env (containers[0] is calico-node):

            - name: IP_AUTODETECTION_METHOD
              value: can-reach=www.baidu.com
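
The can-reach method lets the host's routing table decide which local address is used to reach the given destination. The Python sketch below shows the same idea with a connected UDP socket. It is an assumption-laden illustration, not calico's implementation: the helper name source_ip_for and the port value are hypothetical.

```python
import socket


def source_ip_for(destination, port=80):
    """Return the local IPv4 address the kernel would use to reach `destination`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel pick a route and therefore a source address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


# Loopback is used here so the example runs without network access.
print(source_ip_for("127.0.0.1"))  # 127.0.0.1
```

On the affected node, passing the same destination as in the env value should return the address of the NIC that has a route to it. A domain name as the destination also depends on DNS resolution, so the IP of another cluster node may be a more predictable target in some environments.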

After modifying the yaml file, recreate the calico resources with the following command:

kubectl replace -f calico.yaml

After the change, the network worked normally.

posted @ 2022-01-05 14:12  Hei蛋炒饭