Pitfalls when deploying k8s across network segments with multiple NICs

Deployment background

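The company has one server in Hangzhou on which the k8s master node had already been deployed. A second server was later added in Shanghai, and this deployment joined it to the k8s cluster as a node.
After it was added to the cluster, the calico-node pod stayed in the Running state but never became Ready.
The pod details showed the following errors: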

  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  45s   default-scheduler  Successfully assigned kube-system/calico-node-pkbkv to k8s-node2
  Normal   Started    45s   kubelet            Started container install-cni
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Started    45s   kubelet            Started container upgrade-ipam
  Normal   Pulled     45s   kubelet            Container image "docker.io/calico/cni:v3.20.0" already present on machine
  Normal   Created    45s   kubelet            Created container install-cni
  Normal   Created    45s   kubelet            Created container upgrade-ipam
  Normal   Started    44s   kubelet            Started container flexvol-driver
  Normal   Pulled     44s   kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.20.0" already present on machine
  Normal   Created    44s   kubelet            Created container flexvol-driver
  Normal   Pulled     43s   kubelet            Container image "docker.io/calico/node:v3.20.0" already present on machine
  Normal   Created    43s   kubelet            Created container calico-node
  Normal   Started    43s   kubelet            Started container calico-node
  Warning  Unhealthy  40s   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  30s   kubelet            Readiness probe failed: 2021-09-15 02:36:49.282 [INFO][417] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122
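
The events above are the kind of output that kubectl describe prints for a pod. As a rough sketch, the same information plus the recent container log can be pulled with the helper below. The function name calico_pod_debug is made up for illustration; the pod name and namespace are the ones shown in the events.

```shell
# Hypothetical helper (name is illustrative, not part of any tool):
# show the events and the recent calico-node log for one pod.
# $1 = pod name, $2 = namespace (defaults to kube-system)
calico_pod_debug() {
  pod="$1"
  ns="${2:-kube-system}"
  # Events, including the readiness probe failures
  kubectl describe pod "$pod" -n "$ns"
  # Last 50 lines of the calico-node container log
  kubectl logs "$pod" -n "$ns" -c calico-node --tail=50
}
```

For the pod in this post it would be called as `calico_pod_debug calico-node-pkbkv`.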

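After some searching online, the cause turned out to be that when no IP autodetection method is configured for calico, it defaults to first-found. On a server with two NICs and two IPs, calico can end up using the IP address of the other NIC, which makes the network unreachable.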

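The IP_AUTODETECTION_METHOD setting defaults to first-found. In this mode calico uses the first valid interface it finds. It does exclude the docker network, localhost and the like, but in a complex network environment it can still pick the wrong one. In this incident, calico on master1 selected the other NIC, enp13s0f1, and the IP configured on that NIC is an internal network IP.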
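
One way to confirm which address calico actually picked is to read it back from the Node object. The sketch below assumes calico runs with the Kubernetes datastore, where calico-node records the detected address in the projectcalico.org/IPv4Address annotation. The helper name calico_node_ip is hypothetical.

```shell
# Hypothetical helper: print the IPv4 address that calico-node registered
# for a node. Assumes the Kubernetes datastore, where the address is kept
# in the projectcalico.org/IPv4Address annotation on the Node object.
# $1 = node name
calico_node_ip() {
  kubectl get node "$1" \
    -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}'
}
```

Running `calico_node_ip k8s-node2` and comparing the result with the output of `ip -4 -o addr show` on that host should show which NIC the address belongs to.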

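Once the cause was found, I edited calico's yaml file and added an env entry for the calico-node container, that is, under spec.template.spec.containers[0].env (the container named calico-node):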

            - name: IP_AUTODETECTION_METHOD
              value: can-reach=www.baidu.com

After editing the yaml file, recreate the calico resources with the following command:

kubectl replace -f calico.yaml
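
As an alternative to editing the manifest by hand, the same variable can be set directly on the running DaemonSet with kubectl set env. This is only a sketch: it assumes the DaemonSet is named calico-node and lives in kube-system (consistent with the pod name in the events), and the helper name set_calico_autodetect is made up.

```shell
# Hypothetical helper: set IP_AUTODETECTION_METHOD on the calico-node DaemonSet.
# Assumes the DaemonSet is named calico-node in the kube-system namespace.
# $1 = autodetection method, e.g. can-reach=www.baidu.com or interface=eth0
set_calico_autodetect() {
  kubectl set env daemonset/calico-node -n kube-system \
    "IP_AUTODETECTION_METHOD=$1"
}
```

For example, `set_calico_autodetect can-reach=www.baidu.com` matches the setting used in this post, while `set_calico_autodetect interface=eth0` would pin calico to a named interface instead (eth0 is a placeholder here). Note that a change made this way is not reflected in calico.yaml, so the file should be updated as well if it is applied again later.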

After the change, the network worked normally.
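
A quick way to check the result is to wait for the DaemonSet rollout and then list the calico-node pods. The sketch below assumes the pods carry the k8s-app=calico-node label, as in the standard calico.yaml; the helper name check_calico_ready is hypothetical.

```shell
# Hypothetical helper: wait for the calico-node rollout, then list its pods.
# Assumes the DaemonSet is calico-node in kube-system and that its pods
# carry the label k8s-app=calico-node (as in the standard manifest).
check_calico_ready() {
  # Blocks until the updated pods have been rolled out on every node
  kubectl rollout status daemonset/calico-node -n kube-system
  # The READY column should show 1/1 for each node
  kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
}
```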

posted @ 2022-01-05 14:12 Hei蛋炒饭
