Kubernetes master无法加入etcd 集群解决方法

背景:
一台master磁盘爆了导致k8s服务故障,重启之后死活kubelet起不来,于是老哥就想把它给reset掉重新join,接着出现如下报错提示是说etcd集群健康检查未通过:

error execution phase check-etcd: error syncing endpoints with etc: dial tcp 172.31.182.152:2379: connect: connection refused

 解决方法:

1.在kubeadm-config删除的状态不存在的etcd节点:

kubectl edit configmaps -n kube-system kubeadm-config

cn-hongkong.i-j6caps6av1mtyxyofmrw:
advertiseAddress: 172.31.182.152
bindPort: 6443

把上边的删掉:

 2.因为我是用kubeadm搭建的集群,所有etcd在每个master节点都会以pod的形式存在一个,etcd是在每个控制平面都启动一个实例的,当删除k8s-001节点时,etcd集群未自动删除此节点上的etcd成员,因此需要手动删除。

注意这里首先要进入etcd的pod。

kubectl exec -it etcd-cn-hongkong.i-j6caps6av1mtyxyofmrx sh -n kube-system

export ETCDCTL_API=3
alias etcdctl='etcdctl --endpoints=https://172.31.182.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d4322ce19cc3f8da, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379

#删除不存在的节点
/ # etcdctl member remove d4322ce19cc3f8da
Member d4322ce19cc3f8da removed from cluster ed812b9f85d5bcd7
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # etcdctl member list
cd4e1e075b1904b2, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # exit

 最后每次kubeadm join失败后要kubeadm reset重置节点,在kubeadm join才会成功。

 

posted @ 2019-09-11 15:09  西门运维  阅读(2361)  评论(0编辑  收藏  举报