Kubernetes cluster fails to start

1. Symptoms

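The cause: a power outage on the second day of the Lunar New Year.
On the first day back at work I started restarting all the services and found that the k8s cluster would not come up.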
[root@test ~]# kubectl get nodes
The connection to the server 10.0.7.16:6443 was refused - did you specify the right host or port?
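"Connection refused" usually means the host is reachable but nothing is listening on the port, which points at kube-apiserver being down rather than at a wrong kubeconfig. A minimal sketch for confirming that from any machine with bash is shown here; the helper names `probe_port` and `report_port` are made up for this example, and it relies on bash's `/dev/tcp` feature and the coreutils `timeout` command being available.

```shell
#!/usr/bin/env bash
# Probe a TCP port using bash's built-in /dev/tcp pseudo-device.
# Returns 0 if something accepts the connection, non-zero otherwise.
probe_port() {
    local host=$1 port=$2
    # timeout guards against hosts that silently drop packets
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print a one-line verdict for a host:port pair.
report_port() {
    if probe_port "$1" "$2"; then
        echo "$1:$2 is accepting connections"
    else
        echo "$1:$2 is NOT accepting connections"
    fi
}

# Example (address taken from the error message above):
#   report_port 10.0.7.16 6443
```

If the port is closed while kubelet is running, the next place to look is why the API server container is not staying up.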

2. Troubleshooting


### Check the kubelet status
[root@test ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 一 2024-02-12 08:47:31 CST; 5 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 980 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─980 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/con...

### Check the logs
2月 17 11:40:30 test kubelet[980]: E0217 11:40:30.760521     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:30 test kubelet[980]: E0217 11:40:30.861049     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:30 test kubelet[980]: E0217 11:40:30.961809     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.062716     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.163402     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.264104     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.364707     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.465786     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.566598     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
2月 17 11:40:31 test kubelet[980]: E0217 11:40:31.667122     980 kubelet.go:2291] "Error getting node" err="node \"test\" not found"
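The same "node not found" line repeats roughly every 100 ms, which makes it easy to miss a rarer and more informative message. One way to cut through the noise is to strip the timestamps and count distinct messages. The sketch below assumes the kubelet logs go to journald (so `journalctl -u kubelet` works) and that lines follow the klog layout seen above; `summarize_klog` is a name invented for this example.

```shell
# Collapse repeated klog lines: strip the syslog prefix and the klog header
# (severity+date, time, PID) so identical messages compare equal, then count.
# Lines that do not match the klog layout pass through unchanged.
summarize_klog() {
    sed -E 's/^.*[IWEF][0-9]{4} [0-9:.]+ +[0-9]+ //' | sort | uniq -c | sort -rn
}

# Example:
#   journalctl -u kubelet --no-pager --since "1 hour ago" | summarize_klog | head
```

The output lists each distinct message once with its count, most frequent first, so one-off errors show up at the bottom instead of scrolling past.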
### Check the firewall
[root@test ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

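### Finally, I found the error below in the logs: a certificate had expired. This cluster was set up by a former colleague who left no records, so nobody knew the expiry date, and the power outage happened to come right after it, which is why the cluster failed to start.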
W0217 05:51:33.036279       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379   0 }. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2024-02-17T05:51:33Z is after 2024-01-12T09:25:13Z". Reconnecting...
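The error message itself carries both timestamps, so it is possible to work out how long the certificate had already been expired, and `openssl` can then show which files on disk are affected. This is a rough sketch: `days_between` and `show_cert_expiry` are illustrative names, GNU `date` is assumed for parsing ISO-8601 strings, and the `/etc/kubernetes/pki` paths in the comment are kubeadm's defaults, which may not match this cluster.

```shell
# Whole days between two ISO-8601 UTC timestamps (requires GNU date).
days_between() {
    local from to
    from=$(date -u -d "$1" +%s) || return 1
    to=$(date -u -d "$2" +%s) || return 1
    echo $(( (to - from) / 86400 ))
}

# Timestamps copied from the x509 error above: expiry first, then "current time".
days_between 2024-01-12T09:25:13Z 2024-02-17T05:51:33Z   # prints 35

# Print the notAfter date of each certificate file given as an argument.
show_cert_expiry() {
    local crt
    for crt in "$@"; do
        printf '%s: ' "$crt"
        openssl x509 -noout -enddate -in "$crt"
    done
}

# Example, assuming kubeadm's default layout:
#   show_cert_expiry /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt
```

So the certificate had lapsed about five weeks before the outage; the running processes simply kept working until the restart forced new TLS handshakes.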

### Finally, renew the certificates and write the procedure down in the operations documentation
https://blog.csdn.net/gotheon/article/details/133700695
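For reference, a typical renewal on a kubeadm-managed control-plane node looks roughly like the sketch below. It is not necessarily the procedure from the linked article. It assumes kubeadm 1.20 or later (older releases used `kubeadm alpha certs ...`), the default `/etc/kubernetes` layout, and a CA that is still valid. The `run` wrapper, the function name, and the `manifests.off` directory are made up for this example, and the script defaults to a dry run that only prints the commands.

```shell
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

renew_kubeadm_certs() {
    local stamp manifests=/etc/kubernetes/manifests
    local parked=/etc/kubernetes/manifests.off   # hypothetical holding directory
    stamp=$(date +%Y%m%d-%H%M%S)

    # 1. Keep a copy of the current PKI and kubeconfig files.
    run cp -a /etc/kubernetes "/etc/kubernetes.bak-${stamp}"

    # 2. Show what has expired, then renew everything kubeadm manages.
    run kubeadm certs check-expiration
    run kubeadm certs renew all

    # 3. Restart the static control-plane pods by moving their manifests
    #    out of the watched directory and back again.
    run mkdir -p "$parked"
    run mv "$manifests"/*.yaml "$parked"/
    run sleep 20
    run mv "$parked"/*.yaml "$manifests"/

    # 4. admin.conf was regenerated, so refresh the kubectl config copy.
    run cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
}

# Example:
#   renew_kubeadm_certs              # print the plan
#   DRY_RUN=0 renew_kubeadm_certs    # execute it as root on the control-plane node
```

Note that `kubeadm certs renew all` does not touch the kubelet's own client certificate in `kubelet.conf`; if that one has expired as well, it needs separate handling. Running `kubeadm certs check-expiration` periodically, or alerting on it, avoids being surprised by the next expiry.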

Author: 缘之世界

Link to this article: https://www.cnblogs.com/world-of-yuan/p/18027174

Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.5 China Mainland license.

posted @ 2024-02-22 13:52  wh459086748