Kubernetes: control-node VM ran out of disk space, leaving the node NotReady, and how to fix it

  Logged into the k8s cluster on the VMs and suddenly noticed the node status was off: why was controlnode1 in the NotReady state?

[root@controlnode1 cloud-user]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
controlnode1   NotReady   <none>   11d   v1.16.4
controlnode2   Ready      <none>   11d   v1.16.4
controlnode3   Ready      <none>   11d   v1.16.4
worknode1      Ready      <none>   11d   v1.16.4
worknode2      Ready      <none>   11d   v1.16.4

   Running the commands below on controlnode1 also turned up a pile of errors, and it turned out this server's disk was full (I'm in the habit of uploading installer packages and pulling Docker images on controlnode1, and had used up all the disk space without noticing... awkward 😅)

systemctl status kubelet    ## check the status of kubelet.service on the VM
journalctl -xeu kubelet     ## show recent kubelet logs with extra context
journalctl -f -u kubelet    ## follow kubelet.service log output as it streams
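The kubelet logs pointed at a full disk; a quick way to confirm that from the shell is sketched below (the /var paths are assumptions based on a default Docker setup):

```shell
# Overall usage of the root filesystem; at 100% the kubelet's
# disk-pressure eviction kicks in and the node goes NotReady.
df -h /
# Biggest space consumers under /var (Docker image layers usually
# live in /var/lib/docker); sorted so the largest print last.
du -sh /var/* 2>/dev/null | sort -h | tail -5
```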

   After deleting some of the larger Docker images (there are 3 control nodes, and the other two hold copies of everything anyway) plus the bigger installer packages, plenty of space had been freed, yet kubelet was still throwing errors. Can't it recover on its own??
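Hunting down the space hogs can be scripted too; a hedged sketch follows (the 500 MB cutoff and the prune step are my additions, not from the original write-up):

```shell
# List files over 500 MB on the root filesystem; -xdev keeps find
# from crossing into other mounts. Installers and image tarballs
# tend to surface here. Permission errors are silenced and ignored.
find / -xdev -type f -size +500M -exec ls -lh {} + 2>/dev/null || true
# If the Docker CLI is available, also drop dangling image layers
# that no container references.
if command -v docker >/dev/null 2>&1; then
  docker image prune -f
fi
```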

kubectl drain controlnode1                   ## manually evict and cordon the node
kubectl uncordon controlnode1                ## manually return the node to service
systemctl restart kubelet                    ## after a VM fault, restarting kubelet can bring the node back into the cluster
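The three commands above can be bundled into one helper for repeat incidents; a minimal sketch (the function name is hypothetical, and on a real node `kubectl drain` usually also needs `--ignore-daemonsets` to get past DaemonSet-managed pods):

```shell
# Drain the node, restart the kubelet, then let it schedule again.
# Assumes a working kubeconfig and systemd on the node being fixed.
recover_node() {
  node="$1"
  kubectl drain "$node" --ignore-daemonsets   # cordon + evict pods
  systemctl restart kubelet                   # restart the node agent
  kubectl uncordon "$node"                    # re-enable scheduling
}
```

Called as, e.g., `recover_node controlnode1`.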

  Tried manually adding controlnode1 back into the cluster, but nothing happened... so I drained the node one more time, leaving it in the NotReady,SchedulingDisabled state;

  Then restarted kubelet.service, and confirmed with journalctl -f -u kubelet that kubelet had started up normally:


[root@controlnode1 cloud-user]# kubectl uncordon controlnode1
node/controlnode1 already uncordoned
[root@controlnode1 cloud-user]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
controlnode1   NotReady   <none>   11d   v1.16.4
controlnode2   Ready      <none>   11d   v1.16.4
controlnode3   Ready      <none>   11d   v1.16.4
worknode1      Ready      <none>   11d   v1.16.4
worknode2      Ready      <none>   11d   v1.16.4
[root@controlnode1 cloud-user]# kubectl drain controlnode1
node/controlnode1 cordoned
node/controlnode1 drained
[root@controlnode1 cloud-user]# kubectl get nodes
NAME           STATUS                        ROLES    AGE   VERSION
controlnode1   NotReady,SchedulingDisabled   <none>   11d   v1.16.4
controlnode2   Ready                         <none>   11d   v1.16.4
controlnode3   Ready                         <none>   11d   v1.16.4
worknode1      Ready                         <none>   11d   v1.16.4
worknode2      Ready                         <none>   11d   v1.16.4
[root@controlnode1 cloud-user]#
[root@controlnode1 cloud-user]#
[root@controlnode1 cloud-user]# journalctl -f -u kubelet
-- Logs begin at Mon 2021-01-11 13:47:04 UTC. --

  Manually brought controlnode1 back into the cluster once more, and only then did the status change from NotReady,SchedulingDisabled to Ready:

[root@controlnode1 cloud-user]# kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
controlnode1   Ready      <none>   11d   v1.16.4
controlnode2   Ready      <none>   11d   v1.16.4
controlnode3   Ready      <none>   11d   v1.16.4
worknode1      Ready      <none>   11d   v1.16.4
worknode2      Ready      <none>   11d   v1.16.4

 

posted @ zhangdaopin