systemctl status kubelet
The kubelet fails to start.
Check the logs:
journalctl -u kubelet --no-pager
Aug 14 14:01:33 K8S-2 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Aug 14 14:01:33 K8S-2 systemd[1]: Unit kubelet.service entered failed state.
Aug 14 14:01:33 K8S-2 systemd[1]: kubelet.service failed.
Aug 14 14:01:43 K8S-2 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Aug 14 14:01:43 K8S-2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Aug 14 14:01:43 K8S-2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Aug 14 14:01:43 K8S-2 kubelet[2777]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Aug 14 14:01:43 K8S-2 kubelet[2777]: E0814 14:01:43.915300 2777 run.go:74] "command failed" err="failed to parse kubelet flag: unknown flag: --network-plugin"
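The -u option of journalctl filters the journal by service unit, which hides unrelated log entries. The --no-pager option prints the whole log in one go. If you are only reading the log interactively you can leave it out, but the output is then limited to the screen width and you have to scroll with the arrow keys.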
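Because the journal mixes systemd state changes with the kubelet's own messages, it can help to reduce the output to the kubelet's error lines. The following is a minimal sketch, assuming the kubelet writes klog-style lines like the ones above (a severity letter followed by the date, for example E0814). kubelet_errors is a hypothetical helper name, not part of any tool.

```shell
# Keep only error (E) and fatal (F) klog lines emitted by the kubelet.
# Reads journal text on stdin; the pattern assumes the log layout shown above.
kubelet_errors() {
    grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Typical use on the node (needs systemd's journalctl):
#   journalctl -u kubelet --no-pager | kubelet_errors
```

With the log above, this would leave only the "command failed" line that names the unknown flag.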
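Analysis:
From the log, the initial diagnosis is a problem on the CNI networking side: after the kubelet restarts, it cannot parse the --network-plugin flag it is started with, so it exits immediately.
The cause is a yum upgrade that left the kube packages on the node at different versions from the master: kubeadm, kubelet, kubernetes-cni and kubectl no longer match between the master and the node. The fix to try is downgrading these four packages on the node to the versions running on the master. List what is currently installed with: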
rpm -qa | grep kube
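To see exactly which packages drifted, the listing can be saved on both machines and compared. This is only a sketch: kube_pkg_mismatch and the file names master.txt and node.txt are hypothetical, and it assumes neither listing contains duplicate lines.

```shell
# Print the package entries that appear in only one of two saved listings.
# $1: output of `rpm -qa | grep kube` saved on the master
# $2: the same output saved on the node
kube_pkg_mismatch() {
    # Lines present in both files occur twice after the merge, so uniq -u drops them.
    sort "$1" "$2" | uniq -u
}

# Hypothetical usage, after copying the master's listing to the node:
#   rpm -qa | grep kube > node.txt
#   kube_pkg_mismatch master.txt node.txt
```

Empty output means the two listings are identical; any line printed is a package whose version exists on only one side.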
Solution:
Downgrade the kube packages on the node:
yum downgrade kubernetes-cni-0.8.6-0.x86_64 kubeadm-1.18.6-0.x86_64 kubectl-1.18.6-0.x86_64 kubelet-1.18.6-0.x86_64 -y
systemctl daemon-reload
systemctl restart kubelet.service
Problem 2:
[root@K8S-2 docker.service.d]# journalctl -u kubelet -f
-- Logs begin at Mon 2023-08-14 16:16:24 CST. --
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677717 6298 container_manager_linux.go:271] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ /kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessTheriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagey:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677819 6298 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677838 6298 container_manager_linux.go:301] [topologymanager] Initializing Topology Manager with none policy
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677843 6298 container_manager_linux.go:306] Creating device plugin manager: true
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677905 6298 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Aug 14 17:02:17 K8S-2 kubelet[6298]: I0814 17:02:17.677914 6298 client.go:92] Start docker client with request timeout=2m0s
Aug 14 17:02:17 K8S-2 kubelet[6298]: F0814 17:02:17.678142 6298 server.go:274] failed to run Kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Aug 14 17:02:17 K8S-2 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Aug 14 17:02:17 K8S-2 systemd[1]: Unit kubelet.service entered failed state.
Aug 14 17:02:17 K8S-2 systemd[1]: kubelet.service failed.
On inspection, Docker could no longer start after it had been upgraded to version 24.0.5.
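The fatal kubelet line points at unix:///var/run/docker.sock, so a quick first check is whether that socket exists at all. The sketch below is an illustration only: docker_socket_state is a hypothetical helper, and a present socket does not prove that the daemon is answering requests.

```shell
# Report whether a Unix socket exists at the given path.
# The default path is the one named in the kubelet log line.
docker_socket_state() {
    sock="${1:-/var/run/docker.sock}"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    else
        echo "socket missing: $sock"
        return 1
    fi
}

# Example on the node:
#   docker_socket_state || systemctl status docker
```

If the socket is missing, systemctl status docker and journalctl -u docker are the natural next places to look for the reason the daemon did not come up.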
Solution:
Reinstall docker-ce at the version the cluster was originally built with, docker-ce-19.03.4.
Uninstall:
[root@K8S-2 docker.service.d]# yum remove docker*
Reinstall:
yum install -y containerd.io-1.2.10 \
    docker-ce-19.03.4 \
    docker-ce-cli-19.03.4
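After the reinstall it is worth confirming that the expected version is the one actually in place. A small sketch, assuming the usual "Docker version X, build Y" output of docker --version; docker_version is a hypothetical helper name.

```shell
# Extract the version number from `docker --version` style output on stdin.
docker_version() {
    sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Example on the node:
#   [ "$(docker --version | docker_version)" = "19.03.4" ] && echo "docker version OK"
```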
References:
https://zhuanlan.zhihu.com/p/620392664
https://blog.csdn.net/qq_34556414/article/details/124187273
https://cloud-atlas.readthedocs.io/zh_CN/latest/kubernetes/debug/kubelet_start_fail.html