Common Docker/k8s troubleshooting
Docker fails to start
Starting Docker fails with the error: Failed to load environment files: No such file or directory
[root@mcwk8s05 ~]# systemctl start docker
Job for docker.service failed because a configured resource limit was exceeded.
See "systemctl status docker.service" and "journalctl -xe" for details.
[root@mcwk8s05 ~]# journalctl -xe
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
.....
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has begun starting up.
Apr 18 00:33:44 mcwk8s05 kube-proxy[1006]: I0418 00:33:44.786333 1006 reflector.go:160] Listing and watching *v1.Endpoints from k8s.io/client-go/informers/factory.go:133
Apr 18 00:33:44 mcwk8s05 kube-proxy[1006]: I0418 00:33:44.788405 1006 reflector.go:160] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:133
Apr 18 00:33:46 mcwk8s05 kube-proxy[1006]: I0418 00:33:46.143912 1006 proxier.go:748] Not syncing ipvs rules until Services and Endpoints have been received from master
Apr 18 00:33:46 mcwk8s05 kube-proxy[1006]: I0418 00:33:46.144004 1006 proxier.go:744] syncProxyRules took 185.651µs
Apr 18 00:33:46 mcwk8s05 kube-proxy[1006]: I0418 00:33:46.144024 1006 bounded_frequency_runner.go:221] sync-runner: ran, next possible in 0s, periodic in 30s
Apr 18 00:33:46 mcwk8s05 systemd[1]: docker.service holdoff time over, scheduling restart.
Apr 18 00:33:46 mcwk8s05 systemd[1]: Failed to load environment files: No such file or directory
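If it is not obvious which environment file the unit is complaining about, the loaded unit definition can be inspected directly. A quick check (assuming the unit name is docker.service, as above):

# show the unit file(s) systemd actually loads for docker, and its EnvironmentFile= lines
systemctl cat docker.service | grep -n EnvironmentFile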
Check the environment file referenced by the unit:
[root@mcwk8s05 ~]# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
EnvironmentFile=/run/flannel/subnet.env
The file /run/flannel/subnet.env is generated at runtime by flannel. Flannel is not running on this node, so the file does not exist and Docker cannot load it. Start flannel first.
[root@mcwk8s05 ~]# ls /run/
abrt        console     crond.pid    dbus         faillock   lock  mount      NetworkManager  sepermit  sshd.pid  svnserve     systemd     tuned  user  vmware
auditd.pid  containerd  cron.reboot  docker.sock  initramfs  log   netreport  plymouth        setrans   sudo      syslogd.pid  tmpfiles.d  udev   utmp
[root@mcwk8s05 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:dd brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.35/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3a1f:8b4:d1f1:9759/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:e7 brd ff:ff:ff:ff:ff:ff
[root@mcwk8s05 ~]#
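There is no /run/flannel/ directory and no flannel.1 interface, which fits the diagnosis. To confirm directly that flannel is the missing piece, something like the following could be run (a quick check, using the flanneld.service unit name from this cluster):

systemctl is-active flanneld.service    # expected to report "inactive" at this point
ls /run/flannel/subnet.env              # the file docker's EnvironmentFile= points at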
Start the flannel network first, then start Docker; both come up normally:
[root@mcwk8s05 ~]# systemctl start flanneld.service
[root@mcwk8s05 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:dd brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.35/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3a1f:8b4:d1f1:9759/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:e7 brd ff:ff:ff:ff:ff:ff
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 4e:bb:c2:5c:bf:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.98.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4cbb:c2ff:fe5c:bf37/64 scope link
       valid_lft forever preferred_lft forever
[root@mcwk8s05 ~]# systemctl start docker
[root@mcwk8s05 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:dd brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.35/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3a1f:8b4:d1f1:9759/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:25:ef:e7 brd ff:ff:ff:ff:ff:ff
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 4e:bb:c2:5c:bf:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.98.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4cbb:c2ff:fe5c:bf37/64 scope link
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:f6:d4:62:1b brd ff:ff:ff:ff:ff:ff
    inet 172.17.98.1/24 brd 172.17.98.255 scope global docker0
       valid_lft forever preferred_lft forever
[root@mcwk8s05 ~]#
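To keep this from recurring after a reboot, both services can be enabled, and the dependency of Docker on flannel can be made explicit with a systemd drop-in. A sketch, not taken from the original setup:

# start both units automatically at boot
systemctl enable flanneld docker

# optional: make docker wait for flanneld so /run/flannel/subnet.env exists before docker starts;
# "systemctl edit docker" creates a drop-in under /etc/systemd/system/docker.service.d/
systemctl edit docker
#   [Unit]
#   After=flanneld.service
#   Requires=flanneld.service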
Troubleshooting k8s nodes stuck in NotReady
Check the cluster status: the control-plane components are healthy, but the nodes are NotReady.
[root@mcwk8s03 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
[root@mcwk8s03 ~]#
[root@mcwk8s03 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE    VERSION
mcwk8s05   NotReady   <none>   166d   v1.15.12
mcwk8s06   NotReady   <none>   166d   v1.15.12
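kubectl describe usually says why a node is NotReady (for example, whether the node has stopped posting status). A quick look, using one of the node names from above:

kubectl describe node mcwk8s05
# the Conditions section and the events at the bottom usually show the reason,
# e.g. "Kubelet stopped posting node status"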
Stop the firewall:
systemctl stop firewalld.service
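Stopping firewalld only lasts until the next reboot. To make it permanent, or to keep the firewall and just allow the apiserver port, something like the following could be used (a sketch; 6443 is the apiserver port seen in the node logs below):

systemctl disable --now firewalld.service   # stop it and keep it off across reboots

# or keep firewalld and open the apiserver port instead
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --reload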
On the node, kubelet is not running; check its status:
[root@mcwk8s05 ~]# systemctl status kubelet.service
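If the status shows the unit is inactive, start it and enable it so it survives reboots (assuming kubelet runs as a plain systemd unit here, as the status command above suggests):

systemctl start kubelet.service
systemctl enable kubelet.service
systemctl status kubelet.service    # confirm it stays in "active (running)"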
Check the error messages on the node: kubelet and kube-proxy are trying to reach the apiserver through the VIP of the nginx load balancer and failing.
[root@mcwk8s05 ~]# tail -100f /var/log/messages
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.110814 2985 reflector.go:160] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/kubelet/kubelet.go:454
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118520 2985 setters.go:753] Error getting volume limit for plugin kubernetes.io/azure-disk
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118562 2985 setters.go:753] Error getting volume limit for plugin kubernetes.io/gce-pd
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118568 2985 setters.go:753] Error getting volume limit for plugin kubernetes.io/cinder
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118573 2985 setters.go:753] Error getting volume limit for plugin kubernetes.io/aws-ebs
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118591 2985 kubelet_node_status.go:471] Recording NodeHasSufficientMemory event message for node mcwk8s05
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118605 2985 kubelet_node_status.go:471] Recording NodeHasNoDiskPressure event message for node mcwk8s05
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118628 2985 kubelet_node_status.go:471] Recording NodeHasSufficientPID event message for node mcwk8s05
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118644 2985 kubelet_node_status.go:72] Attempting to register node mcwk8s05
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118645 2985 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"mcwk8s05", UID:"mcwk8s05", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasSufficientMemory' Node mcwk8s05 status is now: NodeHasSufficientMemory
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118671 2985 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"mcwk8s05", UID:"mcwk8s05", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasNoDiskPressure' Node mcwk8s05 status is now: NodeHasNoDiskPressure
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.118701 2985 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"mcwk8s05", UID:"mcwk8s05", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasSufficientPID' Node mcwk8s05 status is now: NodeHasSufficientPID
Apr 18 01:14:35 mcwk8s05 kubelet: I0418 01:14:35.129924 2985 kubelet.go:1973] SyncLoop (housekeeping, skipped): sources aren't ready yet.
Apr 18 01:14:35 mcwk8s05 kubelet: E0418 01:14:35.194840 2985 kubelet.go:2252] node "mcwk8s05" not found
Apr 18 01:14:35 mcwk8s05 kubelet: E0418 01:14:35.295918 2985 kubelet.go:2252] node "mcwk8s05" not found
Apr 18 01:14:37 mcwk8s05 kubelet: E0418 01:14:37.012374 2985 kubelet.go:2252] node "mcwk8s05" not found
Apr 18 01:14:37 mcwk8s05 kube-proxy: E0418 01:14:37.109904 1006 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Get https://10.0.0.30:6443/api/v1/services?labelSelector=%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: dial tcp 10.0.0.30:6443: connect: no route to host
Apr 18 01:14:37 mcwk8s05 kube-proxy: E0418 01:14:37.109992 1006 reflector.go:125] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Get https://10.0.0.30:6443/api/v1/endpoints?labelSelector=%21service.kubernetes.io%2Fservice-proxy-name&limit=500&resourceVersion=0: dial tcp 10.0.0.30:6443: connect: no route to host
Apr 18 01:14:37 mcwk8s05 kubelet: E0418 01:14:37.110082 2985 kubelet_node_status.go:94] Unable to register node "mcwk8s05" with API server: Post https://10.0.0.30:6443/api/v1/nodes: dial tcp 10.0.0.30:6443: connect: no route to host
Apr 18 01:14:37 mcwk8s05 kubelet: E0418 01:14:37.110127 2985 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:454: Failed to list *v1.Node: Get https://10.0.0.30:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmcwk8s05&limit=500&resourceVersion=0: dial tcp 10.0.0.30:6443: connect: no route to host
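The "dial tcp 10.0.0.30:6443: connect: no route to host" lines show the node cannot reach the apiserver through the load balancer VIP. Before touching the cluster itself, the path can be tested from the node (a sketch; 10.0.0.30 is the nginx/keepalived VIP taken from the logs above):

ping -c 2 10.0.0.30                      # is the VIP reachable at all?
curl -k https://10.0.0.30:6443/version   # does anything answer on the apiserver port?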
On both nginx servers, start the nginx process and start keepalived to bring the high-availability VIP back up:
[root@mcwk8s01 ~]# ps -ef|grep nginx
root       1575   1416  0 01:17 pts/0    00:00:00 grep --color=auto nginx
[root@mcwk8s01 ~]# nginx
[root@mcwk8s01 ~]# systemctl start keepalived.service
[root@mcwk8s01 ~]#
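After starting them, it is worth confirming that nginx is listening on the load-balanced port and that one of the two servers actually holds the VIP, and enabling the services so they come back after a reboot. A sketch based on the VIP and port seen earlier (nginx was started from the binary here, so enabling it via systemd assumes an nginx unit exists on these hosts):

ss -tlnp | grep 6443                  # nginx should be listening on the apiserver front-end port
ip a | grep 10.0.0.30                 # the keepalived VIP should be bound on one of the two servers
systemctl enable nginx keepalived     # only if nginx is managed as a systemd unit on these hosts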
Check the nodes again: they are now Ready and the cluster can be used normally.
[root@mcwk8s03 ~]# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
mcwk8s05   Ready    <none>   166d   v1.15.12
mcwk8s06   Ready    <none>   166d   v1.15.12
[root@mcwk8s03 ~]#