Assorted problems that kept appearing the day after deploying k8s on virtual machines
Problem: open /run/flannel/subnet.env: no such file or directory

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9a5eade3c13f1eeeb000df80e942ed22e59d2c532def6f1f281fd2ebefdcfa2c" network for pod "mcw01dep-nginx1-69bc6f5957-lzpdv": networkPlugin cni failed to set up pod "mcw01dep-nginx1-69bc6f5957-lzpdv_default" network: open /run/flannel/subnet.env: no such file or directory

Fix: copy /run/flannel/subnet.env from a healthy node to the affected node:
scp /run/flannel/subnet.env 10.0.0.6:/run/flannel/
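If no healthy node is available to copy from, the file can also be recreated by hand. A minimal sketch, assuming the default flannel pod CIDR used in this deployment (10.244.0.0/16) and a VXLAN MTU of 1450; the values must match your kube-flannel ConfigMap and this node's assigned subnet:

cat > /run/flannel/subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF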
Problem 1: checking the network status reports: RTNETLINK answers: File exists
CentOS7 Failed to start LSB: Bring up/down networking
How to fix the "RTNETLINK answers: File exists" error
https://blog.csdn.net/u010719917/article/details/79423180
chkconfig --level 35 network on
chkconfig --level 0123456 NetworkManager off
service NetworkManager stop
service network stop
service network start
If that still doesn't work, try rebooting the system.
The "RTNETLINK answers: File exists" error appears when running service network start, or equivalently /etc/init.d/network start (the former simply invokes the latter).
On CentOS this failure is caused by a conflict between the two services that bring up the network:
/etc/init.d/network and
/etc/init.d/NetworkManager.
At root it is NetworkManager (NM) that causes the conflict; stopping NetworkManager resolves it, then restart networking.
1. Switch to the root account and use chkconfig to inspect the boot-time configuration of the network and NetworkManager services.
=====
In the end, just running these three commands fixed it:
service NetworkManager stop
service network stop
service network start
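To keep the fix across reboots, a sketch of permanently disabling NetworkManager and taking the interface out of its control (the ifcfg filename assumes the ens33 NIC used on these VMs):

chkconfig NetworkManager off                                   # never start NM at boot
echo 'NM_CONTROLLED=no' >> /etc/sysconfig/network-scripts/ifcfg-ens33
service network restart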
Problem 2: pinging an external host returns Destination Host Unreachable, with the replies coming from the internal IP
Troubleshooting
[root@mcw7 ~]$ ping www.baidu.com
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
From bogon (172.16.1.137) icmp_seq=1 Destination Host Unreachable

Look at the routing table on a host that can reach the internet:

[root@mcw8 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        0.0.0.0         UG    100    0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    101    0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.16.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Now the routing table on the host that cannot: it is missing the route via 10.0.0.2, so try adding one like the entry above (0.0.0.0  10.0.0.2  0.0.0.0  UG  100  0  0  ens33):

[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37

I added the wrong route here (route add -host 10.0.0.137 gw 10.0.0.2) and had to delete it:

[root@mcw7 ~]$ route add -host 10.0.0.137 gw 10.0.0.2
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
10.0.0.137      10.0.0.2        255.255.255.255 UGH   0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37

Deleting a route: the IP after -host is the first column of the route, the destination. Here I should have used 0.0.0.0 (any destination) with gw 10.0.0.2:

[root@mcw7 ~]$ route del -host 10.0.0.137 dev ens33
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37
[root@mcw7 ~]$

-host names the destination host. The netmask here should have been 0.0.0.0, so it had to be deleted and recreated by hand. Note the extra H in the Flags column (a host route):

[root@mcw7 ~]$ route add -host 0.0.0.0 gw 10.0.0.2
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        255.255.255.255 UGH   0      0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37

Delete it, specifying the destination host and the interface:

[root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37

Per the usage message, the mask is given with the lowercase netmask keyword (Genmask in the table is that mask, the 255-style values):

[root@mcw7 ~]$ route add -host 0.0.0.0 MASK 0.0.0.0 gw 10.0.0.2
Usage: inet_route [-vF] del {-host|-net} Target[/prefix] [gw Gw] [metric M] [[dev] If]
       inet_route [-vF] add {-host|-net} Target[/prefix] [gw Gw] [metric M] [netmask N] [mss Mss] [window W] [irtt I] [mod] [dyn] [reinstate] [[dev] If]
       inet_route [-vF] add {-host|-net} Target[/prefix] [metric M] reject
       inet_route [-FC] flush NOT supported
[root@mcw7 ~]$ route add -host 0.0.0.0 netmask 0.0.0.0 gw 10.0.0.2
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        255.255.255.255 UGH   0      0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37
[root@mcw7 ~]$

Delete it once more, specifying the destination host and the interface:

[root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37
Before the real fix
Deleting the default gateways:

[root@mcw7 ~]$ route del -host 0.0.0.0 dev ens33
SIOCDELRT: No such process
[root@mcw7 ~]$ route del -host 0.0.0.0 dev ens37
SIOCDELRT: No such process
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        0.0.0.0         UG    100    0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    101    0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.16.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@mcw7 ~]$ route del -host 0.0.0.0
SIOCDELRT: No such process
[root@mcw7 ~]$ route del default
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    100    0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.16.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@mcw7 ~]$ route del default
[root@mcw7 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 ens33
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.16.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
[root@mcw7 ~]$
The real fix
Reference: https://www.cnblogs.com/skgoo/p/13559964.html
On mcw8, pinging the internet fails; the replies come from the server's internal IP:

[root@mcw8 ~]$ ping www.baidu.com
PING www.a.shifen.com (39.156.66.14) 56(84) bytes of data.
From mcw8 (172.16.1.138) icmp_seq=1 Destination Host Unreachable

On mcw9, pinging the internet works; the replies come from Baidu's external IP:

[root@mcw9 ~]$ ping www.baidu.com
PING www.a.shifen.com (39.156.66.18) 56(84) bytes of data.
64 bytes from 39.156.66.18 (39.156.66.18): icmp_seq=1 ttl=128 time=43.2 ms

mcw9's healthy routing table has the 10.0.0.2 gateway entry:

[root@mcw9 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        0.0.0.0         UG    100    0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    101    0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 ens33
172.16.1.0      0.0.0.0         255.255.255.0   U     100    0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

mcw8's broken routing table has no external gateway 10.0.0.2:

[root@mcw8 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Add a default gateway on mcw8. All the route variants tried earlier produced the wrong Genmask (never 0.0.0.0); only the command below yields Destination 0.0.0.0, Gateway 10.0.0.2, Genmask 0.0.0.0, Flags UG, Iface ens33, after which external access works:

[root@mcw8 ~]$ route add default gw 10.0.0.2
[root@mcw8 ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.2        0.0.0.0         UG    0      0        0 ens33
0.0.0.0         172.16.1.2      0.0.0.0         UG    0      0        0 ens37
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens33
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens37
172.16.1.0      0.0.0.0         255.255.255.0   U     0      0        0 ens37
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

mcw8 can now reach the internet:

[root@mcw8 ~]$ ping www.baidu.com
PING www.a.shifen.com (39.156.66.14) 56(84) bytes of data.
64 bytes from 39.156.66.14 (39.156.66.14): icmp_seq=1 ttl=128 time=23.5 ms
64 bytes from 39.156.66.14 (39.156.66.14): icmp_seq=2 ttl=128 time=36.7 ms
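Note that route add only changes the running kernel table, so the gateway is lost on the next network restart or reboot. A sketch for persisting it on CentOS 7, assuming ens33 is the externally routed NIC as above:

echo 'GATEWAY=10.0.0.2' >> /etc/sysconfig/network-scripts/ifcfg-ens33   # per-interface default gateway
service network restart
route -n    # verify the 0.0.0.0 / UG entry via 10.0.0.2 came back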
On the second flannel deployment the network below was unreachable and the manifest URL could not be fetched (the domain resolves as a blocked domain). I had saved the file's contents earlier, so I simply pasted them into a local file and deployed from that, as follows.
https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

[machangwei@mcw7 ~]$ ls
mcw.txt  mm.yml  scripts  tools
[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
I had forgotten the join command printed by kubeadm init, so I decided to run kubeadm init again; after running kubeadm reset, all of the containers that existed before were gone.
Troubleshooting, plus exporting and importing iptables rules
[root@mcw7 ~]$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
[root@mcw7 ~]$ docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
[root@mcw7 ~]$

After resetting and re-initializing, the network is gone too:

[root@mcw7 ~]$ docker ps|grep kube-flannel
[root@mcw7 ~]$

Redeploying the network as the regular user fails:

[machangwei@mcw7 ~]$ ls
mcw.txt  mm.yml  scripts  tools
[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
[machangwei@mcw7 ~]$

Looking back at the reset output: it says CNI configuration is not cleaned up:

[root@mcw7 ~]$ echo y|kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: [preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Moving the directory away did not help, and ipvsadm is not installed:

[root@mcw7 ~]$ mv /etc/cni/net.d /etc/cni/net.dbak
[root@mcw7 ~]$ ipvsadm --clear
-bash: ipvsadm: command not found

Stared at a pile of rules without knowing what to do with them:

[root@mcw7 ~]$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
INPUT_direct  all  --  anywhere             anywhere
INPUT_ZONES_SOURCE  all  --  anywhere             anywhere

Since the rules could not be cleaned up, export a rule set from another machine and import it here.

Export:

[root@mcw9 ~]$ iptables-save > /root/iptables_beifen.txt
[root@mcw9 ~]$ cat iptables_beifen.txt
# Generated by iptables-save v1.4.21 on Fri Jan  7 23:05:39 2022
*filter
:INPUT ACCEPT [1676:135745]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [896:67997]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Fri Jan  7 23:05:39 2022
# Generated by iptables-save v1.4.21 on Fri Jan  7 23:05:39 2022
*nat
:PREROUTING ACCEPT [32:2470]
:INPUT ACCEPT [32:2470]
:OUTPUT ACCEPT [8:528]
:POSTROUTING ACCEPT [8:528]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Fri Jan  7 23:05:39 2022
[root@mcw9 ~]$

Import on mcw7: the first attempt errored because the file's first line was bad; comment it out:

[root@mcw7 ~]$ iptables-restore</root/daoru.txt
iptables-restore: line 1 failed
[root@mcw7 ~]$ cat daoru.txt
#命令 ptables-save v1.4.21 on Fri Jan  7 23:05:39 2022
*filter
:INPUT ACCEPT [1676:135745]

After importing, the firewall rules match (https://blog.csdn.net/jake_tian/article/details/102548306):

[root@mcw7 ~]$ iptables-restore</root/daoru.txt
[root@mcw7 ~]$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

============

Run the reset again and retry:

[root@mcw7 ~]$ echo y|kubeadm reset

The firewall afterwards looks unchanged (iptables -L output identical to the import result above).

After re-initializing:

[root@mcw7 ~]$ kubeadm init --apiserver-advertise-address 10.0.0.137 --pod-network-cidr=10.244.0.0/24 --image-repository=registry.aliyuncs.com/google_containers
kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u \
    --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462
[root@mcw7 ~]$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

You may not remember every step of a procedure or every command you ran, but with your old notes you can redo it quickly and be confident it is correct.

It turned out this problem had nothing to do with the firewall:

[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
[machangwei@mcw7 ~]$ kubectl get nodes
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
The real solution:
Reconfigure kubectl for the regular user; the old configuration is no longer valid:
[machangwei@mcw7 ~]$ ls -a
.  ..  .bash_history  .bash_logout  .bash_profile  .bashrc  .kube  mcw.txt  mm.yml  scripts  tools  .viminfo
[machangwei@mcw7 ~]$ mv .kube kubebak
[machangwei@mcw7 ~]$ mkdir -p $HOME/.kube
[machangwei@mcw7 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[machangwei@mcw7 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
[machangwei@mcw7 ~]$ kubectl get node
NAME   STATUS     ROLES                  AGE   VERSION
mcw7   NotReady   control-plane,master   10m   v1.23.1
Recreate the network
[machangwei@mcw7 ~]$ ls
kubebak  mcw.txt  mm.yml  scripts  tools
[machangwei@mcw7 ~]$ kubectl apply -f mm.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Now check the firewall again with iptables -L
[root@mcw7 ~]$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  mcw7/16              anywhere
ACCEPT     all  --  anywhere             mcw7/16

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
^C

Looking at these rules, iptables-save below gives the more useful view:

[root@mcw7 ~]$ iptables-save
# Generated by iptables-save v1.4.21 on Sat Jan  8 07:35:11 2022
*nat
:PREROUTING ACCEPT [372:18270]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [239:14302]
:POSTROUTING ACCEPT [239:14302]
:DOCKER - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-N4G2XR5TDX7PQE7P - [0:0]
:KUBE-SEP-XOVE7RWZIDAMLO2S - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0]
:KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -j RETURN
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-N4G2XR5TDX7PQE7P -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-N4G2XR5TDX7PQE7P -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.2:9153
-A KUBE-SEP-XOVE7RWZIDAMLO2S -s 10.0.0.137/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-XOVE7RWZIDAMLO2S -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 10.0.0.137:6443
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.3:9153
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 ! -s 10.244.0.0/24 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-ZXMNUKOKXUTL2MK2
-A KUBE-SVC-JD5MR3NA4I4DYORP ! -s 10.244.0.0/24 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-N4G2XR5TDX7PQE7P
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-SEP-ZP3FB6NMPNCO4VBJ
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s 10.244.0.0/24 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-XOVE7RWZIDAMLO2S
-A KUBE-SVC-TCOU7JCQXEZGVUNU ! -s 10.244.0.0/24 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-YIL6JZP7A3QYXJU2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-6E7XQMQ4RAYOWTTM
COMMIT
# Completed on Sat Jan  8 07:35:11 2022
# Generated by iptables-save v1.4.21 on Sat Jan  8 07:35:11 2022
*mangle
:PREROUTING ACCEPT [376111:67516258]
:INPUT ACCEPT [369347:67204288]
:FORWARD ACCEPT [6764:311970]
:OUTPUT ACCEPT [369958:67425919]
:POSTROUTING ACCEPT [371215:67488646]
:FORWARD_direct - [0:0]
:INPUT_direct - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:OUTPUT_direct - [0:0]
:POSTROUTING_direct - [0:0]
:PREROUTING_ZONES - [0:0]
:PREROUTING_ZONES_SOURCE - [0:0]
:PREROUTING_direct - [0:0]
:PRE_docker - [0:0]
:PRE_docker_allow - [0:0]
:PRE_docker_deny - [0:0]
:PRE_docker_log - [0:0]
:PRE_public - [0:0]
:PRE_public_allow - [0:0]
:PRE_public_deny - [0:0]
:PRE_public_log - [0:0]
-A PREROUTING -j PREROUTING_direct
-A PREROUTING -j PREROUTING_ZONES_SOURCE
-A PREROUTING -j PREROUTING_ZONES
-A INPUT -j INPUT_direct
-A FORWARD -j FORWARD_direct
-A OUTPUT -j OUTPUT_direct
-A POSTROUTING -j POSTROUTING_direct
-A PREROUTING_ZONES -i ens33 -g PRE_public
-A PREROUTING_ZONES -i docker0 -j PRE_docker
-A PREROUTING_ZONES -i ens37 -g PRE_public
-A PREROUTING_ZONES -g PRE_public
-A PRE_docker -j PRE_docker_log
-A PRE_docker -j PRE_docker_deny
-A PRE_docker -j PRE_docker_allow
-A PRE_public -j PRE_public_log
-A PRE_public -j PRE_public_deny
-A PRE_public -j PRE_public_allow
COMMIT
# Completed on Sat Jan  8 07:35:11 2022
# Generated by iptables-save v1.4.21 on Sat Jan  8 07:35:11 2022
*security
:INPUT ACCEPT [591940:133664590]
:FORWARD ACCEPT [1257:62727]
:OUTPUT ACCEPT [596315:107591486]
:FORWARD_direct - [0:0]
:INPUT_direct - [0:0]
:OUTPUT_direct - [0:0]
-A INPUT -j INPUT_direct
-A FORWARD -j FORWARD_direct
-A OUTPUT -j OUTPUT_direct
COMMIT
# Completed on Sat Jan  8 07:35:11 2022
# Generated by iptables-save v1.4.21 on Sat Jan  8 07:35:11 2022
*raw
:PREROUTING ACCEPT [376111:67516258]
:OUTPUT ACCEPT [369958:67425919]
:OUTPUT_direct - [0:0]
:PREROUTING_ZONES - [0:0]
:PREROUTING_ZONES_SOURCE - [0:0]
:PREROUTING_direct - [0:0]
:PRE_docker - [0:0]
:PRE_docker_allow - [0:0]
:PRE_docker_deny - [0:0]
:PRE_docker_log - [0:0]
:PRE_public - [0:0]
:PRE_public_allow - [0:0]
:PRE_public_deny - [0:0]
:PRE_public_log - [0:0]
-A PREROUTING -j PREROUTING_direct
-A PREROUTING -j PREROUTING_ZONES_SOURCE
-A PREROUTING -j PREROUTING_ZONES
-A OUTPUT -j OUTPUT_direct
-A PREROUTING_ZONES -i ens33 -g PRE_public
-A PREROUTING_ZONES -i docker0 -j PRE_docker
-A PREROUTING_ZONES -i ens37 -g PRE_public
-A PREROUTING_ZONES -g PRE_public
-A PRE_docker -j PRE_docker_log
-A PRE_docker -j PRE_docker_deny
-A PRE_docker -j PRE_docker_allow
-A PRE_public -j PRE_public_log
-A PRE_public -j PRE_public_deny
-A PRE_public -j PRE_public_allow
COMMIT
# Completed on Sat Jan  8 07:35:11 2022
# Generated by iptables-save v1.4.21 on Sat Jan  8 07:35:11 2022
*filter
:INPUT ACCEPT [14882:2406600]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [15254:2447569]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -s 10.244.0.0/16 -j ACCEPT
-A FORWARD -d 10.244.0.0/16 -j ACCEPT
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Sat Jan  8 07:35:11 2022
[root@mcw7 ~]$

Later: figure out how to regenerate a forgotten join command, and whether doing so affects nodes that have already joined the cluster.
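For the forgotten join command itself: kubeadm can regenerate one at any time on the master, and doing so does not disturb nodes that have already joined (it just mints a new token):

kubeadm token create --print-join-command   # prints a fresh 'kubeadm join ... --token ... --discovery-token-ca-cert-hash ...'
kubeadm token list                          # show existing tokens and their TTLs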
Fixing: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
Re-joining a node prints warnings, and we should take them seriously. For example, enable docker at boot: if the VM is not set to start docker on boot and it reboots, the containers will be down.
[root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u \
>     --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462
[preflight] Running pre-flight checks
        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING Hostname]: hostname "mcw8" could not be reached
        [WARNING Hostname]: hostname "mcw8": lookup mcw8 on 10.0.0.2:53: no such host
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Fix:

[root@mcw8 ~]$ echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
[root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462
[preflight] Running pre-flight checks
        [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
        [WARNING Hostname]: hostname "mcw8" could not be reached
        [WARNING Hostname]: hostname "mcw8": lookup mcw8 on 10.0.0.2:53: no such host
^C
[root@mcw8 ~]$ echo y|kubeadm reset

After that it still would not join the mcw7 master. The mcw8 and mcw9 nodes had never had the k8s network deployed, so copy the manifest over and try again:

[machangwei@mcw7 ~]$ scp mm.yml 10.0.0.138:/home/machangwei/
machangwei@10.0.0.138's password:
mm.yml                                        100% 5412     8.5MB/s   00:00
[machangwei@mcw7 ~]$ scp mm.yml 10.0.0.139:/home/machangwei/
machangwei@10.0.0.139's password:
mm.yml

Worker nodes don't need the regular-user kubectl config, though; the file doesn't exist there:

[root@mcw8 ~]$ su - machangwei
[machangwei@mcw8 ~]$ mkdir -p $HOME/.kube
[machangwei@mcw8 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: cannot stat ‘/etc/kubernetes/admin.conf’: No such file or directory
[machangwei@mcw8 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
chown: cannot access ‘/home/machangwei/.kube/config’: No such file or directory

The join kept hanging, so add --v=2 to print details:

[root@mcw8 ~]$ kubeadm join 10.0.0.137:6443 --token 1e2kkw.ivkth6zzkbx72z4u --discovery-token-ca-cert-hash sha256:fb83146082fb33ca2bff56a525c1e575b5f2587ab1be566f9dd3d7e8d7845462 --v=2
I0108 00:54:46.002913   32058 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
I0108 00:54:46.068584   32058 initconfiguration.go:117] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0108 00:54:46.068919   32058 preflight.go:92] [preflight] Running general checks

The detailed output reveals the error:

I0108 00:54:46.849380   32058 checks.go:620] validating kubelet version
I0108 00:54:46.927861   32058 checks.go:133] validating if the "kubelet" service is enabled and active
I0108 00:54:46.938910   32058 checks.go:206] validating availability of port 10250
I0108 00:54:46.960668   32058 checks.go:283] validating the existence of file /etc/kubernetes/pki/ca.crt
I0108 00:54:46.960707   32058 checks.go:433] validating if the connectivity type is via proxy or direct
I0108 00:54:46.960795   32058 join.go:530] [preflight] Discovering cluster-info
I0108 00:54:46.960846   32058 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.0.0.137:6443"
I0108 00:54:46.997909   32058 token.go:118] [discovery] Requesting info from "10.0.0.137:6443" again to validate TLS against the pinned public key
I0108 00:54:47.003864   32058 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.137:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": x509: certificate has expired or is not yet valid: current time 2022-01-08T00:54:47+08:00 is before 2022-01-07T23:18:44Z

The clocks disagree. Set mcw8 back to before the error time (mcw7 was also set to before 2022-01-07T23:18:44Z), after which the error changed to something else:

[root@mcw8 ~]$ date -s "2022-1-7 23:10:00"
Fri Jan  7 23:10:00 CST 2022
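Setting the date by hand works, but the cleaner fix is to synchronize all nodes against the same time source so certificate validity windows line up. A sketch, assuming internet access and the ntpdate package (the NTP server name is just an example):

yum install -y ntpdate
ntpdate ntp.aliyun.com             # one-shot sync on every node
systemctl enable --now chronyd     # keep clocks in sync afterwards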
As noted above, the error then became: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I0108 01:27:42.577217   32662 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.137:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
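A timeout while awaiting headers usually points to the apiserver port being unreachable or overloaded rather than a certificate problem. A quick reachability sketch from the node (master IP as above):

curl -k https://10.0.0.137:6443/healthz   # -k skips cert verification; any HTTP response proves reachability
ss -tlnp | grep 6443                      # run on the master: confirm kube-apiserver is listening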
The k8s system containers would not come up and kept stopping, with the error below. Deleting all the stopped containers a few times fixed it. Re-adding the node to the cluster then errored: refused
rpc error: code = Unknown desc = failed to create a sandbox for pod \"coredns-6d8c4cb4d-8l99d\": Error response from daemon: Conflict. The container name \"/k8s_POD_coredns-6d8c4cb4d-8l99d_kube-system_e030f426-3e8e-46fe-9e05-6c42a332f650_2\" is already in use by container \"b2dbcdd338ab4b2c35d5386e50e7e116fd41f26a0053a84ec3f1329e09d454a4\". You have to remove (or rename) that container to be able to reuse that name." pod="kube-system/coredns-6d8c4cb4d-8l99d"
[root@mcw8 ~]$ docker ps -a
CONTAINER ID   IMAGE                                                COMMAND                  CREATED              STATUS                          PORTS   NAMES
2edd274fd7b5   e6ea68648f0c                                         "/opt/bin/flanneld -…"   7 seconds ago        Exited (1) 5 seconds ago                k8s_kube-flannel_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_2
5b1715be012d   quay.io/coreos/flannel                               "cp -f /etc/kube-fla…"   28 seconds ago       Exited (0) 27 seconds ago               k8s_install-cni_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0
7beb96ed15be   rancher/mirrored-flannelcni-flannel-cni-plugin       "cp -f /flannel /opt…"   About a minute ago   Exited (0) About a minute ago           k8s_install-cni-plugin_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0
4e998fdfce3e   registry.aliyuncs.com/google_containers/kube-proxy   "/usr/local/bin/kube…"   2 minutes ago        Up 2 minutes                            k8s_kube-proxy_kube-proxy-5p7dn_kube-system_92b1b38a-f6fa-4308-93fb-8045d2bae63f_0
fed18476d9a3   registry.aliyuncs.com/google_containers/pause:3.6    "/pause"                 3 minutes ago        Up 3 minutes                            k8s_POD_kube-flannel-ds-tvz9q_kube-system_e62fa7b1-1cce-42dc-91d8-cdbd2bfda0f3_0
ebc2403e3052   registry.aliyuncs.com/google_containers/pause:3.6    "/pause"                 3 minutes ago        Up 3 minutes                            k8s_POD_kube-proxy-5p7dn_kube-system_92b1b38a-f6fa-4308-93fb-8045d2bae63f_0

Now it's healthy:

[machangwei@mcw7 ~]$ kubectl get nodes
NAME   STATUS   ROLES                  AGE     VERSION
mcw7   Ready    control-plane,master   7m22s   v1.23.1
mcw8   Ready    <none>                 4m51s   v1.23.1
mcw9   Ready    <none>                 3m45s   v1.23.1
[machangwei@mcw7 ~]$

Every node that has joined and deployed correctly runs three containers. The join command talks to the master's apiserver, after which the node starts pulling images and deploying its containers:
k8s_kube-proxy_kube-
k8s_POD_kube-proxy-n
k8s_POD_kube-flannel
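The cleanup that worked here, as a repeatable sketch: force-remove every exited container so kubelet can recreate the sandboxes under the same names (run on the affected node; repeat if more containers exit in the meantime):

docker ps -aq --filter status=exited | xargs -r docker rm -f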
Pod states: ContainerCreating, ErrImagePull, ImagePullBackOff
[machangwei@mcw7 ~]$ kubectl get pod
NAME                              READY   STATUS              RESTARTS   AGE
mcw01dep-nginx-5dd785954d-d2kwp   0/1     ContainerCreating   0          9m7s
mcw01dep-nginx-5dd785954d-szdjd   0/1     ErrImagePull        0          9m7s
mcw01dep-nginx-5dd785954d-v9x8j   0/1     ErrImagePull        0          9m7s
[machangwei@mcw7 ~]$ kubectl get pod
NAME                              READY   STATUS              RESTARTS   AGE
mcw01dep-nginx-5dd785954d-d2kwp   0/1     ContainerCreating   0          9m15s
mcw01dep-nginx-5dd785954d-szdjd   0/1     ImagePullBackOff    0          9m15s
mcw01dep-nginx-5dd785954d-v9x8j   0/1     ImagePullBackOff    0          9m15s

The containers on the node were all deleted, but one pod on the master would not go away; force-delete it:

[machangwei@mcw7 ~]$ kubectl get pod
NAME                              READY   STATUS        RESTARTS   AGE
mcw01dep-nginx-5dd785954d-v9x8j   0/1     Terminating   0          33m
[machangwei@mcw7 ~]$ kubectl delete pod mcw01dep-nginx-5dd785954d-v9x8j --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "mcw01dep-nginx-5dd785954d-v9x8j" force deleted
[machangwei@mcw7 ~]$ kubectl get pod
No resources found in default namespace.

The image-pull age shows <invalid>??? Yet the containers did come up:

[machangwei@mcw7 ~]$ kubectl get pod
NAME                              READY   STATUS              RESTARTS   AGE
mcw01dep-nginx-5dd785954d-65zd4   0/1     ContainerCreating   0          118s
mcw01dep-nginx-5dd785954d-hfw2k   0/1     ContainerCreating   0          118s
mcw01dep-nginx-5dd785954d-qxzpl   0/1     ContainerCreating   0          118s

Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  112s       default-scheduler  Successfully assigned default/mcw01dep-nginx-5dd785954d-65zd4 to mcw8
  Normal  Pulling    <invalid>  kubelet            Pulling image "nginx"

On the node, what actually started was k8s_POD_mcw01dep-nginx (the sandbox), not k8s_mcw01dep-nginx. Since the master's pod events showed the nginx pull age as <invalid>, pull the image manually on node mcw8:

[root@mcw8 ~]$ docker pull nginx        # the manual pull succeeds
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest

Check the pod events again:

Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  7m21s      default-scheduler  Successfully assigned default/mcw01dep-nginx-5dd785954d-65zd4 to mcw8
  Normal  Pulling    <invalid>  kubelet            Pulling image "nginx"

The first line shows scheduling: every container gets a same-named POD (sandbox) container placed by the default scheduler, and the message shows which node the pod landed on. After repeated checks, mcw8 had the image locally, but kubelet neither recognized it nor re-pulled. So delete the pod and let it be recreated, pulling from mcw8's local image.

Looking at the pods across namespaces, the ages show <invalid>: the networks on mcw8 and mcw9 are broken. Should those be regenerated? This network pod is created when the node joins the cluster:

[machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                    READY   STATUS             RESTARTS              AGE   IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-tvz9q   0/1     CrashLoopBackOff   102 (<invalid> ago)   8h    10.0.0.138   mcw8   <none>           <none>
kube-system   kube-flannel-ds-v28gj   1/1     Running            102 (<invalid> ago)   8h    10.0.0.139   mcw9   <none>           <none>
Deleting a k8s system pod requires specifying its namespace
[machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                    READY   STATUS             RESTARTS              AGE   IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-tvz9q   0/1     CrashLoopBackOff   103 (<invalid> ago)   8h    10.0.0.138   mcw8   <none>           <none>
kube-system   kube-flannel-ds-v28gj   0/1     CrashLoopBackOff   102 (<invalid> ago)   8h    10.0.0.139   mcw9   <none>           <none>
kube-system   kube-flannel-ds-vjfkz   1/1     Running            0                     8h    10.0.0.137   mcw7   <none>           <none>
[machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-tvz9q
Error from server (NotFound): pods "kube-flannel-ds-tvz9q" not found
[machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-tvz9q --namespace=kube-system
pod "kube-flannel-ds-tvz9q" deleted
[machangwei@mcw7 ~]$ kubectl delete pod kube-flannel-ds-v28gj --namespace=kube-system
pod "kube-flannel-ds-v28gj" deleted
[machangwei@mcw7 ~]$ kubectl get pod --all-namespaces -o wide   # not much change; the ages are still <invalid>
NAMESPACE     NAME                    READY   STATUS             RESTARTS            AGE   IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-gr7ck   0/1     CrashLoopBackOff   1 (<invalid> ago)   21s   10.0.0.138   mcw8   <none>           <none>
kube-system   kube-flannel-ds-m6qgl   1/1     Running            1 (<invalid> ago)   6s    10.0.0.139   mcw9   <none>           <none>
kube-system   kube-flannel-ds-vjfkz   1/1     Running            0                   8h    10.0.0.137   mcw7   <none>           <none>
Cloned VMs ran into all sorts of container problems; freshly created VMs did not.
After recreating three fresh VMs, the deployment hit this problem: coredns stayed Pending.
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-nsv4x   0/1     Pending   0          8m59s
kube-system   coredns-6d8c4cb4d-t7hr6   0/1     Pending   0          8m59s
Troubleshooting:
Look at the error:

[machangwei@mcwk8s-master ~]$ kubectl describe pod coredns-6d8c4cb4d-nsv4x --namespace=kube-system
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  21s (x7 over 7m9s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

Attempted fix: by default k8s does not schedule workloads onto the master node; force-allow it with: kubectl taint nodes --all node-role.kubernetes.io/master-

[machangwei@mcwk8s-master ~]$ kubectl get nodes   # the master is NotReady; run the command below so the master also serves as a node
NAME            STATUS     ROLES                  AGE   VERSION
mcwk8s-master   NotReady   control-plane,master   16m   v1.23.1
[machangwei@mcwk8s-master ~]$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/mcwk8s-master untainted
[machangwei@mcwk8s-master ~]$

The pod description includes:
Tolerations:  CriticalAddonsOnly op=Exists
              node-role.kubernetes.io/control-plane:NoSchedule
              node-role.kubernetes.io/master:NoSchedule
              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Allow the master to run pods:
kubectl taint nodes --all node-role.kubernetes.io/master-
Forbid the master from running pods:
kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule

Meanwhile the kubelet logs show the CNI config is missing:

Jan  9 11:51:52 mcw10 kubelet: I0109 11:51:52.636701   25612 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d"
Jan  9 11:51:53 mcw10 kubelet: E0109 11:51:53.909336   25612 kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Jan  9 11:51:57 mcw10 kubelet: I0109 11:51:57.637836   25612 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d"

[machangwei@mcwk8s-master ~]$ kubectl get nodes
NAME            STATUS     ROLES                  AGE   VERSION
mcwk8s-master   NotReady   control-plane,master   43m   v1.23.1
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-t24gx   0/1     Pending   0          18m
kube-system   coredns-6d8c4cb4d-t7hr6   0/1     Pending   0          42m
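To confirm what the scheduler actually sees, the taint can be checked directly on the node object (node name as used in this cluster):

kubectl describe node mcwk8s-master | grep -i -A2 taints   # should show <none> after untainting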
It turned out the taint had little to do with it: the real cause was that no network had been deployed. Once I deployed the network, the two DNS pods recovered.
As follows:

[machangwei@mcwk8s-master ~]$ kubectl apply -f mm.yml   # deploy the network
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
[machangwei@mcwk8s-master ~]$ kubectl get nodes   # the node is not Ready yet
NAME            STATUS     ROLES                  AGE   VERSION
mcwk8s-master   NotReady   control-plane,master   45m   v1.23.1
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces   # the dns pods aren't up; flannel is still initializing
NAMESPACE     NAME                      READY   STATUS     RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-t24gx   0/1     Pending    0          20m
kube-system   coredns-6d8c4cb4d-t7hr6   0/1     Pending    0          44m
kube-system   kube-flannel-ds-w8v9s     0/1     Init:0/2   0          14s
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces   # checking again, the image pull failed
NAMESPACE     NAME                      READY   STATUS              RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-t24gx   0/1     Pending             0          20m
kube-system   coredns-6d8c4cb4d-t7hr6   0/1     Pending             0          45m
kube-system   kube-flannel-ds-w8v9s     0/1     Init:ErrImagePull   0          45s
[machangwei@mcwk8s-master ~]$ kubectl describe pod kube-flannel-ds-w8v9s --namespace=kube-system   # check the events
  Warning  Failed   4m26s                  kubelet  Error: ErrImagePull        # the pull kept failing even though the network was fine
  Warning  Failed   4m25s                  kubelet  Error: ImagePullBackOff    # the pull only succeeded after about three minutes
  Normal   BackOff  4m25s                  kubelet  Back-off pulling image "quay.io/coreos/flannel:v0.15.1"
  Normal   Pulling  4m15s (x2 over 4m45s)  kubelet  Pulling image "quay.io/coreos/flannel:v0.15.1"
  Normal   Pulled   3m36s                  kubelet  Successfully pulled image "quay.io/coreos/flannel:v0.15.1" in 39.090145025s
  Normal   Created  3m35s                  kubelet  Created container install-cni
  Normal   Started  3m35s                  kubelet  Started container install-cni
  Normal   Pulled   3m35s                  kubelet  Container image "quay.io/coreos/flannel:v0.15.1" already present on machine
  Normal   Created  3m35s                  kubelet  Created container kube-flannel
  Normal   Started  3m34s                  kubelet  Started container kube-flannel

Checking the nodes again: Ready. In other words, only after the network is deployed does coredns come up and the master become Ready as a node:

[machangwei@mcwk8s-master ~]$ kubectl get nodes
NAME            STATUS   ROLES                  AGE   VERSION
mcwk8s-master   Ready    control-plane,master   57m   v1.23.1
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   coredns-6d8c4cb4d-t24gx                 1/1     Running   0          32m
kube-system   coredns-6d8c4cb4d-t7hr6                 1/1     Running   0          56m
kube-system   etcd-mcwk8s-master                      1/1     Running   0          57m
kube-system   kube-apiserver-mcwk8s-master            1/1     Running   0          57m
kube-system   kube-controller-manager-mcwk8s-master   1/1     Running   0          57m
kube-system   kube-flannel-ds-w8v9s                   1/1     Running   0          12m
kube-system   kube-proxy-nvw6m                        1/1     Running   0          56m
kube-system   kube-scheduler-mcwk8s-master            1/1     Running   0          57m
After running the join on node1, the master shows two extra flannel network pods that are not Ready
These are the nodes' network pods; it doesn't seem to affect usage for now.
[root@mcwk8s-node1 ~]$ kubeadm join 10.0.0.140:6443 --token 8yficm.352yz89c44mqk4y6 \
>     --discovery-token-ca-cert-hash sha256:bcd36381d3de0adb7e05a12f688eee4043833290ebd39366fc47dd5233c552bf

The master gains two pods that are not Ready, meaning the nodes have not finished deploying this network pod:

[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                    READY   STATUS     RESTARTS   AGE
kube-system   kube-flannel-ds-75npz   0/1     Init:1/2   0          99s
kube-system   kube-flannel-ds-lpmxf   0/1     Init:1/2   0          111s
kube-system   kube-flannel-ds-w8v9s   1/1     Running    0          16m
[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                    READY   STATUS                  RESTARTS      AGE     IP           NODE            NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-75npz   0/1     CrashLoopBackOff        4 (50s ago)   4m37s   10.0.0.141   mcwk8s-node1    <none>           <none>
kube-system   kube-flannel-ds-lpmxf   0/1     Init:ImagePullBackOff   0             4m49s   10.0.0.142   mcwk8s-node2    <none>           <none>
kube-system   kube-flannel-ds-w8v9s   1/1     Running                 0             19m     10.0.0.140   mcwk8s-master   <none>           <none>

Check node status; one node is already Ready:

[machangwei@mcwk8s-master ~]$ kubectl get nodes
NAME            STATUS     ROLES                  AGE     VERSION
mcwk8s-master   Ready      control-plane,master   65m     v1.23.1
mcwk8s-node1    Ready      <none>                 5m22s   v1.23.1
mcwk8s-node2    NotReady   <none>                 5m35s   v1.23.1

The network pods still look wrong even though node1 is Ready:

[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                    READY   STATUS                  RESTARTS      AGE     IP           NODE            NOMINATED NODE   READINESS GATES
kube-system   kube-flannel-ds-75npz   0/1     CrashLoopBackOff        5 (44s ago)   6m5s    10.0.0.141   mcwk8s-node1    <none>           <none>
kube-system   kube-flannel-ds-lpmxf   0/1     Init:ImagePullBackOff   0             6m17s   10.0.0.142   mcwk8s-node2    <none>           <none>
kube-system   kube-flannel-ds-w8v9s   1/1     Running                 0             21m     10.0.0.140   mcwk8s-master   <none>           <none>

Describing the CrashLoopBackOff pod: the container keeps failing to restart; the container already exists:

  Normal   Created  5m10s (x4 over 5m59s)  kubelet  Created container kube-flannel
  Normal   Started  5m10s (x4 over 5m58s)  kubelet  Started container kube-flannel
  Warning  BackOff  4m54s (x5 over 5m52s)  kubelet  Back-off restarting failed container
  Normal   Pulled   2m52s (x6 over 5m59s)  kubelet  Container image "quay.io/coreos/flannel:v0.15.1" already present on machine

Describing the Init:ImagePullBackOff pod: the image pull itself is failing:

  Warning  Failed   23s (x4 over 5m42s)  kubelet  Failed to pull image "quay.io/coreos/flannel:v0.15.1": rpc error: code = Unknown desc = context canceled
  Warning  Failed   23s (x4 over 5m42s)  kubelet  Error: ErrImagePull
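The usual next step for a CrashLoopBackOff flannel pod is to read its log, including the previous crashed attempt (pod name taken from the listing above):

kubectl logs kube-flannel-ds-75npz -n kube-system              # current attempt
kubectl logs kube-flannel-ds-75npz -n kube-system --previous   # log of the crashed container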
Exporting and importing images
Recommendation: choose the command for the scenario.
If you only need to back up images, use save / load.
If the container's contents have changed since it started and that needs backing up, use export / import.

Example:
docker save -o nginx.tar nginx:latest
or
docker save > nginx.tar nginx:latest
where -o and > both mean write to a file; nginx.tar is the target file and nginx:latest is the source image (name:tag).

Example:
docker load -i nginx.tar
or
docker load < nginx.tar
where -i and < both mean read from a file. The image is imported together with its metadata, including tag information.

Example:
docker export -o nginx-test.tar nginx-test
where -o means write to a file; nginx-test.tar is the target file and nginx-test is the source container (name).
docker import nginx-test.tar nginx:imp
or
cat nginx-test.tar | docker import - nginx:imp

Differences:
The tar produced by export is slightly smaller than the one produced by save.
export exports from a container, while save exports from an image.
Because of that, when a file produced by export is imported back, the image's full history (the per-layer information; see Dockerfile if unfamiliar) is lost, so it cannot be rolled back; save works from the image, so every layer is preserved on import. In the example, nginx:latest was produced by save/load and nginx:imp by export/import.
Source: https://blog.csdn.net/ncdx111/article/details/79878098
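The layer difference is easy to see for yourself. A sketch using the example tags above (nginx:latest via save/load, nginx:imp via export/import):

docker history nginx:latest   # full layer history survives save/load
docker history nginx:imp      # export/import squashes everything into one layer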
Fixing the Init:ImagePullBackOff state
node2 has no flannel image:

[root@mcwk8s-node2 ~]$ docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED        SIZE
registry.aliyuncs.com/google_containers/kube-proxy   v1.23.1   b46c42588d51   3 weeks ago    112MB
rancher/mirrored-flannelcni-flannel-cni-plugin       v1.0.0    cd5235cd7dc2   2 months ago   9.03MB
registry.aliyuncs.com/google_containers/pause        3.6       6270bb605e12   4 months ago   683kB

Export a copy from the master and upload it to node2:

[root@mcwk8s-master ~]$ docker images
REPOSITORY               TAG       IMAGE ID       CREATED       SIZE
quay.io/coreos/flannel   v0.15.1   e6ea68648f0c   8 weeks ago   69.5MB
[root@mcwk8s-master ~]$ docker save quay.io/coreos/flannel >mcwflanel-image.tar.gz
[root@mcwk8s-master ~]$ ls
anaconda-ks.cfg  jiarujiqun.txt  mcwflanel-image.tar.gz
[root@mcwk8s-master ~]$ scp mcwflanel-image.tar.gz 10.0.0.142:/root/

Import succeeds on node2:

[root@mcwk8s-node2 ~]$ ls
anaconda-ks.cfg
[root@mcwk8s-node2 ~]$ ls
anaconda-ks.cfg  mcwflanel-image.tar.gz
[root@mcwk8s-node2 ~]$ docker load < mcwflanel-image.tar.gz
ab9ef8fb7abb: Loading layer [==================================================>]  2.747MB/2.747MB
2ad3602f224f: Loading layer [==================================================>]  49.46MB/49.46MB
54089bc26b6b: Loading layer [==================================================>]  5.12kB/5.12kB
8c5368be4bdf: Loading layer [==================================================>]  9.216kB/9.216kB
5c32c759eea2: Loading layer [==================================================>]  7.68kB/7.68kB
Loaded image: quay.io/coreos/flannel:v0.15.1
[root@mcwk8s-node2 ~]$ docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED        SIZE
registry.aliyuncs.com/google_containers/kube-proxy   v1.23.1   b46c42588d51   3 weeks ago    112MB
quay.io/coreos/flannel                               v0.15.1   e6ea68648f0c   8 weeks ago    69.5MB
rancher/mirrored-flannelcni-flannel-cni-plugin       v1.0.0    cd5235cd7dc2   2 months ago   9.03MB
registry.aliyuncs.com/google_containers/pause        3.6       6270bb605e12   4 months ago   683kB

On the master the pod states have changed to CrashLoopBackOff, with many restarts:

[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                    READY   STATUS             RESTARTS        AGE
kube-system   kube-flannel-ds-75npz   0/1     CrashLoopBackOff   9 (4m47s ago)   28m
kube-system   kube-flannel-ds-lpmxf   0/1     CrashLoopBackOff   4 (74s ago)     28m
kube-system   kube-flannel-ds-w8v9s   1/1     Running            0               43m
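The save/scp/load round trip can also be done as a single pipeline, a sketch assuming root ssh access from the master to node2:

docker save quay.io/coreos/flannel:v0.15.1 | ssh 10.0.0.142 'docker load'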
The describe output shows the restart keeps failing; dealing with CrashLoopBackOff
[machangwei@mcwk8s-master ~]$ kubectl describe pod kube-flannel-ds-lpmxf --namespace=kube-system
  Warning  BackOff  3m25s (x20 over 7m48s)  kubelet  Back-off restarting failed container

Although these two pods never became Ready, the nodes themselves were Ready, so set that aside and verify with a test deployment:

[machangwei@mcwk8s-master ~]$ kubectl get pod --all-namespaces
NAMESPACE     NAME                    READY   STATUS             RESTARTS        AGE
kube-system   kube-flannel-ds-75npz   0/1     CrashLoopBackOff   12 (114s ago)   41m
kube-system   kube-flannel-ds-lpmxf   0/1     CrashLoopBackOff   8 (3m46s ago)   41m
[machangwei@mcwk8s-master ~]$ kubectl get nodes
NAME            STATUS   ROLES                  AGE    VERSION
mcwk8s-master   Ready    control-plane,master   100m   v1.23.1
mcwk8s-node1    Ready    <none>                 40m    v1.23.1
mcwk8s-node2    Ready    <none>                 41m    v1.23.1

The environment works; applications can be deployed:

[machangwei@mcwk8s-master ~]$ kubectl get deployment
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
mcw01dep-nginx   1/1     1            1           5m58s
mcw02dep-nginx   1/2     2            1           71s
[machangwei@mcwk8s-master ~]$ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
mcw01dep-nginx-5dd785954d-z7s8m   1/1     Running   0          7m21s
mcw02dep-nginx-5b8b58857-7mlmh    1/1     Running   0          2m34s
mcw02dep-nginx-5b8b58857-pvwdd    1/1     Running   0          2m34s

Delete the test resources, then take a VM snapshot, so that if the k8s environment ever breaks it can be restored from the snapshot instead of redeployed:

[machangwei@mcwk8s-master ~]$ kubectl get deployment
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
mcw01dep-nginx   1/1     1            1           7m39s
mcw02dep-nginx   2/2     2            2           2m52s
[machangwei@mcwk8s-master ~]$ kubectl delete deployment mcw01dep-nginx mcw02dep-nginx
deployment.apps "mcw01dep-nginx" deleted
deployment.apps "mcw02dep-nginx" deleted
[machangwei@mcwk8s-master ~]$ kubectl get deployment
No resources found in default namespace.
[machangwei@mcwk8s-master ~]$ kubectl get pod
No resources found in default namespace.
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
The virtual machine was extremely laggy, stuck for what felt like forever.
The system or the network was consuming too much CPU, causing a kernel soft lockup. What "soft lockup" means: the bug does not hang the machine completely, but some processes (or kernel threads) get stuck in a particular state (usually in kernel space), very often because of problems with kernel lock usage.
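If the lockups come purely from an oversubscribed VM host, one common mitigation is to give the watchdog more slack before it fires. A sketch (the value is an example; the kernel default is 10 seconds):

echo 30 > /proc/sys/kernel/watchdog_thresh                              # runtime change
echo 'kernel.watchdog_thresh = 30' >> /etc/sysctl.conf && sysctl -p     # persist across reboots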