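Troubleshooting errors in a binary high-availability installation

1. etcd fails to start after the high-availability script installs it

Solution: reboot all nodes. I ran into this three times, and each time the cause was an overloaded machine: CPU utilization had reached 94%. The script itself is correct and had nothing to do with the failure.

It is therefore best to install in stages: install the etcd cluster first, reboot all nodes, and only then install the Kubernetes components.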
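
The staged order above could be enforced with a small gate between the etcd stage and the Kubernetes stage. This is only a minimal sketch: wait_for is a hypothetical helper name, and the commented-out call assumes etcd is checked with etcdctl (v3 API) using the endpoints and certificate paths that appear in the kube-apiserver flags in item 2, which may not match your etcd client certificates.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD...
# Hypothetical helper: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between failed attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example gate between the two install stages. It is commented out so the
# sketch runs without a cluster; endpoints and certificate paths are assumptions.
# export ETCDCTL_API=3
# wait_for 30 2 etcdctl \
#     --endpoints=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 \
#     --cacert=/etc/kubernetes/cert/ca.pem \
#     --cert=/etc/kubernetes/cert/kubernetes.pem \
#     --key=/etc/kubernetes/cert/kubernetes-key.pem \
#     endpoint health \
#   && echo "etcd is healthy, continue with the Kubernetes components"
```

Retrying with a delay, rather than failing on the first check, fits the situation described here, where a heavily loaded machine may simply need more time.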

2. kube-apiserver error: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input

Symptoms:

[root@test1 ssl]# systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-02-06 18:14:58 EST; 1h 3min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 1684 (kube-apiserver)
    Tasks: 16
   Memory: 11.4M
   CGroup: /system.slice/kube-apiserver.service
           └─1684 /opt/k8s/bin/kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --anonymous-auth=false # --experimental-encryption-provider-config=/etc/kubernetes/encryption-config.yaml --advertise-address=192.168.0.91 --bind-address=192.168.0.91 --insecure-port=8080 --authorization-mode=Node,RBAC # --runtime-config=api/all --enable-bootstrap-token-auth --token-auth-file=/etc/kubernetes/token.csv --service-cluster-ip-range=10.254.0.0/16 --service-node-port-range=8000-30000 --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem --client-ca-file=/etc/kubernetes/cert/ca.pem --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem --etcd-cafile=/etc/kubernetes/cert/ca.pem --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem --service-account-key-file=/etc/kubernetes/cert/sa.pub --etcd-servers=https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379 --enable-swagger-ui=true --secure-port=6443 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --allow-privileged=true --apiserver-count=3 --audit-log-maxage=30 --audit-log-maxbackup=3 --audit-log-maxsize=100 --audit-log-path=/var/log/kube-apiserver-audit.log --event-ttl=1h --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.055401 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:24 test1 kube-apiserver[1684]: E0206 19:18:24.650493 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.074728 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:25 test1 kube-apiserver[1684]: E0206 19:18:25.666053 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.103077 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:26 test1 kube-apiserver[1684]: E0206 19:18:26.689155 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.123484 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:27 test1 kube-apiserver[1684]: E0206 19:18:27.707282 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.246831 1684 cacher.go:272] unexpected ListAndWatch error: storage/cacher.go:/secrets: Failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input
Feb 06 19:18:28 test1 kube-apiserver[1684]: E0206 19:18:28.729613 1684 reflector.go:134] k8s.io/client-go/informers/factory.go:132: Failed to list *v1.Secret: Internal error occurred: unable to transform key "/registry/secrets/default/default-token-859zc": invalid padding on input

Solution: reinstalling with the script fixed it. This error usually means that the Secrets stored in etcd were encrypted with a key that differs from the one in the current encryption config, which would explain why a clean reinstall cleared it.

3. kube-apiserver fails to start: external host was not specified, using 192.168.0.91

Solution: delete every comment from the kube-apiserver startup file. The status output in item 2 shows the leftover # markers being passed to kube-apiserver as literal arguments.

4. kubelet log shows: No valid private key and/or certificate found, reusing existing private key or creating a new one

The message itself is normal during bootstrap, but I went through the configuration anyway and found two fatal mistakes.

[root@test4 kubernetes]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-07 07:24:53 EST; 5s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 73646 (kubelet)
    Tasks: 12
   Memory: 15.2M
   CGroup: /system.slice/kubelet.service
           └─73646 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true
--logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.021451 73646 server.go:407] Version: v1.13.0
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024450 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.024837 73646 feature_gate.go:206] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025195 73646 plugins.go:103] No cloud provider specified.
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025304 73646 server.go:523] No cloud provider specified: "" from the config file: ""
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.025410 73646 bootstrap.go:65] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.043219 73646 bootstrap.go:96] No valid private key and/or certificate found, reusing existing private key or creating a new one
Feb 07 07:24:54 test4 kubelet[73646]: I0207 07:24:54.176716 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:56 test4 kubelet[73646]: I0207 07:24:56.347469 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials
Feb 07 07:24:58 test4 kubelet[73646]: I0207 07:24:58.451741 73646 bootstrap.go:239] Failed to connect to apiserver: the server has asked for the client to provide credentials

Mistake 1: the script that generates the bootstrap kubeconfig assigns BOOTSTRAP_TOKEN=(kubeadm ...) without a $ in front of the parenthesis. Without the $, bash treats the parentheses as an array of literal words instead of running the command, so no token is generated. It must be written BOOTSTRAP_TOKEN=$(kubeadm ...). This was the main mistake; the second one follows below.

The faulty line:

BOOTSTRAP_TOKEN=(kubeadm token create --description kubelet-bootstrap-token --groups system:bootstrappers:test1 --kubeconfig ~/.kube/config)

The script as found, still containing the missing $:

[root@test1 profile]# cat bootstrap-kubeconfig.sh
#!/bin/bash
# Define variables
export MASTER_VIP="192.168.0.235"
export KUBE_APISERVER="https://192.168.0.235:8443"
export NODE_NAMES=(test1 test2 test3 test4)
cd $HOME/ssl/
for node_name in ${NODE_NAMES[*]}
do
  # Create the token
  export BOOTSTRAP_TOKEN=(kubeadm token create \
    --description kubelet-bootstrap-token \
    --groups system:bootstrappers:${node_name} \
    --kubeconfig ~/.kube/config)
  # Set cluster parameters
  kubectl config set-cluster kubernetes \
    --certificate-authority=/etc/kubernetes/cert/ca.pem \
    --embed-certs=true \
    --server=${KUBE_APISERVER} \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  # Set client credentials
  kubectl config set-credentials kubelet-bootstrap \
    --token=${BOOTSTRAP_TOKEN} \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  # Set context parameters
  kubectl config set-context default \
    --cluster=kubernetes \
    --user=kubelet-bootstrap \
    --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  # Set the default context
  kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
done

Mistake 2: the kubelet parameter file contains a wrong address.

[root@test4 ~]# cat /etc/kubernetes/kubelet.config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": { "clientCAFile": "/etc/kubernetes/cert/ca.pem" },
    "webhook": { "enabled": true, "cacheTTL": "2m0s" },
    "anonymous": { "enabled": false }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": { "cacheAuthorizedTTL": "5m0s", "cacheUnauthorizedTTL": "30s" }
  },
  "address": "0.0.0.0",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.254.0.2"]
}

The address field is 0.0.0.0, which is not the node's real IP address. Running hostname -i on test4 also returned 0.0.0.0, which is where the value came from. Changing address to the worker node's real IP fixed it.

5. No node appears after the CSR is approved

Solution: kubelet had stopped. I had added the cAdvisor parameter to the config file and restarted kubelet without checking its status, and only noticed later that it had died. Restarting it fixed the problem.

6. kubectl cannot access pod resources: Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
   error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

Please read this item through to the end.

Symptoms:

[root@test4 profile]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
Error attaching, falling back to logs: error dialing backend: dial tcp 0.0.0.0:10250: connect: connection refused
deployment.apps "dns-client" deleted
Error from server: Get https://test4:10250/containerLogs/default/dns-client-86c6d59f7-tzh5c/dns-client: dial tcp 0.0.0.0:10250: connect: connection refused

Check the coredns.yaml file:

[root@test4 profile]# cat coredns.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: cluster_dns_svc_ip
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP

No real IP is configured. The kubelet config shows "address": "0.0.0.0":

[root@test4 profile]# cat /etc/kubernetes/kubelet.config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": { "clientCAFile": "/etc/kubernetes/cert/ca.pem" },
    "webhook": { "enabled": true, "cacheTTL": "2m0s" },
    "anonymous": { "enabled": false }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": { "cacheAuthorizedTTL": "5m0s", "cacheUnauthorizedTTL": "30s" }
  },
  "address": "0.0.0.0",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "cluster.local.",
  "clusterDNS": ["10.254.0.2"]
}

Fixing the address still did not help, so I suspected the apiserver. In the end I followed the apiserver startup configuration from this post: https://www.cnblogs.com/effortsing/p/10312081.html

Add this flag to the kube-apiserver startup parameters on every master node:

--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

Then restart kube-apiserver on all master nodes. The error "dial tcp 192.168.0.93:10250: connect: no route to host" no longer appears, but a new one does when accessing resources:

unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

Solution: grant the apiserver access to the kubelet API. The kubernetes user simply had no RBAC authorization; granting it resolves the error. Proceed as follows.

Note: the user named in the error message (user=kubernetes) is the user name that must appear under subjects in the YAML file below.

cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kubernetes-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kubernetes
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubernetes-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF

Create the authorization:

[root@test4 ~]# kubectl create -f apiserver-to-kubelet.yaml
clusterrole.rbac.authorization.k8s.io/system:kubernetes-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kubernetes created

Enter the container again:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
/ # exit

It is now possible to get a shell inside the container.

Reference: https://www.jianshu.com/p/b3d8e8b8fd7e

7. flannel and coredns cannot be created. Problem: Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system

Symptoms: none of the pods come up.

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69d58bd968-mdskk   0/1     ContainerCreating   0          4s
coredns-69d58bd968-xjqpj   0/1     ContainerCreating   0          3m6s
kube-flannel-ds-4bgqb      0/1     Init:0/1            0          94s

Describing the pod shows the error: Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system

[root@test4 profile]# kubectl describe pod coredns-69d58bd968-f9tn4 --namespace kube-system
Name:               coredns-69d58bd968-f9tn4
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               test4/192.168.0.94
Start Time:         Fri, 08 Feb 2019 23:50:28 -0500
Labels:             k8s-app=kube-dns
                    pod-template-hash=69d58bd968
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/coredns-69d58bd968
Containers:
  coredns:
    Container ID:
    Image:         coredns/coredns:1.2.0
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-29dbl (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-29dbl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-29dbl
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    16m                 default-scheduler  Successfully assigned kube-system/coredns-69d58bd968-f9tn4 to test4
  Warning  FailedMount  68s (x7 over 14m)   kubelet, test4     Unable to mount volumes for pod "coredns-69d58bd968-f9tn4_kube-system(38cb8d7e-2c26-11e9-8db2-000c2935f634)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"coredns-69d58bd968-f9tn4". list of unmounted volumes=[coredns-token-29dbl]. list of unattached volumes=[config-volume coredns-token-29dbl]
  Warning  FailedMount  7s (x16 over 16m)   kubelet, test4     MountVolume.SetUp failed for volume "coredns-token-29dbl" : couldn't propagate object cache: timed out waiting for the condition

The Docker log shows matching mount errors: Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"

[root@test4 profile]# systemctl status docker -l
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-02-08 23:23:56 EST; 50min ago
     Docs: https://docs.docker.com
 Main PID: 956 (dockerd)
   CGroup: /system.slice/docker.service
           ├─ 956 /usr/bin/dockerd
           └─1152 docker-containerd --config /var/run/docker/containerd/containerd.toml

Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.245990170-05:00" level=error msg="Failed to load container mount ebb0891f650ea9643caf4ec8f164a54e8c6dc9d54842ea1ea4bacc72ff4addff: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.248503580-05:00" level=error msg="Failed to load container mount f4e32003f4c0fc39d292b2dd76dd0a0016a0b1e72028c7d4910749fc7836efde: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.250961209-05:00" level=error msg="Failed to load container mount fb5ca71237d38e0bb413ac95a858ee3e41c209a936a1f41081bf2b6a57f10a45: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.253042348-05:00" level=error msg="Failed to load container mount fb8dfb7d9813b638ac24dc9b0cde97ed095c222b22f8d44f082f5130e2f233e4: mount does not exist"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.666363859-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.760913207-05:00" level=info msg="Loading containers: done."
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.864408002-05:00" level=info msg="Docker daemon" commit=0520e24 graphdriver(s)=overlay2 version=18.03.0-ce
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.867069598-05:00" level=info msg="Daemon has completed initialization"
Feb 08 23:23:56 test4 dockerd[956]: time="2019-02-08T23:23:56.883546083-05:00" level=info msg="API listen on /var/run/docker.sock"
Feb 08 23:23:56 test4 systemd[1]: Started Docker Application Container Engine.

Solution: restart Docker.

systemctl restart docker

The pods recover almost immediately:

[root@test4 profile]# kubectl get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69d58bd968-mdskk   1/1     Running   0          3m26s
coredns-69d58bd968-xjqpj   1/1     Running   0          6m28s
kube-flannel-ds-4bgqb      1/1     Running   0          4m56s

Check Docker again. This is what a healthy Docker looks like when it is working with Kubernetes:

[root@test4 profile]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2019-02-09 00:28:04 EST; 33s ago
     Docs: https://docs.docker.com
 Main PID: 18711 (dockerd)
    Tasks: 246
   Memory: 98.6M
   CGroup: /system.slice/docker.service
           ├─18711 /usr/bin/dockerd
           ├─18718 docker-containerd --config /var/run/docker/containerd/containerd.toml
           ├─19312 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
           ├─19325 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
           ├─19337 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
├─19344 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19384 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19434 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19463 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19478 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19509 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19562 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─19566 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20190 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20473 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20506 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20670 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20685 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20702 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─20741 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... ├─21002 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu... 
           ├─21054 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...
           └─21270 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linu...

Feb 09 00:28:16 test4 dockerd[18711]: time="2019-02-09T00:28:16-05:00" level=info msg="shim docker-containerd-shim started"...d=21054
Feb 09 00:28:16 test4 dockerd[18711]: time="2019-02-09T00:28:16-05:00" level=info msg="shim docker-containerd-shim started"...d=21070
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17-05:00" level=info msg="shim reaped" id=66d80ea2e0c9a995f325.../tasks"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17.244294325-05:00" level=info msg="ignoring event" module=lib...Delete"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17-05:00" level=info msg="shim reaped" id=42e08bbdf67aabd17173.../tasks"
Feb 09 00:28:17 test4 dockerd[18711]: time="2019-02-09T00:28:17.774220924-05:00" level=info msg="ignoring event" module=lib...Delete"
Feb 09 00:28:33 test4 dockerd[18711]: time="2019-02-09T00:28:33-05:00" level=info msg="shim docker-containerd-shim started"...d=21270
Feb 09 00:28:34 test4 dockerd[18711]: time="2019-02-09T00:28:34-05:00" level=info msg="shim docker-containerd-shim started"...d=21328
Feb 09 00:28:35 test4 dockerd[18711]: time="2019-02-09T00:28:35-05:00" level=info msg="shim reaped" id=5114cc9a4a74c294de17.../tasks"
Feb 09 00:28:35 test4 dockerd[18711]: time="2019-02-09T00:28:35.767869814-05:00" level=info msg="ignoring event" module=lib...Delete"
Hint: Some lines were ellipsized, use -l to show in full.

8. While testing coredns, kubectl run -it --rm --image=infoblox/dnstools dns-client hangs

Symptoms:

[root@test4 ~]# kubectl run -it --rm --image=infoblox/dnstools dns-client
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.

Check the pods:

[root@test4 ~]# kubectl get pod
NAME                            READY   STATUS              RESTARTS   AGE
busybox1-54dc95466f-kmjcp       1/1     ContainerCreating   1          40m
dig-5c7554b84f-sdl8k            1/1     ContainerCreating   1          40m
dns-client-2-56bdd8dfd5-pn5zn   1/1     ContainerCreating   1          40m
dns-client-3-6f98f9f7df-g29d6   1/1     ContainerCreating   1          40m
dns-client-86c6d59f7-znnbb      1/1     ContainerCreating   1          40m
dnstools-6d4979fbbf-294ns       1/1     ContainerCreating   1          40m

Cause: there are two likely explanations. Either flannel and coredns were unhealthy (the Docker log turned out to contain errors), or the CPU load was too high: it was at 86% at the time.

Solution:

Shut down one master node to lower the CPU load.
Restart Docker and check its status. A healthy Docker working with Kubernetes should look like the systemctl status docker output shown at the end of item 7: dockerd active (running) since Sat 2019-02-09 00:28:04 EST, with a list of docker-containerd-shim processes under it and only level=info messages in the log.
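
Since the difference between the broken and the healthy Docker state in these notes is whether the dockerd log contains level=error entries, that check could be scripted. The following is a minimal sketch under stated assumptions: docker_error_count is a hypothetical helper name, it relies on the level=... log format seen in the dockerd output in item 7, and the journalctl call in the comment is one possible way to feed it on a systemd host.

```shell
#!/bin/sh
# docker_error_count: read dockerd log text on stdin and print the number of
# error-level entries (lines containing "level=error").
docker_error_count() {
    # grep -c prints 0 and exits non-zero when nothing matches; mask the exit code.
    grep -c 'level=error' || true
}

# Possible usage on a node (assumes Docker runs as the systemd unit "docker"):
# journalctl -u docker --no-pager | docker_error_count
```

A non-zero count right after a restart would be a hint to look at the full log before debugging flannel or coredns.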

Check the pods again:

[root@test4 ~]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
busybox1-54dc95466f-kmjcp       1/1     Running   1          40m
dig-5c7554b84f-sdl8k            1/1     Running   1          40m
dns-client-2-56bdd8dfd5-pn5zn   1/1     Running   1          40m
dns-client-3-6f98f9f7df-g29d6   1/1     Running   1          40m
dns-client-86c6d59f7-znnbb      1/1     Running   1          40m
dnstools-6d4979fbbf-294ns       1/1     Running   1          40m

9. Deleting a pod has no effect

Case 1: the CPU load is most likely too high. Shut down one master node to lower it.
Case 2: another component is broken, for example flannel, coredns, or Docker. Check that each one is healthy, and above all check whether Docker reports errors; this is the key check.

10. Whatever the error, keep checking the state of flannel, coredns, and Docker. The problem is very often related to one of these components.

11. flannel is stuck in Init:0/1 and coredns cannot be created

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS     RESTARTS   AGE
coredns-69d58bd968-brz8w   0/1     Pending    0          9m6s
coredns-69d58bd968-jvfkf   0/1     Pending    0          9m7s
kube-flannel-ds-w2r7l      0/1     Init:0/1   0          3m32s

First check whether any Docker containers exist:

[root@test4 profile]# docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES

There is not a single container. Previously, even when a pod failed to start, a container ID at least showed up here; this time there is none at all, which is strange. The only remaining option was to look at the kubelet log:

[root@test4 profile]# cat /var/log/kubernetes/kubelet.test4.root.log.ERROR.20190210-071055.86336
Log file created at: 2019/02/10 07:10:55
Running on machine: test4
Binary: Built with gc go1.11.2 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0210 07:10:55.151126 86336 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
E0210 07:14:56.172087 86336 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The relevant message is: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Searching online suggests this is a network problem. Everything had been working and I had not touched the network, so that seemed unlikely. Then I remembered the firewall, checked its status, and found it running, as shown below.

The firewall had been disabled at the very start of the installation, so it is odd that it was running. My guess is that it was started automatically while the ip_vs kernel parameters were being configured, possibly as a default behavior. Either way, it just needs to be turned off.

[root@test4 profile]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-02-10 04:56:42 EST; 2h 28min ago
     Docs: man:firewalld(1)
 Main PID: 28767 (firewalld)
    Tasks: 2
   Memory: 372.0K
   CGroup: /system.slice/firewalld.service
           └─28767 /usr/bin/python2.7 /usr/sbin/firewalld --nofork --nopid

Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -...hain?).
Feb 10 07:10:27 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C PREROUTING -m addrtype -...t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C OUTPUT -m addrtype --dst...t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C POSTROUTING -s 172.17.0....t name.
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C DOCKER -i docker0 -j RET...hain?).
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -D FORWARD -i docker0 -o docker0 -...hain?).
Feb 10 07:10:28 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 -o...hain?).
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -i docker0 ! ...hain?).
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -j...t name.
Feb 10 07:10:29 test4 firewalld[28767]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t filter -C FORWARD -o docker0 -m...hain?).
Hint: Some lines were ellipsized, use -l to show in full.

I never touched the firewall, so I still do not know why it was running. Stop the firewall and recreate flannel; after a few minutes flannel reaches the Running state:

[root@test4 profile]# kubectl get pod -n kube-system
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-w2r7l   1/1     Running   0          6m33s

Checking the docker containers again now shows the flannel container:

[root@test4 profile]# docker ps -l
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS   NAMES
23930c2ebb47   b949a39093d6   "/opt/bin/flanneld -…"   13 minutes ago   Up 12 minutes           k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0

[root@test4 profile]# docker ps -a
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED          STATUS                      PORTS   NAMES
23930c2ebb47   b949a39093d6                                                 "/opt/bin/flanneld -…"   13 minutes ago   Up 13 minutes                       k8s_kube-flannel_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
9d886b599345   b949a39093d6                                                 "cp -f /etc/kube-fla…"   13 minutes ago   Exited (0) 13 minutes ago           k8s_install-cni_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0
5c85c66161fb   registry.access.redhat.com/rhel7/pod-infrastructure:latest   "/usr/bin/pod"           13 minutes ago   Up 13 minutes                       k8s_POD_kube-flannel-ds-w2r7l_kube-system_2f7fc646-2d2f-11e9-89e3-000c29dbd920_0

12. Pods stay in the Pending state

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-69d58bd968-brz8w   0/1     Pending   0          9m6s
coredns-69d58bd968-jvfkf   0/1     Pending   0          9m7s

The kubelet log keeps reporting "Unable to update cni config". The workaround I used was to stop using the CNI plugin altogether: removing the CNI-related options (--network-plugin, --cni-bin-dir, --cni-conf-dir) from the kubelet startup parameters resolved it.

[root@test4 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service;
static; vendor preset: disabled)
   Active: active (running) since Sun 2019-02-10 05:21:42 EST; 12min ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
 Main PID: 33598 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─33598 /opt/k8s/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig --cert-dir=/etc/kubernetes/cert --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --config=/etc/kubernetes/kubelet.config.json --hostname-override=test4 --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest --allow-privileged=true --alsologtostderr=true --logtostderr=false --log-dir=/var/log/kubernetes --v=2

Feb 10 05:34:16 test4 kubelet[33598]: W0210 05:34:16.651679 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:16 test4 kubelet[33598]: E0210 05:34:16.652336 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:21 test4 kubelet[33598]: W0210 05:34:21.656085 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:21 test4 kubelet[33598]: E0210 05:34:21.656587 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:26 test4 kubelet[33598]: W0210 05:34:26.665157 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:26 test4 kubelet[33598]: E0210 05:34:26.666018 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:31 test4 kubelet[33598]: W0210 05:34:31.669777 33598 cni.go:203] Unable to update cni config: No
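networks found in /etc/cni/net.d
Feb 10 05:34:31 test4 kubelet[33598]: E0210 05:34:31.671423 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Feb 10 05:34:36 test4 kubelet[33598]: W0210 05:34:36.677673 33598 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 10 05:34:36 test4 kubelet[33598]: E0210 05:34:36.679154 33598 kubelet.go:2192] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

13. Listing tokens with the kubeadm token command always fails with an error saying the base64 format is invalid

This happens because the token has expired. Rebuilding the k8s cluster resolves it; it does not mean the kubeadm command can no longer be used. The command I tried:

kubeadm token list --kubeconfig ~/.kube/config

14. On the second k8s installation, flannel is stuck in Init:0/1 again

[root@test4 ~]# kubectl get pod -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69d58bd968-gqsqz   0/1     ContainerCreating   0          11m
coredns-69d58bd968-wg2jf   0/1     ContainerCreating   0          11m
kube-flannel-ds-kljwc      0/1     Init:0/1            0          14m

At this point CPU usage had climbed to 93%. The cause may have been the CPU load, or it may have been the firewall. The following two steps fixed it:

Start the firewall, then stop it again.
Shut down one of the master nodes, because CPU usage was too high.

15. After installing the etcd cluster, one etcd member is reported as unhealthy

Solution: it does not affect usage, so simply continue with the installation. The cause was that the script printed errors during an earlier run and was stopped partway through; after the installation finished, one etcd node showed up as unhealthy. Alternatively, reinstalling from scratch also fixes it (tested and confirmed).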
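
The checks in items 10 to 12 (look for pods that are not Running, then match the kubelet log against the two messages seen above) can be sketched as a pair of small shell helpers. This is only an illustrative sketch: the function names not_running_pods and kubelet_log_hints are hypothetical and not part of kubectl or any Kubernetes tooling, and the hints simply restate what worked in this article rather than a general diagnosis.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any real tool).

# Read `kubectl get pod` output on stdin, skip the header line, and print
# the names of pods whose STATUS column is neither Running nor Completed.
not_running_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Read a kubelet log on stdin and print the follow-up check that this
# article found useful for each of the two messages it encountered.
kubelet_log_hints() {
    awk '
        /context deadline exceeded/               { fw = 1 }
        /No networks found in \/etc\/cni\/net.d/  { cni = 1 }
        END {
            if (fw)  print "check firewalld: systemctl status firewalld"
            if (cni) print "check CNI config: ls /etc/cni/net.d"
        }'
}

# Example usage on a node (left commented out so the sketch has no side effects):
#   kubectl get pod -n kube-system | not_running_pods
#   journalctl -u kubelet --no-pager | kubelet_log_hints
#   systemctl stop firewalld && systemctl disable firewalld
```

Both helpers only parse text, so they can be tried against saved command output before being run on a live node.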

 

posted @ 2019-02-09 15:31  effortsing