K8S error: no metrics known for node
Today, after deploying metrics-server, I checked the pod logs and found a stream of errors:
]# kubectl logs -f -n kube-system metrics-server-d8669575f-xl6mw
I1202 09:09:31.217954 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
I1202 09:09:37.725863 1 secure_serving.go:116] Serving securely on [::]:443
E1202 09:09:49.807117 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:09:49.807185 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:49.807202 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:50.940606 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:09:53.825493 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:09:53.825540 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:09:53.825551 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:05.976306 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:21.291923 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/nginx-deployment-7d8599fbc9-68pf8: no metrics known for pod
E1202 09:10:31.601208 1 reststorage.go:135] unable to fetch node metrics for node "master": no metrics known for node
E1202 09:10:31.601330 1 reststorage.go:135] unable to fetch node metrics for node "node1": no metrics known for node
E1202 09:10:31.601353 1 reststorage.go:135] unable to fetch node metrics for node "node2": no metrics known for node
E1202 09:10:31.610963 1 reststorage.go:160] unable to fetch pod metrics for pod kube-system/kube-flannel-ds-64qdh: no metrics known for pod
E1202 09:10:31.611032 1 reststorage.go:160] unable to fetch pod metrics for pod linux40/magedu-tomcat-app1-deployment-6cd664c5bd-wprjb: no metrics known for pod
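The same few nodes repeat throughout a log like this, so it can help to reduce it to the distinct node names. The following is a minimal sketch; the function name `nodes_without_metrics` is hypothetical, and it assumes the log lines keep the `unable to fetch node metrics for node "<name>"` wording seen in the log.

```shell
# Hypothetical helper: read metrics-server log text on stdin and print
# each node named in an "unable to fetch node metrics" error, once.
nodes_without_metrics() {
    # grep -o emits every match separately, so this also works when
    # several log entries have been joined onto a single line.
    grep -o 'unable to fetch node metrics for node "[^"]*"' |
        cut -d'"' -f2 |
        sort -u
}
```

It could be fed from the pod log, for example `kubectl logs -n kube-system metrics-server-d8669575f-xl6mw | nodes_without_metrics`.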
Inspecting the pod details did not reveal any useful error message:
]# kubectl describe pod metrics-server-6c97c89fd5-j2rql -n kube-system
Name:                 metrics-server-6c97c89fd5-j2rql
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 node2/192.168.64.112
Start Time:           Thu, 02 Dec 2021 16:50:50 +0800
Labels:               k8s-app=metrics-server
                      pod-template-hash=6c97c89fd5
Annotations:          <none>
Status:               Running
IP:                   10.244.2.61
IPs:
  IP:           10.244.2.61
Controlled By:  ReplicaSet/metrics-server-6c97c89fd5
Containers:
  metrics-server:
    Container ID:  docker://eac4a2db02ca75315047eb778b7d3e1d7543d10ed6d33b4b1eddb006f824e34e
    Image:         mirrorgooglecontainers/metrics-server-amd64:v0.3.6
    Image ID:      docker://sha256:9dd718864ce61b4c0805eaf75f87b95302960e65d4857cb8b6591864394be55b
    Port:          4443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=4443
      --kubelet-preferred-address-types=InternalIP
      --kubelet-use-node-status-port
      --kubelet-insecure-tls
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 02 Dec 2021 16:51:10 +0800
      Finished:     Thu, 02 Dec 2021 16:51:11 +0800
    Ready:          False
    Restart Count:  2
    Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-4xrbc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  metrics-server-token-4xrbc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-4xrbc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned kube-system/metrics-server-6c97c89fd5-j2rql to node2
  Normal   Pulled     16s (x3 over 34s)  kubelet, node2     Container image "mirrorgooglecontainers/metrics-server-amd64:v0.3.6" already present on machine
  Normal   Created    16s (x3 over 34s)  kubelet, node2     Created container metrics-server
  Normal   Started    16s (x3 over 34s)  kubelet, node2     Started container metrics-server
  Warning  BackOff    8s (x5 over 32s)   kubelet, node2     Back-off restarting failed container
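When the cluster was deployed, the certificates were signed without the IP address of each node. As a result, when metrics-server contacts a node by IP, validation fails because the certificate carries no matching IP (error: x509: cannot validate certificate for 192.168.33.11 because it doesn't contain any IP SANs).
We can add the --kubelet-insecure-tls flag to skip certificate verification: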
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.6
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        # Skip TLS verification of the kubelet certificate
        - --kubelet-insecure-tls
        # Communicate with nodes via their internal IP
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
        resources:
          limits:
            cpu: 300m
            memory: 200Mi
          requests:
            cpu: 200m
            memory: 100Mi
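To confirm that a missing IP SAN really is the cause, one option is to dump a node's kubelet certificate with `openssl` and look for the node IP among its Subject Alternative Names. The following is a minimal sketch; the function name `has_ip_san` is hypothetical, and it assumes the input is the text form printed by `openssl x509 -noout -text`, where IP entries appear as `IP Address:<ip>`.

```shell
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin
# and succeed only if the given IP is listed as an "IP Address" SAN.
has_ip_san() {
    # -F matches the text literally (dots are not wildcards);
    # -w stops 192.168.33.1 from matching inside 192.168.33.11.
    grep -qwF "IP Address:$1"
}
```

Assuming the kubelet listens on its default port 10250 and using the IP from the error message, a check might look like `openssl s_client -connect 192.168.33.11:10250 </dev/null 2>/dev/null | openssl x509 -noout -text | has_ip_san 192.168.33.11 && echo "IP SAN present" || echo "IP SAN missing"`.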
The more I learn, the more I realize how little I know.