Installing Metrics Monitoring on K8s
1. Background
Check the resource usage of each pod in the k8s cluster:
kubectl top pod
Result: error: Metrics API not available
Cause: the metrics service, metrics-server, is not installed.
2. Installing metrics-server
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml
sed -i 's/k8s.gcr.io\/metrics-server/registry.cn-hangzhou.aliyuncs.com\/google_containers/g' metrics-server-components.yaml
kubectl apply -f metrics-server-components.yaml
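The sed expression above only matches manifests that reference k8s.gcr.io. Depending on the release that "latest" resolves to, the manifest may reference registry.k8s.io instead, in which case nothing is rewritten. A minimal sketch that handles both before the manifest is applied; the function name rewrite_image and the v0.0.0 tag in the demo are hypothetical, and the mirror is the one used above:

```shell
# Mirror registry taken from the sed command above.
MIRROR='registry.cn-hangzhou.aliyuncs.com/google_containers'

rewrite_image() {
  # $1: path to the manifest. The rewritten manifest goes to stdout,
  # so the input file itself is left untouched.
  sed -E "s#(k8s\.gcr\.io|registry\.k8s\.io)/metrics-server#${MIRROR}#g" "$1"
}

# Demo on a hypothetical two-line fragment (the image tag is illustrative only).
sample="$(mktemp)"
printf '%s\n' \
  '        image: registry.k8s.io/metrics-server/metrics-server:v0.0.0' \
  '        imagePullPolicy: IfNotPresent' > "$sample"

rewrite_image "$sample"
```

Checking the result with grep 'image:' before running kubectl apply shows whether the rewrite took effect.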
root@master:~# kubectl get pods -A | grep metric
kube-system            metrics-server-6949b4d984-vw4qr              0/1   Running   0             12m
kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-xd55h   1/1   Running   6 (47h ago)   178d
The metrics-server pod never becomes ready (0/1), so check its logs:
root@master:~# kubectl logs -n kube-system metrics-server-6949b4d984-vw4qr
I0315 16:06:02.791000 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0315 16:06:03.673111 1 secure_serving.go:266] Serving securely on [::]:4443
I0315 16:06:03.673185 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0315 16:06:03.673199 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0315 16:06:03.673289 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0315 16:06:03.673908 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0315 16:06:03.674004 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0315 16:06:03.674072 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0315 16:06:03.674081 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0315 16:06:03.674127 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0315 16:06:03.674136 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0315 16:06:03.676930 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:06:03.686907 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:06:03.773379 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0315 16:06:03.775261 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0315 16:06:03.775323 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0315 16:06:18.673101 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
E0315 16:06:18.677498 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
I0315 16:06:26.660430 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:06:33.670483 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:06:33.671549 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:06:36.661325 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0315 16:06:46.657357 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:06:48.668712 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
E0315 16:06:48.679730 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
I0315 16:06:56.656447 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:07:03.666515 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:07:03.678459 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:07:06.655265 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0315 16:07:16.658026 1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:07:18.677525 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:07:18.677879 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="maste
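Most of this log is startup noise; the lines that matter are the repeated "Failed to scrape node" errors. A small sketch for condensing them into one count per node. The function name scrape_failures is hypothetical, and in a live cluster its input would come from the kubectl logs command above:

```shell
# Count "Failed to scrape node" errors per node from log text on stdin.
scrape_failures() {
  grep 'Failed to scrape node' \
    | sed -E 's/.*node="([^"]*)".*/\1/' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'   # normalise to "<node> <count>"
}

# Demo on two of the error lines from the log above.
scrape_failures <<'EOF'
I0315 16:06:03.673111 1 secure_serving.go:266] Serving securely on [::]:4443
E0315 16:06:03.676930 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:06:03.686907 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
EOF
```

If every node shows up with the same x509 error, as here, the cause is cluster-wide rather than a single broken kubelet.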
3. Fixing the problem
The relevant error is:
E0315 16:07:03.678459 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
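Solution: edit the container args in metrics-server-components.yaml so that metrics-server skips CA verification of the kubelet certificates:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls   # --kubelet-insecure-tls skips CA verification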
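The same edit can be scripted instead of made by hand. The following is only a sketch: the function name add_insecure_tls is hypothetical, and it assumes the manifest already contains a "- --metric-resolution=" line to copy the indentation from. Note that the metrics-server documentation describes --kubelet-insecure-tls as intended for testing purposes.

```shell
# Add --kubelet-insecure-tls to the metrics-server args,
# reusing the indentation of the existing --metric-resolution line.
add_insecure_tls() {
  # $1: manifest path; the patched manifest goes to stdout
  if grep -q -- '--kubelet-insecure-tls' "$1"; then
    cat "$1"    # flag already present, pass the file through unchanged
  else
    awk '{ print }
         /- --metric-resolution=/ {
           sub(/- --metric-resolution=.*/, "- --kubelet-insecure-tls")
           print
         }' "$1"
  fi
}

# Demo on a trimmed-down fragment of the args list (not the full manifest).
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
      - args:
        - --cert-dir=/tmp
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
EOF
add_insecure_tls "$manifest"
```

On the real file this would be used as add_insecure_tls metrics-server-components.yaml > patched.yaml, followed by kubectl apply -f patched.yaml.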
Apply the manifest again:
root@master:~# kubectl apply -f metrics-server-components.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
root@master:~# kubectl get pods -A | grep metrics
kube-system            metrics-server-55db4f88bc-ztqjv              1/1   Running   0            108s
kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-xd55h   1/1   Running   6 (2d ago)   178d
The installation succeeded (the pod is now 1/1). View the metrics:
root@master:~# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 235m 5% 3363Mi 87%
node01 24m 1% 1591Mi 41%
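With metrics available, the output can be filtered like any other text. As a small illustration, the sketch below lists nodes whose MEMORY% exceeds a threshold; the function name high_memory_nodes and the threshold of 80 are assumptions chosen for the example, and on a live cluster the input would be piped from kubectl top nodes:

```shell
# Print "<node> <memory%>" for nodes above a memory threshold.
high_memory_nodes() {
  # $1: threshold in percent; reads `kubectl top nodes` output on stdin
  awk -v limit="$1" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%/, "", pct)           # "87%" -> "87"
      if (pct + 0 > limit + 0) print $1, $5
    }'
}

# Demo on the output shown above.
high_memory_nodes 80 <<'EOF'
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 235m 5% 3363Mi 87%
node01 24m 1% 1591Mi 41%
EOF
# prints: master 87%
```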