庄泽波's Blog

The palest ink beats the best memory

K8s: Installing metrics-server for Monitoring

1. Background

To check the resource usage of each pod in the cluster, run:

kubectl top pod

This fails with: error: Metrics API not available

Cause: the metrics service, metrics-server, is not installed.
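One way to confirm this diagnosis is to look at the Metrics API registration itself. The sketch below assumes the API is registered under the usual APIService name v1beta1.metrics.k8s.io; the function name check_metrics_api is made up for illustration.

```shell
# Report whether the Metrics API is registered and available.
# Assumption: the APIService is named v1beta1.metrics.k8s.io.
check_metrics_api() {
  status="$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null || true)"
  if [ "$status" = "True" ]; then
    echo "Metrics API available"
  else
    # Empty status means the APIService does not exist at all.
    echo "Metrics API not available (status: ${status:-not registered})"
    return 1
  fi
}
```

Calling check_metrics_api before the installation should report "not registered"; after a working installation it should report that the API is available.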

2. Installing metrics-server

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml -O metrics-server-components.yaml

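# Point the image at the Aliyun mirror (matches both the older k8s.gcr.io and the newer registry.k8s.io registry names)
sed -i -E 's#(k8s\.gcr\.io|registry\.k8s\.io)/metrics-server#registry.cn-hangzhou.aliyuncs.com/google_containers#g' metrics-server-components.yaml

kubectl apply -f metrics-server-components.yaml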
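After applying the manifest, it can be easier to wait for the rollout than to poll the pod list by hand. This is a minimal sketch: the helper name wait_for_metrics_server and the 120s default are assumptions, while the Deployment name and namespace are those created by the manifest above.

```shell
# Wait until the metrics-server Deployment is ready, or fail after a timeout.
# Usage: wait_for_metrics_server [timeout]   (default: 120s)
wait_for_metrics_server() {
  wait_timeout="${1:-120s}"
  kubectl -n kube-system rollout status deployment/metrics-server --timeout="$wait_timeout"
}
```

If the pod never passes its readiness probe, as in the pod listing that follows, this call should time out with a non-zero exit status rather than return success.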

root@master:~# kubectl get pods -A | grep metric
kube-system            metrics-server-6949b4d984-vw4qr              0/1     Running   0               12m
kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-xd55h   1/1     Running   6 (47h ago)     178d

The pod never becomes ready (0/1), so check its logs:

root@master:~# kubectl logs -n kube-system metrics-server-6949b4d984-vw4qr
I0315 16:06:02.791000       1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0315 16:06:03.673111       1 secure_serving.go:266] Serving securely on [::]:4443
I0315 16:06:03.673185       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0315 16:06:03.673199       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0315 16:06:03.673289       1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0315 16:06:03.673908       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0315 16:06:03.674004       1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0315 16:06:03.674072       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0315 16:06:03.674081       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0315 16:06:03.674127       1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0315 16:06:03.674136       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0315 16:06:03.676930       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:06:03.686907       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:06:03.773379       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0315 16:06:03.775261       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0315 16:06:03.775323       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
E0315 16:06:18.673101       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
E0315 16:06:18.677498       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
I0315 16:06:26.660430       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:06:33.670483       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:06:33.671549       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:06:36.661325       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0315 16:06:46.657357       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:06:48.668712       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
E0315 16:06:48.679730       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
I0315 16:06:56.656447       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:07:03.666515       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:07:03.678459       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
I0315 16:07:06.655265       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0315 16:07:16.658026       1 server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0315 16:07:18.677525       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.105:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.105 because it doesn't contain any IP SANs" node="node01"
E0315 16:07:18.677879       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="maste
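
The log is repetitive, so a per-node count of scrape failures makes the pattern easier to see. The helper below is only a sketch (the name summarize_scrape_errors is hypothetical); it reads metrics-server log lines on stdin and relies on the node="..." field shown in the output above.

```shell
# Count "Failed to scrape node" errors per node from metrics-server logs on stdin.
# Lines whose node="..." field is incomplete (for example, truncated) are skipped.
summarize_scrape_errors() {
  grep 'Failed to scrape node' \
    | sed -n 's/.* node="\([^"]*\)".*/\1/p' \
    | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be fed directly from the cluster, for example: kubectl logs -n kube-system deploy/metrics-server | summarize_scrape_errors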

3. Fixing the problem

The relevant error is:

E0315 16:07:03.678459       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.1.108:10250/metrics/resource\": x509: cannot validate certificate for 192.168.1.108 because it doesn't contain any IP SANs" node="master"
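
metrics-server connects to each kubelet by IP address, but the kubelet's serving certificate contains no IP SANs, so certificate validation fails. Solution: edit the metrics-server container args in metrics-server-components.yaml to skip verification of the kubelet certificate. The metrics-server documentation describes this flag as intended for testing purposes.
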
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls  # skips CA verification of the kubelet serving certificate
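
The same edit can be scripted instead of made by hand. The following is a sketch under two assumptions: GNU sed is available (as with the sed -i call earlier), and the manifest contains a --metric-resolution= arg line to anchor on, as in the snippet above. The function name add_insecure_tls_flag is hypothetical.

```shell
# Append --kubelet-insecure-tls to the metrics-server container args in a manifest file.
# Assumes GNU sed and an existing "- --metric-resolution=..." line to insert after.
add_insecure_tls_flag() {
  manifest="$1"
  # Do nothing if the flag is already present, so the function can be re-run safely.
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    return 0
  fi
  # Insert the new arg on the next line, reusing the indentation of the matched line.
  sed -i 's/^\( *\)- --metric-resolution=.*/&\n\1- --kubelet-insecure-tls/' "$manifest"
}
```

For example: add_insecure_tls_flag metrics-server-components.yaml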

Re-apply the manifest:

root@master:~# kubectl apply -f metrics-server-components.yaml 
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
root@master:~# kubectl get pods -A | grep metrics
kube-system            metrics-server-55db4f88bc-ztqjv              1/1     Running   0               108s
kubernetes-dashboard   dashboard-metrics-scraper-6f669b9c9b-xd55h   1/1     Running   6 (2d ago)      178d

The installation succeeded. View the metrics:

root@master:~# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   235m         5%     3363Mi          87%
node01   24m          1%     1591Mi          41%
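
Once kubectl top works, its output is easy to post-process. As an illustration only, the hypothetical helper high_memory_nodes below reads kubectl top nodes output on stdin and prints the nodes whose MEMORY% exceeds a threshold (80 by default, an arbitrary choice).

```shell
# Print nodes whose memory usage percentage is above a threshold.
# Reads `kubectl top nodes` output on stdin; the first line is the header.
# Usage: kubectl top nodes | high_memory_nodes [threshold]
high_memory_nodes() {
  threshold="${1:-80}"
  awk -v t="$threshold" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)            # strip the trailing percent sign
    if (pct + 0 > t + 0) print $1, $5
  }'
}
```

With the output above and a threshold of 80, only master (87%) is reported.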

posted on 2023-03-16 00:31 by 庄泽波
