k8s series --- Resource Metrics API and Custom Metrics API
2019-01-29 10:41 dribs (reference: https://www.linuxea.com/2112.html)
Previously, resource metrics could only be viewed by collecting them with Heapster, but Heapster is being deprecated.
Starting with Kubernetes v1.8, a new mechanism was introduced: resource metrics are exposed through the API itself.
Resource metrics: metrics-server
Custom metrics: Prometheus, k8s-prometheus-adapter
The new-generation architecture therefore looks like this:
1) Core metrics pipeline: made up of the kubelet, metrics-server, and the APIs served by the API server; it provides cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.
2) Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage systems, and the HPA. It carries the core metrics plus many non-core metrics; non-core metrics cannot be interpreted by Kubernetes itself.
metrics-server is itself an API server, and it only collects CPU utilization, memory utilization, and similar resource metrics.
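Once metrics-server (deployed below) is registered, the core pipeline can be queried through the aggregated API directly, without a proxy. A quick sketch; these are standard kubectl calls, and node1 is simply one of the node names used later in this post:

# Same data that `kubectl top nodes` renders, as raw NodeMetrics objects:
[root@master ~]# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
# Narrow it down to a single node:
[root@master ~]# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes/node1"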
[root@master ~]# kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
Visit https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server for the YAML files. Note that the files there have since been updated and differ from the ones used in the video.
Here are my modified YAML files, kept for future reference.

[root@master metrics-server]# cat auth-delegator.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system

[root@master metrics-server]# cat auth-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system

[root@master metrics-server]# cat metrics-apiservice.yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
The key file is this one:

[root@master metrics-server]# cat metrics-server-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.1
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.1
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.1
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.1
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10250
        #- --deprecated-kubelet-completely-insecure=true
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: mirrorgooglecontainers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --cpu=100m
        - --extra-cpu=0.5m
        - --memory=100Mi
        - --extra-memory=50Mi
        - --threshold=5
        - --deployment=metrics-server-v0.3.1
        - --container=metrics-server
        - --poll-period=300000
        - --estimator=exponential
        # Specifies the smallest cluster (defined in number of nodes)
        # resources will be scaled to.
        - --minClusterSize=10
      volumes:
      - name: metrics-server-config-volume
        configMap:
          name: metrics-server-config
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"

[root@master metrics-server]# cat metrics-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: https

[root@master metrics-server]# cat resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
If applying the files downloaded from GitHub fails, use my metrics-server-deployment.yaml above instead: delete the deployment and re-apply it.
[root@master metrics-server]# kubectl apply -f ./
[root@master ~]# kubectl proxy --port=8080
Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. Mine initially went into Error because of problems in the YAML; after several rounds of changes it finally reached Running with the final version shown above.
[root@master metrics-server]# kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
canal-mgbc2                             3/3     Running   12         3d23h
canal-s4xgb                             3/3     Running   23         3d23h
canal-z98bc                             3/3     Running   15         3d23h
coredns-78d4cf999f-5shdq                1/1     Running   0          6m4s
coredns-78d4cf999f-xj5pj                1/1     Running   0          5m53s
etcd-master                             1/1     Running   13         17d
kube-apiserver-master                   1/1     Running   13         17d
kube-controller-manager-master          1/1     Running   19         17d
kube-flannel-ds-amd64-8xkfn             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-t7jpc             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-vlbjz             1/1     Running   0          <invalid>
kube-proxy-ggcbf                        1/1     Running   11         17d
kube-proxy-jxksd                        1/1     Running   11         17d
kube-proxy-nkkpc                        1/1     Running   12         17d
kube-scheduler-master                   1/1     Running   19         17d
kubernetes-dashboard-76479d66bb-zr4dd   1/1     Running   0          <invalid>
metrics-server-v0.3.1-76b796b-4xgvp     2/2     Running   0          9s
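Independently of the pod status, you can also confirm that the aggregation layer actually registered the metrics API. A small sketch using standard kubectl commands:

# The APIService object created by metrics-apiservice.yaml should show Available=True:
[root@master metrics-server]# kubectl get apiservice v1beta1.metrics.k8s.io
# If it shows False, describe it to see the failure reason:
[root@master metrics-server]# kubectl describe apiservice v1beta1.metrics.k8s.io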
To view the error log, use -c to specify the container name; this pod runs two containers and metrics-server is only one of them. Query the other container the same way, just swap in its name.
[root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp -c metrics-server -n kube-system
The error messages looked roughly like these:
403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)

E0903 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error (Connection refused)

E1109 09:54:49.509521 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]
At one point I followed advice from the internet and modified the coredns config, which only made the log report "unable" for every pod. I reverted the change and deleted the coredns pods, letting the cluster regenerate two fresh coredns containers.
- --kubelet-insecure-tls
This flag disables TLS verification, which is generally not recommended in production. And because DNS cannot resolve the node hostnames, the flag
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
works around that. Another option is to modify coredns, but I don't recommend it.
See this issue: https://github.com/kubernetes-incubator/metrics-server/issues/131
metrics-server unable to fetch pod metrics for pod
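When you see "connection refused" as in the logs above, it helps to probe the kubelet endpoint by hand from the master. A hedged sketch; substitute one of your own node IPs, and note that 10250 is the kubelet's secure port (the 10255 read-only port in the logs above is likely disabled on kubeadm clusters):

# -k skips certificate verification, the curl equivalent of --kubelet-insecure-tls.
# Even an Unauthorized/Forbidden response proves the port is reachable;
# "connection refused" means the kubelet is not listening there at all.
[root@master ~]# curl -sk https://10.10.240.203:10250/stats/summary | head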
Those are the problems I hit; the YAML above fixed all of them for me. There is still the question of why flannel loses its DirectRouting setting every time the cluster machines reboot, forcing me to delete flannel and recreate it; that issue was covered in an earlier post.
At this point the following commands all succeed, and the items array is populated:
[root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeMetrics",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "PodMetrics",
      "verbs": [
        "get",
        "list"
      ]
    }
  ]
}
[root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14868    0 14868    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1613k
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "pod1",
        "namespace": "prod",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
        "creationTimestamp": "2019-01-29T02:39:12Z"
      },
[root@master metrics-server]# kubectl top pods
NAME                CPU(cores)   MEMORY(bytes)
filebeat-ds-4llpp   1m           2Mi
filebeat-ds-dv49l   1m           5Mi
myapp-0             0m           1Mi
myapp-1             0m           2Mi
myapp-2             0m           1Mi
myapp-3             0m           1Mi
myapp-4             0m           2Mi
[root@master metrics-server]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   206m         5%     1377Mi          72%
node1    88m          8%     534Mi           28%
node2    78m          7%     935Mi           49%
Custom metrics (Prometheus)
As you can see, our metrics pipeline now works. But metrics-server only monitors CPU and memory; for anything else, such as user-defined metrics, it cannot help, and that is where Prometheus comes in.
Deploying Prometheus involves quite a few pieces:
node_exporter is the agent;
PromQL is the SQL-like language used to query the data (see the query sketch after this list);
k8s-prometheus-adapter: Kubernetes cannot consume Prometheus metrics directly, so k8s-prometheus-adapter converts them into an API Kubernetes can read;
kube-state-metrics aggregates cluster state data.
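As a taste of PromQL, once the Prometheus service below is exposed on NodePort 30090 you can run ad-hoc queries against its HTTP API. A sketch; 172.16.1.100 is the master IP used elsewhere in this post, and up is a metric Prometheus always records for its scrape targets:

# Ask Prometheus which scrape targets are up (1) or down (0):
[root@master ~]# curl -G 'http://172.16.1.100:30090/api/v1/query' --data-urlencode 'query=up'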
Now let's deploy it all.
Visit https://github.com/ikubernetes/k8s-prom
[root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git
First create a namespace named prom:
[root@master k8s-prom]# kubectl apply -f namespace.yaml
namespace/prom created
Deploy node_exporter:
[root@master k8s-prom]# cd node_exporter/
[root@master node_exporter]# ls
node-exporter-ds.yaml  node-exporter-svc.yaml
[root@master node_exporter]# kubectl apply -f .
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
[root@master node_exporter]# kubectl get pods -n prom
NAME                             READY   STATUS    RESTARTS   AGE
prometheus-node-exporter-dmmjj   1/1     Running   0          7m
prometheus-node-exporter-ghz2l   1/1     Running   0          7m
prometheus-node-exporter-zt2lw   1/1     Running   0          7m
Deploy Prometheus:
[root@master k8s-prom]# cd prometheus/
[root@master prometheus]# ls
prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
[root@master prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
Looking at all resources in the prom namespace, pod/prometheus-server-76dc8df7b-hw8xc was stuck Pending; the events showed insufficient memory:
[root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom
Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.
Edit prometheus-deploy.yaml and delete the three lines that set the memory limit:
        resources:
          limits:
            memory: 2Gi
Then re-apply:
[root@master prometheus]# kubectl apply -f prometheus-deploy.yaml
[root@master prometheus]# kubectl get all -n prom
NAME                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-node-exporter-dmmjj       1/1     Running   0          10m
pod/prometheus-node-exporter-ghz2l       1/1     Running   0          10m
pod/prometheus-node-exporter-zt2lw       1/1     Running   0          10m
pod/prometheus-server-65f5d59585-6l8m8   1/1     Running   0          55s

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   56s
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         10m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3       3            3           <none>          10m

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-server   1         1         1            1           56s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-server-65f5d59585   1         1         1       56s
As seen above, thanks to the NodePort service type, the Prometheus application inside the container is reachable on host port 30090.
It's best to mount a PVC for storage, otherwise the monitoring data disappears whenever the pod restarts; a sketch follows.
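A minimal sketch of what that could look like, assuming the cluster has a default StorageClass; the claim name prometheus-data is made up here, and the volume name must match whatever data volume your prometheus-deploy.yaml actually declares:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data            # hypothetical name
  namespace: prom
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi

Then, in prometheus-deploy.yaml, point the data volume at the claim instead of an emptyDir:

      volumes:
      - name: prometheus-storage-volume   # use the volume name from your own prometheus-deploy.yaml
        persistentVolumeClaim:
          claimName: prometheus-data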
Deploy kube-state-metrics, which aggregates data:
[root@master k8s-prom]# cd kube-state-metrics/
[root@master kube-state-metrics]# ls
kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
[root@master kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master kube-state-metrics]# kubectl get all -n prom
NAME                                      READY   STATUS    RESTARTS   AGE
pod/kube-state-metrics-58dffdf67d-v9klh   1/1     Running   0          14m

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kube-state-metrics   ClusterIP   10.111.41.139   <none>        8080/TCP   14m
Deploy k8s-prometheus-adapter; this one needs a self-signed certificate:
[root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
...........................................................................................+++
...............+++
e is 65537 (0x10001)
Create the certificate signing request:
[root@master pki]# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
Sign the certificate:
[root@master pki]# openssl x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key
Create the secret holding the certificate and key:
[root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created
Note: cm-adapter-serving-certs is the name referenced inside custom-metrics-apiserver-deployment.yaml.
[root@master pki]# kubectl get secrets -n prom
NAME                            TYPE                                  DATA   AGE
cm-adapter-serving-certs        Opaque                                2      51s
default-token-knsbg             kubernetes.io/service-account-token   3      4h
kube-state-metrics-token-sccdf  kubernetes.io/service-account-token   3      3h
prometheus-token-nqzbz          kubernetes.io/service-account-token   3      3h
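For context, the adapter's stock deployment consumes that secret as a volume, roughly like this (a sketch of the relevant fragment, not the exact file):

      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs   # must match the secret created above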
Deploy k8s-prometheus-adapter:
[root@master k8s-prom]# cd k8s-prometheus-adapter/
[root@master k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml
Because the latest k8s-prometheus-adapter is incompatible with k8s v1.11.2 (the 1.13 manifests don't work either), the fix is to download the latest custom-metrics-apiserver-deployment.yaml from https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests and change its namespace to prom; do the same for custom-metrics-config-map.yaml.
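A quick way to do the namespace rewrite, assuming the upstream manifests use namespace custom-metrics (check with grep first):

[root@master k8s-prometheus-adapter]# grep -n 'namespace:' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml
[root@master k8s-prometheus-adapter]# sed -i 's/namespace: custom-metrics/namespace: prom/' custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml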
[root@master k8s-prometheus-adapter]# kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
deployment.apps/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
serviceaccount/custom-metrics-apiserver created
service/custom-metrics-apiserver created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
[root@master k8s-prometheus-adapter]# kubectl get all -n prom
NAME                                          READY   STATUS    RESTARTS   AGE
pod/custom-metrics-apiserver-65f545496-64lsz   1/1     Running   0          6m
pod/kube-state-metrics-58dffdf67d-v9klh        1/1     Running   0          4h
pod/prometheus-node-exporter-dmmjj             1/1     Running   0          4h
pod/prometheus-node-exporter-ghz2l             1/1     Running   0          4h
pod/prometheus-node-exporter-zt2lw             1/1     Running   0          4h
pod/prometheus-server-65f5d59585-6l8m8         1/1     Running   0          4h

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/custom-metrics-apiserver   ClusterIP   10.103.87.246   <none>        443/TCP          36m
service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         4h
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   4h
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         4h

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3       3            3           <none>          4h

NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/custom-metrics-apiserver   1         1         1            1           36m
deployment.apps/kube-state-metrics         1         1         1            1           4h
deployment.apps/prometheus-server          1         1         1            1           4h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/custom-metrics-apiserver-5f6b4d857d   0         0         0       36m
replicaset.apps/custom-metrics-apiserver-65f545496    1         1         1       6m
replicaset.apps/custom-metrics-apiserver-86ccf774d5   0         0         0       17m
replicaset.apps/kube-state-metrics-58dffdf67d         1         1         1       4h
replicaset.apps/prometheus-server-65f5d59585          1         1         1       4h
Finally, every resource in the prom namespace is Running.
[root@master k8s-prometheus-adapter]# kubectl api-versions
custom.metrics.k8s.io/v1beta1
The custom.metrics.k8s.io/v1beta1 API is now visible. (On my cluster it didn't actually show up in the listing, but that didn't affect anything.)
Start a proxy:
[root@master k8s-prometheus-adapter]# kubectl proxy --port=8080
Now the metric data is visible:
[root@master pki]# curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
    {
      "name": "pods/ceph_rocksdb_submit_transaction_sync",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_deployment_created",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_pod_owner",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
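You can also drill into a single metric. A sketch; pick the namespace and metric name from your own listing (http_requests here assumes a workload that actually exports it, such as the metrics-app image referenced at the end of this post):

# All pods in the prom namespace; '*' is a wildcard over pod names:
[root@master pki]# curl 'http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/namespaces/prom/pods/*/http_requests'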
Now we can happily create an HPA (horizontal pod autoscaler).
Prometheus can also be integrated with Grafana, as follows.
First download grafana.yaml from https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml
[root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
Modify grafana.yaml as follows:
Change namespace: kube-system to prom (there are two occurrences).
Comment out these two entries under env:
- name: INFLUXDB_HOST
  value: monitoring-influxdb
Add type: NodePort at the very end:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort
[root@master pro]# kubectl apply -f grafana.yaml
deployment.extensions/monitoring-grafana created
service/monitoring-grafana created
[root@master pro]# kubectl get pods -n prom
NAME                                 READY   STATUS    RESTARTS   AGE
monitoring-grafana-ffb4d59bd-gdbsk   1/1     Running   0          5s
If there are still problems, delete those resources and apply them again.
The grafana pod is now running.
[root@master pro]# kubectl get svc -n prom
NAME                 TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
monitoring-grafana   NodePort   10.106.164.205   <none>        80:32659/TCP   19m
We can now visit the master host IP: http://172.16.1.100:32659
The port in the data-source screenshot is 9090; fill in whatever port your own svc actually uses. Apart from changing 80 to 9090 nothing else changes. That address form works because the services are in the same namespace, so they can reach each other by service name.
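Concretely, when adding the Prometheus data source in Grafana, the URL can simply use the in-cluster service name (both services live in the prom namespace):

http://prometheus.prom.svc:9090     # full in-cluster service DNS name
http://prometheus:9090              # short form also works, since grafana runs in the same namespace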
[root@master pro]# kubectl get svc -n prom
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
custom-metrics-apiserver   ClusterIP   10.109.58.249   <none>        443/TCP          52m
kube-state-metrics         ClusterIP   10.103.52.45    <none>        8080/TCP         69m
monitoring-grafana         NodePort    10.110.240.31   <none>        80:31128/TCP     17m
prometheus                 NodePort    10.110.19.171   <none>        9090:30090/TCP   145m
prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         146m
After that, the data shows up in the UI.
Download a Grafana dashboard template for monitoring Kubernetes with Prometheus from https://grafana.com/dashboards/6417
Then import the downloaded template in the Grafana UI:
After importing it, the monitoring data appears:
I didn't re-run the HPA part hands-on this time since I had done it before; it is copied over below, so work through any problems on your own.
HPA (horizontal pod autoscaling)
When pods come under load, the HPA automatically scales the number of pods to spread the pressure.
At the moment HPA has two versions; v1 only supports core metric definitions, i.e. it can only scale pods based on CPU utilization.
[root@master pro]# kubectl explain hpa.spec.scaleTargetRef
scaleTargetRef: the target resource whose metrics drive the scaling decision
[root@master pro]# kubectl api-versions | grep auto
autoscaling/v1
autoscaling/v2beta1
So both hpa v1 and hpa v2 are supported.
Now let's create a pod, myapp, with resource limits, from the command line:
[root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
service/myapp created
deployment.apps/myapp created
[root@master ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-6985749785-fcvwn   1/1     Running   0          58s
Now let's give the myapp pod automatic horizontal scaling with kubectl autoscale, which is really just a way to declare an HPA controller:
[root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
horizontalpodautoscaler.autoscaling/myapp autoscaled
--min: minimum number of pods
--max: maximum number of pods
--cpu-percent: target CPU utilization
[root@master ~]# kubectl get hpa
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp   Deployment/myapp   0%/60%    1         8         1          4m
[root@master ~]# kubectl get svc
NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
myapp   ClusterIP   10.105.235.197   <none>        80/TCP    19
Now change the service to NodePort:
[root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
service/myapp patched
[root@master ~]# kubectl get svc
NAME    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
myapp   NodePort   10.105.235.197   <none>        80:31990/TCP   22m
[root@master ~]# yum install httpd-tools    # installs the ab load-testing tool
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-6985749785-fcvwn   1/1     Running   0          25m   10.244.2.84   node2
Start load-testing with ab:
[root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.16.1.100 (be patient)
Wait a while and pod CPU utilization climbs to 98%, which calls for scaling out to 2 pods:
[root@master ~]# kubectl describe hpa
  resource cpu on pods (as a percentage of request): 98% (49m) / 60%
Deployment pods: 1 current / 2 desired
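These numbers follow the HPA's scaling rule, which is worth spelling out once:

desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
                = ceil(1 * 98 / 60)
                = 2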
[root@master ~]# kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
myapp-6985749785-fcvwn   49m          3Mi            (the total CPU we set is 50m)
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-fcvwn   1/1     Running   0          32m   10.244.2.84    node2
myapp-6985749785-sr4qv   1/1     Running   0          2m    10.244.1.105   node1
We can see it has automatically scaled to 2 pods. Wait a bit longer and, as CPU pressure keeps climbing, it expands to 4 or more pods:
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-2mjrd   1/1     Running   0          1m    10.244.1.107   node1
myapp-6985749785-bgz6p   1/1     Running   0          1m    10.244.1.108   node1
myapp-6985749785-fcvwn   1/1     Running   0          35m   10.244.2.84    node2
myapp-6985749785-sr4qv   1/1     Running   0          5m    10.244.1.105   node1
Once the load test stops, the pod count shrinks back to normal.
Above we used hpa v1 for horizontal pod autoscaling; as said before, hpa v1 can only scale on CPU utilization.
Next, hpa v2, which can scale pods on custom metric utilization.
Before using hpa v2, delete the hpa v1 object created earlier so it doesn't interfere with the hpa v2 test:
[root@master hpa]# kubectl delete hpa myapp
horizontalpodautoscaler.autoscaling "myapp" deleted
Now create an hpa v2:
[root@master hpa]# cat hpa-v2-demo.yaml
apiVersion: autoscaling/v2beta1        # this marks it as hpa v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                      # the object to autoscale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                       # minimum replica count
  maxReplicas: 10
  metrics:                             # which metrics to evaluate
  - type: Resource                     # evaluate based on a resource metric
    resource:
      name: cpu
      targetAverageUtilization: 55     # scale out once pod CPU usage exceeds 55%
  - type: Resource
    resource:
      name: memory                     # hpa v1 only handles cpu; v2 can also evaluate memory
      targetAverageValue: 50Mi         # scale out once pod memory usage exceeds 50Mi
[root@master hpa]# kubectl apply -f hpa-v2-demo.yaml
horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created
[root@master hpa]# kubectl get hpa
NAME           REFERENCE          TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa-v2   Deployment/myapp   3723264/50Mi, 0%/55%   1         10        1          37s
Right now there is only one pod:
[root@master hpa]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-6985749785-fcvwn   1/1     Running   0          57m   10.244.2.84   node2
Start the load test:
[root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html
Check what hpa v2 observes:
[root@master hpa]# kubectl describe hpa
Metrics:                                               (current / target)
  resource memory on pods:                             3756032 / 50Mi
  resource cpu on pods (as a percentage of request):   82% (41m) / 55%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 2 desired
[root@master hpa]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-8frq4   1/1     Running   0          1m    10.244.1.109   node1
myapp-6985749785-fcvwn   1/1     Running   0          1h    10.244.2.84    node2
It automatically scaled out to 2 pods. Once the load test stops, the pod count shrinks back to normal.
Beyond scaling on CPU and memory utilization with hpa v2, in the future we can also scale on things like HTTP request concurrency.
For example:
[root@master hpa]# cat hpa-v2-custom.yaml
apiVersion: autoscaling/v2beta1        # hpa v2 again
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                      # the object to autoscale
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                       # minimum replica count
  maxReplicas: 10
  metrics:                             # which metrics to evaluate
  - type: Pods                         # evaluate based on a pod-level (custom) metric
    pods:
      metricName: http_requests        # the custom metric name
      targetAverageValue: 800m         # a Kubernetes quantity: 800m = 0.8 requests per pod on average
For a concurrency-based HPA, the demo image at https://hub.docker.com/r/ikubernetes/metrics-app/ can be used as a reference.