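Prometheus Study Notes: cAdvisor
I. Introduction to cAdvisor
Monitoring Pod metrics requires cAdvisor, which was open-sourced by Google. cAdvisor not only collects information about every container running on a machine, it also provides a basic query UI and an HTTP interface, so that other components such as Prometheus can scrape the data.
cAdvisor monitors the resources and containers on a node in real time and collects performance data, including CPU usage, memory usage, network throughput, and filesystem usage.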
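II. Deploying cAdvisor as a DaemonSet
1. Prepare the manifest
Reference manifests: https://github.com/google/cadvisor/tree/master/deploy/kubernetes/base
The upstream manifests are organized with kustomize. I skip kustomize here and use a single configuration file instead: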
apiVersion: v1
kind: Namespace
metadata:
  name: cadvisor  # custom namespace; change as needed
---
apiVersion: apps/v1  # for Kubernetes versions before 1.9.0 use apps/v1beta2
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: cadvisor
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      tolerations:  # tolerate the master's NoSchedule taint; check the actual taints with kubectl describe node
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane  # your taint may differ from mine; verify it
      hostNetwork: true
      containers:
      - name: cadvisor
        image: gcr.io/cadvisor/cadvisor:v0.39.3  # not downloadable from mainland China by default; arrange your own mirror
        resources:
          requests:
            memory: 400Mi
            cpu: 400m
          limits:
            memory: 2000Mi
            cpu: 800m
        securityContext:
          privileged: true  # privileged mode is required
        volumeMounts:  # the readOnly mount options have been removed
        - name: rootfs
          mountPath: /rootfs
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
        - name: docker
          mountPath: /var/lib/docker
        ports:
        - name: http
          containerPort: 8080
          hostPort: 8080  # defaults to the container port if omitted; adjust as needed
          protocol: TCP
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/containerd/  # because my container runtime is containerd; for Docker, change this to /var/lib/docker
2. Apply the manifest
kubectl apply -f daemonset.yaml
kubectl get pods -n cadvisor
NAME             READY   STATUS    RESTARTS   AGE
cadvisor-5d2wq   1/1     Running   0          5m
cadvisor-lgb2b   1/1     Running   0          5m
cadvisor-wsvh7   1/1     Running   0          5m
netstat -tnlp|grep 8080  # the port must match the hostPort in the manifest
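The netstat check has to be run on each node in turn. As a rough alternative, the port can be probed from any machine that can reach the nodes. The following is only a minimal sketch: `port_is_open` is a hypothetical helper name, not part of cAdvisor or Kubernetes, and it only proves that something accepts TCP connections on the port, not that it is cAdvisor.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_open("192.168.100.131", 8080)` would check one of the nodes used as a scrape target later in these notes; substitute your own node addresses.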
3. Verify through the web UI
Open port 8080 of any cluster node in a browser.
Then view the metrics endpoint (/metrics).
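The metrics endpoint returns plain text in the Prometheus exposition format: comment lines starting with `#`, followed by one sample per line. To make that format concrete, here is a simplified parser sketch. The `SAMPLE` text is made up for illustration (it is not real output from my cluster), and the regular expressions cover only the common cases, not every escape rule of the format.

```python
import re

# name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Illustrative sample only; real cAdvisor output has many more labels.
SAMPLE = '''\
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/",image="",name=""} 1.5e+09 1650000000000
container_memory_usage_bytes{id="/kubepods/pod1",image="nginx:1.21",name="web"} 52428800 1650000000000
'''


def parse_metrics(text):
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value, _timestamp = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    if labels.get("image"):  # keep only series that belong to a real container image
        print(labels["name"], value)
```

Series with an empty `image` label are cgroup-level aggregates rather than individual containers, which is why most queries in these notes filter on `image!=""`.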
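III. Common cAdvisor metrics and example queries
Common examples
(1) Container CPU usage (CPU cores in use)
sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)
(2) Container memory usage (unit: bytes)
container_memory_usage_bytes{image!=""}
(3) Container network receive rate (unit: bytes/second)
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without(interface)
(4) Container network transmit rate (unit: bytes/second)
sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without(interface)
(5) Container filesystem read rate (unit: bytes/second)
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
(6) Container filesystem write rate (unit: bytes/second)
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
(7) Container network receive rate (bytes/second, averaged over 1 minute), grouped by container name using name=~".+"
sum(rate(container_network_receive_bytes_total{name=~".+"}[1m])) by (name)
(8) Container network transmit rate (bytes/second, averaged over 1 minute), grouped by container name using name=~".+"
sum(rate(container_network_transmit_bytes_total{name=~".+"}[1m])) by (name)
(9) System-mode CPU time used by all containers (per-second rate over 1 minute)
sum(rate(container_cpu_system_seconds_total[1m]))
(10) System-mode CPU time used by each container (per-second rate over 1 minute)
sum(irate(container_cpu_system_seconds_total{image!=""}[1m])) without (cpu)
(11) CPU usage of each container, as a percentage of one core
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100
(12) Total CPU usage of all containers, as a percentage of one core
sum(sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100)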
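Most of the queries above wrap a counter in rate() or irate(). Roughly speaking, rate() is the average per-second increase over the whole window, while irate() looks only at the last two samples in the window. The sketch below illustrates that idea on made-up (timestamp, value) pairs. It is a simplification: the real rate() also extrapolates to the window boundaries, and `simple_rate` / `simple_irate` are hypothetical names, not Prometheus functions.

```python
def simple_rate(samples):
    """Average per-second increase over (timestamp_seconds, value) counter samples."""
    if len(samples) < 2:
        return None  # at least two samples are needed
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. container restart),
        # so the new value is counted as an increase from zero.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    return simple_rate(samples[-2:])


# Made-up container_cpu_usage_seconds_total samples, scraped every 15 s.
cpu_seconds = [(0, 100.0), (15, 103.0), (30, 106.0), (45, 109.0), (60, 115.0)]
print("rate :", simple_rate(cpu_seconds))   # averaged over the full minute
print("irate:", simple_irate(cpu_seconds))  # reacts to the most recent jump
```

Because the counter is in CPU-seconds, a rate of 0.25 means a quarter of one core on average, which is why examples (11) and (12) multiply by 100 to get a percentage of one core.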
IV. Configuring Prometheus to scrape cAdvisor
1. Configure Prometheus
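- job_name: "cadvisor"
  metric_relabel_configs:
  # Replace the value of name with container_label_io_kubernetes_pod_name. By default name holds the
  # container ID rather than the Pod name, and the Grafana dashboard also uses the name label,
  # so without this change it is hard to tell Pods apart.
  - source_labels: ['container_label_io_kubernetes_pod_name']
    target_label: 'name'
    action: 'replace'
  static_configs:
  - targets: ["192.168.100.131:8080","192.168.100.132:8080","192.168.100.133:8080"]

curl -X POST http://127.0.0.1:9090/-/reload  # if hot reload (--web.enable-lifecycle) is not enabled, restart Prometheus instead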
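To show what the replace rule does to a single series, here is a small sketch of the relabel step as I understand it from the Prometheus documentation: the source label values are joined, matched against a fully anchored regex (default `(.*)`), and the expanded replacement (default `$1`) is written to the target label. `relabel_replace` is a hypothetical function written for illustration, and the label values are made up.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Sketch of one 'replace' relabel step applied to a dict of labels."""
    source_value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, source_value)
    if match is None:
        return dict(labels)  # regex did not match: labels are left untouched
    # Expand $1, $2, ... with the captured groups.
    new_value = re.sub(r"\$(\d+)",
                       lambda ref: match.group(int(ref.group(1))) or "",
                       replacement)
    result = dict(labels)
    if new_value == "":
        result.pop(target_label, None)  # an empty value removes the label
    else:
        result[target_label] = new_value
    return result


before = {
    "name": "0a1b2c3d4e5f",  # made-up container ID
    "container_label_io_kubernetes_pod_name": "web-0",  # made-up Pod name
}
after = relabel_replace(before, ["container_label_io_kubernetes_pod_name"], "name")
print(after["name"])
```

One consequence worth checking on your own data: a series that has no container_label_io_kubernetes_pod_name label ends up with an empty replacement, so its name label is dropped rather than kept.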
2. Verify the cAdvisor data in Prometheus
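Besides checking the targets and running queries in the Prometheus web UI, the same check can be scripted against the Prometheus HTTP API (`/api/v1/query`). The sketch below is one way to do it with only the standard library; the function names are my own, and the base URL assumes Prometheus listens on 127.0.0.1:9090 as in the reload command above.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def extract_vector(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def query(base_url, promql, timeout=5.0):
    """Run promql against a live Prometheus server (needs network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_vector(json.load(resp))
```

Calling `query("http://127.0.0.1:9090", 'up{job="cadvisor"}')` should return one entry per scrape target, with value 1.0 for each node whose cAdvisor endpoint is being scraped successfully.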
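V. Grafana configuration: a dashboard template for monitoring Pods
1. Create a new dashboard
2. Import the matching template; this example uses template ID 14282
3. View the dashboard data
There is no shortage of talk about doing things "once and for all", but things that are actually done once and for all are extremely rare.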