Monitoring node and pod resources with Prometheus, node_exporter and cAdvisor
1.1#Introduction to Prometheus
https://songjiayang.gitbooks.io/prometheus/content/introduction/
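#Monitoring containers differs substantially from monitoring virtual machines or physical hosts. In a k8s environment containers can be scaled out and in at any time. The monitoring system therefore has to pick up newly created containers automatically and drop deleted ones promptly. The traditional Zabbix approach requires installing and starting an agent in every container. It also has no good mechanism for automatic discovery, registration and template association of containers.
#Prometheus: early versions of k8s monitored pods and nodes through the heapster component. Starting with k8s 1.8, monitoring moved to the metrics API, and in 1.11 heapster was officially deprecated and replaced. Since then, metrics-server provides the core metrics, such as node CPU and memory usage. All other monitoring is left to a separate component, Prometheus.
#Prometheus is an open-source suite that combines monitoring, alerting and a time-series database. It is written in Go and was originally developed at SoundCloud. It is widely used for containers and microservices. Its main characteristics are:
Data is stored in a multi-dimensional key-value (label) format
It uses a time-series database (currently its own TSDB) instead of a traditional database such as MySQL
Third-party dashboards such as Grafana (version 2.5.0 and later) can provide a richer graphical interface
Modular, component-based design
No dependency on external storage; data can be kept locally or sent to remote storage
Automatic service discovery
A powerful query language (PromQL, Prometheus Query Language)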
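The multi-dimensional key-value model shows up in the text format that exporters serve. Each sample is a metric name, a set of labels and a value, and each distinct name/label combination is its own time series. The following minimal Python sketch splits one such line into its parts. It assumes simple label values without escaped quotes. For a full parser, the official prometheus_client library provides `prometheus_client.parser.text_string_to_metric_families`.

```python
import re

# metric_name{label="value",...} sample_value [timestamp]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs; simplified: values must not contain escaped quotes
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for one exposition-format sample line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_str, value = match.groups()
    labels = dict(LABEL_RE.findall(label_str or ""))
    return name, labels, float(value)


# The (name, labels) pair identifies the time series; the float is its current value.
print(parse_sample('node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67'))
```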
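1.2#Components
prometheus server: the main service; accepts external HTTP requests and collects, stores and queries data
prometheus targets: statically configured scrape targets
service discovery: dynamically discovered targets
prometheus alerting: alert notification
push gateway: an intermediary that accepts pushed metrics (similar to a Zabbix proxy)
data visualization and export: data visualization and export (client access)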
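The push gateway is meant for short-lived jobs that may finish before prometheus can scrape them. Such a job pushes its metrics over HTTP to `/metrics/job/<job_name>` on the gateway, which listens on port 9091 by default. Prometheus then scrapes the gateway. The following is a minimal sketch that uses only the Python standard library. The gateway address, job name and metric name in the usage comment are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_push(gateway, job, metrics, grouping=None):
    """Build (url, body) for pushing gauge values to a Pushgateway.

    metrics:  dict of metric name -> value (no labels, for simplicity)
    grouping: optional dict of extra grouping labels appended to the URL path
    """
    path = f"/metrics/job/{quote(job, safe='')}"
    for key, value in (grouping or {}).items():
        path += f"/{quote(key, safe='')}/{quote(value, safe='')}"
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = "\n".join(lines) + "\n"  # the text format requires every line to end with \n
    return gateway.rstrip("/") + path, body.encode()


def push(gateway, job, metrics, grouping=None):
    """Send the metrics; PUT replaces every metric in this group,
    whereas POST would replace only metrics with the same name."""
    url, body = build_push(gateway, job, metrics, grouping)
    request = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status


# Usage (hypothetical gateway and metric):
# push("http://pushgateway.example:9091", "nightly_backup",
#      {"backup_last_success_timestamp_seconds": 1612345678})
```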
1.3#Prometheus installation methods:
https://prometheus.io/download/ #official binary download and installation; prometheus server listens on port 9090
https://prometheus.io/docs/prometheus/latest/installation/ #start directly from the Docker image
https://github.com/coreos/kube-prometheus #operator-based deployment
# apt install prometheus #install with apt or yum
1.3.1#Container-based installation
#kubectl port-forward --help
# kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090:9090
#access in a browser: http://192.168.7.101:9090
#kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/grafana 3000:3000
#access in a browser: http://192.168.7.101:3000
#Exposing the services via NodePort:
prometheus:
# pwd
/root/kube-prometheus/manifests
# vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090 #must fall within kube-apiserver's --service-node-port-range (default 30000-32767)
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
# kubectl apply -f prometheus-service.yaml
grafana:
# pwd
/root/kube-prometheus/manifests
# vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000 #must also fall within --service-node-port-range
  selector:
    app: grafana
# kubectl apply -f grafana-service.yaml
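Both Services above use nodePort values (39090 and 33000) that lie outside kube-apiserver's default `--service-node-port-range` of 30000-32767. The API server only accepts them if the cluster was set up with a wider range. The following small Python sketch checks a port against such a range string. The 30000-40000 value is a hypothetical custom range.

```python
DEFAULT_NODE_PORT_RANGE = "30000-32767"  # kube-apiserver default


def node_port_allowed(port, port_range=DEFAULT_NODE_PORT_RANGE):
    """Return True if port lies inside an inclusive 'low-high' range string."""
    low, high = (int(part) for part in port_range.split("-"))
    return low <= port <= high


for port in (39090, 33000):
    # default range vs. a hypothetical widened range
    print(port, node_port_allowed(port), node_port_allowed(port, "30000-40000"))
```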
1.3.2#Binary installation:
# pwd
/usr/local/src
# mkdir -p /apps && tar xvf prometheus-2.24.1.linux-amd64.tar.gz -C /apps/
# ln -sv /apps/prometheus-2.24.1.linux-amd64 /apps/prometheus
'/apps/prometheus' -> '/apps/prometheus-2.24.1.linux-amd64'
# cd /apps/prometheus
# ll
prometheus.yml #configuration file
prometheus #prometheus server executable
promtool #test tool for checking prometheus configuration files, metrics data, etc.
# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found
#Create the prometheus systemd unit:
# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
#Start the prometheus service:
# systemctl daemon-reload && systemctl restart prometheus && systemctl enable prometheus
#Access prometheus in a browser:
http://ip:9090
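Besides the web UI, prometheus exposes the `/-/healthy` and `/-/ready` endpoints. They return HTTP 200 once the server is up and ready to serve queries, which makes them handy for scripted checks after `systemctl restart`. The following is a minimal sketch using the Python standard library. The `fetch` parameter exists only so the logic can be exercised without a live server.

```python
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code
    except OSError:  # connection refused, timeout, DNS failure
        return None


def check_prometheus(base_url, fetch=http_status):
    """Map 'healthy' and 'ready' to True when the endpoint answers 200."""
    base = base_url.rstrip("/")
    return {name: fetch(f"{base}/-/{name}") == 200 for name in ("healthy", "ready")}


# Usage against the server installed above (replace ip with the real address):
# print(check_prometheus("http://ip:9090"))
```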
2.#Collecting node metrics with prometheus via node exporter
#Install node_exporter on every k8s node to collect that node's metrics; it listens on port 9100 by default
2.1#Installing node exporter from the binary package:
# mkdir /apps
# cd /apps/
# tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
# ln -sv /apps/node_exporter-1.0.1.linux-amd64 /apps/node_exporter
# cd /apps/node_exporter
node_exporter #executable
2.2#Create the node exporter systemd unit:
# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/apps/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
2.3#Start the node exporter service:
# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service
2.4#Access the node exporter web page:
http://172.31.7.112:9100/metrics
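node_exporter reports CPU time as cumulative counters: `node_cpu_seconds_total`, with one series per cpu and mode. A utilization figure therefore has to be derived from the difference between two scrapes. The following minimal Python sketch shows that arithmetic. It assumes the two samples have already been parsed into dicts keyed by (cpu, mode). In PromQL the same idea is usually written as `1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))`.

```python
def cpu_usage(prev, curr):
    """Fraction of CPU time spent outside the idle mode between two samples.

    prev, curr: dicts mapping (cpu, mode) -> cumulative seconds, taken from
    node_cpu_seconds_total at two scrape times (curr is the later one).
    """
    keys = curr.keys() & prev.keys()  # ignore series present in only one sample
    total = sum(curr[k] - prev[k] for k in keys)
    idle = sum(curr[k] - prev[k] for k in keys if k[1] == "idle")
    if total <= 0:
        return 0.0  # no elapsed CPU time, or the counters went backwards
    return 1 - idle / total


earlier = {("0", "idle"): 100.0, ("0", "user"): 50.0, ("0", "system"): 10.0}
later = {("0", "idle"): 103.0, ("0", "user"): 51.0, ("0", "system"): 11.0}
print(f"CPU usage: {cpu_usage(earlier, later):.0%}")
```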
2.5#Collecting node metrics with prometheus
Configure prometheus to scrape the metrics exposed by node exporter
2.5.1#Default prometheus configuration file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. #how often targets are scraped; defaults to 1 minute if unset
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. #how often rules are evaluated; defaults to 1 minute if unset
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting: #alert notification settings
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files: #rule files
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: #scrape target configuration
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
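The three global timing settings interact. Prometheus rejects a configuration whose scrape_timeout is larger than scrape_interval, and any unset value falls back to the defaults noted in the comments (1m, 1m, 10s). The following minimal Python sketch performs that check. The duration parser covers the common units (ms, s, m, h, d, w, y). Capping an unset timeout at the interval is an assumption about how prometheus fills in the default.

```python
import re

_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
# "ms" comes before "m" in the alternation so that "500ms" parses as milliseconds.
_PART_RE = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration such as '15s' or '1m30s' to seconds."""
    pos, seconds = 0, 0.0
    for match in _PART_RE.finditer(text):
        if match.start() != pos:  # unexpected characters between parts
            raise ValueError(f"invalid duration: {text!r}")
        seconds += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return seconds


def effective_global(cfg):
    """Apply the defaults (1m, 1m, 10s) and enforce scrape_timeout <= scrape_interval."""
    interval = parse_duration(cfg.get("scrape_interval", "1m"))
    evaluation = parse_duration(cfg.get("evaluation_interval", "1m"))
    if "scrape_timeout" in cfg:
        timeout = parse_duration(cfg["scrape_timeout"])
        if timeout > interval:
            raise ValueError("scrape_timeout is greater than scrape_interval")
    else:
        # Assumption: an unset timeout is 10s, capped at the scrape interval.
        timeout = min(10.0, interval)
    return {"scrape_interval": interval, "scrape_timeout": timeout,
            "evaluation_interval": evaluation}


print(effective_global({"scrape_interval": "15s", "evaluation_interval": "15s"}))
```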
2.5.2#Scraping node data with prometheus:
root@prometheus-server:~# cat /apps/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['172.31.7.111:9100','172.31.7.112:9100']
root@prometheus-server:/apps/prometheus# systemctl restart prometheus.service
2.5.3#Restart the prometheus service:
# systemctl restart prometheus
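Editing static_configs and restarting works for two nodes. Prometheus can also read targets from JSON or YAML files through file_sd_configs, and it picks up changes to those files without a restart. The following minimal Python sketch writes such a file for the node exporters. The target path and the file_sd_configs job shown in the comment are assumptions and are not part of the setup above.

```python
import json
import os
import tempfile

# Assumed prometheus.yml job that reads this file (not part of the setup above):
#   - job_name: 'prometheus-node'
#     file_sd_configs:
#     - files: ['/apps/prometheus/targets/node-*.json']


def write_file_sd(path, hosts, port=9100, labels=None):
    """Write one file_sd target group for hosts and return it."""
    groups = [{"targets": [f"{host}:{port}" for host in hosts], "labels": labels or {}}]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it into place,
    # so prometheus never reads a half-written target file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(groups, fh, indent=2)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
    return groups


# Usage (hypothetical path):
# write_file_sd("/apps/prometheus/targets/node-exporters.json",
#               ["172.31.7.111", "172.31.7.112"])
```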
2.5.4#Verify node status and node metrics in prometheus:
Open http://172.31.7.161:9090 in a browser and check Status > Targets
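The Targets page information is also available from the HTTP API. The `up` metric is 1 for every target whose last scrape succeeded and 0 otherwise. The following minimal Python sketch queries `/api/v1/query` and lists the targets that are down. The server address in the usage comment is the one used above.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build an instant-query URL for the prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response_json):
    """Return sorted 'job/instance' strings whose up value is 0."""
    payload = json.loads(response_json)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value is a [unix_time, "string"] pair
        if float(value) == 0:
            labels = series["metric"]
            down.append(f'{labels.get("job")}/{labels.get("instance")}')
    return sorted(down)


def fetch(base_url, promql):
    with urllib.request.urlopen(query_url(base_url, promql), timeout=5) as resp:
        return resp.read().decode()


# Usage (needs a reachable prometheus server):
# print(down_targets(fetch("http://172.31.7.161:9090", "up")))
```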
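3.#grafana:
https://grafana.com/docs/ #official installation documentation
Grafana queries the data stored in prometheus and presents it in more polished visualizations
3.1#Installing grafana:
Installed version: v7.3.7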
# pwd
/usr/local/src
# apt-get install -y adduser libfontconfig1
# dpkg -i grafana_<VERSION>_amd64.deb
# apt --fix-broken install -y
3.2#Configuration file:
# vim /etc/grafana/grafana.ini
[server]
# Protocol (http, https, socket)
protocol = http
# The ip address to bind to, empty will bind to all interfaces
http_addr = 0.0.0.0
# The http port to use
http_port = 3000
3.3#Start grafana:
# systemctl start grafana-server.service
# systemctl enable grafana-server.service
3.4#grafana web UI:
3.4.1#Login page
http://172.31.7.162:3000
3.4.2#Add prometheus as a data source:
Web UI > Configuration > Data Sources > Add data source > Prometheus > Select > enter the prometheus URL > Save & Test
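The same data source can also be created through Grafana's HTTP API (`POST /api/datasources`). This is convenient when several Grafana instances need identical settings. The following minimal Python sketch only builds the request. The admin/admin credentials in the usage comment are Grafana's defaults; replace them if they were changed at first login.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, prometheus_url, user, password, name="Prometheus"):
    """Build a POST /api/datasources request that registers a prometheus data source."""
    payload = {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries prometheus ("Server" access mode)
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Usage (sends the request; addresses are the ones used in this article):
# req = datasource_request("http://172.31.7.162:3000", "http://172.31.7.161:9090", "admin", "admin")
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status, resp.read().decode())
```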
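3.4.3#Import a dashboard template:
Browse the dashboard templates on the web (https://grafana.com/grafana/dashboards), pick one you like and note its ID, then enter that ID in Grafana to import it
3.4.4#Verify the graphs
The pie chart plugin is not installed by default and must be installed first
https://grafana.com/grafana/plugins/grafana-piechart-panel
Online installation:
# grafana-cli plugins install grafana-piechart-panel
Offline installation: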
# pwd
/var/lib/grafana/plugins
# unzip grafana-piechart-panel-v1.3.8-0-g4f34110.zip
# mv grafana-piechart-panel-4f34110 grafana-piechart-panel
# systemctl restart grafana-server
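4#Monitoring pod resources:
#cadvisor is open-sourced by Google. It collects information about all containers running on a machine and also provides a basic query UI and an HTTP interface that other components, such as Prometheus, can scrape. cAdvisor monitors the node's resources and containers in real time and collects performance data, including CPU usage, memory usage, network throughput and filesystem usage.
#Before k8s 1.12, the kubelet on each node served cadvisor's standalone web UI and port (4194). Starting with 1.12, the kubelet no longer exposes that port because the --cadvisor-port flag was removed. cAdvisor is still embedded in the kubelet, whose /metrics/cadvisor endpoint continues to expose its metrics. To get the standalone cadvisor UI and HTTP endpoint used below, cadvisor has to be deployed separately on each node.
https://github.com/google/cadvisor
4.1#Preparing the cadvisor image: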
# docker load -i cadvisor-v0.38.7.tar.gz
# docker tag gcr.io/cadvisor/cadvisor:v0.38.7 harbor.magedu.net/linux46/cadvisor:v0.38.7
# docker push harbor.magedu.net/linux46/cadvisor:v0.38.7
4.2#Start the cadvisor container
# docker run -it -d \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
harbor.magedu.net/linux46/cadvisor:v0.38.7
4.3#Verify the cadvisor web UI:
Access the cadvisor port on a node: http://192.168.7.110:8080/
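cAdvisor serves its metrics in the prometheus text format at `/metrics` on the same port (8080 here), and this is what prometheus scrapes in the next step. The following minimal Python sketch reads `container_memory_usage_bytes` from that output and reports usage per container. It assumes label values without escaped quotes. It also skips series with an empty name label, because those belong to cgroups rather than individual containers.

```python
import re

SAMPLE_RE = re.compile(r'^container_memory_usage_bytes\{(.*)\}\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def memory_by_container(metrics_text):
    """Map container name -> memory usage in MiB from cAdvisor's /metrics output.

    Series with an empty or missing name label (cgroup entries such as "/"
    or "/system.slice") are skipped.
    """
    usage = {}
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comments (# HELP / # TYPE) and other metrics
        labels = dict(LABEL_RE.findall(match.group(1)))
        name = labels.get("name", "")
        if name:
            usage[name] = float(match.group(2)) / (1024 * 1024)
    return usage


# Usage (the text would come from http://<node>:8080/metrics):
# import urllib.request
# text = urllib.request.urlopen("http://192.168.7.110:8080/metrics", timeout=5).read().decode()
# print(memory_by_container(text))
```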
4.4#Scraping cadvisor data with prometheus:
root@k8s-master2:~# vim /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['172.31.7.111:9100','172.31.7.112:9100']
  - job_name: 'prometheus-containers'
    static_configs:
    - targets: ["172.31.7.111:8080","172.31.7.112:8080"]
4.5#Restart prometheus:
# systemctl restart prometheus
4.6#Verify the data in prometheus:
http://172.31.7.161:9090
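Once the prometheus-containers job shows up, per-container usage can be queried in the Graph tab. The following sketch assembles two commonly used expressions from cAdvisor's metrics. The job label matches the job_name configured above. The 1m rate window is an arbitrary choice, and the function itself is a hypothetical helper.

```python
def container_queries(job="prometheus-containers", window="1m"):
    """Return PromQL expressions for per-container CPU and memory.

    The name!="" matcher drops cAdvisor's cgroup entries and keeps the
    series that belong to actual containers.
    """
    selector = f'{{job="{job}", name!=""}}'
    return {
        # CPU cores in use: per-second rate of the cumulative CPU-seconds counter
        "cpu_cores": f"sum by (name) (rate(container_cpu_usage_seconds_total{selector}[{window}]))",
        # Working-set memory in bytes (usage minus inactive file cache)
        "memory_working_set_bytes": f"sum by (name) (container_memory_working_set_bytes{selector})",
    }


for title, expr in container_queries().items():
    print(f"{title}: {expr}")
```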
4.7#Add pod monitoring dashboards in grafana:
Container dashboard template IDs: 395, 893