Monitoring node and pod resources with Prometheus, node_exporter, and cAdvisor

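1.1#Prometheus introduction
https://songjiayang.gitbooks.io/prometheus/content/introduction/
#Monitoring containers differs substantially from monitoring virtual or physical machines. In a k8s environment containers can be scaled out and in at will, so the monitoring service has to pick up newly created containers automatically and drop deleted ones promptly. The traditional zabbix approach requires installing and starting an agent inside every container, and it has no particularly good way to handle automatic discovery, registration, and template association for containers.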

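#Prometheus: early k8s versions relied on the heapster component to monitor pods and nodes. Starting with k8s 1.8, monitoring moved to the metrics API, and in 1.11 heapster was officially superseded. Since then, metrics-server supplies the core metrics, such as node CPU and memory usage, and the rest of the monitoring is handled by a separate component, Prometheus.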

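#Prometheus is an open-source combination of monitoring, alerting, and a time series database. It is written in Go and was originally developed at SoundCloud, and it is widely used with containers and microservices. Its main features are:
Data is stored in a multi-dimensional key-value format
Data is kept in a time series database rather than a traditional database such as MySQL; the current storage engine is its TSDB
Supports third-party dashboards for richer graphical interfaces, such as grafana (Grafana 2.5.0 and later)
Modular components
No dependency on external storage; data can be stored locally or remotely
Automatic service discovery
Powerful query language (PromQL, Prometheus Query Language)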
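
#To make the multi-dimensional key-value model concrete, here is a minimal Python sketch. It is a toy in-memory model, not how Prometheus stores data: the Sample type, the select helper, and the sample values are all hypothetical. Each series is a metric name plus a set of labels, and a query narrows the result by matching labels, much like a PromQL selector such as node_cpu_seconds_total{mode="idle"}.

```python
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    """One series sample: metric name, label set, current value."""
    name: str
    labels: Dict[str, str]
    value: float


def select(samples: List[Sample], name: str, **matchers: str) -> List[Sample]:
    """Return samples with the given name whose labels equal every matcher."""
    return [
        s for s in samples
        if s.name == name
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


# Hypothetical values, for illustration only.
SAMPLES = [
    Sample("node_cpu_seconds_total",
           {"instance": "172.31.7.111:9100", "cpu": "0", "mode": "idle"}, 1200.0),
    Sample("node_cpu_seconds_total",
           {"instance": "172.31.7.111:9100", "cpu": "0", "mode": "user"}, 300.0),
    Sample("node_cpu_seconds_total",
           {"instance": "172.31.7.112:9100", "cpu": "0", "mode": "idle"}, 900.0),
]

# Every idle-mode series, regardless of instance.
idle = select(SAMPLES, "node_cpu_seconds_total", mode="idle")
for s in idle:
    print(s.labels["instance"], s.value)
```

Adding another matcher (for example instance="172.31.7.111:9100") narrows the result along a second dimension without changing the metric name.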

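1.2#Components
prometheus server: the main service; accepts external http requests and collects, stores, and queries data
prometheus targets: statically configured targets to collect data from
service discovery: dynamic discovery of services
prometheus alerting: alert notification
push gateway: data collection proxy (similar to zabbix proxy)
data visualization and export: data visualization and data export (client access)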

1.3#Prometheus installation methods:
https://prometheus.io/download/ #official binary download and install; prometheus server listens on port 9090
https://prometheus.io/docs/prometheus/latest/installation/ #start directly from the docker image
https://github.com/coreos/kube-prometheus #operator deployment
# apt install prometheus #install with apt or yum

1.3.1#Container-based installation
# kubectl port-forward --help
# kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/prometheus-k8s 9090:9090

#Open http://192.168.7.101:9090 in a browser

# kubectl --namespace monitoring port-forward --address 0.0.0.0 svc/grafana 3000:3000

#Open http://192.168.7.101:3000 in a browser

#Expose the services via NodePort:
prometheus:
# pwd
/root/kube-prometheus/manifests
# vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 39090 #outside the default NodePort range 30000-32767; the kube-apiserver --service-node-port-range must include it
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
# kubectl apply -f prometheus-service.yaml

grafana:
# pwd
/root/kube-prometheus/manifests
# vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 33000 #outside the default NodePort range 30000-32767; the kube-apiserver --service-node-port-range must include it
  selector:
    app: grafana
# kubectl apply -f grafana-service.yaml
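
#The nodePort values used above (39090 and 33000) only work if the cluster's NodePort range has been widened, because the kube-apiserver default for --service-node-port-range is 30000-32767. A small Python sketch for checking a port before applying a manifest; the helper name is hypothetical and the check covers only the default range, not whatever range a given cluster is actually configured with.

```python
# Default kube-apiserver NodePort range: 30000-32767 inclusive.
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def in_default_node_port_range(port: int) -> bool:
    """True if the port can be used as a nodePort without changing the apiserver."""
    return port in DEFAULT_NODE_PORT_RANGE


# The two services from the manifests above.
for name, port in {"prometheus-k8s": 39090, "grafana": 33000}.items():
    if in_default_node_port_range(port):
        print(f"{name}: nodePort {port} fits the default range")
    else:
        print(f"{name}: nodePort {port} needs an extended --service-node-port-range")
```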

1.3.2#Binary installation:
# pwd
/usr/local/src
# tar xvf prometheus-2.24.1.linux-amd64.tar.gz
# ln -sv /apps/prometheus-2.24.1.linux-amd64 /apps/prometheus #assumes the extracted directory is located under /apps
'/apps/prometheus' -> '/apps/prometheus-2.24.1.linux-amd64'
# cd /apps/prometheus
# ll
prometheus.yml #configuration file
prometheus #prometheus server executable
promtool #test tool for checking prometheus configuration files, metrics data, etc.
# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 0 rule files found

#Create the prometheus startup script (systemd unit):
# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target

#Start the prometheus service:
# systemctl daemon-reload && systemctl restart prometheus && systemctl enable prometheus

#Access prometheus in a browser
http://ip:9090

2.#Collecting node metrics with prometheus and node exporter
#Install node_exporter on every node to collect metrics from each k8s node; it listens on port 9100 by default

2.1#Binary installation of node exporter:
# mkdir /apps
# cd /apps/
# tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
# ln -sv /apps/node_exporter-1.0.1.linux-amd64 /apps/node_exporter
# cd /apps/node_exporter
node_exporter #executable

2.2#Create the node exporter startup script (systemd unit):
# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/apps/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target

2.3#Start the node exporter service:
# systemctl daemon-reload && systemctl restart node-exporter && systemctl enable node-exporter.service

2.4#Access the node exporter web interface:
http://172.31.7.112:9100
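
#The /metrics page served by node exporter is plain text, one sample per line. The following Python sketch reads that format, assuming lines carry no trailing timestamp. The sample text and its values are made up, and parse_metrics and fetch_metrics are hypothetical helper names, not part of any Prometheus library.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus optional {labels}) to its value.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes there is no timestamp after the value.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split from the right so spaces inside label values are harmless.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(url: str) -> Dict[str, float]:
    """Fetch and parse a live endpoint, e.g. http://172.31.7.112:9100/metrics."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Made-up excerpt in the node exporter output format.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 3.221225472e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
"""

metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
```

fetch_metrics is defined but not called here, so the example runs without a reachable node.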

2.5#Collecting node metrics in prometheus
Configure prometheus to collect metrics through node exporter

2.5.1#Default prometheus configuration file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. (how often data is collected)
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. (how often rules are evaluated)
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting: # alert notification configuration
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files: # rule configuration
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # scrape target configuration
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

2.5.2#Collect node data with prometheus:
root@prometheus-server:~# cat /apps/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'promethues-node'
    static_configs:
    - targets: ['172.31.7.111:9100','172.31.7.112:9100']
root@prometheus-server:/apps/prometheus# systemctl restart prometheus.service
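
#With more than a handful of nodes, typing the targets list by hand is error-prone (a missing space after "targets:" is enough to break the YAML). The sketch below is one possible way to render the job fragment from a host list in Python. render_static_job is a hypothetical helper; it produces text only and does not validate it, so the result should still be checked with promtool check config.

```python
from typing import List


def render_static_job(job_name: str, hosts: List[str], port: int) -> str:
    """Render one scrape job with static targets, indented for scrape_configs."""
    targets = ",".join(f"'{host}:{port}'" for host in hosts)
    return (
        f"  - job_name: '{job_name}'\n"
        f"    static_configs:\n"
        f"    - targets: [{targets}]\n"
    )


# Same job name and node addresses as the configuration above.
snippet = render_static_job("promethues-node", ["172.31.7.111", "172.31.7.112"], 9100)
print(snippet)
```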

2.5.3#Restart the prometheus service:
# systemctl restart prometheus

2.5.4#Verify node status and node metrics in prometheus:
Open http://172.31.7.161:9090 in a browser
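
#Besides the web page, target health can be checked through the Prometheus HTTP API: an instant query for the built-in up metric returns 1 for every target that was scraped successfully and 0 otherwise. A minimal Python sketch follows. The response below is a hand-written example in the shape of the /api/v1/query result format, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_instances(response: Dict[str, Any]) -> List[str]:
    """Return the instance label of every series whose value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return [
        series["metric"].get("instance", "")
        for series in response["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


# Hand-written example response for the query `up`.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "instance": "172.31.7.111:9100", "job": "promethues-node"},
       "value": [1696727580.0, "1"]},
      {"metric": {"__name__": "up", "instance": "172.31.7.112:9100", "job": "promethues-node"},
       "value": [1696727580.0, "0"]}
    ]
  }
}
""")

print(build_query_url("http://172.31.7.161:9090", "up"))
print(down_instances(SAMPLE_RESPONSE))
```

Against a live server, the JSON would come from an HTTP GET of the URL printed on the first line.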

3.#grafana:
https://grafana.com/docs/ #official installation documentation
Reads data from prometheus for more polished visualization

3.1#Install grafana:
Installed version: v7.3.7
# pwd
/usr/local/src
# apt-get install -y adduser libfontconfig1
# dpkg -i grafana_<VERSION>_amd64.deb
# apt --fix-broken install -y

3.2#Configuration file:
# vim /etc/grafana/grafana.ini
[server]
# Protocol (http, https, socket)
protocol = http
# The ip address to bind to, empty will bind to all interfaces
http_addr = 0.0.0.0
# The http port to use
http_port = 3000

3.3#Start grafana:
# systemctl start grafana-server.service
# systemctl enable grafana-server.service

3.4#grafana web interface:
3.4.1#Login page
http://172.31.7.162:3000

3.4.2#Add the prometheus data source:
Web interface--Settings--data source--add data source--prometheus--select--enter the prometheus address--save&test
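
#The same data source can also be created without the web interface, through Grafana's HTTP API (POST /api/datasources). The Python sketch below only builds the request and does not send it. The data source name is an arbitrary choice, the addresses are the ones used in this guide, and authentication (for example an admin login or an API token) is deliberately left out, so a real call would need credentials added.

```python
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             name: str = "Prometheus") -> urllib.request.Request:
    """Build (but do not send) the request that registers a prometheus data source."""
    payload = {
        "name": name,            # display name in grafana, chosen freely
        "type": "prometheus",
        "url": prometheus_url,   # address grafana uses to reach prometheus
        "access": "proxy",       # grafana server proxies the queries
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_datasource_request("http://172.31.7.162:3000", "http://172.31.7.161:9090")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```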

3.4.3#Import a template:
Browse the dashboard templates on the web, pick one you like, note its ID, and enter that ID in grafana to import it

3.4.4#Verify the graphs
If the pie chart plugin is not installed, it has to be installed first
https://grafana.com/grafana/plugins/grafana-piechart-panel
Online installation:
# grafana-cli plugins install grafana-piechart-panel
Offline installation:
# pwd
/var/lib/grafana/plugins
# unzip grafana-piechart-panel-v1.3.8-0-g4f34110.zip
# mv grafana-piechart-panel-4f34110 grafana-piechart-panel
# systemctl restart grafana-server

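4#Monitoring pod resources:
#cadvisor is open-sourced by Google. It collects information about all containers running on a machine and also provides a basic query interface and an http endpoint that other components such as Prometheus can scrape. cAdvisor monitors the resources of the node and its containers in real time and gathers performance data, including CPU usage, memory usage, network throughput, and file system usage.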

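#Before k8s 1.12 the kubelet on each node exposed its built-in cadvisor on a dedicated port (--cadvisor-port). That flag was removed in 1.12, so this guide deploys cadvisor separately on each node.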

https://github.com/google/cadvisor

4.1#Prepare the cadvisor image:
# docker load -i cadvisor-v0.38.7.tar.gz
# docker tag gcr.io/cadvisor/cadvisor:v0.38.7 harbor.magedu.net/linux46/cadvisor:v0.38.7
# docker push harbor.magedu.net/linux46/cadvisor:v0.38.7

4.2#Start the cadvisor container
# docker run -it -d \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
harbor.magedu.net/linux46/cadvisor:v0.38.7

4.3#Verify the cadvisor web interface:
Access the cadvisor listening port on a node: http://192.168.7.110:8080/

4.4#Collect cadvisor data with prometheus:
root@k8s-master2:~# vim /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'promethues-node'
    static_configs:
    - targets: ['172.31.7.111:9100','172.31.7.112:9100']

  - job_name: 'prometheus-containers'
    static_configs:
    - targets: ["172.31.7.111:8080","172.31.7.112:8080"]

4.5#Restart prometheus:
# systemctl restart prometheus

4.6#Verify the data in prometheus:
http://172.31.7.161:9090
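
#When checking container data, keep in mind that cadvisor CPU metrics such as container_cpu_usage_seconds_total are cumulative counters, so the raw value keeps growing and only its rate of change is meaningful. The Python sketch below shows the idea with two made-up samples. It is a simplified stand-in for what the PromQL rate() function computes (the real function works over a window of samples and extrapolates), and counter_rate is a hypothetical helper.

```python
def counter_rate(v1: float, t1: float, v2: float, t2: float) -> float:
    """Per-second increase between two samples of a cumulative counter.

    v1/v2 are counter values, t1/t2 their timestamps in seconds.
    """
    if t2 <= t1:
        raise ValueError("second sample must be newer than the first")
    delta = v2 - v1
    if delta < 0:
        # Counter went backwards, e.g. the container restarted:
        # count only what accumulated since the reset.
        delta = v2
    return delta / (t2 - t1)


# Made-up samples taken one 15s scrape interval apart.
cores = counter_rate(120.0, 0.0, 123.0, 15.0)
print(f"average CPU usage: {cores:.2f} cores")
```

Three CPU-seconds consumed over fifteen seconds of wall time correspond to an average of 0.2 cores.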

4.7#Add pod monitoring templates in grafana:
Container dashboard template IDs: 395, 893

posted @ 2023-10-08 09:13 小糊涂90
