Prometheus (3): Building a Prometheus federation and a VictoriaMetrics storage cluster

1. Collecting node metrics through Prometheus federation

1.1 Deploy Prometheus (the same steps apply to the Prometheus server and to the federate nodes)

Download

mkdir /apps
cd /apps
wget https://github.com/prometheus/prometheus/releases/download/v2.40.7/prometheus-2.40.7.linux-amd64.tar.gz
tar -xvf prometheus-2.40.7.linux-amd64.tar.gz
ln -s /apps/prometheus-2.40.7.linux-amd64 /apps/prometheus

Create the systemd unit for the Prometheus service

cat >>/etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml --web.enable-lifecycle
[Install]
WantedBy=multi-user.target
EOF

Start the service

systemctl daemon-reload
systemctl enable --now prometheus.service
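
A quick sanity check; the /-/healthy and /-/ready endpoints are part of Prometheus' standard management API, and 9090 is the default port:

systemctl is-active prometheus.service
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9090/-/ready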

1.2 Deploy node_exporter

Note: if node-exporter has already been deployed in the k8s environment by some other means, stop it or change its listening port first to avoid a port conflict.
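
If in doubt, check whether anything is already listening on 9100 before installing; the port of this node_exporter instance can also be changed later with its --web.listen-address flag:

ss -ntlp | grep 9100        # is anything already listening on node_exporter's default port?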

1.2.1 Download the binary

Download page: https://github.com/prometheus/node_exporter/releases

mkdir /apps
cd /apps
wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar -xvf node_exporter-1.4.0.linux-amd64.tar.gz
ln -s /apps/node_exporter-1.4.0.linux-amd64 /apps/node_exporter

1.2.2 Create the service unit

cat >>/etc/systemd/system/node-exporter.service <<EOF
[Unit]
Description=Prometheus Node Exporter
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
ExecStart=/apps/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
EOF

1.2.3 Start the node-exporter service

systemctl daemon-reload
systemctl enable --now node-exporter.service

1.2.4 Verify the status

# check the service status
[root@k8s-node1 apps]#systemctl is-active node-exporter.service
active
# check the listening port
[root@k8s-node1 apps]#netstat -ntlp|grep 9100
tcp6 0 0 :::9100 :::* LISTEN 3276156/node_export

1.2.5 Verify the node-exporter web page
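
The page can be checked from a browser at http://<node-ip>:9100/metrics, or with curl (10.0.0.84 is node1 from the federation setup below):

curl -s http://10.0.0.84:9100/metrics | head -n 20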

1.2.6 View node-exporter metrics

https://knowledge.zhaoweiguo.com/build/html/cloudnative/prometheus/metrics/kubernetes-nodes.html

1.2.7 Common metrics

node_boot_time system boot time, from which the node's uptime is derived
node_cpu CPU usage (time spent in each CPU mode)
node_disk* disk I/O
node_filesystem* filesystem usage
node_load1 system load average over the last minute
node_memory* memory usage
node_network* network traffic metrics
go_* Go runtime metrics of the node exporter itself
process_* metrics about the node exporter's own process
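
As an illustration of how these metrics are typically combined, a few standard PromQL expressions (generic examples, not specific to this environment):

# CPU usage per node, percent, over the last 5 minutes
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
# memory usage, percent
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# root filesystem usage, percent
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100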

1.3 Configure scraping on the federate nodes

Federate node 1 scrapes node1:

vim /apps/prometheus/prometheus.yml
......
  - job_name: "prometheus-node1"
    static_configs:
      - targets: ["10.0.0.84:9100"]

Federate node 2 scrapes node2:

vim /apps/prometheus/prometheus.yml
......
  - job_name: "prometheus-node2"
    static_configs:
      - targets: ["10.0.0.85:9100"]

Verify
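
Because the unit file enables --web.enable-lifecycle, each federate node can reload its configuration and report its scrape targets over the HTTP API, for example on federate node 1:

curl -X POST http://10.0.0.82:9090/-/reload                            # reload prometheus.yml after editing
curl -s http://10.0.0.82:9090/api/v1/targets | grep -o '"health":"[^"]*"'   # the new target should report "up"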

1.4 The Prometheus server scrapes the federate nodes

Scrape configuration

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "prometheus-federate-1-82"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '10.0.0.82:9090'
  - job_name: "prometheus-federate-2-83"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '10.0.0.83:9090'
  # Prometheus inside the k8s cluster
  - job_name: "prometheus-k8s-11"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '10.0.0.11:9090'
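
The /federate endpoint can also be queried directly to preview exactly what the central server will pull from a federate node (10.0.0.82 is federate node 1; the match[] selector mirrors the scrape config above):

curl -sG http://10.0.0.82:9090/federate \
  --data-urlencode 'match[]={__name__=~"node.*"}' | head -n 20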

Verify the target status

Query the metric data

1.5 Deploy Grafana from the binary package

Download: https://grafana.com/grafana/download

Mirror (Tsinghua TUNA): https://mirrors.tuna.tsinghua.edu.cn/grafana/

Installation docs: https://grafana.com/docs/grafana/latest/setup-grafana/installation/

1.5.1 Download and install

wget https://mirrors.tuna.tsinghua.edu.cn/grafana/apt/pool/main/g/grafana-enterprise/grafana-enterprise_9.3.0_amd64.deb
apt update
apt-get install -y adduser libfontconfig1
dpkg -i grafana-enterprise_9.3.0_amd64.deb

1.5.2 Edit the Grafana configuration file

vim /etc/grafana/grafana.ini
......
# protocol, bind address and port
[server]
protocol = http
http_addr = 10.0.0.62
http_port = 3000

1.5.3 Start the service

systemctl enable grafana-server.service
systemctl restart grafana-server.service

Check the listening port

[root@grafana opt]#netstat -ntlp|grep 3000
tcp 0 0 10.0.0.62:3000 0.0.0.0:* LISTEN 5268/grafana-server

1.5.4 Verify the Grafana web UI

  1. Log in at http://10.0.0.62:3000
  2. Open the home page

1.5.5 Add a data source

Choose Prometheus as the data source type.

Set the data source name and the URL of the Prometheus server.

Configure the data source as follows:
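
Instead of clicking through the UI, the data source can also be created through Grafana's HTTP API. A sketch, assuming the default admin:admin credentials; the data source name is arbitrary and <prometheus-server-ip> must be replaced with the address of the central Prometheus server:

curl -s -u admin:admin -X POST http://10.0.0.62:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"prometheus-federation","type":"prometheus","url":"http://<prometheus-server-ip>:9090","access":"proxy"}'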

1.5.6 Import dashboards (by Grafana dashboard ID)

11074

8919

2. Prometheus local storage and VictoriaMetrics single-node remote storage

2.1 Prometheus local storage

Prometheus stores time series data very efficiently: each sample takes only about 3.5 bytes of space, so one million time series scraped at a 30-second interval and retained for 60 days need on the order of 200-odd GB of disk.
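
For capacity planning, the Prometheus storage documentation gives a simple sizing formula. A rough sketch with the figures above, assuming the 1-2 bytes per compressed sample quoted in the official docs:

# needed_disk_space ≈ retention_seconds × ingested_samples_per_second × bytes_per_sample
#
# ingested_samples_per_second = 1,000,000 series / 30s   ≈ 33,333 samples/s
# retention_seconds           = 60 days × 86,400 s/day   = 5,184,000 s
# samples retained            ≈ 33,333 × 5,184,000       ≈ 1.7 × 10^11
#
# at 1-2 bytes per sample this works out to roughly 170-350 GB of disk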

2.1.1 本地存储简介

By default Prometheus stores the scraped data in its local TSDB, in the data directory under the Prometheus installation directory. Incoming samples are first appended to the write-ahead log (WAL) and kept in memory; after about two hours the in-memory data is flushed to a new block on disk, while newly scraped samples keep being written to memory and are flushed into the next block two hours later, and so on.

2.1.2 Blocks

Each block is a directory inside the data directory whose name (a ULID) starts with 01.

2.1.3 Block behavior

Blocks are compacted over time: historical blocks are compressed and merged, and expired blocks are deleted, so the number of blocks decreases as compaction proceeds. Three things happen during compaction: compaction runs periodically, smaller blocks are merged into larger ones, and expired blocks are cleaned up.

Each block consists of four parts:

~# tree /apps/prometheus/data/01FQNCYZOBPFA8AQDDZM1C5PRN/
/apps/prometheus/data/01FQNCYZOBPFA8AQDDZM1C5PRN/
├── chunks
│   └── 000001 # chunk segment files; each segment is at most 512MB, larger data is split across multiple segments
├── index #index file; maps label names/values and series to the time series data stored in the chunks
├── meta.json #block metadata: number of samples, the time range of the data, and the compaction history
└── tombstones #deletion markers; records data marked for deletion so it can be excluded from queries until compaction removes it
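
The block metadata can be inspected directly; for example (the ULID is the one from the tree above, and ulid, minTime/maxTime, stats and compaction are the standard meta.json fields):

cat /apps/prometheus/data/01FQNCYZOBPFA8AQDDZM1C5PRN/meta.json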

2.1.4 Local storage configuration flags

--config.file="prometheus.yml" #config file to load
--web.listen-address="0.0.0.0:9090" #listen address and port
--storage.tsdb.path="data/" #data storage directory
--storage.tsdb.retention.size= #maximum total size of storage blocks to retain; units: B, KB, MB, GB, TB, PB, EB (0 = no limit)
--storage.tsdb.retention.time= #how long to retain samples; default 15d
--query.timeout=2m #maximum time a single query may run
--query.max-concurrency=20 #maximum number of queries executed concurrently
--web.read-timeout=5m #maximum duration before timing out reads of a request and closing idle connections
--web.max-connections=512 #maximum number of simultaneous connections
--web.enable-lifecycle #enable the HTTP endpoints for reloading the config and shutting down (/-/reload, /-/quit)
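
For example, to cap local retention at 30 days or 100GB (whichever is reached first), the ExecStart line of the unit created earlier could be extended like this (the values are only examples):

ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml \
  --web.enable-lifecycle \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=100GB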

2.2 VictoriaMetrics single-node remote storage

2.2.1 Download

https://github.com/VictoriaMetrics/VictoriaMetrics/releases

https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1.tar.gz

wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1.tar.gz
tar xvf victoria-metrics-linux-amd64-v*.*.*.tar.gz
mv victoria-metrics-prod /usr/local/bin/

2.2.2 systemd service file

cat >> /etc/systemd/system/victoria-metrics-prod.service <<EOF
[Unit]
Description=For Victoria-metrics-prod Service
After=network.target
[Service]
ExecStart=/usr/local/bin/victoria-metrics-prod -httpListenAddr=0.0.0.0:8428 -storageDataPath=/data/victoria -retentionPeriod=3
[Install]
WantedBy=multi-user.target
EOF
systemctl start victoria-metrics-prod.service
systemctl status victoria-metrics-prod.service

Flags

-httpListenAddr=0.0.0.0:8428 #listen address and port
-storageDataPath #directory where VictoriaMetrics stores all of its data; defaults to victoria-metrics-data under the current working directory
-retentionPeriod #how long to retain data; older data is deleted automatically. The default is 1 and the default unit is months; h (hour), d (day), w (week) and y (year) suffixes are also supported

2.2.3 Access the web UI
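
The single-node binary serves its own query UI (vmui) and a health endpoint on the same listening port (10.0.0.84:8428 is the address used in the remote_write configuration below):

# query UI in the browser
http://10.0.0.84:8428/vmui
# quick checks from the shell
curl -s http://10.0.0.84:8428/health
curl -s http://10.0.0.84:8428/metrics | head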

2.2.4 Configure Prometheus

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# single-node remote write
remote_write:
  - url: http://10.0.0.84:8428/api/v1/write

2.2.5 Verify the data in VictoriaMetrics

Open the web UI

Query node_load1
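
The same query can also be issued against VictoriaMetrics' Prometheus-compatible query API:

curl -sG http://10.0.0.84:8428/api/v1/query --data-urlencode 'query=node_load1'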

2.2.6 Set the Grafana data source

Set the data source URL to the VictoriaMetrics address

2.2.7 Grafana dashboard

Import dashboard 8919

3. Prometheus remote storage on a VictoriaMetrics cluster

3.1 Architecture

3.2 Components

3.2.1 vminsert

The write component. vminsert accepts incoming writes and distributes the data across the backend vmstorage nodes based on a consistent hash of the metric name and all of its labels. Default port: 8480.

3.2.2 vmstorage

Stores the raw data and, for queries, returns the data matching the given label filters within the given time range. Default port: 8482.

3.2.3 vmselect

The query (read) component; it connects to the vmstorage nodes. Default port: 8481.

3.2.4 Other optional components

vmagent

A small but powerful agent. It collects metrics from node_exporter and various other sources and stores them in VictoriaMetrics or in any other Prometheus-compatible storage system that supports the remote write protocol. It is positioned as a lightweight replacement for the Prometheus server for scraping.

vmalert

Replaces the alerting side of the Prometheus server: it evaluates Prometheus-compatible alerting rules against VictoriaMetrics as the data source and sends the resulting notifications to Alertmanager.

vmgateway

A proxy gateway for reading and writing VictoriaMetrics data; it can provide rate limiting, access control and similar features. Currently an enterprise component.

vmctl

The VictoriaMetrics command-line tool, currently used mainly to migrate data from sources such as Prometheus and OpenTSDB into VictoriaMetrics.

3.3 Download the cluster package

Host inventory

vm1 10.0.0.86
vm2 10.0.0.87
vm3 10.0.0.88

https://github.com/VictoriaMetrics/VictoriaMetrics/releases

https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.1/victoria-metrics-linux-amd64-v1.93.1-cluster.tar.gz

tar xvf victoria-metrics-linux-amd64-v*.*-cluster.tar.gz
cp vm* /usr/local/bin/

3.4 systemd services

3.4.1 vmstorage-prod

Persists the data. Monitoring/API port: 8482; data write port (used by vminsert): 8400; data read port (used by vmselect): 8401.

cat >> /etc/systemd/system/vmstorage.service <<EOF
[Unit]
Description=Vmstorage Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vmstorage-prod -loggerTimezone=Asia/Shanghai -storageDataPath=/data/vmstorage-data -httpListenAddr=:8482 -vminsertAddr=:8400 -vmselectAddr=:8401
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable vmstorage.service && systemctl start vmstorage.service

Main flags

-httpListenAddr string
    Address to listen for http connections (default ":8482")
-vminsertAddr string
    TCP address to accept connections from vminsert services (default ":8400")
-vmselectAddr string
    TCP address to accept connections from vmselect services (default ":8401")

3.4.2 vminsert-prod

Accepts external write requests; default port 8480.

cat >> /etc/systemd/system/vminsert.service <<EOF
[Unit]
Description=Vminsert Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vminsert-prod -httpListenAddr=:8480 -storageNode=10.0.0.86:8400,10.0.0.87:8400,10.0.0.88:8400
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable vminsert.service && systemctl start vminsert.service

3.4.3 vmselect-prod

Accepts external read (query) requests; default port 8481.

cat >> /etc/systemd/system/vmselect.service <<EOF
[Unit]
Description=Vmselect Server
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vmselect-prod -httpListenAddr=:8481 -storageNode=10.0.0.86:8401,10.0.0.87:8401,10.0.0.88:8401
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable vmselect.service && systemctl start vmselect.service

3.5 Verify the listening ports

vm1

[root@vm1 opt]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 0.0.0.0:8480 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8481 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8482 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8400 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8401 0.0.0.0:*
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 127.0.0.1:6010 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 [::1]:6010 [::]:*

vm2

[root@vm2 opt]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 0.0.0.0:8400 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8401 0.0.0.0:*
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 127.0.0.1:6010 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8480 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8481 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8482 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 [::1]:6010 [::]:*

vm3

[root@vm3 opt]#ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 0.0.0.0:8400 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8401 0.0.0.0:*
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 127.0.0.1:6010 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8480 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8481 0.0.0.0:*
LISTEN 0 4096 0.0.0.0:8482 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 128 [::1]:6010 [::]:*

The endpoints can also be tested from a browser:

http://10.0.0.86:8480/metrics

http://10.0.0.86:8481/metrics

http://10.0.0.86:8482/metrics

http://10.0.0.87:8480/metrics

http://10.0.0.87:8481/metrics

http://10.0.0.87:8482/metrics

http://10.0.0.88:8480/metrics

http://10.0.0.88:8481/metrics

http://10.0.0.88:8482/metrics
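
Or check all nine endpoints at once with a small loop (just a convenience wrapper around the URLs above):

for ip in 10.0.0.86 10.0.0.87 10.0.0.88; do
  for port in 8480 8481 8482; do
    code=$(curl -s -o /dev/null -w '%{http_code}' http://$ip:$port/metrics)
    echo "$ip:$port -> $code"
  done
done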

3.6 Configure Prometheus

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# single-node configuration
#remote_write:
#  - url: http://10.0.0.84:8428/api/v1/write

# cluster configuration
remote_write:
  - url: http://10.0.0.86:8480/insert/0/prometheus
  - url: http://10.0.0.87:8480/insert/0/prometheus
  - url: http://10.0.0.88:8480/insert/0/prometheus
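
In the /insert/0/prometheus path, 0 is the tenant ID (accountID) and prometheus selects the Prometheus remote-write handler. Once samples are flowing, they can be read back through any vmselect node using the matching /select/<accountID>/prometheus prefix, for example:

curl -sG http://10.0.0.86:8481/select/0/prometheus/api/v1/query --data-urlencode 'query=node_load1'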

3.7 Set the Grafana data source

Set the cluster query (vmselect) address:

http://10.0.0.86:8481/select/0/prometheus; a VIP in front of the vmselect nodes can be used for high availability.

3.8 Import a Grafana dashboard

13824

3.9 Enable data replication

https://docs.victoriametrics.com/Cluster-VictoriaMetrics.html#replication-and-data-safety

By default vminsert shards the data across the vmstorage nodes based on a consistent hash of the metric name and its labels. Replication can be enabled with vminsert's -replicationFactor=N flag, which writes every ingested sample to N different vmstorage nodes so that the data remains available when a node fails.
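
A sketch of what enabling replication looks like, based on the vminsert unit from 3.4.2 (-replicationFactor=2 is only an example value; it should not exceed the number of vmstorage nodes):

ExecStart=/usr/local/bin/vminsert-prod -httpListenAddr=:8480 \
  -replicationFactor=2 \
  -storageNode=10.0.0.86:8400,10.0.0.87:8400,10.0.0.88:8400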
