Deploying a Prometheus Federation Cluster with a VictoriaMetrics Cluster
Network architecture diagram
1. Environment Preparation
OS: Rocky Linux 8.6
Prometheus version: 2.38.0
VictoriaMetrics version: 1.81.2
Grafana version: 9.2.0

Hostname                  IP              Resources
prometheus-server-01      172.16.88.30    2 vCPU / 4 GB
victoriametrics-node-01   172.16.88.41    2 vCPU / 4 GB
victoriametrics-node-02   172.16.88.42    2 vCPU / 4 GB
victoriametrics-node-03   172.16.88.43    2 vCPU / 4 GB
grafana-dashboard-01      172.16.88.220   2 vCPU / 4 GB
2. Configure the Prometheus Federation Cluster
2.1 Install Prometheus on node 172.16.88.30
wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz
tar -xf prometheus-2.38.0.linux-amd64.tar.gz -C /opt/
cd /opt/
mv prometheus-2.38.0.linux-amd64/ prometheus
vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/opt/prometheus/
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target
systemctl enable --now prometheus.service
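Before (re)starting the service, it can help to validate the configuration with promtool, which ships in the same tarball, and then poll Prometheus' readiness endpoint /-/ready. This is a minimal sketch assuming the /opt/prometheus layout above and Prometheus on localhost:9090; the function names and the PROM_HOME variable are hypothetical helpers, not part of Prometheus.

```shell
# Hypothetical helpers (not part of Prometheus); source this file in bash.

# Installation directory from section 2.1 (override if yours differs).
PROM_HOME="${PROM_HOME:-/opt/prometheus}"

# Validate prometheus.yml with promtool (shipped in the Prometheus tarball).
prom_check_config() {
    "$PROM_HOME/promtool" check config "$PROM_HOME/prometheus.yml"
}

# Poll <base_url>/-/ready; return 0 once it answers with HTTP 2xx,
# 1 after <tries> failed attempts (one attempt per second).
prom_wait_ready() {
    url="${1:-http://localhost:9090}"
    tries="${2:-30}"
    i=1
    until curl -fsS --max-time 2 "$url/-/ready" >/dev/null 2>&1; do
        [ "$i" -lt "$tries" ] || return 1
        i=$((i + 1))
        sleep 1
    done
    return 0
}
```

Typical use after editing the config: `prom_check_config && systemctl restart prometheus.service && prom_wait_ready`.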
2.2 Configure Prometheus federation
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'prometheus-federate'
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
          - '172.16.88.20:9090'
          - '172.16.88.154:50513'
          - '172.16.88.154:40900'
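The match[] entries above are sent as URL query parameters on each target's /federate endpoint. Before restarting, you can request the same selectors by hand from one federation target to confirm it exposes the series you expect. This is a minimal sketch using curl's -G and --data-urlencode options; the function name and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper (not part of Prometheus): fetch federated series from
# one target with the same match[] selectors as the 'prometheus-federate' job.
# Set DRY_RUN=1 to print the curl command instead of running it.
# Usage: federate_query <host:port> [selector ...]
federate_query() {
    [ "$#" -ge 1 ] || return 2
    target="$1"
    shift
    # With no selectors given, fall back to the three configured above.
    if [ "$#" -eq 0 ]; then
        set -- '{job="prometheus"}' '{__name__=~"job:.*"}' '{__name__=~"node.*"}'
    fi
    # Rewrite the positional parameters in place: each selector becomes
    # "--data-urlencode match[]=<selector>" (POSIX sh has no arrays).
    for sel in "$@"; do
        set -- "$@" --data-urlencode "match[]=$sel"
        shift
    done
    # -G sends the encoded pairs as query-string parameters of a GET request.
    ${DRY_RUN:+echo} curl -fsS --max-time 5 -G "$@" "http://$target/federate"
}
```

For example, `federate_query 172.16.88.20:9090 | head` shows the first few federated series from the first target.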
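Restart the Prometheus service:
systemctl restart prometheus.service
Verify the collected data in the Prometheus web UI (http://172.16.88.30:9090):
Prometheus-node data collection
Prometheus-k8s data collection
Prometheus-ceph data collection
Note: the federation job's match[] selectors only pull node metrics (plus job="prometheus" and job:* series), so no Ceph data is collected here.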
3. Deploy the VictoriaMetrics Storage Cluster
3.1 About VictoriaMetrics
Project home: https://github.com/VictoriaMetrics/VictoriaMetrics
Documentation: https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html
VictoriaMetrics architecture diagram
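vminsert   # write component: receives incoming data and distributes it across the backend vmstorage nodes by a consistent hash of the metric name and all its labels; default port 8480
vmstorage  # stores raw data and, for queries, returns the data matching the given label filters within the given time range; default port 8482
vmselect   # query (read) component: fetches data from vmstorage; default port 8481
Optional components:
vmagent    # a small but powerful agent that collects metrics from sources such as node_exporter and stores them in VictoriaMetrics or any other Prometheus-compatible storage that supports the remote write protocol; intended to take over Prometheus server's scraping role
vmalert    # evaluates Prometheus-compatible alerting rules with VictoriaMetrics as the data source and sends the resulting notifications to Alertmanager; replaces Prometheus server's rule evaluation
vmgateway  # proxy gateway for reading and writing VictoriaMetrics data, providing rate limiting and access control; currently an enterprise-only component
vmctl      # VictoriaMetrics command-line tool, currently used mainly to migrate data from sources such as Prometheus and OpenTSDB into VictoriaMetrics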
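To illustrate the idea behind vminsert's sharding (every sample of a given series lands on the same vmstorage node, chosen from a hash of the metric name plus labels), here is a deliberately simplified sketch. It uses plain `cksum` modulo the node count, which is an assumption for illustration only: it is NOT VictoriaMetrics' actual consistent-hashing algorithm, and unlike true consistent hashing, modulo hashing reassigns most series when a node is added or removed.

```shell
# Illustration only (hypothetical helper): pick a storage node for a series
# key by hashing it. VictoriaMetrics uses its own consistent-hash scheme.
# Usage: pick_node '<series key>' node1 [node2 ...]
pick_node() {
    key="$1"
    shift
    [ "$#" -gt 0 ] || return 1
    # cksum prints "<crc> <bytes>"; keep the CRC as an unsigned integer.
    crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
    idx=$((crc % $# + 1))
    # Print the idx-th node argument.
    i=1
    for node in "$@"; do
        if [ "$i" -eq "$idx" ]; then
            printf '%s\n' "$node"
            return 0
        fi
        i=$((i + 1))
    done
}
```

Calling `pick_node 'node_load1{instance="a"}' 172.16.88.41 172.16.88.42 172.16.88.43` always prints the same node for the same key, which is the property that keeps one series' samples together.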
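3.2 On cluster nodes 41, 42 and 43, download the package and extract it to /usr/local/bin
Note: download the cluster build (the archive whose name ends in -cluster).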
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.81.2/victoria-metrics-linux-amd64-v1.81.2-cluster.tar.gz
tar -xf victoria-metrics-linux-amd64-v1.81.2-cluster.tar.gz -C /usr/local/bin/
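A quick check that the extraction produced the three binaries the systemd units below rely on (vminsert-prod, vmselect-prod, vmstorage-prod). This is a minimal sketch; the function name is hypothetical and /usr/local/bin is assumed as the target directory, as above.

```shell
# Hypothetical helper: confirm the cluster binaries referenced by the
# systemd units exist and are executable in DIR (default /usr/local/bin).
check_vm_binaries() {
    dir="${1:-/usr/local/bin}"
    missing=0
    for bin in vminsert-prod vmselect-prod vmstorage-prod; do
        if [ -x "$dir/$bin" ]; then
            echo "ok       $dir/$bin"
        else
            echo "MISSING  $dir/$bin" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when anything is missing.
    [ "$missing" -eq 0 ]
}
```

Run `check_vm_binaries` on each of the three nodes after extraction.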
3.3 Configure the vmstorage-prod service
vim /etc/systemd/system/vmstorage.service
[Unit]
Description=Vmstorage Server
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vmstorage-prod -loggerTimezone Asia/Shanghai -storageDataPath /data/vmstorage-data -httpListenAddr :8482 -vminsertAddr :8400 -vmselectAddr :8401

[Install]
WantedBy=multi-user.target
mkdir -p /data/vmstorage-data    # run on all three nodes; in production, use a dedicated data disk, preferably a faster SSD
Copy the unit file to the other nodes:
scp /etc/systemd/system/vmstorage.service root@172.16.88.42:/etc/systemd/system/
scp /etc/systemd/system/vmstorage.service root@172.16.88.43:/etc/systemd/system/
systemctl enable --now vmstorage.service    # run on all three nodes
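The enable command has to be run on each of the three nodes, not only the one where the unit file was written. A sketch that copies a unit file and enables it on the remaining nodes over SSH, assuming password-less root SSH as used by the scp commands above; the function name and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: push a unit file to other nodes and enable it there.
# Assumes password-less root SSH. Do not list the local node: scp onto the
# same file would truncate it while reading. DRY_RUN=1 prints the commands.
# Usage: deploy_unit <unit-name> <node> [node ...]
deploy_unit() {
    unit="$1"
    shift
    run=${DRY_RUN:+echo}
    for node in "$@"; do
        $run scp "/etc/systemd/system/$unit" "root@$node:/etc/systemd/system/" || return 1
        # daemon-reload makes systemd pick up the newly copied unit file.
        $run ssh "root@$node" "systemctl daemon-reload && systemctl enable --now $unit" || return 1
    done
}
```

From 172.16.88.41, create the data directory first (`ssh root@172.16.88.42 mkdir -p /data/vmstorage-data`, likewise for .43), then run `deploy_unit vmstorage.service 172.16.88.42 172.16.88.43`. The same call works for vminsert.service and vmselect.service.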
3.4 Deploy the vminsert-prod component
vim /etc/systemd/system/vminsert.service
[Unit]
Description=Vminsert Server
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vminsert-prod -httpListenAddr :8480 -storageNode=172.16.88.41:8400,172.16.88.42:8400,172.16.88.43:8400

[Install]
WantedBy=multi-user.target
scp /etc/systemd/system/vminsert.service root@172.16.88.42:/etc/systemd/system/
scp /etc/systemd/system/vminsert.service root@172.16.88.43:/etc/systemd/system/
systemctl enable --now vminsert.service    # run on all three nodes
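The -storageNode value must list every vmstorage node: port 8400 for vminsert (matching -vminsertAddr) and 8401 for vmselect (matching -vmselectAddr). A small sketch that builds the list from the node IPs so the vminsert and vmselect units stay consistent when nodes are added; the function name is hypothetical.

```shell
# Hypothetical helper: join node addresses into a -storageNode value.
# Usage: storage_nodes <port> <ip> [ip ...]
storage_nodes() {
    port="$1"
    shift
    list=""
    for ip in "$@"; do
        # Append "ip:port", inserting a comma only after the first entry.
        list="${list:+$list,}$ip:$port"
    done
    printf '%s\n' "$list"
}
```

`storage_nodes 8400 172.16.88.41 172.16.88.42 172.16.88.43` prints `172.16.88.41:8400,172.16.88.42:8400,172.16.88.43:8400`, the value used in the vminsert unit.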
3.5 Deploy the vmselect-prod component
vim /etc/systemd/system/vmselect.service
[Unit]
Description=Vmselect Server
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/tmp
ExecStart=/usr/local/bin/vmselect-prod -httpListenAddr :8481 -storageNode=172.16.88.41:8401,172.16.88.42:8401,172.16.88.43:8401

[Install]
WantedBy=multi-user.target
scp /etc/systemd/system/vmselect.service root@172.16.88.42:/etc/systemd/system/
scp /etc/systemd/system/vmselect.service root@172.16.88.43:/etc/systemd/system/
systemctl enable --now vmselect.service    # run on all three nodes
3.6 Verify the service ports
Run on 172.16.88.41, 42 and 43:
curl http://172.16.88.41:8480/metrics
curl http://172.16.88.41:8481/metrics
curl http://172.16.88.41:8482/metrics
curl http://172.16.88.42:8480/metrics
curl http://172.16.88.42:8481/metrics
curl http://172.16.88.42:8482/metrics
curl http://172.16.88.43:8480/metrics
curl http://172.16.88.43:8481/metrics
curl http://172.16.88.43:8482/metrics
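The nine curl calls above can also be written as a loop that prints one status line per endpoint instead of the full /metrics output. A minimal sketch assuming the ports from the unit files (8480 vminsert, 8481 vmselect, 8482 vmstorage); the function name is hypothetical.

```shell
# Hypothetical helper: probe /metrics of every VictoriaMetrics component.
# 8480 = vminsert, 8481 = vmselect, 8482 = vmstorage (from the unit files).
# Usage: probe_vm_cluster [host ...]
probe_vm_cluster() {
    [ "$#" -gt 0 ] || set -- 172.16.88.41 172.16.88.42 172.16.88.43
    failed=0
    for host in "$@"; do
        for port in 8480 8481 8482; do
            url="http://$host:$port/metrics"
            # -f treats HTTP errors as failures; the body is discarded.
            if curl -fsS --max-time 3 -o /dev/null "$url"; then
                echo "UP    $url"
            else
                echo "DOWN  $url"
                failed=$((failed + 1))
            fi
        done
    done
    [ "$failed" -eq 0 ]
}
```

Running `probe_vm_cluster` with no arguments checks all three nodes and exits non-zero if any endpoint is down.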
4. Configure Prometheus Remote Write to VictoriaMetrics
4.1 Configure Prometheus to remote-write to the VictoriaMetrics cluster
vim /opt/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Write to the VictoriaMetrics cluster (vminsert, tenant 0)
remote_write:
  - url: http://172.16.88.41:8480/insert/0/prometheus
  - url: http://172.16.88.42:8480/insert/0/prometheus
  - url: http://172.16.88.43:8480/insert/0/prometheus

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'prometheus-federate'
    scrape_interval: 10s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
          - '172.16.88.20:9090'
          - '172.16.88.154:50513'
          - '172.16.88.154:40900'
4.2 Restart the Prometheus service
systemctl restart prometheus.service
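After the restart, you can check that samples reach the cluster by querying vmselect's Prometheus-compatible API under /select/0/prometheus (tenant 0 matches the /insert/0/ used in remote_write). This is a minimal sketch; the function names and DRY_RUN switch are hypothetical, and the emptiness check assumes the compact JSON body that the query API returns. Note also that Prometheus sends every sample to each remote_write entry independently, so listing three vminsert URLs ingests each sample three times; review the deduplication section of the VictoriaMetrics cluster docs (the -dedup.minScrapeInterval flag), or consider a single load-balanced vminsert address.

```shell
# Hypothetical helper: run an instant query against vmselect (tenant 0).
# Set DRY_RUN=1 to print the curl command instead of running it.
# Usage: vm_query <host:port> <promql>
vm_query() {
    base="http://$1/select/0/prometheus"
    ${DRY_RUN:+echo} curl -fsS --max-time 5 -G "$base/api/v1/query" --data-urlencode "query=$2"
}

# Return 0 if the query succeeds and returns at least one series.
vm_has_data() {
    resp=$(vm_query "$1" "$2") || return 1
    case "$resp" in
        # An empty result set appears as "result":[] in the JSON body.
        *'"result":[]'*) return 1 ;;
        *'"status":"success"'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `vm_has_data 172.16.88.41:8481 'up{job="prometheus"}' && echo "data is arriving"`.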
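5. Configure Grafana
5.1 Load-balance Grafana's reads across the VictoriaMetrics cluster
In an earlier article, keepalived + haproxy was set up on all nodes, with a VIP fronted by haproxy.
For reference: https://www.cnblogs.com/cyh00001/p/16520847.html (section 2, "Configure haproxy + keepalived high availability")
vim /etc/haproxy/haproxy.cfg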
listen kubernetes-dashboard-30000
    bind 172.16.88.200:30000
    mode tcp
    server easzlab-k8s-master-01 172.16.88.154:30000 check inter 2000 fall 3 rise 5
    server easzlab-k8s-master-02 172.16.88.155:30000 check inter 2000 fall 3 rise 5
    server easzlab-k8s-master-03 172.16.88.156:30000 check inter 2000 fall 3 rise 5

listen prometheus-victoriametrics-8481
    bind 172.16.88.200:8481
    mode tcp
    server victoriametrics-node-01 172.16.88.41:8481 check inter 2000 fall 3 rise 5
    server victoriametrics-node-02 172.16.88.42:8481 check inter 2000 fall 3 rise 5
    server victoriametrics-node-03 172.16.88.43:8481 check inter 2000 fall 3 rise 5
Restart the haproxy service on all nodes:
systemctl restart haproxy.service
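On each haproxy node it can be worth validating the file with `haproxy -c -f` before restarting, then confirming that vmselect answers through the VIP. A minimal sketch assuming the RHEL-family default config path /etc/haproxy/haproxy.cfg (adjust if yours differs) and the VIP 172.16.88.200 from above; the function name and DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: validate haproxy's config, restart it, then probe
# vmselect through the VIP. Set DRY_RUN=1 to print the commands only.
# Usage: reload_and_probe_vip [cfg] [vip:port]
reload_and_probe_vip() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    vip="${2:-172.16.88.200:8481}"
    run=${DRY_RUN:+echo}
    # haproxy -c only parses the configuration; it does not start a proxy.
    $run haproxy -c -f "$cfg" || return 1
    $run systemctl restart haproxy.service || return 1
    # vmselect's /metrics should now answer through the VIP.
    $run curl -fsS --max-time 3 -o /dev/null "http://$vip/metrics"
}
```

Since the VIP lives on only one keepalived node at a time, the final probe exercises whichever node currently holds it.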
5.2 Download and install Grafana
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.2.0~beta1-1.x86_64.rpm
yum localinstall grafana-enterprise-9.2.0~beta1-1.x86_64.rpm -y
systemctl enable --now grafana-server.service
5.3 Add a Prometheus data source
victoriametrics-clusterdata-source    # data source name
http://172.16.88.200:8481/select/0/prometheus    # cluster data source URL (vmselect behind the haproxy VIP)
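Instead of clicking through the UI, the same data source can be created with Grafana's HTTP API (POST /api/datasources). This is a sketch under assumptions: Grafana on 172.16.88.220:3000 (its default port), and credentials taken from a hypothetical GRAFANA_AUTH variable that falls back to Grafana's initial admin:admin login. The function names and DRY_RUN switch are hypothetical, and the JSON is built with printf without escaping, which is fine only for simple values like these.

```shell
# Hypothetical helpers: create a Prometheus-type data source through
# Grafana's HTTP API. GRAFANA_AUTH defaults to the initial admin:admin
# login; override it once the admin password has been changed.

# Print the JSON body for a data source (values are not JSON-escaped).
datasource_json() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' "$1" "$2"
}

# Usage: grafana_add_datasource [grafana_url]   (DRY_RUN=1 prints the command)
grafana_add_datasource() {
    grafana="${1:-http://172.16.88.220:3000}"
    body=$(datasource_json victoriametrics-clusterdata-source \
        http://172.16.88.200:8481/select/0/prometheus)
    ${DRY_RUN:+echo} curl -fsS --max-time 5 -u "${GRAFANA_AUTH:-admin:admin}" \
        -H 'Content-Type: application/json' \
        -X POST "$grafana/api/datasources" -d "$body"
}
```

`access":"proxy"` makes the Grafana server, not the browser, send queries to the VIP, matching what the UI uses by default for this data source type.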
5.4 Import dashboards