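61. Prometheus-Consul Distributed Cluster Deployment
1. Introduction
1.1. About Consul
Consul is an open-source tool written in Go that provides service registration, service discovery, and configuration management for distributed, service-oriented systems. Its features include service registration and discovery, health checks, key/value storage, multi-datacenter support, and distributed consistency guarantees. Prometheus can use Consul to discover and maintain its scrape targets automatically. Because Consul also supports clustered deployment, the discovery layer itself becomes considerably more stable. Combining Prometheus with a Consul cluster therefore keeps target maintenance efficient while keeping the system stable.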
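2. Consul Deployment
2.1. Environment preparation
2.1.1. Prepare three hosts
This guide uses hosts with the following IP addresses:
192.168.10.34
192.168.10.30
192.168.10.29
2.1.2. Download Consul on all three hosts
https://releases.hashicorp.com/consul/1.8.0/consul_1.8.0_linux_amd64.zip
2.1.3. Unpack the software
]# unzip consul_1.8.0_linux_amd64.zip -d /usr/local/bin/
]# mkdir -p /data/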
2.2. Start the Consul service
2.2.1. 192.168.10.34
]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.34 -bind=192.168.10.34 -client=0.0.0.0 -datacenter=consulManager -ui &
]# tail -f nohup.out
2023-04-12T10:37:13.129+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2023-04-12T10:37:13.129+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.10.34:8300 [Follower]" leader=
2023-04-12T10:37:13.130+0800 [INFO] agent.server: Adding LAN server: server="192.168.10.34 (Addr: tcp/192.168.10.34:8300) (DC: consulmanager)"
2023-04-12T10:37:13.130+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=192.168.10.34.consulmanager area=wan
2023-04-12T10:37:13.130+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2023-04-12T10:37:13.130+0800 [INFO] agent: Started HTTP server: address=[::]:8500 network=tcp
2023-04-12T10:37:13.130+0800 [INFO] agent: started state syncer
==> Consul agent running!
2023-04-12T10:37:20.180+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2.2.2. 192.168.10.30
]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.30 -bind=192.168.10.30 -client=0.0.0.0 -datacenter=consulManager -ui &
]# tail -f nohup.out
2023-04-12T10:37:45.562+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: 192.168.10.30 192.168.10.30
2023-04-12T10:37:45.562+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2023-04-12T10:37:45.562+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.10.30:8300 [Follower]" leader=
2023-04-12T10:37:45.563+0800 [INFO] agent.server: Adding LAN server: server="192.168.10.30 (Addr: tcp/192.168.10.30:8300) (DC: consulmanager)"
2023-04-12T10:37:45.563+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=192.168.10.30.consulmanager area=wan
2023-04-12T10:37:45.563+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2023-04-12T10:37:45.563+0800 [INFO] agent: Started HTTP server: address=[::]:8500 network=tcp
2023-04-12T10:37:45.563+0800 [INFO] agent: started state syncer
==> Consul agent running!
2023-04-12T10:37:51.172+0800 [WARN] agent.server.raft: no known peers, aborting election
2023-04-12T10:37:52.598+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2023-04-12T10:37:57.242+0800 [INFO] agent: Newer Consul version available: new_version=1.15.2 current_version=1.8.0
2023-04-12T10:38:18.841+0800 [ERROR] agent: Coordinate update error: error="No cluster leader"
2.2.3. 192.168.10.29
]# nohup consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=192.168.10.29 -bind=192.168.10.29 -client=0.0.0.0 -datacenter=consulManager -ui &
]# tail -f nohup.out
2023-04-12T10:37:37.124+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.10.29:8300 [Follower]" leader=
2023-04-12T10:37:37.124+0800 [INFO] agent.server: Adding LAN server: server="192.168.10.29 (Addr: tcp/192.168.10.29:8300) (DC: consulmanager)"
2023-04-12T10:37:37.124+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=192.168.10.29.consulmanager area=wan
2023-04-12T10:37:37.124+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2023-04-12T10:37:37.125+0800 [INFO] agent: Started HTTP server: address=[::]:8500 network=tcp
2023-04-12T10:37:37.125+0800 [INFO] agent: started state syncer
==> Consul agent running!
2023-04-12T10:37:43.767+0800 [WARN] agent.server.raft: no known peers, aborting election
2023-04-12T10:37:44.168+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
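The three start commands above differ only in the host IP, which is used for both -node and -bind. As a minimal sketch, a small helper can print the command for any host so the flags stay identical everywhere. The function name consul_agent_cmd is hypothetical and not part of Consul; all flags are copied from the commands above.

```shell
# Print the `consul agent` command for one server node.
# Only the IP changes between hosts; it fills both -node and -bind.
# consul_agent_cmd is a hypothetical helper name used for illustration.
consul_agent_cmd() {
  ip="$1"
  printf 'consul agent -server -bootstrap-expect=3 -data-dir=/data/consul -node=%s -bind=%s -client=0.0.0.0 -datacenter=consulManager -ui\n' "$ip" "$ip"
}

# Show the command for each of the three hosts used in this guide.
for ip in 192.168.10.34 192.168.10.30 192.168.10.29; do
  consul_agent_cmd "$ip"
done
```

The helper only prints the command. To actually start an agent the printed line would still be run under nohup on the matching host, as in the steps above.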
2.2.4. Check the listening ports
]# netstat -tunlp | grep consul
tcp        0      0 192.168.10.34:8300      0.0.0.0:*               LISTEN      28148/consul
tcp        0      0 192.168.10.34:8301      0.0.0.0:*               LISTEN      28148/consul
tcp        0      0 192.168.10.34:8302      0.0.0.0:*               LISTEN      28148/consul
tcp6       0      0 :::8500                 :::*                    LISTEN      28148/consul
tcp6       0      0 :::8600                 :::*                    LISTEN      28148/consul
udp        0      0 192.168.10.34:8301      0.0.0.0:*                           28148/consul
udp        0      0 192.168.10.34:8302      0.0.0.0:*                           28148/consul
udp6       0      0 :::8600                 :::*                                28148/consul
2.2.5. Note
At this point the three machines have not joined each other, so they do not yet form a cluster. None of the three Consul agents can work normally because no leader has been elected.
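The reason no leader exists yet is that Raft needs a majority (quorum) of servers to agree, and with -bootstrap-expect=3 the servers wait until three of them know about each other before bootstrapping. The arithmetic can be sketched as follows; the function names quorum and fault_tolerance are hypothetical helpers, not Consul commands.

```shell
# Raft quorum: the smallest majority of N voting servers, floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of server failures a cluster of N can absorb while keeping a quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For the three-server cluster in this guide the quorum is 2, so the cluster can keep electing a leader with one server down, but not with two.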
2.3. Join the nodes into a cluster
Run the following on any one of the Consul hosts.
2.3.1. Join the other nodes from host 192.168.10.34
]# consul join 192.168.10.29
Successfully joined cluster by contacting 1 nodes.
]# consul join 192.168.10.30
Successfully joined cluster by contacting 1 nodes.
2.3.2. Inspect the log
...
2023-04-12T10:43:58.685+0800 [INFO] agent.server: New leader elected: payload=192.168.10.34
2023-04-12T10:43:58.685+0800 [INFO] agent.server.raft: pipelining replication: peer="{Voter c2f9e827-afcf-54c3-b702-9ec3111491d9 192.168.10.30:8300}"
2023-04-12T10:43:58.686+0800 [WARN] agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter 696892ba-c77e-c5ad-5709-2cd4bdbc06dc 192.168.10.29:8300}" next=1
2023-04-12T10:43:58.687+0800 [INFO] agent.server.raft: pipelining replication: peer="{Voter 696892ba-c77e-c5ad-5709-2cd4bdbc06dc 192.168.10.29:8300}"
2023-04-12T10:43:58.687+0800 [INFO] agent.leader: started routine: routine="federation state anti-entropy"
2023-04-12T10:43:58.687+0800 [INFO] agent.leader: started routine: routine="federation state pruning"
2023-04-12T10:43:58.687+0800 [INFO] agent.leader: started routine: routine="CA root pruning"
2023-04-12T10:43:58.687+0800 [INFO] agent.server: member joined, marking health alive: member=192.168.10.34
2023-04-12T10:43:58.691+0800 [INFO] agent.server: member joined, marking health alive: member=192.168.10.29
2023-04-12T10:43:58.692+0800 [INFO] agent.server: federation state anti-entropy synced
2023-04-12T10:43:58.692+0800 [INFO] agent: Synced node info
2023-04-12T10:43:58.692+0800 [INFO] agent.server: member joined, marking health alive: member=192.168.10.30
# The log shows that a leader has been elected and all three members have joined.
2.4. Cluster checks
2.4.1. View the cluster state
]# consul operator raft list-peers
Node           ID                                    Address             State     Voter  RaftProtocol
192.168.10.29  696892ba-c77e-c5ad-5709-2cd4bdbc06dc  192.168.10.29:8300  follower  true   3
192.168.10.30  c2f9e827-afcf-54c3-b702-9ec3111491d9  192.168.10.30:8300  follower  true   3
192.168.10.34  5edfe595-4f0e-b507-bd8d-5f04fd1109d2  192.168.10.34:8300  leader    true   3
2.4.2. View member status
]# consul members
Node           Address             Status  Type    Build  Protocol  DC             Segment
192.168.10.29  192.168.10.29:8301  alive   server  1.8.0  2         consulmanager  <all>
192.168.10.30  192.168.10.30:8301  alive   server  1.8.0  2         consulmanager  <all>
192.168.10.34  192.168.10.34:8301  alive   server  1.8.0  2         consulmanager  <all>
2.4.3. Test the cluster
# Set a value
]# consul kv put name cyc
Success! Data written to: name
# Read the value
]# consul kv get name
cyc
Reading the key on the other two machines also returns cyc, which shows that the key has been replicated across the cluster.
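The same replication check can be scripted against Consul's HTTP KV endpoint instead of logging in to each machine. The sketch below assumes curl is installed and that port 8500 is reachable on all three hosts; kv_url and check_kv_all are hypothetical helper names. The functions are only defined here, so nothing contacts the cluster until check_kv_all is called.

```shell
# Build the HTTP API URL that returns the raw value of a key from one node.
# $1 = host IP, $2 = key name. The ?raw parameter asks for the bare value.
kv_url() {
  printf 'http://%s:8500/v1/kv/%s?raw\n' "$1" "$2"
}

# Read the key from every server and print what each one returns.
# Needs curl and network access to the cluster.
check_kv_all() {
  key="$1"
  for host in 192.168.10.34 192.168.10.30 192.168.10.29; do
    printf '%s: %s\n' "$host" "$(curl -s "$(kv_url "$host" "$key")")"
  done
}
```

Running check_kv_all name prints one line per server. If replication works, each line should show the value written earlier.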
2.4.4. Browse the web UI
http://192.168.10.34:8500/
http://192.168.10.30:8500/
http://192.168.10.29:8500/
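3. Integrating Prometheus with Consul
3.1. How it works
1. Services (the monitoring targets) are registered in or deregistered from Consul.
2. Prometheus continuously watches Consul. When the set of matching services in Consul changes, Prometheus updates its monitoring targets.
3.2. Prepare a new node_exporter
node_exporter is already installed and reachable at 192.168.10.30:9100.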
3.3. Registering and deregistering in Consul
3.3.1. Register the node_exporter service in Consul
curl -X PUT -d '{"id": "node-exporter-30","name":"node-exporter","address": "192.168.10.30","port": 9100,"tags":["linux","prome"],"checks": [{"http": "http://192.168.10.30:9100/metrics","interval": "5s"}]}' http://192.168.10.34:8500/v1/agent/service/register
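Because the registration payload changes only in the host IP, it can be generated rather than typed by hand. The following is a sketch under stated assumptions: node_exporter_payload and register_node_exporter are hypothetical helper names, and deriving the service ID from the last octet of the IP is a convention inferred from the ID node-exporter-30 above, not something Consul requires.

```shell
# Build the JSON registration payload for one node_exporter instance.
# $1 = host IP, $2 = port (defaults to 9100).
# The service ID uses the last octet of the IP, e.g. node-exporter-30.
node_exporter_payload() {
  ip="$1"
  port="${2:-9100}"
  id="node-exporter-${ip##*.}"
  printf '{"id": "%s","name":"node-exporter","address": "%s","port": %s,"tags":["linux","prome"],"checks": [{"http": "http://%s:%s/metrics","interval": "5s"}]}\n' "$id" "$ip" "$port" "$ip" "$port"
}

# Send the payload to the agent on 192.168.10.34 (needs curl and a reachable agent).
# Only defined here; nothing is registered until the function is called.
register_node_exporter() {
  node_exporter_payload "$1" | curl -X PUT -d @- http://192.168.10.34:8500/v1/agent/service/register
}

# Print the payload for the host used above.
node_exporter_payload 192.168.10.30
```

Calling register_node_exporter 192.168.10.30 would send the same body as the curl command above.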
3.3.2. Deregister the node_exporter service from Consul
curl -X PUT http://192.168.10.34:8500/v1/agent/service/deregister/node-exporter-30
3.4. Add Consul to the Prometheus configuration
3.4.1. Edit prometheus.yml
]# vi /data/server/prometheus/etc/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'node_discovery_by_consul'
    metrics_path: /metrics
    scheme: http
    consul_sd_configs:
      - server: 192.168.10.29:8500
        services:
          - node-exporter
      - server: 192.168.10.30:8500
        services:
          - node-exporter
      - server: 192.168.10.34:8500
        services:
          - node-exporter
3.4.2. Check the syntax
]# promtool check config /data/server/prometheus/etc/prometheus.yml
Checking /data/server/prometheus/etc/prometheus.yml
  SUCCESS: 1 rule files found
 SUCCESS: /data/server/prometheus/etc/prometheus.yml is valid prometheus config file syntax

Checking /data/server/prometheus/rules/metrics_request_rules.yaml
  SUCCESS: 2 rules found
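3.4.3. Restart the Prometheus service
systemctl restart prometheus
3.4.4. Check in the Prometheus web UI
If the registered node appears as a target in the Prometheus web UI, it has been added to monitoring successfully.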
3.5. Add one more node_exporter
3.5.1. Register another node_exporter in Consul
curl -X PUT -d '{"id": "node-exporter-29","name":"node-exporter","address": "192.168.10.29","port": 9100,"tags":["linux","prome"],"checks": [{"http": "http://192.168.10.29:9100/metrics","interval": "5s"}]}' http://192.168.10.34:8500/v1/agent/service/register
3.5.2. Check the registration in Consul
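Besides the web UI, the registration can be inspected from the command line through Consul's catalog endpoint, which lists every instance of a service known to the cluster. This is a sketch that assumes curl is available and the server at 192.168.10.34 is reachable; catalog_service_url and list_service_instances are hypothetical helper names.

```shell
# URL of the catalog endpoint that lists all registered instances of a service.
# $1 = Consul server IP, $2 = service name.
catalog_service_url() {
  printf 'http://%s:8500/v1/catalog/service/%s\n' "$1" "$2"
}

# Fetch the instance list as JSON (needs curl and a reachable Consul server).
# Only defined here; nothing is contacted until the function is called.
list_service_instances() {
  curl -s "$(catalog_service_url "$1" "$2")"
}

# Print the URL that would be queried for the node-exporter service.
catalog_service_url 192.168.10.34 node-exporter
```

Calling list_service_instances 192.168.10.34 node-exporter returns a JSON array. After both registrations in this guide, it would be expected to contain entries whose ServiceID fields are node-exporter-30 and node-exporter-29.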
3.5.3. Check in the Prometheus web UI
3.5.4. Query with PromQL
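One simple PromQL check is the built-in up metric filtered by the job name from prometheus.yml, which reports 1 for each target that was scraped successfully. The sketch below builds that expression and, optionally, sends it to the Prometheus HTTP API. It assumes Prometheus listens on localhost:9090 (taken from the static target in the configuration above) and that curl is installed; up_query and run_query are hypothetical helper names.

```shell
# PromQL expression selecting the `up` series of one scrape job.
up_query() {
  printf 'up{job="%s"}\n' "$1"
}

# Run an instant query against the Prometheus HTTP API.
# Needs curl and a running Prometheus; only defined here, not called.
run_query() {
  curl -s -G http://localhost:9090/api/v1/query --data-urlencode "query=$1"
}

# Print the expression for the Consul-discovered job.
up_query node_discovery_by_consul
```

The printed expression can be pasted into the Prometheus web UI, or passed to the API with run_query "$(up_query node_discovery_by_consul)". Each discovered node_exporter should appear as one series.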