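Redis cluster monitoring: prometheus + redis_exporter + grafana
1. Environment
Operating system: CentOS 7.6
Redis cluster: three masters and three replicas, already deployed across three hosts
prometheus, grafana, and node_exporter are already installed (covered in the previous post)
This post starts from the redis_exporter deployment.
2. Install redis_exporter (on all three hosts)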
wget https://github.com/oliver006/redis_exporter/releases/download/v1.17.1/redis_exporter-v1.17.1.linux-amd64.tar.gz
tar -zxvf redis_exporter-v1.17.1.linux-amd64.tar.gz -C /usr/local/prometheus/
mv /usr/local/prometheus/redis_exporter-v1.17.1.linux-amd64 /usr/local/prometheus/redis_exporter
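3. Run redis_exporter under systemd (on all three hosts; repeat for the remaining nodes, changing the IP and ports accordingly)
Each host runs two Redis nodes, so each host needs two unit files.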
vim /lib/systemd/system/redis_exporter7000.service
[Unit]
Description=redis_exporter7000
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/redis_exporter/redis_exporter -redis.addr 10.200.0.1:7000 -redis.password YH2020s2b -web.listen-address :59121
Restart=always
[Install]
WantedBy=multi-user.target
vim /lib/systemd/system/redis_exporter7001.service
[Unit]
Description=redis_exporter7001
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/redis_exporter/redis_exporter -redis.addr 10.200.0.1:7001 -redis.password YH2020s2b -web.listen-address :59122
Restart=always
[Install]
WantedBy=multi-user.target
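Since the two unit files differ only in the Redis address and the listen port, the files for the other hosts can be generated rather than edited by hand. The following is a minimal sketch, not part of the original setup: the function name `render_unit` and the `REDIS_PASSWORD` environment variable are hypothetical, and the binary path is the one used in the unit files above.

```shell
#!/usr/bin/env bash
# Print a redis_exporter unit file for one Redis node.
# Arguments: host IP, Redis port, exporter listen port.
# REDIS_PASSWORD (hypothetical variable name) must be set in the environment.
render_unit() {
  local host="$1" redis_port="$2" listen_port="$3"
  cat <<EOF
[Unit]
Description=redis_exporter${redis_port}
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/redis_exporter/redis_exporter -redis.addr ${host}:${redis_port} -redis.password ${REDIS_PASSWORD} -web.listen-address :${listen_port}
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Usage on the second host (run as root):
# render_unit 10.200.0.2 7002 59121 > /lib/systemd/system/redis_exporter7002.service
# render_unit 10.200.0.2 7003 59122 > /lib/systemd/system/redis_exporter7003.service
# systemctl daemon-reload
# systemctl enable redis_exporter7002 redis_exporter7003
```

Keeping the listen ports at 59121 and 59122 on every host matches the target list used later in the Prometheus configuration.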
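4. Start the services, then check that ports 59121 and 59122 are listening
systemctl daemon-reload
systemctl start redis_exporter7000 redis_exporter7001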
netstat -lntp
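A listening port only shows that the exporter process is running; it does not show that the exporter can connect and authenticate to Redis. The exporter reports that through the `redis_up` metric. The sketch below reads the metrics text on standard input and succeeds only when `redis_up` is 1. The function name `exporter_sees_redis` is hypothetical.

```shell
#!/usr/bin/env bash
# Read Prometheus text-format metrics on stdin.
# Exit status 0 only if a redis_up sample is present and equals 1.
exporter_sees_redis() {
  awk '$1 ~ /^redis_up([{]|$)/ { found = ($2 == 1) } END { exit (found ? 0 : 1) }'
}

# Usage against the first exporter:
# curl -s http://10.200.0.1:59121/metrics | exporter_sees_redis && echo "redis reachable"
```

If the check fails while the port is open, the Redis address or password in the unit file is the first thing to look at.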
5. Update the Prometheus configuration (add the following jobs under scrape_configs in prometheus.yml)
  - job_name: 'redis-cluster'
    scrape_interval: 5s
    static_configs:
      - targets:
        - redis://10.200.0.1:7000
        - redis://10.200.0.1:7001
        - redis://10.200.0.2:7002
        - redis://10.200.0.2:7003
        - redis://10.200.0.3:7004
        - redis://10.200.0.3:7005
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.200.0.1:59121
  - job_name: 'redis_exporter'
    scrape_interval: 5s
    static_configs:
      - targets:
        - 10.200.0.1:59121
        - 10.200.0.1:59122
        - 10.200.0.2:59121
        - 10.200.0.2:59122
        - 10.200.0.3:59121
        - 10.200.0.3:59122
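The relabel rules in the 'redis-cluster' job copy each redis:// target into the `target` URL parameter, keep it as the `instance` label, and then point the actual scrape at the exporter on 10.200.0.1:59121. In effect Prometheus requests /scrape on that one exporter once per Redis node. The sketch below builds the equivalent URL so the same request can be made by hand; the function name `scrape_url` is hypothetical, and Prometheus itself URL-encodes the parameter, which this sketch does not.

```shell
#!/usr/bin/env bash
# Build the URL that the 'redis-cluster' job effectively requests after relabeling.
# Arguments: exporter host:port, Redis target (redis://host:port).
scrape_url() {
  local exporter="$1" target="$2"
  printf 'http://%s/scrape?target=%s\n' "$exporter" "$target"
}

# Usage: fetch metrics for a node on the second host through the first exporter.
# curl -s "$(scrape_url 10.200.0.1:59121 redis://10.200.0.2:7002)" | grep '^redis_up'
```

Running this for each of the six targets before restarting Prometheus is a quick way to confirm that the exporter on 10.200.0.1 can reach every node.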
6. Restart Prometheus, then open http://10.200.13.50:9090/targets and confirm that all targets are UP
systemctl restart prometheus.service
7. Monitor Redis with the Grafana Redis Data Source plugin
The plugin requires Grafana 7.0 or later. If you already run Grafana 7.0 or later, install it with grafana-cli:
grafana-cli plugins install redis-datasource
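If the download fails, fetch the plugin archive manually and unzip it into the plugin directory (restart Grafana afterwards so it loads the plugin):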
wget -P /var/lib/grafana/plugins https://storage.googleapis.com/plugins-community/redis-datasource/release/1.3.1/redis-datasource-1.3.1.zip
cd /var/lib/grafana/plugins
unzip redis-datasource-1.3.1.zip
If Grafana is not installed yet, you can deploy it quickly in a container with the plugin preinstalled:
docker run -d -p 3000:3000 --name=grafana -e "GF_INSTALL_PLUGINS=redis-datasource" grafana/grafana
Log in to Grafana and select Data Sources.
8. Alert rules (to be expanded later)
Redis alert rules
Alert name | Expression | Evaluation window (minutes) | Trigger condition |
---|---|---|---|
RedisDown | redis_up == 0 | 5 | Redis instance is down |
RedisMissingMaster | count(redis_instance_info{role="master"}) == 0 | 5 | No master found |
RedisTooManyMasters | count(redis_instance_info{role="master"}) > 1 | 5 | Too many masters |
RedisDisconnectedSlaves | count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1 | 5 | Replicas disconnected |
RedisReplicationBroken | delta(redis_connected_slaves[1m]) < 0 | 5 | Replication broken |
RedisClusterFlapping | changes(redis_connected_slaves[5m]) > 2 | 5 | Replica connection count keeps changing |
RedisMissingBackup | time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24 | 5 | No RDB backup in the last 24 hours |
RedisOutOfMemory | redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90 | 5 | Memory usage above 90% |
RedisTooManyConnections | redis_connected_clients > 100 | 5 | Too many client connections |
RedisNotEnoughConnections | redis_connected_clients < 5 | 5 | Too few client connections |
RedisRejectedConnections | increase(redis_rejected_connections_total[1m]) > 0 | 5 | Connections rejected |
redis_master_link_up | redis_master_link_up == 0 | 5 | Replication link to the master is currently down |
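To show how rows of this table could become a Prometheus rule file, here is a minimal sketch covering two of the alerts. It rests on assumptions: the 5-minute column is read as `for: 5m`, the severity labels and summary texts are illustrative, and the function name `print_redis_rules` and the file paths are hypothetical.

```shell
#!/usr/bin/env bash
# Print a Prometheus rule file with two alerts taken from the table.
# The heredoc delimiter is quoted so that {{ $labels.instance }} is left
# for Prometheus to expand instead of the shell.
print_redis_rules() {
  cat <<'EOF'
groups:
  - name: redis
    rules:
      - alert: RedisDown
        expr: redis_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Redis down (instance {{ $labels.instance }})"
      - alert: RedisOutOfMemory
        expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 90% (instance {{ $labels.instance }})"
EOF
}

# Usage (paths are assumptions; adjust to your install):
# print_redis_rules > /usr/local/prometheus/rules/redis.yml
# promtool check rules /usr/local/prometheus/rules/redis.yml
```

The generated file still has to be listed under rule_files in prometheus.yml before Prometheus evaluates it.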