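Single-Node Failure Caused by a Faulty Redis Sentinel Deployment
Background
I had just come back from lunch and was about to take a nap when a colleague told me that services in one region were failing: Redis could not be reached.
We run Redis in Sentinel mode with one master and two replicas, and the Sentinels are deployed on three other nodes, so the whole setup uses six machines.
I logged in to the server to find out why Redis was unreachable and found that Redis simply was not running. Strange: who would have shut Redis down?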
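Troubleshooting the cause
First, I used the last command together with the JumpServer operation logs to find out who had shut down the Redis service by mistake.
The most recent login was at 12:30, but the failure occurred at 12:25, and the last login before 12:25 was several days earlier. I began to suspect that this was not a human error.
I also learned from a colleague that the server seemed to have been rebooted: on the monitoring charts, Grafana had no data for a few minutes around 12:25.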
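uptime and last confirmed that the server had indeed been rebooted (it had been up for only 36 minutes). Since nobody had made a mistake, why did it reboot on its own?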
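The uptime value can be turned into a boot time and compared with the incident window. The following is a minimal sketch: the check time of 13:06 and the 12:23 to 12:31 window are hypothetical values chosen to be consistent with the 36 minutes of uptime and the log timestamps in this post, and on Linux the uptime in seconds can be read from /proc/uptime.

```python
from datetime import datetime, timedelta


def boot_time(now: datetime, uptime_seconds: float) -> datetime:
    """Return the moment the machine booted, given the current time and its uptime."""
    return now - timedelta(seconds=uptime_seconds)


def rebooted_during(now: datetime, uptime_seconds: float,
                    window_start: datetime, window_end: datetime) -> bool:
    """True if the computed boot time falls inside the incident window."""
    return window_start <= boot_time(now, uptime_seconds) <= window_end


# Hypothetical values: the check was made at 13:06 with 36 minutes of uptime.
checked_at = datetime(2024, 2, 21, 13, 6)
print(boot_time(checked_at, 36 * 60))
print(rebooted_during(checked_at, 36 * 60,
                      datetime(2024, 2, 21, 12, 23),
                      datetime(2024, 2, 21, 12, 31)))
```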
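I contacted the data center staff and asked whether a mistake on their side (for example, a mix-up with the power cables) had caused the machine to reboot.
After some back-and-forth with the data center staff, they provided a screenshot.
The conclusion: during this period the Redis master node rebooted automatically because of a CPU fault.
Fine, I accept that explanation: since no one made a mistake, a hardware fault is the most likely cause of the reboot.
Points to note
However, a few aspects of this incident still need further investigation.
The first point can be answered by looking at Grafana.
Let's focus on the second point: why didn't Redis Sentinel elect a new master?
Here are the logs first.
Sentinel log: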
153155:X 21 Feb 2024 12:23:12.901 # +sdown master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.274 * +reboot master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.340 # -sdown master mymaster 192.168.2.192 6379
Redis replica log:
Feb 21 12:30:48 as03 redis-server: 120365:S 21 Feb 2024 12:30:48.665 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 * MASTER <-> REPLICA sync started
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 * MASTER <-> REPLICA sync started
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 * MASTER <-> REPLICA sync started
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 * MASTER <-> REPLICA sync started
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * MASTER <-> REPLICA sync started
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * Non blocking connect for SYNC fired the event.
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.684 * Master replied to PING, replication can continue...
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.684 * Trying a partial resynchronization (request 74d950677ffe36c881520eb8ef9ba41ce4e3f203:1513331046603).
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.686 * Full resync from master: da2196bdf1a0884ea2c15c0bcd24c132cc160de5:305
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.686 * Discarding previously cached master state.
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.766 * MASTER <-> REPLICA sync: receiving 3370447 bytes from master to disk
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.797 * MASTER <-> REPLICA sync: Flushing old data
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.809 * MASTER <-> REPLICA sync: Loading DB in memory
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.811 * Loading RDB produced by version 6.2.12
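From the logs we can tell:
Sentinel log: after the master went down, the Sentinel detected it (+sdown) but never elected a new master.
Redis replica log: after the master went down, the replica kept reporting errors while trying to connect to the Redis master.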
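This reading of the Sentinel log can be made mechanical by scanning for the events Sentinel emits once a failover actually starts. The sketch below is a hypothetical helper, not part of Redis. It assumes the log line layout shown above and treats +odown, +try-failover, +vote-for-leader, +elected-leader and +switch-master as signs of failover activity.

```python
import re
from datetime import datetime

# Matches lines such as:
# 153155:X 21 Feb 2024 12:23:12.901 # +sdown master mymaster 192.168.2.192 6379
LINE_RE = re.compile(
    r"^\d+:X (?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][\w-]+)"
)

FAILOVER_EVENTS = {"+odown", "+try-failover", "+vote-for-leader",
                   "+elected-leader", "+switch-master"}


def parse_events(lines):
    """Return a list of (timestamp, event name) tuples for Sentinel log lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%d %b %Y %H:%M:%S.%f")
            events.append((ts, match.group("event")))
    return events


def summarize(lines):
    """Report failover-related events and how long the master stayed in SDOWN."""
    events = parse_events(lines)
    down = next((ts for ts, name in events if name == "+sdown"), None)
    up = next((ts for ts, name in events if name == "-sdown"), None)
    both = down is not None and up is not None
    return {
        "failover_activity": sorted({name for _, name in events} & FAILOVER_EVENTS),
        "sdown_seconds": (up - down).total_seconds() if both else None,
    }


PROD_LOG = [
    "153155:X 21 Feb 2024 12:23:12.901 # +sdown master mymaster 192.168.2.192 6379",
    "153155:X 21 Feb 2024 12:30:53.274 * +reboot master mymaster 192.168.2.192 6379",
    "153155:X 21 Feb 2024 12:30:53.340 # -sdown master mymaster 192.168.2.192 6379",
]
print(summarize(PROD_LOG))
```

For the production log above, the list of failover events comes back empty: the master sat in SDOWN for several minutes and no failover was ever attempted.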
I decided to first run an experiment on my own virtual machines to see whether some misconfiguration was breaking Sentinel mode.
Experiment 1
IP address | Hostname | Operating system | Software |
---|---|---|---|
10.0.0.10 | master | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12; redis-sentinel |
10.0.0.11 | slave01 | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12; redis-sentinel |
10.0.0.12 | slave02 | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12; redis-sentinel |
Set the hostnames
hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02
Set the time
# Set the time zone
timedatectl set-timezone Asia/Shanghai
# Install basic packages
apt install -y lrzsz net-tools ntpdate
# Synchronize the time
/usr/sbin/ntpdate ntp1.aliyun.com
crontab -l > crontab_conf ; echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >> crontab_conf && crontab crontab_conf && rm -f crontab_conf
timedatectl set-local-rtc 1
# Install build tools
apt install make build-essential pkg-config -y
Install a standalone Redis instance
The password here is set to 2023@666.168; if you change it, remember to change the password in /usr/lib/systemd/system/redis.service as well.
# Suppress the startup warnings
sysctl vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo "511" > /proc/sys/net/core/somaxconn
sysctl net.core.somaxconn=4096
# Start the installation
wget https://download.redis.io/releases/redis-6.2.12.tar.gz
mv redis-6.2.12.tar.gz /etc/
cd /etc/
tar xf redis-6.2.12.tar.gz
mv redis-6.2.12 redis
rm -rf redis-6.2.12.tar.gz
cd redis/
make && make install
echo $?
mkdir /etc/redis/conf
cp redis.conf ./conf
#sed -i 's#daemonize no#daemonize yes#g' /etc/redis/conf/redis.conf
groupadd redis
useradd redis -g redis -M -s /sbin/nologin
chown -R redis:redis /etc/redis
cat >/usr/lib/systemd/system/redis.service<<EOF
[Unit]
Description=Redis persistent key-value database
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-server /etc/redis/conf/redis.conf --supervised systemd
ExecStop=/usr/local/bin/redis-cli -a 2023@666.168 shutdown
User=root
Group=root
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start redis
ps -ef |grep redis
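Configure master-replica replication
Master configuration file
vim /etc/redis/conf/redis.conf
# Bind address
bind 0.0.0.0 -::1
# Set the password
requirepass "2023@666.168"
# Set the password used to authenticate with the master
masterauth 2023@666.168
# Set the log file
logfile "/var/log/redis.log"
Replica configuration file
vim /etc/redis/conf/redis.conf
# Bind address
bind 0.0.0.0 -::1
# IP address and port of the master node
slaveof 10.0.0.10 6379
# Set the password
requirepass "2023@666.168"
# Set the password used to authenticate with the master
masterauth 2023@666.168
# Set the log file
logfile "/var/log/redis.log"
Restart the Redis service
systemctl restart redis
Verify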
Master
redis-cli -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:master
Slave
redis-cli -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:slave
Note that at this point the replica is read-only and cannot be written to.
127.0.0.1:6379> set 1 2
(error) READONLY You can't write against a read only replica.
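Configure Sentinel
Key configuration options explained
sentinel monitor master-name host port quorum Example: sentinel monitor mymaster 127.0.0.1 6379 2 The example declares that this Sentinel monitors a master named mymaster at 127.0.0.1:6379. The final 2 is the quorum: once 2 Sentinels agree that the master is unreachable, it is marked objectively down (ODOWN) and a failover can be started. A single Sentinel that considers the master down, no matter how many times, only marks it subjectively down (SDOWN). The failover itself must additionally be authorized by a majority of the Sentinels.
sentinel auth-pass master-name password Required if the monitored Redis server has a password set.
sentinel down-after-milliseconds master-name milliseconds Sentinel sends heartbeat PINGs to the master to check that it is running. If the master does not return a valid reply within the configured time (down-after-milliseconds, in milliseconds), that Sentinel considers the master down (SDOWN).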
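The relationship between quorum and majority is easy to get wrong, so here is a minimal sketch of the two checks as described in the Redis Sentinel documentation. The function names are hypothetical and only model the counting rules; they do not talk to Redis.

```python
def reaches_odown(sdown_reports: int, quorum: int) -> bool:
    """The master is objectively down once at least `quorum` Sentinels
    (including the one asking) report it as subjectively down."""
    return sdown_reports >= quorum


def failover_authorized(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A Sentinel may lead the failover only with votes from a majority of all
    known Sentinels, and never with fewer votes than the quorum."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


# Three Sentinels with quorum 2, as in the configuration used in this post.
print(reaches_odown(sdown_reports=1, quorum=2))   # one Sentinel alone is not enough
print(reaches_odown(sdown_reports=2, quorum=2))
print(failover_authorized(votes=2, total_sentinels=3, quorum=2))
```

A Sentinel that cannot see any of its peers can only ever count its own report, so with quorum 2 it stays at SDOWN indefinitely.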
# Copy the Sentinel configuration file
cp /etc/redis/sentinel.conf /etc/redis/conf/sentinel.conf
# Edit the Sentinel configuration; the settings are the same on the master and both replicas
bind 0.0.0.0
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
sentinel down-after-milliseconds mymaster 1000
logfile "/var/log/redis-sentinel.log"
# The complete configuration is as follows
root@master:/etc/redis# grep -Ev '^$|#' /etc/redis/conf/sentinel.conf
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
bind 0.0.0.0
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/var/log/redis-sentinel.log"
dir /tmp
sentinel down-after-milliseconds mymaster 1000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
# Manage Sentinel with systemd
cat >/usr/lib/systemd/system/redis-sentinel.service<<EOF
[Unit]
Description=Redis persistent key-value database
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-sentinel /etc/redis/conf/sentinel.conf --supervised systemd
#ExecStop=/usr/local/bin/redis-cli shutdown
User=root
Group=root
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
# Start redis
systemctl daemon-reload
systemctl start redis
# Start Sentinel
systemctl daemon-reload
systemctl restart redis-sentinel
ps -ef |grep redis
Write data to verify
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
@FileName:
@Time : 2022/4/10 22:38
@Author : 热气球
@Software: PyCharm
@Version : 1.0
@Contact : 2573514647@qq.com
@Des :
"""
import random
import requests
from lxml import etree
from fake_useragent import UserAgent
from redis.sentinel import Sentinel
# url = "https://www.17k.com/list/3546918.html"
url = "https://www.17k.com/list/493239.html"
headers = {'User-Agent': UserAgent().random}
html = requests.get(url=url, headers=headers, timeout=3)
html.encoding = 'utf-8'
# Connect through the three Sentinel nodes
sentinel = Sentinel([('10.0.0.10', 26379),
                     ('10.0.0.11', 26379),
                     ('10.0.0.12', 26379)])
# master = sentinel.discover_master('mymaster')
# print(master)
# # Output: ('192.168.196.132', 6379)
# # Get the replica addresses
# slave = sentinel.discover_slaves('mymaster')
# print(slave)
# # Output: [('192.168.196.129', 6379)]
# Get the master for writes
master = sentinel.master_for('mymaster', password='2023@666.168', db=0)
# w_ret = master.set('foo', 'bar')
# Output: True
# Get a replica for reads
slave = sentinel.slave_for('mymaster', password='2023@666.168', db=0)
# Write each key to the master, then read it back from a replica
for key in range(1000000000000):
    v = key + 1
    res = master.set(key, str(v))
    print(res)
    get_res = slave.get(key)
    print(get_res)
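While the master is being switched, a write in the loop above may raise an exception and end the script. If the test should keep running across a failover, each call can be wrapped in a small retry helper. The following is a minimal sketch using only the standard library; call_with_retry and its parameters are hypothetical names, and with redis-py the retriable tuple would presumably hold redis.exceptions.ConnectionError, redis.exceptions.TimeoutError and redis.sentinel.MasterNotFoundError.

```python
import time


def call_with_retry(operation, retriable=(ConnectionError, TimeoutError),
                    attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(); on a retriable error wait `delay` seconds and try again.

    Re-raises the last error once `attempts` tries have all failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error


# Stand-in for a write that fails twice while the master is being switched.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("master unreachable")
    return True


print(call_with_retry(flaky_write, delay=0.01))
```

In the script above this would look like call_with_retry(lambda: master.set(key, str(v)), retriable=...), with the delay chosen to be longer than down-after-milliseconds.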
tail -f /var/log/redis.log
14033:S 21 Feb 2024 22:13:01.924 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:01.924 # Error condition on socket for SYNC: Connection refused
14033:S 21 Feb 2024 22:13:02.033 * Connecting to MASTER 10.0.0.10:6379
14033:S 21 Feb 2024 22:13:02.033 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:02.033 # Error condition on socket for SYNC: Connection refused
14033:S 21 Feb 2024 22:13:03.044 * Connecting to MASTER 10.0.0.10:6379
14033:S 21 Feb 2024 22:13:03.045 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:03.046 # Error condition on socket for SYNC: Connection refused
14033:M 21 Feb 2024 22:13:03.268 * Discarding previously cached master state.
14033:M 21 Feb 2024 22:13:03.269 # Setting secondary replication ID to ecb0d33568791925d6c67f2eb2721ea0a3e1b921, valid up to offset: 148512. New replication ID is c070780c961fb11b774cf67aabece30004366513
14033:M 21 Feb 2024 22:13:03.269 * MASTER MODE enabled (user request from 'id=8 addr=10.0.0.12:49539 laddr=10.0.0.11:6379 fd=13 name=sentinel-198e50c7-cmd age=312 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=188 qbuf-free=40766 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
14033:M 21 Feb 2024 22:13:03.276 # CONFIG REWRITE executed with success.
14033:M 21 Feb 2024 22:13:03.940 * Replica 10.0.0.12:6379 asks for synchronization
14033:M 21 Feb 2024 22:13:03.940 * Partial resynchronization request from 10.0.0.12:6379 accepted. Sending 422 bytes of backlog starting from offset 148512.
tail -f /var/log/redis-sentinel.log
14161:X 21 Feb 2024 22:13:03.014 # +sdown master mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.015 # +sdown sentinel b1e7aa5d90b30cdae7449aa0107a7c148b0110c6 10.0.0.10 26379 @ mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.099 # +new-epoch 1
14161:X 21 Feb 2024 22:13:03.099 # +vote-for-leader 198e50c704121f6018ecb6be983507c45cb381f0 1
14161:X 21 Feb 2024 22:13:03.930 # +config-update-from sentinel 198e50c704121f6018ecb6be983507c45cb381f0 10.0.0.12 26379 @ mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.931 # +switch-master mymaster 10.0.0.10 6379 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:03.931 * +slave slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:03.931 * +slave slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:04.987 # +sdown slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379
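Key information: the line containing +switch-master mymaster xxxx
shows that a new master has been elected and Sentinel has switched over to it.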
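The fields after +switch-master are the master name, the old address and the new address. A small parser makes it easy to pull the new master out of a log line. This is a sketch with hypothetical names (MasterSwitch, parse_switch_master), based only on the line format visible in the log above.

```python
from typing import NamedTuple, Optional, Tuple


class MasterSwitch(NamedTuple):
    name: str
    old_addr: Tuple[str, int]
    new_addr: Tuple[str, int]


def parse_switch_master(line: str) -> Optional[MasterSwitch]:
    """Extract '<name> <old ip> <old port> <new ip> <new port>' from a
    +switch-master log line; return None for any other line."""
    marker = "+switch-master "
    idx = line.find(marker)
    if idx == -1:
        return None
    fields = line[idx + len(marker):].split()
    if len(fields) != 5:
        return None
    name, old_ip, old_port, new_ip, new_port = fields
    return MasterSwitch(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


line = ("14161:X 21 Feb 2024 22:13:03.931 # +switch-master "
        "mymaster 10.0.0.10 6379 10.0.0.11 6379")
print(parse_switch_master(line))
```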
Experiment 2
IP address | Hostname | Operating system | Software |
---|---|---|---|
10.0.0.10 | master | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12 |
10.0.0.11 | slave01 | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12 |
10.0.0.12 | slave02 | Ubuntu 20.04.4 LTS (arm64) | redis-6.2.12 |
10.0.0.13 | sentinel01 | Ubuntu 20.04.4 LTS (arm64) | redis-sentinel |
10.0.0.14 | sentinel02 | Ubuntu 20.04.4 LTS (arm64) | redis-sentinel |
10.0.0.15 | sentinel03 | Ubuntu 20.04.4 LTS (arm64) | redis-sentinel |
Set the hostnames
hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02
hostnamectl set-hostname sentinel01
hostnamectl set-hostname sentinel02
hostnamectl set-hostname sentinel03
Set the time
# Set the time zone
timedatectl set-timezone Asia/Shanghai
# Install basic packages
apt install -y lrzsz net-tools ntpdate
# Synchronize the time
/usr/sbin/ntpdate ntp1.aliyun.com
crontab -l > crontab_conf ; echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >> crontab_conf && crontab crontab_conf && rm -f crontab_conf
timedatectl set-local-rtc 1
# Install build tools
apt install make build-essential pkg-config -y
Install a standalone Redis instance
The password here is set to 2023@666.168; if you change it, remember to change the password in /usr/lib/systemd/system/redis.service as well.
# Suppress the startup warnings
sysctl vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo "511" > /proc/sys/net/core/somaxconn
sysctl net.core.somaxconn=4096
# Start the installation
wget https://download.redis.io/releases/redis-6.2.12.tar.gz
mv redis-6.2.12.tar.gz /etc/
cd /etc/
tar xf redis-6.2.12.tar.gz
mv redis-6.2.12 redis
rm -rf redis-6.2.12.tar.gz
cd redis/
make && make install
echo $?
mkdir /etc/redis/conf
cp redis.conf ./conf
#sed -i 's#daemonize no#daemonize yes#g' /etc/redis/conf/redis.conf
groupadd redis
useradd redis -g redis -M -s /sbin/nologin
chown -R redis:redis /etc/redis
cat >/usr/lib/systemd/system/redis.service<<EOF
[Unit]
Description=Redis persistent key-value database
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-server /etc/redis/conf/redis.conf --supervised systemd
ExecStop=/usr/local/bin/redis-cli -a 2023@666.168 shutdown
User=root
Group=root
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start redis
systemctl enable redis
ps -ef |grep redis
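Configure master-replica replication
Master configuration file
vim /etc/redis/conf/redis.conf
# Bind address
bind 0.0.0.0 -::1
# Set the password
requirepass "2023@666.168"
# Set the password used to authenticate with the master
masterauth 2023@666.168
# Set the log file
logfile "/var/log/redis.log"
Replica configuration file
vim /etc/redis/conf/redis.conf
# Bind address
bind 0.0.0.0 -::1
# IP address and port of the master node
slaveof 10.0.0.10 6379
# Set the password
requirepass "2023@666.168"
# Set the password used to authenticate with the master
masterauth 2023@666.168
# Set the log file
logfile "/var/log/redis.log"
Restart the Redis service
systemctl restart redis
Verify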
Master
redis-cli -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:master
Slave
redis-cli -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:slave
Note that at this point the replica is read-only and cannot be written to.
127.0.0.1:6379> set 1 2
(error) READONLY You can't write against a read only replica.
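Configure Sentinel
Key configuration options explained
sentinel monitor master-name host port quorum Example: sentinel monitor mymaster 127.0.0.1 6379 2 The example declares that this Sentinel monitors a master named mymaster at 127.0.0.1:6379. The final 2 is the quorum: once 2 Sentinels agree that the master is unreachable, it is marked objectively down (ODOWN) and a failover can be started. A single Sentinel that considers the master down, no matter how many times, only marks it subjectively down (SDOWN). The failover itself must additionally be authorized by a majority of the Sentinels.
sentinel auth-pass master-name password Required if the monitored Redis server has a password set.
sentinel down-after-milliseconds master-name milliseconds Sentinel sends heartbeat PINGs to the master to check that it is running. If the master does not return a valid reply within the configured time (down-after-milliseconds, in milliseconds), that Sentinel considers the master down (SDOWN).
# Copy the Sentinel configuration file
cp /etc/redis/sentinel.conf /etc/redis/conf/sentinel.conf
# Edit the Sentinel configuration; the settings are the same on all three Sentinel nodes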
bind 0.0.0.0
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
sentinel down-after-milliseconds mymaster 1000
logfile "/var/log/redis-sentinel.log"
# The complete configuration is as follows
root@master:/etc/redis# grep -Ev '^$|#' /etc/redis/conf/sentinel.conf
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
bind 0.0.0.0
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/var/log/redis-sentinel.log"
dir /tmp
sentinel down-after-milliseconds mymaster 1000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no
# Manage Sentinel with systemd
cat >/usr/lib/systemd/system/redis-sentinel.service<<EOF
[Unit]
Description=Redis persistent key-value database
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-sentinel /etc/redis/conf/sentinel.conf --supervised systemd
#ExecStop=/usr/local/bin/redis-cli shutdown
User=root
Group=root
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
# Start redis
systemctl daemon-reload
systemctl restart redis
# Start Sentinel
systemctl daemon-reload
systemctl restart redis-sentinel
ps -ef |grep redis
Write data to verify
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
@FileName:
@Time : 2022/4/10 22:38
@Author : 热气球
@Software: PyCharm
@Version : 1.0
@Contact : 2573514647@qq.com
@Des :
"""
import random
import requests
from lxml import etree
from fake_useragent import UserAgent
from redis.sentinel import Sentinel
# url = "https://www.17k.com/list/3546918.html"
url = "https://www.17k.com/list/493239.html"
headers = {'User-Agent': UserAgent().random}
html = requests.get(url=url, headers=headers, timeout=3)
html.encoding = 'utf-8'
# Connect through the three dedicated Sentinel nodes
sentinel = Sentinel([('10.0.0.13', 26379),
                     ('10.0.0.14', 26379),
                     ('10.0.0.15', 26379)])
# master = sentinel.discover_master('mymaster')
# print(master)
# # Output: ('192.168.196.132', 6379)
# # Get the replica addresses
# slave = sentinel.discover_slaves('mymaster')
# print(slave)
# # Output: [('192.168.196.129', 6379)]
# Get the master for writes
master = sentinel.master_for('mymaster', password='2023@666.168', db=0)
# w_ret = master.set('foo', 'bar')
# Output: True
# Get a replica for reads
slave = sentinel.slave_for('mymaster', password='2023@666.168', db=0)
# Write each key to the master, then read it back from a replica
for key in range(1000000000000):
    v = key + 1
    res = master.set(key, str(v))
    print(res)
    get_res = slave.get(key)
    print(get_res)
tail -f /var/log/redis-sentinel.log
14148:X 22 Feb 2024 11:11:39.058 # +sdown master mymaster 10.0.0.12 6379
14148:X 22 Feb 2024 11:11:39.267 # +new-epoch 2
14148:X 22 Feb 2024 11:11:39.274 # +vote-for-leader 4e11b4d6fb7fad6ca09acd8961ce7790a0264436 2
14148:X 22 Feb 2024 11:11:40.106 # +config-update-from sentinel 4e11b4d6fb7fad6ca09acd8961ce7790a0264436 10.0.0.15 26379 @ mymaster 10.0.0.12 6379
14148:X 22 Feb 2024 11:11:40.106 # +switch-master mymaster 10.0.0.12 6379 10.0.0.11 6379
14148:X 22 Feb 2024 11:11:40.106 * +slave slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379
14148:X 22 Feb 2024 11:11:40.106 * +slave slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379
14148:X 22 Feb 2024 11:12:10.174 # +sdown slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379
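These two experiments show that the deployment procedure itself is fine; I must have overlooked something critical when deploying the production environment.
Comparing against the production configuration, I found that the ID (sentinel myid) in all three Sentinel configuration files was identical. When I deployed Redis Sentinel, I must have copied one node's configuration file straight to the other two to save time.
After deleting the configuration files, reconfiguring, and restarting Redis Sentinel so that each instance generated its own ID, I tested again and Sentinel elected a new master correctly.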
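A plausible explanation, not verified in this post, is that a Sentinel ignores hello messages carrying its own ID, so Sentinels that share one myid never discover each other and can never reach the quorum. Duplicate IDs are easy to detect once the sentinel.conf contents of all nodes are collected. The sketch below assumes the files have already been fetched into a dict; the host names and the find_duplicate_myids helper are hypothetical.

```python
import re
from collections import defaultdict

# Sentinel writes its ID to the config file as: sentinel myid <40 hex characters>
MYID_RE = re.compile(r"^\s*sentinel\s+myid\s+([0-9a-f]{40})\s*$",
                     re.IGNORECASE | re.MULTILINE)


def find_duplicate_myids(configs):
    """configs maps host -> sentinel.conf text.

    Returns {myid: [hosts]} for every ID that appears on more than one host.
    """
    hosts_by_id = defaultdict(list)
    for host, text in configs.items():
        match = MYID_RE.search(text)
        if match:
            hosts_by_id[match.group(1).lower()].append(host)
    return {myid: hosts for myid, hosts in hosts_by_id.items() if len(hosts) > 1}


# Hypothetical config contents: the same file was copied to all three hosts.
copied = ("port 26379\n"
          "sentinel myid b1e7aa5d90b30cdae7449aa0107a7c148b0110c6\n"
          "sentinel monitor mymaster 10.0.0.10 6379 2\n")
print(find_duplicate_myids({"sentinel01": copied,
                            "sentinel02": copied,
                            "sentinel03": copied}))
```

An empty result means every node has its own ID, which is what a fresh sentinel.conf without a myid line produces after the first start.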
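How to verify the state of a deployed Redis Sentinel
You can log in to Redis Sentinel as follows:
redis-cli -p 26379
# Run the command and check that sentinels on the last line equals the number of Sentinel nodes.
info sentinel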
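The same check can be scripted by parsing the master0:... line of the output. This is a minimal sketch: the sample text is illustrative rather than captured from the environment in this post, and the helper names are hypothetical.

```python
import re


def parse_master_lines(info_text):
    """Parse 'masterN:name=...,status=...,sentinels=...' lines into
    a dict keyed by master name."""
    masters = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not re.match(r"master\d+:", line):
            continue
        payload = line.partition(":")[2]
        fields = dict(item.split("=", 1) for item in payload.split(",") if "=" in item)
        if "name" in fields:
            masters[fields["name"]] = fields
    return masters


def sentinel_count_ok(info_text, master_name, expected):
    """True if the Sentinel reports exactly `expected` Sentinels for the master."""
    fields = parse_master_lines(info_text).get(master_name)
    return fields is not None and int(fields.get("sentinels", 0)) == expected


# Illustrative sample of what `info sentinel` prints on a healthy deployment.
SAMPLE = """# Sentinel
sentinel_masters:1
sentinel_tilt:0
master0:name=mymaster,status=ok,address=10.0.0.10:6379,slaves=2,sentinels=3
"""
print(sentinel_count_ok(SAMPLE, "mymaster", expected=3))
```

A count lower than the number of deployed Sentinel nodes suggests that the Sentinels have not discovered each other, which is worth checking before relying on automatic failover.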