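A faulty Redis Sentinel deployment that led to a single point of failure

Background

I had just come back from lunch and was about to take a nap when a colleague told me that the service in one region was failing because it could not connect to Redis.

We run Redis in Sentinel mode with one master and two replicas. The three Sentinels run on three other nodes, so the Redis Sentinel setup uses six machines in total.

I logged in to the server to find out why Redis was unreachable and found that Redis simply was not running. Strange. Who would have stopped Redis?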

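Finding the cause

First I used the last command together with the JumpServer operation logs to find out who had stopped the Redis service by mistake.

The most recent login was at 12:30, but the failure happened at 12:25, and the last login before 12:25 was several days earlier. I started to suspect that this was not a human error.

A colleague also told me that the server seemed to have been rebooted: the Grafana monitoring charts had a gap of a few minutes with no data around 12:25.

uptime and last confirmed that the server had been rebooted (it had only been up for 36 minutes). If nobody did it by mistake, why did it reboot on its own?

I contacted the data center staff and asked whether a mistake on their side (plugging the wrong power cable, for example) had caused the reboot.

After some back and forth, the data center staff sent a screenshot.

Their conclusion: the Redis master node had rebooted automatically during this period because of a CPU fault.

Fine, I accept that explanation. Since it was not a human error, a hardware fault is the most likely reason for the reboot.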

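Points that need attention

A few things about this incident still need further investigation.

The first point can be answered by looking at Grafana.

Let's focus on the second point: why did Redis Sentinel not elect a new master?

Here are the logs.

Sentinel log: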

153155:X 21 Feb 2024 12:23:12.901 # +sdown master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.274 * +reboot master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.340 # -sdown master mymaster 192.168.2.192 6379

Redis replica log:

Feb 21 12:30:48 as03 redis-server: 120365:S 21 Feb 2024 12:30:48.665 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 * MASTER <-> REPLICA sync started
Feb 21 12:30:49 as03 redis-server: 120365:S 21 Feb 2024 12:30:49.669 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 * MASTER <-> REPLICA sync started
Feb 21 12:30:50 as03 redis-server: 120365:S 21 Feb 2024 12:30:50.672 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 * MASTER <-> REPLICA sync started
Feb 21 12:30:51 as03 redis-server: 120365:S 21 Feb 2024 12:30:51.676 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 * MASTER <-> REPLICA sync started
Feb 21 12:30:52 as03 redis-server: 120365:S 21 Feb 2024 12:30:52.679 # Error condition on socket for SYNC: Connection refused
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * Connecting to MASTER 192.168.2.192:6379
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * MASTER <-> REPLICA sync started
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.683 * Non blocking connect for SYNC fired the event.
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.684 * Master replied to PING, replication can continue...
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.684 * Trying a partial resynchronization (request 74d950677ffe36c881520eb8ef9ba41ce4e3f203:1513331046603).
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.686 * Full resync from master: da2196bdf1a0884ea2c15c0bcd24c132cc160de5:305
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.686 * Discarding previously cached master state.
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.766 * MASTER <-> REPLICA sync: receiving 3370447 bytes from master to disk
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.797 * MASTER <-> REPLICA sync: Flushing old data
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.809 * MASTER <-> REPLICA sync: Loading DB in memory
Feb 21 12:30:53 as03 redis-server: 120365:S 21 Feb 2024 12:30:53.811 * Loading RDB produced by version 6.2.12

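From the logs we can see:

Sentinel log: after the master went down, the Sentinel noticed it (+sdown), but the log contains no +odown or +switch-master event, so a new master was never elected.

Redis replica log: after the master went down, the replica kept reporting connection errors to the master (Connection refused) until the master came back.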
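The reading of the Sentinel log above can be checked mechanically. The following is a minimal sketch, not part of the original troubleshooting: it parses the three production log lines quoted earlier and reports whether an objective-down or failover event ever appeared, plus how long the master stayed down. The function names are hypothetical, and the timestamp parsing assumes an English locale for the month abbreviation.

```python
from datetime import datetime

# Sentinel log lines copied from the production log quoted above.
LOG = """\
153155:X 21 Feb 2024 12:23:12.901 # +sdown master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.274 * +reboot master mymaster 192.168.2.192 6379
153155:X 21 Feb 2024 12:30:53.340 # -sdown master mymaster 192.168.2.192 6379
"""


def parse_events(log_text):
    """Return (timestamp, event) pairs such as (datetime(...), '+sdown')."""
    events = []
    for line in log_text.splitlines():
        fields = line.split()
        # Layout: pid:role, day, month, year, time, level mark, event, details...
        if len(fields) < 7:
            continue
        stamp = datetime.strptime(" ".join(fields[1:5]), "%d %b %Y %H:%M:%S.%f")
        events.append((stamp, fields[6]))
    return events


def summarize(events):
    """Report whether a failover was attempted and how long the master was down."""
    names = [name for _, name in events]
    down = next((t for t, n in events if n == "+sdown"), None)
    up = next((t for t, n in events if n == "-sdown"), None)
    return {
        "objectively_down": "+odown" in names,
        "failover_done": "+switch-master" in names,
        "seconds_down": (up - down).total_seconds() if down and up else None,
    }


summary = summarize(parse_events(LOG))
print(summary)
```

For this log both flags come out false: the master was only ever subjectively down, and it returned to service only because the machine finished rebooting.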

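I decided to reproduce this in my own virtual machines first, to check whether some configuration error was breaking Sentinel mode.

Experiment 1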

IP address   Hostname   Operating system             Software
10.0.0.10    master     Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12; redis-sentinel
10.0.0.11    slave01    Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12; redis-sentinel
10.0.0.12    slave02    Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12; redis-sentinel

Set the hostnames (run each command on its own node)

hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02

Set up the time

# Set the time zone
timedatectl set-timezone Asia/Shanghai
# Install basic tools
apt install -y lrzsz net-tools ntpdate
# Synchronize the time
/usr/sbin/ntpdate ntp1.aliyun.com
crontab -l > crontab_conf ; echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >> crontab_conf && crontab crontab_conf && rm -f crontab_conf
timedatectl set-local-rtc 1
# Install the build tools
apt install make  build-essential pkg-config -y

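Install a standalone Redis instance

The password here is set to 2023@666.168. If you change it, remember to change the password in /usr/lib/systemd/system/redis.service as well.

# Silence the startup warnings
sysctl vm.overcommit_memory=1
echo  never > /sys/kernel/mm/transparent_hugepage/enabled 
echo "511" > /proc/sys/net/core/somaxconn
sysctl  net.core.somaxconn=4096
# Start the installation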
wget https://download.redis.io/releases/redis-6.2.12.tar.gz
mv redis-6.2.12.tar.gz  /etc/
cd /etc/
tar xf redis-6.2.12.tar.gz
mv redis-6.2.12 redis
rm -rf redis-6.2.12.tar.gz
cd redis/
make  && make  install
echo $?
mkdir /etc/redis/conf
cp redis.conf ./conf
#sed  -i 's#daemonize no#daemonize yes#g' /etc/redis/conf/redis.conf
groupadd redis
useradd redis  -g redis -M -s /sbin/nologin
chown -R redis:redis /etc/redis
 
cat >/usr/lib/systemd/system/redis.service<<EOF
[Unit]
Description=Redis persistent key-value database 
After=network.target 
After=network-online.target 
Wants=network-online.target
 
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-server /etc/redis/conf/redis.conf --supervised systemd 
ExecStop=/usr/local/bin/redis-cli -a 2023@666.168 shutdown
User=root
Group=root 
RuntimeDirectory=redis 
RuntimeDirectoryMode=0755
 
[Install] 
WantedBy=multi-user.target 
EOF
systemctl daemon-reload
systemctl start  redis
ps -ef |grep redis

Set up master-replica replication

Master configuration file

vim /etc/redis/conf/redis.conf

# Bind address
bind 0.0.0.0 -::1
# Password
requirepass "2023@666.168"
# Password used to authenticate with the master
masterauth 2023@666.168
# Log file
logfile "/var/log/redis.log"

Replica configuration file

vim /etc/redis/conf/redis.conf

# Bind address
bind 0.0.0.0 -::1
# IP address and port of the master node
slaveof 10.0.0.10 6379
# Password
requirepass "2023@666.168"
# Password used to authenticate with the master
masterauth 2023@666.168
# Log file
logfile "/var/log/redis.log"

Restart the Redis service

systemctl restart redis

Verify

Master

redis-cli  -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:master

slave

redis-cli  -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:slave

Note that the replicas are now read-only: they serve reads but reject writes.

127.0.0.1:6379> set 1 2
(error) READONLY You can't write against a read only replica.
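The manual check above (run info Replication and read the role line) can also be scripted. The sketch below parses the text of an INFO replication reply and reports problems. The function names are hypothetical and the sample reply is illustrative, not captured from the lab machines.

```python
def parse_info(text):
    """Turn the 'key:value' lines of a Redis INFO section into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def check_role(info, expected_role):
    """Return a list of problems; an empty list means the node looks healthy."""
    problems = []
    if info.get("role") != expected_role:
        problems.append(f"role is {info.get('role')!r}, expected {expected_role!r}")
    # On a replica, master_link_status:up means replication is connected.
    if expected_role == "slave" and info.get("master_link_status") != "up":
        problems.append("replica is not connected to its master")
    return problems


# Illustrative sample; a real reply contains more fields.
sample = """# Replication
role:slave
master_host:10.0.0.10
master_port:6379
master_link_status:up
"""
print(check_role(parse_info(sample), "slave"))
```

The text can come from redis-cli -a '<password>' info replication. With redis-py, client.info("replication") already returns a dict, so only check_role would be needed.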

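Set up Sentinel

Key configuration directives

sentinel monitor master-name host port quorum

Example: sentinel monitor mymaster 127.0.0.1 6379 2

This tells the Sentinel to monitor a master named mymaster at 127.0.0.1:6379. The final 2 is the quorum: the number of Sentinels that must agree that the master is unreachable before it is marked objectively down (+odown). A single Sentinel on its own can only mark the master subjectively down (+sdown), no matter how many times it fails to reach it. Starting the failover additionally requires votes from a majority of all known Sentinels.

sentinel auth-pass master-name password

Required if the monitored Redis server has a password set.

sentinel down-after-milliseconds master-name milliseconds

The Sentinel sends PING to the master to check that it is alive. If the master gives no valid reply (no PONG, or only an error reply) for down-after-milliseconds (in milliseconds), the Sentinel considers the master down from its own point of view (subjectively down).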
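To make the quorum rule concrete, here is a simplified model of the decision, written as a sketch rather than as Sentinel's actual implementation. The function names are hypothetical; the rule it encodes is that objective-down needs quorum agreeing Sentinels, and the failover leader needs votes from at least a majority of all known Sentinels (or the quorum, if that is larger).

```python
def is_objectively_down(agreeing, quorum):
    """ODOWN: at least `quorum` Sentinels report the master as unreachable."""
    return agreeing >= quorum


def votes_needed(quorum, known_sentinels):
    """Votes the failover leader needs: the larger of majority and quorum."""
    majority = known_sentinels // 2 + 1
    return max(quorum, majority)


def failover_possible(agreeing, quorum, known_sentinels):
    """True if the master can be marked ODOWN and a leader can be elected."""
    return (is_objectively_down(agreeing, quorum)
            and agreeing >= votes_needed(quorum, known_sentinels))


# Three healthy Sentinels, quorum 2, all of them see the master down.
print(failover_possible(agreeing=3, quorum=2, known_sentinels=3))
# A Sentinel that knows of no peers can never reach a quorum of 2.
print(failover_possible(agreeing=1, quorum=2, known_sentinels=1))
```

The second call models a Sentinel that has not discovered any other Sentinel: it can report +sdown, but it can never reach +odown on its own.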
# Copy the Sentinel configuration file
cp /etc/redis/sentinel.conf  /etc/redis/conf/sentinel.conf
# Edit the Sentinel configuration file; the settings are the same on the master and both replicas
bind 0.0.0.0
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
sentinel down-after-milliseconds mymaster 1000
logfile "/var/log/redis-sentinel.log"

# The complete configuration is as follows
root@master:/etc/redis# grep -Ev '^$|#' /etc/redis/conf/sentinel.conf
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
bind 0.0.0.0 
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/var/log/redis-sentinel.log"
dir /tmp
sentinel down-after-milliseconds mymaster 1000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no

# Manage Sentinel with systemd
cat >/usr/lib/systemd/system/redis-sentinel.service<<EOF
[Unit]
Description=Redis persistent key-value database 
After=network.target 
After=network-online.target 
Wants=network-online.target
 
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-sentinel /etc/redis/conf/sentinel.conf --supervised systemd 
#ExecStop=/usr/local/bin/redis-cli shutdown 
User=root
Group=root 
RuntimeDirectory=redis 
RuntimeDirectoryMode=0755
 
[Install] 
WantedBy=multi-user.target 
EOF
# Start Redis
systemctl daemon-reload
systemctl start  redis
# Start Sentinel
systemctl daemon-reload
systemctl restart  redis-sentinel
ps -ef |grep redis

Write data to verify

#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
@FileName: 
@Time    : 2022/4/10 22:38
@Author  : 热气球
@Software: PyCharm
@Version : 1.0
@Contact : 2573514647@qq.com
@Des     : 
"""
from redis.sentinel import Sentinel

# Addresses of the three Sentinels; redis-py asks them where the current master is
sentinel = Sentinel([('10.0.0.10', 26379),
                     ('10.0.0.11', 26379),
                     ('10.0.0.12', 26379)])
# master = sentinel.discover_master('mymaster')
# print(master)
# # Output: ('192.168.196.132', 6379)
# # Get the replica addresses
# slave = sentinel.discover_slaves('mymaster')
# print(slave)
# # Output: [('192.168.196.129', 6379)]


# Get a client for the current master, used for writes
master = sentinel.master_for('mymaster', password='2023@666.168', db=0)
# w_ret = master.set('foo', 'bar')
# Output: True

slave = sentinel.slave_for('mymaster', password='2023@666.168', db=0)

for key in range(1000000000000):
    v = key + 1
    res = master.set(key, str(v))
    print(res)
    get_res = slave.get(key)
    print(get_res)
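While the master is being killed, the loop above may raise a connection error before the Sentinels finish switching. A minimal retry wrapper, sketched below, lets the test keep running across the failover. The helper name call_with_retry and its parameters are hypothetical, and the flaky function is a stand-in for a real Redis call so that the sketch runs without a server.

```python
import time


def call_with_retry(operation, retriable, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation() and retry when it raises one of `retriable`.

    Re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise
            sleep(delay)


# Stand-in for master.set(...): fails twice, as if a failover were in
# progress, then succeeds.
calls = {"n": 0}


def flaky_set():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master unreachable")
    return True


result = call_with_retry(flaky_set, (ConnectionError,), delay=0)
print(result, calls["n"])
```

With redis-py one would presumably pass something like lambda: master.set(key, str(v)) as the operation and redis.exceptions.ConnectionError (and possibly redis.exceptions.ReadOnlyError) as the retriable exceptions. Which errors actually appear during a failover depends on timing, so treat that list as an assumption.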

tail -f /var/log/redis.log

14033:S 21 Feb 2024 22:13:01.924 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:01.924 # Error condition on socket for SYNC: Connection refused
14033:S 21 Feb 2024 22:13:02.033 * Connecting to MASTER 10.0.0.10:6379
14033:S 21 Feb 2024 22:13:02.033 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:02.033 # Error condition on socket for SYNC: Connection refused
14033:S 21 Feb 2024 22:13:03.044 * Connecting to MASTER 10.0.0.10:6379
14033:S 21 Feb 2024 22:13:03.045 * MASTER <-> REPLICA sync started
14033:S 21 Feb 2024 22:13:03.046 # Error condition on socket for SYNC: Connection refused
14033:M 21 Feb 2024 22:13:03.268 * Discarding previously cached master state.
14033:M 21 Feb 2024 22:13:03.269 # Setting secondary replication ID to ecb0d33568791925d6c67f2eb2721ea0a3e1b921, valid up to offset: 148512. New replication ID is c070780c961fb11b774cf67aabece30004366513
14033:M 21 Feb 2024 22:13:03.269 * MASTER MODE enabled (user request from 'id=8 addr=10.0.0.12:49539 laddr=10.0.0.11:6379 fd=13 name=sentinel-198e50c7-cmd age=312 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=188 qbuf-free=40766 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default redir=-1')
14033:M 21 Feb 2024 22:13:03.276 # CONFIG REWRITE executed with success.
14033:M 21 Feb 2024 22:13:03.940 * Replica 10.0.0.12:6379 asks for synchronization
14033:M 21 Feb 2024 22:13:03.940 * Partial resynchronization request from 10.0.0.12:6379 accepted. Sending 422 bytes of backlog starting from offset 148512.

tail -f /var/log/redis-sentinel.log

14161:X 21 Feb 2024 22:13:03.014 # +sdown master mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.015 # +sdown sentinel b1e7aa5d90b30cdae7449aa0107a7c148b0110c6 10.0.0.10 26379 @ mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.099 # +new-epoch 1
14161:X 21 Feb 2024 22:13:03.099 # +vote-for-leader 198e50c704121f6018ecb6be983507c45cb381f0 1
14161:X 21 Feb 2024 22:13:03.930 # +config-update-from sentinel 198e50c704121f6018ecb6be983507c45cb381f0 10.0.0.12 26379 @ mymaster 10.0.0.10 6379
14161:X 21 Feb 2024 22:13:03.931 # +switch-master mymaster 10.0.0.10 6379 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:03.931 * +slave slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:03.931 * +slave slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379
14161:X 21 Feb 2024 22:13:04.987 # +sdown slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379

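Key point: a line of the form +switch-master mymaster <old ip> <old port> <new ip> <new port> means that the failover succeeded and the master has moved to the new address.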
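When watching the log during a test, it helps to pull the old and new master addresses out of that line. The following sketch does this for the line from the Sentinel log above. The function name is hypothetical.

```python
def parse_switch_master(line):
    """Return (name, old_addr, new_addr) from a +switch-master line, else None."""
    fields = line.split()
    if "+switch-master" not in fields:
        return None
    i = fields.index("+switch-master")
    # The event is followed by: master name, old ip, old port, new ip, new port.
    parts = fields[i + 1:i + 6]
    if len(parts) != 5:
        return None
    name, old_ip, old_port, new_ip, new_port = parts
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


line = ("14161:X 21 Feb 2024 22:13:03.931 # +switch-master "
        "mymaster 10.0.0.10 6379 10.0.0.11 6379")
print(parse_switch_master(line))
```

Any other event line, such as a +sdown line, makes the function return None, so it can be used as a filter over tail -f output.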

Experiment 2

IP address   Hostname     Operating system             Software
10.0.0.10    master       Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12
10.0.0.11    slave01      Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12
10.0.0.12    slave02      Ubuntu 20.04.4 LTS (arm64)   redis-6.2.12
10.0.0.13    sentinel01   Ubuntu 20.04.4 LTS (arm64)   redis-sentinel
10.0.0.14    sentinel02   Ubuntu 20.04.4 LTS (arm64)   redis-sentinel
10.0.0.15    sentinel03   Ubuntu 20.04.4 LTS (arm64)   redis-sentinel

Set the hostnames (run each command on its own node)

hostnamectl set-hostname master
hostnamectl set-hostname slave01
hostnamectl set-hostname slave02
hostnamectl set-hostname sentinel01
hostnamectl set-hostname sentinel02
hostnamectl set-hostname sentinel03

Set up the time

# Set the time zone
timedatectl set-timezone Asia/Shanghai
# Install basic tools
apt install -y lrzsz net-tools ntpdate
# Synchronize the time
/usr/sbin/ntpdate ntp1.aliyun.com
crontab -l > crontab_conf ; echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >> crontab_conf && crontab crontab_conf && rm -f crontab_conf
timedatectl set-local-rtc 1
# Install the build tools
apt install make  build-essential pkg-config -y

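Install a standalone Redis instance

The password here is set to 2023@666.168. If you change it, remember to change the password in /usr/lib/systemd/system/redis.service as well.

# Silence the startup warnings
sysctl vm.overcommit_memory=1
echo  never > /sys/kernel/mm/transparent_hugepage/enabled 
echo "511" > /proc/sys/net/core/somaxconn
sysctl  net.core.somaxconn=4096
# Start the installation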
wget https://download.redis.io/releases/redis-6.2.12.tar.gz
mv redis-6.2.12.tar.gz  /etc/
cd /etc/
tar xf redis-6.2.12.tar.gz
mv redis-6.2.12 redis
rm -rf redis-6.2.12.tar.gz
cd redis/
make  && make  install
echo $?
mkdir /etc/redis/conf
cp redis.conf ./conf
#sed  -i 's#daemonize no#daemonize yes#g' /etc/redis/conf/redis.conf
groupadd redis
useradd redis  -g redis -M -s /sbin/nologin
chown -R redis:redis /etc/redis
 
cat >/usr/lib/systemd/system/redis.service<<EOF
[Unit]
Description=Redis persistent key-value database 
After=network.target 
After=network-online.target 
Wants=network-online.target
 
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-server /etc/redis/conf/redis.conf --supervised systemd 
ExecStop=/usr/local/bin/redis-cli -a 2023@666.168 shutdown
User=root
Group=root 
RuntimeDirectory=redis 
RuntimeDirectoryMode=0755
 
[Install] 
WantedBy=multi-user.target 
EOF
systemctl daemon-reload
systemctl start  redis
systemctl enable  redis
ps -ef |grep redis

Set up master-replica replication

Master configuration file

vim /etc/redis/conf/redis.conf

# Bind address
bind 0.0.0.0 -::1
# Password
requirepass "2023@666.168"
# Password used to authenticate with the master
masterauth 2023@666.168
# Log file
logfile "/var/log/redis.log"

Replica configuration file

vim /etc/redis/conf/redis.conf

# Bind address
bind 0.0.0.0 -::1
# IP address and port of the master node
slaveof 10.0.0.10 6379
# Password
requirepass "2023@666.168"
# Password used to authenticate with the master
masterauth 2023@666.168
# Log file
logfile "/var/log/redis.log"

Restart the Redis service

systemctl restart redis

Verify

Master

redis-cli  -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:master

slave

redis-cli  -a '2023@666.168'
127.0.0.1:6379> info Replication
# Replication
role:slave

Note that the replicas are now read-only: they serve reads but reject writes.

127.0.0.1:6379> set 1 2
(error) READONLY You can't write against a read only replica.

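Set up Sentinel

Key configuration directives

sentinel monitor master-name host port quorum

Example: sentinel monitor mymaster 127.0.0.1 6379 2

This tells the Sentinel to monitor a master named mymaster at 127.0.0.1:6379. The final 2 is the quorum: the number of Sentinels that must agree that the master is unreachable before it is marked objectively down (+odown). A single Sentinel on its own can only mark the master subjectively down (+sdown), no matter how many times it fails to reach it. Starting the failover additionally requires votes from a majority of all known Sentinels.

sentinel auth-pass master-name password

Required if the monitored Redis server has a password set.

sentinel down-after-milliseconds master-name milliseconds

The Sentinel sends PING to the master to check that it is alive. If the master gives no valid reply (no PONG, or only an error reply) for down-after-milliseconds (in milliseconds), the Sentinel considers the master down from its own point of view (subjectively down).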
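The down-after-milliseconds rule comes down to a comparison of timestamps. The sketch below is a simplified model of that check, with hypothetical names, using the 1000 ms value from this lab configuration. It is not Sentinel's actual code.

```python
def is_subjectively_down(now_ms, last_valid_reply_ms, down_after_ms):
    """SDOWN: no valid PING reply for longer than down-after-milliseconds."""
    return now_ms - last_valid_reply_ms > down_after_ms


def first_sdown_time(reply_times_ms, check_times_ms, down_after_ms):
    """Return the first check time at which the master counts as SDOWN, or None."""
    for now in check_times_ms:
        earlier = [t for t in reply_times_ms if t <= now]
        last_reply = max(earlier) if earlier else 0
        if is_subjectively_down(now, last_reply, down_after_ms):
            return now
    return None


# The master answers at 0, 1000 and 2000 ms, then stops answering.
replies = [0, 1000, 2000]
checks = range(0, 5001, 500)
print(first_sdown_time(replies, checks, down_after_ms=1000))
```

A small value such as 1000 ms makes failover tests quick to observe, at the cost of more false alarms on a busy or lossy network. The value that suits production is a judgment call this sketch does not settle.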
# Copy the Sentinel configuration file
cp /etc/redis/sentinel.conf  /etc/redis/conf/sentinel.conf
# Edit the Sentinel configuration file; the settings are the same on all three Sentinel nodes
bind 0.0.0.0
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
sentinel down-after-milliseconds mymaster 1000
logfile "/var/log/redis-sentinel.log"

# The complete configuration is as follows
root@master:/etc/redis# grep -Ev '^$|#' /etc/redis/conf/sentinel.conf
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel auth-pass mymaster 2023@666.168
bind 0.0.0.0 
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/var/log/redis-sentinel.log"
dir /tmp
sentinel down-after-milliseconds mymaster 1000
acllog-max-len 128
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
SENTINEL resolve-hostnames no
SENTINEL announce-hostnames no

# Manage Sentinel with systemd
cat >/usr/lib/systemd/system/redis-sentinel.service<<EOF
[Unit]
Description=Redis persistent key-value database 
After=network.target 
After=network-online.target 
Wants=network-online.target
 
[Service]
LimitNOFILE=65536
ExecStart=/usr/local/bin/redis-sentinel /etc/redis/conf/sentinel.conf --supervised systemd 
#ExecStop=/usr/local/bin/redis-cli shutdown 
User=root
Group=root 
RuntimeDirectory=redis 
RuntimeDirectoryMode=0755
 
[Install] 
WantedBy=multi-user.target 
EOF
# Start Redis
systemctl daemon-reload
systemctl restart  redis
# Start Sentinel
systemctl daemon-reload
systemctl restart  redis-sentinel
ps -ef |grep redis

Write data to verify

#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
@FileName: 
@Time    : 2022/4/10 22:38
@Author  : 热气球
@Software: PyCharm
@Version : 1.0
@Contact : 2573514647@qq.com
@Des     : 
"""
from redis.sentinel import Sentinel

# Addresses of the three Sentinels; redis-py asks them where the current master is
sentinel = Sentinel([('10.0.0.13', 26379),
                     ('10.0.0.14', 26379),
                     ('10.0.0.15', 26379)])
# master = sentinel.discover_master('mymaster')
# print(master)
# # Output: ('192.168.196.132', 6379)
# # Get the replica addresses
# slave = sentinel.discover_slaves('mymaster')
# print(slave)
# # Output: [('192.168.196.129', 6379)]


# Get a client for the current master, used for writes
master = sentinel.master_for('mymaster', password='2023@666.168', db=0)
# w_ret = master.set('foo', 'bar')
# Output: True

slave = sentinel.slave_for('mymaster', password='2023@666.168', db=0)

for key in range(1000000000000):
    v = key + 1
    res = master.set(key, str(v))
    print(res)
    get_res = slave.get(key)
    print(get_res)

tail -f /var/log/redis-sentinel.log

14148:X 22 Feb 2024 11:11:39.058 # +sdown master mymaster 10.0.0.12 6379
14148:X 22 Feb 2024 11:11:39.267 # +new-epoch 2
14148:X 22 Feb 2024 11:11:39.274 # +vote-for-leader 4e11b4d6fb7fad6ca09acd8961ce7790a0264436 2
14148:X 22 Feb 2024 11:11:40.106 # +config-update-from sentinel 4e11b4d6fb7fad6ca09acd8961ce7790a0264436 10.0.0.15 26379 @ mymaster 10.0.0.12 6379
14148:X 22 Feb 2024 11:11:40.106 # +switch-master mymaster 10.0.0.12 6379 10.0.0.11 6379
14148:X 22 Feb 2024 11:11:40.106 * +slave slave 10.0.0.10:6379 10.0.0.10 6379 @ mymaster 10.0.0.11 6379
14148:X 22 Feb 2024 11:11:40.106 * +slave slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379
14148:X 22 Feb 2024 11:12:10.174 # +sdown slave 10.0.0.12:6379 10.0.0.12 6379 @ mymaster 10.0.0.11 6379

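These two experiments show that the deployment procedure itself is fine, so I must have overlooked something critical when deploying the production environment.

Comparing against the production configuration, I found that the ID (the sentinel myid line) in the configuration files of all three Sentinels was identical. When I deployed Redis Sentinel I must have copied one node's configuration file straight to the other two to save time. A Sentinel ignores hello messages that carry its own ID, so Sentinels sharing one ID do not discover each other as peers. That would explain why the quorum of 2 was never reached and the production log shows only +sdown.

After deleting the configuration files, configuring them again and restarting the Sentinels so that each one generated its own ID, I tested again and the Sentinels elected a new master normally.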
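A check like the following would have caught the problem before the incident. It is a sketch under assumptions: the configuration text of each Sentinel has already been collected (for example over SSH), and the host names and ID values in the sample are made up for illustration. The function names are hypothetical.

```python
from collections import defaultdict


def extract_myid(conf_text):
    """Return the value of the 'sentinel myid' directive, or None if absent."""
    for line in conf_text.splitlines():
        fields = line.split()
        if (len(fields) == 3 and fields[0].lower() == "sentinel"
                and fields[1].lower() == "myid"):
            return fields[2]
    return None


def find_duplicate_ids(confs):
    """Map each myid used by more than one host to the sorted list of hosts."""
    hosts_by_id = defaultdict(list)
    for host, text in confs.items():
        myid = extract_myid(text)
        if myid is not None:
            hosts_by_id[myid].append(host)
    return {i: sorted(h) for i, h in hosts_by_id.items() if len(h) > 1}


# Made-up sample: two hosts share an ID because the file was copied.
confs = {
    "sentinel01": "port 26379\nsentinel myid " + "a" * 40 + "\n",
    "sentinel02": "port 26379\nsentinel myid " + "a" * 40 + "\n",
    "sentinel03": "port 26379\nsentinel myid " + "b" * 40 + "\n",
}
print(find_duplicate_ids(confs))
```

An empty result means every Sentinel has its own ID. When cloning a configuration file to another node, removing the sentinel myid line before the first start lets that Sentinel generate a fresh one.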

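How to verify the state of a Redis Sentinel deployment

You can log in to a Redis Sentinel like this:

redis-cli -p 26379
# Run the command and check that sentinels= on the last line equals the number of Sentinel nodes.
info sentinel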
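The same check can be automated by parsing the master0 line of the info sentinel output. The sketch below assumes the usual name=...,status=...,address=...,slaves=N,sentinels=N layout of that line. The sample line is illustrative for the lab setup in this post, and the function names are hypothetical.

```python
def parse_master_line(line):
    """Parse 'master0:name=...,status=...,address=...,slaves=N,sentinels=N'."""
    # Split off the 'master0' prefix at the first colon only, because the
    # address field contains a colon as well.
    _, _, body = line.strip().partition(":")
    fields = dict(item.split("=", 1) for item in body.split(","))
    fields["slaves"] = int(fields["slaves"])
    fields["sentinels"] = int(fields["sentinels"])
    return fields


def sentinel_count_ok(line, expected_sentinels):
    """True if the master is reported ok and all Sentinels see each other."""
    info = parse_master_line(line)
    return info["status"] == "ok" and info["sentinels"] == expected_sentinels


# Illustrative sample line.
sample = "master0:name=mymaster,status=ok,address=10.0.0.11:6379,slaves=2,sentinels=3"
print(sentinel_count_ok(sample, expected_sentinels=3))
```

In a deployment where every Sentinel shares the same ID, each one would be expected to report sentinels=1, so this check fails right away instead of at the next outage.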


posted @ 2024-02-26 16:33  热气球!