Redis, Zabbix

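I. Redis features and typical use cases

Redis features

Fast: roughly 100,000 QPS; data lives in memory and the server is written in C
Persistence
Multiple data structures: string, hash, list, set, and zset (sorted set)
Client libraries for many programming languages
Feature-rich: Lua scripting, publish/subscribe, transactions, pipelining, and more
Simple: compact codebase (the standalone core is only about 23,000 lines), a single-threaded model that is easy to develop against, no dependency on external libraries, easy to use
Master-slave replication
High availability and distributed deployment

Typical Redis use cases

Session sharing: commonly used in web clusters to share sessions across multiple Tomcat or PHP web servers
Caching: query results, product information on e-commerce sites, news content
Counters: access leaderboards, product view counts, and other count-based statistics
Social features (Weibo/WeChat style): mutual friends, follower counts, follows, likes, comments
Message queue: log buffering in ELK, publish/subscribe systems for some services
Geolocation: GEO-based features such as shake-to-discover, people nearby, and food delivery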

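II. Comparing the pros and cons of Redis RDB and AOF modes

1. RDB (Redis DataBase) mode

How RDB works

RDB takes time-based snapshots and by default keeps only the most recent one. Snapshots are fast to take, but any data written between the last snapshot and the current moment may be lost.

How bgsave (asynchronous) takes an RDB snapshot

RDB pros and cons

Pros

An RDB snapshot captures the data at a point in time. A script can run the Redis command bgsave (non-blocking, runs in the background) or save (blocking, not recommended) to take backups at chosen times and keep several of them, so that after a problem the data can be restored to different points in time. This makes RDB well suited to backups, and quite a few third-party tools support the file format for later data analysis.

For example, you can back up the RDB file every hour over the last 24 hours and also keep one RDB backup for every day of the month. Then, even if something goes wrong, the data set can be rolled back to different versions.
RDB maximizes Redis performance: the only thing the parent process does when saving an RDB file is fork a child process, which then handles all of the remaining save work. The parent process performs no disk I/O.
RDB restores large data sets (several GB, for example) faster than AOF.

Cons

Data is not saved in real time, so in-memory data written since the last RDB backup may be lost.

If you need to minimize data loss on server failure, RDB is not suitable. Although Redis lets you set different save points to control how often the RDB file is written, the RDB file has to hold the state of the whole data set, so saving it is not a light or quick operation. RDB files are therefore usually saved at intervals of five minutes or more, and a crash can cost several minutes of data.
When the data set is very large, forking a child process from the parent to save the RDB file takes some time, from milliseconds to seconds, depending on disk I/O performance.

With a large data set, fork() can be very time-consuming and stop the server from serving clients for a while; if the data set is huge and CPU time is scarce, this pause can last a full second or longer. AOF rewriting also requires fork(), but no matter how long the interval between AOF rewrites is, data durability is not affected.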

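2. AOF (Append Only File) mode

How AOF works

AOF appends each operation, in execution order, to the end of a designated log file.

Note:

When both RDB and AOF are enabled, the AOF file takes priority over the RDB file during recovery by default, so the AOF file is used to restore the data.

AOF is disabled by default. When AOF is enabled for the first time and the service is restarted to apply the change, AOF takes priority over RDB but no AOF file exists yet, so all data is lost.

AOF rewrite

A rewrite writes the data into a new AOF file while merging repeated or combinable commands and dropping expired data. This saves the disk space used by AOF backups and also speeds up recovery. A rewrite can be triggered manually with bgrewriteaof or by an automatic rewrite policy.

AOF rewrite process

AOF pros and cons

Pros

Relatively high data safety, depending on the fsync policy in use (fsync flushes the changes Redis has made, still buffered in memory, to the storage device). The default is appendfsync everysec, which runs fsync once per second. With this setting Redis still performs well, and a crash loses at most one second of data (fsync runs in a background thread, so the main thread keeps serving command requests).
Because the log is written in append mode, no seek is needed during writes, and a crash does not corrupt content already in the log file. If the system crashes when an operation has only been half written, the redis-check-aof tool can fix the consistency problem before Redis starts next time.
Redis can rewrite the AOF in the background automatically when the file grows too large. The rewritten AOF contains the minimal set of commands needed to rebuild the current data set. The rewrite is safe because, while the new AOF file is being created, Redis keeps appending changes to the existing AOF file, so the existing file is not lost even if the server stops during the rewrite. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.
AOF is a clearly formatted, easy-to-understand log of all modifying operations, and the data can be rebuilt from this file.

The AOF file stores every write operation run against the database in order, in Redis protocol format, so its content is easy to read and easy to parse. Exporting an AOF file is also simple. For example, if FLUSHALL is run by mistake and the AOF file has not been rewritten yet, you can stop the server, remove the FLUSHALL command from the end of the AOF file, and restart Redis to restore the data set to its state before FLUSHALL.

Cons

Every operation is recorded even if it is a repeat, so an AOF file is larger than the RDB file for the same data
AOF restores large data sets more slowly than RDB
Depending on the fsync policy, AOF can be slower than RDB
Higher likelihood of bugs

When to use RDB and AOF

If Redis mainly serves as a cache, or losing several minutes of data is acceptable, production environments usually only need RDB, which is also the default
If data must be persisted and none of it may be lost, enable both RDB and AOF
Enabling only AOF is generally not recommended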

III. Implementing Redis Sentinel and simulating a master failure

1. How it works

2. Setting up Sentinel mode

graph LR
M[Sentinel 10.0.0.7 master]
S1[Sentinel 10.0.0.17 slave1]
S2[Sentinel 10.0.0.27 slave2]
M---->S1
M---->S2

1) Configure one master and two slaves

One-step Redis build-and-install script

#!/bin/bash
# Build and install Redis from source

source /etc/init.d/functions
# Redis version
Redis_version=redis-5.0.9
suffix=tar.gz
Redis=${Redis_version}.${suffix}
Password=123456

# Redis source download URL
redis_url=http://download.redis.io/releases/${Redis}
# Redis install path
redis_install_DIR=/apps/redis

# Number of CPUs, used for parallel make
CPUS=$(lscpu | grep "^CPU(s)" | awk '{print $2}')

color () {
if [[ $2 -eq 0 ]];then
    echo -e "\e[1;32m$1\t\t\t\t\t\t[  OK  ]\e[0;m"
else
    echo $2
    echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
fi
}


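download_redis (){
# Install build dependencies
yum -y install gcc jemalloc-devel || { color "Failed to install dependencies; check the network" 1 ;exit 1;}

cd /opt
if [ -e ${Redis} ];then
	color "Redis source tarball already exists" 0
else
	color "Downloading the Redis source tarball" 0
	wget ${redis_url}
	if [ $? -ne 0 ];then
		color "Failed to download the Redis source tarball, exiting!" 1
		exit 1
	fi
fi
}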


install_redis (){
# Unpack the source tarball
tar xvf /opt/${Redis} -C /usr/local/src
ln -s /usr/local/src/${Redis_version} /usr/local/src/redis

# Build and install
cd /usr/local/src/redis
make -j ${CPUS} install PREFIX=${redis_install_DIR}
if [ $? -ne 0 ];then
	color "Redis build and install failed!" 1
	exit 1
else
	color "Redis built and installed successfully" 0
fi

ln -s ${redis_install_DIR}/bin/redis-* /usr/sbin/

# Add the redis user (an existing user is not an error)
if id redis &> /dev/null;then
	color "redis user already exists" 0
else
	useradd -r -s /sbin/nologin redis
	color "redis user created" 0
fi
mkdir -p ${redis_install_DIR}/{etc,log,data,run}

# Prepare the redis configuration file
cp redis.conf ${redis_install_DIR}/etc/
sed -i "s/bind 127.0.0.1/bind 0.0.0.0/" ${redis_install_DIR}/etc/redis.conf
sed -i "/# requirepass/a requirepass ${Password}" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^dir .*\$@dir ${redis_install_DIR}\/data@" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^logfile .*\$@logfile ${redis_install_DIR}\/log\/redis-6379.log@" ${redis_install_DIR}/etc/redis.conf
sed -i "s@^pidfile .*\$@pidfile ${redis_install_DIR}\/run\/redis-6379.pid@" ${redis_install_DIR}/etc/redis.conf

chown -R redis:redis ${redis_install_DIR}

cat >> /etc/sysctl.conf <<EOF
net.core.somaxconn = 1024
vm.overcommit_memory = 1
EOF
sysctl -p

echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local
source /etc/rc.d/rc.local


# Prepare the systemd service unit
cat > /usr/lib/systemd/system/redis.service <<EOF
[Unit]
Description=redis persistent key-value database
After=network.target

[Service]
ExecStart=${redis_install_DIR}/bin/redis-server ${redis_install_DIR}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT \$MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
EOF

chown -R redis:redis ${redis_install_DIR}
systemctl daemon-reload
systemctl enable --now redis
systemctl is-active redis

if [ $? -ne 0 ];then
	color "Redis service failed to start!" 1
	exit 1
else
	color "Redis service started successfully" 0
	color "Redis installation complete" 0
fi
}


download_redis

install_redis

exit 0

Master node configuration

# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"

# Restart redis
systemctl restart redis

Slave node configuration

# Edit redis.conf
vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"
replicaof 10.0.0.7 6379

# Restart redis
systemctl restart redis

Checking status

master

[root@master ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.27,port=6379,state=online,offset=28,lag=1
slave1:ip=10.0.0.17,port=6379,state=online,offset=28,lag=1
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:28
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:28
127.0.0.1:6379>

slave1

[root@slave1 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_repl_offset:154
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:154
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:154
127.0.0.1:6379>

slave2

[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:210
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:210
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:210
127.0.0.1:6379>

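2) Edit the Sentinel configuration file

Sentinel is really a special kind of Redis server: it supports some Redis commands but not many others. By default it listens on port 26379/tcp.

Sentinel does not have to run on the same hosts as the Redis servers, but it usually does.

a) Configure the sentinel file

cp /usr/local/src/redis/sentinel.conf /apps/redis/etc/redis-sentinel.conf
cd /apps/redis/etc/
# Configure sentinel
[root@master etc]# grep "^[a-zA-Z]" redis-sentinel.conf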
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes

# Start sentinel
[root@master etc]# redis-sentinel /apps/redis/etc/redis-sentinel.conf
# View the sentinel configuration
[root@master etc]# grep "^[a-zA-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data

sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 3000
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 0
# The following lines are generated automatically
sentinel myid c663d4b9db845d721cd6dccf608c7904d896b745      # myid must be unique
protected-mode no
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.0.27 6379
sentinel known-replica mymaster 10.0.0.17 6379
sentinel known-sentinel mymaster 10.0.0.27 26379 66f276f274802c6f0243007a2be4b04001b9867e
sentinel known-sentinel mymaster 10.0.0.17 26379 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
sentinel current-epoch 0

b) Configure the sentinel service

[root@shichu ~]# cat /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/redis-sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target

c) Start the sentinel service

chown -R redis:redis /apps/redis
systemctl daemon-reload
systemctl enable --now redis-sentinel

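d) Sentinel configuration parameters

sentinel monitor mymaster 10.0.0.8 6379 2 # Address and port of the master in the mymaster group. 2 is the quorum: the number of sentinels that must consider the master down before a failover is started. It is usually set to more than half of all sentinel nodes (the total is usually an odd number >= 3, such as 3, 5, or 7); for example, with 3 sentinels, 3/2 = 1.5, rounded up to 2. This is the basis for marking the master as ODOWN (objectively down)

sentinel auth-pass mymaster 123456 # Password of the master in the mymaster group; note that this line must come after the line above

sentinel down-after-milliseconds mymaster 30000 # (SDOWN) Time after which any node in the mymaster group is considered subjectively down, in milliseconds; 3000 is suggested

sentinel parallel-syncs mymaster 1 # After a failover, the number of slaves that sync from the new master at the same time; a smaller number lengthens the total sync time but reduces the load on the new master

sentinel failover-timeout mymaster 180000 # Timeout for pointing all slaves at the new master, in milliseconds

sentinel deny-scripts-reconfig yes # Disallow changing script settings at runtime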
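The quorum rule described for sentinel monitor (more than half of the sentinel count) can be sketched as a small shell helper. The function name majority_quorum is hypothetical and only illustrates the arithmetic; it is not part of Redis or Sentinel.

```shell
# majority_quorum N: print the smallest integer strictly greater than N/2.
# Hypothetical helper for illustration; Sentinel itself only reads the
# quorum number written in "sentinel monitor".
majority_quorum() (
    n=$1
    case $n in
        ''|*[!0-9]*|0)
            echo "usage: majority_quorum <positive integer>" >&2
            exit 1
            ;;
    esac
    echo $(( n / 2 + 1 ))
)

# Typical odd sentinel counts mentioned above
for n in 3 5 7; do
    echo "sentinels=$n quorum=$(majority_quorum "$n")"
done
```

With 3 sentinels this gives 2, which matches the value used in the sentinel monitor line of this setup.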

e) Check listening ports

[root@master etc]# ss -ntl
State       Recv-Q Send-Q                                  Local Address:Port                                                 Peer Address:Port            
LISTEN      0      100                                         127.0.0.1:25                                                              *:*                
LISTEN      0      511                                                 *:26379                                                           *:*                
LISTEN      0      511                                                 *:6379                                                            *:*                
LISTEN      0      128                                                 *:111                                                             *:*                
LISTEN      0      128                                                 *:22                                                              *:*                
LISTEN      0      100                                             [::1]:25                                                           [::]:*                
LISTEN      0      128                                              [::]:111                                                          [::]:*                
LISTEN      0      128                                              [::]:22

f) Check sentinel logs

master log

[root@master redis]# tail /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 16:38:43.636 * supervised by systemd, will signal readiness
1491:X 11 Jul 2022 16:38:43.637 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1491:X 11 Jul 2022 16:38:43.637 * Running mode=sentinel, port=26379.
1491:X 11 Jul 2022 16:38:43.638 # Sentinel ID is c663d4b9db845d721cd6dccf608c7904d896b745
1491:X 11 Jul 2022 16:38:43.638 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:20.763 # -sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:48.855 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379

slave1 log

[root@slave1 ~]# tail /apps/redis/log/sentinel_26379.log
1293:X 11 Jul 2022 16:39:19.722 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1293, just started
1293:X 11 Jul 2022 16:39:19.722 # Configuration loaded
1293:X 11 Jul 2022 16:39:19.722 * supervised by systemd, will signal readiness
1293:X 11 Jul 2022 16:39:19.723 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1293:X 11 Jul 2022 16:39:19.724 * Running mode=sentinel, port=26379.
1293:X 11 Jul 2022 16:39:19.724 # Sentinel ID is 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
1293:X 11 Jul 2022 16:39:19.724 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1293:X 11 Jul 2022 16:39:22.777 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1293:X 11 Jul 2022 16:39:48.988 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379

slave2 log

[root@slave2 ~]# tail /apps/redis/log/sentinel_26379.log
900:X 11 Jul 2022 16:32:23.322 # +sdown sentinel 605f713c7e6554ae0bfed0b98304e29d6a69e678 10.0.0.37 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 16:39:48.523 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1256:X 11 Jul 2022 16:39:48.523 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1256, just started
1256:X 11 Jul 2022 16:39:48.523 # Configuration loaded
1256:X 11 Jul 2022 16:39:48.523 * supervised by systemd, will signal readiness
1256:X 11 Jul 2022 16:39:48.524 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1256:X 11 Jul 2022 16:39:48.525 * Running mode=sentinel, port=26379.
1256:X 11 Jul 2022 16:39:48.525 # Sentinel ID is 66f276f274802c6f0243007a2be4b04001b9867e
1256:X 11 Jul 2022 16:39:48.525 # +monitor master mymaster 10.0.0.7 6379 quorum 2

g) Check sentinel status

[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.7:6379,slaves=2,sentinels=3
# Two slaves and three sentinel servers; if the sentinels value is wrong, check for conflicting myid values

3. Simulating failover

Stop Redis on the master

[root@master etc]# systemctl stop redis
[root@master etc]# ss -ntl
State       Recv-Q Send-Q                                  Local Address:Port                                                 Peer Address:Port      
LISTEN      0      100                                         127.0.0.1:25                                                              *:*          
LISTEN      0      511                                                 *:26379                                                           *:*          
LISTEN      0      128                                                 *:111                                                             *:*          
LISTEN      0      128                                                 *:22                                                              *:*          
LISTEN      0      100                                             [::1]:25                                                           [::]:*          
LISTEN      0      128                                              [::]:111                                                          [::]:*          
LISTEN      0      128                                              [::]:22

Sentinel log during failover

[root@master redis]# tail -f /apps/redis/log/sentinel_26379.log 
1491:X 11 Jul 2022 17:07:16.959 # +sdown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.044 # +odown master mymaster 10.0.0.7 6379 #quorum 2/2
1491:X 11 Jul 2022 17:07:17.044 # +new-epoch 4
1491:X 11 Jul 2022 17:07:17.044 # +try-failover master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.045 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.048 # 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.050 # 66f276f274802c6f0243007a2be4b04001b9867e voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.102 # +elected-leader master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.102 # +failover-state-select-slave master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 # +selected-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 * +failover-state-send-slaveof-noone slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.269 * +failover-state-wait-promotion slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +promoted-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +failover-state-reconf-slaves master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.145 * +slave-reconf-sent slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-inprog slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-done slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # -odown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +failover-end master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379        # the master role has moved to 10.0.0.27
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:22.276 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379

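Log event reference

+reset-master: The master has been reset.
+slave: A new slave has been detected and attached by Sentinel.
+failover-state-reconf-slaves: The failover state has changed to reconf-slaves.
+failover-detected: Another Sentinel has started a failover, or a slave has turned into a master.
+slave-reconf-sent: The leader Sentinel has sent the SLAVEOF command to the instance to set its new master.
+slave-reconf-inprog: The instance is configuring itself as a slave of the given master, but the synchronization has not finished yet.
+slave-reconf-done: The slave has finished synchronizing with the new master.
-dup-sentinel: One or more Sentinels monitoring the given master have been removed as duplicates; this happens when a Sentinel instance restarts.
+sentinel: A new Sentinel monitoring the given master has been detected and added.
+sdown: The given instance is now subjectively down.
-sdown: The given instance is no longer subjectively down.
+odown: The given instance is now objectively down.
-odown: The given instance is no longer objectively down.
+new-epoch: The current epoch has been updated.
+try-failover: A new failover is in progress, waiting to be elected by the majority.
+elected-leader: Won the election for the given epoch and can now carry out the failover.
+failover-state-select-slave: The failover is now in the select-slave state; Sentinel is looking for a slave that can be promoted to master.
no-good-slave: Sentinel could not find a slave suitable for promotion. It will try again after some time, or abandon the failover altogether.
+selected-slave: Sentinel found a slave suitable for promotion.
+failover-state-send-slaveof-noone: Sentinel is promoting the selected slave to master and waiting for the promotion to complete.
failover-end-for-timeout: The failover was terminated because of a timeout, but slaves will eventually be configured to replicate with the new master anyway.
+failover-end: The failover completed successfully. All slaves are now replicating from the new master.
+switch-master: Configuration change: the master's IP and port have changed. This is the message most external users care about.
+tilt: Entered tilt mode.
-tilt: Exited tilt mode.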
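Since +switch-master is the event most users care about, here is a minimal sketch of pulling the new master address out of a sentinel log with awk. The function name new_master_from_log is hypothetical, and the field positions assume the log layout shown in this document (event name, master name, old IP, old port, new IP, new port).

```shell
# new_master_from_log: read sentinel log lines on stdin and print the
# address announced by the last +switch-master event as "ip:port".
# Hypothetical helper; field offsets assume the log format shown above.
new_master_from_log() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "+switch-master") { ip = $(i + 4); port = $(i + 5) }
    }
    END { if (ip != "") print ip ":" port }'
}

# Two lines copied from the slave2 sentinel log in this document
new_master_from_log <<'EOF'
1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
EOF
```

On a real host the same function could be fed from the sentinel log file, for example /apps/redis/log/sentinel_26379.log in this setup.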

After failover

The master IP in the replicaof line of the redis configuration file is updated automatically

[root@slave1 ~]# grep "^replicaof" /apps/redis/etc/redis.conf 
replicaof 10.0.0.27 6379

The IP in the sentinel monitor line of the sentinel configuration file is updated automatically

[root@slave1 ~]# grep "^sentinel monitor" /apps/redis/etc/redis-sentinel.conf 
sentinel monitor mymaster 10.0.0.27 6379 2

Redis status

New master status

[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.17,port=6379,state=online,offset=4290787,lag=1
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4290787
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3242212
repl_backlog_histlen:1048576
127.0.0.1:6379>

The other slave now points to the new master

[root@slave1 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:4296387
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4296387
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3247812
repl_backlog_histlen:1048576
127.0.0.1:6379>

Bring the failed former master back into the Redis replication group

[root@master redis]# systemctl start redis

Former master status

# The redis configuration now points to the new master node
[root@master redis]# grep "^replicaof" /apps/redis/etc/redis.conf
replicaof 10.0.0.27 6379

# Check redis status
[root@master redis]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:4366815
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:4366815
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:4343555
repl_backlog_histlen:23261

# Check sentinel status
[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.27:6379,slaves=2,sentinels=3

New master status

# Redis status
[root@slave2 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.17,port=6379,state=online,offset=4407027,lag=0
slave1:ip=10.0.0.7,port=6379,state=online,offset=4407160,lag=0
master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
master_repl_offset:4407293
second_repl_offset:3910006
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3358718
repl_backlog_histlen:1048576


# Sentinel log
[root@slave2 ~]# tail -f /apps/redis/log/sentinel_26379.log
1256:X 11 Jul 2022 17:07:17.049 # +new-epoch 4
1256:X 11 Jul 2022 17:07:17.052 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
1256:X 11 Jul 2022 17:07:17.068 # Next failover delay: I will not start a failover before Mon Jul 11 17:13:17 2022
1256:X 11 Jul 2022 17:07:18.149 # +config-update-from sentinel c663d4b9db845d721cd6dccf608c7904d896b745 10.0.0.7 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:07:21.189 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1256:X 11 Jul 2022 17:43:54.361 # -sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379

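Sentinel operations

Manually failing over the master

sentinel failover <masterName>

Example

# A priority can be set; the lower the value, the more sentinel prefers this slave as the new master. The default is 100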
[root@slave1 ~]# grep 'replica-priority' /apps/redis/etc/redis.conf 
replica-priority 30

[root@slave1 ~]# redis-cli -a 123456 -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3

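IV. How Redis Cluster works

Redis Cluster characteristics

All Redis nodes are interconnected (PING mechanism)
A node is only considered truly failed when more than half of the nodes in the cluster detect it as failed
Clients connect to Redis directly without a proxy; the application needs to be configured with the IPs of all Redis servers
Redis Cluster maps slots 0-16383 evenly across the Redis nodes, and reads and writes must go to the node that owns the slot, so N Redis nodes scale Redis concurrency roughly N-fold, with each node holding about 16384/N slots
Redis Cluster pre-allocates 16384 slots. When a key-value pair is written to the cluster, CRC16(key) mod 16384 determines which slot the key goes to and therefore which Redis node stores it, which relieves the single-node bottleneck.

Redis Cluster architecture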
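The CRC16(key) mod 16384 mapping can be sketched in plain shell. This is an illustration under stated assumptions rather than Redis code: it uses the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0), hashes the whole key, and does not handle hash tags ({...}). The function names crc16 and key_slot are hypothetical.

```shell
# crc16 STRING: bitwise CRC-16/XMODEM over the bytes of STRING,
# printed as a decimal number. Hypothetical helper for illustration.
crc16() (
    crc=0
    # od prints each byte of the key as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        crc=$(( crc ^ (byte << 8) ))
        j=0
        while [ "$j" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            j=$(( j + 1 ))
        done
    done
    echo "$crc"
)

# key_slot KEY: slot number in the range 0-16383 for KEY
# (the whole key is hashed; hash tags are ignored in this sketch)
key_slot() {
    echo $(( $(crc16 "$1") % 16384 ))
}

echo "foo -> slot $(key_slot foo)"   # prints: foo -> slot 12182
```

On a running cluster the result can be cross-checked with the CLUSTER KEYSLOT command. With the three-master layout used later in this document, slot 12182 falls in the range 10923-16383 served by the third master.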

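V. Deploying Redis Cluster on Redis 5

Official documentation: https://redis.io/topics/cluster-tutorial

Prerequisites for creating a Redis Cluster

Every Redis node uses the same hardware configuration, the same password, and the same Redis version
None of the Redis servers may contain any data
Prepare 6 machines for a three-master, three-slave layout

# Cluster nodes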
Redis-node1:10.0.0.7
Redis-node2:10.0.0.17
Redis-node3:10.0.0.27
Redis-node4: 10.0.0.37
Redis-node5: 10.0.0.47
Redis-node6: 10.0.0.57
# Reserved nodes
10.0.0.67
10.0.0.77

Deploying Redis Cluster

1. Install redis

Edit the redis configuration

[root@node1 etc]# cat redis.conf 
...
bind 0.0.0.0
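masterauth 123456   # Recommended; otherwise master-slave replication will fail later and this will have to be configured then
requirepass 123456
cluster-enabled yes # Uncomment this line; cluster mode must be enabled, after which the redis process shows [cluster]
cluster-config-file nodes-6379.conf # Uncomment this line; this is the cluster state file recording master/slave relations and slot ranges, created and maintained automatically by Redis Cluster
cluster-require-full-coverage no   # Default is yes; setting it to no prevents one unavailable node from making the whole cluster unavailable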
...

[root@node1 etc]#systemctl enable --now redis

2. Check the current redis state

# Check listening ports
[root@node1 ~]# ss -ntl
State      Recv-Q Send-Q                Local Address:Port                               Peer Address:Port        
LISTEN     0      511                               *:6379                                          *:*            
LISTEN     0      128                               *:111                                           *:*            
LISTEN     0      128                               *:22                                            *:*            
LISTEN     0      100                       127.0.0.1:25                                            *:*            
LISTEN     0      511                               *:16379                                         *:*            
LISTEN     0      128                            [::]:111                                        [::]:*            
LISTEN     0      128                            [::]:22                                         [::]:*            
LISTEN     0      100                           [::1]:25                                         [::]:*

# The process list shows the [cluster] state
[root@node1 ~]# ps aux|grep redis
redis     24754  0.2  0.3 153996  3172 ?        Ssl  21:28   0:02 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]
root      24822  0.0  0.0 112812   980 pts/0    R+   21:44   0:00 grep --color=auto redis

3. Create the cluster

[root@node1 ~]# redis-cli -a 123456 --cluster create 10.0.0.7:6379 10.0.0.17:6379 10.0.0.27:6379 10.0.0.37:6379 \
10.0.0.47:6379 10.0.0.57:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.47:6379 to 10.0.0.7:6379
Adding replica 10.0.0.57:6379 to 10.0.0.17:6379
Adding replica 10.0.0.37:6379 to 10.0.0.27:6379
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379	# M marks a master
   slots:[0-5460] (5461 slots) master				# first and last slot of this master
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379	# S marks a slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
Can I set the above configuration? (type 'yes' to accept): yes	# type yes to create the cluster automatically
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.7:6379)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master				# slots already assigned
   1 additional replica(s)					# one slave assigned
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave					# slaves are not assigned slots
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7		# ID of its master, 10.0.0.27
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b		# ID of its master, 10.0.0.17
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec		# ID of its master, 10.0.0.7
[OK] All nodes agree about slots configuration.		# all nodes agree on the slot assignment
>>> Check for open slots...				# check for open slots
>>> Check slots coverage...				# check slot coverage
[OK] All 16384 slots covered.				 # all 16384 slots are assigned
[root@node1 ~]#

From the output above, there are three master/slave pairs

master:10.0.0.7-->slave:10.0.0.47
master:10.0.0.17-->slave:10.0.0.57
master:10.0.0.27-->slave:10.0.0.37
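The slot ranges printed by the create command (0-5460, 5461-10922, 10923-16383) are consistent with splitting 16384 slots evenly and rounding each boundary. The sketch below reproduces those ranges with integer arithmetic. It is an assumption-based illustration, not code taken from redis-cli, and slot_ranges is a hypothetical name.

```shell
# slot_ranges MASTERS: print an even split of 16384 slots across MASTERS
# nodes, rounding each boundary to the nearest slot and giving the last
# master everything that remains. Hypothetical helper for illustration.
slot_ranges() (
    masters=$1
    total=16384
    first=0
    i=0
    while [ "$i" -lt "$masters" ]; do
        if [ "$i" -eq $(( masters - 1 )) ]; then
            last=$(( total - 1 ))
        else
            # integer form of round((i + 1) * total / masters - 1)
            last=$(( (2 * (i + 1) * total - masters) / (2 * masters) ))
        fi
        echo "Master[$i] -> Slots $first - $last"
        first=$(( last + 1 ))
        i=$(( i + 1 ))
    done
)

slot_ranges 3
```

For 3 masters the output lines match the Master[0], Master[1], and Master[2] lines in the transcript above.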

4. Check master/slave status

node1 (10.0.0.7)

[root@node1 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.47,port=6379,state=online,offset=1008,lag=1
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node2 (10.0.0.17)

[root@node2 etc]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.57,port=6379,state=online,offset=1008,lag=0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node3 (10.0.0.27)

[root@node3 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.37,port=6379,state=online,offset=1008,lag=0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node4 (10.0.0.37)

[root@node4 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node5 (10.0.0.47)

[root@node5 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:4
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node6 (10.0.0.57)

[root@node6 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:10
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008
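
Reading six outputs by eye gets tedious. The following is a minimal sketch, assuming the `info replication` text is piped in on stdin. The helper name `replication_ok` is hypothetical (it is not part of Redis): it succeeds for a master with at least one connected slave, or for a slave whose link to its master is up.

```shell
#!/bin/bash
# replication_ok: hypothetical helper; reads `info replication` output on stdin.
replication_ok() {
    # redis-cli output uses CRLF line endings, so strip the carriage returns first.
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "connected_slaves"   { slaves = $2 }
        $1 == "master_link_status" { link = $2 }
        END {
            if (role == "master" && slaves + 0 >= 1) exit 0
            if (role == "slave" && link == "up") exit 0
            exit 1
        }'
}

# Usage (assumes the addresses and password used in this walkthrough):
#   redis-cli -h 10.0.0.7 -a 123456 info replication 2>/dev/null | replication_ok \
#       || echo "10.0.0.7: replication problem"
```

The function only inspects three fields, so it says nothing about replication lag; the offset fields would have to be compared for that.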

List the slaves of a given master

# List all nodes
[root@node1 ~]# redis-cli -a 123456 cluster nodes 2>/dev/null
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected

# List the slaves of a master by node ID; 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 is the ID of master 10.0.0.27
[root@node1 ~]# redis-cli -a 123456 cluster slaves 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 2>/dev/null
1) "59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554778157 4 connected"
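
The master/slave pairing can also be derived from the `cluster nodes` output in one pass instead of querying each master ID. This is a sketch under the assumption that the output has the column layout shown above (ID, address, flags, master ID); `cluster_pairs` is a hypothetical helper name.

```shell
#!/bin/bash
# cluster_pairs: hypothetical helper; reads `cluster nodes` output on stdin
# and prints one "master --> slave" line per slave.
cluster_pairs() {
    awk '
        { sub(/@.*/, "", $2); addr[$1] = $2 }      # node ID -> ip:port (cluster bus port stripped)
        $3 ~ /slave/ { master_of[$2] = $4 }        # slave address -> ID of its master
        END {
            for (s in master_of) print addr[master_of[s]] " --> " s
        }
    ' | LC_ALL=C sort
}

# Usage (assumes the password used in this walkthrough):
#   redis-cli -a 123456 cluster nodes 2>/dev/null | cluster_pairs
```

The output is sorted bytewise, so 10.0.0.17 and 10.0.0.27 are listed before 10.0.0.7.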

5. Verify the cluster state

[root@node1 ~]# redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6		# 6 nodes
cluster_size:3			# 3 masters serving slots
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3639
cluster_stats_messages_pong_sent:3625
cluster_stats_messages_sent:7264
cluster_stats_messages_ping_received:3620
cluster_stats_messages_pong_received:3639
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7264

# Check the cluster state through any node
[root@node1 ~]# redis-cli -a 123456 --cluster info 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
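
For scripted checks, the three fields of `cluster info` that matter most here can be tested directly. A minimal sketch, assuming the `cluster info` text arrives on stdin; `cluster_healthy` is a hypothetical helper name, and the rule "state ok, 16384 slots assigned, no failed slots" is this sketch's own definition of healthy.

```shell
#!/bin/bash
# cluster_healthy: hypothetical helper; reads `cluster info` output on stdin.
cluster_healthy() {
    tr -d '\r' | awk -F: '
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { assigned = $2 }
        $1 == "cluster_slots_fail"     { failed = $2 }
        END { exit !(state == "ok" && assigned + 0 == 16384 && failed + 0 == 0) }'
}

# Usage (assumes the password used in this walkthrough):
#   redis-cli -a 123456 cluster info 2>/dev/null | cluster_healthy && echo "cluster OK"
```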

Show the node relationships in the cluster

# List all nodes in the cluster
[root@node1 ~]# redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657556036000 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657556038079 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657556036000 5 connected


[root@node1 ~]# redis-cli -a 123456 --cluster check 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.27:6379)
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
   slots: (0 slots) slave
   replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
   slots: (0 slots) slave
   replicates 12fdc235442ed40a838e77b246025799b4b3357b
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
   slots: (0 slots) slave
   replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Verify writes to the cluster

# Without -c, a write fails with MOVED when the key's slot lives on another node
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7
10.0.0.7:6379> set key1 v1
(error) MOVED 9189 10.0.0.17:6379
# Connect to the node named in the MOVED reply to write the key
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.17
10.0.0.17:6379> set key1 values1
OK
10.0.0.17:6379> get key1
"values1"


# With -c (cluster mode) redis-cli follows the redirect, so any node in the cluster can be used
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7 -c
10.0.0.7:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.17:6379
OK
10.0.0.17:6379> get key1
"v1"

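The slot number in the MOVED reply is not arbitrary: Redis Cluster maps a key to slot CRC16(key) mod 16384, using the XMODEM variant of CRC16. The sketch below reproduces that calculation in bash. The function names are hypothetical, and it makes two simplifying assumptions: keys are plain ASCII, and hash tags (the `{...}` part of a key) are not handled.

```shell
#!/bin/bash
# crc16_xmodem STRING: CRC16 with polynomial 0x1021 and initial value 0,
# computed bit by bit. Assumes an ASCII-only string.
crc16_xmodem() {
    local s=$1 crc=0 i j byte
    for ((i = 0; i < ${#s}; i++)); do
        printf -v byte '%d' "'${s:i:1}"          # character -> byte value
        crc=$(( crc ^ (byte << 8) ))
        for ((j = 0; j < 8; j++)); do
            if (( crc & 0x8000 )); then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
        done
    done
    echo "$crc"
}

# key_slot KEY: slot number for KEY (hash tags are ignored in this sketch).
key_slot() {
    echo $(( $(crc16_xmodem "$1") % 16384 ))
}

# Usage:
#   key_slot key1
```

`key_slot key1` prints 9189, which falls in the range 5461-10922 owned by 10.0.0.17 and matches the redirect shown above.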
VI. Deploying Zabbix monitoring

Download page: https://www.zabbix.com/cn/download

Documentation: https://www.zabbix.com/manuals

Source tarball: https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz

Build and install Zabbix 5 from source on an LNMP stack

L: Linux (CentOS 7) https://mirrors.aliyun.com/centos/7/isos/x86_64/
N: Nginx (1.18.0)   https://nginx.org/en/download.html
M: MySQL (8.0.19)   https://dev.mysql.com/downloads/mysql/
P: PHP (7.4.11)     http://php.net/downloads.php
Zabbix (5.0.25)     https://cdn.zabbix.com/zabbix/sources/

Topology: Client ---> Linux + Nginx + PHP + Zabbix (10.0.0.100) ---> Linux + MySQL (10.0.0.200)

1. Install MySQL

After installation, create the zabbix database and user:

mysql -uroot -p123456 -e "create database zabbix character set utf8 collate utf8_bin;"
mysql -uroot -p123456 -e "create user zabbix@'10.0.0.%' identified by '123456'"
mysql -uroot -p123456 -e "grant all privileges on zabbix.* to zabbix@'10.0.0.%'"
mysql -uroot -p123456 -e "use mysql;\
alter user zabbix@'10.0.0.%' identified with mysql_native_password by '123456';\
flush privileges;"

2. Install Nginx

Reference: https://www.cnblogs.com/areke/p/16482954.html

3. Install PHP

Reference: https://www.cnblogs.com/areke/p/16482958.html

4. Install Zabbix

Install zabbix_server

#!/bin/bash
# Build and install Zabbix from source

source /etc/init.d/functions
# Zabbix version
Zabbix_Version=zabbix-5.0.25
Suffix=tar.gz
Zabbix=${Zabbix_Version}.${Suffix}

Password=123456

# Zabbix source download URL
Zabbix_url=https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz

# Zabbix install prefix
Zabbix_install_DIR=/apps/zabbix

# Number of CPUs
CPUS=`lscpu|grep "^CPU(s)"|awk '{print $2}'`
# OS name
os_type=`grep "^NAME" /etc/os-release |awk -F'"| ' '{print $2}'`
# OS version
os_version=`awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release`

# color MESSAGE STATUS: green [ OK ] when STATUS is 0, otherwise the status code and a red [ FAILED ]
color () {
if [[ $2 -eq 0 ]];then
    echo -e "\e[1;32m$1\t\t\t\t\t\t[  OK  ]\e[0;m"
else
    echo $2
    echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
fi
}


install_Zabbix (){
#----------------------------Download the source tarball-----------------------------
cd /opt
if [ -e ${Zabbix} ];then
	color "Zabbix source tarball already exists" 0
else
	color "Downloading the Zabbix source tarball" 0
	wget ${Zabbix_url}
	if [ $? -ne 0 ];then
		color "Failed to download the Zabbix source tarball, exiting!" 1
		exit 1
	fi
fi


#----------------------------Unpack the source tarball-----------------------------
color "Unpacking the source tarball" 0
tar -zxvf /opt/${Zabbix} -C /usr/local/src
# -f and -n keep a rerun from failing or nesting a link inside the old target
ln -sfn /usr/local/src/${Zabbix_Version} /usr/local/src/zabbix


#----------------------------Install build dependencies--------------------------------
color "Installing build dependencies" 0

#wget https://dev.mysql.com/get/mysql80-community-release-el7-6.noarch.rpm

yum install -y gcc libxml2-devel net-snmp net-snmp-devel curl curl-devel php-gd php-bcmath php-xml \
php-mbstring mariadb mariadb-devel OpenIPMI-devel libevent-devel java-1.8.0-openjdk-devel \
|| { color "Failed to install dependencies, check the network" 1 ;exit 1;}


#---------------------------Create the zabbix user---------------------------
if id zabbix &> /dev/null ;then
	# An existing user is not an error, so report it with status 0
	color "zabbix user already exists" 0
else
	groupadd --system zabbix
	useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
	color "zabbix user created" 0
fi

#---------------------------Configure and build---------------------------
color "Configuring and building Zabbix" 0
cd /usr/local/src/zabbix
./configure --prefix=${Zabbix_install_DIR} \
--enable-server \
--enable-agent \
--with-mysql \
--with-net-snmp \
--with-libcurl \
--with-libxml2 \
--with-openipmi \
--enable-proxy \
--enable-java \
|| { color "Zabbix configure failed!" 1 ;exit 1;}

make -j ${CPUS} install
if [ $? -ne 0 ];then
	color "Zabbix build/install failed!" 1
	exit 1
else
	color "Zabbix built and installed successfully" 0
fi

# Copy the web frontend files
mkdir -pv /home/nginx/zabbix
cp -rf /usr/local/src/zabbix/ui/* /home/nginx/zabbix/
chown nginx:nginx -R /home/nginx/zabbix

# Smoke test: start zabbix_server once, then stop it again
/apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
if [ $? -eq 0 ];then
	color "zabbix_server starts correctly" 0
	pkill zabbix
fi

color "Zabbix installation complete" 0
}

install_Zabbix

exit 0
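
The script above spells the version twice, once in Zabbix_Version and once in Zabbix_url, so the two can drift apart when upgrading. A small sketch of deriving the URL from the version instead: `zabbix_source_url` is a hypothetical helper, and it assumes the CDN keeps the layout stable/<major.minor>/zabbix-<version>.tar.gz seen in the 5.0.25 link.

```shell
#!/bin/bash
# zabbix_source_url VERSION: hypothetical helper. Assumes the CDN layout
# stable/<major.minor>/zabbix-<version>.tar.gz holds for the given version.
zabbix_source_url() {
    local version=$1
    local series=${version%.*}      # strip the patch level: 5.0.25 -> 5.0
    echo "https://cdn.zabbix.com/zabbix/sources/stable/${series}/zabbix-${version}.tar.gz"
}

# Usage:
#   Zabbix_url=$(zabbix_source_url 5.0.25)
```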

Edit the configuration files

Edit /apps/nginx/conf/nginx.conf

worker_processes   1;
pid 		logs/nginx.pid;
events {
		worker_connections	1024;
}
http {
	include			mime.types;
	default_type	application/octet-stream;
	sendfile		on;
	keepalive_timeout	65;
	server {
		listen		80;
		server_name	10.0.0.100;				# server name
		server_tokens off;						# hide the nginx version

		location / {
			root	/home/nginx/zabbix;				# document root
			index	index.php index.html index.htm;			# default index pages
		}

		error_page	500 502 503 504 /50x.html;

		location = /50x.html {
			root	html;
		}

		location ~ \.php$ {						# pass PHP requests to php-fpm
			root		/home/nginx/zabbix;
			fastcgi_pass	127.0.0.1:9000;
			fastcgi_index	index.php;
			fastcgi_param	SCRIPT_FILENAME	$document_root$fastcgi_script_name;
			include		fastcgi_params;
			fastcgi_hide_header X-Powered-By;			# hide the PHP version
		}

		location ~ ^/(ping|pm_status)$ {				# php-fpm ping and status pages
			include		fastcgi_params;
			fastcgi_pass	127.0.0.1:9000;
			fastcgi_param	PATH_TRANSLATED	$document_root$fastcgi_script_name;
		}
	}
}

Edit the PHP configuration

# Edit /etc/php.ini
sed -i -e "/memory_limit/c memory_limit = 256M" \
-e "/post_max_size/c post_max_size = 30M" \
-e "/upload_max_filesize/c upload_max_filesize = 20M" \
-e "/max_execution_time/c max_execution_time = 300" \
-e "/max_input_time/c max_input_time = 300" \
-e "/;date.timezone/c date.timezone = Asia/Shanghai" \
/etc/php.ini

# Edit /apps/php/etc/php-fpm.d/www.conf
sed -i -e "/user = www/c user = nginx" \
-e "/group = www/c group = nginx" /apps/php/etc/php-fpm.d/www.conf

Restart the services

systemctl restart nginx php-fpm

Import the MySQL schema and data

mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/schema.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/images.sql
mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/data.sql

Edit the Zabbix server configuration

sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPassword=/aDBPassword=123456" /apps/zabbix/etc/zabbix_server.conf
sed -i "/# DBPort=/aDBPort=3306" /apps/zabbix/etc/zabbix_server.conf
sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /apps/zabbix/etc/zabbix_server.conf
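
One caveat with the `sed ... a` commands above: each run appends another copy of the line, so running them twice leaves duplicate DBHost/DBPassword/DBPort entries. The following is a sketch of an idempotent alternative. `set_conf` is a hypothetical helper, it assumes keys made of letters and digits, and it assumes values without backslashes (awk's -v option interprets escape sequences).

```shell
#!/bin/bash
# set_conf FILE KEY VALUE: hypothetical helper that leaves exactly one
# active KEY=VALUE line in a Zabbix-style config file.
set_conf() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    if grep -q "^${key}=" "$file"; then
        # An active line exists: rewrite the first one and drop any repeats.
        awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 {
                if (!done) { print k "=" v; done = 1 }
                next
            }
            { print }' "$file" > "$tmp"
    else
        # No active line: add it after the commented default, else at the end.
        awk -v k="$key" -v v="$value" '
            { print }
            !done && index($0, "# " k "=") == 1 { print k "=" v; done = 1 }
            END { if (!done) print k "=" v }' "$file" > "$tmp"
    fi
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (paths and values from this walkthrough):
#   set_conf /apps/zabbix/etc/zabbix_server.conf DBHost 10.0.0.200
#   set_conf /apps/zabbix/etc/zabbix_server.conf DBPort 3306
```

Running the same call a second time leaves the file unchanged, which makes the step safe to repeat from a provisioning script.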

Create the systemd unit for zabbix_server

cat /lib/systemd/system/zabbix-server.service

[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target

[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_server.conf"
EnvironmentFile=-/etc/default/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_server.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target

Start the service

systemctl daemon-reload
systemctl enable --now zabbix-server

Create the systemd unit for zabbix_agent

cat /lib/systemd/system/zabbix-agent.service

[Unit]
Description=Zabbix Agent
After=syslog.target
After=network.target

[Service]
Environment="CONFFILE=/apps/zabbix/etc/zabbix_agentd.conf"
EnvironmentFile=-/etc/default/zabbix-agent
Type=forking
Restart=on-failure
PIDFile=/tmp/zabbix_agentd.pid
KillMode=control-group
ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix
Group=zabbix

[Install]
WantedBy=multi-user.target

Start the service

systemctl daemon-reload
systemctl enable --now zabbix-agent

Check the status

Ports 10050 and 10051 are listening

# 10050 (agent) and 10051 (server) appear among the listening sockets
[root@shichu apps]# ss -ntl
State      Recv-Q Send-Q               Local Address:Port                              Peer Address:Port          
LISTEN     0      128                              *:22                                           *:*              
LISTEN     0      100                      127.0.0.1:25                                           *:*              
LISTEN     0      128                              *:10050                                        *:*              
LISTEN     0      128                              *:10051                                        *:*              
LISTEN     0      128                      127.0.0.1:9000                                         *:*              
LISTEN     0      128                              *:111                                          *:*              
LISTEN     0      128                              *:80                                           *:*              
LISTEN     0      128                           [::]:22                                        [::]:*              
LISTEN     0      100                          [::1]:25                                        [::]:*              
LISTEN     0      128                           [::]:111                                       [::]:*
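
The same check can be scripted rather than read off the table. A minimal sketch, assuming `ss -ntl` output in the column layout shown above is piped in on stdin; `port_listening` is a hypothetical helper name.

```shell
#!/bin/bash
# port_listening PORT: hypothetical helper; succeeds if the `ss -ntl` output
# on stdin contains a LISTEN socket whose local port is PORT.
port_listening() {
    awk -v p="$1" '
        $1 == "LISTEN" {
            n = split($4, part, ":")     # Local Address:Port; IPv6 adds more colons
            if (part[n] == p) found = 1  # the port is always the last piece
        }
        END { exit !found }'
}

# Usage:
#   ss -ntl | port_listening 10051 || echo "zabbix_server is not listening"
```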

zabbix-server service status

[root@shichu apps]# systemctl status zabbix-server
● zabbix-server.service - Zabbix Server
   Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 00:47:09 CST; 52s ago
  Process: 8346 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 8352 ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 8360 (zabbix_server)
   CGroup: /system.slice/zabbix-server.service
           ├─8360 /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
           ├─8362 /apps/zabbix/sbin/zabbix_server: configuration syncer [synced configuration in 0.059399 sec, idle 6...
           ├─8363 /apps/zabbix/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.027609 sec durin...
           ├─8364 /apps/zabbix/sbin/zabbix_server: alerter #1 started
           ├─8365 /apps/zabbix/sbin/zabbix_server: alerter #2 started
           ├─8366 /apps/zabbix/sbin/zabbix_server: alerter #3 started
           ├─8367 /apps/zabbix/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 11 values, idle 5.00...
           ├─8368 /apps/zabbix/sbin/zabbix_server: preprocessing worker #1 started
           ├─8369 /apps/zabbix/sbin/zabbix_server: preprocessing worker #2 started
           ├─8370 /apps/zabbix/sbin/zabbix_server: preprocessing worker #3 started
           ├─8371 /apps/zabbix/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.008702sec during 5.0...
           ├─8372 /apps/zabbix/sbin/zabbix_server: lld worker #1 started
           ├─8373 /apps/zabbix/sbin/zabbix_server: lld worker #2 started
           ├─8374 /apps/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
           ├─8375 /apps/zabbix/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001868 sec, id...
           ├─8376 /apps/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.001502 sec, idle 5 sec]
           ├─8377 /apps/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.004759 sec, idle 60 sec]
           ├─8378 /apps/zabbix/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000050 sec,...
           ├─8379 /apps/zabbix/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000175 sec,...
           ├─8380 /apps/zabbix/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000029 sec,...
           ├─8381 /apps/zabbix/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000019 sec,...
           ├─8382 /apps/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.004440 sec, idle 3 sec]...
           ├─8383 /apps/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000028 sec, id...
           ├─8384 /apps/zabbix/sbin/zabbix_server: self-monitoring [processed data in 0.000016 sec, idle 1 sec]
           ├─8385 /apps/zabbix/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000836 sec, idle 5 sec]
           ├─8386 /apps/zabbix/sbin/zabbix_server: poller #1 [got 0 values in 0.000050 sec, idle 1 sec]
           ├─8387 /apps/zabbix/sbin/zabbix_server: poller #2 [got 0 values in 0.000048 sec, idle 1 sec]
           ├─8388 /apps/zabbix/sbin/zabbix_server: poller #3 [got 1 values in 0.001602 sec, idle 1 sec]
           ├─8389 /apps/zabbix/sbin/zabbix_server: poller #4 [got 0 values in 0.000019 sec, idle 1 sec]
           ├─8390 /apps/zabbix/sbin/zabbix_server: poller #5 [got 0 values in 0.001402 sec, idle 1 sec]
           ├─8391 /apps/zabbix/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000039 sec, idle 5 sec]
           ├─8392 /apps/zabbix/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection...
           ├─8393 /apps/zabbix/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection...
           ├─8394 /apps/zabbix/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection...
           ├─8395 /apps/zabbix/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection...
           ├─8396 /apps/zabbix/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection...
           ├─8397 /apps/zabbix/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000020 sec, idle 5 sec]
           └─8398 /apps/zabbix/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001557 ...

Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Server...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Server.

zabbix-agent service status

[root@shichu apps]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 00:47:09 CST; 58s ago
  Process: 8349 ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 8353 (zabbix_agentd)
   CGroup: /system.slice/zabbix-agent.service
           ├─8353 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
           ├─8354 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─8355 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─8356 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
           ├─8357 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
           └─8358 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]

Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Agent...
Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Agent.

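5. Configure the web interface

Initial setup

Open the server's IP (10.0.0.100) in a browser

Check the local environment (pre-requisites)
Configure the database connection

Configure the Zabbix server details

Confirm the summary

Create the configuration file

The configuration file has to be downloaded manually and uploaded to /home/nginx/zabbix/conf/ on the Zabbix server

Finish the installation

Default user name: Admin	# note the capital A
Password: zabbix

Open the home page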

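Tuning

Switch the menus to Chinese

The interface is displayed in Chinese

Fix garbled item names

Item names are shown as garbled characters

Pick a font from Windows, for example KaiTi (simkai.ttf)

Upload it to: /home/nginx/zabbix/assets/fonts

Change the font Zabbix uses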

vim /home/nginx/zabbix/include/defines.inc.php
# Change the following two definitions
//define('ZBX_GRAPH_FONT_NAME',     'DejaVuSans'); // font file name
define('ZBX_GRAPH_FONT_NAME',       'simkai'); // font file name 


//define('ZBX_FONT_NAME', 'DejaVuSans');
define('ZBX_FONT_NAME', 'simkai');

Verify the font

The font takes effect immediately; neither zabbix nor nginx needs a restart

VII. Monitoring Nginx and MySQL

Topology: Zabbix Server (10.0.0.100) ---> Nginx (10.0.0.7); Zabbix Server (10.0.0.100) ---> MySQL group: Master (10.0.0.17) <--> Slave (10.0.0.27)

1. Install the Zabbix agent

Install the agent with yum: yum install zabbix50-agent

Edit the agent configuration file

[root@nginx ~]# grep '^[a-Z]' /etc/zabbix_agentd.conf 
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
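Server=10.0.0.100		# IP of the Zabbix server or proxy
ListenPort=10050		# listening port (default value)
StartAgents=3			# number of processes started for passive checks; with 0 the agent listens on no port
ServerActive=10.0.0.100		# IP of the Zabbix server or proxy for active checks
Hostname=10.0.0.7		# case-sensitive and unique on the Zabbix server; the host's own IP is used here
Include=/etc/zabbix_agentd.conf.d/*.conf	# added at the end of the file: path for drop-in config files

Start the service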

mkdir -p /etc/zabbix_agentd.conf.d

systemctl start zabbix-agent

Check the status

[root@nginx ~]# systemctl status zabbix-agent
● zabbix-agent.service - Zabbix Monitor Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 16:07:35 CST; 1s ago
 Main PID: 1511 (zabbix_agentd)
   CGroup: /system.slice/zabbix-agent.service
           ├─1511 /usr/sbin/zabbix_agentd -f
           ├─1512 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─1513 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─1514 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
           └─1515 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]

Jul 14 16:07:35 nginx systemd[1]: Stopped Zabbix Monitor Agent.
Jul 14 16:07:35 nginx systemd[1]: Started Zabbix Monitor Agent.
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Starting Zabbix Agent [10.0.0.7]. Zabbix 5.0.21 (revision 47104dd574).
Jul 14 16:07:35 nginx zabbix_agentd[1511]: Press Ctrl+C to exit.

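Add the monitored host in the web interface

Configuration -> Hosts -> Create host

2. Monitor Nginx

Prepare the nginx status page

# Add the nginx status configuration
[root@nginx ~]# cat /etc/nginx/nginx.conf
# Add the status location inside the server block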
...
        location /nginx_status {
            stub_status;
            allow 10.0.0.0/24;
            allow 127.0.0.1;
            deny all;		# without a deny rule the allow lines restrict nothing
        }

Prepare the nginx monitoring script

[root@nginx etc]# cat /etc/zabbix_agentd.d/nginx_status.sh
#!/bin/bash 

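nginx_status_fun(){			# worker function
	NGINX_PORT=$1			# port: the function's 1st argument, i.e. the script's 2nd argument
	NGINX_COMMAND=$2 		# metric: the function's 2nd argument, i.e. the script's 3rd argument
	nginx_active(){			# active connections; like the functions below, it reads the status page from the local host only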
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Active' | awk '{print $NF}'
		}
	nginx_reading(){		# connections in the Reading state
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Reading' | awk '{print $2}'
		}
	nginx_writing(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Writing' | awk '{print $4}'
		}
	nginx_waiting(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| grep 'Waiting' | awk '{print $6}'
		}
	nginx_accepts(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $1}'
		}
	nginx_handled(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $2}'
		}
	nginx_requests(){
        /usr/bin/curl "http://127.0.0.1:"$NGINX_PORT"/nginx_status" 2>/dev/null| awk NR==3 | awk '{print $3}'
		}
  	case $NGINX_COMMAND in
		active)
			nginx_active;
			;;
		reading)
			nginx_reading;
			;;
		writing)
			nginx_writing;
			;;
		waiting)
			nginx_waiting;
			;;
		accepts)
			nginx_accepts;
			;;
		handled)
			nginx_handled;
			;;
		requests)
			nginx_requests;
		esac 
}

main(){							# entry point
	case $1 in
		nginx_status)				# dispatch on the first argument
			nginx_status_fun $2 $3;		# nginx_status: call nginx_status_fun with the 2nd and 3rd arguments
			;;
		status)					# HTTP status code of the status page
			curl -I -s http://127.0.0.1/nginx_status 2>/dev/null | awk 'NR==1{print $2}';
	            	;;				# -I fetches only the response headers; -s silences progress output
		*)					# anything else prints the usage
			echo $"Usage: $0 {nginx_status key}"
	esac
}

main $1 $2 $3
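
The script above runs one curl and a grep/awk pipeline per metric. As an alternative sketch, a single awk pass can pick any of the seven values out of one status response. `stub_status_metric` is a hypothetical helper name, and it assumes the four-line stub_status layout (Active connections, a header line, the three counters, then Reading/Writing/Waiting).

```shell
#!/bin/bash
# stub_status_metric NAME: hypothetical helper; reads stub_status text on
# stdin and prints one metric, or fails if NAME is unknown.
stub_status_metric() {
    awk -v want="$1" '
        /^Active connections:/ { m["active"] = $3 }
        NR == 3                { m["accepts"] = $1; m["handled"] = $2; m["requests"] = $3 }
        /^Reading:/            { m["reading"] = $2; m["writing"] = $4; m["waiting"] = $6 }
        END {
            if (want in m) { print m[want] } else { exit 1 }
        }'
}

# Usage (assumes the status page configured above):
#   curl -s http://127.0.0.1:80/nginx_status | stub_status_metric active
```

Keeping the parsing separate from the HTTP request also makes it possible to test the parser against saved output.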

Add the custom item to the Zabbix agent (via a drop-in config file)

Create the drop-in config file

[root@nginx etc]# cat /etc/zabbix_agentd.conf.d/nginx_monitor.conf 
UserParameter=nginx_status[*],/etc/zabbix_agentd.d/nginx_status.sh "$1" "$2" "$3"

Verify

# Restart the services
systemctl restart nginx zabbix-agent

# Fetch the full nginx status locally
[root@nginx zabbix_agentd.d]# curl 127.0.0.1/nginx_status
Active connections: 1 
server accepts handled requests
 21 21 21 
Reading: 0 Writing: 1 Waiting: 0 

# Read the active connection count on the nginx host
[root@nginx zabbix_agentd.d]# /etc/zabbix_agentd.d/nginx_status.sh nginx_status 80 active
1

# Read the active connection count from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.7 -p 10050 -k 'nginx_status[nginx_status,80,active]'
1

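Import the monitoring template

Template: https://files.cnblogs.com/files/blogs/744193/nginx-template.xml

Link the template to the host

Review the items of the imported nginx template
Verify the monitoring data

3. Monitor MySQL

1) Set up MySQL master/slave replication

master (10.0.0.17)

# Edit the configuration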
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=17
log-bin

# Restart the database
systemctl restart mariadb


# Create the replication user
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant the replication privilege to that user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)

# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql

# Copy the backup to the slave node
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.27:/opt/

# Show the binary log files and their sizes
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name           | File_size |
+--------------------+-----------+
| mariadb-bin.000001 |     29733 |
| mariadb-bin.000002 |       245 |
+--------------------+-----------+

2 rows in set (0.00 sec)

slave (10.0.0.27)

# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=27
read-only

# Restart the database
systemctl restart mariadb

# Import the backup taken on the master
[root@slave ~]# mysql < /opt/backup.sql

# Configure replication from the master's coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (see show master logs)
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
  MASTER_HOST='10.0.0.17',
  MASTER_USER='repluser',
  MASTER_PASSWORD='',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mariadb-bin.000001',
  MASTER_LOG_POS=29733,
  MASTER_CONNECT_RETRY=10;

# Start the slave threads
MariaDB [(none)]> start slave;

# Show the replication status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.17
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: mariadb-bin.000002
          Read_Master_Log_Pos: 245
               Relay_Log_File: mariadb-relay-bin.000003
                Relay_Log_Pos: 531
        Relay_Master_Log_File: mariadb-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
......
             Master_Server_Id: 17
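
The two lines that matter in this output are Slave_IO_Running and Slave_SQL_Running; replication is working only when both say Yes. A minimal sketch of checking that from a script, assuming the `show slave status\G` text is piped in on stdin; `slave_running` is a hypothetical helper name.

```shell
#!/bin/bash
# slave_running: hypothetical helper; reads `show slave status\G` output on
# stdin and succeeds only if both replication threads report Yes.
slave_running() {
    awk '
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}

# Usage (assumes a passwordless local root login, as in this walkthrough):
#   mysql -e 'show slave status\G' | slave_running || echo "replication is broken"
```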

2) Monitor with the Percona plugins

Download page: https://www.percona.com/downloads/

Package: https://www.percona.com/downloads/percona-monitoring-plugins/LATEST/

Install the Percona plugin

# Download
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/redhat/7/x86_64/percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install
yum install -y percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install PHP
yum install -y php php-mysql

# Copy the UserParameter template
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix_agentd.conf.d/

# Create the PHP config file holding the MySQL credentials
vim /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf
<?php
$mysql_user = 'root';
$mysql_pass = ''; 

# Restart the agent
systemctl restart zabbix-agent

Test from the Zabbix server

[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.17 -p 10050 -k MySQL.Key-reads
19
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.27 -p 10050 -k MySQL.Key-reads
0

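Link the host to the template

Note: the bundled template /var/lib/zabbix/percona/templates/zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.8.xml cannot be used as shipped and has to be modified first.

Modified template: https://files.cnblogs.com/files/blogs/744193/PerconaMySQLServer.xml

Check the monitoring status

Change the item type to active

Verify the monitoring data

4. Troubleshooting

1. In active mode the data arrives normally, but the ZBX icon stays grey instead of turning green

Fix: in the template Template OS Linux by Zabbix agent active, unlink and clear the linked template Template Module Zabbix agent active, then link Template Module Zabbix agent instead.

The ZBX icon turns green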

VIII. Email notifications for problems and recovery in Zabbix

1. Automatic remediation (self-healing)

1) Allow remote commands on the agent

[root@nginx tmp]# grep '^[a-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1		# allow the server to run remote commands
Server=10.0.0.100
ListenPort=10050
StartAgents=3
ServerActive=10.0.0.100
Hostname=10.0.0.7
User=zabbix
UnsafeUserParameters=1		# allow unsafe arguments (special characters) when commands are run remotely
Include=/etc/zabbix_agentd.conf.d/*.conf

2) Grant sudo rights to the zabbix user on the agent host

[root@nginx ~]# vim /etc/sudoers
......
root    ALL=(ALL)   ALL
zabbix ALL=NOPASSWD:ALL		# let the zabbix user run privileged commands through sudo without a password

Restart the service

systemctl restart zabbix-agent

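3) Create the action

Set the action name and conditions

Add the operation (the command to run)

Remote commands must be written with absolute paths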

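2. Email notifications

1) Enable SMTP on the mailbox

Log in to the personal mailbox and enable the SMTP service

Send the SMS to obtain an authorization code

2) Create the media type

Mailbox settings reference: https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=371

Use the authorization code obtained above as the password

3) Add the media to a user

Select the Admin user

Open Media and click Add

For the type, choose the media type created above; for the recipient, enter the address that should receive the messages

Update the media

4) Create the action

Add a send-email operation to the self-healing action

Add the operations for when the problem occurs and for when it is resolved

Email content sent when the problem occurs

Email content sent after recovery

3. Verify the problem and recovery emails

1) Stop the nginx service

Check port 80

nginx recovers automatically

2) Zabbix runs the recovery command and sends the notification emails

3) Log in to the personal mailbox and check the alert emails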

posted @ 2022-07-15 21:46 areke
