Redis, Zabbix

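I. Redis features and typical use cases

Redis features

  • Fast: on the order of 100,000 QPS; data is held in memory and the server is written in C
  • Persistence
  • Multiple data structures: string, hash, list, set, and zset (sorted set)
  • Client libraries for many programming languages
  • Feature-rich: Lua scripting, publish/subscribe, transactions, pipelining, and more
  • Simple: the code base is compact (the standalone core is only about 23,000 lines), the single-threaded model is easy to develop against, there are no external library dependencies, and it is easy to use
  • Master-replica replication
  • High availability and distributed deployment

Typical Redis use cases

  • Session sharing: commonly used in web clusters so that multiple Tomcat or PHP web servers share sessions
  • Caching: query results, product information on e-commerce sites, news content
  • Counters: visit rankings, product view counts, and other count-based statistics
  • Social features (Weibo/WeChat style): mutual friends, follower counts, follows, likes, comments, and so on
  • Message queue: log buffering in ELK, publish/subscribe for some business systems
  • Geolocation: GEO-based features such as shake-to-match, people nearby, and food delivery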

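II. Compare the pros and cons of Redis RDB and AOF persistence

1. RDB (Redis DataBase) mode

How RDB works

RDB takes time-based snapshots. By default only the most recent snapshot is kept. Snapshots are fast to take, but any data written between the last snapshot and the current moment can be lost.

How RDB bgsave (asynchronous) produces a snapshot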

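Pros and cons of RDB

Pros

  • An RDB snapshot captures the data set at one point in time. A script can run the Redis command bgsave (non-blocking, runs in the background) or save (blocks the server until it finishes, not recommended) to take backups at chosen times. Several backups can be kept, so after a problem the data can be restored to different points in time. This makes RDB well suited to backups, and the file format is supported by a number of third-party tools for later data analysis.

    For example, keep an hourly RDB backup for the most recent 24 hours, plus one RDB backup for every day of the month. Then, even when something goes wrong, the data set can be rolled back to different versions.

  • RDB maximizes Redis performance: to save an RDB file the parent process only has to fork a child process. The child then does all the remaining work, and the parent performs no disk I/O.

  • For large data sets, for example several GB, restoring from RDB is faster than restoring from AOF.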
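
The hourly/daily retention scheme in the example above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `backup_rdb`, the `dump-<tier>-<timestamp>.rdb` naming scheme, and the backup directory are made up for illustration, and the function merely copies an RDB file that already exists (a real job would first trigger `bgsave` and wait for it to finish).

```shell
#!/bin/bash
# Hypothetical helper: copy an RDB file into a backup directory under a
# timestamped name, then keep only the newest <keep> copies of that tier.
# Usage: backup_rdb <rdb file> <backup dir> <tier name> <keep count>
backup_rdb() {
    local src=$1 dest_dir=$2 tier=$3 keep=$4
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/dump-$tier-$stamp.rdb" || return 1
    # Timestamped names sort chronologically, so a reverse sort lists the
    # newest first; everything after the first <keep> entries is deleted.
    ls -1 "$dest_dir" | grep "^dump-$tier-.*\.rdb\$" | sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$dest_dir/$old"
        done
}
```

Called from cron, `backup_rdb /apps/redis/data/dump.rdb /backup/redis hourly 24` once an hour and `backup_rdb /apps/redis/data/dump.rdb /backup/redis daily 31` once a day would approximate the schedule described above (`/backup/redis` is an assumed path; `dump.rdb` is the default RDB file name).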

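Cons

  • Data is not saved in real time; anything written to memory since the last RDB backup can be lost.

    RDB is a poor fit when data loss on a server failure must be kept to a minimum. Redis allows different save points to control how often the RDB file is written, but because the RDB file holds the state of the entire data set, writing it is not a cheap or quick operation. In practice an RDB file is therefore usually saved no more often than every 5 minutes or so, and a crash can then cost several minutes of data.

  • With a very large data set, forking the child process that writes the RDB file takes some time, from milliseconds up to seconds, depending on the data set size and the host's performance.

    When the data set is large, fork() can be expensive and can stop the server from serving clients for a while; if the data set is huge and CPU time is scarce, the pause can last a full second or more. AOF rewrite also needs fork(), but no matter how long the interval between AOF rewrites is, data durability is not affected.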

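2. AOF (AppendOnlyFile) mode

How AOF works

AOF appends every operation, in execution order, to the end of a designated log file.

Notes:

When both RDB and AOF are enabled, the AOF file takes priority over the RDB file during recovery, so the AOF file is the one used to restore data.

AOF is disabled by default. If it is enabled for the first time by editing the configuration file and restarting the service, then because AOF has priority over RDB and no AOF file exists yet, Redis starts with an empty data set and all data is lost.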
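
One way to avoid the data loss described in the note above is to switch AOF on in the running instance with `CONFIG SET appendonly yes`, which makes Redis build the AOF file from the data currently in memory, and then persist the setting with `CONFIG REWRITE`. The sketch below wraps those two real Redis commands; the function name `enable_aof_live` and the overridable `REDIS_CLI` variable are assumptions added for illustration, and `CONFIG REWRITE` only works when the server was started with a configuration file it can write to.

```shell
#!/bin/bash
# Assumption: REDIS_CLI may be overridden (for example for a dry run);
# it defaults to the redis-cli binary found in PATH.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Enable AOF on a running instance, then write the change back to redis.conf.
# Usage: enable_aof_live <password>
enable_aof_live() {
    local password=$1
    # Redis creates the first AOF file from the in-memory data set.
    "$REDIS_CLI" -a "$password" config set appendonly yes || return 1
    # Persist "appendonly yes" so it survives the next restart.
    "$REDIS_CLI" -a "$password" config rewrite
}
```

For the setup in this document the call would be `enable_aof_live 123456`; it is prudent to confirm that the AOF file exists in the data directory before restarting the service.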

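AOF rewrite

A rewrite writes a new AOF file in which repeated, mergeable, and expired data has been consolidated. This saves the disk space used by the AOF backup and also speeds up recovery. An AOF rewrite can be triggered manually with bgrewriteaof, or an automatic rewrite policy can be defined.

AOF rewrite process

Pros and cons of AOF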

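Pros

  • Relatively high data safety, depending on the fsync policy in use (fsync flushes the data Redis has written from memory buffers to the storage device). The default is appendfsync everysec, that is, one fsync per second. With this setting Redis still performs well, and a crash loses at most about one second of data (fsync runs in a background thread, so the main thread keeps serving command requests).

  • Log writes use append mode, so no seek is needed while writing, and a crash does not damage content already in the log file. If a crash leaves the last operation only half written, the redis-check-aof tool can repair the inconsistency before Redis is started again.

  • When the AOF file grows too large, Redis can rewrite it automatically in the background. The rewritten AOF file contains the minimal set of commands needed to rebuild the current data set. The rewrite is safe: while the new AOF file is being created, Redis keeps appending changes to the existing AOF file, so the existing file is not lost even if the server stops during the rewrite. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.

  • AOF is a clearly formatted, easy-to-understand log of every modification, and the data set can be rebuilt from it.

    The AOF file records, in order, every write performed against the database, stored in the Redis protocol format. The file is therefore easy to read and easy to parse, and exporting it is simple too. For example, if FLUSHALL is run by accident, then as long as the AOF file has not been rewritten, stopping the server, removing the FLUSHALL command from the end of the AOF file, and restarting Redis restores the data set to its state before FLUSHALL.

Cons

  • Every operation is recorded even when operations repeat, so an AOF file is larger than the corresponding RDB file
  • Restoring a large data set from AOF is slower than restoring it from RDB
  • Depending on the fsync policy, AOF can be slower than RDB
  • Bugs are more likely than with RDB

When to use RDB and AOF

  • If Redis mainly serves as a cache, or losing a few minutes of data is acceptable, enabling only RDB is usually enough in production; this is also the default
  • If data must be kept durably and none of it may be lost, enable both RDB and AOF
  • Enabling only AOF is generally not recommended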

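III. Implement Redis Sentinel and simulate a master failure

1. How it works

2. Setting up Sentinel mode

Host         Role
10.0.0.7     master + Sentinel
10.0.0.17    slave1 + Sentinel
10.0.0.27    slave2 + Sentinel

1) Configure one master and two slaves

One-step script to compile and install Redis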

#!/bin/bash
# Compile and install Redis from source
source /etc/init.d/functions
# Redis version
Redis_version=redis-5.0.9
suffix=tar.gz
Redis=${Redis_version}.${suffix}
Password=123456
# Redis source download URL
redis_url=http://download.redis.io/releases/${Redis}
# Redis installation path
redis_install_DIR=/apps/redis
# Number of CPUs, used for parallel make
CPUS=$(lscpu | grep "^CPU(s)" | awk '{print $2}')
# Print a status line: color "<message>" <0 for OK, non-zero for FAILED>
color () {
    if [[ $2 -eq 0 ]];then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}
download_redis (){
    # Install build dependencies
    yum -y install gcc jemalloc-devel || { color "Failed to install dependencies, check the network" 1 ;exit 1;}
    cd /opt
    if [ -e ${Redis} ];then
        color "Redis source archive already exists" 0
    else
        color "Downloading the Redis source archive" 0
        wget ${redis_url}
        if [ $? -ne 0 ];then
            color "Failed to download the Redis source archive, exiting!" 1
            exit 1
        fi
    fi
}
install_redis (){
    # Unpack the source archive
    tar xvf /opt/${Redis} -C /usr/local/src
    # -sfn replaces an existing link so the script can be re-run
    ln -sfn /usr/local/src/${Redis_version} /usr/local/src/redis
    # Compile and install
    cd /usr/local/src/redis
    make -j ${CPUS} install PREFIX=${redis_install_DIR}
    if [ $? -ne 0 ];then
        color "Redis build and install failed!" 1
        exit 1
    else
        color "Redis build and install succeeded" 0
    fi
    ln -s ${redis_install_DIR}/bin/redis-* /usr/sbin/
    # Create the redis user (an existing user is not an error)
    if id redis &> /dev/null;then
        color "redis user already exists" 0
    else
        useradd -r -s /sbin/nologin redis
        color "redis user created" 0
    fi
    mkdir -p ${redis_install_DIR}/{etc,log,data,run}
    # Prepare the Redis configuration file (the current directory is the source tree)
    cp redis.conf ${redis_install_DIR}/etc/
    sed -i "s/bind 127.0.0.1/bind 0.0.0.0/" ${redis_install_DIR}/etc/redis.conf
    sed -i "/# requirepass/a requirepass ${Password}" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^dir .*\$@dir ${redis_install_DIR}\/data@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^logfile .*\$@logfile ${redis_install_DIR}\/log\/redis-6379.log@" ${redis_install_DIR}/etc/redis.conf
    sed -i "s@^pidfile .*\$@pidfile ${redis_install_DIR}\/run\/redis-6379.pid@" ${redis_install_DIR}/etc/redis.conf
    chown -R redis:redis ${redis_install_DIR}
    # Kernel settings that Redis warns about at startup
    cat >> /etc/sysctl.conf <<EOF
net.core.somaxconn = 1024
vm.overcommit_memory = 1
EOF
    sysctl -p
    echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.d/rc.local
    chmod +x /etc/rc.d/rc.local
    source /etc/rc.d/rc.local
    # Prepare the systemd service unit
    cat > /usr/lib/systemd/system/redis.service <<EOF
[Unit]
Description=redis persistent key-value database
After=network.target
[Service]
ExecStart=${redis_install_DIR}/bin/redis-server ${redis_install_DIR}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT \$MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target
EOF
    chown -R redis:redis ${redis_install_DIR}
    systemctl daemon-reload
    systemctl enable --now redis
    systemctl is-active redis
    if [ $? -ne 0 ];then
        color "Redis service failed to start!" 1
        exit 1
    else
        color "Redis service started" 0
        color "Redis installation complete" 0
    fi
}
download_redis
install_redis
exit 0
  1. Master node configuration

    # Edit redis.conf
    vim /apps/redis/etc/redis.conf
    bind 0.0.0.0
    masterauth "123456"
    requirepass "123456"
    # Restart redis
    systemctl restart redis
  2. Slave node configuration

    # Edit redis.conf
    vim /apps/redis/etc/redis.conf
    bind 0.0.0.0
    masterauth "123456"
    requirepass "123456"
    replicaof 10.0.0.7 6379
    # Restart redis
    systemctl restart redis
  3. Check the status

    master

    [root@master ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:2
    slave0:ip=10.0.0.27,port=6379,state=online,offset=28,lag=1
    slave1:ip=10.0.0.17,port=6379,state=online,offset=28,lag=1
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:28
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:28
    127.0.0.1:6379>

    slave1

    [root@slave1 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.7
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:9
    master_sync_in_progress:0
    slave_repl_offset:154
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:154
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:154
    127.0.0.1:6379>

    slave2

    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.7
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:5
    master_sync_in_progress:0
    slave_repl_offset:210
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:14883e4254918d97c50ec0f05c6b7b741e09cc59
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:210
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:210
    127.0.0.1:6379>
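
The replication state shown above can also be checked from a script instead of by eye. The following is a minimal sketch: the function name `replica_link_ok` is an assumption, and it relies only on the `role` and `master_link_status` fields that appear in the `info replication` output in this document.

```shell
#!/bin/bash
# Hypothetical helper: reads `info replication` output on stdin and succeeds
# only when the node is a slave whose link to its master is up.
replica_link_ok() {
    # redis-cli output may use CRLF line endings, so strip carriage returns.
    tr -d '\r' | awk -F: '
        $1 == "role"               { role = $2 }
        $1 == "master_link_status" { link = $2 }
        END { exit !(role == "slave" && link == "up") }
    '
}
```

On a slave node it could be used as `redis-cli -a 123456 info replication 2>/dev/null | replica_link_ok && echo "replication link is up"`.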

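2) Edit the Sentinel configuration file

Sentinel is effectively a special Redis server: it supports some Redis commands but not many others. By default it listens on port 26379/tcp.

Sentinel does not have to be deployed on the same hosts as the Redis servers, but it usually is.

a) Configure the sentinel file

cp /usr/local/src/redis/sentinel.conf /apps/redis/etc/redis-sentinel.conf
cd /apps/redis/etc/
# Configure sentinel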
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 3000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
# Start sentinel
[root@master etc]# redis-sentinel /apps/redis/etc/redis-sentinel.conf
# View the sentinel configuration
[root@master etc]# grep "^[a-Z]" redis-sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile /apps/redis/run/redis-sentinel.pid
logfile /apps/redis/log/sentinel_26379.log
dir /apps/redis/data
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 10.0.0.7 6379 2
sentinel parallel-syncs mymaster 1
sentinel down-after-milliseconds mymaster 3000
sentinel auth-pass mymaster 123456
sentinel config-epoch mymaster 0
# The lines below are generated automatically
sentinel myid c663d4b9db845d721cd6dccf608c7904d896b745 # myid must be unique
protected-mode no
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.0.27 6379
sentinel known-replica mymaster 10.0.0.17 6379
sentinel known-sentinel mymaster 10.0.0.27 26379 66f276f274802c6f0243007a2be4b04001b9867e
sentinel known-sentinel mymaster 10.0.0.17 26379 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
sentinel current-epoch 0

b) Configure the sentinel service

[root@shichu ~]# cat /lib/systemd/system/redis-sentinel.service
[Unit]
Description=Redis Sentinel
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/apps/redis/bin/redis-sentinel /apps/redis/etc/redis-sentinel.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
[Install]
WantedBy=multi-user.target

c) Start the sentinel service

chown -R redis:redis /apps/redis
systemctl daemon-reload
systemctl enable --now redis-sentinel

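d) Sentinel configuration parameters

sentinel monitor mymaster 10.0.0.7 6379 2 # Address and port of the master in the mymaster group. 2 is the quorum: the number of Sentinels that must consider the master down before a failover is started. It is normally set to a majority of all Sentinel nodes (whose total is usually an odd number >= 3, such as 3, 5, or 7); for example, with 3 Sentinels, 3/2 = 1.5, rounded up to 2. It is the basis for marking the master ODOWN (objectively down)

sentinel auth-pass mymaster 123456 # Password of the master in the mymaster group; this line must come after the sentinel monitor line above

sentinel down-after-milliseconds mymaster 30000 # Time after which a node in the mymaster group is judged SDOWN (subjectively down), in milliseconds; 3000 is suggested

sentinel parallel-syncs mymaster 1 # Number of slaves that synchronize with the new master at the same time after a failover; a smaller number lengthens the total synchronization time but reduces the load on the new master

sentinel failover-timeout mymaster 180000 # Timeout for pointing all slaves at the new master, in milliseconds

sentinel deny-scripts-reconfig yes # Disallow changing the notification and reconfiguration scripts at runtime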
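
The "more than half" rule of thumb for the quorum described above amounts to integer division plus one. A minimal sketch follows; the function name `majority_quorum` is an assumption, and Sentinel itself accepts any quorum value, so this only reflects the recommendation in the text.

```shell
#!/bin/bash
# Hypothetical helper: smallest number of Sentinels that forms a strict
# majority of <total> Sentinels.
majority_quorum() {
    local total=$1
    echo $(( total / 2 + 1 ))
}
```

With the three Sentinels used in this document, `majority_quorum 3` yields the value 2 that appears in the `sentinel monitor` line.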

e) Check the listening ports

[root@master etc]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:26379 *:*
LISTEN 0 511 *:6379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 [::1]:25 [::]:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22

f) Check the sentinel logs

master log

[root@master redis]# tail /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 16:38:43.636 * supervised by systemd, will signal readiness
1491:X 11 Jul 2022 16:38:43.637 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1491:X 11 Jul 2022 16:38:43.637 * Running mode=sentinel, port=26379.
1491:X 11 Jul 2022 16:38:43.638 # Sentinel ID is c663d4b9db845d721cd6dccf608c7904d896b745
1491:X 11 Jul 2022 16:38:43.638 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:38:46.640 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:20.763 # -sdown sentinel 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac 10.0.0.17 26379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 16:39:48.855 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379

slave1 log

[root@slave1 ~]# tail /apps/redis/log/sentinel_26379.log
1293:X 11 Jul 2022 16:39:19.722 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1293, just started
1293:X 11 Jul 2022 16:39:19.722 # Configuration loaded
1293:X 11 Jul 2022 16:39:19.722 * supervised by systemd, will signal readiness
1293:X 11 Jul 2022 16:39:19.723 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1293:X 11 Jul 2022 16:39:19.724 * Running mode=sentinel, port=26379.
1293:X 11 Jul 2022 16:39:19.724 # Sentinel ID is 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac
1293:X 11 Jul 2022 16:39:19.724 # +monitor master mymaster 10.0.0.7 6379 quorum 2
1293:X 11 Jul 2022 16:39:22.777 # +sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379
1293:X 11 Jul 2022 16:39:48.988 # -sdown sentinel 66f276f274802c6f0243007a2be4b04001b9867e 10.0.0.27 26379 @ mymaster 10.0.0.7 6379

slave2 log

[root@slave2 ~]# tail /apps/redis/log/sentinel_26379.log
900:X 11 Jul 2022 16:32:23.322 # +sdown sentinel 605f713c7e6554ae0bfed0b98304e29d6a69e678 10.0.0.37 26379 @ mymaster 10.0.0.7 6379
1256:X 11 Jul 2022 16:39:48.523 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1256:X 11 Jul 2022 16:39:48.523 # Redis version=5.0.9, bits=64, commit=00000000, modified=0, pid=1256, just started
1256:X 11 Jul 2022 16:39:48.523 # Configuration loaded
1256:X 11 Jul 2022 16:39:48.523 * supervised by systemd, will signal readiness
1256:X 11 Jul 2022 16:39:48.524 * Increased maximum number of open files to 4096 (it was originally set to 1024).
1256:X 11 Jul 2022 16:39:48.525 * Running mode=sentinel, port=26379.
1256:X 11 Jul 2022 16:39:48.525 # Sentinel ID is 66f276f274802c6f0243007a2be4b04001b9867e
1256:X 11 Jul 2022 16:39:48.525 # +monitor master mymaster 10.0.0.7 6379 quorum 2

g) Check the sentinel status

[root@master redis]# redis-cli -a 123456 -p 26379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:26379> info sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.7:6379,slaves=2,sentinels=3
# Two slaves and three sentinel servers; if the sentinels value does not match, check for conflicting myid values
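
The `master0:` line in the output above packs the interesting values into one comma-separated record. As a rough sketch, it can be split into fields for use in a monitoring script; the function name `sentinel_master_summary` is an assumption, and the pattern only covers the field layout shown in this document.

```shell
#!/bin/bash
# Hypothetical helper: reads `info sentinel` output on stdin and prints
# "<name> <status> <address> <slaves> <sentinels>" for each monitored master.
sentinel_master_summary() {
    # Strip carriage returns first, then pick the fields out of masterN: lines.
    tr -d '\r' | sed -n 's/^master[0-9]*:name=\([^,]*\),status=\([^,]*\),address=\([^,]*\),slaves=\([0-9]*\),sentinels=\([0-9]*\).*$/\1 \2 \3 \4 \5/p'
}
```

For example, `redis-cli -a 123456 -p 26379 info sentinel 2>/dev/null | sentinel_master_summary` prints one line per master, which makes it easy to compare the last field with the expected number of Sentinels.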

3. Simulate a failover

  • Stop redis on the master
[root@master etc]# systemctl stop redis
[root@master etc]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:26379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 [::1]:25 [::]:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22
  • Sentinel log during the failover
[root@master redis]# tail -f /apps/redis/log/sentinel_26379.log
1491:X 11 Jul 2022 17:07:16.959 # +sdown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.044 # +odown master mymaster 10.0.0.7 6379 #quorum 2/2
1491:X 11 Jul 2022 17:07:17.044 # +new-epoch 4
1491:X 11 Jul 2022 17:07:17.044 # +try-failover master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.045 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.048 # 5d3a6880bd134e211c77bef6bc408ab63a1fd3ac voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.050 # 66f276f274802c6f0243007a2be4b04001b9867e voted for c663d4b9db845d721cd6dccf608c7904d896b745 4
1491:X 11 Jul 2022 17:07:17.102 # +elected-leader master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.102 # +failover-state-select-slave master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 # +selected-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.205 * +failover-state-send-slaveof-noone slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:17.269 * +failover-state-wait-promotion slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +promoted-slave slave 10.0.0.27:6379 10.0.0.27 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.078 # +failover-state-reconf-slaves master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:18.145 * +slave-reconf-sent slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-inprog slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.144 * +slave-reconf-done slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # -odown master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +failover-end master mymaster 10.0.0.7 6379
1491:X 11 Jul 2022 17:07:19.228 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379 # the master role has moved to 10.0.0.27
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:19.229 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
1491:X 11 Jul 2022 17:07:22.276 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379

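Log event reference

+reset-master: The master has been reset.
+slave: A new replica has been detected and attached by Sentinel.
+failover-state-reconf-slaves: The failover state changed to reconf-slaves.
+failover-detected: A failover started by another Sentinel, or the promotion of a replica to master, has been detected.
+slave-reconf-sent: The leader Sentinel sent the SLAVEOF command to the instance to point it at the new master.
+slave-reconf-inprog: The instance is being reconfigured as a replica of the specified master, but the synchronization has not finished yet.
+slave-reconf-done: The replica has finished synchronizing with the new master.
-dup-sentinel: One or more Sentinels monitoring the given master were removed as duplicates; this happens, for example, when a Sentinel instance is restarted.
+sentinel: A new Sentinel monitoring the given master has been detected and added.
+sdown: The given instance is now subjectively down.
-sdown: The given instance is no longer subjectively down.
+odown: The given instance is now objectively down.
-odown: The given instance is no longer objectively down.
+new-epoch: The current epoch has been updated.
+try-failover: A new failover is in progress, waiting to be elected by the majority of Sentinels.
+elected-leader: This Sentinel won the election for the given epoch and can carry out the failover.
+failover-state-select-slave: The failover is now in the select-slave state; Sentinel is looking for a replica suitable for promotion.
no-good-slave: Sentinel could not find a replica suitable for promotion. It will try again after a while, or abandon the failover altogether.
+selected-slave: Sentinel found a replica suitable for promotion.
+failover-state-send-slaveof-noone: Sentinel is promoting the selected replica to master and waiting for the promotion to complete.
failover-end-for-timeout: The failover was terminated because of a timeout, but the replicas will eventually be configured to replicate from the new master anyway.
+failover-end: The failover completed successfully. All replicas now replicate from the new master.
+switch-master: The configuration changed: the master's IP address and port are now different. This is the message most external users care about.
+tilt: Entered tilt mode.
-tilt: Left tilt mode.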
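
Because `+switch-master` is the event most consumers care about, it can be handy to pull just those events out of a Sentinel log. The sketch below assumes the log line layout shown in this document (pid and role, day, month, year, time, marker, event name, then the event arguments); the function name `switch_master_events` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads a Sentinel log on stdin and prints one line per
# +switch-master event in the form "<name> <old ip:port> -> <new ip:port>".
switch_master_events() {
    # Field 7 is the event name; fields 8-12 are the master name, the old
    # address and port, and the new address and port.
    awk '$7 == "+switch-master" { printf "%s %s:%s -> %s:%s\n", $8, $9, $10, $11, $12 }'
}
```

It could be run as `switch_master_events < /apps/redis/log/sentinel_26379.log` to list every master change recorded by that Sentinel.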

  • After the failover

    The master IP in the replicaof line of the redis configuration file is updated automatically

    [root@slave1 ~]# grep "^replicaof" /apps/redis/etc/redis.conf
    replicaof 10.0.0.27 6379

    The IP in the sentinel monitor line of the sentinel configuration file is updated automatically

    [root@slave1 ~]# grep "^sentinel monitor" /apps/redis/etc/redis-sentinel.conf
    sentinel monitor mymaster 10.0.0.27 6379 2
  • Redis status

    New master status

    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:1
    slave0:ip=10.0.0.17,port=6379,state=online,offset=4290787,lag=1
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4290787
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3242212
    repl_backlog_histlen:1048576
    127.0.0.1:6379>

    The other slave now points at the new master

    [root@slave1 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.27
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_repl_offset:4296387
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4296387
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3247812
    repl_backlog_histlen:1048576
    127.0.0.1:6379>
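  • Bring the failed former master back into the Redis replication group

    [root@master redis]# systemctl start redis

    Former master status

    # The redis configuration now points at the new master node
    [root@master redis]# grep "^replicaof" /apps/redis/etc/redis.conf
    replicaof 10.0.0.27 6379
    # Check the redis status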
    [root@master redis]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:slave
    master_host:10.0.0.27
    master_port:6379
    master_link_status:up
    master_last_io_seconds_ago:0
    master_sync_in_progress:0
    slave_repl_offset:4366815
    slave_priority:100
    slave_read_only:1
    connected_slaves:0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:4366815
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:4343555
    repl_backlog_histlen:23261
    # Check the sentinel status
    [root@master redis]# redis-cli -a 123456 -p 26379
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:26379> info sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=mymaster,status=ok,address=10.0.0.27:6379,slaves=2,sentinels=3

    New master status

    # Redis status
    [root@slave2 ~]# redis-cli -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:2
    slave0:ip=10.0.0.17,port=6379,state=online,offset=4407027,lag=0
    slave1:ip=10.0.0.7,port=6379,state=online,offset=4407160,lag=0
    master_replid:590248f1058be0774dab136e8fb18a8e5b5e4078
    master_replid2:90a5507845cbc6319a7f704ec666b28aa7e9b5ff
    master_repl_offset:4407293
    second_repl_offset:3910006
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:3358718
    repl_backlog_histlen:1048576
    # Sentinel log
    [root@slave2 ~]# tail -f /apps/redis/log/sentinel_26379.log
    1256:X 11 Jul 2022 17:07:17.049 # +new-epoch 4
    1256:X 11 Jul 2022 17:07:17.052 # +vote-for-leader c663d4b9db845d721cd6dccf608c7904d896b745 4
    1256:X 11 Jul 2022 17:07:17.068 # +odown master mymaster 10.0.0.7 6379 #quorum 3/2
    1256:X 11 Jul 2022 17:07:17.068 # Next failover delay: I will not start a failover before Mon Jul 11 17:13:17 2022
    1256:X 11 Jul 2022 17:07:18.149 # +config-update-from sentinel c663d4b9db845d721cd6dccf608c7904d896b745 10.0.0.7 26379 @ mymaster 10.0.0.7 6379
    1256:X 11 Jul 2022 17:07:18.149 # +switch-master mymaster 10.0.0.7 6379 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.17:6379 10.0.0.17 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:18.149 * +slave slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:07:21.189 # +sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
    1256:X 11 Jul 2022 17:43:54.361 # -sdown slave 10.0.0.7:6379 10.0.0.7 6379 @ mymaster 10.0.0.27 6379
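  • Sentinel operations

    Manually fail over the master

    sentinel failover <masterName>

    Example

    # A priority can be set per node; the lower the value, the more Sentinel prefers that node as the new master. The default is 100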
    [root@slave1 ~]# grep 'replica-priority' /apps/redis/etc/redis.conf
    replica-priority 30
    [root@slave1 ~]# redis-cli -a 123456 -p 26379
    127.0.0.1:26379> sentinel failover mymaster
    OK
    127.0.0.1:26379> info sentinel
    # Sentinel
    sentinel_masters:1
    sentinel_tilt:0
    sentinel_running_scripts:0
    sentinel_scripts_queue_length:0
    sentinel_simulate_failure_flags:0
    master0:name=mymaster,status=ok,address=10.0.0.17:6379,slaves=2,sentinels=3

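IV. How Redis Cluster works

Redis Cluster characteristics

  • All Redis nodes are interconnected and check each other with a PING mechanism
  • A node is only considered truly failed when more than half of the nodes in the cluster detect it as failed
  • Clients connect to Redis directly without a proxy; the application should be configured with the IP addresses of all Redis servers
  • Redis Cluster spreads the slots 0-16383 evenly across the Redis nodes. Reads and writes must go to the node that owns the slot, so the number of nodes determines how far Redis concurrency scales out; each node serves 16384/N slots
  • Redis Cluster pre-allocates 16384 slots. When a key-value pair is written to the cluster, CRC16(key) mod 16384 determines the slot, and the slot determines the Redis node that stores the key. This removes the single-machine bottleneck.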
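
The CRC16(key) mod 16384 mapping described above can be tried out without a running cluster. The sketch below implements the CRC16 variant used by Redis Cluster (XMODEM: polynomial 0x1021, initial value 0) in plain bash. The function names `crc16_xmodem` and `key_slot` are assumptions, the code handles ASCII keys only, and it deliberately ignores hash tags (`{...}`), which real Redis takes into account.

```shell
#!/bin/bash
# Hypothetical helper: CRC16/XMODEM of an ASCII string, printed in decimal.
crc16_xmodem() {
    local s=$1 crc=0 i j byte
    for ((i = 0; i < ${#s}; i++)); do
        # Numeric value of the i-th character.
        printf -v byte '%d' "'${s:i:1}"
        crc=$(( crc ^ (byte << 8) ))
        # Process the byte bit by bit, most significant bit first.
        for ((j = 0; j < 8; j++)); do
            if (( crc & 0x8000 )); then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
        done
    done
    echo "$crc"
}

# Hypothetical helper: hash slot (0-16383) for a key, without hash-tag handling.
key_slot() {
    echo $(( $(crc16_xmodem "$1") % 16384 ))
}
```

On a live cluster the result of `key_slot somekey` can be cross-checked against `redis-cli -a 123456 cluster keyslot somekey`.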

Redis Cluster architecture

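V. Deploying a Redis Cluster with Redis 5

Official documentation: https://redis.io/topics/cluster-tutorial

Prerequisites for creating a Redis Cluster

  • Every Redis node uses the same hardware configuration, the same password, and the same Redis version
  • None of the Redis servers may contain any data
  • Prepare 6 machines for a three-master, three-slave layout

# Cluster nodes
Redis-node1: 10.0.0.7
Redis-node2: 10.0.0.17
Redis-node3: 10.0.0.27
Redis-node4: 10.0.0.37
Redis-node5: 10.0.0.47
Redis-node6: 10.0.0.57
# Reserved nodes
10.0.0.67
10.0.0.77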
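Deploying the Redis Cluster

1. Install Redis

Edit the Redis configuration

[root@node1 etc]# cat redis.conf
...
bind 0.0.0.0
masterauth 123456 # recommended; otherwise master-slave replication fails later and this has to be configured then
requirepass 123456
cluster-enabled yes # uncomment this line; cluster mode must be enabled, after which the redis process is shown with [cluster]
cluster-config-file nodes-6379.conf # uncomment this line; this is the cluster state file, which records master-slave relationships and slot ranges and is created and maintained by Redis Cluster itself
cluster-require-full-coverage no # default is yes; setting it to no prevents a single unavailable node from making the whole cluster unavailable
...
[root@node1 etc]#systemctl enable --now redis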

2. Check the current Redis status

# Check the listening ports
[root@node1 ~]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 511 *:6379 *:*
LISTEN 0 128 *:111 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 511 *:16379 *:*
LISTEN 0 128 [::]:111 [::]:*
LISTEN 0 128 [::]:22 [::]:*
LISTEN 0 100 [::1]:25 [::]:*
# The process is shown with the [cluster] flag
[root@node1 ~]# ps aux|grep redis
redis 24754 0.2 0.3 153996 3172 ? Ssl 21:28 0:02 /apps/redis/bin/redis-server 0.0.0.0:6379 [cluster]
root 24822 0.0 0.0 112812 980 pts/0 R+ 21:44 0:00 grep --color=auto redis

3. Create the cluster

[root@node1 ~]# redis-cli -a 123456 --cluster create 10.0.0.7:6379 10.0.0.17:6379 10.0.0.27:6379 10.0.0.37:6379 \
10.0.0.47:6379 10.0.0.57:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.47:6379 to 10.0.0.7:6379
Adding replica 10.0.0.57:6379 to 10.0.0.17:6379
Adding replica 10.0.0.37:6379 to 10.0.0.27:6379
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379 # M marks a master
slots:[0-5460] (5461 slots) master # first and last slot of this master
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379 # S marks a slave
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
replicates 12fdc235442ed40a838e77b246025799b4b3357b
Can I set the above configuration? (type 'yes' to accept): yes # type yes to create the cluster automatically
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.7:6379)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
slots:[0-5460] (5461 slots) master # slots already assigned
1 additional replica(s) # one slave assigned
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
slots: (0 slots) slave # slaves are not assigned slots
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 # ID of the corresponding master, 10.0.0.27
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
slots: (0 slots) slave
replicates 12fdc235442ed40a838e77b246025799b4b3357b # ID of the corresponding master, 10.0.0.17
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
slots: (0 slots) slave
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec # ID of the corresponding master, 10.0.0.7
[OK] All nodes agree about slots configuration. # all nodes agree on the slot assignment
>>> Check for open slots... # check for open slots
>>> Check slots coverage... # check slot coverage
[OK] All 16384 slots covered. # all 16384 slots are assigned
[root@node1 ~]#

The output above shows three master/slave pairs

master:10.0.0.7-->slave:10.0.0.47
master:10.0.0.17-->slave:10.0.0.57
master:10.0.0.27-->slave:10.0.0.37
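
The slot ranges in the creation output (0 - 5460, 5461 - 10922, 10923 - 16383) follow from splitting 16384 slots as evenly as possible across the masters. The sketch below reproduces those ranges for three masters; the rounding rule is an assumption chosen to match the output shown in this document, not a statement about how `redis-cli` is implemented, and the function name `slot_ranges` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print an even split of the 16384 hash slots across
# <n> masters, in the same format as the "Master[i] -> Slots a - b" lines.
slot_ranges() {
    awk -v n="$1" 'BEGIN {
        total = 16384
        per = total / n            # fractional slots per master
        first = 0; cursor = 0
        for (i = 0; i < n; i++) {
            # Round the running boundary to the nearest whole slot.
            last = int(cursor + per - 1 + 0.5)
            # The final master always ends at the last slot.
            if (last > total - 1 || i == n - 1) last = total - 1
            printf "Master[%d] -> Slots %d - %d\n", i, first, last
            first = last + 1
            cursor += per
        }
    }'
}
```

Running `slot_ranges 3` prints the same three ranges as the `redis-cli --cluster create` output above; other master counts give a comparable even split under the same assumed rounding.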

4. Check the master-slave status

node1(10.0.0.7)

[root@node1 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.47,port=6379,state=online,offset=1008,lag=1
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node2(10.0.0.17)

[root@node2 etc]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.57,port=6379,state=online,offset=1008,lag=0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node3(10.0.0.27)

[root@node3 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.37,port=6379,state=online,offset=1008,lag=0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node4(10.0.0.37)

[root@node4 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.27
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:826e716b92aa4e287013a33f9786e529be2fff71
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node5(10.0.0.47)

[root@node5 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:4
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3493f56b2f698cea41c90cb0a41e1562b5821636
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

node6(10.0.0.57)

[root@node6 ~]# redis-cli -a 123456 -c info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.17
master_port:6379
master_link_status:up
master_last_io_seconds_ago:10
master_sync_in_progress:0
slave_repl_offset:1008
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:269568d06cb92748f583d6ea900e7563b1739f54
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1008
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1008

View the slave nodes of a given master

# List all nodes
[root@node1 ~]# redis-cli -a 123456 cluster nodes 2>/dev/null
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554345797 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657554345000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657554343746 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657554344770 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657554344000 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657554344000 5 connected
# List the slaves of a master by its node ID; 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 is the ID of the master at 10.0.0.27
[root@node1 ~]# redis-cli -a 123456 cluster slaves 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 2>/dev/null
1) "59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657554778157 4 connected"
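
The master/slave relationships above can also be derived in a single pass over the `cluster nodes` output. The following is only a sketch, not part of the original setup: `cluster_pairs` is a hypothetical helper name, and it assumes the space-separated format shown above (node ID, address, flags, master ID).

```shell
# Print "master_addr <- slave_addr" pairs from `cluster nodes` output on stdin.
# cluster_pairs is a hypothetical helper, not a redis-cli feature.
cluster_pairs() {
  awk '
    { sub(/@.*/, "", $2); addr[$1] = $2 }   # node ID -> ip:port (cluster bus port stripped)
    $3 ~ /slave/ { master_of[$1] = $4 }     # slave node ID -> master node ID
    END {
      for (id in master_of)
        print addr[master_of[id]] " <- " addr[id]
    }
  ' | LC_ALL=C sort
}
```

Usage would look like `redis-cli -a 123456 cluster nodes 2>/dev/null | cluster_pairs`.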

5. Verify cluster status

[root@node1 ~]# redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6 # 6 nodes in total
cluster_size:3 # 3 masters serving slots (3 master/slave groups)
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3639
cluster_stats_messages_pong_sent:3625
cluster_stats_messages_sent:7264
cluster_stats_messages_ping_received:3620
cluster_stats_messages_pong_received:3639
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:7264
# Check the cluster status through any node
[root@node1 ~]# redis-cli -a 123456 --cluster info 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
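
The checks above are read by eye; for scripting, the same `cluster info` fields can be tested mechanically. A minimal sketch, assuming the `key:value` line format shown above (`cluster_healthy` is a hypothetical helper name):

```shell
# Succeed only if `cluster info` (read from stdin) reports a fully healthy cluster.
# cluster_healthy is a hypothetical helper, not a redis-cli feature.
cluster_healthy() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    $1 == "cluster_slots_fail"     { failed = $2 }
    # String comparisons, so a missing field counts as unhealthy.
    END { exit !(state == "ok" && assigned == "16384" && failed == "0") }
  '
}
```

For example: `redis-cli -a 123456 cluster info 2>/dev/null | cluster_healthy && echo healthy`. The `tr` call strips the carriage returns that redis-cli emits at the end of each line.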

View the node relationships in the cluster

# List all nodes in the cluster
[root@node1 ~]# redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379@16379 slave 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 0 1657556036000 4 connected
4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379@16379 myself,master - 0 1657556036000 1 connected 0-5460
12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379@16379 master - 0 1657556036033 2 connected 5461-10922
8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379@16379 slave 12fdc235442ed40a838e77b246025799b4b3357b 0 1657556038079 6 connected
16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379@16379 master - 0 1657556037057 3 connected 10923-16383
15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379@16379 slave 4ccee0bb38763061cf567995bcdd9289cea9cfec 0 1657556036000 5 connected
[root@node1 ~]# redis-cli -a 123456 --cluster check 10.0.0.27:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.27:6379 (16bb6630...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.17:6379 (12fdc235...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.7:6379 (4ccee0bb...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.27:6379)
M: 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7 10.0.0.27:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 59eac16e6e2992cdfffe97934d7409afe21d2a9a 10.0.0.37:6379
slots: (0 slots) slave
replicates 16bb6630a6a09bd4b24d7a203ecaa38b9a4360a7
S: 8c3b8146ce75ab277958937d4e79e893a15c50e2 10.0.0.57:6379
slots: (0 slots) slave
replicates 12fdc235442ed40a838e77b246025799b4b3357b
M: 12fdc235442ed40a838e77b246025799b4b3357b 10.0.0.17:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
M: 4ccee0bb38763061cf567995bcdd9289cea9cfec 10.0.0.7:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: 15e2e2eccefd453f1a154fc42c6a9b030acacfb2 10.0.0.47:6379
slots: (0 slots) slave
replicates 4ccee0bb38763061cf567995bcdd9289cea9cfec
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Verify cluster writes

# Without -c, a write fails with MOVED when the key's slot is served by another node
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7
10.0.0.7:6379> set key1 v1
(error) MOVED 9189 10.0.0.17:6379
# Connect to the node named in the MOVED reply to write the key
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.17
10.0.0.17:6379> set key1 values1
OK
10.0.0.17:6379> get key1
"values1"
# With -c (cluster mode) redis-cli follows redirects, so any node in the cluster can be used
[root@shichu ~]# redis-cli -a 123456 -h 10.0.0.7 -c
10.0.0.7:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.17:6379
OK
10.0.0.17:6379> get key1
"v1"

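The slot number 9189 in the MOVED reply comes from Redis Cluster's key hashing: CRC16 of the key, modulo 16384. The sketch below reproduces that calculation in plain shell. The function names are hypothetical, and hash tags are handled in the simple form described in the Redis Cluster specification (only the text between the first `{` and the next `}` is hashed, if it is non-empty).

```shell
# CRC16-XMODEM (polynomial 0x1021, initial value 0), computed bit by bit.
crc16_xmodem() {
  local crc=0 byte i
  # od prints the key as unsigned decimal bytes
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$(( i + 1 ))
    done
  done
  echo "$crc"
}

# Hash slot of a key: CRC16 of the key (or of its hash tag) modulo 16384.
key_slot() {
  local key="$1" rest tag
  case $key in
    *"{"*"}"*)
      rest=${key#*\{}      # text after the first "{"
      tag=${rest%%\}*}     # up to the first "}" that follows
      [ -n "$tag" ] && key=$tag
      ;;
  esac
  echo $(( $(crc16_xmodem "$key") % 16384 ))
}
```

`key_slot key1` prints 9189, matching the redirect above, and that slot falls in the 5461-10922 range served by 10.0.0.17.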
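VI. Deploying Zabbix monitoring

Official download page: https://www.zabbix.com/cn/download

Official documentation: https://www.zabbix.com/manuals

Source tarball used here: https://cdn.zabbix.com/zabbix/sources/stable/5.0/zabbix-5.0.25.tar.gz

Compiling and installing Zabbix 5 on an LNMP stack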

L: Linux (CentOS 7) https://mirrors.aliyun.com/centos/7/isos/x86_64/
N: Nginx (1.18.0) https://nginx.org/en/download.html
M: MySQL (8.0.19) https://dev.mysql.com/downloads/mysql/
P: PHP (7.4.11) http://php.net/downloads.php
Zabbix (5.0.25) https://cdn.zabbix.com/zabbix/sources/

[Figure: deployment topology. Client; 10.0.0.100 runs Linux, Nginx, PHP and Zabbix; 10.0.0.200 runs Linux and MySQL.]

1. Install MySQL

Reference:

After installation, create the zabbix database and user

mysql -uroot -p123456 -e "create database zabbix character set utf8 collate utf8_bin;"
mysql -uroot -p123456 -e "create user zabbix@'10.0.0.%' identified by '123456'"
mysql -uroot -p123456 -e "grant all privileges on zabbix.* to zabbix@'10.0.0.%'"
mysql -uroot -p123456 -e "use mysql;\
alter user zabbix@'10.0.0.%' identified with mysql_native_password by '123456';\
flush privileges;"

2. Install Nginx

Reference: https://www.cnblogs.com/areke/p/16482954.html

3. Install PHP

Reference: https://www.cnblogs.com/areke/p/16482958.html

4. Install Zabbix

Install zabbix_server

#!/bin/bash
# Compile and install Zabbix from source
source /etc/init.d/functions
# Zabbix version
Zabbix_Version=zabbix-5.0.25
Suffix=tar.gz
Zabbix=${Zabbix_Version}.${Suffix}
Password=123456
# Download URL of the Zabbix source tarball
Zabbix_url=https://cdn.zabbix.com/zabbix/sources/stable/5.0/${Zabbix}
# Zabbix installation prefix
Zabbix_install_DIR=/apps/zabbix
# Number of CPUs
CPUS=$(lscpu | grep "^CPU(s)" | awk '{print $2}')
# OS name
os_type=$(grep "^NAME" /etc/os-release | awk -F'"| ' '{print $2}')
# OS version
os_version=$(awk -F'"' '/^VERSION_ID/{print $2}' /etc/os-release)

# Print a message followed by a green [ OK ] ($2 = 0) or a red [ FAILED ] ($2 != 0)
color () {
    if [[ $2 -eq 0 ]]; then
        echo -e "\e[1;32m$1\t\t\t\t\t\t[ OK ]\e[0;m"
    else
        echo $2
        echo -e "\e[1;31m$1\t\t\t\t\t\t[ FAILED ]\e[0;m"
    fi
}

install_Zabbix () {
    #---------------------------- Download the source tarball ----------------------------
    cd /opt || exit 1
    if [ -e "${Zabbix}" ]; then
        color "Zabbix source tarball already present" 0
    else
        color "Downloading the Zabbix source tarball" 0
        wget "${Zabbix_url}"
        if [ $? -ne 0 ]; then
            color "Failed to download the Zabbix source tarball, exiting" 1
            exit 1
        fi
    fi
    #---------------------------- Unpack the source tarball ----------------------------
    color "Unpacking the source tarball" 0
    tar -zxvf "/opt/${Zabbix}" -C /usr/local/src
    # -sfn replaces an existing link, so the script can be re-run
    ln -sfn "/usr/local/src/${Zabbix_Version}" /usr/local/src/zabbix
    #---------------------------- Install build dependencies ----------------------------
    color "Installing dependencies" 0
    #wget https://dev.mysql.com/get/mysql80-community-release-el7-6.noarch.rpm
    yum install -y gcc libxml2-devel net-snmp net-snmp-devel curl curl-devel php-gd php-bcmath php-xml \
        php-mbstring mariadb mariadb-devel OpenIPMI-devel libevent-devel java-1.8.0-openjdk-devel \
        || { color "Failed to install dependencies, check the network" 1; exit 1; }
    #---------------------------- Create the zabbix user ----------------------------
    if id zabbix &> /dev/null; then
        # An existing user is not an error, so report it as OK
        color "zabbix user already exists" 0
    else
        groupadd --system zabbix
        useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix
        color "zabbix user created" 0
    fi
    #---------------------------- Build ----------------------------
    color "Building Zabbix" 0
    cd /usr/local/src/zabbix || exit 1
    ./configure --prefix=${Zabbix_install_DIR} \
        --enable-server \
        --enable-agent \
        --with-mysql \
        --with-net-snmp \
        --with-libcurl \
        --with-libxml2 \
        --with-openipmi \
        --enable-proxy \
        --enable-java
    make -j ${CPUS} install
    if [ $? -ne 0 ]; then
        color "Zabbix build/install failed" 1
        exit 1
    else
        color "Zabbix built and installed" 0
    fi
    # Copy the web front-end files
    mkdir -pv /home/nginx/zabbix
    cp -rf /usr/local/src/zabbix/ui/* /home/nginx/zabbix/
    chown nginx:nginx -R /home/nginx/zabbix
    # Smoke test: start the server once, then stop it again
    ${Zabbix_install_DIR}/sbin/zabbix_server -c ${Zabbix_install_DIR}/etc/zabbix_server.conf
    if [ $? -eq 0 ]; then
        color "zabbix_server starts normally" 0
        pkill zabbix
    fi
    color "Zabbix installation finished" 0
}

install_Zabbix
exit 0

Edit the configuration files

  1. Edit /apps/nginx/conf/nginx.conf

    worker_processes 1;
    pid logs/nginx.pid;
    events {
        worker_connections 1024;
    }
    http {
        include mime.types;
        default_type application/octet-stream;
        sendfile on;
        keepalive_timeout 65;
        server {
            listen 80;
            server_name 10.0.0.100; # server name
            server_tokens off; # hide the nginx version
            location / {
                root /home/nginx/zabbix; # document root
                index index.php index.html index.htm; # default index pages
            }
            error_page 500 502 503 504 /50x.html;
            location = /50x.html {
                root html;
            }
            location ~ \.php$ { # pass PHP requests to php-fpm
                root /home/nginx/zabbix;
                fastcgi_pass 127.0.0.1:9000;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
                include fastcgi_params;
                fastcgi_hide_header X-Powered-By; # hide the PHP version
            }
            location ~ ^/(ping|pm_status)$ { # php-fpm status pages
                include fastcgi_params;
                fastcgi_pass 127.0.0.1:9000;
                fastcgi_param PATH_TRANSLATED $document_root$fastcgi_script_name;
            }
        }
    }
  2. Edit the PHP configuration

    # Edit /etc/php.ini
    sed -i -e "/memory_limit/c memory_limit = 256M" \
    -e "/post_max_size/c post_max_size = 30M" \
    -e "/upload_max_filesize/c upload_max_filesize = 20M" \
    -e "/max_execution_time/c max_execution_time = 300" \
    -e "/max_input_time/c max_input_time = 300" \
    -e "/;date.timezone/c date.timezone = Asia/Shanghai" \
    /etc/php.ini
    # Edit /apps/php/etc/php-fpm.d/www.conf
    sed -i -e "/user = www/c user = nginx" \
    -e "/group = www/c group = nginx" /apps/php/etc/php-fpm.d/www.conf

    Restart the services

    systemctl restart nginx php-fpm
  3. Import the MySQL data

    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/schema.sql
    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/images.sql
    mysql -uzabbix -p123456 -h10.0.0.200 zabbix < /usr/local/src/zabbix/database/mysql/data.sql
  4. Edit the Zabbix server configuration

    sed -i "/# DBHost=localhost/aDBHost=10.0.0.200" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/# DBPassword=/aDBPassword=123456" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/# DBPort=/aDBPort=3306" /apps/zabbix/etc/zabbix_server.conf
    sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /apps/zabbix/etc/zabbix_server.conf
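
The `sed ... a` commands above append a new line every time they run, so running them twice leaves duplicate settings. A re-runnable alternative could look like the sketch below. `set_conf_key` is a hypothetical helper, and it assumes GNU sed and that neither the key nor the value contains `|`, `&` or backslashes.

```shell
# Set KEY=VALUE in a config file: replace the active line if one exists, otherwise append it.
# set_conf_key is a hypothetical helper for illustration.
set_conf_key() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For instance, `set_conf_key /apps/zabbix/etc/zabbix_server.conf DBHost 10.0.0.200` would leave exactly one active `DBHost=` line no matter how often it is run. Commented-out defaults such as `# DBHost=localhost` are left untouched.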
  5. Create the zabbix_server systemd unit

    cat /lib/systemd/system/zabbix-server.service

    [Unit]
    Description=Zabbix Server
    After=syslog.target
    After=network.target
    [Service]
    Environment="CONFFILE=/apps/zabbix/etc/zabbix_server.conf"
    EnvironmentFile=-/etc/default/zabbix-server
    Type=forking
    Restart=on-failure
    PIDFile=/tmp/zabbix_server.pid
    KillMode=control-group
    ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE
    ExecStop=/bin/kill -SIGTERM $MAINPID
    RestartSec=10s
    TimeoutStopSec=5
    [Install]
    WantedBy=multi-user.target

    Start the service

    systemctl daemon-reload
    systemctl enable --now zabbix-server
  6. Create the zabbix_agent systemd unit

    cat /lib/systemd/system/zabbix-agent.service

    [Unit]
    Description=Zabbix Agent
    After=syslog.target
    After=network.target
    [Service]
    Environment="CONFFILE=/apps/zabbix/etc/zabbix_agentd.conf"
    EnvironmentFile=-/etc/default/zabbix-agent
    Type=forking
    Restart=on-failure
    PIDFile=/tmp/zabbix_agentd.pid
    KillMode=control-group
    ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE
    ExecStop=/bin/kill -SIGTERM $MAINPID
    RestartSec=10s
    User=zabbix
    Group=zabbix
    [Install]
    WantedBy=multi-user.target

    Start the service

    systemctl daemon-reload
    systemctl enable --now zabbix-agent
  7. Check status

    • Ports 10050 and 10051 are listening
    # 10050 (agent) and 10051 (server) should appear in the output
    [root@shichu apps]# ss -ntl
    State Recv-Q Send-Q Local Address:Port Peer Address:Port
    LISTEN 0 128 *:22 *:*
    LISTEN 0 100 127.0.0.1:25 *:*
    LISTEN 0 128 *:10050 *:*
    LISTEN 0 128 *:10051 *:*
    LISTEN 0 128 127.0.0.1:9000 *:*
    LISTEN 0 128 *:111 *:*
    LISTEN 0 128 *:80 *:*
    LISTEN 0 128 [::]:22 [::]:*
    LISTEN 0 100 [::1]:25 [::]:*
    LISTEN 0 128 [::]:111 [::]:*
    • zabbix-server service status
    [root@shichu apps]# systemctl status zabbix-server
    ● zabbix-server.service - Zabbix Server
    Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; disabled; vendor preset: disabled)
    Active: active (running) since Thu 2022-07-14 00:47:09 CST; 52s ago
    Process: 8346 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
    Process: 8352 ExecStart=/apps/zabbix/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
    Main PID: 8360 (zabbix_server)
    CGroup: /system.slice/zabbix-server.service
    ├─8360 /apps/zabbix/sbin/zabbix_server -c /apps/zabbix/etc/zabbix_server.conf
    ├─8362 /apps/zabbix/sbin/zabbix_server: configuration syncer [synced configuration in 0.059399 sec, idle 6...
    ├─8363 /apps/zabbix/sbin/zabbix_server: alert manager #1 [sent 0, failed 0 alerts, idle 5.027609 sec durin...
    ├─8364 /apps/zabbix/sbin/zabbix_server: alerter #1 started
    ├─8365 /apps/zabbix/sbin/zabbix_server: alerter #2 started
    ├─8366 /apps/zabbix/sbin/zabbix_server: alerter #3 started
    ├─8367 /apps/zabbix/sbin/zabbix_server: preprocessing manager #1 [queued 0, processed 11 values, idle 5.00...
    ├─8368 /apps/zabbix/sbin/zabbix_server: preprocessing worker #1 started
    ├─8369 /apps/zabbix/sbin/zabbix_server: preprocessing worker #2 started
    ├─8370 /apps/zabbix/sbin/zabbix_server: preprocessing worker #3 started
    ├─8371 /apps/zabbix/sbin/zabbix_server: lld manager #1 [processed 0 LLD rules, idle 5.008702sec during 5.0...
    ├─8372 /apps/zabbix/sbin/zabbix_server: lld worker #1 started
    ├─8373 /apps/zabbix/sbin/zabbix_server: lld worker #2 started
    ├─8374 /apps/zabbix/sbin/zabbix_server: housekeeper [startup idle for 30 minutes]
    ├─8375 /apps/zabbix/sbin/zabbix_server: timer #1 [updated 0 hosts, suppressed 0 events in 0.001868 sec, id...
    ├─8376 /apps/zabbix/sbin/zabbix_server: http poller #1 [got 0 values in 0.001502 sec, idle 5 sec]
    ├─8377 /apps/zabbix/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.004759 sec, idle 60 sec]
    ├─8378 /apps/zabbix/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000050 sec,...
    ├─8379 /apps/zabbix/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000175 sec,...
    ├─8380 /apps/zabbix/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000029 sec,...
    ├─8381 /apps/zabbix/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000019 sec,...
    ├─8382 /apps/zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.004440 sec, idle 3 sec]...
    ├─8383 /apps/zabbix/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000028 sec, id...
    ├─8384 /apps/zabbix/sbin/zabbix_server: self-monitoring [processed data in 0.000016 sec, idle 1 sec]
    ├─8385 /apps/zabbix/sbin/zabbix_server: task manager [processed 0 task(s) in 0.000836 sec, idle 5 sec]
    ├─8386 /apps/zabbix/sbin/zabbix_server: poller #1 [got 0 values in 0.000050 sec, idle 1 sec]
    ├─8387 /apps/zabbix/sbin/zabbix_server: poller #2 [got 0 values in 0.000048 sec, idle 1 sec]
    ├─8388 /apps/zabbix/sbin/zabbix_server: poller #3 [got 1 values in 0.001602 sec, idle 1 sec]
    ├─8389 /apps/zabbix/sbin/zabbix_server: poller #4 [got 0 values in 0.000019 sec, idle 1 sec]
    ├─8390 /apps/zabbix/sbin/zabbix_server: poller #5 [got 0 values in 0.001402 sec, idle 1 sec]
    ├─8391 /apps/zabbix/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000039 sec, idle 5 sec]
    ├─8392 /apps/zabbix/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection...
    ├─8393 /apps/zabbix/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection...
    ├─8394 /apps/zabbix/sbin/zabbix_server: trapper #3 [processed data in 0.000000 sec, waiting for connection...
    ├─8395 /apps/zabbix/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection...
    ├─8396 /apps/zabbix/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection...
    ├─8397 /apps/zabbix/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000020 sec, idle 5 sec]
    └─8398 /apps/zabbix/sbin/zabbix_server: alert syncer [queued 0 alerts(s), flushed 0 result(s) in 0.001557 ...
    Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Server...
    Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Server.
    • zabbix-agent service status

      [root@shichu apps]# systemctl status zabbix-agent
      ● zabbix-agent.service - Zabbix Agent
      Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
      Active: active (running) since Thu 2022-07-14 00:47:09 CST; 58s ago
      Process: 8349 ExecStart=/apps/zabbix/sbin/zabbix_agentd -c $CONFFILE (code=exited, status=0/SUCCESS)
      Main PID: 8353 (zabbix_agentd)
      CGroup: /system.slice/zabbix-agent.service
      ├─8353 /apps/zabbix/sbin/zabbix_agentd -c /apps/zabbix/etc/zabbix_agentd.conf
      ├─8354 /apps/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
      ├─8355 /apps/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
      ├─8356 /apps/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
      ├─8357 /apps/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
      └─8358 /apps/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
      Jul 14 00:47:08 shichu systemd[1]: Starting Zabbix Agent...
      Jul 14 00:47:09 shichu systemd[1]: Started Zabbix Agent.
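
The port check in step 7 can be automated instead of read by eye. A small sketch, assuming the `ss -ntl` column layout shown above; `ports_listening` is a hypothetical helper name:

```shell
# Succeed only if every port given as an argument appears as a LISTEN socket
# in `ss -ntl` output read from stdin.
ports_listening() {
  local listing port
  listing=$(cat)
  for port in "$@"; do
    printf '%s\n' "$listing" \
      | grep -Eq "^[[:space:]]*LISTEN.*:${port}[[:space:]]" || return 1
  done
}
```

Typical use: `ss -ntl | ports_listening 10050 10051 && echo "agent and server ports are up"`.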


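5. Configure the web interface

Initial setup

Open the server's IP (10.0.0.100) in a browser

  • Local environment check

  • Configure the database connection

  • Configure the Zabbix server details

  • Confirm the settings

  • Create the configuration file

The configuration file has to be downloaded manually and uploaded to /home/nginx/zabbix/conf/ on the Zabbix server

  • Finish the installation

  • Log in
Default username: Admin # note the capital A
Password: zabbix

  • Home page

Tuning

Switch the menus to Chinese

Chinese is displayed

Fix garbled characters in monitoring items

  • Monitoring items show garbled characters

  • Pick a font from Windows, for example KaiTi (simkai.ttf)

  • Upload the font to the Zabbix web directory

Path: /home/nginx/zabbix/assets/fonts

  • Point Zabbix at the new font
vim /home/nginx/zabbix/include/defines.inc.php
# Change the following two definitions
//define('ZBX_GRAPH_FONT_NAME', 'DejaVuSans'); // font file name
define('ZBX_GRAPH_FONT_NAME', 'simkai'); // font file name
//define('ZBX_FONT_NAME', 'DejaVuSans');
define('ZBX_FONT_NAME', 'simkai');
  • Verify that the font takes effect

The font takes effect immediately; neither Zabbix nor nginx needs to be restarted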
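
The two font definitions can also be switched without opening an editor. The sketch below assumes GNU sed and a font name without `/`, `&` or quotes; `set_zabbix_font` is a hypothetical helper, and unlike the manual edit above it overwrites the old value instead of keeping it as a comment.

```shell
# Rewrite ZBX_GRAPH_FONT_NAME and ZBX_FONT_NAME in defines.inc.php to the given font name.
# set_zabbix_font is a hypothetical helper for illustration.
set_zabbix_font() {
  local file="$1" font="$2"
  sed -i -E "s/^(define\('ZBX_(GRAPH_)?FONT_NAME',[[:space:]]*')[^']*'/\1${font}'/" "$file"
}
```

For example: `set_zabbix_font /home/nginx/zabbix/include/defines.inc.php simkai`, after simkai.ttf has been uploaded to the fonts directory.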

VII. Monitoring Nginx and MySQL

[Figure: monitoring topology. Zabbix Server 10.0.0.100; Nginx 10.0.0.7; MySQL master 10.0.0.17; MySQL slave 10.0.0.27.]

1. Install the Zabbix agent

  • Install the agent with yum: yum install zabbix50-agent

  • Edit the agent configuration file

    [root@nginx ~]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf
    PidFile=/run/zabbix/zabbix_agentd.pid
    LogFile=/var/log/zabbix/zabbix_agentd.log
    LogFileSize=0
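    Server=10.0.0.100 # IP of the Zabbix server or proxy
    ListenPort=10050 # listening port (default value)
    StartAgents=3 # number of listener processes started for passive checks; 0 disables passive checks and no port is opened
    ServerActive=10.0.0.100 # IP of the Zabbix server or proxy used for active checks
    Hostname=10.0.0.7 # case-sensitive and must be unique on the Zabbix server; the local IP is used here
    Include=/etc/zabbix_agentd.conf.d/*.conf # added at the end of the file: path of the drop-in configuration files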

    Start the service

    mkdir -p /etc/zabbix_agentd.conf.d

    systemctl start zabbix-agent

    Check status

    [root@nginx ~]# systemctl status zabbix-agent
    ● zabbix-agent.service - Zabbix Monitor Agent
    Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
    Active: active (running) since Thu 2022-07-14 16:07:35 CST; 1s ago
    Main PID: 1511 (zabbix_agentd)
    CGroup: /system.slice/zabbix-agent.service
    ├─1511 /usr/sbin/zabbix_agentd -f
    ├─1512 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
    ├─1513 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
    ├─1514 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
    └─1515 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
    Jul 14 16:07:35 nginx systemd[1]: Stopped Zabbix Monitor Agent.
    Jul 14 16:07:35 nginx systemd[1]: Started Zabbix Monitor Agent.
    Jul 14 16:07:35 nginx zabbix_agentd[1511]: Starting Zabbix Agent [10.0.0.7]. Zabbix 5.0.21 (revision 47104dd574).
    Jul 14 16:07:35 nginx zabbix_agentd[1511]: Press Ctrl+C to exit.
  • Add the monitored host in the web interface

    Configuration → Hosts → Create host

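2. Monitor Nginx

  1. Prepare the nginx status page
# Add the status page to the nginx configuration
[root@nginx ~]# cat /etc/nginx/nginx.conf
# Add the status location inside the server block
...
location /nginx_status {
stub_status;
allow 10.0.0.0/24;
allow 127.0.0.1;
deny all; # without this line the allow rules above do not restrict access
}
  2. Prepare the nginx monitoring script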
[root@nginx etc]# cat /etc/zabbix_agentd.d/nginx_status.sh
#!/bin/bash
nginx_status_fun(){
    NGINX_PORT=$1     # port: first argument of the function, i.e. the script's second argument
    NGINX_COMMAND=$2  # metric: second argument of the function, i.e. the script's third argument
    # Active connections; the functions below follow the same pattern.
    # The status page is queried on 127.0.0.1 only.
    nginx_active(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | grep 'Active' | awk '{print $NF}'
    }
    # Connections in the Reading state
    nginx_reading(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | grep 'Reading' | awk '{print $2}'
    }
    nginx_writing(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | grep 'Writing' | awk '{print $4}'
    }
    nginx_waiting(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | grep 'Waiting' | awk '{print $6}'
    }
    # The third line of the status page holds the accepts/handled/requests counters
    nginx_accepts(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | awk 'NR==3{print $1}'
    }
    nginx_handled(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | awk 'NR==3{print $2}'
    }
    nginx_requests(){
        /usr/bin/curl "http://127.0.0.1:${NGINX_PORT}/nginx_status" 2>/dev/null | awk 'NR==3{print $3}'
    }
    case $NGINX_COMMAND in
        active)
            nginx_active
            ;;
        reading)
            nginx_reading
            ;;
        writing)
            nginx_writing
            ;;
        waiting)
            nginx_waiting
            ;;
        accepts)
            nginx_accepts
            ;;
        handled)
            nginx_handled
            ;;
        requests)
            nginx_requests
            ;;
    esac
}
main(){
    case $1 in
        nginx_status)
            # Pass the script's second and third arguments (port, metric) on to nginx_status_fun
            nginx_status_fun "$2" "$3"
            ;;
        status)
            # HTTP status code of the status page; -I fetches headers only, -s silences progress output
            curl -I -s http://127.0.0.1/nginx_status 2>/dev/null | awk 'NR==1{print $2}'
            ;;
        *)
            # Any other input prints the usage
            echo "Usage: $0 {nginx_status PORT METRIC | status}"
            ;;
    esac
}
main "$1" "$2" "$3"
  3. Add a custom agent item (UserParameter) through a drop-in configuration file

    • Create the drop-in file
    [root@nginx etc]# cat /etc/zabbix_agentd.conf.d/nginx_monitor.conf
    UserParameter=nginx_status[*],/etc/zabbix_agentd.d/nginx_status.sh "$1" "$2" "$3"
  4. Verify

# Restart the services
systemctl restart nginx zabbix-agent
# Fetch the full nginx status locally
[root@nginx zabbix_agentd.d]# curl 127.0.0.1/nginx_status
Active connections: 1
server accepts handled requests
21 21 21
Reading: 0 Writing: 1 Waiting: 0
# Get the active connection count locally
[root@nginx zabbix_agentd.d]# /etc/zabbix_agentd.d/nginx_status.sh nginx_status 80 active
1
# Get the active connection count from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.7 -p 10050 -k 'nginx_status[nginx_status,80,active]'
1
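
The monitoring script calls curl and a grep/awk pipeline once per metric. As an alternative sketch (not the author's script), a single awk pass over the status page can return any of the seven values. `stub_status_metric` is a hypothetical name, and the field positions assume the stub_status layout shown above.

```shell
# Read stub_status output on stdin and print one metric:
# active, accepts, handled, requests, reading, writing or waiting.
# Exits non-zero for an unknown metric.
stub_status_metric() {
  awk -v m="$1" '
    /^Active connections:/     { v["active"] = $3 }
    NF == 3 && $1 ~ /^[0-9]+$/ { v["accepts"] = $1; v["handled"] = $2; v["requests"] = $3 }
    /^Reading:/                { v["reading"] = $2; v["writing"] = $4; v["waiting"] = $6 }
    END { if (m in v) print v[m]; else exit 1 }
  '
}
```

It would be used as `curl -s http://127.0.0.1/nginx_status | stub_status_metric active`.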
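  5. Import the monitoring template

    Template reference: https://files.cnblogs.com/files/blogs/744193/nginx-template.xml

    Link the template to the host

    Review the items provided by the imported nginx template

  6. Verify monitoring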

3. Monitor MySQL

1) Set up MySQL master/slave replication

master(10.0.0.17)

# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=17
log-bin
# Restart the database
systemctl restart mariadb
# Create the replication user
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant the replication privilege to the user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql
# Copy the backup to the slave
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.27:/opt/
# Check the binary log files and their sizes
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| mariadb-bin.000001 | 29733 |
| mariadb-bin.000002 | 245 |
+--------------------+-----------+
2 rows in set (0.00 sec)

slave(10.0.0.27)

# Edit the configuration
vim /etc/my.cnf.d/server.cnf
[mysqld]
bind=0.0.0.0
server-id=27
read-only
# Restart the database
systemctl restart mariadb
# Import the backup taken on the master
[root@slave ~]# mysql < /opt/backup.sql
# Configure replication from the master's binary log coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (see show master logs)
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
MASTER_HOST='10.0.0.17',
MASTER_USER='repluser',
MASTER_PASSWORD='',
MASTER_PORT=3306,
MASTER_LOG_FILE='mariadb-bin.000001',
MASTER_LOG_POS=29733,
MASTER_CONNECT_RETRY=10;
# Start the slave threads
MariaDB [(none)]> start slave;
# Show the replication status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.17
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 245
Relay_Log_File: mariadb-relay-bin.000003
Relay_Log_Pos: 531
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Master_Server_Id: 17

2) Monitoring with the Percona plugins

Official download page: https://www.percona.com/downloads/

Package: https://www.percona.com/downloads/percona-monitoring-plugins/LATEST/

  1. Install the Percona plugin
# Download
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/redhat/7/x86_64/percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install
yum install -y percona-zabbix-templates-1.1.8-1.noarch.rpm
# Install PHP
yum install -y php php-mysql
# Copy the UserParameter file shipped with the plugin into the agent include directory
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix_agentd.conf.d/
# Create the PHP config file holding the MySQL credentials
vim /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf
<?php
$mysql_user = 'root';
$mysql_pass = '';
# Restart the agent
systemctl restart zabbix-agent
  2. Test from the Zabbix server
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.17 -p 10050 -k MySQL.Key-reads
19
[root@zabbix ~]# /apps/zabbix/bin/zabbix_get -s 10.0.0.27 -p 10050 -k MySQL.Key-reads
0
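  3. Link the host template

    Note: the default template /var/lib/zabbix/percona/templates/zabbix_agent_template_percona_mysql_server_ht_2.0.9-sver1.1.8.xml cannot be used as-is and has to be modified first.

    Reference: https://files.cnblogs.com/files/blogs/744193/PerconaMySQLServer.xml

  4. Check the monitoring status

  5. Change the monitoring type to active

  6. Verify monitoring

4. Issues

1. In active mode the data is collected normally, but the ZBX icon stays grey instead of turning green

Fix: in the template Template OS Linux by Zabbix agent active, unlink and clear the linked template Template Module Zabbix agent active, then link the Template Module Zabbix agent template instead.

The ZBX icon turns green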

VIII. Email notifications for problems and recovery in Zabbix

1. Automatic remediation (self-healing)

1) Enable remote commands on the agent

[root@nginx tmp]# grep '^[a-zA-Z]' /etc/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1 # allow the server to run remote commands on this agent
Server=10.0.0.100
ListenPort=10050
StartAgents=3
ServerActive=10.0.0.100
Hostname=10.0.0.7
User=zabbix
UnsafeUserParameters=1 # allow special characters in the arguments passed to user-defined parameters
Include=/etc/zabbix_agentd.conf.d/*.conf

2) Grant sudo rights to the zabbix user on the agent

[root@nginx ~]# vim /etc/sudoers
......
root ALL=(ALL) ALL
zabbix ALL=NOPASSWD:ALL # let the zabbix user run privileged commands through sudo without a password

Restart the service

systemctl restart zabbix-agent

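3) Create an action

  • Set the action name and conditions

  • Add the operation (the command to run)

    Commands executed remotely must use absolute paths

2. Email notifications

1) Enable SMTP on the mailbox

Log in to the personal mailbox and enable SMTP

Send the SMS to obtain an authorization code

2) Create a media type

Mailbox settings reference: https://service.mail.qq.com/cgi-bin/help?subtype=1&&id=28&&no=371

Use the authorization code obtained above as the password

3) Add the media to a user

Select the Admin user

Open the media settings and click Add

For the type, choose the media type created above; for the recipient, enter the address that should receive the messages

Update the user's media

4) Create an action

  • Add a send-email operation to the self-healing action

  • Add operations for when a problem occurs and after it recovers

Content of the problem notification email

Content of the recovery notification email

3. Verify the problem and recovery email notifications

1) Stop the nginx service

Check port 80

nginx recovers automatically

2) Zabbix runs the recovery command and sends the notification emails automatically

3) Log in to the personal mailbox and check the alert emails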

posted @ areke