Redis and Zabbix

I. Configuring the two Redis persistence methods

Redis offers two persistence methods: RDB (Redis DataBase) and AOF (Append Only File). For a comparison of their advantages and drawbacks, see Chapter 2 of https://www.cnblogs.com/areke/p/16482870.html.

1. RDB

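An RDB snapshot can be produced in three ways:

  • save: synchronous; blocks all other commands until the snapshot is written
  • bgsave: asynchronous; runs in the background and does not block other commands
  • automatic: Redis saves in the background whenever a configured save rule is met

1) Manual backup with save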

redis-cli -h 127.0.0.1 -a $PASSWORD --no-auth-warning save
[root@redis ~]#redis-cli -h 10.0.0.53 --no-auth-warning save
OK

Check the backup

[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 993 Sep 4 20:26 dump.rdb

2) Manual backup with bgsave

redis-cli -h 127.0.0.1 -a $PASSWORD --no-auth-warning bgsave
[root@redis ~]#redis-cli -h 10.0.0.53 --no-auth-warning bgsave
Background saving started

Check the backup

[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 993 Sep 4 20:30 dump.rdb

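3) Automatic backup

Configure redis.conf (comments sit on their own lines because redis.conf does not accept trailing comments)

# Save an RDB snapshot when at least 1 key changed within 900 seconds; several rules may be listed
save 900 1
save 300 10
save 60 3
dbfilename dump.rdb
dir /var/lib/redis
# Refuse writes when a snapshot fails (for example on a full disk); applies to automatic saves only
stop-writes-on-bgsave-error yes
# Compress the RDB file
rdbcompression yes
# Add a CRC64 checksum to the RDB file
rdbchecksum yes

Verify

# Change 3 keys within 60 seconds and check that a new RDB file is written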
10.0.0.53:6379> mset key1 v1 key2 v2 key3 v3
[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 1059 Sep 4 20:45 dump.rdb
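
The save rules above can be read as "snapshot once at least N keys have changed and more than S seconds have passed since the last save". The following is a minimal shell sketch of that check, not Redis code; the function name `rdb_rule_matches` and its argument layout are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Redis): succeed when any
# "save <seconds> <changes>" rule is met.
# Usage: rdb_rule_matches <elapsed_seconds> <changed_keys> <seconds> <changes> [...]
rdb_rule_matches() (
  elapsed=$1
  dirty=$2
  shift 2
  while [ "$#" -ge 2 ]; do
    # A rule fires when enough keys changed AND more than <seconds> have passed.
    if [ "$dirty" -ge "$2" ] && [ "$elapsed" -gt "$1" ]; then
      exit 0
    fi
    shift 2
  done
  exit 1
)

# Rules from the configuration above: save 900 1, save 300 10, save 60 3.
# Here 3 keys changed and 61 seconds have passed since the last save.
if rdb_rule_matches 61 3 900 1 300 10 60 3; then
  echo "snapshot due"
else
  echo "no snapshot yet"
fi
```

With these inputs only the `save 60 3` rule is satisfied, which matches the test in this section: the new dump.rdb appears about a minute after the three keys are written.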

2. AOF

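Configure redis.conf (comments sit on their own lines because redis.conf does not accept trailing comments)

# RDB settings
# Save an RDB snapshot when at least 1 key changed within 900 seconds; several rules may be listed
save 900 1
save 300 10
save 60 3
# RDB file name
dbfilename dump.rdb
# Directory for the data files
dir /var/lib/redis
# Refuse writes when a snapshot fails (for example on a full disk); applies to automatic saves only
stop-writes-on-bgsave-error yes
# Compress the RDB file
rdbcompression yes
# Add a CRC64 checksum to the RDB file
rdbchecksum yes
# AOF settings
# Enable AOF
appendonly yes
# AOF file name
appendfilename "appendonly-6379.aof"
# fsync once per second, so at most about 1 s of data is lost if Redis fails
appendfsync everysec
# During an AOF rewrite, suspend fsync for new appends to limit disk I/O and request blocking
no-appendfsync-on-rewrite yes
# Rewrite the AOF once it has grown by this percentage; a rewrite keeps the file small while preserving the full data set
auto-aof-rewrite-percentage 100
# Minimum AOF size before a rewrite can be triggered
auto-aof-rewrite-min-size 64mb
# Load an AOF whose tail is truncated (main process killed, power loss, etc.)
aof-load-truncated yes

Verify

# Add a record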
10.0.0.53:6379> CONFIG GET appendonly
1) "appendonly"
2) "yes"
10.0.0.53:6379> set kkk1 vvv1
OK
# Check the AOF file
[root@redis redis]#ls -l /var/lib/redis/
total 8
-rw-r----- 1 redis redis 56 Sep 4 22:05 appendonly-6379.aof
-rw-rw---- 1 redis redis 1086 Sep 4 22:04 dump.rdb

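Enabling AOF for the first time

Note: AOF is disabled by default. If it is enabled in the config file and Redis is restarted, Redis loads the AOF in preference to the RDB file; since no AOF file exists yet, the server starts empty and all data is lost. The procedure below avoids this.

# Stop Redis
systemctl stop redis
# Disable AOF in the config file
vim /etc/redis/redis.conf
...
appendonly no
# Restore the RDB backup
cp dump.rdb-bak dump.rdb
# Start Redis and confirm the data is back
systemctl start redis
# Enable AOF at runtime (not persisted to the config file). Redis writes the in-memory data set to a new AOF file, so the AOF and RDB files now hold the same data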
127.0.0.1:6379> CONFIG set appendonly yes
[root@redis redis]#ls -l
total 8
-rw-rw---- 1 redis redis 1086 Sep 4 22:19 appendonly-6379.aof
-rw-rw---- 1 redis redis 1086 Sep 4 22:15 dump.rdb
# Enable AOF in the config file
vim /etc/redis/redis.conf
...
appendonly yes
# Restart Redis
systemctl restart redis
# Check the data
[root@redis ~]#redis-cli -h 10.0.0.53 dbsize
(integer) 110
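
Before the final restart it seems prudent to confirm that the AOF file created by `CONFIG SET appendonly yes` is complete. The sketch below is one possible check, assuming the `aof_enabled` and `aof_rewrite_in_progress` fields of `INFO persistence`; the helper name `aof_ready` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `INFO persistence` output on stdin and succeed
# only when AOF is enabled and no AOF rewrite is still running.
aof_ready() {
  # INFO lines end in CRLF, so strip carriage returns before matching.
  tr -d '\r' | awk -F: '
    $1 == "aof_enabled"             { enabled = $2 }
    $1 == "aof_rewrite_in_progress" { busy = $2 }
    END { exit !(enabled == 1 && busy == 0) }
  '
}

# Possible use against the server from this section (password handling omitted):
#   redis-cli -h 10.0.0.53 CONFIG SET appendonly yes
#   until redis-cli -h 10.0.0.53 INFO persistence | aof_ready; do sleep 1; done
#   ...then set "appendonly yes" in redis.conf and restart
```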

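Main Redis configuration options (the trailing # notes are annotations for the reader; in an actual redis.conf a comment must be on its own line)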

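bind 0.0.0.0 # listen address; separate multiple IPs with spaces
protected-mode yes # added in Redis 3.2: when neither a bind address nor a password is set, Redis serves only local clients on 127.0.0.1:6379;
# remote clients can connect, but they receive an error message and are refused
port 6379 # listen port, default 6379/tcp
tcp-backlog 511 # length of the accept queue, i.e. connections that have completed the three-way handshake and wait to be accepted by the server
timeout 0 # idle timeout for client connections; the default 0 means never time out
tcp-keepalive 300 # TCP keepalive interval, 300 s
daemonize no # default no: redis-server runs in the foreground; set yes to run it as a daemon, in which case
# it writes its pid to /var/run/redis.pid
supervised no # OS-dependent: lets upstart or systemd supervise the Redis daemon; CentOS 7 and later use systemd
pidfile /var/run/redis_6379.pid # pid file path; could be changed to /apps/redis/run/redis_6379.pid
loglevel notice # log level
logfile "/path/redis.log" # log file path, e.g. logfile "/apps/redis/log/redis_6379.log"
databases 16 # number of databases; default 16, numbered 0-15
always-show-logo yes # whether to show the Redis logo at startup or record it in the log
save 900 1 # take a snapshot if at least 1 key changed within 900 seconds
save 300 10 # take a snapshot if at least 10 keys changed within 300 seconds
save 60 10000 # take a snapshot if at least 10000 keys changed within 60 seconds
stop-writes-on-bgsave-error yes # with the default yes, Redis refuses writes when a snapshot fails (full disk, etc.); no is suggested for production
# applies only to the automatic saves configured in this file
rdbcompression yes # whether to compress the RDB file: yes compresses, no does not
rdbchecksum yes # whether to add a CRC64 checksum to the RDB file; enabled by default
dbfilename dump.rdb # snapshot file name
dir ./ # directory for snapshot files, e.g. dir "/apps/redis/data"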
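# Replication
# replicaof <masterip> <masterport> # address and port of the master to replicate from; the directive was slaveof before 5.0
# masterauth <master-password> # password of the master
replica-serve-stale-data yes # behaviour of a replica that has lost its master link or is still syncing:
# 1. yes (default, recommended): the replica keeps answering client reads
# 2. no: every request except a few specific commands gets the error "SYNC with master in progress"
replica-read-only yes # keep replicas read-only (recommended); otherwise data written on a replica can be overwritten by the next sync from the master and lost
repl-diskless-sync no # whether to replicate over a socket (diskless sync). A new replica needs a full sync on its first connection:
# the master dumps a fresh RDB from memory and ships it to the replica, in one of two ways:
# 1. disk-backed (no, the default): the master forks a child that writes the RDB to disk; the parent (main) process then sends the file to the replicas
# 2. diskless (yes): the master forks a child that writes the RDB straight to the replica's socket, bypassing the main process and the disk
# Disk-backed is recommended because one RDB file can be sent to several replicas at once, whereas with diskless sync replicas that connect after a transfer has started are synced one at a time.
# Use diskless (yes) only when disk I/O is slow and the network is fast; otherwise prefer disk-backed (no)
repl-diskless-sync-delay 5 # seconds the master waits before starting a diskless transfer (0 disables the wait); replicas arriving within the delay share the same transfer, but once
# the transfer starts the master accepts no further replicas for it. Replicas arriving later
# queue for the next RDB transfer, which is why the master waits a while for more replicas to arrive. Suggested value: 30-60
repl-ping-replica-period 10 # interval at which the replica PINGs the master to monitor its state, default 10 s
repl-timeout 60 # replication timeout; must be larger than repl-ping-replica-period, or timeouts will be reported frequently
repl-disable-tcp-nodelay no # whether to disable TCP_NODELAY on the replica socket after SYNC. With yes, Redis merges data into fewer, larger
# packets, which delays delivery to replicas by up to 40 ms with the default Linux kernel settings; with no, data reaches the replicas
# with less delay but uses more bandwidth
repl-backlog-size 512mb # size of the replication backlog. While a replica is disconnected the backlog accumulates replication data, so that on reconnect a full resync is usually unnecessary
# and only the part missed during the disconnection is sent. The memory is allocated only once at least one replica has connected. When setting up replication, raise this value
# or do the setup off-peak; otherwise the sync to the replica can fail
repl-backlog-ttl 3600 # seconds without any connected replica after which the master frees the backlog
replica-priority 100 # when the master becomes unavailable, Sentinel picks a new master by replica priority: the lowest value wins, and a replica set to 0 is never
# promoted; replicas are usually all given the same value so that the choice is automatic
#min-replicas-to-write 3 # the master accepts writes only while at least 3 replicas are connected
#min-replicas-max-lag 10 # and those replicas must lag by no more than 10 seconds, otherwise the master also stops accepting writes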
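requirepass foobared # connection password; clients must then AUTH with it; quote it with " " if it contains special characters; recommended in production
rename-command # rename dangerous commands; e.g. rename-command FLUSHALL "" disables the command
# e.g. rename-command del magedu
maxclients 10000 # maximum number of client connections
maxmemory <bytes> # memory limit in bytes, 0 means unlimited; half of physical memory is suggested; 8 GB is written as 8 * 1024 * 1024 * 1024 bytes,
# note that buffers are not counted towards maxmemory; leaving this unset in production can lead to OOM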
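appendonly no # whether to enable the AOF log. By default Redis persists with RDB, which is enough for many applications, but a crash can lose
# several minutes of data (depending on the snapshot interval set by the save rules). Append Only File is the other persistence method and gives better
# durability: Redis appends every write it receives to appendonly.aof and, at each start, loads that file into memory
# and ignores the RDB file. Disabled by default
appendfilename "appendonly.aof" # name of the AOF file (plain text), stored in the directory given by dir
appendfsync everysec # AOF fsync policy
# no: leave flushing to the operating system; Linux flushes about every 30 seconds by default, so up to 30 s of data can be lost
# always: fsync after every write; safest, but slowest
# everysec: fsync once per second, so about 1 s of data can be lost; this is the default and the recommended value for production
# bgrewriteaof and the main process's AOF writes both hit the disk, and bgrewriteaof usually involves heavy disk I/O, so the main process can block while writing the AOF;
# the following option controls this
no-appendfsync-on-rewrite no # whether to suspend fsync for new AOF appends during an AOF rewrite; a trade-off between disk I/O cost and request blocking time.
# no (default): do not suspend; new AOF records are still synced to disk, which is safest and loses no data, at the price of possible blocking
# yes: equivalent to appendfsync no during the rewrite; writes only reach the buffer and no disk operation is issued, so nothing blocks (no contention for the disk),
# but if Redis crashes at that point, data is lost: with the default Linux flush interval of 30 seconds, up to 30 s of it. Because yes performs better and
# avoids blocking, it is the usual recommendation
# A rewrite compacts the AOF file and reclaims space, which shortens recovery time
auto-aof-rewrite-percentage 100 # rewrite the AOF when it has grown by this percentage; 0 disables automatic rewrites. A rewrite keeps the AOF as small as possible
# while still holding the complete data set
auto-aof-rewrite-min-size 64mb # minimum AOF size before a rewrite can be triggered
aof-load-truncated yes # whether to load an AOF whose tail is truncated (main process killed, power loss, etc.); yes is recommended
aof-use-rdb-preamble no # RDB-AOF hybrid persistence, new in Redis 4.0. When enabled, the file produced by an AOF rewrite contains both RDB-format and AOF-format content:
# the RDB part records the existing data and the AOF part records the most recent changes, so Redis gets the advantages of both RDB
# and AOF (the rewrite is fast to produce and the data is fast to reload after a failure). Default no, i.e. disabled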
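lua-time-limit 5000 # maximum execution time of a Lua script, in milliseconds
cluster-enabled yes # whether to enable cluster mode; off by default, i.e. standalone mode
cluster-config-file nodes-6379.conf # name of the cluster configuration file, generated automatically by the node
cluster-node-timeout 15000 # node timeout in ms; a node unreachable for longer than this is treated as failed
cluster-replica-validity-factor 10 # a multiplier. During failover, a replica that was disconnected from its master for a while holds stale data and is unsuitable for election as master;
# a replica disconnected for longer than the limit is not failed over to. Limit: (node-timeout * replica-validity-factor) + repl-ping-replica-period
cluster-migration-barrier 1 # migration barrier: a master must keep at least 1 working replica; only replicas beyond that number can migrate to a master whose own replicas have failed
cluster-require-full-coverage yes # require full slot coverage. If a master fails with no replica to take over, some slots are uncovered; with yes the cluster then stops serving requests
# (writes fail with "CLUSTERDOWN The cluster is down" and cluster_state is fail, although PING still returns PONG); with no the cluster keeps serving,
# but queries for keys in the missing slots find nothing (that data is lost). no is suggested for production
cluster-replica-no-failover no # yes prevents this replica from attempting an automatic failover when its master fails; a manual forced failover is still possible. Usually no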
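# The slow log records commands whose execution time exceeds a threshold. The time excludes I/O such as talking to the client and sending the reply; it is only the time needed to execute the command (during which the thread is blocked and cannot
# serve other requests). The slow log is kept in memory, so reading and writing it is very fast and enabling it does not slow Redis down
slowlog-log-slower-than 10000 # threshold in microseconds; a negative value disables the slow log, 0 logs every command. The default is 10 ms; a command normally completes in microseconds,
# so a value between 1 ms and 10 ms is suggested for production
slowlog-max-len 128 # maximum number of slow log entries; when the limit is reached, the oldest entry is removed as a new one is added (a fixed-length FIFO queue).
# The default 128 is on the small side; 1000 or more is suggested for production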
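
Two of the options above involve a little arithmetic: the byte value for maxmemory and the replica validity window. The sketch below works both out with the values from the listing; the function names are made up for illustration, and the unit handling (cluster-node-timeout in milliseconds, repl-ping-replica-period in seconds) is stated in the comments.

```shell
#!/bin/sh
# Illustrative helpers; the function names are made up for this sketch.

# Convert a size in GiB to the byte count that maxmemory expects.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Longest time (in seconds) a replica may have been disconnected from its
# master and still be eligible for failover:
#   (node-timeout * replica-validity-factor) + repl-ping-replica-period
# cluster-node-timeout is given in milliseconds, the ping period in seconds.
replica_validity_seconds() {
  node_timeout_ms=$1
  factor=$2
  ping_period_s=$3
  echo $(( node_timeout_ms / 1000 * factor + ping_period_s ))
}

gib_to_bytes 8                        # maxmemory value for 8 GiB
replica_validity_seconds 15000 10 10  # values from the listing above
```

For the listed values this prints 8589934592 (bytes) and 160 (seconds).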

II. Configuring Redis master-replica replication

Master node (10.0.0.53)

  1. Install Redis

    apt install -y redis
  2. Edit redis.conf

    vim /etc/redis/redis.conf
    ...
    bind 0.0.0.0
    # password a replica uses to authenticate to the master
    masterauth "123456"
    # password clients must supply
    requirepass "123456"
    ...
  3. Restart Redis

    systemctl restart redis

Replica node (10.0.0.63)

  1. Install Redis

    apt install -y redis
  2. Edit redis.conf

    vim /etc/redis/redis.conf
    ...
    bind 0.0.0.0
    # password a replica uses to authenticate to the master
    masterauth "123456"
    # password clients must supply
    requirepass "123456"
    # IP and port of the master
    replicaof 10.0.0.53 6379
    ...
  3. Restart Redis

    systemctl restart redis

Verifying synchronization

Check the master's status

[root@master ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.63,port=6379,state=online,offset=252,lag=0
master_replid:a6d36cfca1cb45585f0697ea933a74ae49b530bf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:252
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:252

Check the replica's status

[root@slave ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.53
master_port:6379
master_link_status:up
master_last_io_seconds_ago:11
master_sync_in_progress:0
slave_repl_offset:224
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a6d36cfca1cb45585f0697ea933a74ae49b530bf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:224
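
The two fields that matter most in the replica's output are `role` and `master_link_status`. A small sketch of an automated check follows; the helper name `replica_link_up` is hypothetical, and it only parses text in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `INFO replication` output from a replica on stdin
# and succeed only when it reports role:slave with master_link_status:up.
replica_link_up() {
  # INFO lines end in CRLF, so strip carriage returns before matching.
  tr -d '\r' | awk -F: '
    $1 == "role"               { role = $2 }
    $1 == "master_link_status" { link = $2 }
    END { exit !(role == "slave" && link == "up") }
  '
}

# Possible use on the replica from this section:
#   redis-cli -a 123456 --no-auth-warning info replication | replica_link_up \
#     && echo "replication OK"
```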

Write on the master and confirm that the replica receives the change

# Add a record on the master
[root@master ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set k1 v1
OK
# The record has been replicated to the replica
[root@slave ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get k1
"v1"

III. Configuring Redis Cluster

Architecture

[Diagram: a Redis Cluster of three master/replica pairs: Redis-M1/Redis-S1, Redis-M2/Redis-S2, Redis-M3/Redis-S3]
# Cluster nodes
Redis-node1: 10.0.0.51
Redis-node2: 10.0.0.52
Redis-node3: 10.0.0.53
Redis-node4: 10.0.0.54
Redis-node5: 10.0.0.55
Redis-node6: 10.0.0.56
# Reserved nodes
Redis-node7: 10.0.0.57
Redis-node8: 10.0.0.58

1. Install Redis

apt install -y redis

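2. Edit the Redis configuration

vim /etc/redis/redis.conf
...
bind 0.0.0.0
masterauth 123456
requirepass 123456
# Uncomment to enable cluster mode; the redis process then shows [cluster]
cluster-enabled yes
# Uncomment; this cluster state file records master/replica relations and slot ranges and is created and maintained by Redis Cluster itself
cluster-config-file nodes-6379.conf
# Default yes; no prevents one unavailable node from making the whole cluster unavailable
cluster-require-full-coverage no
...

Or apply the changes from the command line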

sed -i 's/^bind 127.0.0.1 ::1/bind 0.0.0.0/' /etc/redis/redis.conf
echo -e "masterauth 123456\nrequirepass 123456\ncluster-enabled yes\ncluster-config-file nodes-6379.conf\ncluster-require-full-coverage no" >> /etc/redis/redis.conf
systemctl restart redis

3. Create the cluster

Run on any one node

redis-cli -a 123456 --cluster create 10.0.0.51:6379 10.0.0.52:6379 10.0.0.53:6379 10.0.0.54:6379 \
10.0.0.55:6379 10.0.0.56:6379 --cluster-replicas 1

Creation output; type yes to accept the proposed layout

[root@node1 ~]#redis-cli -a 123456 --cluster create 10.0.0.51:6379 10.0.0.52:6379 10.0.0.53:6379 10.0.0.54:6379 \
> 10.0.0.55:6379 10.0.0.56:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.55:6379 to 10.0.0.51:6379 # master 10.0.0.51, replica 10.0.0.55
Adding replica 10.0.0.56:6379 to 10.0.0.52:6379 # master 10.0.0.52, replica 10.0.0.56
Adding replica 10.0.0.54:6379 to 10.0.0.53:6379 # master 10.0.0.53, replica 10.0.0.54
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379 # node ID of this master
slots:[0-5460] (5461 slots) master # first and last slot held by this master
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
slots:[5461-10922] (5462 slots) master
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
slots:[10923-16383] (5461 slots) master
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379 # node ID of this replica
replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
replicates e07fc57be51d9aaf69822061010425e30e36a428
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
slots: (0 slots) slave
replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
slots: (0 slots) slave
replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
slots: (0 slots) slave
replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
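
Each key is assigned to one of the 16384 slots as CRC16(key) mod 16384, which is why the write test later in this section is redirected: key1 maps to slot 9189, inside the 5461-10922 range held by 10.0.0.52. The following is a sketch of that mapping in plain shell, assuming the CRC16 XMODEM variant (polynomial 0x1021, initial value 0), ASCII-only keys, and no hash-tag handling; on a live node `CLUSTER KEYSLOT <key>` gives the authoritative answer.

```shell
#!/bin/sh
# Sketch of the key-to-slot mapping: slot = CRC16(key) mod 16384.
# Assumptions: ASCII-only keys, no {hash tag} handling.
crc16_xmodem() {
  rest=$1
  crc=0
  while [ -n "$rest" ]; do
    remainder=${rest#?}            # everything after the first character
    char=${rest%"$remainder"}      # the first character
    rest=$remainder
    byte=$(printf '%d' "'$char")   # numeric value of the character
    crc=$(( crc ^ (byte << 8) ))
    bit=0
    while [ "$bit" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      bit=$(( bit + 1 ))
    done
  done
  echo "$crc"
}

keyslot() {
  echo $(( $(crc16_xmodem "$1") % 16384 ))
}

keyslot key1   # the write test in this section reports slot 9189 for this key
```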

4. Verify the cluster state

Check the cluster state

[root@node1 ~]#redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6 # 6 nodes
cluster_size:3 # 3 master/replica groups
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:4014
cluster_stats_messages_pong_sent:4179
cluster_stats_messages_sent:8193
cluster_stats_messages_ping_received:4174
cluster_stats_messages_pong_received:4014
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:8193
# Cluster summary as seen from any node; --no-auth-warning suppresses the password warning
[root@node1 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379 --no-auth-warning
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.53:6379 (211bf9e4...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.

List all cluster nodes and their master/replica relationships

[root@node1 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 myself,master - 0 1663077859000 1 connected 0-5460
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663077862173 6 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663077861147 4 connected
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663077861000 3 connected 10923-16383
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663077860122 5 connected
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663077861000 2 connected 5461-10922

5. Verify writes to the cluster

# The -c option connects in cluster mode; any node of the cluster can be used
[root@node1 ~]#redis-cli -a 123456 -h 10.0.0.51 -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.52:6379 # the client is redirected to 10.0.0.52 automatically
OK
10.0.0.52:6379> get key1
"v1"

6. Scaling out the cluster

Add one master and one replica to the cluster

1) Install Redis

# Install Redis
apt install -y redis
# Configure cluster mode
sed -i 's/^bind 127.0.0.1 ::1/bind 0.0.0.0/' /etc/redis/redis.conf
echo -e "masterauth 123456\nrequirepass 123456\ncluster-enabled yes\ncluster-config-file nodes-6379.conf\ncluster-require-full-coverage no" >> /etc/redis/redis.conf
systemctl restart redis

2) Add node7 to the cluster

A node joins the cluster as a master by default

redis-cli -a 123456 --cluster add-node 10.0.0.57:6379 10.0.0.51:6379
# Output of the add-node run
[root@node7 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.57:6379 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.57:6379 to cluster 10.0.0.51:6379
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
slots: (0 slots) slave
replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
slots: (0 slots) slave
replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
slots: (0 slots) slave
replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.0.57:6379 to make it join the cluster.
[OK] New node added correctly. # the new node has joined the cluster
# The node has joined as a master, but it holds no slots and has no replica yet
[root@node7 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 0 slots | 0 slaves. # no slots, no replica
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 5461 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 5462 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
# No slots, no replica
[root@node7 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663084485001 3 connected 10923-16383
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 myself,master - 0 1663084486000 0 connected
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663084485000 1 connected 0-5460
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663084487049 2 connected 5461-10922
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663084487000 1 connected
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663084488074 2 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663084486026 3 connected
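
Spotting a master without a replica by eye gets harder as the node list grows. The sketch below is one way to extract that from `cluster nodes` output in the format shown above; the helper name `masters_without_replica` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `cluster nodes` output on stdin and print the
# address of every master that has no replica attached.
masters_without_replica() {
  awk '
    $3 ~ /master/ { master[$1] = $2 }   # field 1: node ID, field 2: address
    $3 ~ /slave/  { covered[$4] = 1 }   # field 4: ID of the replicated master
    END {
      for (id in master)
        if (!(id in covered))
          print master[id]
    }
  '
}

# Possible use:
#   redis-cli -a 123456 --no-auth-warning cluster nodes | masters_without_replica
```

Fed the node list above, it prints only the address of node7 (10.0.0.57:6379@16379), the master that still lacks a replica.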

3) Reassign slots

A newly added master holds no slots, so no data can be written to it until the cluster is resharded.

redis-cli -a 123456 --cluster reshard 10.0.0.51:6379
...
How many slots do you want to move (from 1 to 16384)?4096 # slots to move = 16384 / number of masters
What is the receiving node ID? 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 # ID of the new master that receives the slots
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: all # which masters give up slots; all picks them automatically from every existing node.
# When removing a node from the cluster, name it as the source instead so that all of its slots move to another node
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm the plan
# Confirm that the slots were reassigned
[root@node7 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 4096 slots | 0 slaves.
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 4096 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
# Cluster node list
[root@node7 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663085974000 3 connected 12288-16383
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 myself,master - 0 1663085974000 7 connected 0-1364 5461-6826 10923-12287
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663085975808 1 connected 1365-5460
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663085974788 2 connected 6827-10922
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663085975000 1 connected
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663085976829 2 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663085975000 3 connected
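
The figure 4096 entered at the reshard prompt is simply 16384 divided by the number of masters. A tiny sketch of that calculation follows; the function name is made up, and the commented non-interactive reshard command is an untested suggestion that relies on redis-cli's `--cluster-from`, `--cluster-to`, `--cluster-slots` and `--cluster-yes` options.

```shell
#!/bin/sh
# Illustrative helper: number of slots each master should hold when the
# 16384 hash slots are spread evenly (integer division, remainder ignored).
slots_per_master() {
  echo $(( 16384 / $1 ))
}

slots_per_master 4   # four masters after adding node7

# The same move could presumably be scripted without the prompts:
#   redis-cli -a 123456 --cluster reshard 10.0.0.51:6379 \
#     --cluster-from all \
#     --cluster-to 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 \
#     --cluster-slots "$(slots_per_master 4)" --cluster-yes
```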

4) Add node8 to the cluster

It joins as a replica of node7

# 10.0.0.51 can be the IP of any node in the cluster; the ID is the node ID of the node7 master
redis-cli -a 123456 --cluster add-node 10.0.0.58:6379 10.0.0.51:6379 \
--cluster-slave --cluster-master-id 92faaa7d5ac5f6d978b3dd2f7028323b46b02706
# Output
[root@node8 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.58:6379 10.0.0.51:6379 \
> --cluster-slave --cluster-master-id 92faaa7d5ac5f6d978b3dd2f7028323b46b02706
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.58:6379 to cluster 10.0.0.51:6379
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
slots:[1365-5460] (4096 slots) master
1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
slots: (0 slots) slave
replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
M: 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379
slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
slots: (0 slots) slave
replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
slots:[12288-16383] (4096 slots) master
1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
slots: (0 slots) slave
replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
slots:[6827-10922] (4096 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.0.58:6379 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 10.0.0.57:6379.
[OK] New node added correctly.

5) Verify the result

# Cluster node list
[root@node8 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663086607000 2 connected
53ec6239ce484104709a51f70578d31db5453983 10.0.0.58:6379@16379 myself,slave 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 0 1663086609000 0 connected
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663086609177 1 connected
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663086608164 2 connected 6827-10922
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663086611228 1 connected 1365-5460
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 master - 0 1663086609000 7 connected 0-1364 5461-6826 10923-12287
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663086610000 3 connected 12288-16383
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663086610200 3 connected
# Cluster summary: each master holds 4096 slots
[root@node8 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 4096 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
[root@node8 ~]#redis-cli -a 123456 -h 10.0.0.51 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8 # 8 nodes
cluster_size:4 # 4 master/replica groups
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:12776
cluster_stats_messages_pong_sent:13125
cluster_stats_messages_update_sent:5
cluster_stats_messages_sent:25906
cluster_stats_messages_ping_received:13118
cluster_stats_messages_pong_received:12776
cluster_stats_messages_meet_received:7
cluster_stats_messages_received:25901

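IV. Setting up a Zabbix server to monitor Linux, Tomcat, and MySQL

Installing Zabbix 5.0

Hosts (from the architecture diagram):
MySQL master: 10.0.0.42
MySQL slave: 10.0.0.52
Zabbix-server (Nginx+PHP): 10.0.0.12
Zabbix-MySQL: 10.0.0.22
Tomcat: 10.0.0.32

The steps follow the official Zabbix installation instructions.

1. Install the dependency packages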

apt install -y iproute2 ntpdate tcpdump telnet traceroute nfs-kernel-server nfs-common lrzsz tree \
openssl libssl-dev libpcre3 libpcre3-dev zlib1g-dev gcc iotop unzip zip

2. Install the Zabbix repository

wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update

3. Install the Zabbix server, web frontend, agent, and zabbix-get

apt install -y zabbix-server-mysql zabbix-frontend-php zabbix-nginx-conf zabbix-agent zabbix-get

4. Create the initial database

Log in to the MySQL server (10.0.0.22) and install the database

apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb

Create the zabbix database and user, and grant privileges

# Run on the MySQL server (10.0.0.22)
[root@mysql ~]# mysql -uroot
MariaDB [(none)]> create database zabbix character set utf8 collate utf8_bin;
MariaDB [(none)]> create user zabbix@'10.0.0.%' identified by '123456';
MariaDB [(none)]> grant all privileges on zabbix.* to zabbix@'10.0.0.%';
MariaDB [(none)]> quit;

Verify that the Zabbix server can connect to the MySQL database remotely

[root@zabbix ~]#mysql -uzabbix -p123456 -h10.0.0.22
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 40
Server version: 5.5.5-10.3.34-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>

On the Zabbix server (10.0.0.12), import the initial schema and data

zcat /usr/share/doc/zabbix-server-mysql/create.sql.gz | mysql -uzabbix -p123456 -h10.0.0.22 zabbix

5. Configure the Zabbix server and Zabbix agent

Edit /etc/zabbix/zabbix_server.conf

sed -i "/# DBHost=localhost/aDBHost=10.0.0.22" /etc/zabbix/zabbix_server.conf
sed -i "/# DBPassword=/aDBPassword=123456" /etc/zabbix/zabbix_server.conf
sed -i "/# DBPort=/aDBPort=3306" /etc/zabbix/zabbix_server.conf
sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /etc/zabbix/zabbix_server.conf
# Check the result
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_server.conf
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/run/zabbix/zabbix_server.pid
SocketDir=/run/zabbix
DBHost=10.0.0.22
DBName=zabbix
DBUser=zabbix
DBPassword=123456
DBPort=3306
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000

Edit /etc/zabbix/zabbix_agentd.conf

sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
-e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
-e "2aStartAgents=3" \
-e "2aListenPort=10050" \
-e "/^Hostname=/c Hostname=10.0.0.12" \
/etc/zabbix/zabbix_agentd.conf
# Check the result
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_agentd.conf
StartAgents=3
ListenPort=10050
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12
ServerActive=10.0.0.12
Hostname=10.0.0.12
Include=/etc/zabbix/zabbix_agentd.d/*.conf

6. Configure PHP for the Zabbix frontend

Edit /etc/zabbix/nginx.conf

server {
listen 80;
server_name 10.0.0.12;
...
}

Edit /etc/zabbix/php-fpm.conf

echo "php_value[date.timezone] = Asia/Shanghai" >> /etc/zabbix/php-fpm.conf

7. Start the Zabbix server and agent

Start the Zabbix server and agent processes and enable them at boot

# Stop and disable the Apache service first
systemctl disable --now apache2
systemctl restart zabbix-server zabbix-agent nginx php7.4-fpm
systemctl enable zabbix-server zabbix-agent nginx php7.4-fpm
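
Once the services are up, the agent can be probed from the Zabbix server with zabbix_get, which was installed in step 3. A minimal sketch follows, assuming the standard `agent.ping` item key (it returns 1 when the agent answers); the wrapper name `agent_ping_ok` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the agent on the given host answers
# agent.ping with 1. Arguments: <host> [port, default 10050].
agent_ping_ok() {
  [ "$(zabbix_get -s "$1" -p "${2:-10050}" -k agent.ping 2>/dev/null)" = "1" ]
}

# Possible use on the Zabbix server:
#   agent_ping_ok 10.0.0.12 && echo "agent reachable"
```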

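8. Configure the web interface

  1. Open http://10.0.0.12 (the Zabbix server IP) to reach the web setup pages

  2. Check the prerequisites

  3. Enter the database connection settings

  4. Enter the Zabbix server details

  5. Review the summary

  6. Finish the installation

  7. Log in

    Default username: Admin (note the capital A)
    Password: zabbix
  8. The dashboard opens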

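9. Tuning Zabbix

Switching the menus to Chinese

  1. The Chinese locale is not installed on Ubuntu yet, so Chinese cannot be selected in the frontend

  2. Install and configure the Simplified Chinese locale on Ubuntu

    # Install Simplified Chinese
    apt install 'language-pack-zh*'
    # Add the Chinese locale to the environment
    echo 'LANG="zh_CN.UTF-8"' >> /etc/environment
    # Reconfigure the locales
    dpkg-reconfigure locales

    Select zh_CN.UTF-8 UTF-8 as a locale to generate

    Select zh_CN.UTF-8 as the default locale

    Wait for it to finish

  3. Reboot

    reboot
  4. Select Simplified Chinese in the frontend

  5. Confirm that the menus are shown in Chinese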
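Fixing garbled characters in graphs

  1. Some monitored items are rendered as garbled characters

  2. Pick a font from Windows, for example KaiTi (simkai.ttf)

  3. Upload the Windows font to the Zabbix web directory

    Path: /usr/share/zabbix/assets/fonts

    Note: if the font file name is upper case (SIMKAI.TTF), rename it to lower case (simkai.ttf); renaming only the extension from TTF to ttf also works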

    [root@zabbix fonts]#pwd
    /usr/share/zabbix/assets/fonts
    [root@zabbix fonts]#ll
    total 11512
    drwxr-xr-x 2 root root 45 Sep 16 00:37 ./
    drwxr-xr-x 5 root root 44 Sep 14 22:45 ../
    lrwxrwxrwx 1 root root 38 Sep 14 22:45 graphfont.ttf -> /etc/alternatives/zabbix-frontend-font
    -rw-r--r-- 1 root root 11787328 Oct 15 2019 simkai.ttf
  4. Change the font Zabbix uses

    vim /usr/share/zabbix/include/defines.inc.php
    # Change ZBX_GRAPH_FONT_NAME from graphfont to simkai
    # Only the following two lines need to change
    #define('ZBX_GRAPH_FONT_NAME', 'simkai'); // font file name
    #define('ZBX_FONT_NAME', 'simkai');
    sed -i -e "/define('ZBX_GRAPH_FONT_NAME'/c define('ZBX_GRAPH_FONT_NAME','simkai');" \
    -e "/define('ZBX_FONT_NAME'/c define('ZBX_FONT_NAME','simkai');" /usr/share/zabbix/include/defines.inc.php
  5. Verify the font

    The font takes effect immediately; there is no need to restart Zabbix or Nginx

Monitoring Tomcat (10.0.0.32)

1. Install the Zabbix agent

Install the Zabbix repository

wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update

Install the Zabbix agent

apt install -y zabbix-agent

Edit the configuration

sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
-e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
-e "2aStartAgents=3" \
-e "2aListenPort=10050" \
-e "/^Hostname=/c Hostname=10.0.0.32" \
/etc/zabbix/zabbix_agentd.conf
[root@tomcat ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
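StartAgents=3 # number of pre-forked processes for passive checks; with 0 the agent listens on no port
ListenPort=10050 # listening port (default value)
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12 # IP of the Zabbix server or proxy
ServerActive=10.0.0.12 # IP of the Zabbix server or proxy for active checks
Hostname=10.0.0.32 # case-sensitive and unique on the Zabbix server; the host IP is used here
Include=/etc/zabbix/zabbix_agentd.d/*.conf # path of additional configuration files, added at the end of the file

Restart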

systemctl restart zabbix-agent
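
After the sed edits it can help to confirm which values the agent will actually read. The following is a minimal sketch, not part of Zabbix: conf_get and conf_expect are hypothetical helper names, and the parsing assumes plain uncommented Key=Value lines as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Zabbix) for inspecting Key=Value files.

# conf_get KEY FILE: print every uncommented value of KEY, one per line.
conf_get() {
    local key=$1 file=$2
    sed -n "s/^${key}=//p" "$file"
}

# conf_expect KEY VALUE FILE: succeed only if KEY is set to exactly VALUE.
conf_expect() {
    local key=$1 want=$2 file=$3 got
    got=$(conf_get "$key" "$file")
    if [ "$got" = "$want" ]; then
        echo "OK   $key=$got"
    else
        echo "FAIL $key: expected '$want', got '$got'" >&2
        return 1
    fi
}
```

For the Tomcat host this could be run as conf_expect Server 10.0.0.12 /etc/zabbix/zabbix_agentd.conf and conf_expect Hostname 10.0.0.32 /etc/zabbix/zabbix_agentd.conf.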

2. Install Tomcat

# Prepare the JDK
# Download jdk-8u291-linux-x64.tar.gz and unpack it
tar -xvf jdk-8u291-linux-x64.tar.gz -C /usr/local/src
ln -s /usr/local/src/jdk1.8.0_291 /usr/local/src/jdk
# Configure the environment variables (quoted heredoc, so the variables are written literally)
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/src/jdk
export TOMCAT_HOME=/apps/tomcat
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$TOMCAT_HOME/bin:$PATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib/tools.jar
EOF
# Apply the changes
source /etc/profile
# Download the Tomcat package
wget https://dlcdn.apache.org/tomcat/tomcat-8/v8.5.82/bin/apache-tomcat-8.5.82.tar.gz
# Unpack into /apps
mkdir -pv /apps
tar -xvf apache-tomcat-8.5.82.tar.gz -C /apps
ln -s /apps/apache-tomcat-8.5.82 /apps/tomcat
# Prepare a test page
echo "tomcat web page" > /apps/tomcat/webapps/ROOT/index.html
# Start Tomcat
/apps/tomcat/bin/catalina.sh start

Verify the Tomcat page

Open the test page in a browser

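3. Deploy the Java gateway

Note: the Java gateway is a component independent of the Zabbix server and the Zabbix agent. It can run on a dedicated host or share a host with the Zabbix server or an agent, as long as the ports do not conflict.

Install the Java gateway on the Zabbix server

apt install -y zabbix-java-gateway

Edit the configuration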

sed -i -e '/^# LISTEN_IP="0.0.0.0"/a LISTEN_IP="0.0.0.0"' \
-e "/^# LISTEN_PORT=10052/a LISTEN_PORT=10052" \
-e "/^# START_POLLERS/a START_POLLERS=50" \
-e "/# TIMEOUT=/a TIMEOUT=30" \
/etc/zabbix/zabbix_java_gateway.conf
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_java_gateway.conf
LISTEN_IP="0.0.0.0"
LISTEN_PORT=10052
PID_FILE="/run/zabbix/zabbix_java_gateway.pid"
START_POLLERS=50
TIMEOUT=30

Restart the service

systemctl restart zabbix-java-gateway.service

Verify the port

[root@zabbix-server /]#lsof -i:10052
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 47205 zabbix 12u IPv6 409025 0t0 TCP *:10052 (LISTEN)

4. Configure the Zabbix server to use the Java gateway

sed -i -e "/^# JavaGateway=/a JavaGateway=0.0.0.0" \
-e "/^# JavaGatewayPort=/a JavaGatewayPort=10052" \
-e "/^# StartJavaPollers=/a StartJavaPollers=20" \
/etc/zabbix/zabbix_server.conf
[root@zabbix-server /]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_server.conf
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/run/zabbix/zabbix_server.pid
SocketDir=/run/zabbix
DBHost=10.0.0.22
DBName=zabbix
DBUser=zabbix
DBPassword=123456
DBPort=3306
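JavaGateway=0.0.0.0 # address of the Java gateway (not a listen address); 0.0.0.0 only works here because the gateway runs on the same host, an explicit address such as 127.0.0.1 is clearer
JavaGatewayPort=10052 # port the Java gateway listens on; can be omitted when it is the default
StartJavaPollers=20 # number of Java poller processes that query the Java gateway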
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000

Restart the Zabbix server service

systemctl restart zabbix-server.service

Verify the Java pollers

[root@zabbix ~]#ps -ef|grep java
zabbix 11006 1 0 23:39 ? 00:00:00 java -server -Dlogback.configurationFile=/etc/zabbix/zabbix_java_gateway_logback.xml -classpath \
lib:lib/android-json-4.3_r3.1.jar:lib/logback-classic-1.2.9.jar:lib/logback-core-1.2.9.jar:lib/slf4j-api-1.7.32.jar:bin/zabbix-java-gateway-5.0.27.jar \
-Dzabbix.pidFile=/run/zabbix/zabbix_java_gateway.pid -Dzabbix.listenIP=0.0.0.0 -Dzabbix.listenPort=10052 -Dzabbix.startPollers=50 -Dzabbix.timeout=30 \
-Dsun.rmi.transport.tcp.responseTimeout=30000 com.zabbix.gateway.JavaGateway
zabbix 11557 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #1 [got 0 values in 0.000010 sec, idle 5 sec]
zabbix 11558 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #2 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11559 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #3 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11560 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #4 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11561 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #5 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11562 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #6 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11563 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #7 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11564 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #8 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11565 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #9 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11566 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #10 [got 0 values in 0.000012 sec, idle 5 sec]
zabbix 11567 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #11 [got 0 values in 0.000006 sec, idle 5 sec]
zabbix 11568 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #12 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11569 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #13 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11570 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #14 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11571 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #15 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11572 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #16 [got 0 values in 0.000006 sec, idle 5 sec]
zabbix 11573 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #17 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11574 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #18 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11575 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #19 [got 0 values in 0.000007 sec, idle 5 sec]
zabbix 11576 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #20 [got 0 values in 0.000005 sec, idle 5 sec]
root 11611 1629 0 23:51 pts/0 00:00:00 grep --color=auto java
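
Counting twenty lines by eye is error-prone, so the poller count can be compared with StartJavaPollers mechanically. A small sketch under the assumption that the process titles look like the listing above; count_java_pollers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_java_pollers: read `ps -ef` output on stdin and count the
# "zabbix_server: java poller #N" processes (hypothetical helper).
count_java_pollers() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c 'zabbix_server: java poller #[0-9]' || true
}
```

Running ps -ef | count_java_pollers on the Zabbix server should then print the value configured in StartJavaPollers, 20 in this setup.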

5. Enable JMX monitoring in Tomcat

Edit /apps/tomcat/bin/catalina.sh

sed -i '1aCATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=10.0.0.32"' \
/apps/tomcat/bin/catalina.sh
# Parameter reference
CATALINA_OPTS="$CATALINA_OPTS
-Dcom.sun.management.jmxremote # enable remote JMX monitoring
-Dcom.sun.management.jmxremote.port=12345 # JMX port to listen on
-Dcom.sun.management.jmxremote.authenticate=false # no username/password authentication
-Dcom.sun.management.jmxremote.ssl=false # no SSL
-Djava.rmi.server.hostname=10.0.0.32" # IP of the Tomcat host, not of the Zabbix server
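
As an alternative to patching catalina.sh itself: catalina.sh sources bin/setenv.sh when that file exists, so the same options can be kept in a separate file. The sketch below is not the method used in this article; write_jmx_setenv is a hypothetical helper, and the host and port arguments are the values from the parameter list above.

```shell
#!/usr/bin/env bash
# write_jmx_setenv TOMCAT_HOME HOST PORT
# Hypothetical helper: writes TOMCAT_HOME/bin/setenv.sh with the same JMX
# options as the catalina.sh edit (no authentication, no SSL).
write_jmx_setenv() {
    local tomcat_home=$1 host=$2 port=$3
    local opts="-Dcom.sun.management.jmxremote"
    opts+=" -Dcom.sun.management.jmxremote.port=${port}"
    opts+=" -Dcom.sun.management.jmxremote.authenticate=false"
    opts+=" -Dcom.sun.management.jmxremote.ssl=false"
    opts+=" -Djava.rmi.server.hostname=${host}"
    # Single-quoted format string: $CATALINA_OPTS is written literally so that
    # it is expanded when catalina.sh sources the file, not now.
    printf 'CATALINA_OPTS="$CATALINA_OPTS %s"\n' "$opts" > "${tomcat_home}/bin/setenv.sh"
}
```

For this setup the call would be write_jmx_setenv /apps/tomcat 10.0.0.32 12345, followed by a Tomcat restart. Use either this file or the catalina.sh edit, not both, otherwise the options are passed twice.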

Start Tomcat (if it is still running from step 2, stop it first with ./catalina.sh stop)

[root@tomcat bin]#./catalina.sh start
Using CATALINA_BASE: /apps/tomcat
Using CATALINA_HOME: /apps/tomcat
Using CATALINA_TMPDIR: /apps/tomcat/temp
Using JRE_HOME: /usr/local/src/jdk
Using CLASSPATH: /apps/tomcat/bin/bootstrap.jar:/apps/tomcat/bin/tomcat-juli.jar
Using CATALINA_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=10.0.0.32
Tomcat started.

Check the ports

# Ports 8080 and 12345 are open
[root@tomcat bin]#netstat -ntl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:2049 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:47585 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:34245 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:48329 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:39443 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN
tcp6 0 0 :::2049 :::* LISTEN
tcp6 0 0 :::58755 :::* LISTEN
tcp6 0 0 127.0.0.1:8005 :::* LISTEN
tcp6 0 0 :::44139 :::* LISTEN
tcp6 0 0 :::111 :::* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 :::42419 :::* LISTEN
tcp6 0 0 :::46453 :::* LISTEN
tcp6 0 0 :::41813 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::12345 :::* LISTEN
tcp6 0 0 ::1:6010 :::* LISTEN
tcp6 0 0 :::38427 :::* LISTEN
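
The listing above shows the ports from the Tomcat host itself; what matters for monitoring is that the Java gateway host can reach them. A minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command; port_open is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 s.
# Hypothetical helper; relies on bash's /dev/tcp pseudo-device.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

On the Zabbix server, port_open 10.0.0.32 12345 && echo reachable checks the JMX port, and the same call with 8080 checks the HTTP connector.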

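6. Add JMX monitoring on the Zabbix server

Go to Configuration > Hosts > Create host and add the Tomcat host details

Link the templates Template OS Linux by Zabbix agent active and Template App Generic Java JMX

7. Verify the JMX status and data

Verify the JMX status

The ZBX and JMX icons are green

Verify the JMX data

8. Use a production JMX template

Import a custom template

Configuration > Templates > Import

Link the template

Link the template imported in the previous step, and apply "Unlink and clear" to the earlier JMX template

Verify the JMX status and data

Status is normal

Data is normal

9. Notes on Zabbix agent active mode

Problem: in active mode the monitoring data arrives normally, but the ZBX icon stays grey instead of turning green. The ZBX availability icon reflects passive agent checks, so a host with only active items never turns it green.

Solution: in the template Template OS Linux by Zabbix agent active, first unlink and clear the linked template Template Module Zabbix agent active, then add the template Template Module Zabbix agent.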

Monitoring MySQL with Percona

1. Set up MySQL master/slave replication (10.0.0.42/52)

master (10.0.0.42)

# Install MariaDB
apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb
# Edit the configuration
sed -i '/^\[mysqld/a server-id=42\nlog-bin' /etc/mysql/mariadb.conf.d/50-server.cnf
# Check the result
[root@mysql-master ~]#egrep -v '^#|^$' /etc/mysql/mariadb.conf.d/50-server.cnf
[server]
[mysqld]
server-id=42
log-bin
user = mysql
pid-file = /run/mysqld/mysqld.pid
socket = /run/mysqld/mysqld.sock
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
bind-address = 0.0.0.0
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
[embedded]
[mariadb]
[mariadb-10.3]
# Restart the database
systemctl restart mariadb
# Create the replication user
mysql -uroot
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant replication privileges to the user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# If MySQL already holds data, back it up first
# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
--lock-tables > /opt/backup.sql
# Copy the backup to the slave node
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.52:/opt/
# Show the binary log files and their sizes
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+-------------------+-----------+
| Log_name | File_size |
+-------------------+-----------+
| mysqld-bin.000001 | 691 |
| mysqld-bin.000002 | 387 |
+-------------------+-----------+
2 rows in set (0.001 sec)

slave (10.0.0.52)

# Install MariaDB
apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb
# Edit the configuration
sed -i '/^\[mysqld/a server-id=52\nread-only' /etc/mysql/mariadb.conf.d/50-server.cnf
# Check the result
[root@mysql-slave ~]#egrep -v '^#|^$' /etc/mysql/mariadb.conf.d/50-server.cnf
[server]
[mysqld]
server-id=52
read-only
user = mysql
pid-file = /run/mysqld/mysqld.pid
socket = /run/mysqld/mysqld.sock
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
bind-address = 0.0.0.0
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
[embedded]
[mariadb]
[mariadb-10.3]
# Restart the database
systemctl restart mariadb
# Import the backup from the master node
[root@mysql-slave ~]#mysql < /opt/backup.sql
# Enable replication using the master's binary log coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (shown by show master logs);
# with --master-data=2 the coordinates at dump time are also recorded as a commented CHANGE MASTER TO line in backup.sql
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
MASTER_HOST='10.0.0.42',
MASTER_USER='repluser',
MASTER_PASSWORD='',
MASTER_PORT=3306,
MASTER_LOG_FILE='mysqld-bin.000001',
MASTER_LOG_POS=691,
MASTER_CONNECT_RETRY=10;
# Start the slave
MariaDB [(none)]> start slave;
# Show the status
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.42
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysqld-bin.000002
Read_Master_Log_Pos: 387
Relay_Log_File: mysqld-relay-bin.000003
Relay_Log_Pos: 687
Relay_Master_Log_File: mysqld-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Master_Server_Id: 42
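
The binary log coordinates for CHANGE MASTER TO do not have to be copied by hand. A dump taken with --master-data=2 contains a commented CHANGE MASTER TO line, and the sketch below extracts the file name and position from it. dump_coords is a hypothetical helper name, and the sed pattern assumes the usual single-line form of that statement.

```shell
#!/usr/bin/env bash
# dump_coords DUMP_FILE: print "<binlog file> <position>" taken from the
# CHANGE MASTER TO line that mysqldump --master-data writes into the dump.
# Hypothetical helper; with --master-data=2 the line starts with "-- ".
dump_coords() {
    local dump=$1
    grep -m 1 -E '^(-- )?CHANGE MASTER TO ' "$dump" |
        sed -E "s/.*MASTER_LOG_FILE='([^']+)', *MASTER_LOG_POS=([0-9]+).*/\1 \2/"
}
```

For example, read -r log_file log_pos < <(dump_coords /opt/backup.sql) fills two variables that can be substituted into the CHANGE MASTER TO statement.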

2. Install the Zabbix agent on the master

Install the agent

wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt install -y zabbix-agent

Configure

sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
-e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
-e "2aStartAgents=3" \
-e "2aListenPort=10050" \
-e "/^Hostname=/c Hostname=10.0.0.42" \
/etc/zabbix/zabbix_agentd.conf
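[root@mysql-master ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
StartAgents=3 # number of pre-forked processes for passive checks; with 0 the agent listens on no port
ListenPort=10050 # listening port (default value)
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12 # IP of the Zabbix server or proxy
ServerActive=10.0.0.12 # IP of the Zabbix server or proxy for active checks
Hostname=10.0.0.42 # case-sensitive and unique on the Zabbix server; the host IP is used here
Include=/etc/zabbix/zabbix_agentd.d/*.conf # path of additional configuration files, added at the end of the file

Restart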

systemctl restart zabbix-agent

3. Install Percona

1) Run the Zabbix agent as root

# Edit the agent configuration file (AllowRoot=1 allows the agent to run as root)
sed -i -e "/^# AllowRoot=0/a AllowRoot=1" \
-e "/^# User=/a User=root" \
/etc/zabbix/zabbix_agentd.conf
# Edit the systemd unit file
sed -i -e "/^User=/c User=root" \
-e "/^Group=/c Group=root" \
/lib/systemd/system/zabbix-agent.service
# Restart the service
systemctl daemon-reload
systemctl restart zabbix-agent
# Check
[root@mysql-master ~]#ps -ef|grep zabbix
root 8754 1 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
root 8763 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
root 8764 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
root 8765 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
root 8766 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
root 8767 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 8775 8351 0 17:47 pts/2 00:00:00 grep --color=auto zabbix

2) Install Percona

Official documentation:

Download page: there is no package for this Ubuntu release, so the latest available version is used

# Download and install the Percona package
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/debian/artful/x86_64/percona-zabbix-templates_1.1.8-1.artful_all.deb
dpkg -i percona-zabbix-templates_1.1.8-1.artful_all.deb
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix/zabbix_agentd.d/
systemctl restart zabbix-agent.service
# Install PHP; note: percona is not compatible with php7.2
apt install -y php7.4 php7.4-mysql
# Create the MySQL credentials file
cat > /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf <<EOF
<?php
\$mysql_user = 'root';
\$mysql_pass = '';
EOF
# Test that the script returns data
[root@mysql-master opt]#/var/lib/zabbix/percona/scripts/get_mysql_stats_wrapper.sh gg
75

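3) Import the Percona template

This template can be used: https://files.cnblogs.com/files/blogs/744193/PerconaMySQLServer.xml?t=1657952009

4) Add the host

5) Link the template

6) Verify MySQL monitoring

The items in the Percona template collect data every five minutes by default. The script also checks whether the timestamp of the data file on the agent is older than five minutes. The script is located at

# Test from the Zabbix server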
[root@zabbix-server ~]#zabbix_get -s 10.0.0.42 -p 10050 -k MySQL.Key-read-requests
75

Monitoring MySQL with a custom script

1. Install the agent on mysql-slave

#! /bin/bash
zabbix_server=10.0.0.12
IP=`hostname -i|awk '{print $1}'`
cd /opt
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
sleep 1
apt install -y zabbix-agent
sleep 1
sed -i -e "/^Server=127.0.0.1/c Server=$zabbix_server" \
-e "/^ServerActive=127.0.0.1/c ServerActive=$zabbix_server" \
-e "2aStartAgents=3" \
-e "2aListenPort=10050" \
-e "/^Hostname=/c Hostname=$IP" \
-e "/^# AllowRoot=0/a AllowRoot=1" \
-e "/^# User=/a User=root" \
/etc/zabbix/zabbix_agentd.conf
sed -i -e "/^User=/c User=root" \
-e "/^Group=/c Group=root" \
/lib/systemd/system/zabbix-agent.service
systemctl daemon-reload
systemctl restart zabbix-agent
if systemctl is-active zabbix-agent.service;then
echo 'install succeeded!'
else
echo 'install failed!'
fi

2. Script content

#!/bin/bash
# Print the replication delay in seconds
Seconds_Behind_Master(){
NUM=$(mysql -uroot -e "show slave status\G"|grep "Seconds_Behind_Master:"|awk '{print $2}')
echo "$NUM"
}
# Print 50 when both replication threads are running, otherwise 100
master_slave_check(){
NUM1=$(mysql -uroot -e "show slave status\G"|grep "Slave_IO_Running:"|awk '{print $2}')
NUM2=$(mysql -uroot -e "show slave status\G"|grep "Slave_SQL_Running:"|awk '{print $2}')
if [[ $NUM1 == "Yes" && $NUM2 == "Yes" ]];then
echo 50
else
echo 100
fi
}
main(){
case $1 in
Seconds_Behind_Master)
Seconds_Behind_Master
;;
master_slave_check)
master_slave_check
;;
*)
# Unknown key: report usage instead of returning silently
echo "usage: $0 {Seconds_Behind_Master|master_slave_check}" >&2
return 1
;;
esac
}
main "$1"

Test

[root@mysql-slave ~]#./mysql_monitor.sh master_slave_check
50
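
The script above needs a live replica to be exercised. As a sketch of the same 50/100 convention that can be tested offline, the variant below reads captured show slave status\G output on stdin; replication_state is a hypothetical name and is not referenced by the UserParameter in this article.

```shell
#!/usr/bin/env bash
# replication_state: read `show slave status\G` output on stdin and print
# 50 when both replication threads report Yes, otherwise 100.
# Hypothetical helper mirroring master_slave_check from the script above.
replication_state() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { print ((io == "Yes" && sql == "Yes") ? 50 : 100) }
    '
}
```

On the slave it would be fed with mysql -uroot -e 'show slave status\G' | replication_state. Matching on the exact first field keeps Slave_SQL_Running_State from being picked up by accident.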

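3. Configure the custom item

# mysql_monitor.sh must be present at the path used below and be executable
#cat /etc/zabbix/zabbix_agentd.d/all.conf
UserParameter=mysql_monitor[*],/etc/zabbix/zabbix_agentd.d/mysql_monitor.sh "$1"
systemctl restart zabbix-agent
# Test from the Zabbix server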
[root@zabbix-server ~]#zabbix_get -s 10.0.0.52 -p 10050 -k "mysql_monitor[master_slave_check]"
50

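4. Custom template

  1. Create the template

    Configuration > Templates > Create template

  2. Add items

    Configuration > Templates, select the mysql_monitor template created earlier

    Fill in the item details

  3. Add triggers

  4. Add graphs

5. Link the host

  1. Add the host

  2. Link the template

  3. Verify the monitoring data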

V. Custom monitoring items with failure email notification

Automatic fault recovery (self-healing)

1. Enable remote commands on the agent

[root@mysql-slave ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
StartAgents=3
ListenPort=10050
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1 # allow remote commands to be executed
Server=10.0.0.12
ServerActive=10.0.0.12
Hostname=10.0.0.52
AllowRoot=1
User=root
UnsafeUserParameters=1 # allow unsafe arguments (special characters) when commands are executed remotely
Include=/etc/zabbix/zabbix_agentd.d/*.conf

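2. Grant sudo rights to the zabbix user on the agent

[root@mysql-slave ~]#vim /etc/sudoers
......
root ALL=(ALL) ALL
zabbix ALL=NOPASSWD:ALL # the zabbix user can run privileged commands (for example via sudo) without a password

Restart the service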

systemctl restart zabbix-agent

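3. Create an action

Add the action name and conditions

Add the operation commands

Review the created action

Email notification

163 mailbox configuration reference:

1. Enable SMTP on the mailbox

Log in to the mailbox, open Settings, and enable SMTP

Send the verification SMS

Obtain the authorization code

2. Create the media type

Administration > Media types > Create media type

3. Add the media to a user

Select the Admin user

Open Media and click Add

For the type, choose the media type created above; for the recipient, enter the address that should receive the messages

Update the media

4. Create the action

  • Add a send-email operation to the self-healing action

  • Add operations for when the problem occurs and after it recovers

    Email content sent when the problem occurs

    Email content for the recovery operation

    Final list of action operations

Verify the problem and recovery email notifications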

1. Stop replication on the MySQL slave

mysql -e "stop slave;"

Check the slave status

[root@mysql-slave ~]#mysql -e "show slave status\G;"
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.0.0.42
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysqld-bin.000002
Read_Master_Log_Pos: 387
Relay_Log_File: mysqld-relay-bin.000008
Relay_Log_Pos: 556
Relay_Master_Log_File: mysqld-bin.000002
Slave_IO_Running: No
Slave_SQL_Running: No

2. Zabbix automatically runs the recovery command and sends the notification email

3. Log in to the mailbox and check the alert email
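
To confirm from the command line that the recovery action really restarted replication, the slave status can be polled until both threads are running again. A minimal sketch: wait_until is a hypothetical generic helper, and the slave_running function in the comment reuses the show slave status command from this article.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds or TIMEOUT seconds
# have passed. Returns 0 on success, 1 on timeout. Hypothetical helper.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        (( SECONDS >= deadline )) && return 1
        sleep "$interval"
    done
}

# Example check for this setup (run on mysql-slave, shown as a comment):
#   slave_running() {
#       mysql -uroot -e 'show slave status\G' | grep -q 'Slave_SQL_Running: Yes'
#   }
#   wait_until 120 5 slave_running && echo "replication recovered"
```

The timeout should be longer than the item update interval plus the time the action needs to run, otherwise the check gives up before Zabbix has reacted.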


VI. Trying out Zabbix proxy for cross-subnet distributed monitoring

Host            IP
zabbix-server   10.0.0.51
zabbix-proxy    10.0.0.53
zabbix-agent1   10.0.0.54
zabbix-agent2   10.0.0.55

Install zabbix-server

See the Zabbix 5.0 installation part of section IV.

Install zabbix-proxy

Install the Zabbix repository

wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update

Install zabbix-proxy

apt install -y zabbix-proxy-mysql

Install the local database

apt install -y mysql-server
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mysql.conf.d/mysqld.cnf
systemctl restart mysql

Create the zabbix_proxy database and the zabbix user, and grant privileges

[root@zabbix-proxy ~]#mysql -uroot
mysql> create database zabbix_proxy character set utf8 collate utf8_bin;
mysql> create user zabbix@localhost identified by '123456';
mysql> grant all privileges on zabbix_proxy.* to zabbix@localhost;
mysql> quit;

Import the initial schema and data

zcat /usr/share/doc/zabbix-proxy-mysql/schema.sql.gz | mysql -uzabbix -p123456 zabbix_proxy

Edit the configuration file

sed -i "/# DBPassword=/aDBPassword=123456" /etc/zabbix/zabbix_proxy.conf

Restart the service

systemctl restart zabbix-proxy
systemctl enable zabbix-proxy

Install zabbix-agent

Install the agent

wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt install -y zabbix-agent

Edit the configuration

[root@zabbix-agent1 ~]#grep "^[A-Z]" /etc/zabbix/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.51,10.0.0.53 # addresses of the Zabbix server and the proxy
ServerActive=10.0.0.53 # Zabbix proxy address for active checks
Hostname=10.0.0.54
Include=/etc/zabbix/zabbix_agentd.d/*.conf

Restart the service

systemctl restart zabbix-agent.service

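Configure Zabbix proxy in active mode

Zabbix proxy configuration parameters

ProxyMode=0 # 0 = active, 1 = passive
Server=10.0.0.51 # address or hostname of the Zabbix server
Hostname=10.0.0.53 # proxy name; must match the proxy name entered when adding the proxy on the Zabbix server
ListenPort=10051 # port the Zabbix proxy listens on
LogFile=/tmp/zabbix_proxy.log # log file
EnableRemoteCommands=1 # allow the Zabbix server to run remote commands
DBHost=127.0.0.1 # database server address
DBName=zabbix_proxy # database name
DBUser=zabbix # database user
DBPassword=123456 # database password
DBPort=3306 # database port
ProxyLocalBuffer=720 # hours to keep data locally after it has been sent to the Zabbix server
ProxyOfflineBuffer=720 # hours to keep data that has not yet been sent to the Zabbix server
HeartbeatFrequency=60 # heartbeat interval in seconds, default 60, range 0-3600; not used in passive mode
ConfigFrequency=5 # interval in seconds for fetching the item configuration from the Zabbix server
DataSenderFrequency=5 # interval in seconds for sending data, default 1, range 1-3600; not used in passive mode
StartPollers=20 # number of poller processes to start
JavaGateway=172.31.0.104 # Java gateway address; required for Java monitoring, otherwise no data is collected
JavaGatewayPort=10052 # Java gateway port
StartJavaPollers=20 # number of Java poller processes to start
CacheSize=2G # maximum memory used for caching the item configuration
HistoryCacheSize=2G # maximum memory used for caching history data
HistoryIndexCacheSize=128M # size of the history index cache
Timeout=30 # item timeout in seconds
LogSlowQueries=3000 # log database queries that take longer than this many milliseconds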
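
Two of these parameters decide whether the proxy can register at all: ProxyMode and Hostname. The sketch below prints both from a configuration file so they can be compared with what is entered in the frontend. proxy_summary is a hypothetical helper; it assumes plain Key=Value lines and treats a missing ProxyMode as 0, the default.

```shell
#!/usr/bin/env bash
# proxy_summary CONF_FILE: print the proxy mode and name from zabbix_proxy.conf.
# Hypothetical helper; if a key appears more than once, the last uncommented
# value is shown (an assumption of this sketch).
proxy_summary() {
    local file=$1 mode name
    mode=$(sed -n 's/^ProxyMode=//p' "$file" | tail -n 1)
    name=$(sed -n 's/^Hostname=//p' "$file" | tail -n 1)
    case "${mode:-0}" in   # ProxyMode defaults to 0 when it is not set
        0) mode=active ;;
        1) mode=passive ;;
        *) echo "unexpected ProxyMode: $mode" >&2; return 1 ;;
    esac
    echo "mode=$mode hostname=$name"
}
```

With the configuration used in this article, proxy_summary /etc/zabbix/zabbix_proxy.conf is expected to report an active proxy named 10.0.0.53, which is the name to enter when adding the proxy in the web interface.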

Configure the Zabbix proxy

[root@zabbix-proxy ~]#grep '^[A-Z]' /etc/zabbix/zabbix_proxy.conf
ProxyMode=0
Server=10.0.0.51
Hostname=10.0.0.53
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_proxy.log
LogFileSize=0
EnableRemoteCommands=1
PidFile=/run/zabbix/zabbix_proxy.pid
SocketDir=/run/zabbix
DBHost=127.0.0.1
DBName=zabbix_proxy
DBUser=zabbix
DBPassword=123456
DBPort=3306
ProxyLocalBuffer=720
ProxyOfflineBuffer=720
HeartbeatFrequency=60
ConfigFrequency=5
DataSenderFrequency=5
StartPollers=20
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
CacheSize=256M
HistoryCacheSize=256M
HistoryIndexCacheSize=128M
Timeout=30
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1

Restart the Zabbix proxy

[root@zabbix-proxy ~]#systemctl restart zabbix-proxy.service

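Add the active proxy in the web interface

Go to Administration > Proxies and add the proxy name.

Note: the name must match Hostname in the proxy configuration

When adding a host, set it to be monitored by the active proxy

Verify the status

Verify the current host status

Verify the host monitoring data and graphs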

posted @ areke