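Redis and Zabbix
I. Configuring Redis's two persistence modes
Redis offers two persistence mechanisms: RDB (Redis DataBase) and AOF (Append Only File). For a comparison of their advantages and disadvantages, see section 2 of https://www.cnblogs.com/areke/p/16482870.html.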
1. RDB
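An RDB snapshot can be produced in three ways:
- save: synchronous; blocks all other commands until the snapshot is written
- bgsave: asynchronous; runs in the background without blocking other commands
- automatic: Redis saves in the background whenever a configured rule is met
1) Manual backup with save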
redis-cli -h 127.0.0.1 -a $PASSWORD --no-auth-warning save
[root@redis ~]#redis-cli -h 10.0.0.53 --no-auth-warning save
OK
Check the backup file
[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 993 Sep 4 20:26 /var/lib/redis/dump.rdb
2) Manual backup with bgsave
redis-cli -h 127.0.0.1 -a $PASSWORD --no-auth-warning bgsave
[root@redis ~]#redis-cli -h 10.0.0.53 --no-auth-warning bgsave
Background saving started
Check the backup file
[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 993 Sep 4 20:30 dump.rdb
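3) Automatic Redis backups
Configure redis.conf (redis.conf does not accept a comment on the same line as a directive, so each comment sits on its own line)
# Snapshot automatically when at least 1 key has changed within 900 seconds; several rules may be listed
save 900 1
save 300 10
save 60 3
dbfilename dump.rdb
dir /var/lib/redis
# If a snapshot cannot be saved (for example because the disk is full), reject writes; applies only to automatic saves
stop-writes-on-bgsave-error yes
# Compress data when writing the RDB file
rdbcompression yes
# Add a CRC64 checksum to the RDB file
rdbchecksum yes
Verification
# Modify 3 keys within 60 seconds and check that the RDB file is regenerated
10.0.0.53:6379> mset key1 v1 key2 v2 key3 v3
[root@redis ~]#ls -l /var/lib/redis/
total 4
-rw-rw---- 1 redis redis 1059 Sep 4 20:45 dump.rdb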
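Each save rule reads as "snapshot once the interval has passed since the last save and at least that many keys have changed". The sketch below models that check in plain shell. The function name rdb_save_due and its "seconds:changes" argument layout are hypothetical, chosen only for illustration, and the comparison is a simplified model rather than the server's exact logic.

```shell
# rdb_save_due ELAPSED_SECONDS CHANGED_KEYS RULE...
# Each RULE is "seconds:changes", mirroring one "save <seconds> <changes>" line.
# Returns 0 (true) when at least one rule is satisfied, 1 otherwise.
rdb_save_due() {
  elapsed=$1
  changed=$2
  shift 2
  for rule in "$@"; do
    seconds=${rule%%:*}
    changes=${rule##*:}
    if [ "$elapsed" -ge "$seconds" ] && [ "$changed" -ge "$changes" ]; then
      return 0
    fi
  done
  return 1
}

# Rules from the configuration above: save 900 1, save 300 10, save 60 3
if rdb_save_due 60 3 900:1 300:10 60:3; then
  echo "snapshot due"
else
  echo "no snapshot yet"
fi
```

With 3 changed keys, only the 60-second rule can fire before the 900-second one, which is why the verification step changes exactly 3 keys.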
2. AOF
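Configure redis.conf
# RDB settings
# Snapshot automatically when at least 1 key has changed within 900 seconds; several rules may be listed
save 900 1
save 300 10
save 60 3
# RDB file name
dbfilename dump.rdb
# Directory where data files are stored
dir /var/lib/redis
# If a snapshot cannot be saved (for example because the disk is full), reject writes; applies only to automatic saves
stop-writes-on-bgsave-error yes
# Compress data when writing the RDB file
rdbcompression yes
# Add a CRC64 checksum to the RDB file
rdbchecksum yes
# AOF settings
# Enable AOF
appendonly yes
# AOF file name
appendfilename "appendonly-6379.aof"
# fsync once per second, so at most about 1 s of writes is lost if Redis crashes
appendfsync everysec
# During an AOF rewrite, skip fsync for newly appended records to reduce disk I/O contention and request blocking
no-appendfsync-on-rewrite yes
# Rewrite the AOF once it has grown by this percentage; rewriting keeps the file as small as possible while still preserving the full dataset
auto-aof-rewrite-percentage 100
# Minimum AOF size before an automatic rewrite is triggered
auto-aof-rewrite-min-size 64mb
# Load an AOF whose tail is damaged (for example after the process was killed or power was lost)
aof-load-truncated yes
Verification
# Add a record
10.0.0.53:6379> CONFIG GET appendonly
1) "appendonly"
2) "yes"
10.0.0.53:6379> set kkk1 vvv1
OK
# Check the AOF file
[root@redis redis]#ls -l /var/lib/redis/
total 8
-rw-r----- 1 redis redis 56 Sep 4 22:05 appendonly-6379.aof
-rw-rw---- 1 redis redis 1086 Sep 4 22:04 dump.rdb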
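The 56-byte AOF in the listing can be accounted for by hand. The AOF stores commands in the Redis protocol (RESP) text format, so the sketch below encodes two commands that way and counts the bytes. It assumes the file holds exactly a `SELECT 0` followed by the `set kkk1 vvv1` issued above, and that all arguments are ASCII; resp_encode is a hypothetical helper, not a Redis tool.

```shell
# resp_encode ARG...: print one command in RESP form:
# "*<argument count>", then "$<length>" and the value for each argument,
# every line terminated by CRLF.
resp_encode() {
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    printf '$%d\r\n%s\r\n' "${#arg}" "$arg"
  done
}

# Database selection first, then the write itself.
aof_bytes=$( { resp_encode SELECT 0; resp_encode set kkk1 vvv1; } | wc -c | tr -d ' ' )
echo "$aof_bytes"
```

Under those assumptions the count comes to 56, matching the size of appendonly-6379.aof in the listing. Because the file is plain text at this point, `cat -A` on it shows the same structure.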
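Enabling AOF for the first time
Note: AOF is disabled by default. If you enable it in redis.conf and restart, Redis loads the AOF in preference to the RDB file at startup. Because no AOF file exists yet, the instance comes up empty and all existing data is lost. The procedure below avoids this.
# Stop Redis
systemctl stop redis
# Disable AOF
vim /etc/redis/redis.conf
...
appendonly no
# Restore the RDB backup
cp dump.rdb-bak dump.rdb
# Start Redis and confirm the data is back
systemctl start redis
# Turn AOF on at runtime (not persisted to redis.conf). Redis writes the in-memory dataset to the AOF file, so the AOF and RDB files now hold the same data
127.0.0.1:6379> CONFIG set appendonly yes
[root@redis redis]#ls -l
total 8
-rw-rw---- 1 redis redis 1086 Sep 4 22:19 appendonly-6379.aof
-rw-rw---- 1 redis redis 1086 Sep 4 22:15 dump.rdb
# Enable AOF permanently in the configuration file
vim /etc/redis/redis.conf
...
appendonly yes
# Restart Redis
systemctl restart redis
# Check the data
[root@redis ~]#redis-cli -h 10.0.0.53 dbsize
(integer) 110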
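Main Redis configuration options
# Listening address; several IPs may be listed, separated by spaces
bind 0.0.0.0
# Added in Redis 3.2. When neither a bind address nor a password is configured, Redis serves only local clients on 127.0.0.1:6379;
# remote clients can connect, but they receive an error and their commands are refused
protected-mode yes
# Listening port, 6379/tcp by default
port 6379
# Length of the queue of fully established connections (those that have completed the three-way handshake)
tcp-backlog 511
# Idle timeout for client connections in seconds; the default 0 means never time out
timeout 0
# TCP keepalive interval, 300 s
tcp-keepalive 300
# Default no: redis-server runs in the foreground. Set to yes to run as a daemon, in which case
# Redis writes its PID to /var/run/redis.pid
daemonize no
# OS-related: lets upstart or systemd supervise the Redis daemon (CentOS 7 and later use systemd)
supervised no
# PID file path; may be changed, for example to /apps/redis/run/redis_6379.pid
pidfile /var/run/redis_6379.pid
# Log level
loglevel notice
# Log file path, for example logfile "/apps/redis/log/redis_6379.log"
logfile "/path/redis.log"
# Number of databases; the default 16 gives databases 0-15
databases 16
# Whether to show the Redis logo at startup or record it in the log
always-show-logo yes
# Snapshot if at least 1 key changed within 900 seconds
save 900 1
# Snapshot if at least 10 keys changed within 300 seconds
save 300 10
# Snapshot if at least 10000 keys changed within 60 seconds
save 60 10000
# Default yes: if a snapshot cannot be saved (for example because the disk is full), Redis rejects writes; no is suggested for production.
# Applies only to the automatic saves configured in this file
stop-writes-on-bgsave-error yes
# Whether to compress the RDB file: yes compresses, no does not
rdbcompression yes
# Whether to add a CRC64 checksum to the RDB file; enabled by default
rdbchecksum yes
# Snapshot file name
dbfilename dump.rdb
# Directory where the snapshot is saved, for example dir "/apps/redis/data"
dir ./
# Replication
# Address and port of the master to replicate from; the directive was called slaveof before 5.0
# replicaof <masterip> <masterport>
# Password of the master
# masterauth <master-password>
# Behaviour of a replica that has lost its link to the master or is still synchronising:
# 1. yes (default, recommended): the replica keeps answering client reads
# 2. no: every request except a few specific commands returns the error "SYNC with master in progress"
replica-serve-stale-data yes
# Whether replicas are read-only; yes is recommended, otherwise data written on a replica may be overwritten when the master synchronises it
replica-read-only yes
# Whether to replicate over a socket (diskless sync). A new replica needs a full synchronisation when it first connects,
# so the master dumps a new RDB from memory and transfers it to the replica. There are two ways to transfer it:
# 1. Disk-backed (no, the default): the master forks a child that writes the RDB to disk; once it finishes, the parent (main) process sends the file to the replicas
# 2. Diskless (yes): the master forks a child that writes the RDB straight to the replicas' sockets, bypassing the disk
# Disk-backed is usually preferred because a finished RDB file can be sent to several replicas at once, whereas with diskless sync a replica that arrives after a transfer has started must wait for the next one.
# Use diskless (yes) only when disk I/O is slow and the network is fast
repl-diskless-sync no
# With diskless sync, how long the master waits before starting the transfer; 0 disables the wait. Replicas that arrive within the delay are served by the same transfer, but once
# the transfer has started the master cannot serve newly arriving replicas, which queue for the next RDB transfer. The delay therefore
# gives more replicas time to arrive. Suggested value: 30-60
repl-diskless-sync-delay 5
# Interval in seconds of the periodic replication PING used to monitor the link; default 10 s
repl-ping-replica-period 10
# Replication timeout; must be larger than repl-ping-replica-period, otherwise timeouts are reported frequently
repl-timeout 60
# Whether to disable TCP_NODELAY on the replica socket after SYNC. With yes, Redis merges small writes into larger packets and sends fewer
# packets to replicas, at the cost of a delay before data reaches the replica (up to 40 ms with the default Linux kernel configuration). With no,
# data reaches the replica sooner but uses more bandwidth
repl-disable-tcp-nodelay no
# Size of the replication backlog. The backlog accumulates replication data while a replica is disconnected, so a replica that reconnects usually does not need a full resynchronisation;
# it only receives the part it missed. The memory is allocated only once at least one replica has connected. Increase this value when setting up replication,
# or set it up during off-peak hours; otherwise synchronisation to the replica may fail
repl-backlog-size 512mb
# How long the master waits with no replicas connected before it frees the backlog
repl-backlog-ttl 3600
# When the master is unavailable, Sentinel promotes a replica based on priority: the replica with the lowest value is preferred, and a replica set to 0 is never
# promoted. Replicas are usually given the same value so that the choice is automatic
replica-priority 100
# The master accepts writes only while at least 3 replicas are connected
#min-replicas-to-write 3
# The lag of those replicas must not exceed 10 seconds; otherwise the master also stops accepting writes
#min-replicas-max-lag 10
# Connection password; clients must then send AUTH <password>. Quote it with " " if it contains special characters. Recommended in production
requirepass foobared
# rename-command renames dangerous commands.
# Example that disables a command: rename-command FLUSHALL ""
# Example that renames a command: rename-command del magedu
# Maximum number of client connections
maxclients 10000
# Maximum memory in bytes; 0 means unlimited. Half of physical RAM is suggested; 8 GiB expressed in bytes is 8 * 1024 * 1024 * 1024.
# Note that buffers are not counted towards maxmemory. Leaving this unset in production can lead to an OOM
maxmemory <bytes>
# Whether to enable AOF logging. By default Redis persists with RDB, which is enough for many applications, but a crash can lose
# several minutes of data, depending on the save rules. Append Only File is another persistence mode with better
# durability: Redis appends each write it receives to appendonly.aof, and at startup it loads that file
# and ignores the RDB file. Disabled by default
appendonly no
# Name of the AOF text file, stored in the directory given by dir
appendfilename "appendonly.aof"
# AOF fsync policy:
# no: leave flushing to the operating system; Linux typically flushes within about 30 s, so up to 30 s of data can be lost
# always: fsync on every write; safest, but slowest
# everysec: fsync once per second, so up to 1 s of data can be lost; this is the default and the suggested production value
appendfsync everysec
# bgrewriteaof and the main process's AOF writes both hit the disk, and bgrewriteaof involves heavy disk I/O, so the main process can block while writing the AOF.
# The following option controls this.
# During an AOF rewrite, whether to skip fsync for newly appended records, to reduce disk I/O and request blocking.
# no (default): do not skip; new records are still synced to disk under the normal policy. Safest, with no extra data loss, but requests may block
# yes: equivalent to appendfsync no during the rewrite; data is only written to the buffer, so there is no disk contention and no blocking,
# but if Redis crashes at that point data is lost, up to about 30 s worth with the typical Linux flush interval. Because yes performs better and
# avoids blocking, it is often recommended
no-appendfsync-on-rewrite no
# A rewrite compacts the AOF and reclaims space, which shortens recovery time.
# Rewrite the AOF once it has grown by this percentage; 0 disables automatic rewrites. Rewriting keeps the file as small as possible
# while still preserving the full dataset
auto-aof-rewrite-percentage 100
# Minimum AOF size that triggers a rewrite
auto-aof-rewrite-min-size 64mb
# Whether to load an AOF whose tail is damaged (for example after the process was killed or power was lost); yes is suggested
aof-load-truncated yes
# Redis 4.0 added a hybrid RDB-AOF persistence format. When enabled, the file produced by an AOF rewrite contains both RDB-format and AOF-format content:
# the RDB part records the existing data and the AOF part records recent changes, so Redis gets the advantages of both RDB
# and AOF (rewrites are fast, and data loads quickly after a failure). The default was no when the option was introduced
aof-use-rdb-preamble no
# Maximum Lua script execution time in milliseconds
lua-time-limit 5000
# Whether to enable cluster mode; disabled (standalone) by default
cluster-enabled yes
# Name of the cluster configuration file, generated automatically by the node
cluster-config-file nodes-6379.conf
# Node timeout in milliseconds; a node unreachable for longer than this is considered failed
cluster-node-timeout 15000
# A multiplier, not a duration. During failover, a replica that has been disconnected from its master for too long holds stale data and is not suitable for promotion.
# A replica is excluded from failover when its disconnection time exceeds (node-timeout * replica-validity-factor) + repl-ping-replica-period
cluster-replica-validity-factor 10
# Migration barrier: a master keeps at least 1 working replica; only its surplus replicas may migrate to a master whose own replicas have failed
cluster-migration-barrier 1
# Require full slot coverage. If a master fails and has no replica, some slots are left uncovered. With yes, the cluster then stops serving requests
# (setting a key returns "CLUSTERDOWN The cluster is down" and cluster_state is fail, although PING still returns PONG). With no, the cluster keeps serving,
# but lookups for keys in the missing slots fail. no is suggested for production
cluster-require-full-coverage yes
# With yes, this replica never attempts to fail over its master when the master fails; a manual forced failover is still possible. Usually no
cluster-replica-no-failover no
# The slow log records commands whose execution time exceeds a threshold. Execution time excludes I/O such as talking to the client or sending the reply; it is only the time needed to run the command (during which the thread is blocked and cannot
# serve other requests). The slow log is kept in memory, so it is very fast and can be left enabled without slowing Redis down
# Threshold in microseconds; a negative value disables the slow log and 0 records every command. The default 10000 is 10 ms. A command normally completes within microseconds;
# 1 ms to 10 ms is suggested for production
slowlog-log-slower-than 10000
# Maximum number of slow log entries. When the queue is full, recording a new entry removes the oldest one (first in, first out).
# The default 128 is small; 1000 or more is suggested for production
slowlog-max-len 128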
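Two settings in the list above involve simple arithmetic: maxmemory (half of physical RAM, in bytes) and the replica validity window. A minimal shell sketch of both follows. The function names are hypothetical, and the 8 GiB, 15000 ms, 10 and 10 s inputs are the example values from the list.

```shell
# gib_to_bytes GIB: 1 GiB = 1024 * 1024 * 1024 bytes
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# half_ram_maxmemory RAM_GIB: half of physical RAM, in bytes
half_ram_maxmemory() {
  echo $(( $(gib_to_bytes "$1") / 2 ))
}

# replica_validity_window_s NODE_TIMEOUT_MS FACTOR PING_PERIOD_S
# (node-timeout * replica-validity-factor) + repl-ping-replica-period, in seconds
replica_validity_window_s() {
  echo $(( $1 * $2 / 1000 + $3 ))
}

echo "maxmemory $(half_ram_maxmemory 8)"
echo "validity window: $(replica_validity_window_s 15000 10 10) s"
```

For an 8 GiB host this yields maxmemory 4294967296. With the values shown, a replica disconnected from its master for more than 160 s would not be eligible for promotion.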
II. Configuring Redis master-replica replication
Master node (10.0.0.53)
- Install Redis
apt install -y redis
- Edit redis.conf
vim /etc/redis/redis.conf
...
bind 0.0.0.0
# Password the replica uses to connect to the master
masterauth "123456"
# Password clients must supply
requirepass "123456"
...
- Restart Redis
systemctl restart redis
Replica node (10.0.0.63)
- Install Redis
apt install -y redis
- Edit redis.conf
vim /etc/redis/redis.conf
...
bind 0.0.0.0
# Password the replica uses to connect to the master
masterauth "123456"
# Password clients must supply
requirepass "123456"
# IP and port of the master node
replicaof 10.0.0.53 6379
...
- Restart Redis
systemctl restart redis
Verifying replication
Check the master status
[root@master ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.63,port=6379,state=online,offset=252,lag=0
master_replid:a6d36cfca1cb45585f0697ea933a74ae49b530bf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:252
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:252
Check the replica status
[root@slave ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.53
master_port:6379
master_link_status:up
master_last_io_seconds_ago:11
master_sync_in_progress:0
slave_repl_offset:224
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a6d36cfca1cb45585f0697ea933a74ae49b530bf
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:224
Write to the master and verify that the replica receives the change
# Add a record on the master
[root@master ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set k1 v1
OK
# The record has been replicated to the replica
[root@slave ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get k1
"v1"
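For scripted checks, the slaveN line of the master's `info replication` output can be reduced to the three values that matter: replica address, replication offset and lag. The sketch below does this with sed on a sample copied from the master output above; replica_offsets is a hypothetical helper name.

```shell
# replica_offsets: read "info replication" output from a master on stdin and
# print "<ip>:<port> <offset> <lag>" for each connected replica.
replica_offsets() {
  tr -d '\r' | sed -n 's/^slave[0-9]*:ip=\([^,]*\),port=\([0-9]*\),state=[^,]*,offset=\([0-9]*\),lag=\([0-9]*\)$/\1:\2 \3 \4/p'
}

# Sample lines taken from the master output above
printf '%s\n' \
  'role:master' \
  'connected_slaves:1' \
  'slave0:ip=10.0.0.63,port=6379,state=online,offset=252,lag=0' \
  'master_repl_offset:252' | replica_offsets
```

Against a live master the same function could be fed with `redis-cli -a 123456 --no-auth-warning info replication | replica_offsets`. A replica whose offset equals master_repl_offset is fully caught up.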
III. Configuring Redis Cluster
Architecture
# Cluster nodes
Redis-node1: 10.0.0.51
Redis-node2: 10.0.0.52
Redis-node3: 10.0.0.53
Redis-node4: 10.0.0.54
Redis-node5: 10.0.0.55
Redis-node6: 10.0.0.56
# Reserved nodes
Redis-node7: 10.0.0.57
Redis-node8: 10.0.0.58
1. Install Redis
apt install -y redis
2. Edit the Redis configuration
vim /etc/redis/redis.conf
...
bind 0.0.0.0
masterauth 123456
requirepass 123456
# Uncomment this line to enable cluster mode; the redis process then shows "cluster" in its title
cluster-enabled yes
# Uncomment this line. The cluster state file records master/replica relationships and slot ranges; Redis Cluster creates and maintains it automatically
cluster-config-file nodes-6379.conf
# Default yes; setting no prevents one unavailable node from making the whole cluster unavailable
cluster-require-full-coverage no
...
Or make the same changes from the command line
sed -i 's/^bind 127.0.0.1 ::1/bind 0.0.0.0/' /etc/redis/redis.conf
echo -e "masterauth 123456\nrequirepass 123456\ncluster-enabled yes\ncluster-config-file nodes-6379.conf\ncluster-require-full-coverage no" >> /etc/redis/redis.conf
systemctl restart redis
3. Create the cluster
Run on any one node
redis-cli -a 123456 --cluster create 10.0.0.51:6379 10.0.0.52:6379 10.0.0.53:6379 10.0.0.54:6379 \
    10.0.0.55:6379 10.0.0.56:6379 --cluster-replicas 1
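With six nodes and --cluster-replicas 1 there are three masters, and the 16384 hash slots are split among them as evenly as possible. The sketch below reproduces the ranges reported for this cluster (0-5460, 5461-10922, 10923-16383). The rounding rule is an assumption chosen to match that output and is not taken from the redis-cli source; slot_ranges is a hypothetical helper.

```shell
# slot_ranges MASTERS: print "Master[i] -> Slots first - last" for an even
# split of the 16384 hash slots, rounding each range end to the nearest integer.
slot_ranges() {
  masters=$1
  total=16384
  first=0
  i=0
  while [ "$i" -lt "$masters" ]; do
    if [ "$i" -eq $((masters - 1)) ]; then
      # The last master always ends at the final slot
      last=$((total - 1))
    else
      # round((i + 1) * total / masters) - 1, using integer arithmetic only
      last=$(( ((i + 1) * total * 2 + masters) / (2 * masters) - 1 ))
    fi
    echo "Master[$i] -> Slots $first - $last"
    first=$((last + 1))
    i=$((i + 1))
  done
}

slot_ranges 3
```

Because 16384 is not divisible by 3, one master (here the second) ends up with 5462 slots and the other two with 5461.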
Creation process; type yes when prompted and the cluster is created automatically
[root@node1 ~]#redis-cli -a 123456 --cluster create 10.0.0.51:6379 10.0.0.52:6379 10.0.0.53:6379 10.0.0.54:6379 \
> 10.0.0.55:6379 10.0.0.56:6379 --cluster-replicas 1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.55:6379 to 10.0.0.51:6379    # master 10.0.0.51, replica 10.0.0.55
Adding replica 10.0.0.56:6379 to 10.0.0.52:6379    # master 10.0.0.52, replica 10.0.0.56
Adding replica 10.0.0.54:6379 to 10.0.0.53:6379    # master 10.0.0.53, replica 10.0.0.54
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379    # ID of this master node
   slots:[0-5460] (5461 slots) master    # first and last slot held by this master
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
   slots:[5461-10922] (5462 slots) master
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
   slots:[10923-16383] (5461 slots) master
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379    # ID of this replica node
   replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
   replicates e07fc57be51d9aaf69822061010425e30e36a428
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
   replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
   slots: (0 slots) slave
   replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
   slots: (0 slots) slave
   replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
   slots: (0 slots) slave
   replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
4. Verify the cluster state
Check the cluster state
[root@node1 ~]#redis-cli -a 123456 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6    # 6 nodes
cluster_size:3    # 3 master groups
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:4014
cluster_stats_messages_pong_sent:4179
cluster_stats_messages_sent:8193
cluster_stats_messages_ping_received:4174
cluster_stats_messages_pong_received:4014
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:8193
# Check the cluster state through any node
# --no-auth-warning suppresses the password warning
[root@node1 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379 --no-auth-warning
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.53:6379 (211bf9e4...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 0 keys | 5462 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
List all cluster nodes and the master/replica relationships
[root@node1 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 myself,master - 0 1663077859000 1 connected 0-5460
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663077862173 6 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663077861147 4 connected
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663077861000 3 connected 10923-16383
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663077860122 5 connected
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663077861000 2 connected 5461-10922
5. Verify writes to the cluster
# Connect in cluster mode with -c; any node in the cluster will do
[root@node1 ~]#redis-cli -a 123456 -h 10.0.0.51 -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379> set key1 v1
-> Redirected to slot [9189] located at 10.0.0.52:6379    # the client is redirected to 10.0.0.52 automatically
OK
10.0.0.52:6379> get key1
"v1"
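The redirect to slot 9189 follows from how Redis Cluster places keys: the slot is CRC16(key) mod 16384, using the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0). The sketch below computes it bit by bit in shell. It assumes an ASCII key and does not handle hash tags (the `{...}` syntax); key_slot is a hypothetical helper name.

```shell
# key_slot KEY: print the Redis Cluster hash slot for an ASCII key
# (CRC16/XMODEM of the key, modulo 16384). Hash tags are not handled.
key_slot() {
  rest=$1
  crc=0
  while [ -n "$rest" ]; do
    ch=${rest%"${rest#?}"}          # first character of the remaining key
    rest=${rest#?}                  # drop that character
    byte=$(printf '%d' "'$ch")      # numeric code of the character
    crc=$(( crc ^ (byte << 8) ))
    bit=0
    while [ "$bit" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      bit=$((bit + 1))
    done
  done
  echo $(( crc % 16384 ))
}

key_slot key1
```

Slot 9189 lies in the range 5461-10922, which 10.0.0.52 holds, hence the redirect. On a live cluster, `cluster keyslot key1` returns the same number.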
6. Scaling out the cluster
Add one master and one replica to the cluster
1) Install Redis
# Install Redis
apt install -y redis
# Configure cluster mode
sed -i 's/^bind 127.0.0.1 ::1/bind 0.0.0.0/' /etc/redis/redis.conf
echo -e "masterauth 123456\nrequirepass 123456\ncluster-enabled yes\ncluster-config-file nodes-6379.conf\ncluster-require-full-coverage no" >> /etc/redis/redis.conf
systemctl restart redis
2) Add node7 to the cluster
A node added this way joins as a master by default
redis-cli -a 123456 --cluster add-node 10.0.0.57:6379 10.0.0.51:6379
# Output of the add-node run
[root@node7 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.57:6379 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.57:6379 to cluster 10.0.0.51:6379
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
   slots: (0 slots) slave
   replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
   slots: (0 slots) slave
   replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
   slots: (0 slots) slave
   replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.0.57:6379 to make it join the cluster.
[OK] New node added correctly.    # the new node has joined the cluster
# The node has joined as a master, but it holds no slots and has no replica
[root@node7 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 0 slots | 0 slaves.    # no slots, no replica
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 5461 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 5462 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
# No slots, no replica
[root@node7 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663084485001 3 connected 10923-16383
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 myself,master - 0 1663084486000 0 connected
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663084485000 1 connected 0-5460
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663084487049 2 connected 5461-10922
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663084487000 1 connected
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663084488074 2 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663084486026 3 connected
3) Reassign slots
After a node has been added, slots must be resharded onto it; a master without slots cannot store any data.
redis-cli -a 123456 --cluster reshard 10.0.0.51:6379
...
How many slots do you want to move (from 1 to 16384)?4096    # slots for the new master = 16384 / number of masters (here 16384 / 4)
What is the receiving node ID? 92faaa7d5ac5f6d978b3dd2f7028323b46b02706    # ID of the new master that receives the slots
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1: all    # which masters give up slots; all takes slots from every existing master automatically.
# When removing a node from the cluster, name that node as the source instead to move all of its slots to other masters
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes    # confirm the plan
# Confirm that the slots were reassigned
[root@node7 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 4096 slots | 0 slaves.
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 4096 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
# Cluster node information
[root@node7 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663085974000 3 connected 12288-16383
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 myself,master - 0 1663085974000 7 connected 0-1364 5461-6826 10923-12287
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663085975808 1 connected 1365-5460
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663085974788 2 connected 6827-10922
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663085975000 1 connected
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663085976829 2 connected
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663085975000 3 connected
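The 4096 entered at the reshard prompt is 16384 divided by the new number of masters, and the slot ranges node7 received show how much each existing master gave up. The sketch below works that out. It assumes the goal is an exactly even split, which is what the output above ends up with; it is not a reimplementation of redis-cli's planning logic, and reshard_plan is a hypothetical helper.

```shell
# reshard_plan NEW_MASTER_COUNT SLOTS...
# SLOTS are the slot counts currently held by the existing masters.
# Prints the per-master target and how many slots each existing master gives up.
reshard_plan() {
  target=$(( 16384 / $1 ))
  shift
  echo "target per master: $target"
  for held in "$@"; do
    echo "holds $held, gives up $(( held - target ))"
  done
}

# Three existing masters with 5461, 5462 and 5461 slots; four masters afterwards
reshard_plan 4 5461 5462 5461
```

The three amounts given up (1365, 1366 and 1365) add up to 4096 and correspond to the ranges 0-1364, 5461-6826 and 10923-12287 now held by 10.0.0.57.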
4) Add node8 to the cluster
node8 joins as the replica of node7
# 10.0.0.51 can be the IP of any node in the cluster; the ID is that of the node7 master
redis-cli -a 123456 --cluster add-node 10.0.0.58:6379 10.0.0.51:6379 \
    --cluster-slave --cluster-master-id 92faaa7d5ac5f6d978b3dd2f7028323b46b02706
# Output of the run
[root@node8 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.58:6379 10.0.0.51:6379 \
> --cluster-slave --cluster-master-id 92faaa7d5ac5f6d978b3dd2f7028323b46b02706
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.58:6379 to cluster 10.0.0.51:6379
>>> Performing Cluster Check (using node 10.0.0.51:6379)
M: e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
S: b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379
   slots: (0 slots) slave
   replicates 89ff87bbe7433f7e7bd59eb07733b901efdc99e2
M: 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
S: b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379
   slots: (0 slots) slave
   replicates 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec
M: 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379
   slots: (0 slots) slave
   replicates e07fc57be51d9aaf69822061010425e30e36a428
M: 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.0.58:6379 to make it join the cluster.
Waiting for the cluster to join
>>> Configure node as replica of 10.0.0.57:6379.
[OK] New node added correctly.
5) Verify the result
# List the cluster nodes
[root@node8 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663086607000 2 connected
53ec6239ce484104709a51f70578d31db5453983 10.0.0.58:6379@16379 myself,slave 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 0 1663086609000 0 connected
b1f953fecfa8b732723e1f4d3f2530a19c3645fa 10.0.0.55:6379@16379 slave e07fc57be51d9aaf69822061010425e30e36a428 0 1663086609177 1 connected
89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663086608164 2 connected 6827-10922
e07fc57be51d9aaf69822061010425e30e36a428 10.0.0.51:6379@16379 master - 0 1663086611228 1 connected 1365-5460
92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 master - 0 1663086609000 7 connected 0-1364 5461-6826 10923-12287
211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 10.0.0.53:6379@16379 master - 0 1663086610000 3 connected 12288-16383
b335167bf0ee480271840974737d2863a0fa053f 10.0.0.54:6379@16379 slave 211bf9e40bab0a7c1d66f2d608178ef39cb7eeec 0 1663086610200 3 connected
# Check the cluster summary; each master now holds 4096 slots
[root@node8 ~]#redis-cli -a 123456 --cluster info 10.0.0.51:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.51:6379 (e07fc57b...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.57:6379 (92faaa7d...) -> 0 keys | 4096 slots | 1 slaves.
10.0.0.53:6379 (211bf9e4...) -> 1 keys | 4096 slots | 1 slaves.
10.0.0.52:6379 (89ff87bb...) -> 1 keys | 4096 slots | 1 slaves.
[OK] 2 keys in 4 masters.
0.00 keys per slot on average.
[root@node8 ~]#redis-cli -a 123456 -h 10.0.0.51 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8    # 8 nodes
cluster_size:4    # 4 master/replica groups
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:12776
cluster_stats_messages_pong_sent:13125
cluster_stats_messages_update_sent:5
cluster_stats_messages_sent:25906
cluster_stats_messages_ping_received:13118
cluster_stats_messages_pong_received:12776
cluster_stats_messages_meet_received:7
cluster_stats_messages_received:25901
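For a quick scripted check after scaling out, the `cluster nodes` output can be summarised by role. The sketch below counts masters and replicas from the flags in the third field, using four lines copied from the listing above as sample input; count_roles is a hypothetical helper.

```shell
# count_roles: read "cluster nodes" output on stdin and count masters and replicas.
# The third field holds comma-separated flags such as "myself,slave" or "master".
count_roles() {
  awk '
    $3 ~ /(^|,)master(,|$)/ { masters++ }
    $3 ~ /(^|,)slave(,|$)/  { replicas++ }
    END { printf "masters=%d replicas=%d\n", masters, replicas }
  '
}

# Four sample lines from the listing above (two replicas, two masters)
printf '%s\n' \
  'b06418548b244ac03a62d148e9150ced2fb6153d 10.0.0.56:6379@16379 slave 89ff87bbe7433f7e7bd59eb07733b901efdc99e2 0 1663086607000 2 connected' \
  '53ec6239ce484104709a51f70578d31db5453983 10.0.0.58:6379@16379 myself,slave 92faaa7d5ac5f6d978b3dd2f7028323b46b02706 0 1663086609000 0 connected' \
  '89ff87bbe7433f7e7bd59eb07733b901efdc99e2 10.0.0.52:6379@16379 master - 0 1663086608164 2 connected 6827-10922' \
  '92faaa7d5ac5f6d978b3dd2f7028323b46b02706 10.0.0.57:6379@16379 master - 0 1663086609000 7 connected 0-1364 5461-6826 10923-12287' \
  | count_roles
```

Fed with the complete listing, the same function would report four masters and four replicas, consistent with cluster_known_nodes:8 and cluster_size:4.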
IV. Setting up a Zabbix server to monitor Linux, Tomcat and MySQL
Installing Zabbix 5.0
Official installation guide:
1. Install dependency packages
apt install -y iproute2 ntpdate tcpdump telnet traceroute nfs-kernel-server nfs-common lrzsz tree \
    openssl libssl-dev libpcre3 libpcre3-dev zlib1g-dev gcc iotop unzip zip
2. Install the Zabbix repository
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
3. Install the Zabbix server, web frontend, agent and zabbix-get
apt install -y zabbix-server-mysql zabbix-frontend-php zabbix-nginx-conf zabbix-agent zabbix-get
4. Create the initial database
Log in to the MySQL server (10.0.0.22) and install the database
apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb
Create the zabbix database and user and grant privileges
# Run on the MySQL server (10.0.0.22)
[root@mysql ~]# mysql -uroot
MariaDB [(none)]> create database zabbix character set utf8 collate utf8_bin;
MariaDB [(none)]> create user zabbix@'10.0.0.%' identified by '123456';
MariaDB [(none)]> grant all privileges on zabbix.* to zabbix@'10.0.0.%';
MariaDB [(none)]> quit;
Verify that the zabbix-server host can connect to the database remotely
[root@zabbix ~]#mysql -uzabbix -p123456 -h10.0.0.22
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 40
Server version: 5.5.5-10.3.34-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Import the initial schema and data from the zabbix-server host (10.0.0.12)
zcat /usr/share/doc/zabbix-server-mysql/create.sql.gz | mysql -uzabbix -p123456 -h10.0.0.22 zabbix
5. Configure the Zabbix server and Zabbix agent
Edit the configuration file /etc/zabbix/zabbix_server.conf
sed -i "/# DBHost=localhost/aDBHost=10.0.0.22" /etc/zabbix/zabbix_server.conf
sed -i "/# DBPassword=/aDBPassword=123456" /etc/zabbix/zabbix_server.conf
sed -i "/# DBPort=/aDBPort=3306" /etc/zabbix/zabbix_server.conf
sed -i "/StatsAllowedIP=127.0.0.1/c #StatsAllowedIP=127.0.0.1" /etc/zabbix/zabbix_server.conf
# Check the result
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_server.conf
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/run/zabbix/zabbix_server.pid
SocketDir=/run/zabbix
DBHost=10.0.0.22
DBName=zabbix
DBUser=zabbix
DBPassword=123456
DBPort=3306
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
Edit the configuration file /etc/zabbix/zabbix_agentd.conf
sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
    -e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
    -e "2aStartAgents=3" \
    -e "2aListenPort=10050" \
    -e "/^Hostname=/c Hostname=10.0.0.12" \
    /etc/zabbix/zabbix_agentd.conf
# Check the result
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_agentd.conf
StartAgents=3
ListenPort=10050
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12
ServerActive=10.0.0.12
Hostname=10.0.0.12
Include=/etc/zabbix/zabbix_agentd.d/*.conf
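Because `sed -i` rewrites the file in place, it can help to rehearse the expressions on a scratch copy first. The sketch below applies the same kinds of edits (`c` to replace a matching line, `2a` to append after line 2) to a small stand-in file created with mktemp. The file content is invented for the demonstration, and the one-line `c text` and `a text` forms assume GNU sed, as shipped with Ubuntu.

```shell
# Work on a scratch file so the real /etc/zabbix/zabbix_agentd.conf is untouched.
conf=$(mktemp)
printf '%s\n' \
  '# stand-in for zabbix_agentd.conf (demo content only)' \
  '# second comment line' \
  '' \
  'Server=127.0.0.1' \
  'ServerActive=127.0.0.1' \
  'Hostname=Zabbix server' > "$conf"

# "/pattern/c text" replaces each matching line; "2a text" appends after line 2.
sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
       -e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
       -e "2aListenPort=10050" \
       -e "/^Hostname=/c Hostname=10.0.0.12" \
       "$conf"

# Show only the active (non-comment, non-empty) settings
result=$(grep -Ev '^#|^$' "$conf")
echo "$result"
rm -f "$conf"
```

Replacing the existing Hostname line with `c`, rather than appending a second one with `2a`, leaves exactly one Hostname entry in the file.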
6. Configure PHP for the Zabbix frontend
Edit the configuration file /etc/zabbix/nginx.conf
server {
    listen 80;
    server_name 10.0.0.12;
    ...
}
Edit the configuration file /etc/zabbix/php-fpm.conf
echo "php_value[date.timezone] = Asia/Shanghai" >> /etc/zabbix/php-fpm.conf
7. Start the Zabbix server and agent processes
Start the Zabbix server and agent processes and enable them at boot
# Stop and disable Apache first
systemctl disable --now apache2
systemctl restart zabbix-server zabbix-agent nginx php7.4-fpm
systemctl enable zabbix-server zabbix-agent nginx php7.4-fpm
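8. Configure the web interface
- Open http://10.0.0.12 (the zabbix-server IP) to reach the web setup page
- Check the prerequisites
- Configure the database connection
- Configure the Zabbix server details
- Confirm the settings
- Finish the installation
- Log in
Default username: Admin (note the capital A); password: zabbix
- Go to the home page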
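9. Tuning Zabbix
Enabling the Chinese menu
- Ubuntu has no Chinese locale installed yet, so Chinese cannot be selected in the web interface
- Install and configure the Simplified Chinese locale on Ubuntu
# Install the Simplified Chinese language packs
apt install language-pack-zh*
# Add the Chinese locale to the environment
echo 'LANG="zh_CN.UTF-8"' >> /etc/environment
# Reconfigure the locales and select zh_CN.UTF-8 UTF-8
dpkg-reconfigure locales
Wait for it to finish
- Reboot the system
reboot
- Select Simplified Chinese
- Confirm that the Chinese menu is active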
Fixing garbled characters in graphs
- Some monitoring items are displayed as garbled characters
- Pick a font from Windows, for example KaiTi (simkai.ttf)
- Upload the Windows font to the Zabbix web directory
The path is /usr/share/zabbix/assets/fonts
Note: if the font file name is upper-case (SIMKAI.TTF), rename it to lower-case (simkai.ttf); renaming only the extension from TTF to ttf also works
[root@zabbix fonts]#pwd
/usr/share/zabbix/assets/fonts
[root@zabbix fonts]#ll
total 11512
drwxr-xr-x 2 root root 45 Sep 16 00:37 ./
drwxr-xr-x 5 root root 44 Sep 14 22:45 ../
lrwxrwxrwx 1 root root 38 Sep 14 22:45 graphfont.ttf -> /etc/alternatives/zabbix-frontend-font
-rw-r--r-- 1 root root 11787328 Oct 15 2019 simkai.ttf
- Change the font Zabbix uses
vim /usr/share/zabbix/include/defines.inc.php
# Change ZBX_GRAPH_FONT_NAME from graphfont to simkai
# Only the following two definitions need to change
#define('ZBX_GRAPH_FONT_NAME', 'simkai'); // font file name
#define('ZBX_FONT_NAME', 'simkai');
# The same change made with sed
sed -i -e "/define('ZBX_GRAPH_FONT_NAME'/c define('ZBX_GRAPH_FONT_NAME','simkai');" \
    -e "/define('ZBX_FONT_NAME'/c define('ZBX_FONT_NAME','simkai');" /usr/share/zabbix/include/defines.inc.php
- Confirm that the font is in use
The font change takes effect immediately; there is no need to restart Zabbix or nginx
监控tomcat(10.0.0.32)
1. 安装zabbix agent
安装zabbix仓库
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb dpkg -i zabbix-release_5.0-1+focal_all.deb apt update
安装Zabbix agent
apt install -y zabbix-agent
修改配置
sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
  -e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
  -e "2aStartAgents=3" \
  -e "2aListenPort=10050" \
  -e "/^Hostname=/c Hostname=10.0.0.32" \
  /etc/zabbix/zabbix_agentd.conf
[root@tomcat ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
StartAgents=3 # number of processes started for passive checks; with 0 the agent listens on no port
ListenPort=10050 # listening port (default value)
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12 # IP of the Zabbix server or proxy
ServerActive=10.0.0.12 # IP of the Zabbix server or proxy used for active checks
Hostname=10.0.0.32 # case-sensitive and unique on the Zabbix server; the local IP is used here
Include=/etc/zabbix/zabbix_agentd.d/*.conf # path for sub-configuration files, added at the end of the file
Restart the agent
systemctl restart zabbix-agent
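To confirm a single setting without reading the whole grep output, a small helper can print the effective value of one key. This is a minimal sketch: `conf_value` is a hypothetical name, and it assumes the plain `Key=value` layout shown above, with each key set at most once.

```shell
#!/usr/bin/env bash
# conf_value FILE KEY
# Print the value of the last uncommented "KEY=value" line in FILE.
# (conf_value is a hypothetical helper, not a Zabbix tool.)
conf_value() {
    local file="$1" key="$2"
    awk -F= -v k="$key" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 == k    { sub(/^[^=]*=/, ""); v = $0 } # keep everything after the first "="
        END        { if (v != "") print v }
    ' "$file"
}
```

For example, `conf_value /etc/zabbix/zabbix_agentd.conf ServerActive` would print the address used for active checks. Because the key is compared exactly, `Server` does not match `ServerActive`.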
2. Install Tomcat
# Prepare the JDK environment
# Download jdk-8u291-linux-x64.tar.gz and unpack it
tar -xvf jdk-8u291-linux-x64.tar.gz -C /usr/local/src
ln -s /usr/local/src/jdk1.8.0_291 /usr/local/src/jdk
# Configure the environment variables
echo -e "export JAVA_HOME=/usr/local/src/jdk\n export TOMCAT_HOME=/apps/tomcat\n export PATH=\$JAVA_HOME/bin:\$JAVA_HOME/jre/bin:\$TOMCAT_HOME/bin:\$PATH\n export CLASSPATH=.:\$CLASSPATH:\$JAVA_HOME/lib:\$JAVA_HOME/jre/lib/tools.jar" \
  >> /etc/profile
# Apply the settings
source /etc/profile
# Download the Tomcat package
wget https://dlcdn.apache.org/tomcat/tomcat-8/v8.5.82/bin/apache-tomcat-8.5.82.tar.gz
# Unpack it into /apps
mkdir -pv /apps
tar -xvf apache-tomcat-8.5.82.tar.gz -C /apps
ln -s /apps/apache-tomcat-8.5.82 /apps/tomcat
# Prepare a test page
echo "tomcat web page" > /apps/tomcat/webapps/ROOT/index.html
# Start Tomcat
/apps/tomcat/bin/catalina.sh start
Verify the Tomcat page
Visit the page

3. Deploy the Java gateway
Note: the Java gateway is a component independent of the Zabbix server and the Zabbix agent. It can be installed on a dedicated server, or on the same host as the Zabbix server or a Zabbix agent, provided the ports do not conflict.
Install the Java gateway on the Zabbix server
apt install -y zabbix-java-gateway
Edit the configuration
sed -i -e '/^# LISTEN_IP="0.0.0.0"/a LISTEN_IP="0.0.0.0"' \
  -e "/^# LISTEN_PORT=10052/a LISTEN_PORT=10052" \
  -e "/^# START_POLLERS/a START_POLLERS=50" \
  -e "/# TIMEOUT=/a TIMEOUT=30" \
  /etc/zabbix/zabbix_java_gateway.conf
[root@zabbix-server /]#egrep -v '^#|^$' /etc/zabbix/zabbix_java_gateway.conf
LISTEN_IP="0.0.0.0"
LISTEN_PORT=10052
PID_FILE="/run/zabbix/zabbix_java_gateway.pid"
START_POLLERS=50
TIMEOUT=30
Restart the service
systemctl restart zabbix-java-gateway.service
Verify the port
[root@zabbix-server /]#lsof -i:10052
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 47205 zabbix 12u IPv6 409025 0t0 TCP *:10052 (LISTEN)
4. Configure the Zabbix server to use the Java gateway
sed -i -e "/^# JavaGateway=/a JavaGateway=0.0.0.0" \
  -e "/^# JavaGatewayPort=/a JavaGatewayPort=10052" \
  -e "/^# StartJavaPollers=/a StartJavaPollers=20" \
  /etc/zabbix/zabbix_server.conf
[root@zabbix-server /]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_server.conf
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/run/zabbix/zabbix_server.pid
SocketDir=/run/zabbix
DBHost=10.0.0.22
DBName=zabbix
DBUser=zabbix
DBPassword=123456
DBPort=3306
JavaGateway=0.0.0.0 # address of the Java gateway (not a listen address); the gateway runs on this same host, so 127.0.0.1 would be the more conventional value
JavaGatewayPort=10052 # port the Java gateway listens on; can be omitted when it is the default
StartJavaPollers=20 # number of Java poller processes that query the Java gateway
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
Timeout=4
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
Restart the Zabbix server service
systemctl restart zabbix-server.service
Verify the Java pollers
[root@zabbix ~]#ps -ef|grep java
zabbix 11006 1 0 23:39 ? 00:00:00 java -server -Dlogback.configurationFile=/etc/zabbix/zabbix_java_gateway_logback.xml -classpath \
  lib:lib/android-json-4.3_r3.1.jar:lib/logback-classic-1.2.9.jar:lib/logback-core-1.2.9.jar:lib/slf4j-api-1.7.32.jar:bin/zabbix-java-gateway-5.0.27.jar \
  -Dzabbix.pidFile=/run/zabbix/zabbix_java_gateway.pid -Dzabbix.listenIP=0.0.0.0 -Dzabbix.listenPort=10052 -Dzabbix.startPollers=50 -Dzabbix.timeout=30 \
  -Dsun.rmi.transport.tcp.responseTimeout=30000 com.zabbix.gateway.JavaGateway
zabbix 11557 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #1 [got 0 values in 0.000010 sec, idle 5 sec]
zabbix 11558 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #2 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11559 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #3 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11560 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #4 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11561 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #5 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11562 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #6 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11563 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #7 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11564 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #8 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11565 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #9 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11566 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #10 [got 0 values in 0.000012 sec, idle 5 sec]
zabbix 11567 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #11 [got 0 values in 0.000006 sec, idle 5 sec]
zabbix 11568 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #12 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11569 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #13 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11570 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #14 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11571 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #15 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11572 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #16 [got 0 values in 0.000006 sec, idle 5 sec]
zabbix 11573 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #17 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11574 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #18 [got 0 values in 0.000005 sec, idle 5 sec]
zabbix 11575 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #19 [got 0 values in 0.000007 sec, idle 5 sec]
zabbix 11576 11535 0 23:51 ? 00:00:00 /usr/sbin/zabbix_server: java poller #20 [got 0 values in 0.000005 sec, idle 5 sec]
root 11611 1629 0 23:51 pts/0 00:00:00 grep --color=auto java
5. Enable JMX monitoring in Tomcat
Edit /apps/tomcat/bin/catalina.sh
sed -i '1aCATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=10.0.0.32"' \
  /apps/tomcat/bin/catalina.sh
# Parameter notes
CATALINA_OPTS="$CATALINA_OPTS
-Dcom.sun.management.jmxremote # enable remote JMX monitoring
-Dcom.sun.management.jmxremote.port=12345 # port the JMX service listens on
-Dcom.sun.management.jmxremote.authenticate=false # no user name or password
-Dcom.sun.management.jmxremote.ssl=false # no SSL
-Djava.rmi.server.hostname=10.0.0.32" # IP address of the Tomcat host, not of the Zabbix server
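As an alternative to patching catalina.sh, the same options can live in bin/setenv.sh, which catalina.sh sources at start-up when the file exists. The function below is a sketch under that assumption; `write_jmx_setenv` is a hypothetical name, and it overwrites any existing setenv.sh.

```shell
#!/usr/bin/env bash
# write_jmx_setenv TOMCAT_HOME HOST PORT
# Write TOMCAT_HOME/bin/setenv.sh with the JMX options used in this section.
# (write_jmx_setenv is a hypothetical helper; it replaces an existing setenv.sh.)
write_jmx_setenv() {
    local home="$1" host="$2" port="$3"
    cat > "$home/bin/setenv.sh" <<EOF
# JMX settings, sourced by catalina.sh at start-up
CATALINA_OPTS="\$CATALINA_OPTS -Dcom.sun.management.jmxremote"
CATALINA_OPTS="\$CATALINA_OPTS -Dcom.sun.management.jmxremote.port=$port"
CATALINA_OPTS="\$CATALINA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
CATALINA_OPTS="\$CATALINA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
CATALINA_OPTS="\$CATALINA_OPTS -Djava.rmi.server.hostname=$host"
export CATALINA_OPTS
EOF
}
```

With the values from this section the call would be `write_jmx_setenv /apps/tomcat 10.0.0.32 12345`. Keeping the options in setenv.sh leaves catalina.sh unmodified, so a Tomcat upgrade does not silently drop them.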
Start Tomcat
[root@tomcat bin]#./catalina.sh start
Using CATALINA_BASE: /apps/tomcat
Using CATALINA_HOME: /apps/tomcat
Using CATALINA_TMPDIR: /apps/tomcat/temp
Using JRE_HOME: /usr/local/src/jdk
Using CLASSPATH: /apps/tomcat/bin/bootstrap.jar:/apps/tomcat/bin/tomcat-juli.jar
Using CATALINA_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=10.0.0.32
Tomcat started.
Check the ports
# Ports 8080 and 12345 are open
[root@tomcat bin]#netstat -ntl
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:2049 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:47585 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:34245 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:48329 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:39443 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN
tcp6 0 0 :::2049 :::* LISTEN
tcp6 0 0 :::58755 :::* LISTEN
tcp6 0 0 127.0.0.1:8005 :::* LISTEN
tcp6 0 0 :::44139 :::* LISTEN
tcp6 0 0 :::111 :::* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 :::42419 :::* LISTEN
tcp6 0 0 :::46453 :::* LISTEN
tcp6 0 0 :::41813 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 :::12345 :::* LISTEN
tcp6 0 0 ::1:6010 :::* LISTEN
tcp6 0 0 :::38427 :::* LISTEN
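6. Add JMX monitoring on the Zabbix server
Go to Configuration -- Hosts -- Create host and enter the Tomcat host details

Link the templates: select Template OS Linux by Zabbix agent active and Template App Generic Java JMX

7. Verify the current JMX status and data
Verify the JMX status
The ZBX and JMX icons are green

Verify the JMX data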

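8. Using a production JMX template
Import the custom template
Configuration -- Templates -- Import

Link the template
Link the template imported in the previous step, and use Unlink and clear on the earlier JMX template

Verify the JMX status and data
The status is normal

The data is normal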

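9. Notes on configuring the Zabbix agent in active mode
Problem: in active mode the monitoring data arrives normally, but the ZBX icon stays grey instead of turning green. The icon reflects passive checks, so a host that only has active items never changes it.
Solution: in the template Template OS Linux by Zabbix agent active, first use Unlink and clear on the linked template Template Module Zabbix agent active, then add the template Template Module Zabbix agent.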

Monitoring MySQL with Percona
1. Set up MySQL master/slave replication (10.0.0.42/52)
master (10.0.0.42)
# Install MariaDB
apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb
# Edit the configuration
sed -i '/^\[mysqld/a server-id=42\nlog-bin' /etc/mysql/mariadb.conf.d/50-server.cnf
# Check the result
[root@mysql-master ~]#egrep -v '^#|^$' /etc/mysql/mariadb.conf.d/50-server.cnf
[server]
[mysqld]
server-id=42
log-bin
user = mysql
pid-file = /run/mysqld/mysqld.pid
socket = /run/mysqld/mysqld.sock
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
bind-address = 0.0.0.0
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
[embedded]
[mariadb]
[mariadb-10.3]
# Restart the database
systemctl restart mariadb
# Create the replication user
mysql -uroot
MariaDB [(none)]> create user 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# Grant the replication privilege to that user
MariaDB [(none)]> grant replication slave on *.* to 'repluser'@'10.0.0.%';
Query OK, 0 rows affected (0.00 sec)
# If MySQL already holds data, back it up first
# Back up the data
[root@mysql-master ~]# mysqldump --all-databases --single_transaction --flush-logs --master-data=2 \
  --lock-tables > /opt/backup.sql
# Copy the backup to the slave node
[root@mysql-master ~]# scp /opt/backup.sql 10.0.0.52:/opt/
# Check the binary log file and position
[root@mysql-master ~]# mysql
MariaDB [(none)]> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| mysqld-bin.000001 |       691 |
| mysqld-bin.000002 |       387 |
+-------------------+-----------+
2 rows in set (0.001 sec)
slave (10.0.0.52)
# Install MariaDB
apt install -y mariadb-server=1:10.3.22-1ubuntu1
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mariadb.conf.d/50-server.cnf
systemctl restart mariadb
# Edit the configuration
sed -i '/^\[mysqld/a server-id=52\nread-only' /etc/mysql/mariadb.conf.d/50-server.cnf
# Check the result
[root@mysql-slave ~]#egrep -v '^#|^$' /etc/mysql/mariadb.conf.d/50-server.cnf
[server]
[mysqld]
server-id=52
read-only
user = mysql
pid-file = /run/mysqld/mysqld.pid
socket = /run/mysqld/mysqld.sock
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
bind-address = 0.0.0.0
query_cache_size = 16M
log_error = /var/log/mysql/error.log
expire_logs_days = 10
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
[embedded]
[mariadb]
[mariadb-10.3]
# Restart the database
systemctl restart mariadb
# Import the backup taken on the master node
[root@mysql-slave ~]#mysql < /opt/backup.sql
# Configure replication from the master's coordinates
# MASTER_LOG_FILE and MASTER_LOG_POS correspond to Log_name and File_size on the master (see show master logs)
[root@mysql-slave ~]# mysql
MariaDB [(none)]> CHANGE MASTER TO
  MASTER_HOST='10.0.0.42',
  MASTER_USER='repluser',
  MASTER_PASSWORD='',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='mysqld-bin.000001',
  MASTER_LOG_POS=691,
  MASTER_CONNECT_RETRY=10;
# Start the slave
MariaDB [(none)]> start slave;
# Show the status
MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.42
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysqld-bin.000002
Read_Master_Log_Pos: 387
Relay_Log_File: mysqld-relay-bin.000003
Relay_Log_Pos: 687
Relay_Master_Log_File: mysqld-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
Master_Server_Id: 42
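Because the backup was taken with --master-data=2, mysqldump also writes the binary log coordinates into the dump itself as a commented CHANGE MASTER TO statement. The sketch below reads them back from the file. `dump_coordinates` is a hypothetical helper, and the values used in any example call are sample data, not output from this environment.

```shell
#!/usr/bin/env bash
# dump_coordinates DUMP_FILE
# Print "<binlog file> <position>" taken from the commented
# CHANGE MASTER TO line that mysqldump --master-data=2 writes.
# (dump_coordinates is a hypothetical helper.)
dump_coordinates() {
    local dump="$1"
    sed -n "s/^-- CHANGE MASTER TO MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\);.*/\1 \2/p" "$dump" \
        | head -n 1    # the statement appears once, near the top of the dump
}
```

Running `dump_coordinates /opt/backup.sql` on the slave prints the file name and position recorded at dump time, which can then be used for MASTER_LOG_FILE and MASTER_LOG_POS.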
2. Install the Zabbix agent on the master
Install the agent
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt install -y zabbix-agent
Configure
sed -i -e "/^Server=127.0.0.1/c Server=10.0.0.12" \
  -e "/^ServerActive=127.0.0.1/c ServerActive=10.0.0.12" \
  -e "2aStartAgents=3" \
  -e "2aListenPort=10050" \
  -e "/^Hostname=/c Hostname=10.0.0.42" \
  /etc/zabbix/zabbix_agentd.conf
[root@mysql-master ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
StartAgents=3 # number of processes started for passive checks; with 0 the agent listens on no port
ListenPort=10050 # listening port (default value)
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.12 # IP of the Zabbix server or proxy
ServerActive=10.0.0.12 # IP of the Zabbix server or proxy used for active checks
Hostname=10.0.0.42 # case-sensitive and unique on the Zabbix server; the local IP is used here
Include=/etc/zabbix/zabbix_agentd.d/*.conf # path for sub-configuration files, added at the end of the file
Restart the agent
systemctl restart zabbix-agent
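The `2a...` expressions in the sed command above append a new line every time the command runs, so repeating the step leaves duplicate StartAgents and ListenPort entries. A repeatable variant could look like the sketch below. `set_conf` is a hypothetical helper, it relies on GNU sed's `-i`, and it assumes the key and value contain no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# set_conf FILE KEY VALUE
# Replace an existing uncommented "KEY=..." line, or append one if none exists.
# (set_conf is a hypothetical helper; values must not contain "|" or "&".)
set_conf() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For the master this would be called as, for example, `set_conf /etc/zabbix/zabbix_agentd.conf Hostname 10.0.0.42`. Commented template lines such as `# Server=` are left in place, and running the same call twice does not add a second entry.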
3. Install the Percona monitoring plugin
1) Run the Zabbix agent as root
# Edit the agent configuration file (AllowRoot=1 lets the agent start as root)
sed -i -e "/^# AllowRoot=0/a AllowRoot=1" \
  -e "/^# User=/a User=root" \
  /etc/zabbix/zabbix_agentd.conf
# Edit the service unit file
sed -i -e "/^User=/c User=root" \
  -e "/^Group=/c Group=root" \
  /lib/systemd/system/zabbix-agent.service
# Restart the service
systemctl daemon-reload
systemctl restart zabbix-agent
# Check the processes
[root@mysql-master ~]#ps -ef|grep zabbix
root 8754 1 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
root 8763 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
root 8764 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
root 8765 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
root 8766 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
root 8767 8754 0 17:47 ? 00:00:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 8775 8351 0 17:47 pts/2 00:00:00 grep --color=auto zabbix
2) Install the Percona package
Official documentation:
Download page: (no build matching this Ubuntu release is listed; pick the latest version)
# Download and install the Percona package
wget https://downloads.percona.com/downloads/percona-monitoring-plugins/percona-monitoring-plugins-1.1.8/binary/debian/artful/x86_64/percona-zabbix-templates_1.1.8-1.artful_all.deb
dpkg -i percona-zabbix-templates_1.1.8-1.artful_all.deb
cp /var/lib/zabbix/percona/templates/userparameter_percona_mysql.conf /etc/zabbix/zabbix_agentd.d/
systemctl restart zabbix-agent.service
# Install the PHP environment. Note: Percona is not compatible with php7.2
apt install -y php7.4 php7.4-mysql
# Create the MySQL credentials file
cat > /var/lib/zabbix/percona/scripts/ss_get_mysql_stats.php.cnf <<EOF
<?php
\$mysql_user = 'root';
\$mysql_pass = '';
EOF
# Test that the script returns data
[root@mysql-master opt]#/var/lib/zabbix/percona/scripts/get_mysql_stats_wrapper.sh gg
75
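3) Import the Percona template
This template can be used: https://files.cnblogs.com/files/blogs/744193/PerconaMySQLServer.xml?t=1657952009

4) Add the host

5) Link the template

6) Verify MySQL monitoring
By default the items in the Percona template collect data every five minutes, and a script checks whether the timestamp of the data file on the agent is more than five minutes old. Script location:
# Test from the Zabbix server
[root@zabbix-server ~]#zabbix_get -s 10.0.0.42 -p 10050 -k MySQL.Key-read-requests
75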
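The five-minute timestamp check described above amounts to comparing a file's modification time with the current time. The sketch below illustrates that idea only; it is not the Percona script itself. `is_stale` is a hypothetical name, the 300-second default mirrors the five-minute interval, and `stat -c` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# is_stale FILE [MAX_AGE_SECONDS]
# Succeed (exit 0) when FILE is missing or older than MAX_AGE_SECONDS
# (default 300). (is_stale is a hypothetical helper.)
is_stale() {
    local file="$1" max_age="${2:-300}" now mtime
    [ -e "$file" ] || return 0           # a missing file counts as stale
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")          # modification time in epoch seconds
    [ $(( now - mtime )) -gt "$max_age" ]
}
```

A caller would test it with `if is_stale "$data_file"; then ...; fi`, where `data_file` stands for whichever data file the check applies to.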

Monitoring MySQL with a custom script
1. Install the agent on mysql-slave
#!/bin/bash
zabbix_server=10.0.0.12
IP=$(hostname -i | awk '{print $1}')
cd /opt
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
sleep 1
apt install -y zabbix-agent
sleep 1
sed -i -e "/^Server=127.0.0.1/c Server=$zabbix_server" \
  -e "/^ServerActive=127.0.0.1/c ServerActive=$zabbix_server" \
  -e "2aStartAgents=3" \
  -e "2aListenPort=10050" \
  -e "/^Hostname=/c Hostname=$IP" \
  -e "/^# AllowRoot=0/a AllowRoot=1" \
  -e "/^# User=/a User=root" \
  /etc/zabbix/zabbix_agentd.conf
sed -i -e "/^User=/c User=root" \
  -e "/^Group=/c Group=root" \
  /lib/systemd/system/zabbix-agent.service
systemctl daemon-reload
systemctl restart zabbix-agent
systemctl is-active zabbix-agent.service
if [ $? -eq 0 ];then
  echo 'install succeeded!'
else
  echo 'install failed!'
fi
2. Script contents
#!/bin/bash
# Print the replication delay in seconds
Seconds_Behind_Master(){
  NUM=$(mysql -uroot -e "show slave status\G;" | grep "Seconds_Behind_Master:" | awk '{print $2}')
  echo $NUM
}
# Print 50 when both replication threads are running, 100 otherwise
master_slave_check(){
  NUM1=$(mysql -uroot -e "show slave status\G;" | grep "Slave_IO_Running:" | awk '{print $2}')
  NUM2=$(mysql -uroot -e "show slave status\G;" | grep "Slave_SQL_Running:" | awk '{print $2}')
  if [[ $NUM1 == "Yes" && $NUM2 == "Yes" ]];then
    echo 50
  else
    echo 100
  fi
}
main(){
  case $1 in
    Seconds_Behind_Master)
      Seconds_Behind_Master;
      ;;
    master_slave_check)
      master_slave_check;
      ;;
  esac
}
main $1
Test
[root@mysql-slave ~]#./mysql_monitor.sh master_slave_check
50
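The check above queries MySQL once per field. A variant that reads the status output a single time from standard input is sketched below; `replication_code` is a hypothetical name, and the 50/100 return values follow the convention of the script above.

```shell
#!/usr/bin/env bash
# replication_code
# Read "show slave status\G" output on stdin and print 50 when both
# Slave_IO_Running and Slave_SQL_Running are "Yes", otherwise 100.
# (replication_code is a hypothetical helper.)
replication_code() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END {
            code = (io == "Yes" && sql == "Yes") ? 50 : 100
            print code
        }
    '
}
```

On the slave it would be used as `mysql -uroot -e 'show slave status\G' | replication_code`. Matching the field name exactly keeps Slave_SQL_Running_State from being mistaken for Slave_SQL_Running, and empty input (replication not configured) yields 100.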
3. Configure the custom item
#cat /etc/zabbix/zabbix_agentd.d/all.conf
UserParameter=mysql_monitor[*],/etc/zabbix/zabbix_agentd.d/mysql_monitor.sh "$1"
systemctl restart zabbix-agent
# Test from the Zabbix server
[root@zabbix-server ~]#zabbix_get -s 10.0.0.52 -p 10050 -k "mysql_monitor[master_slave_check]"
50
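4. Custom template
- Create the template
Configuration -- Templates -- Create template
- Add items
Configuration -- Templates, then select the mysql_monitor template created earlier
Fill in the item details
- Add triggers
- Add graphs
5. Link the host
- Add the host
- Link the template
- Verify the monitoring data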
V. Custom monitoring items with failure e-mail notification
Implementing self-healing
1. Allow remote command execution on the agent
[root@mysql-slave ~]#grep '^[a-zA-Z]' /etc/zabbix/zabbix_agentd.conf
StartAgents=3
ListenPort=10050
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1 # enable remote command execution
Server=10.0.0.12
ServerActive=10.0.0.12
Hostname=10.0.0.52
AllowRoot=1
User=root
UnsafeUserParameters=1 # allow unsafe arguments (special characters) when commands are executed remotely
Include=/etc/zabbix/zabbix_agentd.d/*.conf
2. Grant sudo rights to the zabbix user on the agent
[root@mysql-slave ~]# vim /etc/sudoers
......
root ALL=(ALL) ALL
zabbix ALL=NOPASSWD:ALL # the zabbix user no longer needs a password for privileged commands, for example via sudo
Restart the service
systemctl restart zabbix-agent
3. Create an action
Add the action name and its conditions

Add the operation commands

Review the new action

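Implementing e-mail notification
163 mailbox configuration reference:
1. Enable SMTP on the mailbox
Log in to your mailbox, open the settings, and enable SMTP

Send the verification SMS

Obtain the authorization code

2. Create a media type
Administration -- Media types -- Create

3. Add the media to a user
Select the Admin user

Select Media and click Add

For the type, choose the media type created above; for the recipient, enter the address that should receive the messages

Update the media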

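4. Create the action
- Add a send-e-mail operation to the self-healing action
- Add operations for when the problem occurs and for after it is resolved
Content of the e-mail sent when the problem occurs
Content of the e-mail sent by the recovery operation
Final list of operation steps for the action
Verifying the failure alert and recovery e-mail notifications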
1. Stop replication on the MySQL slave
mysql -e "stop slave;"
Check the slave status
[root@mysql-slave ~]#mysql -e "show slave status\G;"
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.0.0.42
Master_User: repluser
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysqld-bin.000002
Read_Master_Log_Pos: 387
Relay_Log_File: mysqld-relay-bin.000008
Relay_Log_Pos: 556
Relay_Master_Log_File: mysqld-bin.000002
Slave_IO_Running: No
Slave_SQL_Running: No
2. Zabbix automatically runs the recovery command and sends the notification e-mail

3. Log in to the mailbox and check the alert e-mail


VI. Using Zabbix proxy for distributed monitoring across network segments
Install zabbix-server
See the Zabbix 5.0 installation part of Section IV.
Install zabbix-proxy
Install the Zabbix repository
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
Install zabbix-proxy
apt install -y zabbix-proxy-mysql
Install a local database
apt install -y mysql-server
sed -i '/bind-address/c bind-address = 0.0.0.0' /etc/mysql/mysql.conf.d/mysqld.cnf
systemctl restart mysql
Create the zabbix_proxy database and the zabbix user, and grant privileges
[root@zabbix-proxy ~]#mysql -uroot
mysql> create database zabbix_proxy character set utf8 collate utf8_bin;
mysql> create user zabbix@localhost identified by '123456';
mysql> grant all privileges on zabbix_proxy.* to zabbix@localhost;
mysql> quit;
Import the initial schema and data
zcat /usr/share/doc/zabbix-proxy-mysql/schema.sql.gz | mysql -uzabbix -p123456 zabbix_proxy
Edit the configuration file
sed -i "/# DBPassword=/aDBPassword=123456" /etc/zabbix/zabbix_proxy.conf
Restart the service
systemctl restart zabbix-proxy
systemctl enable zabbix-proxy
Install zabbix-agent
Install the agent
wget https://repo.zabbix.com/zabbix/5.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bfocal_all.deb
dpkg -i zabbix-release_5.0-1+focal_all.deb
apt update
apt install -y zabbix-agent
Edit the configuration
[root@zabbix-agent1 ~]#grep "^[A-Z]" /etc/zabbix/zabbix_agentd.conf
PidFile=/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.0.0.51,10.0.0.53 # addresses of the Zabbix server and the proxy
ServerActive=10.0.0.53 # address of the Zabbix proxy for active checks
Hostname=10.0.0.54
Include=/etc/zabbix/zabbix_agentd.d/*.conf
Restart the service
systemctl restart zabbix-agent.service
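Configuring Zabbix proxy in active mode
Zabbix proxy configuration reference
ProxyMode=0 # 0 = active, 1 = passive
Server=10.0.0.51 # address or host name of the Zabbix server
Hostname=10.0.0.53 # proxy name; must match the proxy name entered on the Zabbix server
ListenPort=10051 # port the Zabbix proxy listens on
LogFile=/tmp/zabbix_proxy.log # log file
EnableRemoteCommands=1 # allow the Zabbix server to run remote commands
DBHost=127.0.0.1 # database server address
DBName=zabbix_proxy # database name
DBUser=zabbix # database user
DBPassword=123456 # database password
DBPort=3306 # database port
ProxyLocalBuffer=720 # hours to keep data that has already been sent to the Zabbix server
ProxyOfflineBuffer=720 # hours to keep data that has not yet been sent to the Zabbix server
HeartbeatFrequency=60 # heartbeat interval in seconds; default 60, range 0-3600; not used in passive mode
ConfigFrequency=5 # seconds between configuration pulls from the Zabbix server
DataSenderFrequency=5 # seconds between data sends; default 1, range 1-3600; not used in passive mode
StartPollers=20 # number of poller processes to start
JavaGateway=172.31.0.104 # Java gateway address; required for Java monitoring, otherwise no data is collected
JavaGatewayPort=10052 # Java gateway port
StartJavaPollers=20 # number of Java poller processes to start
CacheSize=2G # maximum memory used to cache item configuration
HistoryCacheSize=2G # maximum memory used for history data
HistoryIndexCacheSize=128M # size of the history index cache
Timeout=30 # item timeout in seconds
LogSlowQueries=3000 # database queries slower than this many milliseconds are logged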
Configure the Zabbix proxy
[root@zabbix-proxy ~]#grep '^[A-Z]' /etc/zabbix/zabbix_proxy.conf
ProxyMode=0
Server=10.0.0.51
Hostname=10.0.0.53
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_proxy.log
LogFileSize=0
EnableRemoteCommands=1
PidFile=/run/zabbix/zabbix_proxy.pid
SocketDir=/run/zabbix
DBHost=127.0.0.1
DBName=zabbix_proxy
DBUser=zabbix
DBPassword=123456
DBPort=3306
ProxyLocalBuffer=720
ProxyOfflineBuffer=720
HeartbeatFrequency=60
ConfigFrequency=5
DataSenderFrequency=5
StartPollers=20
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
CacheSize=256M
HistoryCacheSize=256M
HistoryIndexCacheSize=128M
Timeout=30
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1
Restart the Zabbix proxy
[root@zabbix-proxy ~]#systemctl restart zabbix-proxy.service
Add the active proxy in the web interface
Go to Administration -- Proxies and enter the proxy name.
Note: this name must match Hostname in the proxy configuration

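When adding a host, set the Zabbix agent to be monitored through the active proxy

Verify the status
Verify the current host status

Verify the host's monitoring data and graphs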
