docker部署zabbix 6.0高可用集群实验
0 实验环境
虚拟机,postgresql本地部署,zabbix server及nginx容器部署
1 postgresql
参看前作 《postgresql + timescaledb离线安装笔记》完成部署,对外端口tcp 5432,账号zabbix,密码123
2 zabbix server
2.1 拉取镜像
docker pull zabbix/zabbix-server-pgsql:6.0-alpine-latest
2.2 创建网络
docker network create --subnet 172.20.0.0/16 --ip-range 172.20.240.0/20 zabbix-net
2.3 启动server容器
启动两个容器分别作为主备server节点,分别使用本地tcp 10051和10052端口
2.3.1 server主节点
docker run --name zabbix-server-pgsql-1 -t \
-e DB_SERVER_HOST="172.17.0.1" -e DB_SERVER_PORT="5432" \
-e POSTGRES_USER="zabbix" -e POSTGRES_PASSWORD='123' -e POSTGRES_DB="zabbix" \
-e ZBX_CACHESIZE="128M" -e ZBX_HISTORYCACHESIZE="32M" -e ZBX_HISTORYINDEXCACHESIZE="8M" -e ZBX_TRENDCACHESIZE="8M" -e ZBX_VALUECACHESIZE="64M" \
-e ZBX_LOGSLOWQUERIES="3000" -e ZBX_STARTPOLLERS="5" -e ZBX_STARTPREPROCESSORS="10" -e ZBX_STARTPOLLERSUNREACHABLE="5" -e ZBX_STARTESCALATORS="5" -e ZBX_STARTDBSYNCERS="5" \
-e ZBX_HANODENAME="server-01" -e ZBX_NODEADDRESS="172.20.240.1" \
-p 10051:10051 --network=zabbix-net --restart unless-stopped \
-d zabbix/zabbix-server-pgsql:6.0-alpine-latest
2.3.2 server备节点
docker run --name zabbix-server-pgsql-2 -t \
-e DB_SERVER_HOST="172.17.0.1" -e DB_SERVER_PORT="5432" \
-e POSTGRES_USER="zabbix" -e POSTGRES_PASSWORD='123' -e POSTGRES_DB="zabbix" \
-e ZBX_CACHESIZE="128M" -e ZBX_HISTORYCACHESIZE="32M" -e ZBX_HISTORYINDEXCACHESIZE="8M" -e ZBX_TRENDCACHESIZE="8M" -e ZBX_VALUECACHESIZE="64M" \
-e ZBX_LOGSLOWQUERIES="3000" -e ZBX_STARTPOLLERS="5" -e ZBX_STARTPREPROCESSORS="10" -e ZBX_STARTPOLLERSUNREACHABLE="5" -e ZBX_STARTESCALATORS="5" -e ZBX_STARTDBSYNCERS="5" \
-e ZBX_HANODENAME="server-02" -e ZBX_NODEADDRESS="172.20.240.2" \
-p 10052:10051 --network=zabbix-net --restart unless-stopped \
-d zabbix/zabbix-server-pgsql:6.0-alpine-latest
2.3.3 调试命令(进入active容器,即zabbix-server-pgsql-1)
显示Server集群状态:
zabbix_server -R ha_status
结果如下
Failover delay: 60 seconds
Cluster status:
# ID Name Address Status Last Access
1. clkc8ouam00016nrvjcuugxsr server-01 172.20.240.1:10051 active 4s
2. clkc8ozws00016nmr5dyv7qd4 server-02 172.20.240.2:10051 standby 2s
可删除备节点
zabbix_server -R ha_remove_node=clkc8ozws00016nmr5dyv7qd4
3 web
3.1 拉取镜像
docker pull zabbix/zabbix-web-nginx-pgsql:6.0-alpine-latest
3.1 启动容器
注意ZBX_SERVER_HOST和ZBX_SERVER_PORT一定要设置为空
docker run --name zabbix-web-nginx-pgsql -t \
-e ZBX_SERVER_HOST="" -e ZBX_SERVER_PORT="" \
-e DB_SERVER_HOST="172.17.0.1" -e DB_SERVER_PORT="5432" \
-e POSTGRES_USER="zabbix" -e POSTGRES_PASSWORD="123" -e POSTGRES_DB="zabbix" \
-e ZBX_SERVER_NAME="zabbix-test" \
-e PHP_TZ="Asia/Shanghai" \
-p 8080:8080 --network=zabbix-net --restart unless-stopped \
-d zabbix/zabbix-web-nginx-pgsql:6.0-alpine-latest
4 测试
4.1 原始状态
在System information页面下可以看到,此时server-01是主用节点
4.2 切换
将zabbix-server-pgsql-1容器暂停
docker stop zabbix-server-pgsql-1
此时可以看到server-01变为stopped状态,server-02成为Active状态
再次启动zabbix-server-pgsql-1容器,则server-01变回Standby状态
5 其他笔记
5.1 zabbix-agent2监控docker
如果出现以下报错cannot fetch data: Get http://1.28/info: dial unix /var/run/docker.sock: connect: permission denied
,可能是因为zabbix账号对/var/run/docker.sock不具备权限,我的修复命令如下:
chgrp docker /var/run/docker.sock
usermod -aG docker zabbix
systemctl restart zabbix-agent2