一、架构介绍
MySQL MGR实现了MySQL服务的高可用、高扩展、高可靠,但在客户端只能同时连接一台服务实例,在master切换后,客户端无法感知并自动切换,所以需要搭配keepalived实现MySQL MGR集群在客户端的高可用。
二、架构搭建
架构设计
MySQL MGR一主一从
机器名称
|
IP
|
服务器角色
|
备注
|
localhost
|
192.168.11.13
|
keepalived master
mysql mgr master
|
keepalived 2.2.7 + mysql 5.7.35
|
localhost
|
192.168.11.14
|
keepalived backup
mysql mgr slave
|
keepalived 2.2.7 + mysql 5.7.35
|
高可用自动切换策略
- 将所有keepalived节点配置为backup角色,仅根据权重竞争master
- 所有节点的权重分为基础权重与脚本权重,其中脚本权重根据脚本执行结果动态增加,且每次脚本检查后都是基于基础权重增加,不会累计
- 脚本权重设置原则:最小基础权重 + 脚本权重 > 最大基础权重
- 如果mgr角色为master,则需要竞争keepalived master,所以不管keepalived角色如何,都要控制keepalived增加权重
- 如果mgr角色为backup,则控制keepalived不能增加权重,如果此时keepalived角色为master,则需要转让其master角色,即重启keepalived,触发VIP漂移
特别关注:keepalived角色竞争,只有在节点启动(加入集群)、master离线时执行
根据当前mgr角色 与 keepalived角色,需要执行的操作如下
mgr master
|
mgr slave
|
|
keepalived master
|
增加权重
|
保持权重,触发VIP漂移(即重启keepalived)
|
keepalived backup
|
增加权重
|
保持权重
|
keepalived实现MySQL MGR高可用,包括以下步骤
- mysql mgr集群搭建
- keepalived安装
- keepalived配置
- keepalived运行测试
架构实现
1.mysql mgr集群搭建
2.keepalived安装
3.keepalived配置
本文设计keepalived + mysql mgr流程如下
3.1 keepalived服务配置(keepalived.conf)
本方案实现keepalived根据mgr集群实际节点角色自动漂移VIP,所以将在所有keepalived节点上做相同配置(除各自基础权重)
特别关注:配置文件中指定的所有路径,请在启动前确保已存在
! Configuration File for keepalived ###使用说明开始### #keepalived实现mysql mgr 集群高可用,流程如下 #0.所有keepalived节点配置为backup,非抢占模式,基础权重建议为 70、80、90,脚本权重为30,即最小权重+脚本权重>最大权重 #1.如果mysql服务异常,启动服务 #2.如果mysql为master,则返回成功0,keepalived增加权重 #3.如果mysql为slave,则返回失败100,keepalived不增加权重 #4.如果mysql为slave,且keepalived为master,则keepalived切换 ###使用说明结束### global_defs { notification_email { } #keepalived机器标识,无特殊作用,一般为机器名 router_id ha_mysql } #检查脚本,可以用来关联业务,脚本执行结果决定是否准备切换 vrrp_script ha_mysql{ #脚本路径,脚本执行是否成功,根据脚本的退出码确认,默认为0,即exit 0 script "/usr/local/ha_mysql/ha_mysql.sh" #脚本检测周期,单位秒 interval 2 ##权重策略是:根据脚本执行结果计算权重,然后触发keepalived重新选举 #当weight > 0时:脚本执行成功了 Priority + Weight,执行失败 Priority #当weight < 0时:脚本执行成功了 Priority 执行失败 Priority + Weight weight 30 } #VRRP协议配置 vrrp_instance VI_1 { #集群初始状态统一配置为 BACKUP,当至少2台keepalived启动后,将根据priority重新竞选角色 state BACKUP interface enp0s3 #虚拟路由id,同一个集群中的keepalived设置一致 virtual_router_id 100 #优先级决定最终的master角色 priority 70 #不抢占,即异常恢复后,不立即抢占master角色 nopreempt #主备之间通信检查的时间间隔,单位秒 advert_int 1 authentication { #keepalived之间认证类型为密码 auth_type PASS auth_pass 1234 } #虚拟IP池 virtual_ipaddress { #VIP地址,一行一个,格式为 <IP地址>/<掩码> brd <广播地址> dev <网卡设备> scope <范围如global> label <网卡别名> 192.168.11.200/24 } #检查脚本,与vrrp_script对应 track_script{ ha_mysql } }
3.2 mysql业务检测脚本
ha_mysql.sh脚本
配置脚本可执行权限
[root@localhost ~]# chmod +x /usr/local/ha_mysql/ha_mysql.sh
脚本内容,自动检测mysql mgr集群角色,以及keepalived角色,实现自动将vip漂移到mgr master
#!/bin/bash ###使用说明开始### #配合keepalived实现mysql mgr 集群高可用,流程如下 #1.如果mysql服务异常,启动服务 #2.如果mysql为master,则返回成功0,keepalived增加权重 #3.如果mysql为slave,则返回失败100,keepalived不增加权重 #4.如果mysql为slave,且keepalived为master,则keepalived切换 ###使用说明结束### #监控日志 source /etc/profile vip="192.168.11.200" monitorLogPath=/usr/local/ha_mysql/monitor.log touch $monitorLogPath #格式化日期时间 function getDatetime(){ local cur=`date "+%Y-%m-%d %H:%M:%S"` echo $cur } #获取local IP function getLocalIp(){ echo `ifconfig enp0s3 | grep -w "inet" | awk '{print $2}'` } #获取master IP function getMasterIp(){ echo `mysql -uroot -pxxxxxxx -e "SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_ID = (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME= 'group_replication_primary_member')" | awk 'NR==2{print($3)}'` } #检测keepalived是否master function checkKeepalivedMaster(){ local num=`ip a | grep "${vip}" | wc -l` if [ $num -eq 0 ];then echo 0 else echo 1 fi } #检测mysql进程 function checkMysqlProcess(){ #检测mysql服务,可以使用进程,也可以使用端口 #此处检测mysql进程 local num=`ps -C mysqld --no-header | wc -l` #local num=`netstat -lntup | grep -w 3307 | wc -l` if [ $num -eq 0 ];then echo 0 else echo 1 fi } #检测MGR Master function checkMgrMaster(){ #获取本机IP localIp=`getLocalIp` #获取mgr master ip masterIp=`getMasterIp` #判断当前实例是否为master if [ "${localIp}" = "${masterIp}" ];then echo 1 else echo 0 fi } #定义变量,mysql是否运行 run=`checkMysqlProcess` #1.如果mysql服务异常,启动服务 if [ $run -eq 0 ];then #mysql异常,先启动mysql echo `getDatetime` "mysql error,start mysql" >> $monitorLogPath service mysqld start run=`checkMysqlProcess` if [ $run -eq 0 ];then #启动mysql失败,停止keepalived服务,停止热备,触发keepalived切换 echo `getDatetime` "start mysql failed" >> $monitorLogPath fi #else # echo "mysql is alive" fi mgrMaster=`checkMgrMaster` keepalivedMasger=`checkKeepalivedMaster` if [ ${mgrMaster} -eq 1 ];then #2.如果mysql为master,则返回成功0,keepalived增加权重 #echo `getDatetime` "mgr master,add priority" >> $monitorLogPath exit 0 else #3.如果mysql为slave,则返回失败100,keepalived不增加权重 if [ ${keepalivedMasger} -eq 1 ];then #4.如果mysql为slave,且keepalived为master,则keepalived切换(服务重启) echo `getDatetime` "mgr slave,keepalived master,restart keepalived" >> $monitorLogPath service keepalived restart #keepalived服务重启,本脚本返回结果为重启前的keepalived进程,此时无需exit else #echo `getDatetime` "mgr slave,keepalived slave,keep priority" >> $monitorLogPath exit 100 fi fi
三、架构测试
1.确认mgr master
mysql> select * from performance_schema.replication_group_members ; +---------------------------+--------------------------------------+---------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+---------------+-------------+--------------+ | group_replication_applier | 33e3b5a5-7367-11ec-9299-0800272dd186 | 192.168.11.14 | 3307 | ONLINE | | group_replication_applier | fdbf6d8c-7365-11ec-bf67-08002746658b | 192.168.11.13 | 3307 | ONLINE | +---------------------------+--------------------------------------+---------------+-------------+--------------+ 2 rows in set (0.00 sec) mysql> select * from performance_schema.global_status where variable_name like '%group%'; +----------------------------------+--------------------------------------+ | VARIABLE_NAME | VARIABLE_VALUE | +----------------------------------+--------------------------------------+ | group_replication_primary_member | fdbf6d8c-7365-11ec-bf67-08002746658b | +----------------------------------+--------------------------------------+ 1 row in set (0.01 sec)
2.以上确认11.13服务器为mgr master,在11.13确认VIP已经绑定
[root@localhost ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 08:00:27:46:65:8b brd ff:ff:ff:ff:ff:ff inet 192.168.11.13/24 brd 192.168.11.255 scope global noprefixroute enp0s3 valid_lft forever preferred_lft forever inet 192.168.11.200/24 scope global secondary enp0s3 valid_lft forever preferred_lft forever inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::d071:eb89:2f12:8e56/64 scope link noprefixroute valid_lft forever preferred_lft forever [root@localhost ~]#
3.模拟11.13 MySQL服务宕机
[root@localhost ~]# service mysqld stop
Redirecting to /bin/systemctl stop mysqld.service
4.确认 mgr master 已经改为 11.14
此时,由于mysql健康检查脚本中已经自动启动mysql服务器,所以正常情况下无需运维人员介入
mysql> select * from performance_schema.replication_group_members ; +---------------------------+--------------------------------------+---------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+---------------+-------------+--------------+ | group_replication_applier | 33e3b5a5-7367-11ec-9299-0800272dd186 | 192.168.11.14 | 3307 | ONLINE | | group_replication_applier | fdbf6d8c-7365-11ec-bf67-08002746658b | 192.168.11.13 | 3307 | ONLINE | +---------------------------+--------------------------------------+---------------+-------------+--------------+ 2 rows in set (0.00 sec) mysql> select * from performance_schema.global_status where variable_name like '%group%'; +----------------------------------+--------------------------------------+ | VARIABLE_NAME | VARIABLE_VALUE | +----------------------------------+--------------------------------------+ | group_replication_primary_member | 33e3b5a5-7367-11ec-9299-0800272dd186 | +----------------------------------+--------------------------------------+ 1 row in set (0.00 sec)
5.在11.13上确认VIP已经失效
[root@localhost ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 08:00:27:46:65:8b brd ff:ff:ff:ff:ff:ff inet 192.168.11.13/24 brd 192.168.11.255 scope global noprefixroute enp0s3 valid_lft forever preferred_lft forever inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::d071:eb89:2f12:8e56/64 scope link noprefixroute valid_lft forever preferred_lft forever [root@localhost ~]#
6.在11.14上确认VIP已经漂移
[root@localhost ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 08:00:27:2d:d1:86 brd ff:ff:ff:ff:ff:ff inet 192.168.11.14/24 brd 192.168.11.255 scope global noprefixroute enp0s3 valid_lft forever preferred_lft forever inet 192.168.11.200/24 scope global secondary enp0s3 valid_lft forever preferred_lft forever inet6 fe80::c08b:489f:1587:3bb6/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::a33a:d49b:da44:119a/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever inet6 fe80::d071:eb89:2f12:8e56/64 scope link tentative noprefixroute dadfailed valid_lft forever preferred_lft forever
四、后记
关于keepalived VIP漂移 与 mgr 角色转换
MySQL MGR master故障后,slave会自动升级为 master,此时故障服务器恢复后重新加入集群,自动为 slave角色,处于只读状态。如果此时VIP重新漂移至该节点,则客户端会引起事务提交异常。
为了避免出现此问题,本文中将所有 keepalived 实例设置为 backup 角色,同时添加了 nopreempt 配置项,即设置为 非抢占 模式,如此,MySQL服务重启 与 keepalived 服务重启后,都不会主动竞争 master,客户端业务保持正常。
特别关注:系统默认启用了SELinux内核模块(安全子系统),所以在服务绑定/监听某些端口时,提示无访问权限,此时需要禁用SELinux,修改 /etc/selinux/config 文件,设置SELINUX=disabled
Can't start server: Bind on TCP/IP port: Permission denied
特别关注:selinux设置完成需要重启生效,如果当前不方便重启,可以执行 setenforce 0 临时关闭selinux,下次重启是配置再生效
特别关注:系统默认启用了防火墙,请在启动服务前关闭防火墙,或在防火墙中添加服务端口