MySQL集群搭建-MHA高可用架构

1 概述

1.1 MHA 简介

MHA - Master High Availability 是由 Perl 实现的一款高可用程序,出现故障时,MHA 以最小的停机时间(通常10-30秒)执行 master 的故障转移以及 slave 的升级。MHA 可防止复制一致性问题,并且易于安装,不需要改变现有部署。

MHA 由MHA managerMHA node组成, MHA manager是一个监控管理程序,用于监控MySQL master状态; MHA node是具有故障转移的工具脚本,如解析 MySQL 二进制/中继日志,传输应用事件到SlaveMHA node在每个MySQL服务器上运行。

MHA manager调用MHA node工具脚本的方式是SSH到主机上然后执行命令,所以各节点需要做等效验证。

1.2 MHA 怎么保证数据不丢失

Master宕机后,MHA会尝试保存宕机Master的二进制日志,然后自动判断MySQL集群中哪个实例的中继日志是最新的,并将有最新日志的实例的差异日志传到其他实例补齐,从而实现所有实例数据一致。然后把宕机Master的二进制日志应用到选定节点,并提升为 Master

具体流程如下:

  1. 尝试从宕机Master中保存二进制日志
  2. 找到含有最新中继日志的Slave
  3. 把最新中继日志应用到其他实例,实现各实例数据一致
  4. 应用从Master保存的二进制日志事件
  5. 提升一个SlaveMaster
  6. 其他Slave向该新Master同步

从切换流程流程可以看到,如果宕机Master主机无法SSH登录,那么第一步就没办法实现,对于MySQL5.5以前的版本,数据还是有丢失的风险。对于5.5后的版本,开启半同步复制后,真正有助于避免数据丢失,半同步复制保证至少一个 (不是所有)slave 在 master 提交时接收到二进制日志事件。因此,对于可以处理一致性问题的MHA 可以实现"几乎没有数据丢失"和"从属一致性"。

1.3 MHA 优点和限制

优点

  • 开源,用Perl编写
  • 方案成熟,故障切换时,MHA会做日志补齐操作,尽可能减少数据丢失,保证数据一
  • 部署不需要改变现有架构

限制

  • 各个节点要打通SSH信任,有一定的安全隐患
  • 没有 Slave 的高可用
  • 自带的脚本不足,例如虚IP配置需要自己写命令或者依赖其他软件
  • 需要手动清理中继日志

1.4 MHA 常用两种复制配置

单 master,多 slave

        M(RW)
        |
+-------+-------+
S1(R)  S2(R)   S3(R)

这种复制方式非常常见,当Master宕机时,MHA会选一个日志最新的主机升级为Master, 如果不希望个节点成为Master,把no_master设为1就可以。

多 master, 多 slave

        M(RW)----M2(R, candidate_master=1)
        |
+-------+-------+
S1(R)          S2(R)

双主结构也是常见的复制模式,如果当前Master崩溃, MHA会选择只读Master成为新的Master

2 数据库环境准备

本次演示使用复制方式是主主从,主主从数据库搭建方式参考以前文章

2.1 节点信息

IP系统端口MySQL版本节点读写说明
10.0.0.247 Centos6.5 3306 5.7.9 Master 读写 主节点
10.0.0.248 Centos6.5 3306 5.7.9 Standby 只读,可切换为读写 备主节点
10.0.0.249 Centos6.5 3306 5.7.9 Slave 只读 从节点
10.0.0.24 Centos6.5 - - manager - MHA Manager
10.0.0.237 - - - - - VIP

2.2 架构图

2.3 参考配置

Master1

[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock

[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2473306

default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0

auto_increment_offset = 1
auto_increment_increment = 2

#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log

#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%

#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1

Master2

[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock

[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2483306

default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0

auto_increment_offset = 2
auto_increment_increment = 2

#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log

#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%

#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1

Slave

[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock

[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2493306

default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0

read_only=1

#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log

#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%

#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1

3 安装配置 MHA

3.1 下载 MHA

进入 MHA 下载页面 Downloads, 下载ManagerNode节点安装包,由于我的服务器是centos6,所以下载了MHA Manager 0.56 rpm RHEL6MHA Node 0.56 rpm RHEL6

3.2 安装 MHA

Node安装

在所有主机(包括Manager)上执行

# 安装依赖
yum install perl perl-devel perl-DBD-MySQL
# 安装 node 工具
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

Manager安装

在 Manager 主机上执行

# 安装依赖
yum install -y perl perl-devel perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
# 安装 manager
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm

MHA Installation

3.3 创建 MHA 管理用户

管理用户需要执行一些数据库管理命令包括STOP SLAVE, CHANGE MASTER, RESET SLAVE

create user mha_manager@'%' identified by 'mha_manager';
grant all on *.* to mha_manager@'%';
flush privileges;

3.4 增加 MySQL 用户 sudo 权限

配置 VIP 需要有 sudo 权限

打开/etc/sudoers文件, 增加一条

root    ALL=(ALL)       ALL
# 这个是增加的
mysql   ALL=(ALL)       NOPASSWD: ALL

然后把Defaults requiretty注释掉

# Defaults    requiretty

3.5 配置各主机免密码登陆

所有主机执行

# 进入 mysql 用户
su - mysql

# 生成密钥对, 执行命令,然后按回车
ssh-keygen -t rsa

# 复制公钥到相应主机
ssh-copy-id mysql@10.0.0.247
ssh-copy-id mysql@10.0.0.248
ssh-copy-id mysql@10.0.0.249
ssh-copy-id mysql@10.0.1.24

3.6 配置 Manager

新建/etc/masterha目录,我们把配置文件放到这里

mkdir /etc/masterha

创建配置文件/etc/masterha/app1.cnf, 写上配置

[server default]
manager_workdir=/etc/masterha                  # 设置 manager 的工作目录, 可以自己调整
manager_log=/etc/masterha/manager.log          # 设置 manager 的日志文件
master_binlog_dir=/data/mysql_log/test_db      # 设置 master binlog 的日志的位置
master_ip_failover_script= /etc/masterha/script/master_ip_failover            # 设置自动 failover 时的切换脚本, 脚本参考附件
master_ip_online_change_script= /etc/masterha/script/master_ip_online_change  # 设置手动切换时执行的切换脚本, 脚本参考附件

user=mha_manager            # 设置管理用户, 用来监控、配置 MySQL(STOP SLAVE, CHANGE MASTER, RESET SLAVE), 默认为 root
password=mha_manager        # 设置管理用户密码

repl_user=repl              # 设置复制环境中的复制用户名
repl_password=repl          # 设置复制用户的密码

ping_interval=1             # 发送 ping 包的时间间隔,三次没有回应就自动进行 failover
remote_workdir=/tmp         # 设置远端 MySQL 的工作目录

report_script=/etc/masterha/script/send_report    # 设置发生切换后执行的脚本

# 检查脚本
secondary_check_script= /usr/bin/masterha_secondary_check-s 10.0.0.247 -s 10.0.0.248            

shutdown_script=""              #设置故障发生后关闭故障主机脚本(可以用于防止脑裂)

ssh_user=mysql                  #设置 ssh 的登录用户名

[server1]
hostname=10.0.0.247
port=3306

[server2]
hostname=10.0.0.248
port=3306
candidate_master=1   # 设置为候选 master, 如果发生宕机切换,会把该节点设为新 Master,即使它不是数据最新的节点
check_repl_delay=0   # 默认情况下,一个 Slave 落后 Master 100M 的中继日志,MHA 不会选择它作为新的 Master,因为这对于 Slave 恢复数据要很长时间,check_repl_delay=0 的时候会忽略延迟,可以和 candidate_master=1 配合用

[server3]
hostname=10.0.0.249
port=3306
no_master=1         # 从不将这台主机升级为 Master
ignore_fail=1       # 默认情况下,如果有 Slave 节点挂了, 就不进行切换,设置 ignore_fail=1 可以忽然它

创建配置文件/etc/masterha/app2.cnf, 以备用Master 为 Master, 方便切换后启动MHA

[server default]
manager_workdir=/etc/masterha                  # 设置 manager 的工作目录, 可以自己调整
manager_log=/etc/masterha/manager.log          # 设置 manager 的日志文件
master_binlog_dir=/data/mysql_log/test_db      # 设置 master binlog 的日志的位置
master_ip_failover_script= /etc/masterha/script/master_ip_failover            # 设置自动 failover 时的切换脚本
master_ip_online_change_script= /etc/masterha/script/master_ip_online_change  # 设置手动切换时执行的切换脚本

user=mha_manager            # 设置管理用户, 用来监控、配置 MySQL(STOP SLAVE, CHANGE MASTER, RESET SLAVE), 默认为 root
password=mha_manager        # 设置管理用户密码

repl_user=repl              # 设置复制环境中的复制用户名
repl_password=repl          # 设置复制用户的密码

ping_interval=1             # 发送 ping 包的时间间隔,三次没有回应就自动进行 failover
remote_workdir=/tmp         # 设置远端 MySQL 的工作目录

report_script=/etc/masterha/script/send_report    # 设置发生切换后执行的脚本

# 检查脚本
secondary_check_script= /usr/bin/masterha_secondary_check -s 10.0.0.248 -s 10.0.0.247

shutdown_script=""              #设置故障发生后关闭故障主机脚本(可以用于防止脑裂)

ssh_user=mysql                  #设置 ssh 的登录用户名

[server1]
hostname=10.0.0.248
port=3306

[server2]
hostname=10.0.0.247
port=3306
candidate_master=1   # 设置为候选 master, 如果发生宕机切换,会把该节点设为新 Master,即使它不是数据最新的节点
check_repl_delay=0   # 默认情况下,一个 Slave 落后 Master 100M 的中继日志,MHA 不会选择它作为新的 Master,因为这对于 Slave 恢复数据要很长时间,check_repl_delay=0 的时候会忽略延迟,可以和 candidate_master=1 配合用

[server3]
hostname=10.0.0.249
port=3306
no_master=1         # 从不将这台主机升级为 Master
ignore_fail=1       # 默认情况下,如果有 Slave 节点挂了, 就不进行切换,设置 ignore_fail=1 可以忽然它

注意:使用的时候去掉注释

3.7 配置切换脚本

管理 VIP 方式

MHA管理VIP有两种方案,一种是使用Keepalived,另一种是自己写命令实现增删VIP,由于Keepalived容易受到网络波动造成VIP切换,而且无法在多实例机器上使用,所以建议写脚本管理VIP

当前主机的网卡是eth0, 可以通过下列命令增删 VIP

up VIP

sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255

down VIP

sudo /sbin/ifconfig eth0:1 down

配置切换脚本

master_ip_failover , master_ip_online_changesend_report脚本在附录里面

更改 mysql 配置

MHA的检测比较严格,所以我们把除Master外的节点设为read_only, 有必要可以写进配置文件里面

# mysql shell
set global read_only=1;

MHA需要使用中继日志来实现数据一致性,所以所有节点要设置不自动清理中继日志

# mysql shell
set global relay_log_purge=0;

也可以写入配置文件

# my.cnf
relay_log_purge=0

MHA 常用命令

Manager

masterha_check_ssh              检查 MHA 的 SSH 配置状况    
masterha_check_repl             检查 MySQL 复制状况
masterha_manger                 启动 MHA
masterha_stop                   停止 MHA
masterha_check_status           检测当前 MHA 运行状态
masterha_master_monitor         检测 master 是否宕机
masterha_master_switch          手动故障转移
masterha_conf_host              添加或删除配置的 server 信息

Node

save_binary_logs                保存 master 的二进制日志
apply_diff_relay_logs           对比识别中继日志的差异部分
purge_relay_logs                清除中继日志(MHA中继日志需要使用这个命令清除)

命令的使用方法可以通过执行命令 --help 得到

验证 SSH 是否成功、主从状态是否正常

在 manager 节点执行 masterha_check_ssh --conf=/etc/masterha/app1.cnf 检测SSH状态,下面是执行结果

[mysql@chengqm ~]$ masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Dec 20 19:47:18 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 20 19:47:18 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Dec 20 19:47:18 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Dec 20 19:47:18 2018 - [info] Starting SSH connection tests..
Thu Dec 20 19:47:19 2018 - [debug] 
Thu Dec 20 19:47:18 2018 - [debug]  Connecting via SSH from mysql@10.0.0.247(10.0.0.247:22) to mysql@10.0.0.248(10.0.0.248:22)..
Thu Dec 20 19:47:19 2018 - [debug]   ok.
Thu Dec 20 19:47:19 2018 - [debug]  Connecting via SSH from mysql@10.0.0.247(10.0.0.247:22) to mysql@10.0.0.249(10.0.0.249:22)..
Thu Dec 20 19:47:19 2018 - [debug]   ok.
Thu Dec 20 19:47:19 2018 - [debug] 
Thu Dec 20 19:47:19 2018 - [debug]  Connecting via SSH from mysql@10.0.0.248(10.0.0.248:22) to mysql@10.0.0.247(10.0.0.247:22)..
Thu Dec 20 19:47:19 2018 - [debug]   ok.
Thu Dec 20 19:47:19 2018 - [debug]  Connecting via SSH from mysql@10.0.0.248(10.0.0.248:22) to mysql@10.0.0.249(10.0.0.249:22)..
Thu Dec 20 19:47:19 2018 - [debug]   ok.
Thu Dec 20 19:47:20 2018 - [debug] 
Thu Dec 20 19:47:19 2018 - [debug]  Connecting via SSH from mysql@10.0.0.249(10.0.0.249:22) to mysql@10.0.0.247(10.0.0.247:22)..
Thu Dec 20 19:47:20 2018 - [debug]   ok.
Thu Dec 20 19:47:20 2018 - [debug]  Connecting via SSH from mysql@10.0.0.249(10.0.0.249:22) to mysql@10.0.0.248(10.0.0.248:22)..
Thu Dec 20 19:47:20 2018 - [debug]   ok.
Thu Dec 20 19:47:20 2018 - [info] All SSH connection tests passed successfully.

在 manager 节点执行 masterha_check_repl --conf=/etc/masterha/app1.cnf 检测同步状态,下面是执行结果

[mysql@chengqm ~]$ masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Dec 20 20:05:03 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 20 20:05:03 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Dec 20 20:05:03 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Dec 20 20:05:03 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Dec 20 20:05:03 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.0.0.247(10.0.0.247:3306)
Thu Dec 20 20:05:03 2018 - [info] Master configurations are as below: 
Master 10.0.0.247(10.0.0.247:3306), replicating from 10.0.0.248(10.0.0.248:3306)
Master 10.0.0.248(10.0.0.248:3306), replicating from 10.0.0.247(10.0.0.247:3306), read-only
================ 省略 ==================
Thu Dec 20 20:05:08 2018 - [info]   /etc/masterha/script/master_ip_failover --command=status --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 
Thu Dec 20 20:05:08 2018 - [info]  OK.
Thu Dec 20 20:05:08 2018 - [warning] shutdown_script is not defined.
Thu Dec 20 20:05:08 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

出现 MySQL Replication Health is OK. 表示成功

如果出现Failed to get master_ip_failover_script status with return code 255:0这个错误,就注释掉master_ip_failover脚本的FIXME_xxx

注意:要想正常运行,系统路径必须要有 mysqlbinlog 和 mysql 命令

4 启动和测试

4.1 启动

使用脚本管理 VIP 不会自动设置 VIP,所以先手动在 Master 设置 VIP

[root@cluster01 ~]# /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255
[root@cluster01 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr FA:16:3E:DE:80:33  
          inet addr:10.0.0.247  Bcast:10.0.255.255  Mask:255.255.0.0
          inet6 addr: fe80::f816:3eff:fede:8033/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17333247 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5472004 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1476157398 (1.3 GiB)  TX bytes:1064253754 (1014.9 MiB)

eth0:1    Link encap:Ethernet  HWaddr FA:16:3E:DE:80:33  
          inet addr:10.0.0.237  Bcast:10.0.0.237  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

启动 MHA Manager

[mysql@chengqm ~]$ nohup /usr/bin/masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
[1] 21668
  • --ignore_last_failover 忽略上次切换。MHA每次故障切换后都会生成一个app1.failover.complete这样的文件,如果不加这个参数,需要删除这个文件才能再次启动

检查启动日志

[mysql@chengqm ~]$ tail -18 /etc/masterha/manager.log 
Fri Dec 21 13:56:39 2018 - [info] 
10.0.0.247(10.0.0.247:3306) (current master)
 +--10.0.0.248(10.0.0.248:3306)
 +--10.0.0.249(10.0.0.249:3306)

Fri Dec 21 13:56:39 2018 - [info] Checking master_ip_failover_script status:
Fri Dec 21 13:56:39 2018 - [info]   /etc/masterha/script/master_ip_failover --command=status --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 


 VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down

Check script.. OK 
Fri Dec 21 13:56:39 2018 - [info]  OK.
Fri Dec 21 13:56:39 2018 - [warning] shutdown_script is not defined.
Fri Dec 21 13:56:39 2018 - [info] Set master ping interval 1 seconds.
Fri Dec 21 13:56:39 2018 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s 10.0.0.247 -s 10.0.0.248
Fri Dec 21 13:56:39 2018 - [info] Starting ping health check on 10.0.0.247(10.0.0.247:3306)..
Fri Dec 21 13:56:39 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

日志中显示 Ping(SELECT) succeeded, waiting until MySQL doesn't respond 表示启动成功

如果查看Mastergeneral日志,会发现MHA不断执行SELECT 1 As Value检查命令

4.2 失效转移

我们模拟Master数据库宕机的情况

[root@cluster01 ~]# ps -ef | grep mysql
mysql    20061     1  0 11:19 pts/0    00:00:00 /bin/sh /usr/local/mysql57/bin/mysqld_safe --defaults-file=/data/mysql_db/test_db/my.cnf --datadir=/data/mysql_db/test_db --pid-file=/data/mysql_db/test_db/mysql.pid
mysql    20494 20061  0 11:19 pts/0    00:00:21 /usr/local/mysql57/bin/mysqld --defaults-file=/data/mysql_db/test_db/my.cnf --basedir=/usr/local/mysql57 --datadir=/data/mysql_db/test_db --plugin-dir=/usr/local/mysql57/lib/plugin --log-error=/data/mysql_log/test_db/mysql-error.log --pid-file=/data/mysql_db/test_db/mysql.pid --socket=/data/mysql_db/test_db/mysql.sock --port=3306
[root@cluster01 ~]# kill -9 20061 20494

查看MHA日志可以看到整个切换过程

Fri Dec 21 14:04:49 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Dec 21 14:04:49 2018 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 10.0.0.247 -s 10.0.0.248  --user=mysql  --master_host=10.0.0.247  --master_ip=10.0.0.247  --master_port=3306 --master_user=mha_manager --master_password=mha_manager --ping_type=SELECT
Fri Dec 21 14:04:49 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_log/test_db --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Monitoring server 10.0.0.247 is reachable, Master is not reachable from 10.0.0.247. OK.
Fri Dec 21 14:04:49 2018 - [info] HealthCheck: SSH to 10.0.0.247 is reachable.
Monitoring server 10.0.0.248 is reachable, Master is not reachable from 10.0.0.248. OK.
Fri Dec 21 14:04:49 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
=============== 省略 ================
Fri Dec 21 14:04:52 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Dec 21 14:04:52 2018 - [info] Executing master IP deactivation script:
Fri Dec 21 14:04:52 2018 - [info]   /etc/masterha/script/master_ip_failover --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 --command=stopssh --ssh_user=mysql  


 VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down

Disabling the VIP on old master: 10.0.0.247 
SIOCSIFFLAGS: Cannot assign requested address
Fri Dec 21 14:04:52 2018 - [info]  done.
=============== 省略 ================
Fri Dec 21 14:04:53 2018 - [info] Starting master failover..
Fri Dec 21 14:04:53 2018 - [info] 
From:
10.0.0.247(10.0.0.247:3306) (current master)
 +--10.0.0.248(10.0.0.248:3306)
 +--10.0.0.249(10.0.0.249:3306)

To:
10.0.0.248(10.0.0.248:3306) (new master)
 +--10.0.0.249(10.0.0.249:3306)
Fri Dec 21 14:04:53 2018 - [info]
=============== 省略 ================
Fri Dec 21 14:04:53 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.248', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Dec 21 14:04:53 2018 - [info] Executing master IP activate script:
Fri Dec 21 14:04:53 2018 - [info]   /etc/masterha/script/master_ip_failover --command=start --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 --new_master_host=10.0.0.248 --new_master_ip=10.0.0.248 --new_master_port=3306 --new_master_user='mha_manager' --new_master_password='mha_manager'  


 VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down

Set read_only=0 on the new master.
Enabling the VIP - 10.0.0.237 on the new master - 10.0.0.248 
=============== 省略 ================
Fri Dec 21 14:04:55 2018 - [info]  10.0.0.248: Resetting slave info succeeded.
Fri Dec 21 14:04:55 2018 - [info] Master failover to 10.0.0.248(10.0.0.248:3306) completed successfully.

查看新Master VIP

[mysql@cluster02 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr FA:16:3E:66:7E:E8  
          inet addr:10.0.0.248  Bcast:10.0.255.255  Mask:255.255.0.0
          inet6 addr: fe80::f816:3eff:fe66:7ee8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40197173 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10470689 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4063358126 (3.7 GiB)  TX bytes:2269241789 (2.1 GiB)

eth0:1    Link encap:Ethernet  HWaddr FA:16:3E:66:7E:E8  
          inet addr:10.0.0.237  Bcast:10.0.0.237  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

可以看到VIP已经成功切换

查看新Mastergeneral日志,可以看到MHA的操作过程, 下面展示部分日志

...
2018-12-21T14:04:41.782336+08:00 5525 Query    SHOW SLAVE STATUS
2018-12-21T14:04:41.788318+08:00 5525 Query    STOP SLAVE IO_THREAD
2018-12-21T14:04:41.900734+08:00 5525 Query    SHOW SLAVE STATUS
2018-12-21T14:04:42.044801+08:00 5525 Query    SHOW SLAVE STATUS
2018-12-21T14:04:42.668581+08:00 5525 Query    SHOW SLAVE STATUS
2018-12-21T14:04:42.670336+08:00 5525 Query    STOP SLAVE SQL_THREAD
...
2018-12-21T14:04:42.863904+08:00 5526 Query    SET GLOBAL read_only=0
...
2018-12-21T14:04:43.950986+08:00 5527 Query    SET @rpl_semi_sync_slave= 1
...

查看Slavegeneral日志,可以看到Slave会重新指向

2018-12-21T14:04:04.835218+08:00   90 Query    STOP SLAVE IO_THREAD
2018-12-21T14:04:04.955706+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:05.092123+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.018838+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.034225+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.036613+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.038475+08:00   90 Query    STOP SLAVE SQL_THREAD
2018-12-21T14:04:06.160142+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.162224+08:00   90 Query    STOP SLAVE
2018-12-21T14:04:06.163171+08:00   90 Query    SHOW SLAVE STATUS
2018-12-21T14:04:06.164554+08:00   90 Query    RESET SLAVE
2018-12-21T14:04:06.825564+08:00   90 Query    CHANGE MASTER TO MASTER_HOST = '10.0.0.248' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000005' MASTER_LOG_POS = 154
2018-12-21T14:04:06.981718+08:00   90 Query    SET GLOBAL relay_log_purge=0
2018-12-21T14:04:06.982802+08:00   90 Query    START SLAVE

注意: MHA在切换完成后会结束 Manager 进程

4.3 手动切换

切换后MasterCluster2, 把Cluster1重新指向Cluster2,现在测试一下手动切换,把Master切回Cluster1, 命令如下

masterha_master_switch --conf=/etc/masterha/app2.cnf --master_state=alive --new_master_host=10.0.0.247 --new_master_port=3306 --orig_master_is_new_slave
  • --orig_master_is_new_slave 是将原master切换为新主的slave,默认情况下,是不添加的。

下面是执行过程, 有两个地方要回答 yes/no

[mysql@chengqm ~]$ masterha_master_switch --conf=/etc/masterha/app2.cnf --master_state=alive --new_master_host=10.0.0.247 --new_master_port=3306 --orig_master_is_new_slave

......

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.0.0.248(10.0.0.248:3306)? (YES/no): yes

......

Sun Dec 23 16:50:48 2018 - [info] 
From:
10.0.0.248(10.0.0.248:3306) (current master)
 +--10.0.0.247(10.0.0.247:3306)
 +--10.0.0.249(10.0.0.249:3306)

To:
10.0.0.247(10.0.0.247:3306) (new master)
 +--10.0.0.249(10.0.0.249:3306)
 +--10.0.0.248(10.0.0.248:3306)

Starting master switch from 10.0.0.248(10.0.0.248:3306) to 10.0.0.247(10.0.0.247:3306)? (yes/NO): yes

......

Sun Dec 23 16:51:36 2018 - [info]  10.0.0.247: Resetting slave info succeeded.
Sun Dec 23 16:51:36 2018 - [info] Switching master to 10.0.0.247(10.0.0.247:3306) completed successfully.

切换成功,查看Cluster1VIP

[mysql@cluster01 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr FA:16:3E:DE:80:33  
          inet addr:10.0.0.247  Bcast:10.0.255.255  Mask:255.255.0.0
          inet6 addr: fe80::f816:3eff:fede:8033/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20585872 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5519122 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1785787985 (1.6 GiB)  TX bytes:1068115408 (1018.6 MiB)

eth0:1    Link encap:Ethernet  HWaddr FA:16:3E:DE:80:33  
          inet addr:10.0.0.237  Bcast:10.0.0.237  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

注意:手动切换的时候先把 MHA Manager 停了

4.4 停止 MHA

停止 MHA 的命令如下,就不演示了

masterha_stop --conf=配置文件

5 总结

总的来说,MHA是一套非常优秀而且使用比较广的高可用程序,它可以自动补齐日志使得一致性有保证,部署的时候不需要改变原有架构就可以使用。但是使用起来还是有一点复杂的,因为MHA不接管VIP,所以要自己写脚本实现,而且只保证Master高可用,没有Slave高可用,还有就是中继日志要自己设定时任务来清理。

不管怎么说,在没有更好的方案下,MHA还是值得使用的。

master_ip_failover 脚本

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '10.0.0.237';
my $key = '1';
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip netmask 255.255.255.255";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {

  print "\n\n VIP Command: start=$ssh_start_vip stop=$ssh_stop_vip\n\n";
 
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();
      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

    # all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();
      $new_master_handler->disconnect();

      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();

      $exit_code = 0;
    };
    if ($@) {
      warn $@;

      # If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {
    print "Check script.. OK \n";
    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_online_change 脚本

#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

my $vip = '10.0.0.237';
my $key = '1';
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip netmask 255.255.255.255";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      $orig_master_handler->disable_log_bin_local();

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();

      ## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip() {
    return 0  unless  ($new_master_ssh_user);
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($orig_master_ssh_user);
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

send_report 脚本

#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body, $title, $content);
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

# 调用外部脚本
$title="[mha switch]";
$content="`date +'%Y-%m-%d %H:%M'` old_master=".$dead_master_host." new_master=".$new_master_host;
system("sh /etc/masterha/script/send_report.sh $title $content");

exit 0;

清理中继日志定时任务

下面是我的定时任务,参数自行替换, workdir 需要和中继日志在同一个盘

# 每小时清理一次
0 * * * * (/usr/bin/purge_relay_logs --user=mha_manager --password=mha_manager --disable_relay_log_purge --port=3306 --workdir=/tmp/relaylogtmp >> /var/log/purge_relay_logs.log 2>&1)

MHA Wiki: https://github.com/yoshinorim...

posted @ 2022-04-20 17:09  devops运维-小灰灰  阅读(340)  评论(0编辑  收藏  举报