MHA 高可用配置过程——附详细脚本
环境说明:
role: ip hostname mysql version vip地址
master: 10.10.10.11 BXVL-MYSQL-10-11 mysql-5.7.18 10.10.10.10
replica: 10.10.10.12 BXVL-MYSQL-10-12 mysql-5.7.18
replica: 10.10.10.13 BXVL-MYSQL-10-13 mysql-5.7.18
mha manager配置在10.10.10.13上.
1.配置互信:
(各节点配置) cd /dbatools/scripts/dba_mysql_python_scripts python mysql_task_scripts.py --task trust_config --trust-ip 10.10.10.13 python mysql_task_scripts.py --task trust_config --trust-ip 10.10.10.11 python mysql_task_scripts.py --task trust_config --trust-ip 10.10.10.12
2.配置/etc/hosts 各节点需配置/etc/hosts,以便能识别主机名进行切换
(各节点配置) 10.10.10.13 BXVL-MYSQL-10-13 10.10.10.11 BXVL-MYSQL-10-11 10.10.10.12 BXVL-MYSQL-10-12
3.检测互信
(各节点执行) ssh root@BXVL-MYSQL-10-13 date ssh root@BXVL-MYSQL-10-11 date ssh root@BXVL-MYSQL-10-12 date
4.安装mha相关工具
各节点安装node: 依赖:perl-DBD-MySQL yum install perl-DBD-MySQL -y yum -y install mha4mysql-node-0.58-0.el7.centos.noarch.rpm manager节点安装:10.13节点 依赖:perl-Config-Tiny、perl-Log-Dispatch、perl-Parallel-ForkManager 由于perl-Log-Dispatch和perl-Parallel-ForkManager这两个被依赖包在yum仓库找不到(国内的yum源找不到), 因此安装epel-release-7-14.noarch.rpm(软件源) rpm -ivh epel-release-7-14.noarch.rpm # 安装epel软件源 #yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager -y yum -y install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
5.配置mha账号
主库上创建mhauser用户并授权:create user user_mha_admin identified by 'yourpassword'; GRANT SELECT, INSERT, CREATE, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, EVENT, TRIGGER, CREATE TABLESPACE ON *.* TO 'user_mha_admin'@'%';
6.mha自带相关脚本说明:
# ll /usr/bin/master* -rwxr-xr-x. 1 root root 1995 Mar 23 2018 /usr/bin/masterha_check_repl -rwxr-xr-x. 1 root root 1779 Mar 23 2018 /usr/bin/masterha_check_ssh -rwxr-xr-x. 1 root root 1865 Mar 23 2018 /usr/bin/masterha_check_status -rwxr-xr-x. 1 root root 3201 Mar 23 2018 /usr/bin/masterha_conf_host -rwxr-xr-x. 1 root root 2517 Mar 23 2018 /usr/bin/masterha_manager -rwxr-xr-x. 1 root root 2165 Mar 23 2018 /usr/bin/masterha_master_monitor -rwxr-xr-x. 1 root root 2373 Mar 23 2018 /usr/bin/masterha_master_switch -rwxr-xr-x. 1 root root 5172 Mar 23 2018 /usr/bin/masterha_secondary_check -rwxr-xr-x. 1 root root 1739 Mar 23 2018 /usr/bin/masterha_stop save_binary_logs: 保存和复制master的binlog apply_diff_relay_logs: 识别差异日志,应用到其它slave purge_relay_logs: 清除中继日志
7.mha 基础配置
以下10.13操作:
mkdir -p /opt/mha/yourcluster_name/script
mha监控根目录:/opt/mha
mha全局配置文件:/etc/masterha_default.cnf
MHA监控实例根目录:/opt/mha/yourcluster_name/
mha监控实例日志文件:/opt/mha/yourcluster_name/manager.log
mha监控实例配置文件:/opt/mha/yourcluster_name/mhaapp.cnf
mha监控实例yourcluster_name脚本目录: /opt/mha/yourcluster_name/script
mha监控实例yourcluster_name自定义脚本文件包含:
mha_failover_sendmessage.py : 企业微信告警脚本,master_ip_failover、master_ip_online_change发生切换时调用
##以下脚本在监控实例目录或者参数文件变更时都要进行修改!!
master_ip_failover: failover切换脚本
master_ip_online_change: online switch主从脚本
mastermha_service.sh: mastermha_service.sh [start|stop|status] # mastermha manager启停、查看状态脚本
脚本文件需先授权:
cd /opt/mha/yourcluster_name/script
chmod a+x *
8.mha参数配置
参数里不要带注释
# vim /etc/masterha_default.cnf [server default] user=user_mha_admin password=yourpassword ssh_user=root repl_user=replication repl_password=yourpassword ping_interval=1
##mha集群yourcluster_name实例的配置文件
vim /opt/mha/yourcluster_name/mhaapp.cnf [server default] # 设置实例manager的工作目录 manager_workdir=/opt/mha/yourcluster_name/ # 设置实例manager的日志 manager_log=/opt/mha/yourcluster_name/manager.log secondary_check_script= masterha_secondary_check -s 10.10.10.12 -s 10.10.10.13 -s 10.10.10.11 master_ip_online_change_script= /opt/mha/yourcluster_name/script/master_ip_online_change master_ip_failover_script=/opt/mha/yourcluster_name/script/master_ip_failover [server1] hostname=10.10.10.11 port=3306 master_binlog_dir= /opt/mysql/log # 设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库 candidate_master=1 [server2] hostname=10.10.10.12 port=3306 master_binlog_dir= /opt/mysql/log candidate_master=1 [server3] hostname=10.10.10.13 port=3306 master_binlog_dir= /opt/mysql/log #candidate_master=1
##binlog server配置 如果以master的binlog目录为binlogserver failover后需要重新配置成新master地址
[binlog1] hostname=10.10.10.11 master_binlog_dir=/opt/mysql/log ignore_fail=1 no_master=1
9.定时任务配置
配置mastermha manager服务状态检测定时任务:每10分钟检测一次
## mha manager status check */10 * * * * cd /dbatools/scripts/dba_mysql_python_scripts/ && /usr/bin/python mysql_task_scripts.py --task masterha_check_status
配置relay_log_purge定时清理任务:
## relay_log_purge 30 */2 * * * /usr/bin/purge_relay_logs --user=user_mha_admin --password='yourpassword' --port=3306 --disable_relay_log_purge > /opt/mysql/log/purge_relay.log 2>&1
10.启动mastermha_manager
$ cd /opt/mha/yourcluster_name/script $ mastermha_service.sh start #启动服务 $ mastermha_service.sh status #查看状态 $ mastermha_service.sh stop #停止服务
11.相关命令介绍
(1)配置检查
##Manager进行互信检查:
$ masterha_check_ssh --conf=/opt/mha/yourcluster_name/mhaapp.cnf
##主从关系检查:
$ masterha_check_repl --conf=/opt/mha/yourcluster_name/mhaapp.cnf
(2)启动mha manager服务:
$ nohup masterha_manager --conf=/opt/mha/yourcluster_name/mhaapp.cnf --ignore_last_failover < /dev/null > /opt/mha/yourcluster_name/manager.log 2>&1 & #nohup masterha_manager --conf=/opt/mha/yourcluster_name/mhaapp.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /opt/mha/yourcluster_name/manager.log 2>&1 &
说明:
--remove_dead_master_conf //该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。
--manger_log //日志存放位置
--ignore_last_failover //在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。
该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。
(3)停止mha manager服务:
$ masterha_stop --conf=/opt/mha/yourcluster_name/mhaapp.cnf
(4)mastermha_manager运行状态检查:
$ masterha_check_status --conf=/opt/mha/yourcluster_name/mhaapp.cnf
app1 (pid:30135) is running(0:PING_OK), master:10.129.6.97
查看日志:
$ tail -f /opt/mha/yourcluster_name/manager.log
12.关于mha
MHA Manager:
1. masterha_check_ssh:检查MHA的SSH配置状况
2. masterha_check_repl:检查MySQL的复制状况
3. masterha_manager:启动MHA
4. masterha_check_status:检测当前MHA运行状态
5. masterha_master_monitor:检测master是否宕机
6. masterha_master_switch:控制故障转移(自动或手动)
7. masterha_conf_host:添加或删除配置的server信息
8. masterha_stop:关闭MHA
MHA Node:
save_binary_logs:保存或复制master的二进制日志
apply_diff_relay_logs:识别差异的relay log并将差异的event应用到其它slave中
filter_mysqlbinlog:去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs:消除中继日志(不会堵塞SQL线程)
另有如下几个脚本需自定义:
1. master_ip_failover:管理VIP
2. master_ip_online_change:
3. masterha_secondary_check:当MHA manager检测到master不可用时,通过masterha_secondary_check脚本来进一步确认,减低误切的风险。
4. send_report:当发生故障切换时,可通过send_report脚本发送告警信息。
13.测试过程
测试环境: 10.129.6.97 10.129.6.98 10.129.6.99 vip:10.129.6.211 (1)自动切换 注意:切换后mha manager服务会自动停止。需手动再次启动!!! 停止原主10.129.6.97数据库服务,观察自动切换: 切换过程会自动将原主库的配置信息会从/opt/mha/app1/conf/app1.cnf删除掉 # mysqladmin -p -S/opt/mysql/3308/mysql.sock shutdown ①查看日志发现:tail -f /opt/mha/app1/log/manager.log Master failover to 10.129.6.98(10.129.6.98:3308) completed successfully. --表示切换完成 ②10.129.6.99上查看从库状态:show slave status 此时主库已经变为10.129.6.98 ③在10.129.6.98上查看ip a: vip地址已经漂移过来 [root@BYVL-DB-6-98 ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000 link/ether 00:50:56:a6:91:38 brd ff:ff:ff:ff:ff:ff inet 10.129.6.98/24 brd 10.129.6.255 scope global ens192 valid_lft forever preferred_lft forever inet 10.129.6.211/24 brd 10.129.6.255 scope global secondary ens192:1 valid_lft forever preferred_lft forever inet6 fe80::eb21:1d8e:5b56:bee0/64 scope link valid_lft forever preferred_lft forever 恢复原主为新从库:(mha manager启动至少三个节点,所以需要将其恢复后才能再次启动manager服务) 查看日志tail -f /opt/mha/app1/log/manager.log发现语句: Fri Feb 11 15:29:59 2022 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='BYVL-DB-6-97 or 10.129.6.97', MASTER_PORT=3308, MASTER_AUTO_POSITION=1, MASTER_USER='replication', MASTER_PASSWORD='xxx'; Fri Feb 11 15:29:59 2022 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000011, 1102, 5900e418-88fe-11ec-94e3-005056a6aa0e:1-15, 79440d2b-8901-11ec-8027-005056a69138:1-2 执行以下语句到原主库使其成为新从库: (/opt/mha/app1/conf/app1.cnf中binlog server切记修改为新主库的) --select @@GLOBAL.GTID_EXECUTED; reset master; set global gtid_purged='5900e418-88fe-11ec-94e3-005056a6aa0e:1-19, 79440d2b-8901-11ec-8027-005056a69138:1-2' CHANGE MASTER TO MASTER_HOST='10.129.6.98', MASTER_PORT=3308, MASTER_AUTO_POSITION=1, MASTER_USER='replication', MASTER_PASSWORD='beisen@123'; 启动mha manager nohup masterha_manager --conf=/opt/mha/app1/conf/app1.cnf --ignore_last_failover < /dev/null > /opt/mha/app1/log/manager.log 2>&1 & (2)手动切换 修改/opt/mha/app1/conf/app1.cnf配置 将原主库信息添加进去: [server1] hostname=10.129.6.97 port=3308 master_binlog_dir= /opt/mysql/3308/log # 设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库 candidate_master=1 首先结束mha服务: masterha_stop --conf=/opt/mha/app1/conf/app1.cnf 执行切换命令: masterha_master_switch --conf=/opt/mha/app1/conf/app1.cnf --master_state=alive --new_master_host=BYVL-DB-6-98 --new_master_port=3308 --orig_master_is_new_slave --running_updates_limit=10000 日志输出: Thu Feb 10 23:16:28 2022 - [info] Switching master to BYVL-DB-6-97(10.129.6.97:3308) completed successfully. 此时97成为新主库 vip也自动漂移到新主库上
测试过程出现的问题:
1.Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106. mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options at /usr/bin/apply_diff_relay_logs line 532. 解决: 在监控实例所有节点上都创建一下软连接: ln -s /opt/mysql/3308/base/bin/mysqlbinlog /usr/bin/mysqlbinlog ###ln -s /opt/mysql/3308/base/bin/mysql /usr/bin/mysql 2.Wed Feb 9 23:31:18 2022 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Can't exec "/opt/mha/yourcluster_name/script/master_ip_failover": Permission denied at /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm line 68. Wed Feb 9 23:31:18 2022 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers 解决: chown -R mysql.mysql master_ip_failover chmod a+x master_ip_failover 3.Wed Feb 9 23:33:09 2022 - [info] /opt/mha/script/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.129.6.97 --orig_master_ip=10.129.6.97 --orig_master_port=3308 : No such file or directory 解决:master_ip_failover 脚本必须加上perl才能执行 修改脚本首行为:#!/usr/bin/perl 后出现问题: /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.129.6.97 --orig_master_ip=10.129.6.97 --orig_master_port=3308 -bash: /usr/local/bin/master_ip_failover: /usr/bin/perl^M: bad interpreter: No such file or directory 解决:dos2unix master_ip_failover
修改脚本文件:
master_ip_failover、master_ip_online_change 添加内容: my $mha_conf = "/opt/mha/yourcluster_name/mhaapp.cnf"; sub change_binserver() { my $mha_conf_binlogserver_change = "sed -i '/binlog1/{n;s#hostname=.*#hostname=$new_master_host #;}' $mha_conf"; print "Change binlog1 in $mha_conf to $new_master_host \n"; `$mha_conf_binlogserver_change`; } master_ip_failover main()函数修改: .... eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); &change_binserver(); ###添加此行 &stop_vip(); $exit_code = 0; .... master_ip_online_change main()函数修改: ..... print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); &change_binserver(); ###添加此行 $exit_code = 0; .....
附各脚本文件:
master_ip_failover
##需要修改的参数:my $vip 、my $mha_conf 、my $mha_script_dir
增加函数:
change_binserver() #切换主库后跟随切换配置文件中的[binlog1]server对应主机
output_newmaster() #切换主从后将主信息输出到文件以备查看
#!/usr/bin/perl #####!/usr/bin/env perl # Copyright (C) 2011 DeNA Co.,Ltd. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA ## Note: This is a sample script and is not complete. Modify the script based on your environment. use strict; use warnings FATAL => 'all'; use Getopt::Long; use MHA::DBHelper; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password ); my $vip = '10.10.10.10/24'; my $key = "1"; my $ssh_start_vip = "/sbin/ifconfig ens192:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig ens192:$key down"; my $mha_conf = "/opt/mha/yourcluster_name/mhaapp.cnf"; my $mha_script_dir = "/opt/mha/yourcluster_name/script"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, ); exit &main(); sub main { if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. my $exit_code = 1; eval { # updating global catalog, etc $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here. my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); &change_binserver(); &stop_vip(); &output_newmaster(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`; exit 0; } else { &usage(); exit 1; } } sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub change_binserver() { my $mha_conf_binlogserver_change = "sed -i '/binlog1/{n;s#hostname=.*#hostname=$new_master_host #;}' $mha_conf"; print "Change binlog1 value in $mha_conf to new master: $new_master_host \n"; `$mha_conf_binlogserver_change`; } sub output_newmaster() { `/usr/bin/python $mha_script_dir/mha_failover_sendmessage.py $new_master_ip $orig_master_ip $vip $mha_script_dir`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }
master_ip_online_change
##需要修改的参数:my $vip 、my $brdc 、my $mha_conf
增加函数:
change_binserver() #切换主库后跟随切换配置文件中的[binlog1]server对应主机
#!/usr/bin/perl use strict; use warnings FATAL => 'all'; use Getopt::Long; use MHA::DBHelper; use MHA::NodeUtil; use Time::HiRes qw( sleep gettimeofday tv_interval ); use Data::Dumper; my $_tstart; my $_running_interval = 0.1; my ( $command, $orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user, ); my $vip = '10.10.10.10'; my $brdc = '10.23.30.255'; my $ifdev = 'ens192'; my $key = '1'; ## if config binlog server must modify $mha_conf my $mha_conf = "/opt/mha/yourcluster_name/mhaapp.cnf"; my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;"; my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key"; GetOptions( 'command=s' => \$command, 'orig_master_is_new_slave' => \$orig_master_is_new_slave, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, 'orig_master_password=s' => \$orig_master_password, 'orig_master_ssh_user=s' => \$orig_master_ssh_user, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, 'new_master_ssh_user=s' => \$new_master_ssh_user, ); exit &main(); sub current_time_us { my ( $sec, $microsec ) = gettimeofday(); my $curdate = localtime($sec); return $curdate . " " . sprintf( "%06d", $microsec ); } sub sleep_until { my $elapsed = tv_interval($_tstart); if ( $_running_interval > $elapsed ) { sleep( $_running_interval - $elapsed ); } } sub get_threads_util { my $dbh = shift; my $my_connection_id = shift; my $running_time_threshold = shift; my $type = shift; $running_time_threshold = 0 unless ($running_time_threshold); $type = 0 unless ($type); my @threads; my $sth = $dbh->prepare("SHOW PROCESSLIST"); $sth->execute(); while ( my $ref = $sth->fetchrow_hashref() ) { my $id = $ref->{Id}; my $user = $ref->{User}; my $host = $ref->{Host}; my $command = $ref->{Command}; my $state = $ref->{State}; my $query_time = $ref->{Time}; my $info = $ref->{Info}; $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info); next if ( $my_connection_id == $id ); next if ( defined($query_time) && $query_time < $running_time_threshold ); next if ( defined($command) && $command eq "Binlog Dump" ); next if ( defined($user) && $user eq "system user" ); next if ( defined($command) && $command eq "Sleep" && defined($query_time) && $query_time >= 1 ); if ( $type >= 1 ) { next if ( defined($command) && $command eq "Sleep" ); next if ( defined($command) && $command eq "Connect" ); } if ( $type >= 2 ) { next if ( defined($info) && $info =~ m/^select/i ); next if ( defined($info) && $info =~ m/^show/i ); } push @threads, $ref; } return @threads; } sub main { if ( $command eq "stop" ) { ## Gracefully killing connections on the current master # 1. Set read_only= 1 on the new master # 2. DROP USER so that no app user can establish new connections # 3. Set read_only= 1 on the current master # 4. Kill current queries # * Any database access failure will result in script die. my $exit_code = 1; eval { ## Setting read_only=1 on the new master (to avoid accident) my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error(die_on_error)_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); print current_time_us() . " Set read_only on the new master.. "; $new_master_handler->enable_read_only(); if ( $new_master_handler->is_read_only() ) { print "ok.\n"; } else { die "Failed!\n"; } $new_master_handler->disconnect(); # Connecting to the orig master, die if any database error happens my $orig_master_handler = new MHA::DBHelper(); $orig_master_handler->connect( $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, 1 ); ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand #$orig_master_handler->disable_log_bin_local(); #print current_time_us() . " Drpping app user on the orig master..\n"; #FIXME_xxx_drop_app_user($orig_master_handler); ## Waiting for N * 100 milliseconds so that current connections can exit my $time_until_read_only = 15; $_tstart = [gettimeofday]; my @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_read_only > 0 && $#threads >= 0 ) { if ( $time_until_read_only % 5 == 0 ) { printf "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_read_only * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_read_only--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); } ## Setting read_only=1 on the current master so that nobody(except SUPER) can write print current_time_us() . " Set read_only=1 on the orig master.. "; $orig_master_handler->enable_read_only(); if ( $orig_master_handler->is_read_only() ) { print "ok.\n"; } else { die "Failed!\n"; } ## Waiting for M * 100 milliseconds so that current update queries can complete my $time_until_kill_threads = 5; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); while ( $time_until_kill_threads > 0 && $#threads >= 0 ) { if ( $time_until_kill_threads % 5 == 0 ) { printf "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n", current_time_us(), $#threads + 1, $time_until_kill_threads * 100; if ( $#threads < 5 ) { print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n" foreach (@threads); } } sleep_until(); $_tstart = [gettimeofday]; $time_until_kill_threads--; @threads = get_threads_util( $orig_master_handler->{dbh}, $orig_master_handler->{connection_id} ); } print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); ## Terminating all threads print current_time_us() . " Killing all application threads..\n"; $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 ); print current_time_us() . " done.\n"; #$orig_master_handler->enable_log_bin_local(); $orig_master_handler->disconnect(); ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { ## Activating master ip on the new master # 1. Create app user with write privileges # 2. Moving backup script if needed # 3. Register new master's ip to the catalog database # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery. # If exit code is 0 or 10, MHA does not abort my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); ## Set read_only=0 on the new master #$new_master_handler->disable_log_bin_local(); print current_time_us() . " Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only(); ## Creating an app user on the new master #print current_time_us() . " Creating app user on the new master..\n"; #FIXME_xxx_create_app_user($new_master_handler); #$new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect(); ## Update master ip on the catalog database, etc print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); &change_binserver(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { # do nothing exit 0; } else { &usage(); exit 1; } } # A simple system call that enable the VIP on the new master sub start_vip() { `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } # change binlog server to new master sub change_binserver() { my $mha_conf_binlogserver_change = "sed -i '/binlog1/{n;s#hostname=.*#hostname=$new_master_host #;}' $mha_conf"; print "Change binlog1 value in $mha_conf to new master: $new_master_host \n"; `$mha_conf_binlogserver_change`; } sub usage { print "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n"; die; }
mastermha_service.sh
master mha manager服务的启停脚本
需要修改参数:mha_conf 、 mha_log
#!/bin/bash mha_conf="/opt/mha/yourcluster_name/mhaapp.cnf" mha_log="/opt/mha/yourcluster_name/manager.log" mode=$1 # start or stop or status log_success_msg() { echo " SUCCESS! $@" } log_failure_msg() { echo " ERROR! $@" } check_running() { ext_status=`masterha_check_status --conf=${mha_conf} >/dev/null;echo $?` if test $ext_status -ne 0;then printf "Masterha manager is not running!\n" else printf "Masterha manager is running!\n" fi return $ext_status } case $mode in 'start') check_running ext_status=$? if test $ext_status -ne 0;then nohup masterha_manager --conf=${mha_conf} --ignore_last_failover < /dev/null > ${mha_log} 2>&1 & log_success_msg "Starting Masterha manager...." else log_failure_msg "Masterha manager already running!" fi ;; 'stop') check_running ext_status=$? if test $ext_status -eq 0;then masterha_stop --conf=${mha_conf} else log_failure_msg "Masterha manager already stopped!" fi ;; 'status') check_running ;; *) printf "\n\$1:[start|stop|status] input error\n" esac
mha_failover_sendmessage.py
# 集群发生主从切换时发送企业微信告警,master_ip_failover 、master_ip_online_change 脚本中调用
#!/usr/bin/python # -*- coding: utf-8 -*- """ @Create Time : 2022-03-25 @Author : wushengnan @FileName : mha_failover_sendmessage @Note: """ import datetime import requests import sys,os reload(sys) sys.setdefaultencoding('utf8') def record_vip(): new_master = sys.argv[1] old_master = sys.argv[2] vip_addr = sys.argv[3] get_dir = sys.argv[4] etime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") cmd1 = "echo 'current_master_ip %s' > %s/current_vip_master" % (new_master, get_dir) cmd2 = "echo 'vip_ip %s' >> %s/current_vip_master" % (vip_addr, get_dir) os.system(cmd1) os.system(cmd2) ms = '''**DBA MySQL集群报警:** <font color="warning">MHA Manager发生故障转移</font> \n >%s\n >MySQL MHA集群主库切换到**%s**上\n >原主库主机%s上集群vip地址**%s**已漂移至**%s**\n''' % ( etime, new_master, old_master, vip_addr, new_master) return ms class SendWeCom(object): def __init__(self): self.url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key= your api" def sendMessage(self,ms): data = dict(msgtype="markdown", markdown={"content": ms}) res = requests.post(self.url,json=data) return res.json() if __name__ == "__main__": ms = record_vip() #print(ms) sw = SendWeCom() re = sw.sendMessage(ms) if re['errcode'] == 0: print('Send messages success...') else: print('Send messages error...')
--- 原创文档