MHA Installation and Basic Usage

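MHA has been very popular over the past few years, especially back in the days of traditional replication. Since GTID arrived it may look as if we can forget about MHA, but plenty of companies still rely on it heavily. Every company runs MHA a little differently, yet the essentials are much the same. Below I will walk you through a quick MHA setup so you can see how it works. If you want to understand MHA in depth, I recommend reading the official documentation on GitHub or other people's blog posts; a quick search on Google or Bing will find them, so I won't repeat that material here.
The full MHA setup process
Sometimes yum localinstall fails; the likely cause is that the packages listed above have not been installed.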
#rpm -ql mha4mysql-manager-0.56-0.el6.noarch
 master:    port 3310
 slave1:    port 3310
 slave2:    port 3310
systemctl stop firewalld.service #stop firewalld
systemctl disable firewalld.service #keep firewalld from starting at boot
sed -i 's/SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
1: Setting up the master
/usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql_3310/my_3310.cnf --initialize-insecure
/usr/local/mysql/bin/mysqld --defaults-file=/data/mysql/mysql_3310/my_3310.cnf &
#mysql>CREATE USER 'rpl'@'192.168.5.%' IDENTIFIED BY '123456';
#mysql>GRANT REPLICATION SLAVE ON *.* TO 'rpl'@'192.168.5.%';
 change master  to master_host='',master_port=3310, master_user='rpl', master_password='123456',master_auto_position=1;
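Once CHANGE MASTER TO and START SLAVE have been run on a slave, it is worth confirming that both replication threads are up before going any further. Here is a minimal sketch of such a check; the function name slave_threads_ok is my own, and it only looks at the Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS\G output fed to it on stdin.

```shell
#!/bin/sh
# Reads the output of `SHOW SLAVE STATUS\G` on stdin.
# Returns 0 only when both replication threads report "Yes".
# (slave_threads_ok is a hypothetical helper name, not part of MySQL or MHA.)
slave_threads_ok() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Possible usage on a slave (connection options are an assumption,
# adjust them to your own setup):
#   mysql -uroot -P3310 -h127.0.0.1 -e 'SHOW SLAVE STATUS\G' | slave_threads_ok \
#       && echo "replication threads are running"
```

The patterns end in a colon on purpose, so that the neighbouring Slave_SQL_Running_State field is not mistaken for Slave_SQL_Running.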


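 Generate a key with ssh-keygen
# ssh-keygen   Just press Enter at every prompt. When it finishes there are two files under ~/.ssh, id_rsa among them.
#cd ~/.ssh/
#cat>authorized_keys  ## run this on every machine
#chmod 600 *
Keep only id_rsa under .ssh; the other files can be deleted or backed up and moved elsewhere.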
#cd ~
#scp -r .ssh
#scp   -r   .ssh
[server default]
log_level=debug    ### enable debug mode
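Only the first two lines of the configuration file are shown above. As a rough sketch of what a fuller /etc/masterha/app1.conf might look like, here is a small shell function that writes one out. The values for ssh_user, repl_user, repl_password, port, candidate_master, master_ip_failover_script and manager_log are the ones that appear in the commands and logs of this post; manager_workdir, the function name write_mha_conf and the host names you pass in are my assumptions, so treat this as a starting point rather than the exact file used here.

```shell
#!/bin/sh
# write_mha_conf FILE HOST...
# Writes a minimal MHA application config to FILE with one [serverN]
# section per HOST. (Hypothetical helper; host names are placeholders.)
write_mha_conf() {
    conf=$1
    shift
    {
        printf '[server default]\n'
        printf 'log_level=debug\n'
        printf 'manager_workdir=/etc/masterha/app1\n'   # assumption
        printf 'manager_log=/etc/masterha/app1/manager.log\n'
        printf 'ssh_user=root\n'
        printf 'repl_user=rpl\n'
        printf 'repl_password=123456\n'
        printf 'master_ip_failover_script=/etc/masterha/master_ip_failover\n'
        n=1
        for host in "$@"; do
            printf '\n[server%d]\nhostname=%s\nport=3310\ncandidate_master=1\n' "$n" "$host"
            n=$((n + 1))
        done
    } > "$conf"
}

# Possible usage (replace the placeholders with your three hosts):
#   write_mha_conf /etc/masterha/app1.conf MASTER_HOST SLAVE1_HOST SLAVE2_HOST
```

The MySQL account that MHA itself logs in with (the user and password parameters) is left out here, so MHA falls back to its defaults; add those two lines if you use a dedicated account.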
#masterha_check_ssh --conf=/etc/masterha/app1.conf
Wed Nov  1 01:56:31 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Nov  1 01:56:31 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Nov  1 01:56:31 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Nov  1 01:56:31 2017 - [info] Starting SSH connection tests..
Wed Nov  1 01:56:32 2017 - [debug]
Wed Nov  1 01:56:31 2017 - [debug]  Connecting via SSH from root@ to root@
Wed Nov  1 01:56:32 2017 - [debug]   ok.
Wed Nov  1 01:56:32 2017 - [debug]
Wed Nov  1 01:56:32 2017 - [debug]  Connecting via SSH from root@ to root@
Wed Nov  1 01:56:32 2017 - [debug]   ok.
Wed Nov  1 01:56:32 2017 - [info] All SSH connection tests passed successfully.
  1. Check that master-slave replication is healthy: masterha_check_repl --conf=/etc/masterha/app1.conf
# masterha_check_repl --conf=/etc/masterha/app1.conf
Wed Nov  1 01:37:46 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Nov  1 01:37:46 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Nov  1 01:37:46 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Nov  1 01:37:46 2017 - [info] MHA::MasterMonitor version 0.56.
Wed Nov  1 01:37:47 2017 - [info] GTID failover mode = 1
Wed Nov  1 01:37:47 2017 - [info] Dead Servers:
Wed Nov  1 01:37:47 2017 - [info] Alive Servers:
Wed Nov  1 01:37:47 2017 - [info]
Wed Nov  1 01:37:47 2017 - [info]
Wed Nov  1 01:37:47 2017 - [info] Alive Slaves:
Wed Nov  1 01:37:47 2017 - [info]  Version=5.7.19-17-29.22-log (oldest major version between slaves) log-bin:enabled
Wed Nov  1 01:37:47 2017 - [info]     GTID ON
Wed Nov  1 01:37:47 2017 - [info]     Replicating from
Wed Nov  1 01:37:47 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov  1 01:37:47 2017 - [info] Current Alive Master:
Wed Nov  1 01:37:47 2017 - [info] Checking slave configurations..
Wed Nov  1 01:37:47 2017 - [info]  read_only=1 is not set on slave
Wed Nov  1 01:37:47 2017 - [info] Checking replication filtering settings..
Wed Nov  1 01:37:47 2017 - [info]  binlog_do_db= , binlog_ignore_db=
Wed Nov  1 01:37:47 2017 - [info]  Replication filtering check ok.
Wed Nov  1 01:37:47 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Nov  1 01:37:47 2017 - [info] Checking SSH publickey authentication settings on the current master..
Wed Nov  1 01:37:47 2017 - [info] HealthCheck: SSH to is reachable.
Wed Nov  1 01:37:47 2017 - [info] (current master)
Wed Nov  1 01:37:47 2017 - [info] Checking replication health on
Wed Nov  1 01:37:47 2017 - [info]  ok.
Wed Nov  1 01:37:47 2017 - [info] Checking master_ip_failover_script status:
Wed Nov  1 01:37:47 2017 - [info]   /etc/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host= --orig_master_ip= --orig_master_port=3310
Wed Nov  1 01:37:47 2017 - [info]  OK.
Wed Nov  1 01:37:47 2017 - [warning] shutdown_script is not defined.
Wed Nov  1 01:37:47 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
 3 Check the status of MHA Manager: masterha_check_status --conf=/etc/masterha/app1.conf
#masterha_check_status --conf=/etc/masterha/app1.conf
app1 is stopped(2:NOT_RUNNING).
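If you want to use this status in a script rather than read it by eye, the code and state name inside the parentheses can be pulled out with sed. This is just a sketch based on the output line shown above; mha_status is a name I made up, and I am assuming other states are printed in the same "(code:NAME)" shape.

```shell
#!/bin/sh
# Reads masterha_check_status output on stdin and prints "<code> <STATE>",
# e.g. "2 NOT_RUNNING". Prints nothing if no "(code:NAME)" pair is found.
# (mha_status is a hypothetical helper name.)
mha_status() {
    sed -n 's/.*(\([0-9][0-9]*\):\([A-Z_]*\)).*/\1 \2/p'
}

# Possible usage:
#   masterha_check_status --conf=/etc/masterha/app1.conf | mha_status
```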
4 Start the MHA monitoring service: nohup masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /etc/masterha/app1/manager.log 2>&1 &
#nohup masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover < /dev/null > /etc/masterha/app1/manager.log 2>&1 &
[1] 9440
#ps aux | grep master
root      1108  0.0  0.1  89544  2148 ?        Ss   Oct30   0:00 /usr/libexec/postfix/master -w
root      9440  2.0  1.0 287396 20200 pts/5    S    01:42   0:00 perl /usr/bin/masterha_manager --conf=/etc/masterha/app1.conf --remove_dead_master_conf --ignore_last_failover
root      9463  0.0  0.0 112660   976 pts/5    R+   01:42   0:00 grep --color=auto master
5: Stop MHA Manager monitoring: masterha_stop --conf=/etc/masterha/app1.conf
Manual switchover command: masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host= --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
#masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host= --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Wed Nov  1 04:40:30 2017 - [info] MHA::MasterRotate version 0.56.
Wed Nov  1 04:40:30 2017 - [info] Starting online master switch..
Wed Nov  1 04:40:30 2017 - [info]
Wed Nov  1 04:40:30 2017 - [info] * Phase 1: Configuration Check Phase..
Wed Nov  1 04:40:30 2017 - [info]
Wed Nov  1 04:40:30 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Nov  1 04:40:30 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Nov  1 04:40:30 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Nov  1 04:40:30 2017 - [info] GTID failover mode = 1
Wed Nov  1 04:40:30 2017 - [info] Current Alive Master:
Wed Nov  1 04:40:30 2017 - [info] Alive Slaves:
Wed Nov  1 04:40:30 2017 - [info]  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Wed Nov  1 04:40:30 2017 - [info]     GTID ON
Wed Nov  1 04:40:30 2017 - [info]     Replicating from
Wed Nov  1 04:40:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov  1 04:40:30 2017 - [info]  Version=5.7.19-17-29.22-log (oldest major version between slaves) log-bin:enabled
Wed Nov  1 04:40:30 2017 - [info]     GTID ON
Wed Nov  1 04:40:30 2017 - [info]     Replicating from
Wed Nov  1 04:40:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov  1 04:40:30 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed Nov  1 04:40:30 2017 - [info]  ok.
Wed Nov  1 04:40:30 2017 - [info] Checking MHA is not monitoring or doing failover..
Wed Nov  1 04:40:30 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Wed Nov  1 04:40:30 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
[root@node1 05:57:52 /etc/masterha/app1]
#masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=node2  --running_updates_limit=10000 --interactive=0
Thu Nov  2 05:58:52 2017 - [info] MHA::MasterRotate version 0.56.
Thu Nov  2 05:58:52 2017 - [info] Starting online master switch..
Thu Nov  2 05:58:52 2017 - [info]
Thu Nov  2 05:58:52 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov  2 05:58:52 2017 - [info]
Thu Nov  2 05:58:52 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Nov  2 05:58:52 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:58:52 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:58:52 2017 - [info] GTID failover mode = 1
Thu Nov  2 05:58:52 2017 - [info] Current Alive Master:
Thu Nov  2 05:58:52 2017 - [info] Alive Slaves:
Thu Nov  2 05:58:52 2017 - [info]  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Thu Nov  2 05:58:52 2017 - [info]     GTID ON
Thu Nov  2 05:58:52 2017 - [info]     Replicating from
Thu Nov  2 05:58:52 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  2 05:58:52 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Nov  2 05:58:52 2017 - [info]  ok.
Thu Nov  2 05:58:52 2017 - [info] Checking MHA is not monitoring or doing failover..
Thu Nov  2 05:58:52 2017 - [info] Checking replication health on
Thu Nov  2 05:58:52 2017 - [info]  ok.
Thu Nov  2 05:58:52 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln1218] node2 is not alive!
Thu Nov  2 05:58:52 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln232] Failed to get new master!
Thu Nov  2 05:58:52 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
[root@node1 05:58:52 /etc/masterha/app1]
#masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=node2  
Thu Nov  2 05:59:00 2017 - [info] MHA::MasterRotate version 0.56.
Thu Nov  2 05:59:00 2017 - [info] Starting online master switch..
Thu Nov  2 05:59:00 2017 - [info]
Thu Nov  2 05:59:00 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov  2 05:59:00 2017 - [info]
Thu Nov  2 05:59:00 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Nov  2 05:59:00 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:59:00 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:59:00 2017 - [info] GTID failover mode = 1
Thu Nov  2 05:59:00 2017 - [info] Current Alive Master:
Thu Nov  2 05:59:00 2017 - [info] Alive Slaves:
Thu Nov  2 05:59:00 2017 - [info]  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Thu Nov  2 05:59:00 2017 - [info]     GTID ON
Thu Nov  2 05:59:00 2017 - [info]     Replicating from
Thu Nov  2 05:59:00 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on (YES/no): yes
Thu Nov  2 05:59:03 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Nov  2 05:59:03 2017 - [info]  ok.
Thu Nov  2 05:59:03 2017 - [info] Checking MHA is not monitoring or doing failover..
Thu Nov  2 05:59:03 2017 - [info] Checking replication health on
Thu Nov  2 05:59:03 2017 - [info]  ok.
Thu Nov  2 05:59:03 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln1218] node2 is not alive!
Thu Nov  2 05:59:03 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln232] Failed to get new master!
Thu Nov  2 05:59:03 2017 - [error][/usr/share/perl5/vendor_perl/MHA/, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53.
[root@node1 05:59:03 /etc/masterha/app1]
#masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive
Thu Nov  2 05:59:26 2017 - [info] MHA::MasterRotate version 0.56.
Thu Nov  2 05:59:26 2017 - [info] Starting online master switch..
Thu Nov  2 05:59:26 2017 - [info]
Thu Nov  2 05:59:26 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov  2 05:59:26 2017 - [info]
Thu Nov  2 05:59:26 2017 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Nov  2 05:59:26 2017 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:59:26 2017 - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu Nov  2 05:59:26 2017 - [info] GTID failover mode = 1
Thu Nov  2 05:59:26 2017 - [info] Current Alive Master:
Thu Nov  2 05:59:26 2017 - [info] Alive Slaves:
Thu Nov  2 05:59:26 2017 - [info]  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Thu Nov  2 05:59:26 2017 - [info]     GTID ON
Thu Nov  2 05:59:26 2017 - [info]     Replicating from
Thu Nov  2 05:59:26 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on (YES/no): yes
Thu Nov  2 05:59:28 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Nov  2 05:59:28 2017 - [info]  ok.
Thu Nov  2 05:59:28 2017 - [info] Checking MHA is not monitoring or doing failover..
Thu Nov  2 05:59:28 2017 - [info] Checking replication health on
Thu Nov  2 05:59:28 2017 - [info]  ok.
Thu Nov  2 05:59:28 2017 - [info] Searching new master from slaves..
Thu Nov  2 05:59:28 2017 - [info]  Candidate masters from the configuration file:
Thu Nov  2 05:59:28 2017 - [info]  Version=5.7.19-log log-bin:enabled
Thu Nov  2 05:59:28 2017 - [info]     GTID ON
Thu Nov  2 05:59:28 2017 - [info]  Version=5.7.19-log (oldest major version between slaves) log-bin:enabled
Thu Nov  2 05:59:28 2017 - [info]     GTID ON
Thu Nov  2 05:59:28 2017 - [info]     Replicating from
Thu Nov  2 05:59:28 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  2 05:59:28 2017 - [info]  Non-candidate masters:
Thu Nov  2 05:59:28 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov  2 05:59:28 2017 - [info]
From: (current master)
To: (new master)
Starting master switch from to (yes/NO): yes
Thu Nov  2 05:59:30 2017 - [info] Checking whether is ok for the new master..
Thu Nov  2 05:59:30 2017 - [info]  ok.
Thu Nov  2 05:59:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov  2 05:59:30 2017 - [info]
Thu Nov  2 05:59:30 2017 - [info] * Phase 2: Rejecting updates Phase..
Thu Nov  2 05:59:30 2017 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Thu Nov  2 05:59:33 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Thu Nov  2 05:59:33 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Thu Nov  2 05:59:33 2017 - [info]  ok.
Thu Nov  2 05:59:33 2017 - [info] Orig master binlog:pos is mysql-bin.000013:190.
Thu Nov  2 05:59:33 2017 - [info]  Waiting to execute all relay logs on
Thu Nov  2 05:59:33 2017 - [info]  master_pos_wait(mysql-bin.000013:190) completed on Executed 0 events.
Thu Nov  2 05:59:33 2017 - [info]   done.
Thu Nov  2 05:59:33 2017 - [info] Getting new master's binlog name and position..
Thu Nov  2 05:59:33 2017 - [info]  mysql-bin.000011:1167
Thu Nov  2 05:59:33 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='', MASTER_PORT=3310, MASTER_AUTO_POSITION=1, MASTER_USER='rpl', MASTER_PASSWORD='xxx';
Thu Nov  2 05:59:33 2017 - [info]
Thu Nov  2 05:59:33 2017 - [info] * Switching slaves in parallel..
Thu Nov  2 05:59:33 2017 - [info]
Thu Nov  2 05:59:33 2017 - [info] Unlocking all tables on the orig master:
Thu Nov  2 05:59:33 2017 - [info] Executing UNLOCK TABLES..
Thu Nov  2 05:59:33 2017 - [info]  ok.
Thu Nov  2 05:59:33 2017 - [info] All new slave servers switched successfully.
Thu Nov  2 05:59:33 2017 - [info]
Thu Nov  2 05:59:33 2017 - [info] * Phase 5: New master cleanup phase..
Thu Nov  2 05:59:33 2017 - [info]
Thu Nov  2 05:59:33 2017 - [info] Resetting slave info succeeded.
Thu Nov  2 05:59:33 2017 - [info] Switching master to completed successfully.
[root@node1 05:59:33 /etc/masterha/app1]
1. Shell script that binds the VIP (you can of course bind or remove it by hand as well)
vip=""   ##### note: the subnet mask must still be included
/sbin/ip addr add $vip dev enp0s8
2. Shell script that removes the VIP
/sbin/ip addr del $vip dev enp0s8
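The two snippets above differ only in add versus del, so they can be folded into one small helper. The following is a sketch under a few assumptions: the address 192.0.2.100/24 is a made-up placeholder (the real VIP is not given in this post), vip_ctl is my own name, and IP_CMD exists only so that the function can be tried out without touching the network.

```shell
#!/bin/sh
# VIP is a placeholder address with its prefix length; replace it with yours.
VIP=${VIP:-192.0.2.100/24}
# Interface name taken from the snippets above.
DEV=${DEV:-enp0s8}
# Set IP_CMD=echo for a dry run that only prints the arguments.
IP_CMD=${IP_CMD:-/sbin/ip}

# vip_ctl add|del : bind or remove the VIP on $DEV.
vip_ctl() {
    case $1 in
        add|del)
            $IP_CMD addr "$1" "$VIP" dev "$DEV"
            ;;
        *)
            echo "usage: vip_ctl add|del" >&2
            return 2
            ;;
    esac
}

# Possible usage (as root on the host that should hold the VIP):
#   vip_ctl add
#   vip_ctl del
```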
Note: if MHA Manager does not detect any dead server, it reports an error and aborts the failover:
