MySQL高可用方案 MHA之二 master_ip_failover

异步主从复制架构
master:
10.150.20.90 ed3jrdba90
slave:
10.15.20.97 ed3jrdba97
10.150.20.132 ed3jrdba132
manager:
10.150.20.95 ed3jrdba95
#新增VIP
vip:10.150.20.200

四台机器的系统情况：
OS:CentOS7.3
MySQL:5.7.21
MHA:0.58
网卡名：ens3

mha manager节点

1：配置app1.cnf文件
添加master_ip_failover_script的文件路径，mysql master失败时执行的切换脚本。
#vi /etc/mysql_mha/app1.cnf
#自动failover时候的切换脚本
master_ip_failover_script= /usr/local/bin/master_ip_failover

[root@dev05 ~]# cat /etc/mysql_mha/app1.cnf

[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor

[server1]
hostname=10.150.20.90
port=33061

[server1]
hostname=10.150.20.97
port=33061

[server3]
hostname=10.150.20.132
port=33061

编辑master_ip_failover脚本文件：没有使用 keepalived ，通过脚本的方式管理vip

# cp /usr/local/bin/master_ip_failover /usr/local/bin/master_ip_failover.bak
# vi /usr/local/bin/master_ip_failover

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;

my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
#############################添加内容部分#########################################
my $vip = '10.150.20.200';
my $brdc = '10.150.20.255';
my $ifdev = 'ens3';
my $key = '1';
my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
##################################################################################
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {

my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

master_ip_failover

更换ip后，一定要执行下arping

检查复制环境ssh
# masterha_check_ssh --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 14:43:27 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 14:43:27 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:43:27 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:43:27 2018 - [info] Starting SSH connection tests..
Wed Dec 12 14:43:28 2018 - [debug]
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.97(10.150.20.97:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.90(10.150.20.90:22) to root@10.150.20.132(10.150.20.132:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:28 2018 - [debug]
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.90(10.150.20.90:22)..
Wed Dec 12 14:43:27 2018 - [debug] ok.
Wed Dec 12 14:43:27 2018 - [debug] Connecting via SSH from root@10.150.20.97(10.150.20.97:22) to root@10.150.20.132(10.150.20.132:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:29 2018 - [debug]
Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.90(10.150.20.90:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:28 2018 - [debug] Connecting via SSH from root@10.150.20.132(10.150.20.132:22) to root@10.150.20.97(10.150.20.97:22)..
Wed Dec 12 14:43:28 2018 - [debug] ok.
Wed Dec 12 14:43:29 2018 - [info] All SSH connection tests passed successfully.
检查整个复制环境
# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf
Wed Dec 12 14:44:35 2018 - [info] Slaves settings check done.
Wed Dec 12 14:44:35 2018 - [info]
10.150.20.90(10.150.20.90:33061) (current master)
+--10.150.20.97(10.150.20.97:33061)
+--10.150.20.132(10.150.20.132:33061)

Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.97..
Wed Dec 12 14:44:35 2018 - [info] ok.
Wed Dec 12 14:44:35 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 14:44:35 2018 - [info] ok.
Wed Dec 12 14:44:35 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 14:44:35 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061
Wed Dec 12 14:44:35 2018 - [info] OK.
Wed Dec 12 14:44:35 2018 - [warning] shutdown_script is not defined.
Wed Dec 12 14:44:35 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

启动 mha manager
# nohup masterha_manager --conf=/etc/mysql_mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mysql_mha/app1-manager.log 2>&1 &
查看 manager status
# masterha_check_status --conf=/etc/mysql_mha/app1.cnf
查看 manager log
# tail -n 1000 -f /var/log/masterha/app1-manager.log

验证 failover

在主库qa05.010150020090.yz节点，进行vip绑定：
[root@qa05 ~]#ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
[root@qa05 ~]#/usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200

#vip解绑：
# ip addr del 10.150.20.200/24 dev ens3 label ens3:1

模拟故障，在qa05.010150020090.yz上 kill 掉 mysqld 进程
[root@qa05 ~]## ps -ef|grep -i mysql
mysql 3114 1 0 Aug06 ? 00:00:51 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
root 15551 10466 0 Aug06 pts/1 00:00:00 mysql
root 25521 21213 0 03:52 pts/2 00:00:00 grep --color=auto -i mysql
[root@qa05 ~]## kill -9 27593 26101

观察 mha manager 之前打开的日志输出
[root@dev05 ~]# tail -n 1000 -f /data/mysql_mha/app1-manager.log

Wed Dec 12 14:54:10 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Dec 12 14:54:10 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Wed Dec 12 14:54:10 2018 - [info] HealthCheck: SSH to 10.150.20.90 is reachable.
Wed Dec 12 14:54:15 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:15 2018 - [warning] Connection failed 2 time(s)..
Wed Dec 12 14:54:20 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:20 2018 - [warning] Connection failed 3 time(s)..
Wed Dec 12 14:54:25 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.150.20.90' (111))
Wed Dec 12 14:54:25 2018 - [warning] Connection failed 4 time(s)..
Wed Dec 12 14:54:25 2018 - [warning] Master is not reachable from health checker!
Wed Dec 12 14:54:25 2018 - [warning] Master 10.150.20.90(10.150.20.90:33061) is not reachable!
Wed Dec 12 14:54:25 2018 - [warning] SSH is reachable.
Wed Dec 12 14:54:25 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mysql_mha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Dec 12 14:54:25 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 14:54:25 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:54:25 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 14:54:26 2018 - [info] GTID failover mode = 0
Wed Dec 12 14:54:26 2018 - [info] Dead Servers:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] Alive Servers:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:26 2018 - [info] Alive Slaves:
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:26 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:26 2018 - [info] Checking slave configurations..
Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 14:54:26 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.97(10.150.20.97:33061).
Wed Dec 12 14:54:26 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 14:54:26 2018 - [info] Checking replication filtering settings..
Wed Dec 12 14:54:26 2018 - [info] Replication filtering check ok.
Wed Dec 12 14:54:26 2018 - [info] Master is down!
Wed Dec 12 14:54:26 2018 - [info] Terminating monitoring script.
Wed Dec 12 14:54:26 2018 - [info] Got exit code 20 (Master dead).
Wed Dec 12 14:54:26 2018 - [info] MHA::MasterFailover version 0.58.
Wed Dec 12 14:54:26 2018 - [info] Starting master failover.
Wed Dec 12 14:54:26 2018 - [info] 
Wed Dec 12 14:54:26 2018 - [info] * Phase 1: Configuration Check Phase..
Wed Dec 12 14:54:26 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] GTID failover mode = 0
Wed Dec 12 14:54:27 2018 - [info] Dead Servers:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] Checking master reachability via MySQL(double check)...
Wed Dec 12 14:54:27 2018 - [info] ok.
Wed Dec 12 14:54:27 2018 - [info] Alive Servers:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:27 2018 - [info] Alive Slaves:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] Starting Non-GTID based failover.
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Dec 12 14:54:27 2018 - [info] Executing master IP deactivation script:
Wed Dec 12 14:54:27 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --command=stopssh --ssh_user=root 
Wed Dec 12 14:54:27 2018 - [info] done.
Wed Dec 12 14:54:27 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Dec 12 14:54:27 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] * Phase 3: Master Recovery Phase..
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000006:154
Wed Dec 12 14:54:27 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000006:154
Wed Dec 12 14:54:27 2018 - [info] Oldest slaves:
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.97(10.150.20.97:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 14:54:27 2018 - [info] Replicating from 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Dec 12 14:54:27 2018 - [info] 
Wed Dec 12 14:54:27 2018 - [info] Fetching dead master's binary logs..
Wed Dec 12 14:54:27 2018 - [info] Executing command on the dead master 10.150.20.90(10.150.20.90:33061): save_binary_logs --command=save --start_file=mysql-bin.000006 --start_pos=154 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /data/mysql_mha/app1 if not exists.. ok.
Concat binary/relay logs from mysql-bin.000006 pos 154 to mysql-bin.000006 EOF into /data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql_33061/logs/mysql-bin.000006 (pos starts 154, filesize 154). Skipping.
Binlog Checksum enabled
/data/mysql_mha/app1/saved_master_binlog_from_10.150.20.90_33061_20181212145426.binlog has no effective data events.
Event not exists.
Wed Dec 12 14:54:28 2018 - [info] Additional events were not found from the orig master. No need to save.
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.3: Determining New Master Phase..
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Dec 12 14:54:28 2018 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Dec 12 14:54:28 2018 - [info] Searching new master from slaves..
Wed Dec 12 14:54:28 2018 - [info] Candidate masters from the configuration file:
Wed Dec 12 14:54:28 2018 - [info] Non-candidate masters:
Wed Dec 12 14:54:28 2018 - [info] New master is 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 14:54:28 2018 - [info] Starting master failover..
Wed Dec 12 14:54:28 2018 - [info] 
From:
10.150.20.90(10.150.20.90:33061) (current master)
+--10.150.20.97(10.150.20.97:33061)
+--10.150.20.132(10.150.20.132:33061)

To:
10.150.20.97(10.150.20.97:33061) (new master)
+--10.150.20.132(10.150.20.132:33061)
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] * Phase 3.5: Master Log Apply Phase..
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Dec 12 14:54:28 2018 - [info] Starting recovery on 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
Wed Dec 12 14:54:28 2018 - [info] done.
Wed Dec 12 14:54:28 2018 - [info] All relay logs were successfully applied.
Wed Dec 12 14:54:28 2018 - [info] Getting new master's binlog name and position..
Wed Dec 12 14:54:28 2018 - [info] mysql-bin.000010:2774
Wed Dec 12 14:54:28 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.150.20.97', MASTER_PORT=33061, MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=2774, MASTER_USER='replicator', MASTER_PASSWORD='xxx';
Wed Dec 12 14:54:28 2018 - [info] Executing master IP activate script:
Wed Dec 12 14:54:28 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.150.20.90 --orig_master_ip=10.150.20.90 --orig_master_port=33061 --new_master_host=10.150.20.97 --new_master_ip=10.150.20.97 --new_master_port=33061 --new_master_user='mha_monitor' --new_master_password=xxx
Set read_only=0 on the new master.
Creating app user on the new master..
Wed Dec 12 14:54:28 2018 - [info] OK.
Wed Dec 12 14:54:28 2018 - [info] ** Finished master recovery successfully.
Wed Dec 12 14:54:28 2018 - [info] * Phase 3: Master Recovery Phase completed.
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] * Phase 4: Slaves Recovery Phase..
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Dec 12 14:54:28 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] -- Slave diff file generation on host 10.150.20.132(10.150.20.132:33061) started, pid: 25117. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
Wed Dec 12 14:54:29 2018 - [info] 
Wed Dec 12 14:54:29 2018 - [info] Log messages from 10.150.20.132 ...
Wed Dec 12 14:54:29 2018 - [info] 
Wed Dec 12 14:54:28 2018 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Dec 12 14:54:29 2018 - [info] End of log messages from 10.150.20.132.
Wed Dec 12 14:54:29 2018 - [info] -- 10.150.20.132(10.150.20.132:33061) has the latest relay log events.
Wed Dec 12 14:54:29 2018 - [info] Generating relay diff files from the latest slave succeeded.
Wed Dec 12 14:54:29 2018 - [info] 
Wed Dec 12 14:54:29 2018 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Dec 12 14:54:29 2018 - [info] 
Wed Dec 12 14:54:29 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) started, pid: 25119. Check tmp log /data/mysql_mha/app1/10.150.20.132_33061_20181212145426.log if it takes time..
Wed Dec 12 14:54:30 2018 - [info] 
Wed Dec 12 14:54:30 2018 - [info] Log messages from 10.150.20.132 ...
Wed Dec 12 14:54:30 2018 - [info] 
Wed Dec 12 14:54:29 2018 - [info] Starting recovery on 10.150.20.132(10.150.20.132:33061)..
Wed Dec 12 14:54:29 2018 - [info] This server has all relay logs. Waiting all logs to be applied.. 
Wed Dec 12 14:54:29 2018 - [info] done.
Wed Dec 12 14:54:29 2018 - [info] All relay logs were successfully applied.
Wed Dec 12 14:54:29 2018 - [info] Resetting slave 10.150.20.132(10.150.20.132:33061) and starting replication from the new master 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 14:54:29 2018 - [info] Executed CHANGE MASTER.
Wed Dec 12 14:54:29 2018 - [info] Slave started.
Wed Dec 12 14:54:30 2018 - [info] End of log messages from 10.150.20.132.
Wed Dec 12 14:54:30 2018 - [info] -- Slave recovery on host 10.150.20.132(10.150.20.132:33061) succeeded.
Wed Dec 12 14:54:30 2018 - [info] All new slave servers recovered successfully.
Wed Dec 12 14:54:30 2018 - [info] 
Wed Dec 12 14:54:30 2018 - [info] * Phase 5: New master cleanup phase..
Wed Dec 12 14:54:30 2018 - [info] 
Wed Dec 12 14:54:30 2018 - [info] Resetting slave info on the new master..
Wed Dec 12 14:54:30 2018 - [info] 10.150.20.97: Resetting slave info succeeded.
Wed Dec 12 14:54:30 2018 - [info] Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.
Wed Dec 12 14:54:30 2018 - [info] Deleted server1 entry from /etc/mysql_mha/app1.cnf .
Wed Dec 12 14:54:30 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.150.20.90(10.150.20.90:33061) to 10.150.20.97(10.150.20.97:33061) succeeded

Master 10.150.20.90(10.150.20.90:33061) is down!

Check MHA Manager logs at dev05.010150020095.yz:/data/mysql_mha/app1-manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 10.150.20.90(10.150.20.90:33061)
The latest slave 10.150.20.97(10.150.20.97:33061) has all relay logs for recovery.
Selected 10.150.20.97(10.150.20.97:33061) as a new master.
10.150.20.97(10.150.20.97:33061): OK: Applying all logs succeeded.
10.150.20.97(10.150.20.97:33061): OK: Activated master IP address.
10.150.20.132(10.150.20.132:33061): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.150.20.132(10.150.20.132:33061): OK: Applying all logs succeeded. Slave started, replicating from 10.150.20.97(10.150.20.97:33061)
10.150.20.97(10.150.20.97:33061): Resetting slave info succeeded.
Master failover to 10.150.20.97(10.150.20.97:33061) completed successfully.

app1-manager.log

从日志，可以看出new master切换至10.150.20.97，此时manager节点mha manager关闭

[root@dev05 ~]# masterha_check_status --conf=/etc/mysql_mha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

而新主qa06.010150020097.yz，vip绑定到ens3网卡上
[root@qa06 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 54:52:00:49:48:92 brd ff:ff:ff:ff:ff:ff
inet 10.150.20.97/24 brd 10.150.20.255 scope global ens3
valid_lft forever preferred_lft forever
inet 10.150.20.200/24 brd 10.150.20.255 scope global secondary ens3:1
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:7f:36:38:fe brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever

此时的mha manager节点的配置文件app1.cnf被修改为：
[root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor

[server2]
hostname=10.150.20.97
port=33061

[server3]
hostname=10.150.20.132
port=33061

重新编辑app1.cnf
[root@dev05 ~]#cat /etc/mysql_mha/app1.cnf
[server default]
manager_log=/data/mysql_mha/app1-manager.log
manager_workdir=/data/mysql_mha/app1
master_binlog_dir=/data/mysql_33061/logs
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha_monitor
ping_interval=5
remote_workdir=/data/mysql_mha/app1
repl_password=replicator
repl_user=replicator
shutdown_script=""
ssh_user=root
user=mha_monitor

[server1]
hostname=10.150.20.97
port=33061
[server2]
hostname=10.150.20.90
port=33061
[server3]
hostname=10.150.20.132
port=33061

重启qa05.010150020090.yz的MySQL,搭建主从，指向新主
mysql> change master to
master_host='10.150.20.97',
master_user='replicator',
master_password='replicator',
master_port=33061,
master_log_file='mysql-bin.000010',
master_log_pos=2774;
mysql> start slave;

检测复制环境
# masterha_check_repl --conf=/etc/mysql_mha/app1.cnf

Wed Dec 12 15:06:56 2018 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed Dec 12 15:06:56 2018 - [info] Reading application default configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 15:06:56 2018 - [info] Reading server configuration from /etc/mysql_mha/app1.cnf..
Wed Dec 12 15:06:56 2018 - [info] MHA::MasterMonitor version 0.58.
Wed Dec 12 15:06:57 2018 - [info] GTID failover mode = 0
Wed Dec 12 15:06:57 2018 - [info] Dead Servers:
Wed Dec 12 15:06:57 2018 - [info] Alive Servers:
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061)
Wed Dec 12 15:06:57 2018 - [info] Alive Slaves:
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.90(10.150.20.90:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] 10.150.20.132(10.150.20.132:33061) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Wed Dec 12 15:06:57 2018 - [info] Replicating from 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] Current Alive Master: 10.150.20.97(10.150.20.97:33061)
Wed Dec 12 15:06:57 2018 - [info] Checking slave configurations..
Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.90(10.150.20.90:33061).
Wed Dec 12 15:06:57 2018 - [warning] relay_log_purge=0 is not set on slave 10.150.20.90(10.150.20.90:33061).
Wed Dec 12 15:06:57 2018 - [info] read_only=1 is not set on slave 10.150.20.132(10.150.20.132:33061).
Wed Dec 12 15:06:57 2018 - [info] Checking replication filtering settings..
Wed Dec 12 15:06:57 2018 - [info] binlog_do_db= , binlog_ignore_db= 
Wed Dec 12 15:06:57 2018 - [info] Replication filtering check ok.
Wed Dec 12 15:06:57 2018 - [info] GTID (with auto-pos) is not supported
Wed Dec 12 15:06:57 2018 - [info] Starting SSH connection tests..
Wed Dec 12 15:06:59 2018 - [info] All SSH connection tests passed successfully.
Wed Dec 12 15:06:59 2018 - [info] Checking MHA Node version..
Wed Dec 12 15:07:00 2018 - [info] Version check ok.
Wed Dec 12 15:07:00 2018 - [info] Checking SSH publickey authentication settings on the current master..
Wed Dec 12 15:07:00 2018 - [info] HealthCheck: SSH to 10.150.20.97 is reachable.
Wed Dec 12 15:07:00 2018 - [info] Master MHA Node version is 0.58.
Wed Dec 12 15:07:00 2018 - [info] Checking recovery script configurations on 10.150.20.97(10.150.20.97:33061)..
Wed Dec 12 15:07:00 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_33061/logs --output_file=/data/mysql_mha/app1/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000010 
Wed Dec 12 15:07:00 2018 - [info] Connecting to root@10.150.20.97(10.150.20.97:22).. 
Creating /data/mysql_mha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql_33061/logs, up to mysql-bin.000010
Wed Dec 12 15:07:01 2018 - [info] Binlog setting check done.
Wed Dec 12 15:07:01 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.90 --slave_ip=10.150.20.90 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.90(10.150.20.90:22).. 
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/logs, up to relaylog.000002
Temporary relay log file is /data/mysql_33061/logs/relaylog.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 12 15:07:01 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha_monitor' --slave_host=10.150.20.132 --slave_ip=10.150.20.132 --slave_port=33061 --workdir=/data/mysql_mha/app1 --target_version=5.7.21-log --manager_version=0.58 --relay_log_info=/data/mysql_33061/logs/relay-log.info --relay_dir=/data/mysql_33061/data/ --slave_pass=xxx
Wed Dec 12 15:07:01 2018 - [info] Connecting to root@10.150.20.132(10.150.20.132:22).. 
Checking slave recovery environment settings..
Opening /data/mysql_33061/logs/relay-log.info ... ok.
Relay log found at /data/mysql_33061/data, up to cgdb-relay-bin.000002
Temporary relay log file is /data/mysql_33061/data/cgdb-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Dec 12 15:07:01 2018 - [info] Slaves settings check done.
Wed Dec 12 15:07:01 2018 - [info] 
10.150.20.97(10.150.20.97:33061) (current master)
+--10.150.20.90(10.150.20.90:33061)
+--10.150.20.132(10.150.20.132:33061)

Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.90..
Wed Dec 12 15:07:01 2018 - [info] ok.
Wed Dec 12 15:07:01 2018 - [info] Checking replication health on 10.150.20.132..
Wed Dec 12 15:07:01 2018 - [info] ok.
Wed Dec 12 15:07:01 2018 - [info] Checking master_ip_failover_script status:
Wed Dec 12 15:07:01 2018 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.150.20.97 --orig_master_ip=10.150.20.97 --orig_master_port=33061 
Wed Dec 12 15:07:02 2018 - [info] OK.
Wed Dec 12 15:07:02 2018 - [warning] shutdown_script is not defined.
Wed Dec 12 15:07:02 2018 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

复制环境

小结：
1：搭建MHA时，vip绑定需要自行绑定到主库；当主库发生failover，vip会绑定到新主
2:发生master_ip_failover之后，mha监控程序自动断掉；
3：vip绑定：
# ip addr add 10.150.20.200/24 brd 10.150.20.255 dev ens3 label ens3:1
# /usr/sbin/arping -q -A -c 1 -I ens3 10.150.20.200
vip解绑：
# ip addr del 10.150.20.200/24 dev ens3 label ens3:1
4：关闭mha监控程序为：
# masterha_stop --conf=/etc/mysql_mha/app1.cnf
Stopped app1 successfully.

5：failover的过程，基本为以下步骤：
1）.配置文件检查阶段，这个阶段会检查整个集群配置文件
2）.宕机的master处理，这个阶段包括虚拟ip摘除操作，主机关机操作
3）.复制dead master和最新slave相差的relay log，并保存到MHA Manger具体的目录下
4）.识别含有最新更新的slave
5）.应用从master保存的二进制日志事件（binlog events）
6）.提升一个slave为新的master进行复制
7）.使其他的slave连接新的master进行复制

posted on 2018-12-10 10:49 HelonTian 阅读(2176) 评论(0) 编辑收藏举报

MySQL高可用方案 MHA之二 master_ip_failover

搜索

常用链接

随笔分类

随笔档案

文章分类

友情链接

阅读排行榜

推荐排行榜