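Node architecture:
192.168.1.20 (MySQL 5.5) | master |
192.168.1.21 (MySQL 5.5) | slave1; may be promoted to master if the master fails |
192.168.1.22 (MySQL 5.5) | slave2; must not be promoted if the master fails |
192.168.1.25 (Percona 5.6) | slave3, MHA manager, and binlog server; must not be promoted if the master fails |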
Set up SSH trust between all nodes. Run the following on one of them:
# cd ~/.ssh
# cat id_rsa.pub > authorized_keys
# chmod 600 *
# scp -r /root/.ssh 192.168.1.20:~/
# scp -r /root/.ssh 192.168.1.21:~/
# scp -r /root/.ssh 192.168.1.22:~/    (note: the copied files must have mode 600)
# ssh 192.168.1.20    (test the login)
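MHA needs every node to reach every other node over SSH without a password, not just the manager to reach the rest. A minimal sketch for checking that by hand, assuming the four addresses from this article and root logins; the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hosts are the four nodes from this article.
HOSTS="192.168.1.20 192.168.1.21 192.168.1.22 192.168.1.25"

# Print every ordered (source, destination) pair of distinct hosts.
host_pairs() {
    local src dst
    for src in $HOSTS; do
        for dst in $HOSTS; do
            if [ "$src" != "$dst" ]; then
                echo "$src $dst"
            fi
        done
    done
}

# Try a non-interactive login for each pair. BatchMode makes ssh fail
# instead of prompting for a password when key authentication is broken.
check_trust() {
    local src dst
    while read -r src dst; do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$src" \
               "ssh -o BatchMode=yes -o ConnectTimeout=5 root@$dst true"; then
            echo "ok   $src -> $dst"
        else
            echo "FAIL $src -> $dst"
        fi
    done < <(host_pairs)
}
```

Call `check_trust` after sourcing the file. This is roughly the same matrix that masterha_check_ssh walks through later, so it is only useful as a quick pre-check.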
Start the binlog server on 192.168.1.25 (requires mysqlbinlog from 5.6 or later):
[root@mysql1 /]# cd /data/mysql/percona_3309/master_binlog    -- directory that receives the binlogs; referenced later in the MHA config
[root@mysql1 master_binlog]# mysqlbinlog -R --host=192.168.1.20 --user=root --password=root --raw --stop-never mysql-bin.000001 &
[1] 6777
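A background mysqlbinlog started this way does not come back if the connection drops. A rough sketch of a wrapper that restarts it from the newest file already received; the directory, host, and credentials are the ones used in this article, while the function names and the 5-second pause are assumptions:

```shell
#!/usr/bin/env bash
# Values from the article; adjust for your environment.
BINLOG_DIR="${BINLOG_DIR:-/data/mysql/percona_3309/master_binlog}"
MASTER_HOST="192.168.1.20"

# Print the newest binlog already present in a directory, or the
# first file name if the directory holds none yet.
latest_binlog() {
    local dir="$1" newest
    newest=$(ls "$dir" 2>/dev/null | grep -E '^mysql-bin\.[0-9]+$' | sort | tail -n 1)
    echo "${newest:-mysql-bin.000001}"
}

# Keep mysqlbinlog running. After an exit, resume from the newest
# local file (that file is fetched again from its beginning).
run_binlog_server() {
    cd "$BINLOG_DIR" || return 1
    while true; do
        mysqlbinlog --read-from-remote-server --host="$MASTER_HOST" \
            --user=root --password=root --raw --stop-never \
            "$(latest_binlog "$BINLOG_DIR")"
        sleep 5
    done
}
```

Run `run_binlog_server` under nohup or a process supervisor. The lexical sort only works because binlog suffixes are zero-padded.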
Install the MHA manager node (25). The last two Perl packages require the EPEL repository to be configured first:
yum install perl-DBD-MySQL
yum install perl-Config-Tiny
yum install perl-Log-Dispatch
yum install perl-Parallel-ForkManager
[root@mysql1 ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
[root@mysql1 ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Install the MHA node package on the other nodes (20, 21, 22):
yum install perl-DBD-MySQL
[root@mysql1 ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
MHA manager configuration (25):
[root@mysql1 master_binlog]# cat /etc/masterha_default.cnf
[server default]
user=root
password=root
ssh_user=root
repl_user=slave
repl_password=slave
ping_interval=1
shutdown_script=""

[root@mysql1 master_binlog]# cat /etc/app1.cnf
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.1.20
master_binlog_dir=/mysql/data/
candidate_master=1
check_repl_delay=0

[server2]
hostname=192.168.1.21
master_binlog_dir=/mysql/data/
candidate_master=1
check_repl_delay=0

[server3]
hostname=192.168.1.22
master_binlog_dir=/mysql/data/
no_master=1
ignore_fail=1

[server4]
hostname=192.168.1.25
master_binlog_dir=/data/mysql/user_3306/data/
no_master=1
ignore_fail=1

[binlog1]
hostname=192.168.1.25
master_binlog_dir=/data/mysql/percona_3309/master_binlog
no_master=1
ignore_fail=1
Run the SSH trust check and the replication environment check on the manager node
Problems encountered with masterha_check_repl:
[root@mysql1 ~]# masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln781] Multi-master configuration is detected, but two or more masters are either writable (read-only is not set) or dead! Check configurations for details. Master configurations are as below:
Master 192.168.1.20(192.168.1.20:3306), replicating from 192.168.1.21(192.168.1.21:3306)
Master 192.168.1.21(192.168.1.21:3306), replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Thu Jul 31 00:25:48 2014 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Thu Jul 31 00:25:48 2014 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
Fix: the log reports more than one master. On inspection, 20 and 21 turned out to be replicating from each other (master-master). Stopping the slave on 20 resolved it.
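To find out which node still has a replication source configured, it helps to ask every server what it replicates from. A small sketch, assuming the root/root credentials from this article; `master_of` and `show_topology` are illustrative names:

```shell
#!/usr/bin/env bash
HOSTS="192.168.1.20 192.168.1.21 192.168.1.22 192.168.1.25"

# Read `SHOW SLAVE STATUS\G` text on stdin and print the Master_Host
# value; prints nothing when the server is not set up as a slave.
master_of() {
    awk -F': ' '/^ *Master_Host:/ { print $2 }'
}

# Print one line per server showing its replication source.
show_topology() {
    local host master
    for host in $HOSTS; do
        master=$(mysql -h "$host" -uroot -proot -e 'SHOW SLAVE STATUS\G' | master_of)
        echo "$host replicates from ${master:-nothing}"
    done
}
```

In the broken state described here, `show_topology` would show 20 replicating from 21 and 21 replicating from 20. Running `STOP SLAVE;` on 20 is the fix the article applied.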
Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
 at /usr/bin/apply_diff_relay_logs line 493
Fix:
Run the following on every node:
which mysqlbinlog    -- prints /mysql/bin/mysqlbinlog
ln -s /mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
Thu Jul 31 00:56:01 2014 - [info] Connecting to root@192.168.1.21(192.168.1.21:22)..
Creating directory /var/log/masterha/app1.. done.
Checking slave recovery environment settings..
Opening /mysql/data/relay-log.info ... ok.
Relay log found at /mysql/data, up to likun1-relay-bin.143287
Temporary relay log file is /mysql/data/likun1-relay-bin.143287
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
 at /usr/bin/apply_diff_relay_logs line 375
 main::check() called at /usr/bin/apply_diff_relay_logs line 497
 eval {...} called at /usr/bin/apply_diff_relay_logs line 475
 main::main() called at /usr/bin/apply_diff_relay_logs line 120
Fix: same as above:
ln -s `which mysql` /usr/bin/mysql
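Both errors come from MHA running the client tools over a non-interactive SSH session, where /mysql/bin is not on the PATH. A sketch that creates both links on every node in one pass; /mysql/bin is taken from the `which` output above and may not be the right directory on every node (the Percona instance on 25 in particular), so treat it as an assumption:

```shell
#!/usr/bin/env bash
NODES="192.168.1.20 192.168.1.21 192.168.1.22 192.168.1.25"
MYSQL_BIN_DIR="/mysql/bin"   # assumed to be the same on all nodes

# Print the command that links one client binary into /usr/bin.
link_cmd() {
    echo "ln -sf $MYSQL_BIN_DIR/$1 /usr/bin/$1"
}

# Create the links for mysql and mysqlbinlog on every node.
link_all() {
    local node bin
    for node in $NODES; do
        for bin in mysql mysqlbinlog; do
            ssh -n "root@$node" "$(link_cmd "$bin")"
        done
    done
}
```

Call `link_all` once SSH trust is in place, then rerun masterha_check_repl.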
Thu Jul 31 01:07:03 2014 - [info] Connecting to root@192.168.1.21(192.168.1.21:22)..
Checking slave recovery environment settings..
Opening /mysql/data/relay-log.info ... ok.
Relay log found at /mysql/data, up to likun1-relay-bin.164206
Temporary relay log file is /mysql/data/likun1-relay-bin.164206
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output..mysqlbinlog: File '/mysql/data/likun1-relay-bin.164206' not found (Errcode: 2)
mysqlbinlog failed with rc 1:0!
Cause and fix: several slaves shared the same server-id, which made them reconnect to the master over and over and generate a flood of relay logs. With relay_log_purge=ON, the relay log being checked had already been purged by the time mysqlbinlog tried to read it. Give every slave a unique server-id.
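Duplicate server-ids are easy to miss when slaves are cloned from one another. A minimal sketch that collects `@@server_id` from every node and reports any value used more than once; credentials are the article's root/root, and the function names are illustrative:

```shell
#!/usr/bin/env bash
HOSTS="192.168.1.20 192.168.1.21 192.168.1.22 192.168.1.25"

# Print one "host server_id" line per node.
collect_ids() {
    local host
    for host in $HOSTS; do
        echo "$host $(mysql -h "$host" -uroot -proot -N -B -e 'SELECT @@server_id')"
    done
}

# Read "host server_id" lines on stdin and print every server_id
# that is used by more than one host.
duplicate_ids() {
    awk '{ count[$2]++ } END { for (id in count) if (count[id] > 1) print id }' | sort -n
}
```

`collect_ids | duplicate_ids` prints nothing when all ids are unique.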
Complete output of a successful masterha_check_repl run:
[root@mysql1 bin]# masterha_check_repl --conf=/etc/app1.cnf
Thu Jul 31 01:35:32 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:35:32 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:35:32 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:35:32 2014 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 31 01:35:33 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:35:33 2014 - [info] Dead Servers:
Thu Jul 31 01:35:33 2014 - [info] Alive Servers:
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:35:33 2014 - [info] Alive Slaves:
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:35:33 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:35:33 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:35:33 2014 - [info] Current Alive Master: 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:35:33 2014 - [info] Checking slave configurations..
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:35:33 2014 - [info]  read_only=1 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:35:33 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:35:33 2014 - [info] Checking replication filtering settings..
Thu Jul 31 01:35:33 2014 - [info]  binlog_do_db= , binlog_ignore_db=
Thu Jul 31 01:35:33 2014 - [info]  Replication filtering check ok.
Thu Jul 31 01:35:33 2014 - [info] GTID (with auto-pos) is not supported
Thu Jul 31 01:35:33 2014 - [info] Starting SSH connection tests..
Thu Jul 31 01:35:56 2014 - [info] All SSH connection tests passed successfully.
Thu Jul 31 01:35:56 2014 - [info] Checking MHA Node version..
Thu Jul 31 01:36:04 2014 - [info]  Version check ok.
Thu Jul 31 01:36:04 2014 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 31 01:36:06 2014 - [info] HealthCheck: SSH to 192.168.1.20 is reachable.
Thu Jul 31 01:36:09 2014 - [info] Master MHA Node version is 0.56.
Thu Jul 31 01:36:09 2014 - [info] Checking recovery script configurations on 192.168.1.20(192.168.1.20:3306)..
Thu Jul 31 01:36:09 2014 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data/ --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000017
Thu Jul 31 01:36:09 2014 - [info]   Connecting to root@192.168.1.20(192.168.1.20:22)..
  Creating /var/log/masterha/app1 if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /mysql/data/, up to mysql-bin.000017
Thu Jul 31 01:36:11 2014 - [info] Binlog setting check done.
Thu Jul 31 01:36:11 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 31 01:36:11 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.21 --slave_ip=192.168.1.21 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.30-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Thu Jul 31 01:36:11 2014 - [info]   Connecting to root@192.168.1.21(192.168.1.21:22)..
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.197850
    Temporary relay log file is /mysql/data/likun1-relay-bin.197850
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:14 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.22 --slave_ip=192.168.1.22 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.30-log --manager_version=0.56 --relay_log_info=/mysql/data/relay-log.info --relay_dir=/mysql/data/ --slave_pass=xxx
Thu Jul 31 01:36:14 2014 - [info]   Connecting to root@192.168.1.22(192.168.1.22:22)..
  Creating directory /var/log/masterha/app1.. done.
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to likun1-relay-bin.197850
    Temporary relay log file is /mysql/data/likun1-relay-bin.197850
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:17 2014 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.25 --slave_ip=192.168.1.25 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.37-log --manager_version=0.56 --relay_log_info=/data/mysql/user_3306/data/relay-log.info --relay_dir=/data/mysql/user_3306/data/ --slave_pass=xxx
Thu Jul 31 01:36:17 2014 - [info]   Connecting to root@192.168.1.25(192.168.1.25:22)..
  Checking slave recovery environment settings..
    Opening /data/mysql/user_3306/data/relay-log.info ... ok.
    Relay log found at /data/mysql/user_3306/data, up to mysql1-relay-bin.000026
    Temporary relay log file is /data/mysql/user_3306/data/mysql1-relay-bin.000026
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 31 01:36:21 2014 - [info] Slaves settings check done.
Thu Jul 31 01:36:21 2014 - [info]
192.168.1.20(192.168.1.20:3306) (current master)
 +--192.168.1.21(192.168.1.21:3306)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.21..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.22..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [info] Checking replication health on 192.168.1.25..
Thu Jul 31 01:36:21 2014 - [info]  ok.
Thu Jul 31 01:36:21 2014 - [warning] master_ip_failover_script is not defined.
Thu Jul 31 01:36:21 2014 - [warning] shutdown_script is not defined.
Thu Jul 31 01:36:21 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
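The check passes, but it still warns that relay_log_purge=0 is not set on the slaves. MHA reads relay logs during recovery, so the usual approach is to disable automatic purging and clean up on a schedule with the purge_relay_logs tool from the node package. A sketch under those assumptions; the 4-hour schedule, the output log path, and the function names are invented for illustration:

```shell
#!/usr/bin/env bash
SLAVES="192.168.1.21 192.168.1.22 192.168.1.25"

# Turn off automatic relay log purging on every slave. This only
# changes the running server; add relay_log_purge=0 to my.cnf as
# well so the setting survives a restart.
disable_auto_purge() {
    local host
    for host in $SLAVES; do
        mysql -h "$host" -uroot -proot -e 'SET GLOBAL relay_log_purge = 0'
    done
}

# Print a crontab line that runs purge_relay_logs every 4 hours at
# the given minute. Use a different minute on each slave so they do
# not all purge at the same moment.
purge_cron_line() {
    echo "$1 */4 * * * /usr/bin/purge_relay_logs --user=root --password=root --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1"
}
```

For example, `purge_cron_line 5` on 21 and `purge_cron_line 25` on 22 gives two staggered entries to paste into root's crontab on each slave.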
Complete output of a successful masterha_check_ssh run:
[root@mysql1 bin]# masterha_check_ssh -conf=/etc/app1.cnf
Thu Jul 31 01:47:52 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:47:52 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:47:52 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:47:52 2014 - [info] Starting SSH connection tests..
Thu Jul 31 01:48:00 2014 - [debug]
Thu Jul 31 01:47:52 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:54 2014 - [debug]   ok.
Thu Jul 31 01:47:54 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.22(192.168.1.22:22)..
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.21(192.168.1.21:22) to root@192.168.1.25(192.168.1.25:22)..
Thu Jul 31 01:48:00 2014 - [debug]   ok.
Thu Jul 31 01:48:00 2014 - [debug]
Thu Jul 31 01:47:53 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:55 2014 - [debug]   ok.
Thu Jul 31 01:47:55 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.21(192.168.1.21:22)..
Thu Jul 31 01:47:58 2014 - [debug]   ok.
Thu Jul 31 01:47:58 2014 - [debug]  Connecting via SSH from root@192.168.1.22(192.168.1.22:22) to root@192.168.1.25(192.168.1.25:22)..
Thu Jul 31 01:48:00 2014 - [debug]   ok.
Thu Jul 31 01:48:05 2014 - [debug]
Thu Jul 31 01:47:52 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.21(192.168.1.21:22)..
Address 192.168.1.21 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:47:54 2014 - [debug]   ok.
Thu Jul 31 01:47:54 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.22(192.168.1.22:22)..
Address 192.168.1.22 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.20(192.168.1.20:22) to root@192.168.1.25(192.168.1.25:22)..
Address 192.168.1.25 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Thu Jul 31 01:48:05 2014 - [debug]   ok.
Thu Jul 31 01:48:06 2014 - [debug]
Thu Jul 31 01:47:53 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.20(192.168.1.20:22)..
Thu Jul 31 01:47:57 2014 - [debug]   ok.
Thu Jul 31 01:47:57 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.21(192.168.1.21:22)..
Thu Jul 31 01:48:02 2014 - [debug]   ok.
Thu Jul 31 01:48:02 2014 - [debug]  Connecting via SSH from root@192.168.1.25(192.168.1.25:22) to root@192.168.1.22(192.168.1.22:22)..
Thu Jul 31 01:48:06 2014 - [debug]   ok.
Thu Jul 31 01:48:06 2014 - [info] All SSH connection tests passed successfully.
Start the MHA manager:
[root@mysql1 bin]# nohup masterha_manager --conf=/etc/app1.cnf > /tmp/mha_manager.log 2>&1 &
[2] 6389
Check the MHA manager status:
[root@mysql1 master_binlog]# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:6389) is running(0:PING_OK), master:192.168.1.20
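For monitoring scripts it is handy to pull the current master out of that status line. A minimal sketch that parses exactly the line format shown above; `current_master` is an illustrative name, and any other status text simply yields no output:

```shell
#!/usr/bin/env bash
# Read masterha_check_status output on stdin and print the master
# address when the manager reports PING_OK, e.g. for the line
# "app1 (pid:6389) is running(0:PING_OK), master:192.168.1.20".
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p'
}
```

Usage: `masterha_check_status --conf=/etc/app1.cnf | current_master`. An empty result means the manager is not running or not healthy and is worth an alert.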
Stop MHA:
[root@mysql1 master_binlog]# masterha_stop --conf=/etc/app1.cnf
Stopped app1 successfully.
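The MHA setup is now complete.
Master failover test:
Stop MySQL on machine 20, then run tail -f /var/log/masterha/app1/app1.log to follow how MHA handles the failure. The log shows which node becomes the new master, and it also prints the CHANGE MASTER statement.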
Thu Jul 31 01:58:48 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Jul 31 01:58:48 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/data/ --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Thu Jul 31 01:58:49 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:49 2014 - [warning] Connection failed 2 time(s)..
Thu Jul 31 01:58:50 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:50 2014 - [warning] Connection failed 3 time(s)..
Thu Jul 31 01:58:51 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Jul 31 01:58:51 2014 - [warning] Connection failed 4 time(s)..
Thu Jul 31 01:58:53 2014 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.20! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Thu Jul 31 01:58:53 2014 - [warning] Master is not reachable from health checker!
Thu Jul 31 01:58:53 2014 - [warning] Master 192.168.1.20(192.168.1.20:3306) is not reachable!
Thu Jul 31 01:58:53 2014 - [warning] SSH is NOT reachable.
Thu Jul 31 01:58:53 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Thu Jul 31 01:58:53 2014 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Jul 31 01:58:53 2014 - [info] Reading application default configuration from /etc/app1.cnf..
Thu Jul 31 01:58:53 2014 - [info] Reading server configuration from /etc/app1.cnf..
Thu Jul 31 01:58:54 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:58:54 2014 - [info] Dead Servers:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info] Alive Servers:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:58:54 2014 - [info] Alive Slaves:
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:58:54 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:58:54 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:58:54 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:58:54 2014 - [info] Checking slave configurations..
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.21(192.168.1.21:3306).
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.22(192.168.1.22:3306).
Thu Jul 31 01:58:54 2014 - [info]  read_only=1 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:58:54 2014 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.25(192.168.1.25:3306).
Thu Jul 31 01:58:54 2014 - [info] Checking replication filtering settings..
Thu Jul 31 01:58:54 2014 - [info]  Replication filtering check ok.
Thu Jul 31 01:58:54 2014 - [info] Master is down!
Thu Jul 31 01:58:54 2014 - [info] Terminating monitoring script.
Thu Jul 31 01:58:54 2014 - [info] Got exit code 20 (Master dead).
Thu Jul 31 01:58:54 2014 - [info] MHA::MasterFailover version 0.56.
Thu Jul 31 01:58:54 2014 - [info] Starting master failover.
Thu Jul 31 01:58:54 2014 - [info]
Thu Jul 31 01:58:54 2014 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 31 01:58:54 2014 - [info]
Thu Jul 31 01:58:58 2014 - [info] HealthCheck: SSH to 192.168.1.25 is reachable.
Thu Jul 31 01:59:02 2014 - [info] Binlog server 192.168.1.25 is reachable.
Thu Jul 31 01:59:03 2014 - [info] GTID failover mode = 0
Thu Jul 31 01:59:03 2014 - [info] Dead Servers:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 31 01:59:03 2014 - [info]  ok.
Thu Jul 31 01:59:03 2014 - [info] Alive Servers:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:59:03 2014 - [info] Alive Slaves:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info] Starting Non-GTID based failover.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 31 01:59:03 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Thu Jul 31 01:59:03 2014 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 31 01:59:03 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] The latest binary log file/position on all slaves is mysql-bin.000017:486
Thu Jul 31 01:59:03 2014 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info] The oldest binary log file/position on all slaves is mysql-bin.000017:486
Thu Jul 31 01:59:03 2014 - [info] Oldest slaves:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 31 01:59:03 2014 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 31 01:59:03 2014 - [info] Searching new master from slaves..
Thu Jul 31 01:59:03 2014 - [info]  Candidate masters from the configuration file:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.21(192.168.1.21:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jul 31 01:59:03 2014 - [info]  Non-candidate masters:
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.22(192.168.1.22:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]   192.168.1.25(192.168.1.25:3306) Version=5.5.37-log (oldest major version between slaves) log-bin:enabled
Thu Jul 31 01:59:03 2014 - [info]     Replicating from 192.168.1.20(192.168.1.20:3306)
Thu Jul 31 01:59:03 2014 - [info]     Not candidate for the new Master (no_master is set)
Thu Jul 31 01:59:03 2014 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Jul 31 01:59:03 2014 - [info] New master is 192.168.1.21(192.168.1.21:3306)
Thu Jul 31 01:59:03 2014 - [info] Starting master failover..
Thu Jul 31 01:59:03 2014 - [info]
From:
192.168.1.20(192.168.1.20:3306) (current master)
 +--192.168.1.21(192.168.1.21:3306)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
To:
192.168.1.21(192.168.1.21:3306) (new master)
 +--192.168.1.22(192.168.1.22:3306)
 +--192.168.1.25(192.168.1.25:3306)
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info] Getting new master's binlog name and position..
Thu Jul 31 01:59:03 2014 - [info]  mysql-bin.000011:569
Thu Jul 31 01:59:03 2014 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.21', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=569, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Thu Jul 31 01:59:03 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Thu Jul 31 01:59:03 2014 - [info] ** Finished master recovery successfully.
Thu Jul 31 01:59:03 2014 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] -- Slave diff file generation on host 192.168.1.22(192.168.1.22:3306) started, pid: 6708. Check tmp log /var/log/masterha/app1/192.168.1.22_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info] -- Slave diff file generation on host 192.168.1.25(192.168.1.25:3306) started, pid: 6709. Check tmp log /var/log/masterha/app1/192.168.1.25_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.22 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.22.
Thu Jul 31 01:59:03 2014 - [info] -- 192.168.1.22(192.168.1.22:3306) has the latest relay log events.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.25 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.25.
Thu Jul 31 01:59:03 2014 - [info] -- 192.168.1.25(192.168.1.25:3306) has the latest relay log events.
Thu Jul 31 01:59:03 2014 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.22(192.168.1.22:3306) started, pid: 6712. Check tmp log /var/log/masterha/app1/192.168.1.22_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.25(192.168.1.25:3306) started, pid: 6713. Check tmp log /var/log/masterha/app1/192.168.1.25_3306_20140731015854.log if it takes time..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.22 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.22(192.168.1.22:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info]  Resetting slave 192.168.1.22(192.168.1.22:3306) and starting replication from the new master 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  Executed CHANGE MASTER.
Thu Jul 31 01:59:03 2014 - [info]  Slave started.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.22.
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.22(192.168.1.22:3306) succeeded.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Log messages from 192.168.1.25 ...
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Starting recovery on 192.168.1.25(192.168.1.25:3306)..
Thu Jul 31 01:59:03 2014 - [info]  This server has all relay logs. Waiting all logs to be applied..
Thu Jul 31 01:59:03 2014 - [info]   done.
Thu Jul 31 01:59:03 2014 - [info]  All relay logs were successfully applied.
Thu Jul 31 01:59:03 2014 - [info]  Resetting slave 192.168.1.25(192.168.1.25:3306) and starting replication from the new master 192.168.1.21(192.168.1.21:3306)..
Thu Jul 31 01:59:03 2014 - [info]  Executed CHANGE MASTER.
Thu Jul 31 01:59:03 2014 - [info]  Slave started.
Thu Jul 31 01:59:03 2014 - [info] End of log messages from 192.168.1.25.
Thu Jul 31 01:59:03 2014 - [info] -- Slave recovery on host 192.168.1.25(192.168.1.25:3306) succeeded.
Thu Jul 31 01:59:03 2014 - [info] All new slave servers recovered successfully.
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] * Phase 5: New master cleanup phase..
Thu Jul 31 01:59:03 2014 - [info]
Thu Jul 31 01:59:03 2014 - [info] Resetting slave info on the new master..
Thu Jul 31 01:59:03 2014 - [info]  192.168.1.21: Resetting slave info succeeded.
Thu Jul 31 01:59:03 2014 - [info] Master failover to 192.168.1.21(192.168.1.21:3306) completed successfully.
Thu Jul 31 01:59:03 2014 - [info]

----- Failover Report -----

app1: MySQL Master failover 192.168.1.20(192.168.1.20:3306) to 192.168.1.21(192.168.1.21:3306) succeeded

Master 192.168.1.20(192.168.1.20:3306) is down!

Check MHA Manager logs at mysql1.com:/var/log/masterha/app1/app1.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.1.21(192.168.1.21:3306) has all relay logs for recovery.
Selected 192.168.1.21(192.168.1.21:3306) as a new master.
192.168.1.21(192.168.1.21:3306): OK: Applying all logs succeeded.
192.168.1.22(192.168.1.22:3306): This host has the latest relay log events.
192.168.1.25(192.168.1.25:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.22(192.168.1.22:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.21(192.168.1.21:3306)
192.168.1.25(192.168.1.25:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.21(192.168.1.21:3306)
192.168.1.21(192.168.1.21:3306): Resetting slave info succeeded.
Master failover to 192.168.1.21(192.168.1.21:3306) completed successfully.
Bring MySQL on machine 20 back up, then run CHANGE MASTER on it:
CHANGE MASTER TO MASTER_HOST='192.168.1.21', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=569, MASTER_USER='slave', MASTER_PASSWORD='slave';    (taken from the failover log above, with the real password filled in)
start slave;
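The statement does not have to be copied by hand. A small sketch that pulls the last CHANGE MASTER statement out of the manager log and swaps the masked 'xxx' for the real replication password; `change_master_stmt` is an illustrative name, and the sed substitution assumes the password contains no `/` or `&` characters:

```shell
#!/usr/bin/env bash
# Print the most recent CHANGE MASTER statement found in an MHA
# manager log, with the masked password replaced by the given one.
# Arguments: log file, replication password.
change_master_stmt() {
    local log="$1" password="$2"
    grep -o "CHANGE MASTER TO .*;" "$log" | tail -n 1 |
        sed "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$password'/"
}
```

For example, `change_master_stmt /var/log/masterha/app1/app1.log slave` prints the statement shown above, which can then be reviewed and pasted into the mysql client on 20.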
After the environment has been restored, delete app1.failover.complete under /var/log/masterha/app1 and restart the MHA manager process. Otherwise the next failover will not run.
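The cleanup and restart can be wrapped in one function. A sketch using the paths from this article; the function names are illustrative:

```shell
#!/usr/bin/env bash
WORKDIR="/var/log/masterha/app1"   # manager_workdir from /etc/app1.cnf
CONF="/etc/app1.cnf"

# Remove the marker left by the last failover in the given work
# directory; succeeds even when the file is absent.
clear_failover_marker() {
    rm -f "$1/app1.failover.complete"
}

# Stop the manager if it is still running, clear the marker, and
# start monitoring again in the background.
restart_manager() {
    masterha_stop --conf="$CONF" || true
    clear_failover_marker "$WORKDIR"
    nohup masterha_manager --conf="$CONF" > /tmp/mha_manager.log 2>&1 &
}
```

masterha_manager also accepts `--ignore_last_failover`, which tells it to start a failover even if the marker file is still present. Removing the file explicitly, as here, keeps the default safety check in place for the next run.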
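For IP address takeover, two scripts need to be customized:
master_ip_failover_script=''    A template ships in samples/scripts of the source package.
master_ip_online_change_script=""    Needed for manual (online) switchover. Without it, only MySQL is switched and the VIP stays where it was.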
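The bundled template is a Perl script, but any executable that understands the same arguments should work. Below is a rough shell sketch of the failover side only. MHA calls the script with `--command=stop`, `stopssh`, `start`, or `status` plus options such as `--ssh_user`, `--orig_master_ip`, and `--new_master_ip`. The VIP 192.168.1.100/24 and the interface eth0 are placeholders that do not come from this article, and the sketch leaves out things a production script needs, such as sending a gratuitous ARP after moving the address:

```shell
#!/usr/bin/env bash
VIP="192.168.1.100/24"    # hypothetical virtual IP, not from the article
DEV="eth0"                # hypothetical interface name
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Handle one invocation by MHA. Options not listed here are ignored.
ip_failover() {
    local command="" ssh_user="root" orig_master_ip="" new_master_ip="" arg
    for arg in "$@"; do
        case "$arg" in
            --command=*)        command="${arg#*=}" ;;
            --ssh_user=*)       ssh_user="${arg#*=}" ;;
            --orig_master_ip=*) orig_master_ip="${arg#*=}" ;;
            --new_master_ip=*)  new_master_ip="${arg#*=}" ;;
            *) ;;
        esac
    done
    case "$command" in
        stop|stopssh)
            # Best effort: the dead master may not answer over SSH.
            run ssh "$ssh_user@$orig_master_ip" "ip addr del $VIP dev $DEV" || true ;;
        start)
            run ssh "$ssh_user@$new_master_ip" "ip addr add $VIP dev $DEV" ;;
        status)
            echo "status: no check implemented in this sketch" ;;
        *)
            echo "unknown command: $command" >&2
            return 1 ;;
    esac
}
```

To use it as a real script, end the file with `ip_failover "$@"`, set DRY_RUN=0, and point master_ip_failover_script at it. Try it with DRY_RUN=1 first to see the commands it would run.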
See also wubx's mha-helper: https://github.com/wubx/mha-helper
A fairly thorough blog post: http://blog.itpub.net/14594028/viewspace-1073516/
MYSQL + MHA + keepalive + VIP installation and configuration (part 1):
http://www.cnblogs.com/yuanermen/p/3726572.html