lxgi&

Notes on manual failover with MHA (the MHA setup itself comes from the web)

The MHA setup is taken from: http://www.cnblogs.com/xuanzhi201111/p/4231412.html

When the master server fails, you can invoke MHA by hand to perform the failover. The commands are as follows:

First stop MHA Manager:

192.168.2.131 [root ~]$  masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1  (wd: /usr/local/bin)
(wd now: ~)
192.168.2.131 [root ~]$ 
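
Before a manual failover it can help to confirm that the manager is really stopped. The following is a minimal sketch, not part of the original procedure. It assumes `masterha_check_status` and `masterha_stop` are on the PATH and that `masterha_check_status` exits with status 0 only while the manager is running; the function name `ensure_manager_stopped` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: stop MHA Manager for a given config if it is running.
# Assumption: masterha_check_status exits 0 only while the manager is running.
ensure_manager_stopped() {
    conf="$1"
    if masterha_check_status --conf="$conf" >/dev/null 2>&1; then
        # The manager is still monitoring; stop it before a manual failover.
        masterha_stop --conf="$conf" || return 1
    fi
    echo "manager for $conf is stopped"
}

# Usage (on the Manager host):
#   ensure_manager_stopped /etc/masterha/app1.cnf
```

If `masterha_stop` itself fails, the function returns 1 and prints nothing, so a wrapper script can abort before attempting the switch.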

Then run the following on the Manager host:

192.168.2.131 [root bin]$  masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.2.128 --dead_master_port=3306 --new_master_host=192.168.2.129 --new_master_port=3306 --ignore_last_failover       
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.2.128.
Mon Jan 19 00:42:18 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 19 00:42:18 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 19 00:42:18 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Mon Jan 19 00:42:18 2015 - [info] MHA::MasterFailover version 0.56.
Mon Jan 19 00:42:18 2015 - [info] Starting master failover.
Mon Jan 19 00:42:18 2015 - [info] 
Mon Jan 19 00:42:18 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 19 00:42:18 2015 - [info] 
Mon Jan 19 00:42:19 2015 - [info] Dead Servers:
Mon Jan 19 00:42:19 2015 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.
Mon Jan 19 00:42:19 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53

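The command fails. The reason: MHA Manager found no dead server, so it reports an error and aborts the failover. In other words, the master must actually be down before the switch can proceed, so for this test we stop the master by hand: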

192.168.2.128 [root ~]$ /etc/init.d/mysqld stop
Shutting down MySQL... SUCCESS! 
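
The failed attempt suggests a simple pre-check: only start a dead-master failover when the master really does not answer. The following is a minimal sketch under stated assumptions, not MHA's own detection logic. It assumes the `mysqladmin` client is installed on the Manager host; `master_is_dead` is a hypothetical helper name and the 3-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical pre-check: succeed (exit 0) only if mysqld on host:port does
# not answer a ping. mysqladmin ping exits 0 when the server is running.
master_is_dead() {
    host="$1"
    port="${2:-3306}"
    if mysqladmin --host="$host" --port="$port" --connect-timeout=3 ping >/dev/null 2>&1; then
        echo "$host:$port is still alive; a --master_state=dead switch would stop with 'None of server is dead'" >&2
        return 1
    fi
    echo "$host:$port is unreachable"
}

# Usage (on the Manager host):
#   master_is_dead 192.168.2.128 3306 && echo "safe to start manual failover"
```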

Run the manual failover command again:

192.168.2.131 [root bin]$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.2.128 --dead_master_port=3306 --new_master_host=192.168.2.129 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.2.128.
Sun Jan 18 19:49:20 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jan 18 19:49:20 2015 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sun Jan 18 19:49:20 2015 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sun Jan 18 19:49:20 2015 - [info] MHA::MasterFailover version 0.53.
Sun Jan 18 19:49:20 2015 - [info] Starting master failover.
Sun Jan 18 19:49:20 2015 - [info] 
Sun Jan 18 19:49:20 2015 - [info] * Phase 1: Configuration Check Phase..
Sun Jan 18 19:49:20 2015 - [info] 
Sun Jan 18 19:49:20 2015 - [info] Dead Servers:
Sun Jan 18 19:49:20 2015 - [info]   192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:20 2015 - [info] Checking master reachability via mysql(double check)..
Sun Jan 18 19:49:20 2015 - [info]  ok.
Sun Jan 18 19:49:20 2015 - [info] Alive Servers:
Sun Jan 18 19:49:20 2015 - [info]   192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:20 2015 - [info]   192.168.2.130(192.168.2.130:3306)
Sun Jan 18 19:49:20 2015 - [info] Alive Slaves:
Sun Jan 18 19:49:20 2015 - [info]   192.168.2.129(192.168.2.129:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:20 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:20 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:20 2015 - [info]   192.168.2.130(192.168.2.130:3306)  Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:20 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Master 192.168.2.128 is dead. Proceed? (yes/NO): yes
Sun Jan 18 19:49:24 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] HealthCheck: SSH to 192.168.2.128 is reachable.
Sun Jan 18 19:49:24 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Jan 18 19:49:24 2015 - [info] Executing master IP deactivatation script:
Sun Jan 18 19:49:24 2015 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.2.88/24===

Disabling the VIP on old master: 192.168.2.128 
Sun Jan 18 19:49:24 2015 - [info]  done.
Sun Jan 18 19:49:24 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Jan 18 19:49:24 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] * Phase 3: Master Recovery Phase..
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] The latest binary log file/position on all slaves is mysql-bin.000016:107
Sun Jan 18 19:49:24 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Jan 18 19:49:24 2015 - [info]   192.168.2.129(192.168.2.129:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:24 2015 - [info]   192.168.2.130(192.168.2.130:3306)  Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info] The oldest binary log file/position on all slaves is mysql-bin.000016:107
Sun Jan 18 19:49:24 2015 - [info] Oldest slaves:
Sun Jan 18 19:49:24 2015 - [info]   192.168.2.129(192.168.2.129:3306)  Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jan 18 19:49:24 2015 - [info]   192.168.2.130(192.168.2.130:3306)  Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Sun Jan 18 19:49:24 2015 - [info]     Replicating from 192.168.2.128(192.168.2.128:3306)
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:24 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sun Jan 18 19:49:24 2015 - [info] 
Sun Jan 18 19:49:25 2015 - [info] Fetching dead master's binary logs..
Sun Jan 18 19:49:25 2015 - [info] Executing command on the dead master 192.168.2.128(192.168.2.128:3306): save_binary_logs --command=save --start_file=mysql-bin.000016  --start_pos=107 --binlog_dir=/data/mysql --output_file=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53
  Creating /tmp if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000016 pos 107 to mysql-bin.000016 EOF into /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /data/mysql/mysql-bin.000016 position 107 to tail(126).. ok.
 Concat succeeded.
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog                                                                    100%  126     0.1KB/s   00:00    
Sun Jan 18 19:49:25 2015 - [info] scp from root@192.168.2.128:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:25 2015 - [info] HealthCheck: SSH to 192.168.2.129 is reachable.
Sun Jan 18 19:49:26 2015 - [info] HealthCheck: SSH to 192.168.2.130 is reachable.
Sun Jan 18 19:49:26 2015 - [info] 
Sun Jan 18 19:49:26 2015 - [info] * Phase 3.3: Determining New Master Phase..
Sun Jan 18 19:49:26 2015 - [info] 
Sun Jan 18 19:49:26 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sun Jan 18 19:49:26 2015 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sun Jan 18 19:49:26 2015 - [info] 192.168.2.129 can be new master.
Sun Jan 18 19:49:26 2015 - [info] New master is 192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:26 2015 - [info] Starting master failover..
Sun Jan 18 19:49:26 2015 - [info] 
From:
192.168.2.128 (current master)
 +--192.168.2.129
 +--192.168.2.130

To:
192.168.2.129 (new master)
 +--192.168.2.130

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Sun Jan 18 19:49:31 2015 - [info] New master decided manually is 192.168.2.129(192.168.2.129:3306)
Sun Jan 18 19:49:31 2015 - [info] 
Sun Jan 18 19:49:31 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sun Jan 18 19:49:31 2015 - [info] 
Sun Jan 18 19:49:31 2015 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sun Jan 18 19:49:31 2015 - [info] Sending binlog..
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog                                                                    100%  126     0.1KB/s   00:00    
Sun Jan 18 19:49:31 2015 - [info] scp from local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to root@192.168.2.129:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:31 2015 - [info] 
Sun Jan 18 19:49:31 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Sun Jan 18 19:49:31 2015 - [info] 
Sun Jan 18 19:49:31 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sun Jan 18 19:49:31 2015 - [info] Starting recovery on 192.168.2.129(192.168.2.129:3306)..
Sun Jan 18 19:49:31 2015 - [info]  Generating diffs succeeded.
Sun Jan 18 19:49:31 2015 - [info] Waiting until all relay logs are applied.
Sun Jan 18 19:49:31 2015 - [info]  done.
Sun Jan 18 19:49:31 2015 - [info] Getting slave status..
Sun Jan 18 19:49:31 2015 - [info] This slave(192.168.2.129)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000016:107). No need to recover from Exec_Master_Log_Pos.
Sun Jan 18 19:49:31 2015 - [info] Connecting to the target slave host 192.168.2.129, running recover script..
Sun Jan 18 19:49:31 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.2.129 --slave_ip=192.168.2.129  --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --workdir=/tmp --target_version=5.5.30-log --timestamp=20150118194920 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sun Jan 18 19:49:32 2015 - [info] 
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog on 192.168.2.129:3306. This may take long time...
Applying log files succeeded.
Sun Jan 18 19:49:32 2015 - [info]  All relay logs were successfully applied.
Sun Jan 18 19:49:32 2015 - [info] Getting new master's binlog name and position..
Sun Jan 18 19:49:32 2015 - [info]  mysql-bin.000005:61791
Sun Jan 18 19:49:32 2015 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sun Jan 18 19:49:32 2015 - [info] Executing master IP activate script:
Sun Jan 18 19:49:32 2015 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306  


IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.2.88/24===

Enabling the VIP - 192.168.2.88/24 on the new master - 192.168.2.129 
Sun Jan 18 19:49:32 2015 - [info]  OK.
Sun Jan 18 19:49:32 2015 - [info] ** Finished master recovery successfully.
Sun Jan 18 19:49:32 2015 - [info] * Phase 3: Master Recovery Phase completed.
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] * Phase 4: Slaves Recovery Phase..
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] -- Slave diff file generation on host 192.168.2.130(192.168.2.130:3306) started, pid: 20692. Check tmp log /var/log/masterha/app1.log/192.168.2.130_3306_20150118194920.log if it takes time..
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] Log messages from 192.168.2.130 ...
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sun Jan 18 19:49:32 2015 - [info] End of log messages from 192.168.2.130.
Sun Jan 18 19:49:32 2015 - [info] -- 192.168.2.130(192.168.2.130:3306) has the latest relay log events.
Sun Jan 18 19:49:32 2015 - [info] Generating relay diff files from the latest slave succeeded.
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sun Jan 18 19:49:32 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] -- Slave recovery on host 192.168.2.130(192.168.2.130:3306) started, pid: 20694. Check tmp log /var/log/masterha/app1.log/192.168.2.130_3306_20150118194920.log if it takes time..
saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog                                                                    100%  126     0.1KB/s   00:00    
Sun Jan 18 19:49:33 2015 - [info] 
Sun Jan 18 19:49:33 2015 - [info] Log messages from 192.168.2.130 ...
Sun Jan 18 19:49:33 2015 - [info] 
Sun Jan 18 19:49:32 2015 - [info] Sending binlog..
Sun Jan 18 19:49:32 2015 - [info] scp from local:/var/log/masterha/app1.log/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog to root@192.168.2.130:/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog succeeded.
Sun Jan 18 19:49:33 2015 - [info] Starting recovery on 192.168.2.130(192.168.2.130:3306)..
Sun Jan 18 19:49:33 2015 - [info]  Generating diffs succeeded.
Sun Jan 18 19:49:33 2015 - [info] Waiting until all relay logs are applied.
Sun Jan 18 19:49:33 2015 - [info]  done.
Sun Jan 18 19:49:33 2015 - [info] Getting slave status..
Sun Jan 18 19:49:33 2015 - [info] This slave(192.168.2.130)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000016:107). No need to recover from Exec_Master_Log_Pos.
Sun Jan 18 19:49:33 2015 - [info] Connecting to the target slave host 192.168.2.130, running recover script..
Sun Jan 18 19:49:33 2015 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.2.130 --slave_ip=192.168.2.130  --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog --workdir=/tmp --target_version=5.5.25-log --timestamp=20150118194920 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx
Sun Jan 18 19:49:33 2015 - [info] 
Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.2.128_3306_20150118194920.binlog on 192.168.2.130:3306. This may take long time...
Applying log files succeeded.
Sun Jan 18 19:49:33 2015 - [info]  All relay logs were successfully applied.
Sun Jan 18 19:49:33 2015 - [info]  Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Sun Jan 18 19:49:33 2015 - [info]  Executed CHANGE MASTER.
Sun Jan 18 19:49:33 2015 - [info]  Slave started.
Sun Jan 18 19:49:33 2015 - [info] End of log messages from 192.168.2.130.
Sun Jan 18 19:49:33 2015 - [info] -- Slave recovery on host 192.168.2.130(192.168.2.130:3306) succeeded.
Sun Jan 18 19:49:33 2015 - [info] All new slave servers recovered successfully.
Sun Jan 18 19:49:33 2015 - [info] 
Sun Jan 18 19:49:33 2015 - [info] * Phase 5: New master cleanup phease..
Sun Jan 18 19:49:33 2015 - [info] 
Sun Jan 18 19:49:33 2015 - [info] Resetting slave info on the new master..
Sun Jan 18 19:49:33 2015 - [info]  192.168.2.129: Resetting slave info succeeded.
Sun Jan 18 19:49:33 2015 - [info] Master failover to 192.168.2.129(192.168.2.129:3306) completed successfully.
Sun Jan 18 19:49:33 2015 - [info] 

----- Failover Report -----

app1: MySQL Master failover 192.168.2.128 to 192.168.2.129 succeeded

Master 192.168.2.128 is down!

Check MHA Manager logs at localhost.localdomain for details.

Started manual(interactive) failover.
Invalidated master IP address on 192.168.2.128.
The latest slave 192.168.2.129(192.168.2.129:3306) has all relay logs for recovery.
Selected 192.168.2.129 as a new master.
192.168.2.129: OK: Applying all logs succeeded.
192.168.2.129: OK: Activated master IP address.
192.168.2.130: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.2.130: OK: Applying all logs succeeded. Slave started, replicating from 192.168.2.129.
192.168.2.129: Resetting slave info succeeded.
Master failover to 192.168.2.129(192.168.2.129:3306) completed successfully.
Sun Jan 18 19:49:33 2015 - [info] Sending mail..
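
With output this long, it is easy to miss whether the run actually finished. The sketch below assumes the output of `masterha_master_switch` was saved to a file (for example by piping it through `tee`; the file path is up to you) and looks for the final "completed successfully" line shown in the log. The function name `new_master_from_log` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the new master (host:port) from a saved
# masterha_master_switch log, or return non-zero if no success line exists.
new_master_from_log() {
    log="$1"
    # Matches lines such as:
    #   Master failover to 192.168.2.129(192.168.2.129:3306) completed successfully.
    sed -n 's/.*Master failover to [^(]*(\([^)]*\)) completed successfully\..*/\1/p' "$log" \
        | head -n 1 \
        | grep .   # grep fails on empty input, so a missing success line gives a non-zero status
}

# Usage:
#   new_master_from_log /path/to/saved_failover.log || echo "failover did not complete"
```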


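Summary: based on tests in virtual machines, this mode fits the following situation and has the following consequences:
1. The manager is not running.
2. The master is broken.
3. After this switch, the cluster becomes an ordinary master-slave replication setup. If the new master then goes down, the remaining slave is not promoted to master (this was tested with only one slave left; the case with several remaining slaves was not tested).
4. Once the old master is repaired, it cannot rejoin the cluster by itself. masterha_check_repl reports "there are 2 non-slave servers", meaning the cluster contains two nodes that are not slaves.
5. How do you add a second master candidate when the MHA cluster has none? After the switch, first make the old master replicate from the new master as a slave and update the configuration file, then check with masterha_check_repl. As long as it reports that replication health is OK, you are done. You can also do an online switch at a suitable time, which is a lossless switch.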
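
To bring the old master back as a slave, the failover log already contains the statement to use ("Statement should be: CHANGE MASTER TO ..."). The sketch below pulls that statement out of a saved log. It assumes the failover output was saved to a file; `change_master_stmt` is a hypothetical name. Note that MHA masks the password as 'xxx' in the log, so the real replication password has to be filled in by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the CHANGE MASTER statement that MHA logged
# during failover, or return non-zero if the log does not contain one.
change_master_stmt() {
    log="$1"
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*;\).*/\1/p' "$log" \
        | head -n 1 \
        | grep .   # non-zero status when nothing was found
}

# Usage:
#   change_master_stmt /path/to/saved_failover.log
```

After running the printed statement (with the real password) and `START SLAVE` on the repaired old master, and adding the host back to /etc/masterha/app1.cnf, `masterha_check_repl --conf=/etc/masterha/app1.cnf` should be run to confirm the topology, as described in the last item of the summary. Whether the logged position is still valid for the old master depends on its data matching the state at failover time, so treat this as a starting point rather than a guaranteed procedure.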

posted on 2016-01-15 14:29 by lxgi&