MHA ssh检查,repl复制检查和在线切换日志分析

 

一、SSh 检查日志分析

执行过程及对应的日志:

1、读取MHA manger 节点上的配置文件

2、根据配置文件,得到各个主机的信息,逐一进行SSH检查

3、每个主机都通过SSH连接除了自己以外的其他所有主机

4、当所有主机相互之间都能通过SSH免密登录,SSH检查就通过。

[root@A2 app1]# masterha_check_ssh --conf=/etc/masterha/app1.conf 
Sun Jun 23 18:36:45 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 23 18:36:45 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun 23 18:36:45 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun 23 18:36:45 2019 - [info] Starting SSH connection tests..
Sun Jun 23 18:36:45 2019 - [debug] 
Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.13.15(172.16.13.15:22) to root@172.16.15.3(172.16.15.3:22)..
Sun Jun 23 18:36:45 2019 - [debug]   ok.
Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.13.15(172.16.13.15:22) to root@172.16.15.2(172.16.15.2:22)..
Sun Jun 23 18:36:45 2019 - [debug]   ok.
Sun Jun 23 18:36:46 2019 - [debug] 
Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.15.3(172.16.15.3:22) to root@172.16.13.15(172.16.13.15:22)..
Sun Jun 23 18:36:45 2019 - [debug]   ok.
Sun Jun 23 18:36:45 2019 - [debug]  Connecting via SSH from root@172.16.15.3(172.16.15.3:22) to root@172.16.15.2(172.16.15.2:22)..
Sun Jun 23 18:36:46 2019 - [debug]   ok.
Sun Jun 23 18:36:46 2019 - [debug] 
Sun Jun 23 18:36:46 2019 - [debug]  Connecting via SSH from root@172.16.15.2(172.16.15.2:22) to root@172.16.13.15(172.16.13.15:22)..
Sun Jun 23 18:36:46 2019 - [debug]   ok.
Sun Jun 23 18:36:46 2019 - [debug]  Connecting via SSH from root@172.16.15.2(172.16.15.2:22) to root@172.16.15.3(172.16.15.3:22)..
Sun Jun 23 18:36:46 2019 - [debug]   ok.
Sun Jun 23 18:36:46 2019 - [info] All SSH connection tests passed successfully.

 

二、主从复制检查日志分析

1、读取配置文件,根据配置文件,检查当前的所有主机状态,MHA Node版本,是否支持GTID主从复制,得到当前的主从复制架构

[root@A2 app1]# masterha_check_repl --conf=/etc/masterha/app1.conf 
Sun Jun 23 18:36:28 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 23 18:36:28 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun 23 18:36:28 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun 23 18:36:28 2019 - [info] MHA::MasterMonitor version 0.56.
Sun Jun 23 18:36:29 2019 - [info] GTID failover mode = 0
Sun Jun 23 18:36:29 2019 - [info] Dead Servers:
Sun Jun 23 18:36:29 2019 - [info] Alive Servers:
Sun Jun 23 18:36:29 2019 - [info]   172.16.13.15(172.16.13.15:3306)
Sun Jun 23 18:36:29 2019 - [info]   172.16.15.3(172.16.15.3:3306)
Sun Jun 23 18:36:29 2019 - [info]   172.16.15.2(172.16.15.2:3306)
Sun Jun 23 18:36:29 2019 - [info] Alive Slaves:
Sun Jun 23 18:36:29 2019 - [info]   172.16.13.15(172.16.13.15:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
Sun Jun 23 18:36:29 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
Sun Jun 23 18:36:29 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun 23 18:36:29 2019 - [info]   172.16.15.2(172.16.15.2:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
Sun Jun 23 18:36:29 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
Sun Jun 23 18:36:29 2019 - [info] Current Alive Master: 172.16.15.3(172.16.15.3:3306)
2、检查从库配置,检查主从复制是否过滤,是否支持GTID复制
Sun Jun 23 18:36:29 2019 - [info] Checking slave configurations..
Sun Jun 23 18:36:29 2019 - [info] Checking replication filtering settings..
Sun Jun 23 18:36:29 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jun 23 18:36:29 2019 - [info]  Replication filtering check ok.
Sun Jun 23 18:36:29 2019 - [info] GTID (with auto-pos) is not supported
3、进行SSH连接测试,MHA版本检查

Sun Jun 23 18:36:29 2019 - [info] Starting SSH connection tests..
Sun Jun 23 18:36:31 2019 - [info] All SSH connection tests passed successfully.
Sun Jun 23 18:36:31 2019 - [info] Checking MHA Node version..
Sun Jun 23 18:36:31 2019 - [info]  Version check ok.
4、检查主库上SSH 配置,测试恢复脚本(save_binary_logs)的可用性,对binlog设置进行检查
Sun Jun 23 18:36:31 2019 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun 23 18:36:31 2019 - [info] HealthCheck: SSH to 172.16.15.3 is reachable.
Sun Jun 23 18:36:32 2019 - [info] Master MHA Node version is 0.56.
Sun Jun 23 18:36:32 2019 - [info] Checking recovery script configurations on 172.16.15.3(172.16.15.3:3306)..
Sun Jun 23 18:36:32 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql_bin.000025 
Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.15.3(172.16.15.3:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /usr/local/mysql/data, up to mysql_bin.000025
Sun Jun 23 18:36:32 2019 - [info] Binlog setting check done.
5、检查从库SSH配置,测试应用差异日志脚本(apply_diff_relay_logs)的可用性,检查从库恢复环境和 relay log 的情况,检查MySQL的连接和权限,清理测试文件
Sun Jun 23 18:36:32 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun 23 18:36:32 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.13.15 --slave_ip=172.16.13.15 --slave_port=3306 --workdir=/tmp --target_version=5.7.16-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.13.15(172.16.13.15:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to mysqlserver-relay-bin.000002
    Temporary relay log file is /usr/local/mysql/data/mysqlserver-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
    
Sun Jun 23 18:36:32 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.15.2 --slave_ip=172.16.15.2 --slave_port=3306 --workdir=/tmp --target_version=5.7.16-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
Sun Jun 23 18:36:32 2019 - [info]   Connecting to root@172.16.15.2(172.16.15.2:22).. 
  Checking slave recovery environment settings..
    Opening /usr/local/mysql/data/relay-log.info ... ok.
    Relay log found at /usr/local/mysql/data, up to A2-relay-bin.000002
    Temporary relay log file is /usr/local/mysql/data/A2-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 23 18:36:32 2019 - [info] Slaves settings check done.
6、得到当前的主从结构,检查每个从库的复制状态。检查故障切换等脚本的状态,完成主从复制检查
Sun Jun 23 18:36:32 2019 - [info] 
172.16.15.3(172.16.15.3:3306) (current master)
 +--172.16.13.15(172.16.13.15:3306)
 +--172.16.15.2(172.16.15.2:3306)

Sun Jun 23 18:36:32 2019 - [info] Checking replication health on 172.16.13.15..
Sun Jun 23 18:36:32 2019 - [info]  ok.
Sun Jun 23 18:36:32 2019 - [info] Checking replication health on 172.16.15.2..
Sun Jun 23 18:36:32 2019 - [info]  ok.
Sun Jun 23 18:36:32 2019 - [info] Checking master_ip_failover_script status:
Sun Jun 23 18:36:32 2019 - [info]   /var/log/masterha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 
Checking the Status of the script.. OK 
Sun Jun 23 18:36:32 2019 - [info]  OK.
Sun Jun 23 18:36:32 2019 - [warning] shutdown_script is not defined.
Sun Jun 23 18:36:32 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

 

三、在线日志切换分析

1、读取配置文件,检查是否支持GTID复制,得到当前的主从结构
[root@A2 app1]# masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=172.16.13.15 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1000
Sun Jun 23 19:29:06 2019 - [info] MHA::MasterRotate version 0.56.
Sun Jun 23 19:29:06 2019 - [info] Starting online master switch..
Sun Jun 23 19:29:06 2019 - [info] 
Sun Jun 23 19:29:06 2019 - [info] * Phase 1: Configuration Check Phase..
Sun Jun 23 19:29:06 2019 - [info] 
Sun Jun 23 19:29:06 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 23 19:29:06 2019 - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun 23 19:29:06 2019 - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun 23 19:29:08 2019 - [info] GTID failover mode = 0
Sun Jun 23 19:29:08 2019 - [info] Current Alive Master: 172.16.15.3(172.16.15.3:3306)
Sun Jun 23 19:29:08 2019 - [info] Alive Slaves:
Sun Jun 23 19:29:08 2019 - [info]   172.16.13.15(172.16.13.15:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
Sun Jun 23 19:29:08 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
Sun Jun 23 19:29:08 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun 23 19:29:08 2019 - [info]   172.16.15.2(172.16.15.2:3306)  Version=5.7.16-log (oldest major version between slaves) log-bin:enabled
Sun Jun 23 19:29:08 2019 - [info]     Replicating from 172.16.15.3(172.16.15.3:3306)
2、在主库上确认执行 FLUSH NO_WRITE_TO_BINLOG TABLES,关闭已经打开的表,不再记录binlog,进行主从复制检查,得到新主库的信息
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.16.15.3(172.16.15.3:3306)? (YES/no): yes
Sun Jun 23 19:29:11 2019 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Jun 23 19:29:11 2019 - [info]  ok.
Sun Jun 23 19:29:11 2019 - [info] Checking MHA is not monitoring or doing failover..
Sun Jun 23 19:29:11 2019 - [info] Checking replication health on 172.16.13.15..
Sun Jun 23 19:29:11 2019 - [info]  ok.
Sun Jun 23 19:29:11 2019 - [info] Checking replication health on 172.16.15.2..
Sun Jun 23 19:29:11 2019 - [info]  ok.
Sun Jun 23 19:29:11 2019 - [info] 172.16.13.15 can be new master.
Sun Jun 23 19:29:11 2019 - [info] 
From:
172.16.15.3(172.16.15.3:3306) (current master)
 +--172.16.13.15(172.16.13.15:3306)
 +--172.16.15.2(172.16.15.2:3306)

To:
172.16.13.15(172.16.13.15:3306) (new master)
 +--172.16.15.2(172.16.15.2:3306)
 +--172.16.15.3(172.16.15.3:3306)
3、开始从旧主切换到新主,检查新主能否成为主库。检查复制过滤,临时将旧主change master to到一个dummy地址
Starting master switch from 172.16.15.3(172.16.15.3:3306) to 172.16.13.15(172.16.13.15:3306)? (yes/NO): yes
Sun Jun 23 19:29:13 2019 - [info] Checking whether 172.16.13.15(172.16.13.15:3306) is ok for the new master..
Sun Jun 23 19:29:13 2019 - [info]  ok.
Sun Jun 23 19:29:13 2019 - [info] 172.16.15.3(172.16.15.3:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Jun 23 19:29:13 2019 - [info] 172.16.15.3(172.16.15.3:3306): Resetting slave pointing to the dummy host.
Sun Jun 23 19:29:14 2019 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jun 23 19:29:14 2019 - [info] 

 

4、旧主库上执行master_ip_online_change,停止虚拟IP。在旧主上执行 FLUSH TABLES WITH READ LOCK..,实现全局读锁。
Sun Jun 23 19:29:14 2019 - [info] * Phase 2: Rejecting updates Phase..
Sun Jun 23 19:29:14 2019 - [info] 
Sun Jun 23 19:29:14 2019 - [info] Executing master ip online change script to disable write on the current master:
Sun Jun 23 19:29:14 2019 - [info]   /var/log/masterha/scripts/master_ip_online_change --command=stop --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='115433' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port=3306 --new_master_user='root' --new_master_password='115433' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave

***************************************************************

Disabling the VIP - 172.16.13.141/16 on old master: 172.16.15.3

Disabled the VIP successfully
***************************************************************


Sun Jun 23 19:29:14 2019 - [info]  ok.
Sun Jun 23 19:29:14 2019 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Jun 23 19:29:14 2019 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Jun 23 19:29:15 2019 - [info]  ok.
5、获取旧主的binlog位置,在新主上面应用中继日志。得到新主的binlog 位置,用于后期在其他从库上执行change master  to,在新主库上面开启虚拟IP,set read_only =0

 

Sun Jun 23 19:29:15 2019 - [info] Orig master binlog:pos is mysql_bin.000025:150734601.
Sun Jun 23 19:29:15 2019 - [info]  Waiting to execute all relay logs on 172.16.13.15(172.16.13.15:3306)..
Sun Jun 23 19:30:48 2019 - [info]  master_pos_wait(mysql_bin.000025:150734601) completed on 172.16.13.15(172.16.13.15:3306). Executed 137 events.
Sun Jun 23 19:30:48 2019 - [info]   done.
Sun Jun 23 19:30:48 2019 - [info] Getting new master's binlog name and position..
Sun Jun 23 19:30:48 2019 - [info]  mysql_bin.000058:91914599
Sun Jun 23 19:30:48 2019 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.13.15', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000058', MASTER_LOG_POS=91914599, MASTER_USER='root', MASTER_PASSWORD='xxx';
Sun Jun 23 19:30:48 2019 - [info] Executing master ip online change script to allow write on the new master:
Sun Jun 23 19:30:48 2019 - [info]   /var/log/masterha/scripts/master_ip_online_change --command=start --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='115433' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port=3306 --new_master_user='root' --new_master_password='115433' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave


***************************************************************

Enabling the VIP - 172.16.13.141/16 on new master: 172.16.13.15 

Enabled the VIP successfully
***************************************************************

Sun Jun 23 19:30:48 2019 - [info]  ok.
Sun Jun 23 19:30:48 2019 - [info] Setting read_only=0 on 172.16.13.15(172.16.13.15:3306)..
Sun Jun 23 19:30:48 2019 - [info]  ok.

 

6、并行切换从库,应用中继日志到 旧主binlog位置,执行change master  to;旧主也同时执行,而且执行 UNLOCK  TABLES ,解锁。至此,从库切换完成
Sun Jun 23 19:30:48 2019 - [info] 
Sun Jun 23 19:30:48 2019 - [info] * Switching slaves in parallel..
Sun Jun 23 19:30:48 2019 - [info] 
Sun Jun 23 19:30:48 2019 - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:3306) started, pid: 6048
Sun Jun 23 19:30:48 2019 - [info] 
Sun Jun 23 19:30:49 2019 - [info] Log messages from 172.16.15.2 ...
Sun Jun 23 19:30:49 2019 - [info] 
Sun Jun 23 19:30:48 2019 - [info]  Waiting to execute all relay logs on 172.16.15.2(172.16.15.2:3306)..
Sun Jun 23 19:30:48 2019 - [info]  master_pos_wait(mysql_bin.000025:150734601) completed on 172.16.15.2(172.16.15.2:3306). Executed 0 events.
Sun Jun 23 19:30:48 2019 - [info]   done.
Sun Jun 23 19:30:48 2019 - [info]  Resetting slave 172.16.15.2(172.16.15.2:3306) and starting replication from the new master 172.16.13.15(172.16.13.15:3306)..
Sun Jun 23 19:30:48 2019 - [info]  Executed CHANGE MASTER.
Sun Jun 23 19:30:48 2019 - [info]  Slave started.
Sun Jun 23 19:30:49 2019 - [info] End of log messages from 172.16.15.2 ...
Sun Jun 23 19:30:49 2019 - [info] 
Sun Jun 23 19:30:49 2019 - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:3306) succeeded.
Sun Jun 23 19:30:49 2019 - [info] Unlocking all tables on the orig master:
Sun Jun 23 19:30:49 2019 - [info] Executing UNLOCK TABLES..
Sun Jun 23 19:30:49 2019 - [info]  ok.
Sun Jun 23 19:30:49 2019 - [info] Starting orig master as a new slave..
Sun Jun 23 19:30:49 2019 - [info]  Resetting slave 172.16.15.3(172.16.15.3:3306) and starting replication from the new master 172.16.13.15(172.16.13.15:3306)..
Sun Jun 23 19:30:49 2019 - [info]  Executed CHANGE MASTER.
Sun Jun 23 19:30:49 2019 - [info]  Slave started.
Sun Jun 23 19:30:49 2019 - [info] All new slave servers switched successfully.
Sun Jun 23 19:30:49 2019 - [info] 

7、对新主清理,更改从库信息
Sun Jun 23 19:30:49 2019 - [info] * Phase 5: New master cleanup phase..
Sun Jun 23 19:30:49 2019 - [info] 
Sun Jun 23 19:30:49 2019 - [info]  172.16.13.15: Resetting slave info succeeded.
Sun Jun 23 19:30:49 2019 - [info] Switching master to 172.16.13.15(172.16.13.15:3306) completed successfully.

 

posted on 2019-06-26 11:47  ^老中医^  阅读(493)  评论(0编辑  收藏  举报

导航