mysql高可用集群——MHA架构
目录
1.下载
2.搭建mha
2.1 系统配置
2.2 架构
2.3 添加ssh公钥信任
2.4 安装mha节点
2.5 manager配置文件
2.6 检查
2.7 启动manager进程
2.8 碰到的问题
3.测试切换
3.1 正常切换测试
3.2 回切测试
3.3 雪崩测试
3.4 主从不一致切换测试
下载
mha链接地址:http://pan.baidu.com/s/1pJkDGX9#dir/path=%2Fmysql%2FHA%2Fmha
或者:https://code.google.com/p/mysql-master-ha/
添加一个yum源:wget http://dl.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
rpm -ivh epel-release-5-4.noarch.rpm
Mha分manager节点和node节点。可以下载源码包,或者rpm包,随个人喜好,注意,版本与系统内核要匹配。
搭建mha
1.系统配置
配置如下:
系统 |
内核 |
Mysql版本 |
内存 |
CentOS release 5.8 |
Linux 2.6.18-308.el5xen |
Mysql 5.5.35 |
2G |
2.架构
服务器列表:
IP |
机器名 |
角色 |
192.168.2.7 |
haproxy001 |
manager |
192.168.1.241 |
Hd-dm-test01 |
Node |
192.168.1.242 |
Hd-dm-test02 |
Node |
192.168.1.243 |
Hd-dm-test03 |
node |
架构图:
3.添加ssh公钥信任
(1)node节点之间配置公钥信任,Manager节点向node节点添加公钥信任
(2)创建公钥密钥,并互传公钥
[root@haproxy001 ~]$ ssh-keygen -t rsa
结果如下:
[root@haproxy001 ~]$ ls -l ~/.ssh/
-rw------- 1 leiche leiche 1675 07-04 10:18 id_rsa
-rw-r--r-- 1 leiche leiche 399 07-04 10:18 id_rsa.pub
-rw-r--r-- 1 leiche leiche 1182 07-04 10:27 known_hosts
配置manager公钥信任:
[root@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.241
[root@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.242
[root@haproxy001 ~] ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.243
配置node节点间的公钥信任:
[root@hd-dm-test01~] ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.242
[root@hd-dm-test01~] ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.1.243
Ssh-copy-id加节点信任是追加的方式,在~/.ssh/目录下生成authorized_keys文件,可以打开查看:
[root@hd-dm-test03 .ssh]# sudo cat authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA35i+lWMDeWJYvamobI3OdmSFXLMt4ngsI6aiJCkv2ZMaeoKEzdeK2GWP9P/F7dU2j1DCgKh8yzF9o4E713k5KW92RmeyglDt590R0PPNvGDFAauQ5wEkufylYLbaJ0fwsJPjeM4ZBrT7RqdYGn3lGQFW32Cih9LcV2A+8NkQbrh/kG2wcTiaejFQFgSKn87nsokhg5L/zY6qkeuV0dPfqfpeSI8uhTI+VUC83/5odeunXrPE30o6fZZDpp2oszt2TEElldZ6pS9mL3ZhGuP/o/IPe+w5/cK9J4C815y2mr9Agr/UUQoj1K4WWRn5uZ5y/pyvAAlbPteNVVlx4djoIw== root@haproxy001
...
(3)检查
[root@haproxy001 ~]# ssh 172.16.1.243 "/sbin/ifconfig |grep 'inet addr' |head -1"
inet addr:172.16.1.243 Bcast:172.16.1.255 Mask:255.255.255.0
4.安装mha节点
在manager服务器2.7上安装manager节点:
rpm -ivh mha4mysql-manager-0.53-0.noarch.rpm
在manager和node服务器安装node节点:
sudo rpm -ivh mha4mysql-node-0.53-0.noarch.rpm
依赖包:
sudo yum -y install perl-DBD-MySQL.x86_64
sudo yum -y install perl-Log-Dispatch
sudo yum -y install perl-Config-Tiny
sudo yum -y install perl-Parallel-ForkManager
5.配置文件
在manager服务器上创建目录
mkdir /etc/masterha
mkdir -p /masterha/app1
生成配置文件/etc/masterha/app1.cnf
[server default]
#manager dir
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/masterha/app1
#mysql manager user
user=root
password=123456
#node server user
ssh_user=root
#replication_user
repl_user=repluser
repl_password=hoodong
#checking master every second
ping_interval=1
#promote script
#shutdown_script=""
#master_ip_failover_script="/usr/local/bin/master_ip_failover"
#master_ip_online_change_script=""
#report_script=""
[server1]
hostname=172.16.1.241
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
[server2]
hostname=172.16.1.242
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
[server3]
hostname=172.16.1.243
master_binlog_dir="/data/mysqllog/3306"
ssh_port=22
candidate_master=1
6.检查:
检查ssh
[root@haproxy001 app1]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Sat Jul 5 12:57:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 12:57:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:24 2014 - [info] Starting SSH connection tests..
Sat Jul 5 12:57:26 2014 - [debug]
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from root@172.16.1.241(172.16.1.241:22) to root@172.16.1.242(172.16.1.242:22)..
Sat Jul 5 12:57:24 2014 - [debug] ok.
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from root@172.16.1.241(172.16.1.241:22) to root@172.16.1.243(172.16.1.243:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:26 2014 - [debug]
Sat Jul 5 12:57:24 2014 - [debug] Connecting via SSH from root@172.16.1.242(172.16.1.242:22) to root@172.16.1.241(172.16.1.241:22)..
Sat Jul 5 12:57:25 2014 - [debug] ok.
Sat Jul 5 12:57:25 2014 - [debug] Connecting via SSH from root@172.16.1.242(172.16.1.242:22) to root@172.16.1.243(172.16.1.243:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:27 2014 - [debug]
Sat Jul 5 12:57:25 2014 - [debug] Connecting via SSH from root@172.16.1.243(172.16.1.243:22) to root@172.16.1.241(172.16.1.241:22)..
Sat Jul 5 12:57:26 2014 - [debug] ok.
Sat Jul 5 12:57:26 2014 - [debug] Connecting via SSH from root@172.16.1.243(172.16.1.243:22) to root@172.16.1.242(172.16.1.242:22)..
Sat Jul 5 12:57:27 2014 - [debug] ok.
Sat Jul 5 12:57:27 2014 - [info] All SSH connection tests passed successfully.
检查复制
[root@haproxy001 app1]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Sat Jul 5 12:57:05 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 5 12:57:05 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:05 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Sat Jul 5 12:57:05 2014 - [info] MHA::MasterMonitor version 0.53.
Sat Jul 5 12:57:06 2014 - [info] Dead Servers:
Sat Jul 5 12:57:06 2014 - [info] Alive Servers:
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.242(172.16.1.242:3306)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.243(172.16.1.243:3306)
Sat Jul 5 12:57:06 2014 - [info] Alive Slaves:
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 12:57:06 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 12:57:06 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled
Sat Jul 5 12:57:06 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Jul 5 12:57:06 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306)
Sat Jul 5 12:57:06 2014 - [info] Checking slave configurations..
Sat Jul 5 12:57:06 2014 - [info] Checking replication filtering settings..
Sat Jul 5 12:57:06 2014 - [info] binlog_do_db= , binlog_ignore_db=
Sat Jul 5 12:57:06 2014 - [info] Replication filtering check ok.
Sat Jul 5 12:57:06 2014 - [info] Starting SSH connection tests..
Sat Jul 5 12:57:09 2014 - [info] All SSH connection tests passed successfully.
Sat Jul 5 12:57:09 2014 - [info] Checking MHA Node version..
Sat Jul 5 12:57:09 2014 - [info] Version check ok.
Sat Jul 5 12:57:09 2014 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul 5 12:57:10 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable.
Sat Jul 5 12:57:10 2014 - [info] Master MHA Node version is 0.53.
Sat Jul 5 12:57:10 2014 - [info] Checking recovery script configurations on the current master..
Sat Jul 5 12:57:10 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008
Sat Jul 5 12:57:10 2014 - [info] Connecting to root@172.16.1.241(172.16.1.241)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysqllog/3306, up to mysql3306.000008
Sat Jul 5 12:57:11 2014 - [info] Master setting check done.
Sat Jul 5 12:57:11 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul 5 12:57:11 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 12:57:11 2014 - [info] Connecting to root@172.16.1.242(172.16.1.242:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 12:57:11 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 12:57:11 2014 - [info] Connecting to root@172.16.1.243(172.16.1.243:22)..
Checking slave recovery environment settings..
Opening /data/mysqldata/3306/relay-log.info ... ok.
Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016
Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Jul 5 12:57:12 2014 - [info] Slaves settings check done.
Sat Jul 5 12:57:12 2014 - [info]
172.16.1.241 (current master)
+--172.16.1.242
+--172.16.1.243
Sat Jul 5 12:57:12 2014 - [info] Checking replication health on 172.16.1.242..
Sat Jul 5 12:57:12 2014 - [info] ok.
Sat Jul 5 12:57:12 2014 - [info] Checking replication health on 172.16.1.243..
Sat Jul 5 12:57:12 2014 - [info] ok.
Sat Jul 5 12:57:12 2014 - [warning] master_ip_failover_script is not defined.
Sat Jul 5 12:57:12 2014 - [warning] shutdown_script is not defined.
Sat Jul 5 12:57:12 2014 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
7.启动manager节点进程
启动manager节点进程,日志被计入/data/masterha/manager.log
masterha_manager --conf=/etc/masterha/app1.cnf &
查看日志
Sat Jul 5 17:41:15 2014 - [info] MHA::MasterMonitor version 0.53. Sat Jul 5 17:41:15 2014 - [info] Dead Servers: Sat Jul 5 17:41:15 2014 - [info] Alive Servers: Sat Jul 5 17:41:15 2014 - [info] 172.16.1.241(172.16.1.241:3306) Sat Jul 5 17:41:15 2014 - [info] 172.16.1.242(172.16.1.242:3306) Sat Jul 5 17:41:15 2014 - [info] 172.16.1.243(172.16.1.243:3306) Sat Jul 5 17:41:15 2014 - [info] Alive Slaves: Sat Jul 5 17:41:15 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 17:41:15 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 17:41:15 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 17:41:15 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 17:41:15 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 17:41:15 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 17:41:15 2014 - [info] Current Alive Master: 172.16.1.241(172.16.1.241:3306) Sat Jul 5 17:41:15 2014 - [info] Checking slave configurations.. Sat Jul 5 17:41:15 2014 - [info] Checking replication filtering settings.. Sat Jul 5 17:41:15 2014 - [info] binlog_do_db= , binlog_ignore_db= Sat Jul 5 17:41:15 2014 - [info] Replication filtering check ok. Sat Jul 5 17:41:15 2014 - [info] Starting SSH connection tests.. Sat Jul 5 17:41:18 2014 - [info] All SSH connection tests passed successfully. Sat Jul 5 17:41:18 2014 - [info] Checking MHA Node version.. Sat Jul 5 17:41:19 2014 - [info] Version check ok. Sat Jul 5 17:41:19 2014 - [info] Checking SSH publickey authentication settings on the current master.. Sat Jul 5 17:41:20 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable. Sat Jul 5 17:41:20 2014 - [info] Master MHA Node version is 0.53. Sat Jul 5 17:41:20 2014 - [info] Checking recovery script configurations on the current master.. Sat Jul 5 17:41:20 2014 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --start_file=mysql3306.000008 Sat Jul 5 17:41:20 2014 - [info] Connecting to root@172.16.1.241(172.16.1.241).. Creating /masterha/app1 if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/mysqllog/3306, up to mysql3306.000008 Sat Jul 5 17:41:20 2014 - [info] Master setting check done. Sat Jul 5 17:41:20 2014 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Sat Jul 5 17:41:20 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx Sat Jul 5 17:41:20 2014 - [info] Connecting to root@172.16.1.242(172.16.1.242:22).. Checking slave recovery environment settings.. Opening /data/mysqldata/3306/relay-log.info ... ok. Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000005 Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000005 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sat Jul 5 17:41:21 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx Sat Jul 5 17:41:21 2014 - [info] Connecting to root@172.16.1.243(172.16.1.243:22).. Checking slave recovery environment settings.. Opening /data/mysqldata/3306/relay-log.info ... ok. Relay log found at /data/mysqllog/3306, up to mysql-relay-bin.000016 Temporary relay log file is /data/mysqllog/3306/mysql-relay-bin.000016 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Sat Jul 5 17:41:21 2014 - [info] Slaves settings check done. Sat Jul 5 17:41:21 2014 - [info] 172.16.1.241 (current master) +--172.16.1.242 +--172.16.1.243 Sat Jul 5 17:41:21 2014 - [warning] master_ip_failover_script is not defined. Sat Jul 5 17:41:21 2014 - [warning] shutdown_script is not defined. Sat Jul 5 17:41:21 2014 - [info] Set master ping interval 1 seconds. Sat Jul 5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Sat Jul 5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306).. Sat Jul 5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
查看状态
[root@haproxy001 app1]# masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:21576) is running(0:PING_OK), master:172.16.1.241
8.碰到的问题:
问题1:执行ssh检查时
Address 172.16.1.241 maps to localhost, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
解决:在manager服务器/etc/hosts加上其他服务器的解析
问题2:检查复制时
Fri Jul 4 12:29:25 2014 - [info] Starting SSH connection tests..
cat: /masterha/app1/172.16.1.241_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:25 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
cat: /masterha/app1/172.16.1.242_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
cat: /masterha/app1/172.16.1.243_22_ssh_check.log: No such file or directory
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/5.8.8/Parallel/ForkManager.pm, ln354]
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. SSH Configuration Check Failed!
at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 339
Fri Jul 4 12:29:26 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Fri Jul 4 12:29:26 2014 - [info] Got exit code 1 (Not master dead).
解决:刚开始用一个sudo权限用户执行,因此报错,换成root用户,就没问题了。
问题3:检查复制时
Sat Jul 5 10:38:33 2014 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=xxx
Sat Jul 5 10:38:33 2014 - [info] Connecting to root@172.16.1.242(172.16.1.242:22)..
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln371] Slaves settings check failed!
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln248] Slave configuration failed.
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/bin/masterha_check_repl line 48
Sat Jul 5 10:38:33 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers.
Sat Jul 5 10:38:33 2014 - [info] Got exit code 1 (Not master dead).
解决:/usr/bin/mysqlbinlog版本过低。
[root@haproxy001 ~]# apply_diff_relay_logs --command=test --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.5.35-log --manager_version=0.53 --relay_log_info=/data/mysqldata/3306/relay-log.info --relay_dir=/data/mysqldata/3306/ --slave_pass=hoodong
mysqlbinlog version is 3.2 (included in MySQL Client 5.0 or lower). This is not recommended. Consider upgrading MySQL Client to 5.1 or higher.
mysqlbinlog is 3.2 (included in MySQL Client 5.0 or lower), but MySQL server version is 5.5.35-log. mysqlbinlog can not parse row based events. Terminating script for safety reasons.
at /usr/bin/apply_diff_relay_logs line 463
[root@haproxy001 ~]# /usr/local/mysql/bin/mysqlbinlog -V
/usr/local/mysql/bin/mysqlbinlog Ver 3.3 for Linux at x86_64
[root@haproxy001 ~]# which mysqlbinlog
/usr/bin/mysqlbinlog
[root@haproxy001 ~]# /usr/bin/mysqlbinlog -V
/usr/bin/mysqlbinlog Ver 3.2 for redhat-linux-gnu at x86_64
[root@haproxy001 ~]# sudo rm -f /usr/bin/mysqlbinlog
[root@haproxy001 ~]# sudo ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
测试切换
未配置以下参数,进行切换测试:
#promote script #shutdown_script="" #master_ip_failover_script="/usr/local/bin/master_ip_failover" #master_ip_online_change_script="" #report_script=""
3.1.正常切换测试
测试内容:手动停止写库241mysql服务,观察主从状况,观察manager进程状况。
3.1.1停止主写241mysql端口
[root@hd-dm-test01 ~]# sudo /etc/init.d/mysqld3306 stop Stopping MySQL: [ OK ]
3.1.2观察manager日志,记录切换时间
[root@haproxy001 ~]# sudo tail -f /masterha/app1/manager.log 172.16.1.241 (current master) +--172.16.1.242 +--172.16.1.243 Sat Jul 5 17:41:21 2014 - [warning] master_ip_failover_script is not defined. Sat Jul 5 17:41:21 2014 - [warning] shutdown_script is not defined. Sat Jul 5 17:41:21 2014 - [info] Set master ping interval 1 seconds. Sat Jul 5 17:41:21 2014 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Sat Jul 5 17:41:21 2014 - [info] Starting ping health check on 172.16.1.241(172.16.1.241:3306).. Sat Jul 5 17:41:21 2014 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond.. Sat Jul 5 18:40:21 2014 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Sat Jul 5 18:40:21 2014 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.53 --binlog_prefix=mysql3306 Sat Jul 5 18:40:21 2014 - [info] HealthCheck: SSH to 172.16.1.241 is reachable. Sat Jul 5 18:40:22 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sat Jul 5 18:40:22 2014 - [warning] Connection failed 1 time(s).. Sat Jul 5 18:40:23 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sat Jul 5 18:40:23 2014 - [warning] Connection failed 2 time(s).. Sat Jul 5 18:40:24 2014 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sat Jul 5 18:40:24 2014 - [warning] Connection failed 3 time(s).. Sat Jul 5 18:40:24 2014 - [warning] Master is not reachable from health checker! Sat Jul 5 18:40:24 2014 - [warning] Master 172.16.1.241(172.16.1.241:3306) is not reachable! Sat Jul 5 18:40:24 2014 - [warning] SSH is reachable. Sat Jul 5 18:40:24 2014 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Sat Jul 5 18:40:24 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sat Jul 5 18:40:24 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf.. Sat Jul 5 18:40:24 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf.. Sat Jul 5 18:40:24 2014 - [info] Dead Servers: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Alive Servers: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Sat Jul 5 18:40:24 2014 - [info] Alive Slaves: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] Checking slave configurations.. Sat Jul 5 18:40:24 2014 - [info] Checking replication filtering settings.. Sat Jul 5 18:40:24 2014 - [info] Replication filtering check ok. Sat Jul 5 18:40:24 2014 - [info] Master is down! Sat Jul 5 18:40:24 2014 - [info] Terminating monitoring script. Sat Jul 5 18:40:24 2014 - [info] Got exit code 20 (Master dead). Sat Jul 5 18:40:24 2014 - [info] MHA::MasterFailover version 0.53. Sat Jul 5 18:40:24 2014 - [info] Starting master failover. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] * Phase 1: Configuration Check Phase.. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] Dead Servers: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Checking master reachability via mysql(double check).. Sat Jul 5 18:40:24 2014 - [info] ok. Sat Jul 5 18:40:24 2014 - [info] Alive Servers: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Sat Jul 5 18:40:24 2014 - [info] Alive Slaves: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] ** Phase 1: Configuration Check Phase completed. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase.. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] Forcing shutdown so that applications never connect to the current master.. Sat Jul 5 18:40:24 2014 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address. Sat Jul 5 18:40:24 2014 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Sat Jul 5 18:40:24 2014 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] * Phase 3: Master Recovery Phase.. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] The latest binary log file/position on all slaves is mysql3306.000008:157085 Sat Jul 5 18:40:24 2014 - [info] Latest slaves (Slaves that received relay log files to the latest): Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] The oldest binary log file/position on all slaves is mysql3306.000008:157085 Sat Jul 5 18:40:24 2014 - [info] Oldest slaves: Sat Jul 5 18:40:24 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:24 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:24 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Sat Jul 5 18:40:24 2014 - [info] Sat Jul 5 18:40:24 2014 - [info] Fetching dead master's binary logs.. Sat Jul 5 18:40:24 2014 - [info] Executing command on the dead master 172.16.1.241(172.16.1.241:3306): save_binary_logs --command=save --start_file=mysql3306.000008 --start_pos=157085 --binlog_dir=/data/mysqllog/3306 --output_file=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 Creating /masterha/app1 if not exists.. ok. Concat binary/relay logs from mysql3306.000008 pos 157085 to mysql3306.000008 EOF into /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog .. Dumping binlog format description event, from position 0 to 107.. ok. Dumping effective binlog data from /data/mysqllog/3306/mysql3306.000008 position 157085 to tail(157104).. ok. Concat succeeded. Sat Jul 5 18:40:26 2014 - [info] scp from root@172.16.1.241:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded. Sat Jul 5 18:40:26 2014 - [info] HealthCheck: SSH to 172.16.1.242 is reachable. Sat Jul 5 18:40:27 2014 - [info] HealthCheck: SSH to 172.16.1.243 is reachable. Sat Jul 5 18:40:27 2014 - [info] Sat Jul 5 18:40:27 2014 - [info] * Phase 3.3: Determining New Master Phase.. Sat Jul 5 18:40:27 2014 - [info] Sat Jul 5 18:40:27 2014 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sat Jul 5 18:40:27 2014 - [info] All slaves received relay logs to the same position. No need to resync each other. Sat Jul 5 18:40:27 2014 - [info] Searching new master from slaves.. Sat Jul 5 18:40:27 2014 - [info] Candidate masters from the configuration file: Sat Jul 5 18:40:27 2014 - [info] 172.16.1.242(172.16.1.242:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:27 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:27 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:27 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:40:27 2014 - [info] Replicating from 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:40:27 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:40:27 2014 - [info] Non-candidate masters: Sat Jul 5 18:40:27 2014 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Sat Jul 5 18:40:27 2014 - [info] New master is 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:40:27 2014 - [info] Starting master failover.. Sat Jul 5 18:40:27 2014 - [info] From: 172.16.1.241 (current master) +--172.16.1.242 +--172.16.1.243 To: 172.16.1.242 (new master) +--172.16.1.243 Sat Jul 5 18:40:27 2014 - [info] Sat Jul 5 18:40:27 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Sat Jul 5 18:40:27 2014 - [info] Sat Jul 5 18:40:27 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sat Jul 5 18:40:27 2014 - [info] Sending binlog.. Sat Jul 5 18:40:28 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to root@172.16.1.242:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] * Phase 3.4: Master Log Apply Phase.. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sat Jul 5 18:40:28 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306).. Sat Jul 5 18:40:28 2014 - [info] Generating diffs succeeded. Sat Jul 5 18:40:28 2014 - [info] Waiting until all relay logs are applied. Sat Jul 5 18:40:28 2014 - [info] done. Sat Jul 5 18:40:28 2014 - [info] Getting slave status.. Sat Jul 5 18:40:28 2014 - [info] This slave(172.16.1.242)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos. Sat Jul 5 18:40:28 2014 - [info] Connecting to the target slave host 172.16.1.242, running recover script.. Sat Jul 5 18:40:28 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.242 --slave_ip=172.16.1.242 --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx Sat Jul 5 18:40:28 2014 - [info] Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.242:3306. This may take long time... Applying log files succeeded. Sat Jul 5 18:40:28 2014 - [info] All relay logs were successfully applied. Sat Jul 5 18:40:28 2014 - [info] Getting new master's binlog name and position.. Sat Jul 5 18:40:28 2014 - [info] mysql3306.000006:107 Sat Jul 5 18:40:28 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000006', MASTER_LOG_POS=107, MASTER_USER='repluser', MASTER_PASSWORD='xxx'; Sat Jul 5 18:40:28 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address. Sat Jul 5 18:40:28 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306).. Sat Jul 5 18:40:28 2014 - [info] ok. Sat Jul 5 18:40:28 2014 - [info] ** Finished master recovery successfully. Sat Jul 5 18:40:28 2014 - [info] * Phase 3: Master Recovery Phase completed. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] * Phase 4: Slaves Recovery Phase.. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 25375. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time.. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] Log messages from 172.16.1.243 ... Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sat Jul 5 18:40:28 2014 - [info] End of log messages from 172.16.1.243. Sat Jul 5 18:40:28 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events. Sat Jul 5 18:40:28 2014 - [info] Generating relay diff files from the latest slave succeeded. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sat Jul 5 18:40:28 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 25377. Check tmp log /masterha/app1/172.16.1.243_3306_20140705184024.log if it takes time.. Sat Jul 5 18:40:30 2014 - [info] Sat Jul 5 18:40:30 2014 - [info] Log messages from 172.16.1.243 ... Sat Jul 5 18:40:30 2014 - [info] Sat Jul 5 18:40:28 2014 - [info] Sending binlog.. Sat Jul 5 18:40:29 2014 - [info] scp from local:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog to root@172.16.1.243:/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog succeeded. Sat Jul 5 18:40:29 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306).. Sat Jul 5 18:40:29 2014 - [info] Generating diffs succeeded. Sat Jul 5 18:40:29 2014 - [info] Waiting until all relay logs are applied. Sat Jul 5 18:40:29 2014 - [info] done. Sat Jul 5 18:40:29 2014 - [info] Getting slave status.. Sat Jul 5 18:40:29 2014 - [info] This slave(172.16.1.243)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql3306.000008:157085). No need to recover from Exec_Master_Log_Pos. Sat Jul 5 18:40:29 2014 - [info] Connecting to the target slave host 172.16.1.243, running recover script.. Sat Jul 5 18:40:29 2014 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=172.16.1.243 --slave_ip=172.16.1.243 --slave_port=3306 --apply_files=/masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog --workdir=/masterha/app1 --target_version=5.5.35-log --timestamp=20140705184024 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.53 --slave_pass=xxx Sat Jul 5 18:40:30 2014 - [info] Applying differential binary/relay log files /masterha/app1/saved_master_binlog_from_172.16.1.241_3306_20140705184024.binlog on 172.16.1.243:3306. This may take long time... Applying log files succeeded. Sat Jul 5 18:40:30 2014 - [info] All relay logs were successfully applied. Sat Jul 5 18:40:30 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306).. Sat Jul 5 18:40:30 2014 - [info] Executed CHANGE MASTER. Sat Jul 5 18:40:30 2014 - [info] Slave started. Sat Jul 5 18:40:30 2014 - [info] End of log messages from 172.16.1.243. Sat Jul 5 18:40:30 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded. Sat Jul 5 18:40:30 2014 - [info] All new slave servers recovered successfully. Sat Jul 5 18:40:30 2014 - [info] Sat Jul 5 18:40:30 2014 - [info] * Phase 5: New master cleanup phease.. Sat Jul 5 18:40:30 2014 - [info] Sat Jul 5 18:40:30 2014 - [info] Resetting slave info on the new master.. Sat Jul 5 18:40:30 2014 - [info] 172.16.1.242: Resetting slave info succeeded. Sat Jul 5 18:40:30 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully. Sat Jul 5 18:40:30 2014 - [info] ----- Failover Report ----- app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded Master 172.16.1.241 is down! Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery. Selected 172.16.1.242 as a new master. 172.16.1.242: OK: Applying all logs succeeded. 172.16.1.243: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242. 172.16.1.242: Resetting slave info succeeded. Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
耗时10s。242提升为主,查看242主从状态
mysql> show slave status\G; Empty set (0.00 sec) ERROR: No query specified mysql> show master status\G; *************************** 1. row *************************** File: mysql3306.000009 Position: 107 Binlog_Do_DB: Binlog_Ignore_DB: 1 row in set (0.00 sec) ERROR: No query specified
242主从信息被清空。查看243主从状态。
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.1.242 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql3306.000006 Read_Master_Log_Pos: 107 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql3306.000006 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 107 Relay_Log_Space: 409 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 12423306 1 row in set (0.00 sec) ERROR: No query specified
3.1.3查看manager状态
masterha_check_status --conf=/etc/masterha/app1.cnf app1 is stopped(2:NOT_RUNNING).
启动manager进程,发现启不来
[root@haproxy001 app1]# masterha_manager --conf=/etc/masterha/app1.cnf Sat Jul 5 18:51:34 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sat Jul 5 18:51:34 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf.. Sat Jul 5 18:51:34 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf.. [root@haproxy001 app1]# masterha_check_status --conf=/etc/masterha/app1.cnf
日志如下:
Sat Jul 5 18:51:34 2014 - [info] MHA::MasterMonitor version 0.53. Sat Jul 5 18:51:34 2014 - [info] Dead Servers: Sat Jul 5 18:51:34 2014 - [info] 172.16.1.241(172.16.1.241:3306) Sat Jul 5 18:51:34 2014 - [info] Alive Servers: Sat Jul 5 18:51:34 2014 - [info] 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:51:34 2014 - [info] 172.16.1.243(172.16.1.243:3306) Sat Jul 5 18:51:34 2014 - [info] Alive Slaves: Sat Jul 5 18:51:34 2014 - [info] 172.16.1.243(172.16.1.243:3306) Version=5.5.35-log (oldest major version between slaves) log-bin:enabled Sat Jul 5 18:51:34 2014 - [info] Replicating from 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:51:34 2014 - [info] Primary candidate for the new Master (candidate_master is set) Sat Jul 5 18:51:34 2014 - [info] Current Alive Master: 172.16.1.242(172.16.1.242:3306) Sat Jul 5 18:51:34 2014 - [info] Checking slave configurations.. Sat Jul 5 18:51:34 2014 - [info] Checking replication filtering settings.. Sat Jul 5 18:51:34 2014 - [info] binlog_do_db= , binlog_ignore_db= Sat Jul 5 18:51:34 2014 - [info] Replication filtering check ok. Sat Jul 5 18:51:34 2014 - [info] Starting SSH connection tests.. Sat Jul 5 18:51:35 2014 - [info] All SSH connection tests passed successfully. Sat Jul 5 18:51:35 2014 - [info] Checking MHA Node version.. Sat Jul 5 18:51:36 2014 - [info] Version check ok. Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln361] Server 172.16.1.241(172.16.1.241:3306) is dead, but must be alive! Check server settings. Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 361 Sat Jul 5 18:51:36 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers. Sat Jul 5 18:51:36 2014 - [info] Got exit code 1 (Not master dead).
测试结果:(1)无写负载压力下,切换正常,耗时10s。(2)manager进程停止,无法控制接下来的主从问题。(3)被提升为写的实例,原主从状态被清空。
预测:因为主从信息被清空,回切肯定失败。
3.2 回切测试1;
测试内容:(1)保留241宕,242主,243从的状态,未配置master_ip_failover_script。(2)启动241的mysql服务,观察主从。(3)启动manager服务,观察主从。(4)观察日志。
3.3.1 启动241的mysql服务。此时manager服务还是起不来,观察主从,观察manager日志:
242的show slave status:
mysql> show slave status\G; Empty set (0.00 sec) ERROR: No query specified mysql> show master status\G; *************************** 1. row *************************** File: mysql3306.000009 Position: 107 Binlog_Do_DB: Binlog_Ignore_DB: 1 row in set (0.00 sec) ERROR: No query specified
243的show slave status:
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.1.242 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql3306.000007 Read_Master_Log_Pos: 107 Relay_Log_File: mysql-relay-bin.000004 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql3306.000007 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 107 Relay_Log_Space: 1013 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 12423306 1 row in set (0.00 sec) ERROR: No query specified
3.3.2 启动manager。观察主从,观察manager日志。
242,243主从照旧,manager还是起不来。
Sat Jul 5 19:27:16 2014 - [info] MHA::MasterMonitor version 0.53. Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/ServerManager.pm, ln274] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations. Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln475] Error happend on checking configurations. at /usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm line 298 Sat Jul 5 19:27:17 2014 - [error][/usr/lib/perl5/vendor_perl/MHA/MasterMonitor.pm, ln644] Error happened on monitoring servers. Sat Jul 5 19:27:17 2014 - [info] Got exit code 1 (Not master dead).
测试结论:未配置master_ip_failover_script等参数状态下,回切失败。
3.3 雪崩测试
少一台服务器,在243再加一个实例3307。
测试内容:(1)写库241停止访问后,242提升为主库;(2)再停止242mysql服务;(3)观察是否会产生新的主库。
结论:没戏,manager起不来。
3.4 数据缺失测试
测试内容:判断是否按照server1,server2,server3的顺序提升新主库。
(1)还原241为主,242,243为从的状态;(2)通过事务提交,让242数据同步比243慢,宕掉241写库,观查哪个实例被提升为写库。(3)通过锁,让243比242数据同步慢,宕掉241写库,观察哪个从库会被提升为写库。
3.4.1 恢复主从,启动manager。
3.4.2 新建表
CREATE TABLE `t_mha` ( `id` int(8) NOT NULL AUTO_INCREMENT, `name` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8
锁242从库表t_mha为只读。
lock table t_mha read;
写数据。
mysql> insert into t_mha(name) select 'abf'; mysql> insert into t_mha(name) select 'abg'; mysql> insert into t_mha(name) select 'abe';
宕掉主写241,观察错误日志,发现一直在等待242的relay log执行,并未发生切换。
Mon Jul 7 17:08:38 2014 - [info] Mon Jul 7 17:08:38 2014 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Mon Jul 7 17:08:38 2014 - [info] Mon Jul 7 17:08:38 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Jul 7 17:08:38 2014 - [info] Mon Jul 7 17:08:38 2014 - [info] * Phase 3.4: Master Log Apply Phase.. Mon Jul 7 17:08:38 2014 - [info] Mon Jul 7 17:08:38 2014 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Mon Jul 7 17:08:38 2014 - [info] Starting recovery on 172.16.1.242(172.16.1.242:3306).. Mon Jul 7 17:08:38 2014 - [info] This server has all relay logs. Waiting all logs to be applied..
解锁242表,观察错误日志,观察主从。
Mon Jul 7 17:10:20 2014 - [info] done. Mon Jul 7 17:10:20 2014 - [info] All relay logs were successfully applied. Mon Jul 7 17:10:20 2014 - [info] Getting new master's binlog name and position.. Mon Jul 7 17:10:20 2014 - [info] mysql3306.000008:620 Mon Jul 7 17:10:20 2014 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.1.242', MASTER_PORT=3306, MASTER_LOG_FILE='mysql3306.000008', MASTER_LOG_POS=620, MASTER_USER='repluser', MASTER_PASSWORD='xxx'; Mon Jul 7 17:10:20 2014 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address. Mon Jul 7 17:10:20 2014 - [info] Setting read_only=0 on 172.16.1.242(172.16.1.242:3306).. Mon Jul 7 17:10:20 2014 - [info] ok. Mon Jul 7 17:10:20 2014 - [info] ** Finished master recovery successfully. Mon Jul 7 17:10:20 2014 - [info] * Phase 3: Master Recovery Phase completed. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] * Phase 4: Slaves Recovery Phase.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] -- Slave diff file generation on host 172.16.1.243(172.16.1.243:3306) started, pid: 9489. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ... Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Jul 7 17:10:20 2014 - [info] End of log messages from 172.16.1.243. Mon Jul 7 17:10:20 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events. Mon Jul 7 17:10:20 2014 - [info] Generating relay diff files from the latest slave succeeded. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9491. Check tmp log /masterha/app1/172.16.1.243_3306_20140707170837.log if it takes time.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] Log messages from 172.16.1.243 ... Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306).. Mon Jul 7 17:10:20 2014 - [info] This server has all relay logs. Waiting all logs to be applied.. Mon Jul 7 17:10:20 2014 - [info] done. Mon Jul 7 17:10:20 2014 - [info] All relay logs were successfully applied. Mon Jul 7 17:10:20 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306).. Mon Jul 7 17:10:20 2014 - [info] Executed CHANGE MASTER. Mon Jul 7 17:10:20 2014 - [info] Slave started. Mon Jul 7 17:10:20 2014 - [info] End of log messages from 172.16.1.243. Mon Jul 7 17:10:20 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded. Mon Jul 7 17:10:20 2014 - [info] All new slave servers recovered successfully. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] * Phase 5: New master cleanup phease.. Mon Jul 7 17:10:20 2014 - [info] Mon Jul 7 17:10:20 2014 - [info] Resetting slave info on the new master.. Mon Jul 7 17:10:20 2014 - [info] 172.16.1.242: Resetting slave info succeeded. Mon Jul 7 17:10:20 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully. Mon Jul 7 17:10:20 2014 - [info] ----- Failover Report ----- app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded Master 172.16.1.241 is down! Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery. Selected 172.16.1.242 as a new master. 172.16.1.242: OK: Applying all logs succeeded. 172.16.1.243: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242. 172.16.1.242: Resetting slave info succeeded. Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
3.4.3 这次锁243的t_mha表。
日志如下:
f it takes time.. Mon Jul 7 17:35:39 2014 - [info] Mon Jul 7 17:35:39 2014 - [info] Log messages from 172.16.1.243 ... Mon Jul 7 17:35:39 2014 - [info] Mon Jul 7 17:35:38 2014 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Jul 7 17:35:39 2014 - [info] End of log messages from 172.16.1.243. Mon Jul 7 17:35:39 2014 - [info] -- 172.16.1.243(172.16.1.243:3306) has the latest relay log events. Mon Jul 7 17:35:39 2014 - [info] Generating relay diff files from the latest slave succeeded. Mon Jul 7 17:35:39 2014 - [info] Mon Jul 7 17:35:39 2014 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Jul 7 17:35:39 2014 - [info] Mon Jul 7 17:35:39 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) started, pid: 9886. Check tmp log /masterha/app1/172.16.1.243_3306_20140707173538.log if it takes time.. Mon Jul 7 17:37:10 2014 - [info] Mon Jul 7 17:37:10 2014 - [info] Log messages from 172.16.1.243 ... Mon Jul 7 17:37:10 2014 - [info] Mon Jul 7 17:35:39 2014 - [info] Starting recovery on 172.16.1.243(172.16.1.243:3306).. Mon Jul 7 17:35:39 2014 - [info] This server has all relay logs. Waiting all logs to be applied.. Mon Jul 7 17:37:10 2014 - [info] done. Mon Jul 7 17:37:10 2014 - [info] All relay logs were successfully applied. Mon Jul 7 17:37:10 2014 - [info] Resetting slave 172.16.1.243(172.16.1.243:3306) and starting replication from the new master 172.16.1.242(172.16.1.242:3306).. Mon Jul 7 17:37:10 2014 - [info] Executed CHANGE MASTER. Mon Jul 7 17:37:10 2014 - [info] Slave started. Mon Jul 7 17:37:10 2014 - [info] End of log messages from 172.16.1.243. Mon Jul 7 17:37:10 2014 - [info] -- Slave recovery on host 172.16.1.243(172.16.1.243:3306) succeeded. Mon Jul 7 17:37:10 2014 - [info] All new slave servers recovered successfully. Mon Jul 7 17:37:10 2014 - [info] Mon Jul 7 17:37:10 2014 - [info] * Phase 5: New master cleanup phease.. Mon Jul 7 17:37:10 2014 - [info] Mon Jul 7 17:37:10 2014 - [info] Resetting slave info on the new master.. Mon Jul 7 17:37:10 2014 - [info] 172.16.1.242: Resetting slave info succeeded. Mon Jul 7 17:37:10 2014 - [info] Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully. Mon Jul 7 17:37:10 2014 - [info] ----- Failover Report ----- app1: MySQL Master failover 172.16.1.241 to 172.16.1.242 succeeded Master 172.16.1.241 is down! Check MHA Manager logs at haproxy001:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. The latest slave 172.16.1.242(172.16.1.242:3306) has all relay logs for recovery. Selected 172.16.1.242 as a new master. 172.16.1.242: OK: Applying all logs succeeded. 172.16.1.243: This host has the latest relay log events. Generating relay diff files from the latest slave succeeded. 172.16.1.243: OK: Applying all logs succeeded. Slave started, replicating from 172.16.1.242. 172.16.1.242: Resetting slave info succeeded. Master failover to 172.16.1.242(172.16.1.242:3306) completed successfully.
结论:(1)因为sql线程引起的数据延迟问题,mha会等待relay-log全部写入盘中后,才进行切换。(2)由于IO线程引起的数据不一致,目前还未测试。
小结:(1)无负载切换越需10s。
(2)默认按照server模块顺序进行切换。
(3)无脚本无法回切。
(4)主写down后,manager进程会停止。
(5)无法处理雪崩情况。
(6)需要等待relay-log全部追上之后,才会切换主从。
根据这些缺陷,来确认,相关脚本需要实现什么样的功能?
下面配置切换脚本参数,再测试一遍上面案例,绑定vip。
#promote script #shutdown_script="" #master_ip_failover_script="/usr/local/bin/master_ip_failover" #master_ip_online_change_script="" #report_script=""
配置做如下调整:
(1) 配置vip
(2) 配置promote script,master_ip_failover_script,master_ip_online_change_script