各节点架构
(说明:生产环境有两个节点可以组成一套完整集群,我是测试环境,因此对于manager以及atlas和binlog server都是单点,如果生产环境,相应的将manager以及atlas和binlog server每个节点都部署即可)
10.80.8.89 | mysql-master | manager,node | atlas |
10.80.8.90 | mysql-slave | node | binlog server |
安装步骤
10.80.8.89操作命令
1.#增加mha用户
useradd mha
passwd mha
2.#增加mysql用户
useradd mysql
passwd mysql
3.#生成私钥
ssh-keygen -t rsa
4.#各节点建立互信
su -mha ssh-copy-id -i ./.ssh/id_rsa.pub 10.80.8.89 ssh-copy-id -i ./.ssh/id_rsa.pub 10.80.8.90
5.#安装mysql,二进制安装
mkdir -p /soft cd /soft && wget http://cdn.mysql.com//Downloads/MySQL-5.6/mysql-5.6.30-linux-glibc2.5-x86_64.tar.gz tar -zxf mysql-5.6.30-linux-glibc2.5-x86_64.tar.gz ln -s /soft/mysql-5.6.30-linux-glibc2.5-x86_64 /usr/local/mysql echo "export $PATH=/usr/local/mysql/bin:$PATH">>/etc/profile source /etc/profile mkdir -p /data/mysql/3306/{data,logs,etc,tmp} chown -R mysql:mysql /data/mysql/3306
6.#上传配置文件my.cnf到/data/mysql/3306/etc下,配置文件见附件my.cnf,两个节点只需要修改其中的server-id为不一样即可
7.#初始化mysql
/usr/local/mysql/scripts/mysql_install_db --defaults-file=/data/mysql/3306/etc/my.cnf
8.#启动mysql
mysqld_safe --defaults-file=/data/mysql/3306/etc/my.cnf&
9.#安装mha
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl yum localinstall mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
10.#配置文件,注意由于我的主从复制启用了gtid,所以binlog server必须配置,如果未启用gtid那么可以不必配置
/etc/masterha_default.cnf
[server default] user=mha password=hellomha ssh_user=mha ping_interval=3 ping_type=INSERT log_level=debug
/etc/app1.cnf
[server default] master_binlog_dir= /data/mysql/3306/logs remote_workdir=/var/log/masterha/app1 manager_workdir=/var/log/masterha/app1 manager_log=/var/log/masterha/app1/app1.log master_ip_failover_script=/bin/master_ip_failover [server1] hostname=10.80.8.89 ip=10.80.8.89 port=3306 master_binlog_dir=/data/mysql/3306/logs candidate_master=1 #check_repl_delay=0 [server2] hostname=10.80.8.90 ip=10.80.8.90 port=3306 master_binlog_dir=/data/mysql/3306/logs candidate_master=1 [binlog1] hostname=10.80.8.90 ip=10.80.8.90 master_binlog_dir=/data/binlog
11.#创建manager所需目录
mkdir -p /var/log/masterha/app1 chown -R mha:mha /var/log/masterha/app1
12.#安装并配置atlas
wget https://github.com/Qihoo360/Atlas/releases/download/2.2.1/Atlas-2.2.1.el6.x86_64.rpm rpm -ivh Atlas-2.2.1.el6.x86_64.rpm
13.#启动atlas
/usr/local/mysql-proxy/bin/mysql-proxyd test start
14.#配置atlas实例,由于本实验主要是配合mha在atlas上线下线mysql master,所以其他参数默认
mysql -h 10.80.8.89 -P2345 -uuser -ppwd -e "add master 10.80.8.89:3306;" mysql -h 10.80.8.89 -P2345 -uuser -ppwd -e "add slave 10.80.8.90:3306;"
15.#创建mha用户和复制用户
grant all privileges on *.* to 'mha'@'%' identified by 'hellomha'; grant replication slave on *.* to 'repl'@'%' identified by 'hellorepl';
16. #更改自动切换atlas里的master /bin/master_ip_failover
use strict; use warnings FATAL => 'all'; use Getopt::Long; use MHA::DBHelper; my $proxy_ip_1 = "10.80.8.89"; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password ); GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, ); sub add_vip { my $output1 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P 2345 -uuser -ppwd -e "remove backend 1;"` ; my $output2 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P 2345 -uuser -ppwd -e "remove backend 1;"` ; my $output3 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P 2345 -uuser -ppwd -e "save config;"` ; my $output4 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P 2345 -uuser -ppwd -e "add master $new_master_host:3306;"` ; my $output5 = `/usr/local/mysql/bin/mysql -h $proxy_ip_1 -P 2345 -uuser -ppwd -e "save config;"` ; # my $output1 = `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $orig_master_host /sbin/ip addr del $vip/32 dev $if`; # my $output2 = `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $new_master_host /sbin/ip addr add $vip/32 dev $if`; } exit &main(); sub main { if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. my $exit_code = 1; eval { # updating global catalog, etc $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here. my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); ## Set read_only=0 on the new master $new_master_handler->disable_log_bin_local(); print "Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only(); ## Creating an app user on the new master #print "Creating app user on the new master..\n"; #FIXME_xxx_create_user( $new_master_handler->{dbh} ); $new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect(); ## Update master ip on the catalog database, etc &add_vip(); $exit_code = 0; }; if ($@) { warn $@; # If you want to continue failover, exit 10. exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { # do nothing exit 0; } else { &usage(); exit 1; } } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }
10.80.8.90操作命令
1-9步参见89操作命令
10.#创建manager所需目录
mkdir -p /var/log/masterha/app1 chown -R mha:mha /var/log/masterha/app1
11.#与89建立主从复制关系
mysql>CHANGE MASTER TO MASTER_HOST="10.80.8.89", MASTER_USER="repl", MASTER_PASSWORD="hellomrepl", MASTER_AUTO_POSITION = 1; mysql>start slave; mysql>set global read_only=on;
12.建立binlog server
mkdir -p /data/binlog chown -R mha:mha /data/binlog su - mha /usr/local/mysql/bin/mysqlbinlog -R --raw --host=10.80.8.89 --user=repl --port=3306 --password=hellorepl --stop-never -t -r /data/binlog mysql_bin.000001 &
10.80.8.89操作命令
1.#测试mha ssh
su - mha [mha@iZ250pd1qtuZ ~]$ masterha_check_ssh --conf=/etc/app1.cnf Wed May 18 11:28:54 2016 - [info] Reading default configuration from /etc/masterha_default.cnf.. Wed May 18 11:28:54 2016 - [info] Reading application default configuration from /etc/app1.cnf.. Wed May 18 11:28:54 2016 - [info] Reading server configuration from /etc/app1.cnf.. Wed May 18 11:28:54 2016 - [info] Starting SSH connection tests.. Wed May 18 11:28:55 2016 - [debug] Wed May 18 11:28:54 2016 - [debug] Connecting via SSH from mha@10.80.8.89(10.80.8.89:22) to mha@10.80.8.90(10.80.8.90:22).. Wed May 18 11:28:55 2016 - [debug] ok. Wed May 18 11:28:55 2016 - [debug] Wed May 18 11:28:55 2016 - [debug] Connecting via SSH from mha@10.80.8.90(10.80.8.90:22) to mha@10.80.8.89(10.80.8.89:22).. Wed May 18 11:28:55 2016 - [debug] ok. Wed May 18 11:28:55 2016 - [info] All SSH connection tests passed successfully.
2.#测试mha repl
[mha@iZ250pd1qtuZ ~]$ masterha_check_repl --conf=/etc/app1.cnf Wed May 18 11:32:38 2016 - [info] Reading default configuration from /etc/masterha_default.cnf.. Wed May 18 11:32:38 2016 - [info] Reading application default configuration from /etc/app1.cnf.. Wed May 18 11:32:38 2016 - [info] Reading server configuration from /etc/app1.cnf.. Wed May 18 11:32:38 2016 - [info] MHA::MasterMonitor version 0.56. Wed May 18 11:32:39 2016 - [debug] Connecting to servers.. Wed May 18 11:32:39 2016 - [debug] Connected to: 10.110.18.89(10.110.18.89:3306), user=mha Wed May 18 11:32:39 2016 - [debug] Number of slave worker threads on host 10.110.18.89(10.110.18.89:3306): 0 Wed May 18 11:32:39 2016 - [debug] Connected to: 10.110.18.90(10.110.18.90:3306), user=mha Wed May 18 11:32:39 2016 - [debug] Number of slave worker threads on host 10.110.18.90(10.110.18.90:3306): 0 Wed May 18 11:32:39 2016 - [debug] Comparing MySQL versions.. Wed May 18 11:32:39 2016 - [debug] Comparing MySQL versions done. Wed May 18 11:32:39 2016 - [debug] Connecting to servers done. Wed May 18 11:32:39 2016 - [info] GTID failover mode = 1 Wed May 18 11:32:39 2016 - [info] Dead Servers: Wed May 18 11:32:39 2016 - [info] Alive Servers: Wed May 18 11:32:39 2016 - [info] 10.110.18.89(10.110.18.89:3306) Wed May 18 11:32:39 2016 - [info] 10.110.18.90(10.110.18.90:3306) Wed May 18 11:32:39 2016 - [info] Alive Slaves: Wed May 18 11:32:39 2016 - [info] 10.110.18.90(10.110.18.90:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled Wed May 18 11:32:39 2016 - [info] GTID ON Wed May 18 11:32:39 2016 - [debug] Relay log info repository: TABLE Wed May 18 11:32:39 2016 - [info] Replicating from 10.110.18.89(10.110.18.89:3306) Wed May 18 11:32:39 2016 - [info] Primary candidate for the new Master (candidate_master is set) Wed May 18 11:32:39 2016 - [info] Current Alive Master: 10.110.18.89(10.110.18.89:3306) Wed May 18 11:32:39 2016 - [info] Checking slave configurations.. Wed May 18 11:32:39 2016 - [info] read_only=1 is not set on slave 10.110.18.90(10.110.18.90:3306). Wed May 18 11:32:39 2016 - [info] Checking replication filtering settings.. Wed May 18 11:32:39 2016 - [info] binlog_do_db= , binlog_ignore_db= information_schema,monitor,performance_schema Wed May 18 11:32:39 2016 - [info] Replication filtering check ok. Wed May 18 11:32:39 2016 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Wed May 18 11:32:39 2016 - [debug] SSH connection test to 10.110.18.90, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5 Wed May 18 11:32:39 2016 - [info] HealthCheck: SSH to 10.110.18.90 is reachable. Wed May 18 11:32:39 2016 - [info] Binlog server 10.110.18.90 is reachable. Wed May 18 11:32:39 2016 - [info] Checking recovery script configurations on 10.110.18.90(10.110.18.90:3306).. Wed May 18 11:32:39 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/binlog --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql_bin.000006 --debug Wed May 18 11:32:39 2016 - [info] Connecting to mha@10.110.18.90(10.110.18.90:22).. Creating /var/log/masterha/app1 if not exists.. ok. Checking output directory is accessible or not.. ok. Binlog found at /data/binlog, up to mysql_bin.000006 Wed May 18 11:32:39 2016 - [info] Binlog setting check done. Wed May 18 11:32:39 2016 - [info] Checking SSH publickey authentication settings on the current master.. Wed May 18 11:32:39 2016 - [debug] SSH connection test to 10.110.18.89, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=5, timeout 5 Wed May 18 11:32:39 2016 - [info] HealthCheck: SSH to 10.110.18.89 is reachable. Wed May 18 11:32:39 2016 - [info] 10.110.18.89(10.110.18.89:3306) (current master) +--10.110.18.90(10.110.18.90:3306) Wed May 18 11:32:39 2016 - [info] Checking replication health on 10.110.18.90.. Wed May 18 11:32:39 2016 - [info] ok. Wed May 18 11:32:39 2016 - [info] Checking master_ip_failover_script status: Wed May 18 11:32:39 2016 - [info] /bin/master_ip_failover --command=status --ssh_user=mha --orig_master_host=10.110.18.89 --orig_master_ip=10.110.18.89 --orig_master_port=3306 Wed May 18 11:32:39 2016 - [info] OK. Wed May 18 11:32:39 2016 - [warning] shutdown_script is not defined. Wed May 18 11:32:39 2016 - [debug] Disconnected from 10.110.18.89(10.110.18.89:3306) Wed May 18 11:32:39 2016 - [debug] Disconnected from 10.110.18.90(10.110.18.90:3306) Wed May 18 11:32:39 2016 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
3.#启动masterha_manager
nohup masterha_manager --conf=/etc/app1.cnf
实验阶段
实验一.关闭master,观察从库有没有切成主库,观察atlas里的backend是不是已经下线了旧主并且上线了新主
1.#查看atlas中的backends信息
[mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -P2345 -uuser -p -e "select * from backends;" Enter password: +-------------+-------------------+-------+------+ | backend_ndx | address | state | type | +-------------+-------------------+-------+------+ | 1 | 10.80.8.89:3306 | up | rw | | 2 | 10.80.8.90:3306 | up | ro | +-------------+-------------------+-------+------+
2.#关闭master
mysqladmin -uroot -p shutdown
3.#查看atlas中的从库信息
[mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -P2345 -uuser -p -e "select * from backends;" Enter password: +-------------+-------------------+-------+------+ | backend_ndx | address | state | type | +-------------+-------------------+-------+------+ | 1 | 10.80.8.90:3306 | up | ro | +-------------+-------------------+-------+------+
4.#查看mha切换日志
----- Failover Report ----- app1: MySQL Master failover 10.80.8.89(10.80.8.89:3306) to 10.80.8.90(10.80.8.90:3306) succeeded Master 10.80.8.89(10.80.8.89:3306) is down! Check MHA Manager logs at iZ250pd1qtuZ:/var/log/masterha/app1/app1.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 10.80.8.89(10.80.8.89:3306) Selected 10.80.8.90(10.80.8.90:3306) as a new master. 10.80.8.90(10.80.8.90:3306): OK: Applying all logs succeeded. 10.80.8.90(10.80.8.90:3306): OK: Activated master IP address. 10.80.8.90(10.80.8.90:3306): Resetting slave info succeeded. Master failover to 10.80.8.90(10.80.8.90:3306) completed successfully.
5.#将下线的主作为从挂到新主
mysqld_safe --defaults-file=/data/mysql/3306/etc/my.cnf & mysql -uroot -p mysql>CHANGE MASTER TO MASTER_HOST="10.80.8.90", MASTER_USER="repl", MASTER_PASSWORD="hellorepl", MASTER_AUTO_POSITION = 1; mysql>start slave; mysql>set global read_only=on;
6.#启动binlog server,指向新master90
su - mha /usr/local/mysql/bin/mysqlbinlog -R --raw --host=10.80.8.90 --user=repl --port=3306 --password=hellorepl --stop-never -t -r /data/binlog mysql_bin.000001 &
7.#清理mha目录下所有内容,发生切换后,状态文件还存在,会影响下次的切换,如果不删除日志,那么mha在8小时内如果再次发生宕机,那么不会发生自动切换
rm -rf /var/log/masterha/app1/*
8.#启动mha
nohup masterha_manager --conf=/etc/app1.cnf &
实验二.从库关闭slave 的io thread线程,主库写入数据,此时kill -9 mysql实例,观察主库写入的数据有没有正确被mha补全到新主
1.#从停止slave io_thread
mysql> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gaoquan |
| infra |
| mysql |
| performance_schema |
| test |
+--------------------+
6 rows in set (0.00 sec)
2.#主创建数据库
mysql> create database sbtest; Query OK, 1 row affected (0.00 sec) mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | gaoquan | | infra | | mysql | | performance_schema | | sbtest | | test | +--------------------+ 7 rows in set (0.01 sec)
3.#从查看数据库
mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | gaoquan | | infra | | mysql | | performance_schema | | test | +--------------------+ 6 rows in set (0.00 sec)
4.#杀掉主的mysql进程,mysqld_safe和mysqld进程一起干掉
kill -9 27229 26253
5.#从查看数据库中的sbtest是否已经正常拉取过来
mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | gaoquan | | infra | | mysql | | performance_schema | | sbtest | | test | +--------------------+ 7 rows in set (0.00 sec)
实验三.交互式切换主从数据库,适用于升级Master,主从都存活,但是需要升级内存或者更换控制器等必须重启服务器时。(注意,master_ip_online_change脚本是在切换到新主之前执行的,因此从库必须制定read_only)
1.#确保masterha_manager不存活,如果存活则停止
masterha_stop --conf=/etc/app1.cnf
2.#将主由90切换到89
[mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=alive --conf=/etc/app1.cnf --new_master_host=10.80.8.89 --new_master_port=3306
Wed May 18 20:07:17 2016 - [info] MHA::MasterRotate version 0.56.
Wed May 18 20:07:17 2016 - [info] Starting online master switch..
Wed May 18 20:07:17 2016 - [info]
Wed May 18 20:07:17 2016 - [info] * Phase 1: Configuration Check Phase..
Wed May 18 20:07:17 2016 - [info]
Wed May 18 20:07:17 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Wed May 18 20:07:17 2016 - [info] Reading application default configuration from /etc/app1.cnf..
Wed May 18 20:07:17 2016 - [info] Reading server configuration from /etc/app1.cnf..
Wed May 18 20:07:17 2016 - [debug] Connecting to servers..
Wed May 18 20:07:17 2016 - [debug] Connected to: 10.80.8.89(10.80.8.89:3306), user=mha
Wed May 18 20:07:17 2016 - [debug] Number of slave worker threads on host 10.80.8.89(10.80.8.89:3306): 0
Wed May 18 20:07:17 2016 - [debug] Connected to: 10.80.8.90(10.80.8.90:3306), user=mha
Wed May 18 20:07:17 2016 - [debug] Number of slave worker threads on host 10.80.8.90(10.80.8.90:3306): 0
Wed May 18 20:07:17 2016 - [debug] Comparing MySQL versions..
Wed May 18 20:07:17 2016 - [debug] Comparing MySQL versions done.
Wed May 18 20:07:17 2016 - [debug] Connecting to servers done.
Wed May 18 20:07:17 2016 - [info] GTID failover mode = 1
Wed May 18 20:07:17 2016 - [info] Current Alive Master: 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:17 2016 - [info] Alive Slaves:
Wed May 18 20:07:17 2016 - [info] 10.80.8.89(10.80.8.89:3306) Version=5.6.29-76.2-log (oldest major version between slaves) log-bin:enabled
Wed May 18 20:07:17 2016 - [info] GTID ON
Wed May 18 20:07:17 2016 - [debug] Relay log info repository: TABLE
Wed May 18 20:07:17 2016 - [info] Replicating from 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:17 2016 - [info] Primary candidate for the new Master (candidate_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.80.8.90(10.80.8.90:3306)? (YES/no): yes
Wed May 18 20:07:20 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Wed May 18 20:07:20 2016 - [info] ok.
Wed May 18 20:07:20 2016 - [info] Checking MHA is not monitoring or doing failover..
Wed May 18 20:07:20 2016 - [info] Checking replication health on 10.80.8.89..
Wed May 18 20:07:20 2016 - [info] ok.
Wed May 18 20:07:20 2016 - [info] 10.80.8.89 can be new master.
Wed May 18 20:07:20 2016 - [info]
From:
10.80.8.90(10.80.8.90:3306) (current master)
+--10.80.8.89(10.80.8.89:3306)
To:
10.80.8.89(10.80.8.89:3306) (new master)
Starting master switch from 10.80.8.90(10.80.8.90:3306) to 10.80.8.89(10.80.8.89:3306)? (yes/NO): yes
Wed May 18 20:07:21 2016 - [info] Checking whether 10.80.8.89(10.80.8.89:3306) is ok for the new master..
Wed May 18 20:07:21 2016 - [info] ok.
Wed May 18 20:07:21 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Wed May 18 20:07:21 2016 - [info]
Wed May 18 20:07:21 2016 - [debug] Disconnected from 10.80.8.90(10.80.8.90:3306)
Wed May 18 20:07:21 2016 - [info] * Phase 2: Rejecting updates Phase..
Wed May 18 20:07:21 2016 - [info]
Wed May 18 20:07:21 2016 - [info] Executing master ip online change script to disable write on the current master:
Wed May 18 20:07:21 2016 - [info] /bin/master_ip_online_change --command=stop --orig_master_host=10.80.8.90 --orig_master_ip=10.80.8.90 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='yz6xFOkF1mh3Wmkr1Rsz' --new_master_host=10.80.8.89 --new_master_ip=10.80.8.89 --new_master_port=3306 --new_master_user='mha' --new_master_password='yz6xFOkF1mh3Wmkr1Rsz' --orig_master_ssh_user=mha --new_master_ssh_user=mha
Wed May 18 20:07:21 2016 556919 Set read_only on the new master.. ok.
Wed May 18 20:07:21 2016 559821 drop vip 192.168.1.100..
Warning: Using a password on the command line interface can be insecure.
ERROR 1105 (07000) at line 1: invalid backend_id
Warning: Using a password on the command line interface can be insecure.
ERROR 1105 (07000) at line 1: invalid backend_id
Warning: Using a password on the command line interface can be insecure.
Wed May 18 20:07:21 2016 582940 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '2598','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:22 2016 083378 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '2599','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:22 2016 584152 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '2599','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:23 2016 084569 Set read_only=1 on the orig master.. ok.
Wed May 18 20:07:23 2016 085676 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '2600','db' => undef,'Id' => '51','User' => 'mha','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Rows_examined' => '0','Info' => undef,'Rows_sent' => '0','Host' => '10.80.8.89:11427'}
Wed May 18 20:07:23 2016 585008 Killing all application threads..
Wed May 18 20:07:23 2016 585529 done.
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Wed May 18 20:07:23 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] Orig master binlog:pos is mysql_bin.000007:193297.
Wed May 18 20:07:23 2016 - [debug] Fetching current slave status..
Wed May 18 20:07:23 2016 - [debug] Fetching current slave status done.
Wed May 18 20:07:23 2016 - [info] Waiting to execute all relay logs on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [info] master_pos_wait(mysql_bin.000007:193297) completed on 10.80.8.89(10.80.8.89:3306). Executed 0 events.
Wed May 18 20:07:23 2016 - [info] done.
Wed May 18 20:07:23 2016 - [debug] Stopping SQL thread on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [debug] done.
Wed May 18 20:07:23 2016 - [info] Getting new master's binlog name and position..
Wed May 18 20:07:23 2016 - [info] mysql_bin.000007:202995
Wed May 18 20:07:23 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.80.8.89', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='mha', MASTER_PASSWORD='xxx';
Wed May 18 20:07:23 2016 - [info] Executing master ip online change script to allow write on the new master:
Wed May 18 20:07:23 2016 - [info] /bin/master_ip_online_change --command=start --orig_master_host=10.80.8.90 --orig_master_ip=10.80.8.90 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='yz6xFOkF1mh3Wmkr1Rsz' --new_master_host=10.80.8.89 --new_master_ip=10.80.8.89 --new_master_port=3306 --new_master_user='mha' --new_master_password='yz6xFOkF1mh3Wmkr1Rsz' --orig_master_ssh_user=mha --new_master_ssh_user=mha
Wed May 18 20:07:23 2016 695963 Set read_only=0 on the new master.
Wed May 18 20:07:23 2016 696375Add vip 192.168.1.100 on eth0..
Warning: Using a password on the command line interface can be insecure.
Warning: Using a password on the command line interface can be insecure.
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] * Switching slaves in parallel..
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] Unlocking all tables on the orig master:
Wed May 18 20:07:23 2016 - [info] Executing UNLOCK TABLES..
Wed May 18 20:07:23 2016 - [info] ok.
Wed May 18 20:07:23 2016 - [info] All new slave servers switched successfully.
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [info] * Phase 5: New master cleanup phase..
Wed May 18 20:07:23 2016 - [info]
Wed May 18 20:07:23 2016 - [debug] Clearing slave info..
Wed May 18 20:07:23 2016 - [debug] Stopping slave IO/SQL thread on 10.80.8.89(10.80.8.89:3306)..
Wed May 18 20:07:23 2016 - [debug] done.
Wed May 18 20:07:23 2016 - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Wed May 18 20:07:23 2016 - [info] 10.80.8.89: Resetting slave info succeeded.
Wed May 18 20:07:23 2016 - [info] Switching master to 10.80.8.89(10.80.8.89:3306) completed successfully.
Wed May 18 20:07:23 2016 - [debug] Disconnected from 10.80.8.89(10.80.8.89:3306)
Wed May 18 20:07:23 2016 - [debug] Disconnected from 10.80.8.90(10.80.8.90:3306)
[mha@iZ250pd1qtuZ ~]$ mysql -h10.80.8.89 -uuser -ppwd -P2345 -e "select * from backends;";
Warning: Using a password on the command line interface can be insecure.
+-------------+-------------------+-------+------+
| backend_ndx | address | state | type |
+-------------+-------------------+-------+------+
| 1 | 10.80.8.89:3306 | up | rw |
+-------------+-------------------+-------+------+
实验四:master挂掉,手动进行切换
[mha@iZ250pd1qtuZ ~]$ mysqladmin -uroot -p shutdown Enter password: 160518 20:19:18 mysqld_safe mysqld from pid file /data/mysql/3306/logs/mysqld.3306.pid ended [mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=dead --conf=/etc/conf/masterha/app1.cnf --dead_master_host=10.80.8.89 --new_master_host=10.80.8.90 --interactive=0 --dead_master_ip=<dead_master_ip> is not set. Using 10.80.8.89. --dead_master_port=<dead_master_port> is not set. Using 3306. Wed May 18 20:19:26 2016 - [info] Reading default configuration from /etc/masterha_default.cnf.. Wed May 18 20:19:26 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: /etc/conf/masterha/app1.cnf:No such file or directory at /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line 104. [mha@iZ250pd1qtuZ ~]$ masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=10.80.8.89 --new_master_host=10.80.8.90 --interactive=0 --dead_master_ip=<dead_master_ip> is not set. Using 10.80.8.89. --dead_master_port=<dead_master_port> is not set. Using 3306. Wed May 18 20:19:39 2016 - [info] Reading default configuration from /etc/masterha_default.cnf.. Wed May 18 20:19:39 2016 - [info] Reading application default configuration from /etc/app1.cnf.. Wed May 18 20:19:39 2016 - [info] Reading server configuration from /etc/app1.cnf.. [mha@iZ250pd1qtuZ ~]$ mysql -h 10.80.8.89 -uuser -ppwd -P2345 -e "select * from backends"; Warning: Using a password on the command line interface can be insecure. +-------------+-------------------+-------+------+ | backend_ndx | address | state | type | +-------------+-------------------+-------+------+ | 1 | 10.80.8.90:3306 | up | rw | +-------------+-------------------+-------+------+
总结:
1.由于用的是阿里云,因此没法模拟主机直接掉电关闭的情况
2.mha很重要的一个功能是补全slave的差异并与新主建立主从关系,由于我们的环境并发及重要程度不是特别大,一主挂多从的成本过高,因此一主一从已经足够,如果是业务相对比较重要,那么还是建议一主多从。
3.secondary_check_script没有开启,有需要的可以开启试下,即通过多个节点去访问master,防止由于网络抖动而误判master异常而导致master切换