MHA-Based MySQL High Availability in Practice

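Features:

1) Master failover (keepalived-style VIP drift)

2) Promotion and re-pointing of replication roles

The master serves writes. The candidate master, master2, is actually a slave and serves reads, as do slave1 and slave2. If master1 goes down, master2 is promoted to be the new master, and slave1 and slave2 are re-pointed to it.

3) MHA consists of two parts: MHA manager (the management node) and MHA node (the data node). MHA manager can run on a dedicated machine and manage several master-slave clusters, or it can run on one of the slaves. MHA node runs on every MySQL server and on the manager server. MHA manager probes the cluster's master at a regular interval. When the master fails, it automatically promotes the slave with the most recent data to be the new master, then re-points all other slaves to the promoted master.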
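
As a rough illustration of "the slave with the most recent data", the shell sketch below picks the slave that has read furthest into the old master's binlog. It is a simplification and not MHA's own election code: the function name pick_latest_slave and the three-column input format (host, binlog file, position) are assumptions made for this example.

```shell
# Sketch only: choose the slave that has read the most from the old master.
# Input on stdin, one line per slave (assumed format):
#   host  Master_Log_File  Read_Master_Log_Pos
pick_latest_slave() {
    # Sort by binlog file name, then numerically by position; the last line wins.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Example with made-up positions for the three slaves in this setup.
printf '%s\n' \
    'server2 master-bin.000001 472' \
    'server3 master-bin.000001 310' \
    'server4 master-bin.000001 455' | pick_latest_slave
```

In the real tool, settings such as candidate_master also influence which slave is promoted, so treat this only as a way to picture the comparison.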

Five machines are used:

Master1: 192.168.30.25, server1

Master2 (candidate master): 192.168.30.24, server2

Slave1: 192.168.30.23, server3

Slave2: 192.168.30.21, server4

Manager: 192.168.30.26, server5 (monitors the replication group)

 

Configure hostname mappings on all hosts. Get the hostnames exactly right, because later steps rely on them.

[root@bogon ~]# vim /etc/hosts

192.168.30.25 server1

192.168.30.24 server2

192.168.30.23 server3

192.168.30.21 server4

192.168.30.26 server5
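
Because the same five mappings have to be present on every host, a small idempotent helper can save retyping. The sketch below appends each mapping only when the hostname is not already listed. The function name add_hosts_entries is hypothetical, and the example deliberately takes the file path as an argument so it can be tried on a scratch file first.

```shell
# Sketch: append the five mappings to a hosts file, skipping names already present.
add_hosts_entries() {
    local hosts_file=$1
    local entry name
    for entry in \
        '192.168.30.25 server1' \
        '192.168.30.24 server2' \
        '192.168.30.23 server3' \
        '192.168.30.21 server4' \
        '192.168.30.26 server5'
    do
        name=${entry#* }    # text after the first space, i.e. the hostname
        # -w matches whole words, so server1 does not match server10.
        if ! grep -qw -- "$name" "$hosts_file" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$hosts_file"
        fi
    done
}

# Try it on a scratch file first; on a real host the argument would be /etc/hosts.
# add_hosts_entries /tmp/hosts.test
```

Running the function twice leaves the file unchanged the second time, which makes it safe to repeat on hosts that are already partly configured.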

 

Disable the firewall on all hosts

systemctl stop firewalld

setenforce 0

iptables -F

systemctl disable firewalld

Download mha-manager and mha-node

Install the Perl dependencies for MHA node on all hosts

The EPEL release package can be downloaded from the Aliyun mirror site:

rpm -ivh epel-release-latest-7.noarch.rpm

yum install -y perl-DBD-MySQL.x86_64 perl-DBI.x86_64 perl-CPAN perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker

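After installation, check that everything is installed

Package perl-DBD-MySQL-4.023-6.el7.x86_64 already installed and latest version

Package perl-DBI-1.627-4.el7.x86_64 already installed and latest version

Package perl-CPAN-1.9800-292.el7.noarch already installed and latest version

Package 1:perl-ExtUtils-CBuilder-0.28.2.6-292.el7.noarch already installed and latest version

Package perl-ExtUtils-MakeMaker-6.68-3.el7.noarch already installed and latest version

Nothing to do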

 

Install MHA node on all hosts

tar xf mha4mysql-node-0.56.tar.gz

cd mha4mysql-node-0.56/

perl Makefile.PL

make && make install

 

After installation, MHA node places the following scripts in /usr/local/bin

[root@bogon ~]# ls -l /usr/local/bin

total 40

-r-xr-xr-x. 1 root root 16346 Apr 11 12:29 apply_diff_relay_logs

-r-xr-xr-x. 1 root root  4807 Apr 11 12:29 filter_mysqlbinlog

-r-xr-xr-x. 1 root root  7401 Apr 11 12:29 purge_relay_logs

-r-xr-xr-x. 1 root root  7395 Apr 11 12:29 save_binary_logs

 

Install MHA manager on server5. Only one machine is needed as the monitoring manager.

[root@bogon ~]# yum install -y perl perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-DBI perl-Time-HiRes

 

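If the installation fails, the missing packages have to be added as RPMs.

Sometimes the local yum repository lacks the perl-Log-Dispatch and perl-Parallel-ForkManager packages. In that case, use the Aliyun yum repository over the network. The EPEL repo file must be in /etc/yum.repos.d, otherwise yum cannot find the packages.

wget -O /etc/yum.repos.d/aliyun.repo https://mirrors.aliyun.com/repo/Centos-7.repo

perl-Config-Tiny must also be installed:

[root@bogon ~]# rpm -ivh perl-Config-Tiny-2.14-7.el7.noarch.rpm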

 

Install the MHA manager package

tar xf mha4mysql-manager-0.56.tar.gz

cd mha4mysql-manager-0.56/

perl Makefile.PL

make && make install

 

After installation, the following script files are present

[root@bogon mha4mysql-manager-0.56]# ls -l /usr/local/bin

 

total 76

-r-xr-xr-x. 1 root root 16346 Apr 11 12:29 apply_diff_relay_logs

-r-xr-xr-x. 1 root root  4807 Apr 11 12:29 filter_mysqlbinlog

-r-xr-xr-x. 1 root root  1995 Apr 11 13:31 masterha_check_repl

-r-xr-xr-x. 1 root root  1779 Apr 11 13:31 masterha_check_ssh

-r-xr-xr-x. 1 root root  1865 Apr 11 13:31 masterha_check_status

-r-xr-xr-x. 1 root root  3201 Apr 11 13:31 masterha_conf_host

-r-xr-xr-x. 1 root root  2517 Apr 11 13:31 masterha_manager

-r-xr-xr-x. 1 root root  2165 Apr 11 13:31 masterha_master_monitor

-r-xr-xr-x. 1 root root  2373 Apr 11 13:31 masterha_master_switch

-r-xr-xr-x. 1 root root  3879 Apr 11 13:31 masterha_secondary_check

-r-xr-xr-x. 1 root root  1739 Apr 11 13:31 masterha_stop

-r-xr-xr-x. 1 root root  7401 Apr 11 12:29 purge_relay_logs

-r-xr-xr-x. 1 root root  7395 Apr 11 12:29 save_binary_logs

 

Configure SSH key-pair authentication

Each server first generates a key pair,

then copies its public key to the other hosts

[root@server5 ~]# ssh-keygen -t rsa

[root@server5 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.25

[root@server5 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.24

[root@server5 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.23

[root@server5 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.21

 

 

Server 5 (192.168.30.26)

 

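Note: test every connection and answer yes at the host-key prompt, so that a pending prompt does not interfere with failover later. Do this over SSH for each hostname.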

 

[root@server5 ~]# ssh server1

[root@server5 ~]# ssh server2

[root@server5 ~]# ssh server3

[root@server5 ~]# ssh server4

 

Master(192.168.30.25):

[root@server1 ~]# ssh-keygen -t rsa

[root@server1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.24

[root@server1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.23

[root@server1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.21

 

Master2(192.168.30.24):

[root@server2 ~]# ssh-keygen -t rsa

[root@server2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.25

[root@server2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.23

[root@server2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.21

[root@server2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.26

 

Slave1(192.168.30.23):

[root@server3 ~]# ssh-keygen -t rsa

[root@server3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.25

[root@server3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.24

[root@server3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.21

[root@server3 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.26

 

Slave2(192.168.30.21):

[root@server4 ~]# ssh-keygen -t rsa

[root@server4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.25

[root@server4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.24

[root@server4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.23

[root@server4 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.30.26
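
The per-host command lists above follow one pattern: every node copies its key to every other node. The sketch below prints (but does not run) that list for a given node, so it can be reviewed before use. The function name print_ssh_copy_cmds is hypothetical, and the sketch assumes every node should also reach the manager at 192.168.30.26, as the Master2 and slave blocks above do.

```shell
# Sketch: print the ssh-copy-id commands one node needs to reach all other nodes.
# Usage: print_ssh_copy_cmds <IP of the node you are on>
print_ssh_copy_cmds() {
    local self=$1
    local ip
    for ip in 192.168.30.25 192.168.30.24 192.168.30.23 192.168.30.21 192.168.30.26; do
        if [ "$ip" != "$self" ]; then
            echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip"
        fi
    done
}

# Commands to run on server2 (192.168.30.24):
print_ssh_copy_cmds 192.168.30.24
```

Printing the commands rather than executing them keeps the password prompts of ssh-copy-id under manual control.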

 

 

Install MySQL

Install MySQL (MariaDB) on hosts 25, 24, 23, and 21

[root@server1 ~]# yum -y install mariadb mariadb-server mariadb-devel

[root@server1 ~]# systemctl start mariadb

Set the database root password

[root@server1 ~]# mysqladmin -u root password 123456

[root@server1 ~]# mysql -u root -p123456

 

Set up the master-slave replication environment

Edit the MySQL configuration file on each host

Master (192.168.30.25):

[mysqld]

server-id = 1

log-bin=master-bin

log-slave-updates=true

relay_log_purge=0

[root@server1 ~]# systemctl restart mariadb

 

Master2(192.168.30.24):

[mysqld]

server-id = 2

log-bin=master-bin

log-slave-updates=true

relay_log_purge=0

[root@server2 ~]# systemctl restart mariadb

 

Slave1(192.168.30.23):

[mysqld]

server-id = 3

log-bin=mysql-bin

relay-log=slave-relay-bin

log-slave-updates=true

relay_log_purge=0

[root@server3 ~]# systemctl restart mariadb

 

Slave2(192.168.30.21):

 

[mysqld]

server-id = 4

log-bin=mysql-bin

relay-log=slave-relay-bin

log-slave-updates=true

relay_log_purge=0

[root@server4 ~]# systemctl restart mariadb
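
The four [mysqld] fragments above differ only in server-id and in the log file names used on the two slaves. As a sketch, the helper below prints the fragment for a given server-id and role so the differences are easy to compare. The function name mysqld_fragment and the role labels master and slave are assumptions of this example. It only prints text and leaves the configuration file untouched.

```shell
# Sketch: print the [mysqld] settings used in this setup.
# Usage: mysqld_fragment <server-id> <master|slave>
mysqld_fragment() {
    local id=$1
    local role=$2
    echo "[mysqld]"
    echo "server-id = $id"
    if [ "$role" = "master" ]; then
        # Master and candidate master share the master-bin prefix.
        echo "log-bin=master-bin"
    else
        echo "log-bin=mysql-bin"
        echo "relay-log=slave-relay-bin"
    fi
    echo "log-slave-updates=true"
    echo "relay_log_purge=0"
}

# Fragment for Slave1 (server-id 3):
mysqld_fragment 3 slave
```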

 

Create the replication user on every MySQL server

MariaDB [(none)]> grant replication slave on *.* to 'repl'@'192.168.30.%' identified by '123456';

MariaDB [(none)]> flush privileges;

 

Check the binlog file name and position on the master

MariaDB [(none)]> show master status;

+-------------------+----------+--------------+------------------+

| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |

+-------------------+----------+--------------+------------------+

| master-bin.000001 |      472 |              |                  |

+-------------------+----------+--------------+------------------+

 

On the slaves (24, 23, 21), point replication at the master, 192.168.30.25. The log file name and position must be the values reported on the master.

MariaDB [(none)]> stop slave;

MariaDB [(none)]> change master to

    -> master_host='192.168.30.25',

    -> master_user='repl',

    -> master_password='123456',

    -> master_log_file='master-bin.000001',

    -> master_log_pos=472;

MariaDB [(none)]> start slave;

MariaDB [(none)]> show slave status\G

Slave_IO_Running and Slave_SQL_Running should both show Yes.
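
Checking the two thread states by eye on three slaves is easy to get wrong, so the check can be scripted. The sketch below reads the text of show slave status\G on stdin and succeeds only when both Slave_IO_Running and Slave_SQL_Running are Yes. The function name slave_threads_ok is hypothetical.

```shell
# Sketch: exit status 0 only if both replication threads report Yes.
# Reads the output of "show slave status\G" on stdin.
slave_threads_ok() {
    awk -F': *' '
        /^[ \t]*Slave_IO_Running:/  { io = $2;  sub(/[ \t\r]+$/, "", io) }
        /^[ \t]*Slave_SQL_Running:/ { sql = $2; sub(/[ \t\r]+$/, "", sql) }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# On a slave this could be used as (credentials as in this walkthrough):
# mysql -u root -p123456 -e 'show slave status\G' | slave_threads_ok && echo "replication OK"
```

The trailing colon in each pattern keeps the match from picking up similarly named fields that some versions print.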

 

 

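Set read_only on the three slave servers

The slaves serve reads only. read_only is not written into the MySQL configuration file because server2 may be promoted to master at any time.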

[root@server2 ~]# mysql -uroot -p123456 -e 'set global read_only=1'

[root@server3 ~]# mysql -u root -p123456 -e 'set global read_only=1'

[root@server4 ~]# mysql -u root -p123456 -e 'set global read_only=1'

 

Create the monitoring user (run on hosts 25, 24, 23, and 21)

MariaDB [(none)]> grant all privileges on *.* to 'root'@'192.168.30.%' identified by '123456';

MariaDB [(none)]> flush privileges;

Also grant access for the host's own hostname (server1 shown here)

MariaDB [(none)]> grant all privileges on *.* to 'root'@'server1' identified by '123456';

MariaDB [(none)]> flush privileges;

At this point the MySQL master-slave cluster is fully set up.

 

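Configure the MHA environment

Create the MHA working directory and configuration files

Server5 (192.168.30.26): the unpacked package directory contains sample configuration files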

 

 

Edit the app1.cnf configuration file

The /usr/local/bin/master_ip_failover script must be adapted to your environment (IP address, network interface name, and so on)

[root@server5 ~]# mkdir /etc/masterha

[root@server5 ~]# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/

[root@server5 ~]# vim /etc/masterha/app1.cnf

[server default]

manager_workdir=/var/log/masterha/app1

manager_log=/var/log/masterha/app1/manager.log

master_binlog_dir=/var/lib/mysql

master_ip_failover_script=/usr/local/bin/master_ip_failover

password=123456

user=root

ping_interval=1

remote_workdir=/tmp

repl_password=123456

repl_user=repl

 

[server1]

hostname=server1

port=3306

#candidate_master=1

 

[server2]

hostname=server2

candidate_master=1

port=3306

check_repl_delay=0

 

[server3]

hostname=server3

port=3306

 

[server4]

hostname=server4

port=3306
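
Since app1.cnf refers to the servers by hostname, each name has to resolve on the manager. As a quick sanity check, the sketch below lists the hostname= values in an MHA configuration file and reports any that are missing from a hosts file. Both function names (cnf_hostnames, missing_hosts) are hypothetical helpers written for this example.

```shell
# Sketch: print the value of every hostname= line in an MHA config file.
cnf_hostnames() {
    sed -n 's/^[[:space:]]*hostname[[:space:]]*=[[:space:]]*//p' "$1"
}

# Sketch: print each configured hostname that does not appear in the hosts file.
# Usage: missing_hosts <mha config> <hosts file>
missing_hosts() {
    local cnf=$1
    local hosts_file=$2
    local h
    for h in $(cnf_hostnames "$cnf"); do
        grep -qw -- "$h" "$hosts_file" || echo "$h"
    done
}

# On server5 this could be run as:
# missing_hosts /etc/masterha/app1.cnf /etc/hosts
```

No output means every configured hostname has an entry in the hosts file.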

 

Configure the failover script

[root@server5 ~]# vim /usr/local/bin/master_ip_failover

#!/usr/bin/env perl  

use strict;

use warnings FATAL =>'all';

 

use Getopt::Long;

 

my (

$command,          $ssh_user,        $orig_master_host, $orig_master_ip,

$orig_master_port, $new_master_host, $new_master_ip,    $new_master_port

);

 

my $vip = '192.168.30.254';  # Virtual IP  

my $key = "1";

my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";

my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

 

$ssh_user="root";

 

GetOptions(

'command=s'          => \$command,

'ssh_user=s'         => \$ssh_user,

'orig_master_host=s' => \$orig_master_host,

'orig_master_ip=s'   => \$orig_master_ip,

'orig_master_port=i' => \$orig_master_port,

'new_master_host=s'  => \$new_master_host,

'new_master_ip=s'    => \$new_master_ip,

'new_master_port=i'  => \$new_master_port,

);

 

exit &main();

 

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

 

if ( $command eq "stop" || $command eq "stopssh" ) {

 

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  

        # If you manage master ip address at global catalog database,  

        # invalidate orig_master_ip here.  

        my $exit_code = 1;

       #eval {

            #print "Disabling the VIP - $vip on old master: $orig_master_host\n";

#&stop_vip();

#           $exit_code = 0;

#        };

eval {

print "Disabling the VIP on old master: $orig_master_host \n";

#my $ping=`ping -c 1 10.0.0.13 |grep "packet loss" |awk -F',''{print $3}' |awk '{print $1}'`;

#if ($ping le "90.0%"&& $ping gt "0.0%" ){

#$exit_code = 0;

#}

#else {

&stop_vip();

# updating global catalog, etc

$exit_code = 0;

#}

};

 

        if ($@) {

            warn "Got Error: $@\n";

            exit $exit_code;

        }

                                  

        exit $exit_code;

}

elsif ( $command eq "start" ) {

 

        # all arguments are passed.  

        # If you manage master ip address at global catalog database,  

        # activate new_master_ip here.  

        # You can also grant write access (create user, set read_only=0, etc) here.  

my $exit_code = 10;

        eval {

            print "Enabling the VIP - $vip on new master: $new_master_host \n";

&start_vip();

            $exit_code = 0;

        };

        if ($@) {

            warn $@;

            exit $exit_code;

 }

        exit $exit_code;

}

elsif ( $command eq "status" ) {

        print "Checking the Status of the script.. OK \n";

        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;

        exit 0;

}

else {

&usage();

        exit 1;

}

}

 

# A simple system call that enable the VIP on the new master  

sub start_vip() {

 

`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

# A simple system call that disable the VIP on the old_master  

sub stop_vip() {

`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

}

 

sub usage {

print

"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

           

[root@server5 ~]# chmod +x /usr/local/bin/master_ip_failover

 

Set the relay log purge mode on the slaves (24, 23, 21)

Turn off automatic purging so that relay logs are purged manually

mysql -u root -p123456 -e 'set global relay_log_purge=0;'

 

On the slaves (24, 23, 21), add a relay log purge script to the cron schedule

[root@server2 ~]# vim purge_relay_log.sh

#!/bin/bash

user=root

password=123456

port=3306

log_dir='/tmp'

work_dir='/tmp'

purge='/usr/local/bin/purge_relay_logs'

if [ ! -d $log_dir ]

then

        mkdir $log_dir -p

fi

$purge --user=$user --password=$password --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1

 

[root@server2 ~]# chmod +x purge_relay_log.sh

[root@server2 ~]# crontab -e

0   4  *   *    *  /bin/bash /root/purge_relay_log.sh

To purge relay logs manually, run the following on each slave (24, 23, 21):

[root@server2 ~]# purge_relay_logs --user=root --password=123456 --disable_relay_log_purge --port=3306 --workdir=/tmp

 

Check MHA SSH connectivity

[root@server5 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Sat Apr 13 19:42:47 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Apr 13 19:42:47 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Sat Apr 13 19:42:47 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Sat Apr 13 19:42:47 2019 - [info] Starting SSH connection tests..

Sat Apr 13 19:42:51 2019 - [debug]

Sat Apr 13 19:42:48 2019 - [debug]  Connecting via SSH from root@server2(192.168.30.24:22) to root@server1(192.168.30.25:22)..

Sat Apr 13 19:42:49 2019 - [debug]   ok.

Sat Apr 13 19:42:49 2019 - [debug]  Connecting via SSH from root@server2(192.168.30.24:22) to root@server3(192.168.30.23:22)..

Sat Apr 13 19:42:50 2019 - [debug]   ok.

Sat Apr 13 19:42:50 2019 - [debug]  Connecting via SSH from root@server2(192.168.30.24:22) to root@server4(192.168.30.21:22)..

Sat Apr 13 19:42:51 2019 - [debug]   ok.

Sat Apr 13 19:42:51 2019 - [debug]

Sat Apr 13 19:42:47 2019 - [debug]  Connecting via SSH from root@server1(192.168.30.25:22) to root@server2(192.168.30.24:22)..

Sat Apr 13 19:42:48 2019 - [debug]   ok.

Sat Apr 13 19:42:48 2019 - [debug]  Connecting via SSH from root@server1(192.168.30.25:22) to root@server3(192.168.30.23:22)..

Sat Apr 13 19:42:49 2019 - [debug]   ok.

Sat Apr 13 19:42:49 2019 - [debug]  Connecting via SSH from root@server1(192.168.30.25:22) to root@server4(192.168.30.21:22)..

Sat Apr 13 19:42:50 2019 - [debug]   ok.

Sat Apr 13 19:42:51 2019 - [debug]

Sat Apr 13 19:42:48 2019 - [debug]  Connecting via SSH from root@server3(192.168.30.23:22) to root@server1(192.168.30.25:22)..

Sat Apr 13 19:42:50 2019 - [debug]   ok.

Sat Apr 13 19:42:50 2019 - [debug]  Connecting via SSH from root@server3(192.168.30.23:22) to root@server2(192.168.30.24:22)..

Sat Apr 13 19:42:50 2019 - [debug]   ok.

Sat Apr 13 19:42:50 2019 - [debug]  Connecting via SSH from root@server3(192.168.30.23:22) to root@server4(192.168.30.21:22)..

Sat Apr 13 19:42:51 2019 - [debug]   ok.

Sat Apr 13 19:42:52 2019 - [debug]

Sat Apr 13 19:42:49 2019 - [debug]  Connecting via SSH from root@server4(192.168.30.21:22) to root@server1(192.168.30.25:22)..

Sat Apr 13 19:42:50 2019 - [debug]   ok.

Sat Apr 13 19:42:50 2019 - [debug]  Connecting via SSH from root@server4(192.168.30.21:22) to root@server2(192.168.30.24:22)..

Sat Apr 13 19:42:51 2019 - [debug]   ok.

Sat Apr 13 19:42:51 2019 - [debug]  Connecting via SSH from root@server4(192.168.30.21:22) to root@server3(192.168.30.23:22)..

Sat Apr 13 19:42:52 2019 - [debug]   ok.

Sat Apr 13 19:42:52 2019 - [info] All SSH connection tests passed successfully.

 

Check the status of the whole cluster

[root@server5 ~]#  masterha_check_repl --conf=/etc/masterha/app1.cnf

Sat Apr 13 20:05:46 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Apr 13 20:05:46 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Sat Apr 13 20:05:46 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Sat Apr 13 20:05:46 2019 - [info] MHA::MasterMonitor version 0.56.

Sat Apr 13 20:05:47 2019 - [info] Dead Servers:

Sat Apr 13 20:05:47 2019 - [info] Alive Servers:

Sat Apr 13 20:05:47 2019 - [info]   server1(192.168.30.25:3306)

Sat Apr 13 20:05:47 2019 - [info]   server2(192.168.30.24:3306)

Sat Apr 13 20:05:47 2019 - [info]   server3(192.168.30.23:3306)

Sat Apr 13 20:05:47 2019 - [info]   server4(192.168.30.21:3306)

Sat Apr 13 20:05:47 2019 - [info] Alive Slaves:

Sat Apr 13 20:05:47 2019 - [info]   server2(192.168.30.24:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:05:47 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:05:47 2019 - [info]     Primary candidate for the new Master (candidate_master is set)

Sat Apr 13 20:05:47 2019 - [info]   server3(192.168.30.23:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:05:47 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:05:47 2019 - [info]   server4(192.168.30.21:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:05:47 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:05:47 2019 - [info] Current Alive Master: server1(192.168.30.25:3306)

Sat Apr 13 20:05:47 2019 - [info] Checking slave configurations..

Sat Apr 13 20:05:47 2019 - [warning]  relay_log_purge=0 is not set on slave server2(192.168.30.24:3306).

Sat Apr 13 20:05:47 2019 - [warning]  relay_log_purge=0 is not set on slave server3(192.168.30.23:3306).

Sat Apr 13 20:05:47 2019 - [warning]  relay_log_purge=0 is not set on slave server4(192.168.30.21:3306).

Sat Apr 13 20:05:47 2019 - [info] Checking replication filtering settings..

Sat Apr 13 20:05:47 2019 - [info]  binlog_do_db= , binlog_ignore_db=

Sat Apr 13 20:05:47 2019 - [info]  Replication filtering check ok.

Sat Apr 13 20:05:47 2019 - [info] Starting SSH connection tests..

Sat Apr 13 20:05:53 2019 - [info] All SSH connection tests passed successfully.

Sat Apr 13 20:05:53 2019 - [info] Checking MHA Node version..

Sat Apr 13 20:05:54 2019 - [info]  Version check ok.

Sat Apr 13 20:05:54 2019 - [info] Checking SSH publickey authentication settings on the current master..

Sat Apr 13 20:05:54 2019 - [info] HealthCheck: SSH to server1 is reachable.

Sat Apr 13 20:05:54 2019 - [info] Master MHA Node version is 0.56.

Sat Apr 13 20:05:54 2019 - [info] Checking recovery script configurations on the current master..

Sat Apr 13 20:05:54 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000002

Sat Apr 13 20:05:54 2019 - [info]   Connecting to root@server1(server1)..

  Creating /tmp if not exists..    ok.

  Checking output directory is accessible or not..

   ok.

  Binlog found at /var/lib/mysql, up to master-bin.000002

Sat Apr 13 20:05:55 2019 - [info] Master setting check done.

Sat Apr 13 20:05:55 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Sat Apr 13 20:05:55 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server2 --slave_ip=192.168.30.24 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:05:55 2019 - [info]   Connecting to root@192.168.30.24(server2:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:05:55 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server3 --slave_ip=192.168.30.23 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:05:55 2019 - [info]   Connecting to root@192.168.30.23(server3:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:05:56 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server4 --slave_ip=192.168.30.21 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:05:56 2019 - [info]   Connecting to root@192.168.30.21(server4:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:05:56 2019 - [info] Slaves settings check done.

Sat Apr 13 20:05:56 2019 - [info]

server1 (current master)

 +--server2

 +--server3

 +--server4

 

Sat Apr 13 20:05:56 2019 - [info] Checking replication health on server2..

Sat Apr 13 20:05:56 2019 - [info]  ok.

Sat Apr 13 20:05:56 2019 - [info] Checking replication health on server3..

Sat Apr 13 20:05:56 2019 - [info]  ok.

Sat Apr 13 20:05:56 2019 - [info] Checking replication health on server4..

Sat Apr 13 20:05:56 2019 - [info]  ok.

Sat Apr 13 20:05:56 2019 - [info] Checking master_ip_failover_script status:

Sat Apr 13 20:05:56 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server1 --orig_master_ip=192.168.30.25 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Apr 13 20:05:56 2019 - [info]  OK.

Sat Apr 13 20:05:56 2019 - [warning] shutdown_script is not defined.

Sat Apr 13 20:05:56 2019 - [info] Got exit code 0 (Not master dead).

 

MySQL Replication Health is OK.

 

VIP configuration

Open the /etc/masterha/app1.cnf file edited earlier, check that the following line is correct, then check the cluster status again

[root@server5 ~]# grep -n 'master_ip_failover_script' /etc/masterha/app1.cnf

5:master_ip_failover_script=/usr/local/bin/master_ip_failover

 

Master1(192.168.30.25)

[root@server1 ~]# ip a |grep ens33

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    inet 192.168.30.25/24 brd 192.168.30.255 scope global noprefixroute ens33

    inet 192.168.30.254/24 brd 192.168.30.255 scope global secondary ens33:1

 

Server5 (192.168.30.26): adjust the failover script

[root@server5 ~]# head -15 /usr/local/bin/master_ip_failover

#!/usr/bin/env perl  

use strict;  

use warnings FATAL =>'all';  

 

use Getopt::Long;  

 

my (  

$command,          $ssh_user,        $orig_master_host, $orig_master_ip,  

$orig_master_port, $new_master_host, $new_master_ip,    $new_master_port  

);  

 

my $vip = '192.168.30.254';  # Virtual IP  

my $key = "1";  

my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";  

my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

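The /usr/local/bin/master_ip_failover script works as follows: when the master fails, an MHA switchover is triggered.

MHA manager brings down the ens33:1 interface on the old master, which makes the virtual IP drift to the candidate slave and completes the switchover.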
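
To make the VIP move concrete, the sketch below prints the commands that would add or remove the virtual IP using the iproute2 ip tool instead of the ifconfig calls in the script. This is a swapped-in alternative for illustration, not what the script above runs. The function name vip_cmd is hypothetical, and the /24 prefix is taken from the ip a output shown earlier.

```shell
# Sketch: print the iproute2 command for binding or releasing the VIP.
# Usage: vip_cmd start|stop
vip_cmd() {
    local action=$1
    local vip=192.168.30.254
    local prefix=24
    local dev=ens33
    local key=1
    case $action in
        start) echo "ip addr add $vip/$prefix dev $dev label $dev:$key" ;;
        stop)  echo "ip addr del $vip/$prefix dev $dev" ;;
        *)     echo "usage: vip_cmd start|stop" >&2; return 1 ;;
    esac
}

# What would run on the old master, then on the new master:
vip_cmd stop
vip_cmd start
```

The commands are only printed here. In a failover they would be executed over SSH on the old and new master respectively, in that order, so the address is never bound on two hosts at once.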

 

Server5 (192.168.30.26): check the manager status

[root@server5 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 is stopped(2:NOT_RUNNING).

When monitoring is healthy, the status shows PING_OK. Otherwise it shows NOT_RUNNING, which means MHA monitoring has not been started.

 

Server5 (192.168.30.26): start manager monitoring

--remove_dead_master_conf means that after a master failover, the old master's entry is removed from the configuration file

 

[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

[1] 10458

 

 

Server5 (192.168.30.26): check that monitoring is running

[root@server5 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf

app1 (pid:10458) is running(0:PING_OK), master:server1

The manager is now monitoring.
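
For use in scripts, the state token can be pulled out of the masterha_check_status line. The sketch below extracts the text after the numeric code, for example PING_OK or NOT_RUNNING, from lines in the two formats shown above. The function name mha_state is hypothetical, and the parsing relies only on those two sample lines.

```shell
# Sketch: read one masterha_check_status line on stdin and print its state token.
mha_state() {
    # Matches "(<digits>:<STATE>)", e.g. "(0:PING_OK)", but not "(pid:10458)".
    LC_ALL=C sed -n 's/.*(\([0-9]*\):\([A-Z_]*\)).*/\2/p'
}

# The two status lines from this walkthrough:
echo 'app1 (pid:10458) is running(0:PING_OK), master:server1' | mha_state
echo 'app1 is stopped(2:NOT_RUNNING).' | mha_state
```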

 

Server5 (192.168.30.26): view the startup log

[root@server5 ~]# cat /var/log/masterha/app1/manager.log

Sat Apr 13 20:27:40 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Apr 13 20:27:40 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Sat Apr 13 20:27:40 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Sat Apr 13 20:27:40 2019 - [info] MHA::MasterMonitor version 0.56.

Sat Apr 13 20:27:41 2019 - [info] Dead Servers:

Sat Apr 13 20:27:41 2019 - [info] Alive Servers:

Sat Apr 13 20:27:41 2019 - [info]   server1(192.168.30.25:3306)

Sat Apr 13 20:27:41 2019 - [info]   server2(192.168.30.24:3306)

Sat Apr 13 20:27:41 2019 - [info]   server3(192.168.30.23:3306)

Sat Apr 13 20:27:41 2019 - [info]   server4(192.168.30.21:3306)

Sat Apr 13 20:27:41 2019 - [info] Alive Slaves:

Sat Apr 13 20:27:41 2019 - [info]   server2(192.168.30.24:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:27:41 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:27:41 2019 - [info]     Primary candidate for the new Master (candidate_master is set)

Sat Apr 13 20:27:41 2019 - [info]   server3(192.168.30.23:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:27:41 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:27:41 2019 - [info]   server4(192.168.30.21:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 20:27:41 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 20:27:41 2019 - [info] Current Alive Master: server1(192.168.30.25:3306)

Sat Apr 13 20:27:41 2019 - [info] Checking slave configurations..

Sat Apr 13 20:27:41 2019 - [warning]  relay_log_purge=0 is not set on slave server2(192.168.30.24:3306).

Sat Apr 13 20:27:41 2019 - [warning]  relay_log_purge=0 is not set on slave server3(192.168.30.23:3306).

Sat Apr 13 20:27:41 2019 - [warning]  relay_log_purge=0 is not set on slave server4(192.168.30.21:3306).

Sat Apr 13 20:27:41 2019 - [info] Checking replication filtering settings..

Sat Apr 13 20:27:41 2019 - [info]  binlog_do_db= , binlog_ignore_db=

Sat Apr 13 20:27:41 2019 - [info]  Replication filtering check ok.

Sat Apr 13 20:27:41 2019 - [info] Starting SSH connection tests..

Sat Apr 13 20:27:46 2019 - [info] All SSH connection tests passed successfully.

Sat Apr 13 20:27:46 2019 - [info] Checking MHA Node version..

Sat Apr 13 20:27:47 2019 - [info]  Version check ok.

Sat Apr 13 20:27:47 2019 - [info] Checking SSH publickey authentication settings on the current master..

Sat Apr 13 20:27:48 2019 - [info] HealthCheck: SSH to server1 is reachable.

Sat Apr 13 20:27:48 2019 - [info] Master MHA Node version is 0.56.

Sat Apr 13 20:27:48 2019 - [info] Checking recovery script configurations on the current master..

Sat Apr 13 20:27:48 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000002

Sat Apr 13 20:27:48 2019 - [info]   Connecting to root@server1(server1)..

  Creating /tmp if not exists..    ok.

  Checking output directory is accessible or not..

   ok.

  Binlog found at /var/lib/mysql, up to master-bin.000002

Sat Apr 13 20:27:48 2019 - [info] Master setting check done.

Sat Apr 13 20:27:48 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Sat Apr 13 20:27:48 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server2 --slave_ip=192.168.30.24 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:27:48 2019 - [info]   Connecting to root@192.168.30.24(server2:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:27:49 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server3 --slave_ip=192.168.30.23 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:27:49 2019 - [info]   Connecting to root@192.168.30.23(server3:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:27:49 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server4 --slave_ip=192.168.30.21 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 20:27:49 2019 - [info]   Connecting to root@192.168.30.21(server4:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 20:27:50 2019 - [info] Slaves settings check done.

Sat Apr 13 20:27:50 2019 - [info]

server1 (current master)

 +--server2

 +--server3

 +--server4

 

Sat Apr 13 20:27:50 2019 - [info] Checking master_ip_failover_script status:

Sat Apr 13 20:27:50 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server1 --orig_master_ip=192.168.30.25 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Apr 13 20:27:50 2019 - [info]  OK.

Sat Apr 13 20:27:50 2019 - [warning] shutdown_script is not defined.

Sat Apr 13 20:27:50 2019 - [info] Set master ping interval 1 seconds.

Sat Apr 13 20:27:50 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.

Sat Apr 13 20:27:50 2019 - [info] Starting ping health check on server1(192.168.30.25:3306)..

Sat Apr 13 20:27:50 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Note the line "Ping(SELECT) succeeded, waiting until MySQL doesn't respond": it shows that the whole system is now being monitored.
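
Rather than reading the whole log, the presence of that line can be tested directly. The sketch below returns success when the log file contains the ping message quoted above. The function name monitor_started is hypothetical.

```shell
# Sketch: succeed if the manager log shows that the ping health check is running.
# Usage: monitor_started <path to manager.log>
monitor_started() {
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -qF 'Ping(SELECT) succeeded, waiting until MySQL' "$1"
}

# On server5 this could be used as:
# monitor_started /var/log/masterha/app1/manager.log && echo "monitoring active"
```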

 

To stop MHA manager monitoring (skip this step for now)

masterha_stop --conf=/etc/masterha/app1.cnf

 

The VIP 192.168.30.254 is now bound to interface ens33

[root@server1 ~]# ip a |grep ens33

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    inet 192.168.30.25/24 brd 192.168.30.255 scope global noprefixroute ens33

    inet 192.168.30.254/24 brd 192.168.30.255 scope global secondary ens33:1

 

Master (192.168.30.25): simulate a master failure

[root@server1 ~]# systemctl stop mariadb

[root@server1 ~]# ip a | grep ens33

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    inet 192.168.30.25/24 brd 192.168.30.255 scope global noprefixroute ens33

 

Check the status of slave1 (192.168.30.23): it has switched to master2, the standby master (192.168.30.24)

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.24

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 2472

               Relay_Log_File: slave-relay-bin.000002

                Relay_Log_Pos: 530

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 2472

              Relay_Log_Space: 824

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 2

1 row in set (0.00 sec)

 

Check the status of slave2 (192.168.30.21): it has also switched to master2, the standby master (192.168.30.24)

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.24

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 2472

               Relay_Log_File: slave-relay-bin.000002

                Relay_Log_Pos: 530

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 2472

              Relay_Log_Space: 824

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 2

1 row in set (0.00 sec)
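Only three fields of the long status output matter for confirming the switch: Master_Host, Slave_IO_Running and Slave_SQL_Running. The following sketch pulls them out; slave_summary is a hypothetical helper name, and the mysql invocation in the usage comment assumes the root account used in this article.

```shell
# slave_summary
# Hypothetical helper: reads `show slave status\G` output on stdin and prints
# the three fields that matter after a failover as name=value lines.
slave_summary() {
    awk -F': *' '
        /^ *(Master_Host|Slave_IO_Running|Slave_SQL_Running):/ {
            sub(/^ +/, "", $1)   # drop the right-alignment padding
            print $1 "=" $2
        }'
}

# Usage on a slave:
#   mysql -u root -p -e 'show slave status\G' | slave_summary
```

After a successful failover each slave should report the new master's address and Yes for both threads.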

 

Server5 (192.168.30.26): monitoring has stopped automatically

^C[1]+  Done                    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

Server5 (192.168.30.26): the monitoring configuration file has changed (the server1 section has been removed)

[root@server5 ~]# cat /etc/masterha/app1.cnf

[server default]

manager_log=/var/log/masterha/app1/manager.log

manager_workdir=/var/log/masterha/app1

master_binlog_dir=/var/lib/mysql

master_ip_failover_script=/usr/local/bin/master_ip_failover

password=123456

ping_interval=1

remote_workdir=/tmp

repl_password=123456

repl_user=repl

user=root

 

[server2]

candidate_master=1

check_repl_delay=0

hostname=server2

port=3306

 

[server3]

hostname=server3

port=3306

 

[server4]

hostname=server4

port=3306
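Since --remove_dead_master_conf rewrites the file after every failover, it is worth checking which hosts are still listed before restarting the manager. This is a minimal sketch; list_servers is a hypothetical helper name, and it assumes the section names follow the serverN pattern used in this article.

```shell
# list_servers CONF
# Hypothetical helper: prints the [serverN] section names in an MHA
# application config. "[server default]" is skipped because of the space.
list_servers() {
    sed -n 's/^\[\(server[0-9][0-9]*\)\][[:space:]]*$/\1/p' "$1"
}

# Usage on the manager host:
#   list_servers /etc/masterha/app1.cnf
```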

 

Server5 (192.168.30.26): manager log contents during the failover

[root@server5 ~]# tail -f /var/log/masterha/app1/manager.log

Sat Apr 13 20:59:11 2019 - [info] Checking master_ip_failover_script status:

Sat Apr 13 20:59:11 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server2 --orig_master_ip=192.168.30.24 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Apr 13 20:59:11 2019 - [info]  OK.

Sat Apr 13 20:59:11 2019 - [warning] shutdown_script is not defined.

Sat Apr 13 20:59:11 2019 - [info] Set master ping interval 1 seconds.

Sat Apr 13 20:59:11 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.

Sat Apr 13 20:59:11 2019 - [info] Starting ping health check on server2(192.168.30.24:3306)..

Sat Apr 13 20:59:11 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Sat Apr 13 21:00:10 2019 - [info] Got terminate signal. Exit.

 

Repairing the failed master and testing VIP switch-back

Master (192.168.30.25):

[root@server1 ~]# systemctl start mariadb

[root@server1 ~]# netstat -anpt |grep :3306

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      7435/mysqld  

Master (192.168.30.25): point it at the new master

[root@server1 ~]# mysql -u root -p123456

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 2

Server version: 5.5.56-MariaDB MariaDB Server

 

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

 

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> stop slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)

 

MariaDB [(none)]> change master to

    -> master_host='192.168.30.24',

    -> master_user='repl',

    -> master_password='123456';

Query OK, 0 rows affected (0.00 sec)

 

MariaDB [(none)]> start slave;

Query OK, 0 rows affected (0.12 sec)

 

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.24

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 2472

               Relay_Log_File: mariadb-relay-bin.000004

                Relay_Log_Pos: 1421

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 2472

              Relay_Log_Space: 2002

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 2

1 row in set (0.00 sec)
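The CHANGE MASTER TO statement above gives no binary log file or position, so replication starts from the beginning of the new master's first available binary log. That is fine in this lab, but on a busy system explicit coordinates are safer. During a failover MHA normally writes the statement it expects slaves to use into the manager log (a line containing "Statement should be: CHANGE MASTER TO ..."). The sketch below extracts it, assuming that line is present in the log being read; last_change_master is a hypothetical helper name.

```shell
# last_change_master LOGFILE
# Hypothetical helper: prints the most recent CHANGE MASTER TO statement
# recorded in an MHA manager log, or nothing if the log has none.
last_change_master() {
    grep -o 'CHANGE MASTER TO .*' "$1" | tail -n 1
}

# Usage on the manager host:
#   last_change_master /var/log/masterha/app1/manager.log
```

The password in the logged statement is masked as 'xxx', so it has to be replaced with the real replication password before the statement is run on the repaired host.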

 

 

Server5 (192.168.30.26): edit the monitoring configuration file and add the server1 section back

[server1]

hostname=server1

port=3306
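Re-adding the section by hand is easy to forget or to do twice. The following is a minimal sketch of an idempotent version; add_server_section is a hypothetical helper name, and it assumes the section name and the hostname are the same, as they are in this article.

```shell
# add_server_section CONF NAME
# Hypothetical helper: appends a [NAME] section with hostname and port to an
# MHA application config, unless a line "[NAME]" is already present.
add_server_section() {
    if grep -qxF "[$2]" "$1"; then
        return 0
    fi
    printf '\n[%s]\nhostname=%s\nport=3306\n' "$2" "$2" >> "$1"
}

# Usage on the manager host:
#   add_server_section /etc/masterha/app1.cnf server1
```

Extra keys such as candidate_master=1 or check_repl_delay=0 still have to be added manually for a standby master.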

 

Server5 (192.168.30.26): check the cluster status

[root@server5 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

Sat Apr 13 21:25:17 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Apr 13 21:25:17 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Sat Apr 13 21:25:17 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Sat Apr 13 21:25:17 2019 - [info] MHA::MasterMonitor version 0.56.

Sat Apr 13 21:25:26 2019 - [info] Dead Servers:

Sat Apr 13 21:25:26 2019 - [info] Alive Servers:

Sat Apr 13 21:25:26 2019 - [info]   server1(192.168.30.25:3306)

Sat Apr 13 21:25:26 2019 - [info]   server2(192.168.30.24:3306)

Sat Apr 13 21:25:26 2019 - [info]   server3(192.168.30.23:3306)

Sat Apr 13 21:25:26 2019 - [info]   server4(192.168.30.21:3306)

Sat Apr 13 21:25:26 2019 - [info] Alive Slaves:

Sat Apr 13 21:25:26 2019 - [info]   server1(192.168.30.25:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:25:26 2019 - [info]     Replicating from 192.168.30.24(192.168.30.24:3306)

Sat Apr 13 21:25:26 2019 - [info]   server3(192.168.30.23:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:25:26 2019 - [info]     Replicating from 192.168.30.24(192.168.30.24:3306)

Sat Apr 13 21:25:26 2019 - [info]   server4(192.168.30.21:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:25:26 2019 - [info]     Replicating from 192.168.30.24(192.168.30.24:3306)

Sat Apr 13 21:25:26 2019 - [info] Current Alive Master: server2(192.168.30.24:3306)

Sat Apr 13 21:25:26 2019 - [info] Checking slave configurations..

Sat Apr 13 21:25:26 2019 - [info]  read_only=1 is not set on slave server1(192.168.30.25:3306).

Sat Apr 13 21:25:26 2019 - [warning]  relay_log_purge=0 is not set on slave server1(192.168.30.25:3306).

Sat Apr 13 21:25:26 2019 - [warning]  relay_log_purge=0 is not set on slave server3(192.168.30.23:3306).

Sat Apr 13 21:25:26 2019 - [warning]  relay_log_purge=0 is not set on slave server4(192.168.30.21:3306).

Sat Apr 13 21:25:26 2019 - [info] Checking replication filtering settings..

Sat Apr 13 21:25:26 2019 - [info]  binlog_do_db= , binlog_ignore_db=

Sat Apr 13 21:25:26 2019 - [info]  Replication filtering check ok.

Sat Apr 13 21:25:26 2019 - [info] Starting SSH connection tests..

Sat Apr 13 21:25:31 2019 - [info] All SSH connection tests passed successfully.

Sat Apr 13 21:25:31 2019 - [info] Checking MHA Node version..

Sat Apr 13 21:25:32 2019 - [info]  Version check ok.

Sat Apr 13 21:25:32 2019 - [info] Checking SSH publickey authentication settings on the current master..

Sat Apr 13 21:25:32 2019 - [info] HealthCheck: SSH to server2 is reachable.

Sat Apr 13 21:25:33 2019 - [info] Master MHA Node version is 0.56.

Sat Apr 13 21:25:33 2019 - [info] Checking recovery script configurations on the current master..

Sat Apr 13 21:25:33 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000003

Sat Apr 13 21:25:33 2019 - [info]   Connecting to root@server2(server2)..

  Creating /tmp if not exists..    ok.

  Checking output directory is accessible or not..

   ok.

  Binlog found at /var/lib/mysql, up to master-bin.000003

Sat Apr 13 21:25:33 2019 - [info] Master setting check done.

Sat Apr 13 21:25:33 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Sat Apr 13 21:25:33 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server1 --slave_ip=192.168.30.25 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:25:33 2019 - [info]   Connecting to root@192.168.30.25(server1:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000004

    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000004

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:25:34 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server3 --slave_ip=192.168.30.23 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:25:34 2019 - [info]   Connecting to root@192.168.30.23(server3:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:25:34 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server4 --slave_ip=192.168.30.21 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:25:34 2019 - [info]   Connecting to root@192.168.30.21(server4:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:25:35 2019 - [info] Slaves settings check done.

Sat Apr 13 21:25:35 2019 - [info]

server2 (current master)

 +--server1

 +--server3

 +--server4

 

Sat Apr 13 21:25:35 2019 - [info] Checking replication health on server1..

Sat Apr 13 21:25:35 2019 - [info]  ok.

Sat Apr 13 21:25:35 2019 - [info] Checking replication health on server3..

Sat Apr 13 21:25:35 2019 - [info]  ok.

Sat Apr 13 21:25:35 2019 - [info] Checking replication health on server4..

Sat Apr 13 21:25:35 2019 - [info]  ok.

Sat Apr 13 21:25:35 2019 - [info] Checking master_ip_failover_script status:

Sat Apr 13 21:25:35 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server2 --orig_master_ip=192.168.30.24 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Apr 13 21:25:35 2019 - [info]  OK.

Sat Apr 13 21:25:35 2019 - [warning] shutdown_script is not defined.

Sat Apr 13 21:25:35 2019 - [info] Got exit code 0 (Not master dead).

 

MySQL Replication Health is OK.
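The check passes, but it still warns that relay_log_purge=0 is not set on the slaves. MHA may need old relay logs to recover a lagging slave, which is why it asks for automatic purging to be off. This sketch lists the flagged hosts from the check output; purge_warning_hosts is a hypothetical helper name.

```shell
# purge_warning_hosts
# Hypothetical helper: reads masterha_check_repl output on stdin and prints
# the host name from every "relay_log_purge=0 is not set" warning.
purge_warning_hosts() {
    sed -n 's/.*relay_log_purge=0 is not set on slave \([^(]*\)(.*/\1/p'
}

# Usage on the manager host:
#   masterha_check_repl --conf=/etc/masterha/app1.cnf 2>&1 | purge_warning_hosts
```

On each host it prints, running SET GLOBAL relay_log_purge=0; in the MariaDB client should clear the warning until the next restart. Old relay logs can then be cleaned up with the purge_relay_logs script installed with MHA node.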

 

Server5 (192.168.30.26): start monitoring

[root@server5 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null > /var/log/masterha/app1/manager.log 2>&1&

[2] 14177
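Note that the start command redirects output with ">", so the shell truncates manager.log every time the manager is started this way, and the record of the previous failover is lost. The sketch below moves the old log aside first; archive_log and the timestamp suffix are assumptions, not MHA features.

```shell
# archive_log LOGFILE
# Hypothetical helper: renames a non-empty log to LOGFILE.YYYYmmddHHMMSS so
# that a later "> LOGFILE" redirect does not destroy it.
archive_log() {
    # Nothing to do if the log is missing or empty.
    [ -s "$1" ] || return 0
    mv "$1" "$1.$(date +%Y%m%d%H%M%S)"
}

# Usage on the manager host, before the nohup masterha_manager command:
#   archive_log /var/log/masterha/app1/manager.log
```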

 

Master (192.168.30.24): stop MySQL on the current master

[root@server2 ~]# systemctl stop mariadb

[root@server2 ~]# netstat -anpt |grep :3306

 

Master (192.168.30.25): once the second master is shut down, the VIP is automatically reassigned to the original master1

[root@server1 ~]# ip a |grep ens33

2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    inet 192.168.30.25/24 brd 192.168.30.255 scope global noprefixroute ens33

    inet 192.168.30.254/24 brd 192.168.30.255 scope global secondary ens33:1

 

Slave1 (192.168.30.23) status

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.25

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 1807

               Relay_Log_File: slave-relay-bin.000002

                Relay_Log_Pos: 530

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 1807

              Relay_Log_Space: 824

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

1 row in set (0.00 sec)

 

Slave2 (192.168.30.21) status

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.25

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 1807

               Relay_Log_File: slave-relay-bin.000002

                Relay_Log_Pos: 530

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 1807

              Relay_Log_Space: 824

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

1 row in set (0.00 sec)

 

Server5 (192.168.30.26): the configuration file has changed (the section for the failed server2 has been removed)

[server default]

manager_log=/var/log/masterha/app1/manager.log

manager_workdir=/var/log/masterha/app1

master_binlog_dir=/var/lib/mysql

master_ip_failover_script=/usr/local/bin/master_ip_failover

password=123456

ping_interval=1

remote_workdir=/tmp

repl_password=123456

repl_user=repl

user=root

 

[server1]

hostname=server1

port=3306

 

[server3]

hostname=server3

port=3306

 

[server4]

hostname=server4

port=3306

 

Server5 (192.168.30.26): monitoring log

[root@server5 ~]# tail -f /var/log/masterha/app1/manager.log

Selected server1 as a new master.

server1: OK: Applying all logs succeeded.

server1: OK: Activated master IP address.

server3: This host has the latest relay log events.

server4: This host has the latest relay log events.

Generating relay diff files from the latest slave succeeded.

server4: OK: Applying all logs succeeded. Slave started, replicating from server1.

server3: OK: Applying all logs succeeded. Slave started, replicating from server1.

server1: Resetting slave info succeeded.

Master failover to server1(192.168.30.25:3306) completed successfully.
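The last line of that report is the one to look for when a failover needs to be confirmed from a script. This is a minimal sketch, assuming the report format shown above; failover_target is a hypothetical helper name.

```shell
# failover_target LOGFILE
# Hypothetical helper: prints the host name from the most recent
# "Master failover to HOST(IP:PORT) completed successfully." line,
# or nothing if no failover has completed.
failover_target() {
    sed -n 's/^Master failover to \([^(]*\)(.*) completed successfully\..*/\1/p' "$1" \
        | tail -n 1
}

# Usage on the manager host:
#   new_master=$(failover_target /var/log/masterha/app1/manager.log)
#   [ -n "$new_master" ] && echo "new master: $new_master"
```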

 

Repair the failed master host (192.168.30.24)

[root@server2 ~]# systemctl start mariadb

[root@server2 ~]# netstat -anpt |grep :3306

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      8451/mysqld         

 

Master (192.168.30.24): point it at the new master

 

[root@server2 ~]# mysql -u root -p123456

MariaDB [(none)]> stop slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)

 

MariaDB [(none)]> change master to

    -> master_host='192.168.30.25',

    -> master_user='repl',

    -> master_password='123456';

Query OK, 0 rows affected (0.00 sec)

 

MariaDB [(none)]> start slave;

Query OK, 0 rows affected (0.00 sec)

 

MariaDB [(none)]> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 192.168.30.25

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: master-bin.000003

          Read_Master_Log_Pos: 1807

               Relay_Log_File: mariadb-relay-bin.000004

                Relay_Log_Pos: 530

        Relay_Master_Log_File: master-bin.000003

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB:

          Replicate_Ignore_DB:

           Replicate_Do_Table:

       Replicate_Ignore_Table:

      Replicate_Wild_Do_Table:

  Replicate_Wild_Ignore_Table:

                   Last_Errno: 0

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 1807

              Relay_Log_Space: 2447

              Until_Condition: None

               Until_Log_File:

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File:

           Master_SSL_CA_Path:

              Master_SSL_Cert:

            Master_SSL_Cipher:

               Master_SSL_Key:

        Seconds_Behind_Master: 0

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error:

               Last_SQL_Errno: 0

               Last_SQL_Error:

  Replicate_Ignore_Server_Ids:

             Master_Server_Id: 1

 

Server5 (192.168.30.26): edit the monitoring configuration file and add the server2 section back

[server2]

hostname=server2

candidate_master=1

port=3306

check_repl_delay=0

 

Server5 (192.168.30.26): check the cluster status

 

[root@server5 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

Sat Apr 13 21:51:01 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sat Apr 13 21:51:01 2019 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Sat Apr 13 21:51:01 2019 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Sat Apr 13 21:51:01 2019 - [info] MHA::MasterMonitor version 0.56.

Sat Apr 13 21:51:02 2019 - [info] Dead Servers:

Sat Apr 13 21:51:02 2019 - [info] Alive Servers:

Sat Apr 13 21:51:02 2019 - [info]   server1(192.168.30.25:3306)

Sat Apr 13 21:51:02 2019 - [info]   server2(192.168.30.24:3306)

Sat Apr 13 21:51:02 2019 - [info]   server3(192.168.30.23:3306)

Sat Apr 13 21:51:02 2019 - [info]   server4(192.168.30.21:3306)

Sat Apr 13 21:51:02 2019 - [info] Alive Slaves:

Sat Apr 13 21:51:02 2019 - [info]   server2(192.168.30.24:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:51:02 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 21:51:02 2019 - [info]     Primary candidate for the new Master (candidate_master is set)

Sat Apr 13 21:51:02 2019 - [info]   server3(192.168.30.23:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:51:02 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 21:51:02 2019 - [info]   server4(192.168.30.21:3306)  Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled

Sat Apr 13 21:51:02 2019 - [info]     Replicating from 192.168.30.25(192.168.30.25:3306)

Sat Apr 13 21:51:02 2019 - [info] Current Alive Master: server1(192.168.30.25:3306)

Sat Apr 13 21:51:02 2019 - [info] Checking slave configurations..

Sat Apr 13 21:51:02 2019 - [info]  read_only=1 is not set on slave server2(192.168.30.24:3306).

Sat Apr 13 21:51:02 2019 - [warning]  relay_log_purge=0 is not set on slave server2(192.168.30.24:3306).

Sat Apr 13 21:51:02 2019 - [warning]  relay_log_purge=0 is not set on slave server3(192.168.30.23:3306).

Sat Apr 13 21:51:02 2019 - [warning]  relay_log_purge=0 is not set on slave server4(192.168.30.21:3306).

Sat Apr 13 21:51:02 2019 - [info] Checking replication filtering settings..

Sat Apr 13 21:51:02 2019 - [info]  binlog_do_db= , binlog_ignore_db=

Sat Apr 13 21:51:02 2019 - [info]  Replication filtering check ok.

Sat Apr 13 21:51:02 2019 - [info] Starting SSH connection tests..

Sat Apr 13 21:51:10 2019 - [info] All SSH connection tests passed successfully.

Sat Apr 13 21:51:10 2019 - [info] Checking MHA Node version..

Sat Apr 13 21:51:12 2019 - [info]  Version check ok.

Sat Apr 13 21:51:12 2019 - [info] Checking SSH publickey authentication settings on the current master..

Sat Apr 13 21:51:13 2019 - [info] HealthCheck: SSH to server1 is reachable.

Sat Apr 13 21:51:13 2019 - [info] Master MHA Node version is 0.56.

Sat Apr 13 21:51:13 2019 - [info] Checking recovery script configurations on the current master..

Sat Apr 13 21:51:13 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000003

Sat Apr 13 21:51:13 2019 - [info]   Connecting to root@server1(server1)..

  Creating /tmp if not exists..    ok.

  Checking output directory is accessible or not..

   ok.

  Binlog found at /var/lib/mysql, up to master-bin.000003

Sat Apr 13 21:51:13 2019 - [info] Master setting check done.

Sat Apr 13 21:51:13 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..

Sat Apr 13 21:51:13 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server2 --slave_ip=192.168.30.24 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:51:13 2019 - [info]   Connecting to root@192.168.30.24(server2:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000004

    Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000004

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:51:14 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server3 --slave_ip=192.168.30.23 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:51:14 2019 - [info]   Connecting to root@192.168.30.23(server3:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:51:14 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server4 --slave_ip=192.168.30.21 --slave_port=3306 --workdir=/tmp --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx

Sat Apr 13 21:51:14 2019 - [info]   Connecting to root@192.168.30.21(server4:22)..

  Checking slave recovery environment settings..

    Opening /var/lib/mysql/relay-log.info ... ok.

    Relay log found at /var/lib/mysql, up to slave-relay-bin.000002

    Temporary relay log file is /var/lib/mysql/slave-relay-bin.000002

    Testing mysql connection and privileges.. done.

    Testing mysqlbinlog output.. done.

    Cleaning up test file(s).. done.

Sat Apr 13 21:51:15 2019 - [info] Slaves settings check done.

Sat Apr 13 21:51:15 2019 - [info]

server1 (current master)

 +--server2

 +--server3

 +--server4

 

Sat Apr 13 21:51:15 2019 - [info] Checking replication health on server2..

Sat Apr 13 21:51:15 2019 - [info]  ok.

Sat Apr 13 21:51:15 2019 - [info] Checking replication health on server3..

Sat Apr 13 21:51:15 2019 - [info]  ok.

Sat Apr 13 21:51:15 2019 - [info] Checking replication health on server4..

Sat Apr 13 21:51:15 2019 - [info]  ok.

Sat Apr 13 21:51:15 2019 - [info] Checking master_ip_failover_script status:

Sat Apr 13 21:51:15 2019 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=server1 --orig_master_ip=192.168.30.25 --orig_master_port=3306

Checking the Status of the script.. OK

Sat Apr 13 21:51:15 2019 - [info]  OK.

Sat Apr 13 21:51:15 2019 - [warning] shutdown_script is not defined.

Sat Apr 13 21:51:15 2019 - [info] Got exit code 0 (Not master dead).

 

MySQL Replication Health is OK.
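Before restarting masterha_manager it is safer to gate on this final line than to scan the whole output. This is a minimal sketch; repl_health_ok is a hypothetical helper name, and it only looks for the exact success message printed above.

```shell
# repl_health_ok
# Hypothetical helper: reads masterha_check_repl output on stdin and succeeds
# only when the final verdict line reports a healthy replication setup.
repl_health_ok() {
    grep -qF "MySQL Replication Health is OK."
}

# Usage on the manager host:
#   masterha_check_repl --conf=/etc/masterha/app1.cnf 2>&1 | repl_health_ok \
#       && echo "safe to start masterha_manager"
```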

The experiment is complete.

 

Posted @ 2019-04-13 14:02 by 赵程