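keepalived is an implementation of the VRRP protocol. It was originally designed to make ipvs services highly available: keepalived can generate ipvs rules from the definitions in its configuration file and health-check each real server (RS). The nodes serve clients through a shared virtual IP address (VIP). Within each hot-standby group only one master server provides service at any given moment while the others stay in standby. If the active server goes down, its VIP is taken over by another server (priority determines the takeover order), which keeps the service in front of the back-end hosts highly available.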

   2. keepalived components

   Overview of the keepalived components


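core: the core keepalived component; it starts and maintains the main process and handles the global configuration.

vrrp stack: keepalived provides highly available ipvs services on top of the VRRP protocol; the vrrp child process implements that protocol for it.

check: the child process responsible for health checking.

system call: system calls.

watch dog: the supervisor that monitors the check and vrrp processes. check handles the health checking; when it detects that the service on the master is unavailable, it notifies vrrp so that the service is moved to the backup server.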

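   3. Environment preparation

Operating system: CentOS 7.1.1511 (Core)

Database: MySQL 5.7.21 Community Edition

master1: 10.0.0.11        MySQL and keepalived installed

master2: 10.0.0.12        MySQL and keepalived installed

VIP: 10.0.0.20

For the two servers to act as master and slave for each other, replication must be configured in both directions: master1 --> master2 and master2 --> master1.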

4. MySQL master-master replication setup

---------------Operations on master1---------------
Add the following to the [mysqld] section of my.cnf:
[root@master1 ~]# vim /usr/local/mysql/my.cnf
server-id = 1         
log-bin = mysql-bin     
sync_binlog = 1
binlog_checksum = none
binlog_format = mixed
auto-increment-increment = 2     
auto-increment-offset = 1    
slave-skip-errors = all      
  
[root@master1 ~]# /etc/init.d/mysql restart
Shutting down MySQL. SUCCESS!
Starting MySQL.. SUCCESS!

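Create a replication user

A small problem came up: creating the replication user failed with the error below. ERROR 1820 means the password of the current account (root here) has to be reset with ALTER USER before any other statement is accepted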

mysql> grant replication slave,replication client on *.* to repl@'10.0.0.%' identified by '1qaz@WSX';
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

Following the hint, reset the root password (to a reasonably complex one); after that the grant succeeds

mysql> alter user 'root'@'localhost' identified by '1qaz@WSX';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

 

mysql> grant replication slave,replication client on *.* to repl@'10.0.0.%' identified by '1qaz@WSX';  
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Lock the tables; unlock them once replication has been configured

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

Check the current binlog file and position

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000006 |      996 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
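
The File and Position values shown above are the ones that go into the change master statement on the other server. As a minimal sketch (the function name parse_master_status is made up for this example, and it assumes the default table layout that the mysql client prints), they could be extracted like this instead of being copied by hand:

```shell
# Read the table printed by "show master status" on stdin and print
# "<binlog file> <position>" taken from the first data row.
parse_master_status() {
    awk -F'|' '
        # Data rows have a numeric Position column; header and border lines do not.
        NF > 3 && $3 ~ /^ *[0-9]+ *$/ {
            gsub(/ /, "", $2); gsub(/ /, "", $3)
            print $2, $3
            exit
        }'
}

# Example with the output captured on master1 above.
status='+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000006 |      996 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+'

printf '%s\n' "$status" | parse_master_status | {
    read -r file pos
    echo "change master to master_log_file='$file', master_log_pos=$pos;"
}
```

The last command only prints the statement fragment; host, user and password still have to be added as in the change master command used in this post.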
---------------Operations on master2---------------
Add the following to the [mysqld] section of my.cnf:
[root@master2 ~]# vim /usr/local/mysql/my.cnf
server-id = 2        
log-bin = mysql-bin    
sync_binlog = 1
binlog_checksum = none
binlog_format = mixed
auto-increment-increment = 2     
auto-increment-offset = 2    
slave-skip-errors = all
  
[root@master2 ~]# /etc/init.d/mysql restart
Shutting down MySQL.. SUCCESS!
Starting MySQL.. SUCCESS!
mysql> grant replication slave,replication client on *.* to repl@'10.0.0.%' identified by '1qaz@WSX';  
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> flush tables with read lock;
Query OK, 0 rows affected (0.00 sec)

Check the master status

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 |      150 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
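
The auto-increment-increment and auto-increment-offset settings in the two my.cnf files keep the masters from handing out the same auto-generated key. A small sketch of the resulting sequences (auto_inc_ids is a hypothetical helper written for this illustration; it only does the arithmetic and does not talk to MySQL):

```shell
# Print the first N auto-increment values a server would generate.
# $1 = auto-increment-offset, $2 = auto-increment-increment, $3 = how many values
auto_inc_ids() (
    offset=$1; step=$2; count=$3
    i=0; out=""
    while [ "$i" -lt "$count" ]; do
        out="$out $((offset + i * step))"
        i=$((i + 1))
    done
    echo "${out# }"
)

auto_inc_ids 1 2 4   # master1 (offset 1): 1 3 5 7
auto_inc_ids 2 2 4   # master2 (offset 2): 2 4 6 8
```

With an increment of 2, master1 only generates odd values and master2 only even ones, so concurrent inserts on both sides do not collide on the auto-increment column.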

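Now point each server at the other to start replication

---------------Replication setup on master1---------------
mysql> unlock tables;     -- unlock first so that the peer's data can be replicated into this database
mysql> stop slave;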
mysql> change  master to master_host='10.0.0.12',master_user='repl',master_password='1qaz@WSX',master_log_file='mysql-bin.000001',master_log_pos=150;         
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
 
Check that both replication threads report Yes
mysql> show slave status \G

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

-------------Replication setup on master2---------------
mysql> unlock tables;     -- unlock first so that the peer's data can be replicated into this database
mysql> stop slave;
mysql> change  master to master_host='10.0.0.11',master_user='repl',master_password='1qaz@WSX',master_log_file='mysql-bin.000006',master_log_pos=996;  
Query OK, 0 rows affected, 2 warnings (0.06 sec)
  
mysql> start slave;
Query OK, 0 rows affected (0.01 sec)
  
mysql> show slave status \G

Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 996
Relay_Log_File: master2-relay-bin.000002
Relay_Log_Pos: 312
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

The output above shows that MySQL master-master replication is working in both directions.
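
The same check can be scripted. The sketch below assumes the field names printed by MySQL 5.7 (Slave_IO_Running and Slave_SQL_Running); the function name replication_ok is invented for this example:

```shell
# Read "show slave status\G" output on stdin; succeed only when both
# replication threads report Yes.
replication_ok() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "", $2) }                  # drop trailing whitespace
        $1 ~ /^ *Slave_IO_Running$/  { io  = $2 }
        $1 ~ /^ *Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}

# Example with the two lines shown above.
printf '%s\n' 'Slave_IO_Running: Yes' 'Slave_SQL_Running: Yes' \
    | replication_ok && echo "both replication threads are running"

# Against a live server it would be used roughly like this:
#   mysql -uroot -p -e 'show slave status\G' | replication_ok || echo "replication is broken"
```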

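If replication problems show up after the system has been running for a while, for example replication works in only one direction, you can rerun the change master steps above. Note that replication then picks up only the changes made after that point. Next, verify the data:

-----------------Verifying master-master replication---------------------
1) Write new data on master1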
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
  
mysql> create database huanqiu;
Query OK, 1 row affected (0.01 sec)
  
mysql> use huanqiu;
Database changed
  

mysql> create table if not exists haha ( id int(10) PRIMARY KEY AUTO_INCREMENT, name varchar(50) NOT NULL);
Query OK, 0 rows affected (0.04 sec)


mysql> insert into haha values(2,'guojing');
Query OK, 1 row affected (0.00 sec)


mysql> insert into haha values(1,"huangrong");
Query OK, 1 row affected (0.00 sec)


mysql> select * from haha;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | huangrong |
|  2 | guojing   |
+----+-----------+
2 rows in set (0.00 sec)

  
Then check on master2: the data has been replicated.

mysql> select * from huanqiu.haha;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | huangrong |
|  2 | guojing   |
+----+-----------+
2 rows in set (0.00 sec)


2) Write new data on master2
mysql> create database hehe;
Query OK, 1 row affected (0.00 sec)
  

mysql> insert into huanqiu.haha values(3,"haha"),(4,"haha");
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

  
Then check on master1: the data has been replicated here as well.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hehe               |
| huanqiu            |
| mysql              |
| performance_schema |
| test               |
+--------------------+
6 rows in set (0.00 sec)
  

mysql> select * from huanqiu.haha;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | huangrong |
|  2 | guojing   |
|  3 | haha      |
|  4 | haha      |
+----+-----------+
4 rows in set (0.00 sec)

  
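At this point the MySQL master-master replication environment is in place.

5. Configuring the MySQL + keepalived high-availability environment

1) Install keepalived and set it up as a system service. Run the same steps on both master1 and master2: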
[root@master1 ~]# yum install -y openssl-devel
[root@master1 ~]# cd /usr/local/src/
[root@master1 src]# wget http://www.keepalived.org/software/keepalived-1.3.5.tar.gz
[root@master1 src]# tar -zvxf keepalived-1.3.5.tar.gz
[root@master1 src]# cd keepalived-1.3.5
[root@master1 keepalived-1.3.5]# ./configure --prefix=/usr/local/keepalived
[root@master1 keepalived-1.3.5]# make && make install
     
[root@master1 keepalived-1.3.5]# cp /usr/local/src/keepalived-1.3.5/keepalived/etc/init.d/keepalived /etc/rc.d/init.d/
[root@master1 keepalived-1.3.5]# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
[root@master1 keepalived-1.3.5]# mkdir /etc/keepalived/
[root@master1 keepalived-1.3.5]# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
[root@master1 keepalived-1.3.5]# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
[root@master1 keepalived-1.3.5]# echo "/etc/init.d/keepalived start" >> /etc/rc.local

 

2) keepalived.conf on master1. (This configuration does not use the LVS load-balancing feature, so no virtual_server block is needed.)

[root@master1 ~]# cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
[root@master1 ~]# vim /etc/keepalived/keepalived.conf       # clear the default contents and use the configuration below:
! Configuration File for keepalived
       
global_defs {
notification_email {
ops@wangshibo.cn
tech@wangshibo.cn
}
       
notification_email_from ops@wangshibo.cn
smtp_server 127.0.0.1 
smtp_connect_timeout 30
router_id MASTER-HA
}
       
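vrrp_script chk_mysql_port {     # checks whether MySQL is running; this can be done in many ways (process check, script, etc.)
    script "/opt/chk_mysql.sh"   # here the check is done by a script
    interval 2                   # run the script every 2 seconds
    weight -5                    # priority change driven by the script result: on failure (non-zero exit) the priority drops by 5
    fall 2                    # 2 consecutive failures are required before the check counts as failed and weight lowers the priority (priorities range from 1 to 255)
    rise 1                    # 1 success is enough to count as healthy again (the -5 penalty is lifted)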
}
       
vrrp_instance VI_1 {
    state MASTER    
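    interface eth0      # network interface that carries the virtual IP
    mcast_src_ip 10.0.0.11
    virtual_router_id 51    # virtual router ID; must be identical on MASTER and BACKUP
    priority 101            # priority; the higher the number, the higher the priority. Within one vrrp_instance the MASTER's priority must be higher than the BACKUP's, so that the MASTER can take the VIP back after it recovers from a failure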
    advert_int 1         
    authentication {   
        auth_type PASS 
        auth_pass 1111     
    }
    virtual_ipaddress {    
        10.0.0.20
    }
      
track_script {               
   chk_mysql_port             
}
}
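
How weight interacts with priority is easy to get wrong, so here is the arithmetic as a small sketch. effective_priority is a hypothetical helper, not part of keepalived. It models only the case where the tracked script exits non-zero and keepalived keeps running; a script that stops keepalived outright never reaches this path. The values 101 and -5 come from this configuration, and 99 is the priority used for master2 in this post.

```shell
# Effective VRRP priority of a node: the configured priority, plus the
# (negative) weight while the tracked script is failing.
# $1 = priority, $2 = weight, $3 = script state ("ok" or "fail")
effective_priority() {
    if [ "$3" = "fail" ]; then
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}

effective_priority 101 -5 ok     # 101: master1 keeps the VIP
effective_priority 101 -5 fail   # 96: lower than master2's 99, so master2 wins the election
```

The weight has to be larger than the gap between the two priorities (here 101 - 99 = 2), otherwise a failing check would not be enough to hand the VIP over.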

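3) Write the failover script. keepalived does the heartbeat detection: if MySQL on the master goes down (port 3306 stops listening), the script stops keepalived on that node. keepalived on the other node notices the missing heartbeat and takes over the VIP and the requests sent to it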

[root@master1 ~]# vim /opt/chk_mysql.sh
#!/bin/bash
counter=$(netstat -na|grep "LISTEN"|grep "3306"|wc -l)
if [ "${counter}" -eq 0 ]; then
    /etc/init.d/keepalived stop
fi
[root@master1 ~]# chmod 755 /opt/chk_mysql.sh
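
One detail of the script above: grep "3306" also matches other ports that merely contain those digits, such as 33060. A stricter variant could compare the port field exactly. This is only a sketch: count_listeners is a made-up name, the sample lines are invented, and it assumes netstat output without the -p column, so that the state is the last field.

```shell
# Read "netstat -na" style output on stdin and count the sockets that are
# LISTENing on exactly the given TCP port (default 3306).
count_listeners() {
    awk -v port="${1:-3306}" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")      # the local address is the 4th column
            if (parts[n] == port) c++
        }
        END { print c + 0 }'
}

# Hypothetical sample output for illustration.
sample='tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp6       0      0 :::3306                 :::*                    LISTEN
tcp6       0      0 :::33060                :::*                    LISTEN'

printf '%s\n' "$sample" | count_listeners 3306                   # prints 1
printf '%s\n' "$sample" | grep "LISTEN" | grep "3306" | wc -l    # counts 2 lines
```

In the check script, `netstat -na | count_listeners 3306` could then take the place of the grep pipeline.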
     
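Start the keepalived service
[root@master1 ~]# /etc/init.d/keepalived start
Starting keepalived:                                      [  OK  ]

4) keepalived configuration on master2. Compared with master1, keepalived.conf on master2 changes only state (BACKUP), mcast_src_ip (the local IP, 10.0.0.12) and priority (99); nopreempt is not set.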

[root@master2 ~]# cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
[root@master2 ~]# >/etc/keepalived/keepalived.conf
[root@master2 ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
       
global_defs {
notification_email {
ops@qq.com
tech@qq.com
}
       
notification_email_from ops@wangshibo.cn
smtp_server 127.0.0.1 
smtp_connect_timeout 30
router_id MASTER-HA
}
       
vrrp_script chk_mysql_port {
    script "/opt/chk_mysql.sh"
    interval 2            
    weight -5                 
    fall 2                 
    rise 1               
}
       
vrrp_instance VI_1 {
    state BACKUP
    interface eth0    
    mcast_src_ip 10.0.0.12
    virtual_router_id 51    
    priority 99          
    advert_int 1         
    authentication {   
        auth_type PASS 
        auth_pass 1111     
    }
    virtual_ipaddress {    
        10.0.0.20
    }
      
track_script {               
   chk_mysql_port             
}
}
     
     
[root@master2 ~]# cat /opt/chk_mysql.sh
#!/bin/bash
counter=$(netstat -na|grep "LISTEN"|grep "3306"|wc -l)
if [ "${counter}" -eq 0 ]; then
    /etc/init.d/keepalived stop
fi
 
[root@master2 ~]# chmod 755 /opt/chk_mysql.sh
     
[root@master2 ~]# /etc/init.d/keepalived start
Starting keepalived:                                      [  OK  ]

I ran into a problem at startup here, so I checked the log

tail -f /var/log/messages
Mar 31 14:28:14 master1 systemd: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.

Check keepalived.service

# vi /lib/systemd/system/keepalived.service  

[Unit]
Description=LVS and VRRP High Availability Monitor
After=syslog.target network-online.target

[Service]
Type=forking
# PIDFile=/usr/local/keepalived/var/run/keepalived.pid 

# Comment out the line above and use the one below: the default path does not exist, so the PID file could not be written

PIDFile=/var/run/keepalived.pid 
KillMode=process
EnvironmentFile=-/usr/local/keepalived/etc/sysconfig/keepalived
ExecStart=/usr/local/keepalived/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
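
A quick way to catch this kind of problem is to check whether the directory named in PIDFile actually exists. The sketch below uses two hypothetical helper functions and the unit file path from this post; it skips the check if that file is not present.

```shell
# Print the active (uncommented) PIDFile= value from a systemd unit file ($1).
unit_pidfile() {
    sed -n 's/^[[:space:]]*PIDFile=[[:space:]]*//p' "$1" | sed 's/[[:space:]]*$//' | tail -n 1
}

# Succeed when the directory that should hold the PID file exists.
pidfile_dir_exists() {
    pidfile=$(unit_pidfile "$1")
    [ -n "$pidfile" ] && [ -d "$(dirname "$pidfile")" ]
}

# Example usage against the unit file edited above.
unit=/lib/systemd/system/keepalived.service
if [ -r "$unit" ]; then
    if pidfile_dir_exists "$unit"; then
        echo "PID file directory exists for $(unit_pidfile "$unit")"
    else
        echo "PID file directory is missing for $(unit_pidfile "$unit")"
    fi
fi
```

After editing a unit file, systemd has to reread it (systemctl daemon-reload) before the change takes effect.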

5) On both master1 and master2, grant the root user remote login so that the setup can be tested from a client.

mysql> grant all on *.* to root@'10.0.0.%' identified by "1qaz@WSX";
Query OK, 0 rows affected (0.00 sec)
     
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
   
6) Set the following iptables firewall rules on both master1 and master2:
[root@master1 ~]# cat /etc/sysconfig/iptables
........
-A INPUT -s 10.0.0.0/24 -d 224.0.0.18 -j ACCEPT       # allow traffic to the VRRP multicast address
-A INPUT -s 10.0.0.0/24 -p vrrp -j ACCEPT             # allow VRRP (Virtual Router Redundancy Protocol) traffic
-A INPUT -m state --state NEW -m tcp -p tcp --dport 3306 -j ACCEPT    # open MySQL port 3306
   
[root@master1 ~]# /etc/init.d/iptables restart

 

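 6. MySQL + keepalived failover test

 1) Connect with a MySQL client through the VIP and check that the connection succeeds.

For example, connect from a remote test machine; the VIP accepts the connection (the account used must have been granted access on the servers beforehand)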
[root@master1 ~]# mysql -uroot -p1qaz@WSX -h10.0.0.20
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.7.21-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


mysql> select * from huanqiu.haha;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | huangrong |
|  2 | guojing   |
|  3 | haha      |
|  4 | haha      |
+----+-----------+
4 rows in set (0.02 sec)

2) By default the VIP is on master1. Use the "ip addr" command to see which host currently holds the VIP

[root@master1 ~]# ip addr |grep 10.0                 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.11/8 brd 10.255.255.255 scope global eth0
    inet 10.0.0.20/32 scope global eth0

[root@master2 ~]# ip addr |grep 10.0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 10.0.0.12/8 brd 10.255.255.255 scope global eth0
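
For scripting this check, note that `grep 10.0` treats the dot as a wildcard and matches every address on the interface. A stricter test for the VIP itself might look like the sketch below (has_vip is a name invented for this example; the sample lines are the outputs shown above):

```shell
# Read "ip addr" output on stdin; succeed when the given IPv4 address is
# configured on this host. The address is compared literally, so the dots
# are not treated as regex wildcards.
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

master1_out='    inet 10.0.0.11/8 brd 10.255.255.255 scope global eth0
    inet 10.0.0.20/32 scope global eth0'
master2_out='    inet 10.0.0.12/8 brd 10.255.255.255 scope global eth0'

printf '%s\n' "$master1_out" | has_vip 10.0.0.20 && echo "master1 holds the VIP"
printf '%s\n' "$master2_out" | has_vip 10.0.0.20 || echo "master2 does not hold the VIP"

# On a live host: ip addr | has_vip 10.0.0.20
```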

Stop MySQL on master1. Per the check script, keepalived stops when MySQL stops, so the VIP moves to master2. (While MySQL is down, keepalived cannot be started successfully either.)

[root@master1 ~]# systemctl stop mysqld
[root@master1 ~]# ps -ef|grep mysql
root       4431   2423  0 15:08 pts/0    00:00:00 grep --color=auto mysql
[root@master1 ~]# ps -ef|grep keepalived
root       4433   2423  0 15:08 pts/0    00:00:00 grep --color=auto keepalived
[root@master1 ~]# ip addr |grep 10.0                 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.11/8 brd 10.255.255.255 scope global eth0

 

 

Check master2

[root@master2 ~]# ip addr |grep 10.0   
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.12/8 brd 10.255.255.255 scope global eth0
    inet 10.0.0.20/32 scope global eth0
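3) Start MySQL and keepalived on master1 again. (Note: if you restart MySQL with restart, you also have to start keepalived afterwards, because the restart makes the check script stop keepalived.)
Note: always start MySQL first and keepalived second. If keepalived is started first, the configuration above shuts it down again because MySQL is not up yet.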
[root@master1 ~]# systemctl start mysqld
[root@master1 ~]# /etc/init.d/keepalived start
Starting keepalived (via systemctl):                       [  OK  ]
[root@master1 ~]# ip addr |grep 10.0          
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 10.0.0.11/8 brd 10.255.255.255 scope global eth0
    inet 10.0.0.20/32 scope global eth0

[root@master2 ~]# ip addr |grep 10.0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 10.0.0.12/8 brd 10.255.255.255 scope global eth0

The virtual IP is now back on master1.

 During these VIP switches, clients that connect to MySQL through the VIP see almost no impact.

 

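---------------------------------Note: keepalived preemptive and non-preemptive modes---------------------------------------

keepalived runs as a daemon on Linux hosts, is based on the VRRP protocol, and performs health checks according to its configuration file.

VRRP is an election protocol that dynamically assigns responsibility for a virtual router to one of the VRRP routers on a LAN.

The VRRP router that controls the virtual router's IP addresses is called the master; it forwards the packets sent to those virtual IP addresses.

When the master becomes unavailable, the election process provides dynamic failover, which allows the virtual router's IP address to be used by end hosts as their default first-hop router.

 

keepalived elects master and backup over multicast or unicast (configurable). It works in either preemptive or non-preemptive mode (controlled by the nopreempt parameter).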

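1) Preemptive mode:

While the master works normally, the VIP sits on the master and the backup serves no traffic. When the master's priority drops below the backup's, the backup automatically takes over the VIP; from then on the backup serves traffic and the master does not.

In other words, in preemptive mode the master/backup roles do not matter; only the priority does.

 

With the configuration above, it makes no difference whether state in keepalived.conf is set to MASTER or BACKUP; only the higher priority counts (normally the node with state MASTER is given a higher priority than the BACKUP).

After recovering from a failure, the node with the higher priority automatically takes the VIP back.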

 

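2) Non-preemptive mode:

This mode is enabled with the nopreempt parameter (usually placed on the line below advert_int). Regardless of priority, the VIP moves to the BACKUP as soon as the MASTER machine fails.

When the MASTER machine recovers it does not take the VIP back; the VIP switches back only when the BACKUP machine fails.

 

Important:

nopreempt only takes effect when state is BACKUP, so set state to BACKUP on both the master and the backup node to get keepalived's non-preemptive mode.

 

In other words:

a) When one node has state MASTER and the other BACKUP, adding nopreempt or leaving it out gives the same result: priority decides who takes the VIP, which is preemptive mode.

b) When both nodes have state BACKUP and nopreempt is not configured, priority still decides who takes the VIP, which is again preemptive mode.

c) When both nodes have state BACKUP and nopreempt is configured, priority is no longer considered, which is non-preemptive mode. The other machine takes over the VIP only when the machine currently holding it fails. Even after the higher-priority machine recovers it does not take the VIP back; the VIP returns only when the other machine fails.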
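
The three cases can be condensed into one small decision function. This is only an illustration of the rules described above, not keepalived code; the function name and its arguments are hypothetical.

```shell
# Decide who holds the VIP after the higher-priority node comes back.
# $1 = configured state of the returning node (MASTER or BACKUP)
# $2 = "yes" if nopreempt is set on the returning node, otherwise "no"
# $3 = priority of the returning node, $4 = priority of the current holder
vip_after_recovery() {
    if [ "$1" = "BACKUP" ] && [ "$2" = "yes" ]; then
        echo "holder"        # non-preemptive: the current holder keeps the VIP
    elif [ "$3" -gt "$4" ]; then
        echo "returning"     # preemptive: the higher priority takes the VIP back
    else
        echo "holder"
    fi
}

vip_after_recovery MASTER yes 101 99   # case a: returning (nopreempt has no effect with state MASTER)
vip_after_recovery BACKUP no  101 99   # case b: returning
vip_after_recovery BACKUP yes 101 99   # case c: holder
```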

 

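---------------------------------Improving the MySQL check script---------------------------------

The check script above is rather crude: as soon as it sees that MySQL on the master is down, it stops keepalived to force the VIP to move.

The improved script below still moves the VIP to the backup when MySQL on the master goes down, but keepalived on the master is not forcibly killed.
When MySQL on the master recovers, the VIP is switched back again.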
[root@master ~]# cat /opt/chk_mysql.sh
#!/bin/bash
MYSQL=/usr/bin/mysql
MYSQL_HOST=localhost
MYSQL_USER=root
MYSQL_PASSWORD=1qaz@WSX
CHECK_TIME=3    # number of attempts before MySQL is declared down

# MYSQL_OK is 1 while MySQL is working and 0 when it is down
MYSQL_OK=1

# Run a trivial query and record the result in MYSQL_OK
function check_mysql_health () {
    if "$MYSQL" -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"${MYSQL_PASSWORD}" -e "show status;" >/dev/null 2>&1; then
        MYSQL_OK=1
    else
        MYSQL_OK=0
    fi
}

while [ "$CHECK_TIME" -ne 0 ]
do
    CHECK_TIME=$((CHECK_TIME - 1))
    check_mysql_health
    if [ "$MYSQL_OK" -eq 1 ]; then
        # MySQL answered: report success to keepalived
        exit 0
    fi
    if [ "$CHECK_TIME" -eq 0 ]; then
        # All attempts failed: exit non-zero so that keepalived applies the
        # vrrp_script weight and the VIP moves; keepalived itself keeps running
        exit 1
    fi
    sleep 1
done
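
The retry loop in this script is a generic pattern: try a command a few times and give up only if every attempt fails. As a sketch, it can be factored into a reusable function (retry is a name chosen for this example; the mysql invocation in the comment reuses the variables from the script):

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as one attempt succeeds; fails if all attempts fail.
retry() {
    attempts=$1; delay=$2; shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

retry 3 0 true  && echo "check passed"
retry 3 0 false || echo "check failed after 3 attempts"

# In a check script this could be used roughly like:
#   retry 3 1 "$MYSQL" -h "$MYSQL_HOST" -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" -e "show status;" >/dev/null 2>&1 || exit 1
```

The total run time of a failing check should stay in line with the interval configured in vrrp_script, so the attempt count and delay are worth tuning together with that value.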