Keepalived High Availability
What is high availability?
Typically, two machines run exactly the same service. When one machine goes down, the other takes over quickly, and users accessing the service notice nothing.
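Availability is usually quoted in "nines". As a rough illustration (the numbers below are generic, not from this setup), the yearly downtime budget follows directly from the uptime percentage:

```shell
#!/bin/sh
# Downtime budget per year: hours_down = (100 - uptime%) / 100 * 365 * 24
for uptime in 99 99.9 99.99; do
    awk -v p="$uptime" 'BEGIN {
        printf "%s%% uptime -> %.2f hours of downtime per year\n", p, (100 - p) / 100 * 365 * 24
    }'
done
```

So even "three nines" still allows almost nine hours of outage per year, which is why automatic failover matters.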
Commonly used high-availability software
- keepalived
- heartbeat
- RoseHA
How does keepalived implement high availability?
keepalived is built on VRRP, the Virtual Router Redundancy Protocol, which exists mainly to eliminate single points of failure.
Why was VRRP created, and what is the idea behind it?
Suppose a company reaches the Internet through a gateway router. If that router fails, the gateway can no longer forward packets and nobody can get online. What then?
The usual answer is to add a backup node alongside the router.
The VRRP protocol behind keepalived
VRRP, implemented in software or hardware, places a virtual MAC address (VMAC) and a virtual IP address (VIP) in front of the Master and the Backup. When a PC sends requests to the VIP, whether the Master or the Backup handles them, the PC only ever records the VMAC/VIP pair in its ARP cache.
Core keepalived high-availability concepts
- Priority: how do we decide which node is the master and which is the backup?
- Preemptive vs. non-preemptive mode: after the Master fails and the Backup takes over, will the recovered Master take the VIP back?
- Split-brain: what happens if both servers believe they are the Master?
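The priority concept follows VRRP's election rule: the router with the higher priority becomes Master, and a tie is broken by the higher IP address. A minimal sketch of that rule (a hypothetical helper, not keepalived code; the tie-break compares only the last octet for simplicity):

```shell
#!/bin/sh
# elect PRIO_A IP_A PRIO_B IP_B -> prints which node wins the election.
elect() {
    prio_a=$1; ip_a=$2; prio_b=$3; ip_b=$4
    if [ "$prio_a" -gt "$prio_b" ]; then
        echo "A"                                  # higher priority wins
    elif [ "$prio_b" -gt "$prio_a" ]; then
        echo "B"
    elif [ "${ip_a##*.}" -gt "${ip_b##*.}" ]; then
        echo "A"                                  # tie: higher IP wins (last octet, simplified)
    else
        echo "B"
    fi
}
elect 150 10.0.0.5 100 10.0.0.6   # lb01 wins on priority: prints A
elect 100 10.0.0.5 100 10.0.0.6   # tie, so the higher IP wins: prints B
```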
Deploying the keepalived high-availability software
Environment
Host | Role | Public IP | Private IP | Software |
---|---|---|---|---|
lb01 | master node | 10.0.0.5 | 172.16.1.5 | nginx, keepalived |
lb02 | backup node | 10.0.0.6 | 172.16.1.6 | nginx, keepalived |
VIP | virtual IP | 10.0.0.3 | - | - |
How keepalived works
1. Install keepalived on every machine that needs high availability.
2. The keepalived master node performs heartbeat checks (to prove whether the host or service is alive).
3. If the heartbeat check fails, it kills its own keepalived process.
4. The VIP then moves to the backup node (master and backup monitor each other).
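The detection loop in steps 2-4 can be sketched with simulated heartbeat data (a hypothetical simulation; the common rule is that the backup takes over after roughly 3 missed advertisements):

```shell
#!/bin/sh
# Simulated advertisement stream: 1 = heartbeat received, 0 = missed.
missed=0
for beat in 1 1 1 0 0 0; do
    if [ "$beat" -eq 1 ]; then
        missed=0                      # any heartbeat resets the counter
    else
        missed=$((missed + 1))
    fi
    if [ "$missed" -ge 3 ]; then
        echo "backup takes over the VIP"
        break
    fi
done
```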
Deploying keepalived
# 1. Install keepalived
[root@lb01 ~]$ yum install -y keepalived
[root@lb02 ~]$ yum install -y keepalived
# 2. Edit the main configuration file
## Configure the master
[root@lb01 ~]$ vim /etc/keepalived/keepalived.conf
global_defs {                   # global settings
    router_id lb01              # node identifier; any unique name
}
vrrp_instance VI_1 {            # VI_1 is the instance name; nodes sharing it form one cluster
    state MASTER                # role
    interface eth0              # interface the VIP is bound to
    virtual_router_id 50        # virtual router id; must match on master and backup
    priority 150                # priority; the master node gets the higher value
    advert_int 1                # advertisement (heartbeat) interval, in seconds
    authentication {            # authentication
        auth_type PASS          # authentication type
        auth_pass 1111          # authentication password
    }
    virtual_ipaddress {
        10.0.0.3                # the virtual IP (VIP)
    }
}
## Configure the backup
[root@lb02 ~]$ vim /etc/keepalived/keepalived.conf
global_defs {                   # global settings
    router_id lb02              # node identifier; any unique name
}
vrrp_instance VI_1 {            # VI_1 is the instance name; nodes sharing it form one cluster
    state BACKUP                # role
    interface eth0              # interface the VIP is bound to
    virtual_router_id 50        # virtual router id; must match the master
    priority 100                # priority (lower than the master)
    advert_int 1                # advertisement (heartbeat) interval, in seconds
    authentication {            # authentication
        auth_type PASS          # authentication type
        auth_pass 1111          # authentication password
    }
    virtual_ipaddress {
        10.0.0.3                # the virtual IP (VIP)
    }
}
# 3. Start the service and enable it at boot
[root@lb01 ~]$ systemctl start keepalived.service
[root@lb01 ~]$ systemctl enable keepalived.service
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[root@lb02 ~]$ systemctl start keepalived.service
[root@lb02 ~]$ systemctl enable keepalived.service
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
# 4. Check the IP addresses
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
# the VIP is on the master node
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4c:57c0/64 scope link
valid_lft forever preferred_lft forever
# 5. Test VIP failover: stop keepalived on the master node
[root@lb01 ~]$ systemctl stop keepalived.service
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
# the VIP has floated to the backup node
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4c:57c0/64 scope link
valid_lft forever preferred_lft forever
Differences between the master and backup configuration files
Keepalived setting | Master node | Backup node |
---|---|---|
router_id (unique identifier) | router_id lb01 | router_id lb02 |
state (role) | state MASTER | state BACKUP |
priority (election priority) | priority 150 | priority 100 |
Preemptive vs. non-preemptive keepalived
- Preemptive: when the master fails, the VIP floats to the backup; once the master recovers, it takes the VIP back, i.e. it preempts.
- Non-preemptive: when the master fails, the VIP floats to the backup; once the master recovers, the VIP stays on the backup node.
In the keepalived configuration above, the role state and priority already make the master the primary node, since it holds the higher priority.
In production, preemptive mode is the norm; non-preemptive mode is rarely used.
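The difference between the two modes boils down to a single decision after the old master recovers. A trivial sketch (a hypothetical function, not keepalived code):

```shell
#!/bin/sh
# vip_holder_after_recovery MODE -> which node holds the VIP once the old
# master is back. MODE is "preempt" (the default) or "nopreempt".
vip_holder_after_recovery() {
    if [ "$1" = "preempt" ]; then
        echo "master"    # preemptive: the higher-priority node takes the VIP back
    else
        echo "backup"    # non-preemptive: the VIP stays where it is
    fi
}
vip_holder_after_recovery preempt      # prints: master
vip_holder_after_recovery nopreempt    # prints: backup
```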
# the VIP is on the master node
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
# after stopping the keepalived service, the VIP floats to the backup node
[root@lb01 ~]$ systemctl stop keepalived.service
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4c:57c0/64 scope link
valid_lft forever preferred_lft forever
# start keepalived on the master again (simulating failure recovery)
[root@lb01 ~]$ systemctl start keepalived.service
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
# the VIP returns to the master node (preemptive mode)
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
Configuring non-preemptive mode
1. Both nodes must set state to BACKUP; if both used MASTER it would cause split-brain.
2. Both nodes must add the nopreempt option.
3. One node's priority must still be higher than the other's.
After enabling nopreempt, both servers must use the BACKUP role state; the only thing that distinguishes them is the priority.
Master configuration
vrrp_instance VI_1 {
state BACKUP
priority 150
nopreempt
}
Backup configuration
vrrp_instance VI_1 {
state BACKUP
priority 100
nopreempt
}
# 1. With the new config in place, stop keepalived on the original master; the VIP floats to the original backup
[root@lb01 ~]$ systemctl stop keepalived.service
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
# 2. Start keepalived on the original master again; the VIP is not preempted and stays on the original backup
[root@lb01 ~]$ systemctl start keepalived.service
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
Combining keepalived with nginx
Why does resolving the domain to the VIP reach nginx?
By default nginx listens on all local IP addresses. The VIP floats onto one node, effectively giving that node's nginx an extra address, so requests to the VIP land on the nginx of whichever machine currently holds it.
lb01 and lb02 proxy the backend web servers. That gives some redundancy, but if one proxy goes down we would have to manually repoint the local domain resolution to the other one, which is tedious. If instead we bind the domain to the virtual IP, then when one proxy fails the VIP automatically floats to the other, giving true proxy high availability with no manual intervention.
However, if nginx itself crashes, user requests fail while keepalived stays up, so no failover is triggered. We therefore need a script that checks whether nginx is alive and, if it is not, kills keepalived.
1. Write the nginx check script
[root@lb01 ~]$ vim /root/check.sh
#!/bin/sh
# number of running nginx processes
nginx_count=$(ps -ef | grep '[n]ginx' | wc -l)
# if the count is 0 (the nginx service has stopped) ...
if [ $nginx_count -eq 0 ];then
    # ... stop keepalived so the VIP floats to the backup node
    systemctl stop keepalived
fi
# make the script executable
[root@lb01 ~]$ chmod +x /root/check.sh
PS: Do not put "nginx" in the script's file name. If you do, the script's own process appears in the `ps` output with "nginx" in its command line, so the count can never drop to 0. In general, a script that counts processes must not carry the process name in its own name.
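The same pitfall is why the script greps for `[n]ginx` rather than `nginx`: the pattern `[n]ginx` still matches the literal text "nginx", but the grep process's own command line now contains `[n]ginx`, which the pattern does not match. A quick demonstration on two pretend `ps` lines:

```shell
#!/bin/sh
# Two pretend lines from `ps -ef`, as produced by the grep command itself:
plain='root 2312 grep nginx'
bracket='root 2312 grep [n]ginx'
c1=$(echo "$plain" | grep -c 'nginx')
c2=$(echo "$bracket" | grep -c '[n]ginx') || true   # grep exits 1 on zero matches
echo "plain pattern matches its own grep: $c1"      # 1
echo "bracket pattern matches its own grep: $c2"    # 0
```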
2. Call the script from the master node's configuration file
[root@lb01 ~]$ cat /etc/keepalived/keepalived.conf
global_defs {                   # global settings
    router_id lb01              # node identifier
}
vrrp_script check_web {
    script "/root/check.sh"     # path to the check script
    interval 5                  # run the check every 5 seconds
}
vrrp_instance VI_1 {
    state MASTER                # role
    interface eth0              # interface the VIP is bound to
    virtual_router_id 50        # virtual router id
    priority 150                # priority
    advert_int 1                # advertisement interval
    authentication {            # authentication
        auth_type PASS          # authentication type
        auth_pass 1111          # authentication password
    }
    track_script {              # the script must be tracked inside the vrrp_instance block
        check_web
    }
    virtual_ipaddress {
        10.0.0.3                # the virtual IP (VIP)
    }
}
PS: In preemptive mode it is enough to call the script from the Master's keepalived only. (Note: with non-preemptive mode, both servers need the script.)
Test: stop the nginx service
[root@lb01 ~]$ systemctl stop nginx.service
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
# the VIP floats
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4c:57c0/64 scope link
valid_lft forever preferred_lft forever
# after starting nginx and keepalived again, the VIP returns to the master node
[root@lb01 ~]$ systemctl start nginx.service
[root@lb01 ~]$ systemctl start keepalived.service
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
Keepalived split-brain
For some reason, the two keepalived servers fail to see each other's heartbeats within the expected time while both are in fact still alive; each assumes the peer's keepalived has stopped, so both claim the VIP.
Causes of split-brain
1. Network faults, such as a loose network cable on a server
2. Server hardware failure or crash
3. firewalld enabled on both master and backup, blocking VRRP advertisements
Resolving split-brain
# If split-brain occurs, simply killing one of the two nodes resolves it
# On the backup node, write a detection script: if the master is reachable by ping AND the backup still holds the VIP, assume split-brain
[root@lb02 ~]$ vim check_split_brain.sh
#!/bin/sh
# the virtual IP
vip=10.0.0.3
# the master node to check
lb01_ip=10.0.0.5
while true;do
    # ping the master; -c 2 sends two packets
    ping -c 2 $lb01_ip >/dev/null 2>&1
    if [ $? -eq 0 -a $(ip addr | grep -c "$vip") -eq 1 ];then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 5
done
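The loop above only prints a warning. The usual next step is to act on the backup node: if the master is reachable and we still hold the VIP, stop the local keepalived. The decision itself can be isolated and exercised without touching the network (a sketch; the real script would obtain the two inputs from `ping` and `ip addr` as shown above):

```shell
#!/bin/sh
# decide MASTER_REACHABLE HAVE_VIP  (1 = true, 0 = false)
decide() {
    if [ "$1" -eq 1 ] && [ "$2" -eq 1 ]; then
        echo "split-brain: stop local keepalived"   # both alive, both hold the VIP
    else
        echo "ok"
    fi
}
decide 1 1   # master pingable AND we hold the VIP -> split-brain
decide 0 1   # master down, we hold the VIP -> normal failover
decide 1 0   # master up, VIP on the master -> normal operation
```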
Using keepalived with nginx for a highly available load balancer
1. Write the nginx check script
[root@lb01 ~]$ vim /root/check.sh
#!/bin/sh
# number of running nginx processes
nginx_count=$(ps -ef | grep '[n]ginx' | wc -l)
# if the count is 0 (the nginx service has stopped) ...
if [ $nginx_count -eq 0 ];then
    # ... stop keepalived so the VIP floats to the backup node
    systemctl stop keepalived
fi
# make the script executable
[root@lb01 ~]$ chmod +x /root/check.sh
2. Configure the web site files
[root@web01 ~]$ vim /etc/nginx/conf.d/blog.wj.com.conf
server {
listen 8000;
server_name blog.wj.com;
root /code/wordpress;
location / {
index index.php index.html;
if ( -f $request_filename/index.html ){
rewrite (.*) $1/index.html break;
}
if ( -f $request_filename/index.php ){
rewrite (.*) $1/index.php;
}
if ( !-f $request_filename ){
rewrite (.*) /index.php;
}
if ($http_user_agent ~* "Wget|ApacheBench|webBench|isouSpider|MJ12bot|YoudaoBot|Tomato|bingbot/2.0|compatible"){
set $block_user_agent 1;
}
if ($block_user_agent = 1){
return 403;
}
}
location ~ \.php$ {
fastcgi_pass unix:/opt/php71w.sock;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include /etc/nginx/fastcgi_params;
fastcgi_param HTTPS on;
}
error_page 404 /404.php ;
}
--------------------------------------------------------------------------------------
[root@web02 ~]$ vim /etc/nginx/conf.d/blog.wj.com.conf
server {
listen 8000;
server_name blog.wj.com;
root /code/wordpress;
location / {
index index.php index.html;
if ( -f $request_filename/index.html ){
rewrite (.*) $1/index.html break;
}
if ( -f $request_filename/index.php ){
rewrite (.*) $1/index.php;
}
if ( !-f $request_filename ){
rewrite (.*) /index.php;
}
if ($http_user_agent ~* "Wget|ApacheBench|webBench|isouSpider|MJ12bot|YoudaoBot|Tomato|bingbot/2.0|compatible"){
set $block_user_agent 1;
}
if ($block_user_agent = 1){
return 403;
}
}
location ~ \.php$ {
fastcgi_pass unix:/opt/php71w.sock;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include /etc/nginx/fastcgi_params;
fastcgi_param HTTPS on;
}
error_page 404 /404.php ;
}
3. Configure the load balancer
[root@lb01 /etc/nginx/ssl]$ vim /etc/nginx/conf.d/blog_upstream.conf
upstream blog_wj_com {
server 172.16.1.7:8000;
server 172.16.1.8:8000;
}
server {
listen 8000;
server_name _;
rewrite (.*) https://blog.wj.com redirect;
}
server {
listen 80;
server_name blog.wj.com;
rewrite (.*) https://$server_name$request_uri redirect;
}
server {
listen 443 ssl;
server_name blog.wj.com;
ssl_certificate /etc/nginx/ssl/server.crt;
ssl_certificate_key /etc/nginx/ssl/server.key;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1440m;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
location / {
proxy_pass http://blog_wj_com;   # forward to the blog_wj_com upstream defined above
proxy_set_header Host $http_host;
}
}
----------------------------------------------------------------------------------------------------
[root@lb02 /etc/nginx/ssl]$ vim /etc/nginx/conf.d/blog_upstream.conf
upstream blog_wj_com {
server 172.16.1.7:8000;
server 172.16.1.8:8000;
}
server {
listen 8000;
server_name _;
rewrite (.*) https://blog.wj.com redirect;
}
server {
listen 80;
server_name blog.wj.com;
rewrite (.*) https://$server_name$request_uri redirect;
}
server {
listen 443 ssl;
server_name blog.wj.com;
ssl_certificate /etc/nginx/ssl/server.crt;
ssl_certificate_key /etc/nginx/ssl/server.key;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1440m;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
location / {
proxy_pass http://blog_wj_com;   # forward to the blog_wj_com upstream defined above
proxy_set_header Host $http_host;
}
}
4. Call the script from the master node's keepalived configuration file
[root@lb01 ~]$ vim /etc/keepalived/keepalived.conf
global_defs {                   # global settings
    router_id lb01              # node identifier
}
vrrp_script check_web {
    script "/root/check.sh"     # path to the check script
    interval 5                  # run the check every 5 seconds
}
vrrp_instance VI_1 {
    state MASTER                # role
    interface eth0              # interface the VIP is bound to
    virtual_router_id 50        # virtual router id
    priority 150                # priority
    advert_int 1                # advertisement interval
    authentication {            # authentication
        auth_type PASS          # authentication type
        auth_pass 1111          # authentication password
    }
    track_script {              # the script must be tracked inside the vrrp_instance block
        check_web
    }
    virtual_ipaddress {
        10.0.0.3                # the virtual IP (VIP)
    }
}
5. Add the domain resolution locally
10.0.0.3 blog.wj.com
6. Stop nginx on the master node
[root@lb01 ~]$ systemctl stop nginx.service
[root@lb01 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:5d:04:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.5/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe5d:4c0/64 scope link
valid_lft forever preferred_lft forever
# the VIP floats
[root@lb02 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4c:57:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.6/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.3/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4c:57c0/64 scope link
valid_lft forever preferred_lft forever
Finally, visit blog.wj.com to verify that the site is still reachable.