Getting Started with Nginx (Part 7): Nginx + keepalived High-Availability Cluster
-
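I. Introduction to keepalived
keepalived was originally designed specifically for the LVS load balancer, to manage and monitor the state of each service node in an LVS cluster. VRRP support, which provides high availability, was added later. As a result, besides managing LVS, Keepalived can also act as the high-availability layer for other services such as Nginx, Haproxy, and MySQL. Keepalived works in a way similar to layer 3, layer 4, and layer 7 switching.
Keepalived implements high availability through the VRRP protocol. VRRP (Virtual Router Redundancy Protocol) was created to remove the single point of failure of static routing: it keeps the network running normally when individual nodes go down. So on the one hand Keepalived can configure and manage LVS and health-check the nodes behind it, and on the other hand it can provide high availability for system network services.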
-
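II. The three functions of Keepalived
1. Managing the LVS load balancer
2. Health-checking the nodes of an LVS cluster
3. Providing high availability for system network services (the focus here)
Keepalived checks the state of the servers. If a web or MySQL server goes down or fails, Keepalived detects it and removes the failed server from the cluster. Once the server recovers, Keepalived automatically adds it back. None of this needs manual intervention; the only manual work is repairing the failed server.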
-
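III. How Keepalived works
Keepalived high-availability nodes communicate through VRRP. So what is VRRP?
(1) VRRP stands for Virtual Router Redundancy Protocol. It was created to solve the single point of failure of static routing.
(2) VRRP uses an election mechanism to decide which VRRP router takes on the routing task.
(3) VRRP uses IP multicast (default multicast address: 224.0.0.18) for communication between the members of a high-availability pair.
(4) In operation, the master sends advertisement packets and the backups receive them. When a backup stops receiving the master's packets, it starts the takeover procedure and takes over the master's resources. There can be several backups, elected by priority, but in practice a Keepalived deployment is usually a single pair.
(5) VRRP supports cryptographic protection of its packets, but the official recommendation is currently still to configure the authentication type and password in plaintext.
With VRRP covered, here is how Keepalived itself works:
The nodes of a Keepalived pair communicate through VRRP, and VRRP elects master and backup by priority. The master has the higher priority, so in normal operation it acquires all resources while the backup waits. When the master goes down, the backup takes over its resources and serves traffic in its place.
Within a Keepalived pair, only the master keeps sending VRRP multicast advertisements, which tell the backup that the master is still alive. While they keep arriving, the backup does not take over. When the master becomes unavailable, that is, when the backup no longer hears the master's advertisements, the backup starts the relevant services and takes over the resources to keep the business running. At best, takeover completes in under 1 second.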
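To make the takeover timing concrete, here is a minimal sketch of the backup-side timers as defined for VRRPv2 in RFC 3768: a backup waits `3 * advert_int + skew` seconds of silence before declaring the master down, where `skew = (256 - priority) / 256`, and only the skew time if the master announces a graceful shutdown with a priority-0 advertisement. The function names are illustrative and not part of keepalived, and keepalived's real timers may differ in detail.

```shell
#!/bin/bash
# Sketch of the VRRPv2 (RFC 3768) backup timers.
# skew_time and master_down_interval are hypothetical helper names.

# skew_time PRIORITY -> seconds; a higher priority gives a shorter skew,
# so the best backup reacts first.
skew_time() {
    LC_ALL=C awk -v p="$1" 'BEGIN { printf "%.3f\n", (256 - p) / 256 }'
}

# master_down_interval ADVERT_INT PRIORITY -> seconds a backup waits without
# hearing an advertisement before it declares the master down.
master_down_interval() {
    LC_ALL=C awk -v a="$1" -v p="$2" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

echo "silent master, backup priority 100: $(master_down_interval 1 100)s"
echo "graceful stop, backup priority 100: $(skew_time 100)s"
```

Under these assumptions, the sub-second takeover corresponds to the graceful case (for example `systemctl stop keepalived`), while a master that simply disappears is detected after roughly three advertisement intervals.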
-
IV. Deploying the Keepalived high-availability service
1. Environment
Hostname | IP | Role |
lb01 | 192.168.56.12 | keepalived MASTER |
lb02 | 192.168.56.13 | keepalived BACKUP |
2. Deploying Keepalived
(1) Install keepalived
[root@lb01 ~]# yum install -y keepalived
[root@lb02 ~]# yum install -y keepalived
[root@lb01 ~]# rpm -qa keepalived
keepalived-1.3.5-6.el7.x86_64
[root@lb02 ~]# rpm -qa keepalived
keepalived-1.3.5-6.el7.x86_64
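(2) The high-availability part of keepalived.conf, annotated
[root@lb01 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
    # E-mail addresses notified on service failure; several may be listed (optional)
    notification_email {
        acassen@firewall.loc
        failover@firewall.loc
        sysadmin@firewall.loc
    }
    # Sender address of the notification mail (optional)
    notification_email_from Alexandre.Cassen@firewall.loc
    # SMTP server used to send mail; if sendmail or postfix runs on this host,
    # the default address can be used (optional)
    smtp_server 192.168.200.1
    # SMTP connection timeout (optional)
    smtp_connect_timeout 30
    # Router identifier of this Keepalived server; unique within the LAN
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}

# VRRP instance block defining an instance named VI_1. Each vrrp_instance can be
# seen as one Keepalived service instance or one business service. Every instance
# on the master must also exist on the backup, otherwise failover cannot happen.
vrrp_instance VI_1 {
    # Initial role: only MASTER or BACKUP, written in upper case
    state MASTER
    # Network interface Keepalived uses
    interface eth0
    # Virtual router ID: a number, unique per instance. MASTER and BACKUP must
    # use the same ID for the same instance, otherwise split-brain occurs.
    virtual_router_id 51
    # Priority: the larger the number, the higher the priority. A gap of 50 or
    # more between MASTER and BACKUP is recommended.
    priority 100
    # Advertisement interval between MASTER and BACKUP, in seconds (default 1)
    advert_int 1
    # Authentication settings
    authentication {
        # Two types exist, PASS and AH; PASS is officially recommended. The
        # password is at most 8 characters, and MASTER and BACKUP of the same
        # instance must share it to communicate.
        auth_type PASS
        # Authentication password
        auth_pass 1111
    }
    # Virtual IP addresses, one per line. It is best to state the netmask and
    # the interface the VIP binds to explicitly.
    virtual_ipaddress {
        192.168.200.16
        192.168.200.17
        192.168.200.18
    }
}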
3. Single-instance Keepalived demo
(1) Configure the Keepalived master, lb01 (MASTER)
[root@lb01 keepalived]# cp keepalived.conf keepalived.conf.bak
[root@lb01 keepalived]# vim keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb01
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}
[root@lb01 keepalived]# systemctl start keepalived    # start keepalived once configured
[root@lb01 keepalived]# ip addr |grep 192.168.56.20    # check that the configured VIP 192.168.56.20 is present
    inet 192.168.56.20/24 scope global secondary eth0:1
(2) Configure the Keepalived backup, lb02 (BACKUP)
[root@lb02 keepalived]# cp keepalived.conf keepalived.conf.bak
[root@lb02 keepalived]# vim keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb02
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 55
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}
[root@lb02 keepalived]# systemctl start keepalived    # start keepalived once configured
[root@lb02 keepalived]# ip addr |grep 192.168.56.20    # the backup should NOT hold the VIP; any output here means split-brain
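Since a mismatched `virtual_router_id` or password between the two files leads to split-brain, it can help to compare them before starting the service. The following is only a rough sketch: `conf_value` and `check_pair` are hypothetical helper names, and the parser assumes simple `key value` lines with a single `vrrp_instance` per file.

```shell
#!/bin/bash
# Sketch: compare the VRRP settings of a MASTER and a BACKUP keepalived.conf.
# conf_value and check_pair are hypothetical helpers, not keepalived tools.

# conf_value FILE KEY -> first value found for KEY in FILE
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_pair MASTER_CONF BACKUP_CONF -> prints each problem, returns 1 if any
check_pair() {
    local rc=0 key mp bp
    # These settings must be identical on both nodes of the same instance.
    for key in virtual_router_id auth_type auth_pass advert_int; do
        if [ "$(conf_value "$1" "$key")" != "$(conf_value "$2" "$key")" ]; then
            echo "mismatch: $key"
            rc=1
        fi
    done
    # The master is expected to have the strictly higher priority.
    mp=$(conf_value "$1" priority)
    bp=$(conf_value "$2" priority)
    if [ "${mp:-0}" -le "${bp:-0}" ]; then
        echo "master priority is not higher than backup priority"
        rc=1
    fi
    return "$rc"
}
```

With both files copied to one host, it could be used as `check_pair lb01.conf lb02.conf && echo "pair looks consistent"`.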
(3) Master/backup failover test
(1) Stop keepalived on the master, then check the VIP on lb01 and lb02
[root@lb01 keepalived]# systemctl stop keepalived    # stop keepalived on the master
[root@lb01 keepalived]# ip addr |grep 192.168.56.20    # with keepalived stopped, lb01 no longer holds VIP 192.168.56.20
[root@lb02 keepalived]# ip addr |grep 192.168.56.20    # lb02 now holds VIP 192.168.56.20: the VIP has floated over
    inet 192.168.56.20/24 scope global secondary eth0:1
(2) Start keepalived on the master again, then check the VIP on lb01 and lb02
[root@lb01 keepalived]# systemctl start keepalived    # start keepalived on lb01 again
[root@lb01 keepalived]# ip addr |grep 192.168.56.20    # the VIP has moved back to lb01
    inet 192.168.56.20/24 scope global secondary eth0:1
[root@lb02 keepalived]# ip addr |grep 192.168.56.20    # lb02 no longer holds the VIP
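One caveat about checks of the form `ip addr | grep 192.168.56.20`: an unanchored pattern would also match an address such as 192.168.56.200, and the dots match any character. A slightly stricter check might look like the sketch below; `has_vip` is a hypothetical helper name and reads `ip addr` output from standard input.

```shell
#!/bin/bash
# Sketch: decide from `ip addr` output whether this node currently holds a VIP.
# has_vip is a hypothetical helper, not part of keepalived or iproute2.

# has_vip VIP -> exit status 0 if an "inet VIP/" entry appears on stdin.
# grep -F treats the pattern as a literal string, and the trailing "/"
# keeps 192.168.56.20 from also matching 192.168.56.200.
has_vip() {
    grep -qF "inet $1/"
}
```

For example, `ip addr | has_vip 192.168.56.20 && echo "VIP is on this node"` can be run on lb01 and lb02 during the failover test.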
4. Dual-instance, dual-master Keepalived demo
(1) Edit the main configuration file on lb01 and lb02 and add a second instance, VI_2
[root@lb01 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb01
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}

# Added VRRP instance VI_2
vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 56
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        # The virtual IP is 192.168.56.30
        192.168.56.30/24 dev eth0 label eth0:2
    }
}
[root@lb02 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb02
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 55
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}

# Added VRRP instance VI_2
vrrp_instance VI_2 {
    state MASTER
    interface eth0
    virtual_router_id 56
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        # The virtual IP is 192.168.56.30
        192.168.56.30/24 dev eth0 label eth0:2
    }
}
(2) Restart Keepalived on lb01 and lb02 and observe the initial VIP placement
[root@lb01 keepalived]# systemctl restart keepalived
[root@lb01 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
    inet 192.168.56.20/24 scope global secondary eth0:1
[root@lb02 keepalived]# systemctl restart keepalived
[root@lb02 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
    inet 192.168.56.30/24 scope global secondary eth0:2
After Keepalived starts on lb01, the node initially brings up the VIP 192.168.56.20, so the VIP configured in instance VI_1 serves traffic there.
After Keepalived starts on lb02, the node initially brings up the VIP 192.168.56.30, so the VIP configured in instance VI_2 serves traffic there.
(3) High-availability failover test
[root@lb01 keepalived]# systemctl stop keepalived    # stop keepalived on lb01
[root@lb01 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"    # lb01 holds no VIP
[root@lb02 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"    # lb02 holds both VIPs
    inet 192.168.56.30/24 scope global secondary eth0:2
    inet 192.168.56.20/24 scope global secondary eth0:1
[root@lb01 keepalived]# systemctl start keepalived    # start keepalived on lb01 again
[root@lb01 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"    # VIP 192.168.56.20 has floated back
    inet 192.168.56.20/24 scope global secondary eth0:1
The same test in the other direction: stop keepalived on lb02 and check the VIPs
[root@lb02 keepalived]# systemctl stop keepalived
[root@lb02 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
[root@lb01 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
    inet 192.168.56.20/24 scope global secondary eth0:1
    inet 192.168.56.30/24 scope global secondary eth0:2
[root@lb02 keepalived]# systemctl start keepalived
[root@lb02 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
    inet 192.168.56.30/24 scope global secondary eth0:2
[root@lb01 keepalived]# ip addr |egrep "192.168.56.20|192.168.56.30"
    inet 192.168.56.20/24 scope global secondary eth0:1
-
V. Adding Keepalived to Nginx load balancing
1. Environment:
Hostname | IP | Role |
lb01 | 192.168.56.12 | Nginx+Keepalived(MASTER) |
lb02 | 192.168.56.13 | Nginx+Keepalived(BACKUP) |
web01 | 192.168.56.11 | web01 service --> Nginx |
web02 | 192.168.0.130 | web02 service --> Nginx |
2. Configure web01 and web02
[root@web01 vhosts]# cat www.abc.org.conf
server {
    listen 80;
    server_name 192.168.56.11;
    root /vhosts/html/www;
    index index.html index.htm index.php;
}
[root@web02 vhosts]# cat www.abc.org.conf
server {
    listen 8080;
    server_name 192.168.0.130;
    root /vhosts/html/www;
    index index.html index.htm index.php;
}
Test the home pages of web01 and web02; their content differs so the two can be told apart
[root@localhost vhosts]# curl 192.168.56.11
welcome to 192.168.56.11
[root@localhost vhosts]# curl 192.168.0.130:8080
welcome to use 192.168.0.130
3. Configure Nginx load balancing on lb01 and lb02
[root@lb01 keepalived]# cat /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    include /etc/nginx/conf.d/*.conf;

    # Pool of backend web servers, equal weights
    upstream web_server_pool {
        server 192.168.56.11:80 weight=1;
        server 192.168.0.130:8080 weight=1;
    }

    server {
        listen 80;
        # server_name must be set to the VIP address
        server_name 192.168.56.20;
        location / {
            proxy_pass http://web_server_pool;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
    }
}
4. Configure Keepalived on lb01 and lb02
[root@lb01 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb01
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}
[root@lb02 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb02
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 55
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24 dev eth0 label eth0:1
    }
}
Note where the Keepalived configurations of lb01 and lb02 differ: router_id, state, and priority.
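5. Access test
Open http://192.168.56.20 directly. Refreshing the page returns the two different results in turn, which shows that Nginx load balancing works (see figure):
Next, stop keepalived on lb01 and check whether the site stays reachable
[root@lb01 keepalived]# systemctl stop keepalived
[root@lb01 keepalived]# ip addr |grep "192.168.56.20"
[root@lb02 keepalived]# ip addr |grep "192.168.56.20"    # with keepalived stopped on lb01, the VIP is on lb02
    inet 192.168.56.20/24 scope global secondary eth0:1
Open http://192.168.56.20 again: the same results are still served, so Keepalived high availability works (see figure):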
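The same access test can be done from the command line instead of refreshing a browser. The sketch below counts how often each distinct response body appears; `tally_backends` is a hypothetical helper name, and it assumes each backend answers with a single distinguishing line, as the two test pages in this setup do.

```shell
#!/bin/bash
# Sketch: count identical response bodies read from stdin, one per line.
# tally_backends is a hypothetical helper name.
tally_backends() {
    # uniq -c prefixes each distinct line with its count; awk rewrites that
    # as "COUNT<TAB>BODY" so the result is easy to read or parse.
    sort | uniq -c | awk '{ n = $1; $1 = ""; sub(/^ /, ""); print n "\t" $0 }'
}
```

A possible invocation is `for i in $(seq 10); do curl -s http://192.168.56.20/; done | tally_backends`. With two equally weighted backends, the counts should come out roughly even, both before and after the VIP moves to lb02.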
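6. Solving the Nginx health-check problem
The steps so far give a working Nginx reverse proxy with load balancing, plus Keepalived high availability. By default, however, Keepalived only takes over when the peer machine goes down or the peer's Keepalived service stops. In real operation it also happens that Nginx dies on one load balancer while Keepalived keeps running, and then users hitting the VIP 192.168.56.20 find no service behind it. To see this, stop Nginx on lb01 and check access again.
(1) First run an access test; everything works
[root@localhost vhosts]# curl 192.168.56.20
welcome to 192.168.56.110
[root@localhost vhosts]# curl 192.168.56.20
welcome to use 192.168.0.130
(2) Stop nginx on lb01; the VIP is still on lb01
[root@lb01 keepalived]# systemctl stop nginx
[root@lb01 keepalived]# ip addr |grep "192.168.56.20"
    inet 192.168.56.20/24 scope global secondary eth0:1
(3) Test access again; the connection is refused
[root@localhost vhosts]# curl 192.168.56.20
curl: (7) Failed connect to 192.168.56.20:80; Connection refused
So how can the IP be made to float to the backup node when the business service itself goes down? This is what a Keepalived check script is for. Start with a script like this: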
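#!/bin/bash
# Timestamp used in the log entry
d=`date --date today +%Y%m%d_%H:%M:%S`
# Number of running nginx processes
counter=$(ps -C nginx --no-heading |wc -l)
if [ "${counter}" = "0" ]; then
    # nginx is down: try to bring it back first
    systemctl start nginx.service
    sleep 2
    counter=$(ps -C nginx --no-heading|wc -l)
    if [ "${counter}" = "0" ]; then
        # Restart failed: stop keepalived so the VIP fails over
        echo "$d nginx was down.keepalived will stop." >> /var/log/check_ng.log
        systemctl stop keepalived
    fi
fi
When this script finds zero nginx processes, it restarts nginx and counts the processes again. If the count is still zero, it stops the keepalived service, which triggers the high-availability failover. For this experiment, to make the effect visible, the script below is used instead: it stops keepalived as soon as the nginx process count is zero.
The script must exist on both lb01 and lb02, at the path /etc/keepalived/check_nginx.sh
[root@lb01 keepalived]# cat check_nginx.sh
#!/bin/bash
# Timestamp used in the log entry
d=`date --date today +%Y%m%d_%H:%M:%S`
# Number of running nginx processes
counter=$(ps -C nginx --no-heading|wc -l)
if [ $counter -eq 0 ]; then
    # nginx is down: log it and stop keepalived immediately
    echo "$d nginx was down.keepalived will stop." >> /var/log/check_ng.log
    systemctl stop keepalived
fi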
Then edit keepalived.conf on lb01 and lb02 and add the script block:
[root@lb01 keepalived]# cat keepalived.conf
! Configuration File for keepalived

global_defs {
    notification_email {
        123456@qq.com
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id lb01
}

# Define a VRRP script that checks the nginx process. Mind the space before
# "{": without it the script is never executed.
vrrp_script chk_nginx {
    # Script to run; it stops Keepalived when Nginx has a problem
    script "/etc/keepalived/check_nginx.sh"
    # Check interval: 2 seconds
    interval 2
    weight 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 55
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.56.20/24
    }
    track_script {
        # Enable the chk_nginx script for instance VI_1
        chk_nginx
    }
}
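The `weight` value in the `vrrp_script` block deserves a short note. As I understand keepalived's documented behaviour, a positive weight is added to the node's priority while the script exits 0, and a negative weight is subtracted while the script fails. The sketch below models only that rule; `effective_priority` is a hypothetical helper, and the clamping to the range 1..254 is an assumption about how keepalived bounds the result.

```shell
#!/bin/bash
# Sketch: effect of a tracked script's weight on a node's effective priority.
# effective_priority is a hypothetical helper, not a keepalived command.

# effective_priority BASE_PRIORITY WEIGHT SCRIPT_EXIT_CODE
effective_priority() {
    local base=$1 weight=$2 rc=$3 prio=$1
    if [ "$weight" -gt 0 ] && [ "$rc" -eq 0 ]; then
        prio=$((base + weight))    # positive weight: bonus while the check passes
    elif [ "$weight" -lt 0 ] && [ "$rc" -ne 0 ]; then
        prio=$((base + weight))    # negative weight: penalty while the check fails
    fi
    # Assumed bounds for a non-owner VRRP router
    if [ "$prio" -lt 1 ]; then prio=1; fi
    if [ "$prio" -gt 254 ]; then prio=254; fi
    echo "$prio"
}

echo "master 150, weight 2, check failing:   $(effective_priority 150 2 1)"
echo "master 150, weight -60, check failing: $(effective_priority 150 -60 1)"
```

Under this model, `weight 2` alone would never push the master (150) below the backup (100), which is why the check script here stops keepalived outright. A negative weight larger than the priority gap, such as -60, would be an alternative way to trigger failover while leaving keepalived running.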
The test procedure and results follow:
(1) On lb01, check the keepalived VIP and processes, and the nginx port
[root@lb01 keepalived]# !ip
ip addr |grep "192.168.56.20"
    inet 192.168.56.20/24 scope global secondary eth0
[root@lb01 keepalived]# netstat -tulnp |grep nginx
tcp        0      0 0.0.0.0:80        0.0.0.0:*        LISTEN      7624/nginx: master
[root@lb01 keepalived]# ps -ef |grep keepalived
root      7633     1  0 23:49 ?        00:00:00 /usr/sbin/keepalived -D
root      7634  7633  0 23:49 ?        00:00:00 /usr/sbin/keepalived -D
root      7635  7633  0 23:49 ?        00:00:00 /usr/sbin/keepalived -D
(2) Simulate an Nginx failure: stop Nginx, then check the same items as in (1)
[root@lb01 keepalived]# systemctl stop nginx
[root@lb01 keepalived]# !nets
netstat -tulnp |grep nginx
[root@lb01 keepalived]# !ip
ip addr |grep "192.168.56.20"
[root@lb01 keepalived]# ps -ef |grep keepalived
root      7881  5009  0 23:51 pts/1    00:00:00 grep --color=auto keepalived
(3) On lb02, check that the VIP is present and verify that the web service is reachable
[root@lb02 keepalived]# !ip
ip addr |grep "192.168.56.20"
    inet 192.168.56.20/24 scope global secondary eth0
[root@localhost vhosts]# curl 192.168.56.20
welcome to use 192.168.0.130
[root@localhost vhosts]# curl 192.168.56.20
welcome to 192.168.56.110
With this check script in place, Nginx + Keepalived fails over when Nginx itself fails, not only when the node or keepalived goes down.
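7. A script to detect Keepalived split-brain
To guard against split-brain in the high-availability setup, a monitoring script can also run on the backup server: if the master can be pinged and the backup nevertheless holds the VIP, it raises a warning.
(1) Write the monitoring script on lb02 and run it
[root@lb02 keepalived]# cat check_split_brain.sh
#!/bin/bash
lb01_vip="192.168.56.20"
lb01_ip="192.168.56.12"
while true
do
    # Is the master reachable?
    ping -c 2 -W 3 $lb01_ip &>/dev/null
    # Master reachable AND the VIP is on this (backup) node: warn
    if [ $? -eq 0 -a `ip add|grep "$lb01_vip"|wc -l` -eq 1 ]
    then
        echo "ha is split brain.warning."
    else
        echo "ha is ok."
    fi
    sleep 3
done
[root@lb02 keepalived]# sh check_split_brain.sh
ha is ok.
ha is ok.
ha is ok.
In the normal case the master is alive and the VIP 192.168.56.20 is on the master, so no warning is raised and the script prints: ha is ok
(2) Simulate split-brain: stop Keepalived on the master lb01 and watch the script output on lb02. lb01 stays pingable while the VIP moves to lb02, which is exactly the condition the script tests for.
[root@lb01 keepalived]# systemctl stop keepalived
[root@lb02 keepalived]# sh check_split_brain.sh
ha is ok.
ha is split brain.warning.
ha is split brain.warning.
As shown above, the script reports a split-brain warning. It can then be hooked into a Zabbix monitoring service to raise split-brain alerts.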
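For use from a monitoring system, the decision inside the loop can be pulled out into a small function that takes its two inputs as arguments. This is only a sketch: `ha_state` is a hypothetical name, and it uses "at least one match" rather than "exactly one" for the VIP count.

```shell
#!/bin/bash
# Sketch: the split-brain decision as a pure function.
# ha_state is a hypothetical helper name.

# ha_state PING_EXIT_CODE VIP_COUNT
#   PING_EXIT_CODE: exit status of pinging the master (0 = reachable)
#   VIP_COUNT:      number of lines in local `ip addr` output carrying the VIP
ha_state() {
    if [ "$1" -eq 0 ] && [ "$2" -ge 1 ]; then
        echo "split-brain suspected"
    else
        echo "ok"
    fi
}

# Possible wiring on lb02, using the addresses from this setup:
#   ping -c 2 -W 3 192.168.56.12 >/dev/null 2>&1
#   ha_state $? "$(ip addr | grep -cF 'inet 192.168.56.20/')"
```

Note that this condition cannot tell a real split-brain from an ordinary failover in which only keepalived (or Nginx) on the master has stopped, so an alert built on it is best read as "the backup holds the VIP while the master host is up".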