Keepalived + NFS for high availability in production
by oldlai
1. Servers inevitably suffer unexpected outages. If the NFS server fails, the directory mounted by the clients becomes unavailable, and if that directory serves static assets to users, the front end can no longer reach them. We cannot predict which server will fail, and if clients mount one server's IP directly there is no way to switch when that server dies, so failover is a pain point that has to be solved.
2. This is where keepalived comes in. It provides a virtual IP (VIP), and the clients simply mount that VIP. The VIP is first bound to the master server; if NFS on the master dies, or the master itself goes down, the VIP floats to the backup server, while the clients keep mounting the same VIP.
3. The concept diagram below illustrates the normal state: the client (A) mounts the VIP, which is bound to master server B.
Once server B goes down, the VIP floats to server C, as shown below:
4. Having laid out the pain point of an NFS outage, the rest of this article works through the solution.
Implementation
Production server layout
| Server | IP address | NIC | Notes |
|---|---|---|---|
| keepalived (VIP) | 10.0.0.3 | eth0 | virtual IP, floats between B and C |
| master (B) | 10.0.0.61 | eth0 | |
| NFS client (A) | 10.0.0.120 | eth0 | |
| backup (C) | 10.0.0.62 | eth0 | |
1. Install keepalived on both back-end servers B and C
Run the following on both server B and server C:
root@master:~ # yum install keepalived -y
[root@backup ~]# yum install keepalived -y
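If the servers may be rebooted, it also makes sense to enable the services at boot so failover still works after a restart. A minimal sketch, assuming CentOS 7 unit names (the NFS server unit is nfs-server; the article starts it via its nfs alias):

systemctl enable rpcbind nfs-server keepalived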
2. Back up the original configuration file, then edit it:
# keepalived configuration on the master
root@master:~ # cd /etc/keepalived/
root@master:/etc/keepalived # pwd
/etc/keepalived
root@master:/etc/keepalived # cat keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from 1026044760@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script check_nfs {
    script "/data/sh/check_nfs.sh"
    interval 2
    weight -20
}

# VIP1
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3/24 label eth0:0
    }
    track_script {
        check_nfs
    }
}
# keepalived configuration on the backup server
[root@backup keepalived]# cat keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from 1026044760@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script check_nfs {
    script "/data/sh/check_nfs.sh"
    interval 2
    weight -20
}

# VIP1
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 80
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3/24 label eth0:0
    }
    track_script {
        check_nfs
    }
}
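The backup configuration is identical to the master's except for two lines; the master advertises the higher priority, so it holds the VIP whenever both nodes are healthy:

    state    MASTER    ->    state    BACKUP
    priority 90        ->    priority 80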
3. On both master and backup, create the script that checks whether NFS is still running
#master
root@master:/data/sh # cat check_nfs.sh
#!/bin/bash
#by oldlai
##############
# If no nfsd process is left, stop keepalived so the VIP fails over to the backup
killall -0 nfsd
if [ $? -ne 0 ];then
    systemctl stop keepalived
fi
#backup
[root@backup keepalived]# cat /data/sh/check_nfs.sh
#!/bin/bash
#by oldlai
##############
killall -0 nfsd
if [ $? -ne 0 ];then
    systemctl stop keepalived
fi
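On both servers, make the script executable (chmod +x /data/sh/check_nfs.sh), otherwise keepalived cannot run it. Optionally, the check could also try to restart NFS once before giving up the VIP; the following is a minimal sketch of such a variant, not part of the original setup:

#!/bin/bash
# Hypothetical variant of check_nfs.sh: try one NFS restart before failing over
if ! killall -0 nfsd &> /dev/null; then
    systemctl restart nfs               # attempt to bring NFS back once
    if ! killall -0 nfsd &> /dev/null; then
        systemctl stop keepalived       # still down: release the VIP to the backup
    fi
fi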
4. Start the rpcbind, nfs, and keepalived services:
# master node
root@master:~ # systemctl start rpcbind
root@master:~ # systemctl start nfs
root@master:~ # systemctl start keepalived
# backup node
[root@backup ~]# systemctl start rpcbind
[root@backup ~]# systemctl start nfs
[root@backup ~]# systemctl start keepalived
Note: start these three services in exactly this order: rpcbind, then nfs, then keepalived. If keepalived came up before nfs, the check script would find no nfsd process and stop keepalived right away.
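The article assumes the NFS export itself is already configured, identically, on both B and C; the client later mounts 10.0.0.3:/data/lutixia. A minimal sketch of what that export might look like (the 10.0.0.0/24 client range and the rw,sync options are assumptions):

# /etc/exports on both master and backup (contents must match)
/data/lutixia 10.0.0.0/24(rw,sync)

After editing, reload the export table with exportfs -r. Keeping the data under /data/lutixia in sync between the two servers (for example with rsync) is also required, but is outside the scope of this article.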
5. Check the virtual IP on the master server
root@master:~ # ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.61 netmask 255.255.255.0 broadcast 10.0.0.255
ether 00:0c:29:1c:a0:df txqueuelen 1000 (Ethernet)
RX packets 399134 bytes 122155578 (116.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 311571 bytes 86941574 (82.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.3 netmask 255.255.255.0 broadcast 0.0.0.0
ether 00:0c:29:1c:a0:df txqueuelen 1000 (Ethernet)
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.61 netmask 255.255.255.0 broadcast 172.16.1.255
inet6 fe80::20c:29ff:fe1c:a0e9 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:1c:a0:e9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16 bytes 1286 (1.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 49388 bytes 4096178 (3.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49388 bytes 4096178 (3.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
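The same check can be done with the ip command; keepalived adds the VIP as a secondary address labelled eth0:0, so the following should list 10.0.0.3 while the master holds it:

ip addr show eth0 | grep 10.0.0.3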
6. Stop the NFS service on the master and check the VIP again; it floats to the backup server:
root@master:~ # systemctl stop nfs
root@master:~ # ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.61 netmask 255.255.255.0 broadcast 10.0.0.255
ether 00:0c:29:1c:a0:df txqueuelen 1000 (Ethernet)
RX packets 447446 bytes 132536968 (126.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 352379 bytes 99609300 (94.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.61 netmask 255.255.255.0 broadcast 172.16.1.255
inet6 fe80::20c:29ff:fe1c:a0e9 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:1c:a0:e9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16 bytes 1286 (1.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 49438 bytes 4100328 (3.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49438 bytes 4100328 (3.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
IP addresses on the backup server (the check script has stopped keepalived on the master, so the backup, now holding the highest remaining priority, takes over the VIP):
[root@backup ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.62 netmask 255.255.255.0 broadcast 10.0.0.255
ether 00:0c:29:9d:c8:07 txqueuelen 1000 (Ethernet)
RX packets 143391 bytes 108149745 (103.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 97770 bytes 13936087 (13.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.3 netmask 255.255.255.0 broadcast 0.0.0.0
ether 00:0c:29:9d:c8:07 txqueuelen 1000 (Ethernet)
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.16.1.62 netmask 255.255.255.0 broadcast 172.16.1.255
inet6 fe80::20c:29ff:fe9d:c811 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:9d:c8:11 txqueuelen 1000 (Ethernet)
RX packets 6 bytes 360 (360.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 13 bytes 1016 (1016.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 20 bytes 1476 (1.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20 bytes 1476 (1.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Note: at this point the VIP fails over correctly; all that remains is to mount the VIP on the NFS client.
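Before mounting, the client can also confirm that the export is visible through the VIP (the export path matches the mount command below):

showmount -e 10.0.0.3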
7. Mount the directory on the client, then create the following script:
Mount:
mount -t nfs -o soft,timeo=10 10.0.0.3:/data/lutixia /mnt/nfs
A soft mount is recommended (the default is a hard mount); with a soft mount, the client does not block indefinitely when the server goes down.
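To make the mount survive client reboots, an equivalent /etc/fstab entry could look like this sketch (the _netdev option, which delays the mount until the network is up, is an addition here):

10.0.0.3:/data/lutixia  /mnt/nfs  nfs  soft,timeo=10,_netdev  0  0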
Client-side check script:
[root@khd ~]# cat check_nfs.sh
#!/bin/bash
#by oldlai
###############
# Every second, test whether the mount point is still readable;
# if not, lazily unmount it and remount it via the VIP
while true;do
    ls /mnt/nfs &> /dev/null
    if [ $? -ne 0 ];then
        umount -l /mnt/nfs && mount -t nfs -o soft,timeo=10 10.0.0.3:/data/lutixia /mnt/nfs
    fi
    sleep 1
done
Note: if the client already has the directory mounted and the server it was talking to goes down, access still fails even after the VIP switches, because the stale mount connection is still there. The mount therefore has to be unmounted and remounted once; the script above checks for this every second.
8. Configure a cron job on the client
# client
[root@khd ~]# crontab -l
* * * * * /usr/bin/bash /root/check_nfs.sh >/dev/null
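Because check_nfs.sh loops forever, a cron entry like the one above would start a fresh copy every minute and the copies would pile up. A minimal sketch of a guarded entry using flock, so only one instance runs at a time (the lock file path /tmp/check_nfs.lock is an assumption):

* * * * * /usr/bin/flock -n /tmp/check_nfs.lock /usr/bin/bash /root/check_nfs.sh >/dev/null 2>&1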