
Using keepalived with NFS for high availability in production

by oldlai

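1. Servers will inevitably go down unexpectedly. If the NFS server fails, the directory mounted on the clients becomes unusable, and if that directory serves static assets to users, the front end can no longer load them. We cannot predict which server will fail, and if clients mount one server's IP directly, switching them over when that server dies is another pain point to solve.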

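2. This is where keepalived comes in. It provides a virtual IP (VIP), and clients only ever mount that IP. The VIP is first bound to the master server; if NFS on the master stops, or the master itself goes down, the VIP floats to the backup server, while the clients keep mounting the same VIP.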

3. The idea is easiest to see in a diagram:

[Figure: NFS concept diagram]

Once server B goes down, the VIP floats over to server C.

4. With the pain point of a failed NFS server laid out, here is how to solve it.

Implementation

Production server layout

Role	IP address	NIC
VIP (managed by keepalived)	10.0.0.3	eth0
master (B)	10.0.0.61	eth0
NFS client (A)	10.0.0.120	eth0
backup (C)	10.0.0.62	eth0

1. Deploy keepalived on both back-end servers, B and C

Run the following command on both B and C:
root@master:~ # yum install keepalived -y
[root@backup ~]# yum install keepalived -y

2. Back up the original config file (for example cp keepalived.conf keepalived.conf.bak), then edit it:

# keepalived config file on master
root@master:~ # cd /etc/keepalived/
root@master:/etc/keepalived # pwd
/etc/keepalived
root@master:/etc/keepalived # cat keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from 1026044760@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

# Health check: run every 2 s; on failure, lower this node's priority by 20
vrrp_script check_nfs {
    script "/data/sh/check_nfs.sh"
    interval 2
    weight -20
}

# VIP1
# virtual_router_id and auth_pass must match on master and backup;
# master's priority (90) must be higher than backup's (80)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 90
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3/24 label eth0:0
    }
    track_script {
        check_nfs
    }
}
# keepalived config file on the backup server
[root@backup keepalived]# cat keepalived.conf 
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from 1026044760@qq.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script check_nfs {
    script "/data/sh/check_nfs.sh"
    interval 2
    weight -20
}

# VIP1
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 80
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.3/24 label eth0:0
    }
    track_script {
        check_nfs
    }
}
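The failover in these two configs comes down to VRRP priority: the node advertising the higher priority holds 10.0.0.3. The bash sketch below only models that arithmetic, assuming keepalived's documented vrrp_script rule (a negative weight is applied when the check fails, a positive weight when it succeeds). The functions effective_priority and vip_owner are hypothetical and do not talk to keepalived.

```bash
#!/bin/bash
# Hypothetical helpers: model how a vrrp_script weight changes VRRP priority.

# effective_priority BASE WEIGHT CHECK_OK (CHECK_OK: 1 = check passed, 0 = failed)
# Negative weight: added to BASE when the check fails.
# Positive weight: added to BASE when the check passes.
effective_priority() {
    local base=$1 weight=$2 check_ok=$3
    if (( (weight > 0 && check_ok == 1) || (weight < 0 && check_ok == 0) )); then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# vip_owner MASTER_PRIO BACKUP_PRIO: print which node should hold the VIP.
vip_owner() {
    if (( $1 > $2 )); then echo master
    elif (( $1 < $2 )); then echo backup
    else echo tie   # real VRRP breaks ties by comparing IP addresses
    fi
}

# Values from the configs above: master 90, backup 80, weight -20.
m=$(effective_priority 90 -20 0)   # nfsd down on master
b=$(effective_priority 80 -20 1)   # backup healthy
echo "master=$m backup=$b owner=$(vip_owner "$m" "$b")"
```

Running it prints master=70 backup=80 owner=backup, so the weight alone would already move the VIP; in this article's setup the check script additionally stops keepalived on master, which releases the VIP as well.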

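3. On both master and backup, create the script that checks whether NFS is running, and make it executable (chmod +x /data/sh/check_nfs.sh) so keepalived can run it: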

#master
root@master:/data/sh # cat check_nfs.sh 
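#!/bin/bash
#by oldlai
##############
# killall -0 sends no signal; it only tests whether an nfsd process exists.
# If nfsd is gone, stop keepalived so the VIP floats to the peer node.
if ! killall -0 nfsd &>/dev/null; then
        systemctl stop keepalived
fi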
#backup 
[root@backup keepalived]# cat /data/sh/check_nfs.sh 
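#!/bin/bash
#by oldlai
##############
# killall -0 sends no signal; it only tests whether an nfsd process exists.
# If nfsd is gone, stop keepalived so the VIP floats to the peer node.
if ! killall -0 nfsd &>/dev/null; then
        systemctl stop keepalived
fi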
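One possible variation, sketched under assumptions rather than taken from the original: the check script could simply exit non-zero when nfsd is missing and let weight -20 drop master's priority from 90 to 70, below backup's 80. Keepalived then keeps running on master, so with its default preemption the VIP can move back on its own once nfsd is running again. The helper proc_running is hypothetical; it scans /proc/<pid>/comm (Linux only) instead of relying on killall from the psmisc package.

```bash
#!/bin/bash
# Hypothetical variant of check_nfs.sh: report nfsd status through the exit
# code and let keepalived's "weight -20" handle the failover.

# proc_running NAME: succeed if any process (kernel threads included) has
# exactly the command name NAME, as read from /proc/<pid>/comm.
proc_running() {
    local name=$1 f comm
    for f in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it if so.
        { read -r comm < "$f"; } 2>/dev/null || continue
        [[ $comm == "$name" ]] && return 0
    done
    return 1
}

# In the real check script the last line would be:
#   proc_running nfsd
# so the script exits 0 while nfsd exists and 1 once it is gone.
```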

4. Start the rpcbind, nfs, and keepalived services:

# master node
root@master:~ # systemctl start rpcbind 
root@master:~ # systemctl start nfs 
root@master:~ # systemctl start keepalived
# backup node
[root@backup ~]# systemctl start rpcbind 
[root@backup ~]# systemctl start nfs
[root@backup ~]# systemctl start keepalived
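Note: start the three services in exactly this order. NFS depends on rpcbind, and keepalived must come last: if it starts before nfsd is running, the check script finds no nfsd and immediately stops keepalived again.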
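The start order can be wrapped in a small helper so a failure stops the sequence early. This is a sketch assuming the CentOS 7 unit names used above (rpcbind, nfs, keepalived); start_nfs_ha_stack is a hypothetical name, not part of the original setup.

```bash
#!/bin/bash
# Hypothetical helper: start the services in the required order and stop at
# the first failure, so keepalived is never started without nfsd.
start_nfs_ha_stack() {
    local svc
    for svc in rpcbind nfs keepalived; do
        if ! systemctl start "$svc"; then
            echo "failed to start $svc; not starting the rest" >&2
            return 1
        fi
    done
}

# Usage (as root on master or backup):
#   start_nfs_ha_stack && echo "rpcbind, nfs and keepalived started"
```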

5. Check the virtual IP on the master server:

root@master:~ # ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.61  netmask 255.255.255.0  broadcast 10.0.0.255
        ether 00:0c:29:1c:a0:df  txqueuelen 1000  (Ethernet)
        RX packets 399134  bytes 122155578 (116.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 311571  bytes 86941574 (82.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.3  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:0c:29:1c:a0:df  txqueuelen 1000  (Ethernet)

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.61  netmask 255.255.255.0  broadcast 172.16.1.255
        inet6 fe80::20c:29ff:fe1c:a0e9  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:1c:a0:e9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 1286 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 49388  bytes 4096178 (3.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 49388  bytes 4096178 (3.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
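ifconfig comes from the net-tools package; with iproute2 the same check can be scripted. A minimal sketch, assuming the one-line output of ip -o -4 addr show, where the fourth field is ADDRESS/PREFIX; has_vip is a hypothetical name.

```bash
#!/bin/bash
# has_vip VIP: read `ip -o -4 addr show` output on stdin and succeed if VIP is
# one of the listed addresses (exact match; the /prefix part is ignored).
has_vip() {
    awk -v vip="$1" '
        { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# Usage on master or backup:
#   ip -o -4 addr show dev eth0 | has_vip 10.0.0.3 && echo "VIP is on $(hostname)"
```

The exact comparison means an address such as 10.0.0.30 is not mistaken for the VIP, which a plain grep for 10.0.0.3 would match.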

6. Stop the NFS service on master and check again; the VIP has now floated to the backup server:

root@master:~ # systemctl stop nfs
root@master:~ # ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.61  netmask 255.255.255.0  broadcast 10.0.0.255
        ether 00:0c:29:1c:a0:df  txqueuelen 1000  (Ethernet)
        RX packets 447446  bytes 132536968 (126.3 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 352379  bytes 99609300 (94.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.61  netmask 255.255.255.0  broadcast 172.16.1.255
        inet6 fe80::20c:29ff:fe1c:a0e9  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:1c:a0:e9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 1286 (1.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 49438  bytes 4100328 (3.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 49438  bytes 4100328 (3.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

IP addresses on the backup server:
[root@backup ~]# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.62  netmask 255.255.255.0  broadcast 10.0.0.255
        ether 00:0c:29:9d:c8:07  txqueuelen 1000  (Ethernet)
        RX packets 143391  bytes 108149745 (103.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 97770  bytes 13936087 (13.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.3  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 00:0c:29:9d:c8:07  txqueuelen 1000  (Ethernet)

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.62  netmask 255.255.255.0  broadcast 172.16.1.255
        inet6 fe80::20c:29ff:fe9d:c811  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:9d:c8:11  txqueuelen 1000  (Ethernet)
        RX packets 6  bytes 360 (360.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1016 (1016.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 20  bytes 1476 (1.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20  bytes 1476 (1.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

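Note: at this point the VIP fails over correctly, so the NFS client only needs to mount the VIP. This assumes /data/lutixia holds the same data on master and backup; otherwise clients see different files after a failover.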

7. Mount the export on the client, then create the following script:

Mount:

mount -t nfs -o soft,timeo=10 10.0.0.3:/data/lutixia /mnt/nfs

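A soft mount is recommended here; the default is a hard mount. With a soft mount, client operations return an error after the timeout instead of blocking indefinitely when the server goes down. timeo is given in tenths of a second, so timeo=10 means 1 second.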
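To confirm on the client that the soft option actually took effect, the mount table can be checked. A minimal sketch, assuming the /proc/mounts line format (device, mount point, type, options, ...); mount_is_soft_nfs is a hypothetical helper.

```bash
#!/bin/bash
# mount_is_soft_nfs MOUNTPOINT: read /proc/mounts-style lines on stdin and
# succeed if MOUNTPOINT is an NFS mount (nfs or nfs4) with the "soft" option.
mount_is_soft_nfs() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "soft") found = 1
        }
        END { exit !found }'
}

# Usage on the client:
#   mount_is_soft_nfs /mnt/nfs < /proc/mounts || echo "/mnt/nfs is not a soft NFS mount"
```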

Client check script:

[root@khd ~]# cat check_nfs.sh
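#!/bin/bash
#by oldlai
###############
# Once per second, try to list the mount point. If that fails (for example a
# stale mount left over after the VIP moved), lazily unmount and remount the VIP.
while true; do
    if ! ls /mnt/nfs &> /dev/null; then
        umount -l /mnt/nfs && mount -t nfs -o soft,timeo=10 10.0.0.3:/data/lutixia /mnt/nfs
    fi
    sleep 1
done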

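Note: if the client already has the export mounted when one of the servers goes down, it keeps getting errors even after the VIP has moved, because the old, now stale mount is still in place. The export therefore has to be unmounted and mounted again; the script checks once per second.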
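With a soft mount and timeo=10, ls should fail quickly, but on a hard mount (the default) ls against a dead server could hang and stall the whole loop. A hedged sketch of a bounded probe using coreutils timeout; probe_mount and its 5-second default are assumptions, not part of the original script.

```bash
#!/bin/bash
# probe_mount DIR [SECS]: succeed if `ls DIR` completes without error within
# SECS seconds (default 5). SIGKILL is used because a process waiting on NFS
# may not react to SIGTERM; any timeout yields a non-zero status.
probe_mount() {
    timeout -s KILL "${2:-5}" ls "$1" >/dev/null 2>&1
}

# Possible use inside the client loop (same remount command as above):
#   if ! probe_mount /mnt/nfs; then
#       umount -l /mnt/nfs && mount -t nfs -o soft,timeo=10 10.0.0.3:/data/lutixia /mnt/nfs
#   fi
```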

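8. Configure a cron job on the client. Because check_nfs.sh loops forever, cron only serves as a watchdog that restarts it if it has died. flock -n (from util-linux) makes each new run exit immediately while the previous instance still holds the lock; without it, another copy of the loop would start every minute.

# client
[root@khd ~]# crontab -l
* * * * * /usr/bin/flock -n /tmp/check_nfs.lock /usr/bin/bash /root/check_nfs.sh >/dev/null 2>&1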