KingbaseES V8R6 Cluster Case Study: Analyzing a VIP Unexpectedly Removed from the Primary

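Case analysis:
In a KingbaseES V8R6 cluster where applications connect through a VIP, the VIP should stay bound to the primary node's physical NIC during normal operation. In production, however, the VIP is sometimes removed unexpectedly, and the relevant logs must be analyzed to find out why it went missing.
Applicable version:
KingbaseES V8R6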

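I. Cluster Virtual IP management and support
A VIP (virtual IP address) is a feature of the primary/standby cluster that gives applications a single access point to the cluster. The primary's daemon process, repmgrd, binds the VIP to a local network device. When the primary fails and an automatic failover occurs, the VIP moves to the new primary automatically, so applications that connect through the VIP are unaffected by the primary failure.
VIP-related cluster parameters:
Commands the cluster uses to load the VIP: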

vip_add_cmd='ip addr add $IP$ dev $DEV$ label $DEV$:3'
arping_cmd='arping -U $IP$ -w 2 -c 2 -I $DEV$'

Command the cluster uses to unload the VIP:
vip_del_cmd='ip addr del $IP$ dev $DEV$'
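
To make the templates concrete, here is a minimal sketch of how the IP and DEV placeholders could expand into a runnable command. The helper expand_vip_cmd is hypothetical and not part of KingbaseES. How the cluster substitutes these values internally is not covered in this post, so treat this purely as an illustration.

```shell
# Hypothetical helper (not a KingbaseES tool): expand the $IP$ and $DEV$
# placeholders of a VIP command template.
# Usage: expand_vip_cmd TEMPLATE IP DEV
# Assumes IP and DEV contain no '|', '&' or '\' characters, since they are
# spliced into a sed replacement.
expand_vip_cmd() {
    printf '%s\n' "$1" |
        sed -e 's|\$IP\$|'"$2"'|g' -e 's|\$DEV\$|'"$3"'|g'
}
```

For example, expand_vip_cmd 'ip addr del $IP$ dev $DEV$' 192.168.1.88/24 enp0s3 prints the same ip addr del command that is run by hand in the test section.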

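Events that trigger a VIP change:

Primary registration: during the primary register operation, the VIP is loaded before the primary record is written on the primary node.
Primary unregistration: the VIP is unloaded after the primary record is deleted.
Failover: during failover, the standby loads the VIP before it runs promote. If the VIP is still reachable, the standby must try to log in to the old primary and unload the VIP there; it keeps retrying until it succeeds or vip_timeout expires.
Switchover: the VIP is first unloaded on the old primary, then loaded on the standby before the standby is promoted.
repmgrd monitoring on the primary: checks every check_vip_interval and loads the VIP if it is absent.
repmgrd monitoring on a standby: checks every check_vip_interval and unloads the VIP if it is present.
Gateway failure: the repmgrd process exits, and the kbha process takes over its VIP checking logic.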
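
The two periodic repmgrd checks in this list reduce to a small decision rule. The sketch below only illustrates that rule; it is not repmgrd's actual code, and the function name vip_action and its yes/no argument convention are made up for this example.

```shell
# Sketch of the periodic VIP check described above; NOT repmgrd's real code.
# vip_action ROLE PRESENT  ->  prints: load | unload | none
#   ROLE:    primary | standby
#   PRESENT: yes | no   (whether the VIP is currently bound on this node)
vip_action() {
    case "$1:$2" in
        primary:no)  echo load ;;    # primary lost its VIP: add it back
        standby:yes) echo unload ;;  # a standby should never hold the VIP
        *)           echo none ;;    # state is already as expected
    esac
}
```

The test in the next section exercises the first branch: the VIP is deleted by hand on the primary, and repmgrd loads it again at its next check.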

II. VIP management test

1. Check the VIP on the primary (192.168.1.88 is the VIP)

[root@node201 ~]# ip add sh
.........
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 08:00:27:df:15:2c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.201/24 brd 192.168.1.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.1.88/24 scope global secondary enp0s3:3

2. Simulate removal of the VIP from the primary
[root@node201 ~]# ip add del 192.168.1.88/24 dev enp0s3

As shown below, the VIP address has been removed:

[root@node201 ~]# ip add sh
........
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 08:00:27:df:15:2c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.201/24 brd 192.168.1.255 scope global enp0s3
       valid_lft forever preferred_lft forever
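
Rather than comparing the two listings by eye, the check can be scripted. The helper below is a hypothetical sketch, not part of the cluster tooling: it reads ip addr show output on standard input and succeeds only if the given address appears on an inet line.

```shell
# Hypothetical helper: succeed if VIP appears as an IPv4 address in the
# `ip addr show` output supplied on stdin.
# Usage: ip addr show dev enp0s3 | has_vip 192.168.1.88
has_vip() {
    # Escape the dots so they match literally in the regular expression.
    vip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    # The trailing "/" stops 192.168.1.8 from matching 192.168.1.88/24.
    grep -E "inet ${vip_re}/" >/dev/null
}
```

On the primary in this test, ip addr show dev enp0s3 | has_vip 192.168.1.88 && echo present prints "present" before the ip add del command and nothing after it, until repmgrd restores the address.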

III. Collecting log information
1. hamgr.log on the primary

[2023-11-15 14:03:29] [NOTICE] found primary node lost virtual_ip, try to acquire virtual_ip
[2023-11-15 14:03:31] [NOTICE] PING 192.168.1.88 (192.168.1.88) 56(84) bytes of data.

--- 192.168.1.88 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms

[2023-11-15 14:03:31] [WARNING] ping host"192.168.1.88" failed
[2023-11-15 14:03:31] [DETAIL] average RTT value is not greater than zero
[2023-11-15 14:03:31] [DEBUG] executing:
  /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/kbha -A loadvip
[2023-11-15 14:03:31] [DEBUG] result of command was 0 (0)
[2023-11-15 14:03:31] [DEBUG] LocalCommand(): no oneLineStr returned
[2023-11-15 14:03:31] [DEBUG] executing:
  /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/kbha -A arping
[2023-11-15 14:03:31] [DEBUG] result of command was 0 (0)
[2023-11-15 14:03:31] [DEBUG] LocalCommand(): no oneLineStr returned
[2023-11-15 14:03:31] [INFO] loadvip result: 1, arping result: 1
[2023-11-15 14:03:31] [NOTICE] acquire the virtual ip 192.168.1.88 success on localhost
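
In a busy hamgr.log these lines are easy to miss among the DEBUG output. A small filter such as the sketch below can pull out just the VIP loss and recovery messages. The patterns are copied from the excerpt above; other KingbaseES builds may word these messages differently, so adjust them as needed.

```shell
# Print only the VIP loss/recovery lines from a hamgr.log read on stdin.
# The match strings come from the log excerpt in this post and may not
# cover every message variant.
# Usage: vip_events < hamgr.log
vip_events() {
    grep -E 'lost virtual_ip|loadvip result|acquire the virtual ip'
}
```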

The log excerpt above shows repmgrd detecting the missing VIP and reloading it.

2. VIP information from the system messages log

Nov 15 14:01:14 node201 su: (to kingbase) root on pts/1
Nov 15 14:02:01 node201 systemd: Started Session 51 of user kingbase.
Nov 15 14:02:01 node201 systemd: Starting Session 51 of user kingbase.
Nov 15 14:03:01 node201 systemd: Started Session 52 of user kingbase.
Nov 15 14:03:01 node201 systemd: Starting Session 52 of user kingbase.
Nov 15 14:03:10 node201 avahi-daemon[643]: Withdrawing address record for 192.168.1.88 on enp0s3.
Nov 15 14:03:31 node201 avahi-daemon[643]: Registering new address record for 192.168.1.88 on enp0s3.IPv4.
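
The two avahi-daemon lines bracket the window during which the VIP was missing. The sketch below computes the length of that window from messages-style input. The function name vip_gap_seconds is hypothetical, and it assumes avahi-daemon is running and logging in the format shown, and that both events fall on the same day.

```shell
# Hypothetical helper: print how many seconds VIP was missing, based on
# avahi-daemon "Withdrawing"/"Registering" lines read on stdin.
# Assumes syslog format "Mon DD HH:MM:SS host ..." and no midnight rollover.
# Usage: vip_gap_seconds 192.168.1.88 < /var/log/messages
vip_gap_seconds() {
    awk -v vip="$1" '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,  p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        index($0, "avahi-daemon") && index($0, " " vip " ") {
            if ($0 ~ /Withdrawing address record/)
                down = secs($3)                  # VIP removed
            if ($0 ~ /Registering new address record/ && down != "") {
                print secs($3) - down            # VIP restored
                down = ""
            }
        }
    '
}
```

For the excerpt above it prints 21: the address was withdrawn at 14:03:10 and registered again at 14:03:31, which matches the reload time recorded in hamgr.log.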

The system messages log records both the removal and the reloading of the VIP.

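IV. Summary
In a KingbaseES V8R6 cluster, the repmgrd process monitors whether the VIP is loaded correctly. When the VIP is removed unexpectedly in production, the system messages log and the primary's hamgr.log together provide the information needed to judge whether the removal was caused by a manual operation or by a network fault.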

posted @ 2023-11-17 11:03  天涯客1224