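KingbaseES V8R6 Cluster Operations Case Study: Manually Configuring the Cluster vip
Case description:
If no vip was configured when the cluster was first deployed, and an application later requires one, the vip can be added after the cluster is already running. For a KingbaseES V8R6 cluster, configuring a vip manually is straightforward: the main change is to the repmgr.conf file.
Applicable version:
KingbaseES V8R6
Procedure:
1) Choose the vip address. It must be on the same subnet as the physical ip and must not already be in use.
2) Check the paths of the arping and ip executables and the arping version.
3) Set the setuid permission (the s bit) on the ip and arping executables.
4) Add the vip entries to repmgr.conf.
5) Restart the cluster and verify the cluster state.
6) Test primary/standby switching.
7) Test application access through the vip.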
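Step 1 requires the vip to share a subnet with the physical ip. A minimal sketch of that check in plain shell arithmetic follows; the function names ip_to_int and same_subnet are hypothetical helpers, not part of KingbaseES, and the sketch does not test whether the address is already in use.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Intended to be called inside $(...), so the IFS change stays local.
ip_to_int() {
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when two IPv4 addresses are in the same network
# for the given prefix length (for example 24).
same_subnet() {
    prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For the addresses used in this case, `same_subnet 192.168.7.244 192.168.7.238 24` succeeds, so 192.168.7.244/24 is a valid candidate as far as the subnet rule is concerned.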
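I. Cluster architecture information
1. Initial deployment
The cluster was deployed without a vip; for example, in a script-based deployment, no vip was configured in install.conf.
2. Check the cluster node status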
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+
1 | node238 | primary | * running | | default | 100 | 1 | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node239 | standby | running | node238 | default | 100 | 1 | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
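3. Inspect the repmgr.conf file
In a cluster environment the vip is configured in repmgr.conf. When the cluster starts or switches roles, it reads repmgr.conf to obtain the vip settings: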
kingbase@uos01:~/cluster/R6HA/kha/kingbase/etc$ cat repmgr.conf
on_bmj=off
node_id=1
node_name='node238'
promote_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/kha/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/kha/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=3
reconnect_interval=5
failover='automatic'
recovery='manual'
monitoring_history='no'
trusted_servers='192.168.7.1'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgrd.pid'
ping_path='/usr/bin'
# The configuration file above contains no virtual_ip entry.
II. Edit repmgr.conf to configure the vip (run on every node)
1. Identify the network interface for the vip
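Reading the whole file by eye is error-prone on a long repmgr.conf. A small sketch of the same check with grep follows; has_vip_config is a hypothetical helper name, and it treats only uncommented virtual_ip lines as configured.

```shell
# Succeed when the given repmgr.conf contains an active
# (uncommented) virtual_ip entry.
has_vip_config() {
    grep -Eq "^[[:space:]]*virtual_ip[[:space:]]*=" "$1"
}
```

For example, `has_vip_config /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf || echo "no vip configured"` would print the message for the file shown above.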
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
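# The vip must be configured on the same device that carries the physical ip.
2. Confirm the paths and permissions of the ip and arping executables
1) Confirm the paths and attributes of the ip and arping executables (using the arping shipped with the database is recommended)
Set the owner and permissions manually:
# On general-purpose machines, ip must be owned by root and have the u+s permission (4755)
[root@node201 ~]# chown root:root /usr/sbin/ip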
[root@node201 ~]# chmod 4755 /usr/sbin/ip
[root@node201 ~]# ls -lh /usr/sbin/ip
-rwsr-xr-x 1 root root 460K Oct 1 2020 /usr/sbin/ip
# On general-purpose machines, arping must be owned by root and have the u+s permission (4755)
[root@node201 ~]# chown root:root /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
[root@node201 ~]# chmod 4755 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
[root@node201 ~]# ls -lh /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
-rwsr-xr-x 1 root root 14K Sep 2 2023 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
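The ownership and setuid requirements above can also be checked from a script before restarting the cluster. The following is a sketch only; has_setuid and owned_by_root are hypothetical helper names, and the check relies on the standard `-u` test and the numeric owner column of `ls -ldn`.

```shell
# Succeed when the file is executable and has the setuid bit set.
has_setuid() {
    [ -x "$1" ] && [ -u "$1" ]
}

# Succeed when the file's owner is uid 0 (root).
# ls -ldn prints the numeric owner id in the third column.
owned_by_root() {
    [ "$(ls -ldn "$1" | awk '{print $3}')" = "0" ]
}
```

A possible use is `has_setuid /usr/sbin/ip && owned_by_root /usr/sbin/ip || echo "fix owner or mode of ip"`, repeated for the arping binary on each node.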
2) Check the arping version
Version of the arping shipped with the database:
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./arping -V
arping utility, iputils-s20200808kb
Tips:
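1) The ip command adds and removes the vip.
2) The arping command refreshes ARP caches and tests the vip during a vip switch.
3. Edit the repmgr.conf configuration file
Add the following entries to repmgr.conf:
# Configuration for earlier V8R6 releases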
virtual_ip='192.168.7.244/24'
net_device='enp0s3'
arping_path='/home/kingbase/cluster/R6HA/kha/kingbase/bin'
ipaddr_path='/usr/sbin'
Tips:
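The latest V8R6 releases add the net_device_ip parameter, which is set to the physical ip of the local node. The example below comes from a different environment, so its addresses and paths differ from those used elsewhere in this case.
# Configuration for newer V8R6 releases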
virtual_ip='192.168.1.88/24'
net_device='enp0s3'
net_device_ip='192.168.1.201'
arping_path='/home/kingbase/cluster/R6C8/HAC8/kingbase/bin'
ipaddr_path='/usr/sbin'
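Because the same entries have to be added on every node, generating them from one function avoids typos between nodes. The sketch below only prints the five parameters shown above; vip_config_block is a hypothetical helper, and passing an empty third argument omits net_device_ip for releases that do not have it.

```shell
# Print the vip entries for repmgr.conf.
# Arguments: vip/prefix, device, device ip (may be empty),
#            arping directory, ip directory.
vip_config_block() {
    printf "virtual_ip='%s'\n" "$1"
    printf "net_device='%s'\n" "$2"
    # net_device_ip exists only in newer V8R6 releases.
    if [ -n "$3" ]; then
        printf "net_device_ip='%s'\n" "$3"
    fi
    printf "arping_path='%s'\n" "$4"
    printf "ipaddr_path='%s'\n" "$5"
}
```

After backing up the file, the output could be appended on each node, for instance `vip_config_block 192.168.7.244/24 enp0s3 "" /home/kingbase/cluster/R6HA/kha/kingbase/bin /usr/sbin >> repmgr.conf`.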
III. Restart the cluster (with sys_monitor.sh)
The vip is loaded when the cluster starts:
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
......
server started
......
2021-03-01 12:22:57 Success to load virtual ip [192.168.7.244/24] on primary host [192.168.7.238].
2021-03-01 12:22:57 Try to ping vip on host 192.168.7.238 ...
2021-03-01 12:22:59 Try to ping vip on host 192.168.7.239 ...
.......
2021-03-01 12:23:03 repmgrd on "[192.168.7.239]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node238 | primary | * running | | running | 15043 | no | n/a
2 | node239 | standby | running | node238 | running | 6440 | no | n/a
2021-03-01 12:23:07 Done.
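# The output above shows that the VIP [192.168.7.244/24] was loaded when the cluster restarted
IV. Verify the cluster state
1. Check that the vip is loaded on the system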
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.7.244/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
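# The output above shows that the vip was loaded successfully on the primary node
2. Connect to the database through the vip and check the streaming replication status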
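The same verification can be scripted by searching the interface listing for the vip. This is a sketch under the assumption that the listing comes from `ip addr show`, whose inet lines have the form seen above; vip_in_output is a hypothetical helper that reads that listing on standard input.

```shell
# Succeed when the `ip addr show` output on stdin contains
# an inet line for the given address/prefix.
vip_in_output() {
    grep -Fq "inet $1 "
}
```

On a node, `ip addr show enp0s3 | vip_in_output 192.168.7.244/24 && echo "vip is here"` reports whether that node currently holds the vip, which is also useful after a switchover or failover.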
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./ksql -h 192.168.7.244 -U system test
ksql (V8.0)
Type "help" for help.
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+
14935 | 16384 | esrep | node239 | 192.168.7.239 | | 58172 | 2021-03-01 12:22:51.831920+08 | | streaming | 0/6000670 | 0/6000670 | 0/6000670 | 0/6000670 | | | | 1 | quorum | 2021-03-01 12:24:30.751707+08
(1 row)
V. Primary/standby switchover test
1. Run the switchover
kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr standby switchover --siblings-follow
NOTICE: executing switchover on node "node239" (ID: 2)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
.......
DETAIL: node "node239" is now primary and node "node238" is attached as standby
INFO: unpausing repmgrd on node "node238" (ID 1)
INFO: unpause node "node238" (ID 1) successfully
INFO: unpausing repmgrd on node "node239" (ID 2)
INFO: unpause node "node239" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully
2. Check where the vip is loaded after the switchover
kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:c9:c0:27 brd ff:ff:ff:ff:ff:ff
inet 192.168.7.239/24 brd 192.168.7.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.7.244/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
# The output above shows that the vip is now loaded on the new primary
VI. Cluster failover test
After the failover completes:
1. Check the cluster node status
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+
1 | node238 | primary | * running | | default | 100 | 3 | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node239 | standby | running | node238 | default | 100 | 2 | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
# The output above shows that the cluster nodes are back in a normal state
2. Check where the vip is loaded after the failover
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.7.244/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
# The output above shows that the vip is now loaded on the new primary
VII. Errors encountered during configuration
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
the dir "/sbin" has no execute file "arping", please set [arping_path] in /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
.......
server started
2021-03-01 12:19:34 execute to start DB on "[192.168.7.238]" success, connect to check it.
2021-03-01 12:19:35 DB on "[192.168.7.238]" start success.
2021-03-01 12:19:35 Try to ping trusted_servers on host 192.168.7.238 ...
2021-03-01 12:19:37 Try to ping trusted_servers on host 192.168.7.239 ...
2021-03-01 12:19:40 begin to start DB on "[192.168.7.239]".
incorrect command permissions for the virtual ip.
waiting for server to start.... done
server started
2021-03-01 12:19:40 execute to start DB on "[192.168.7.239]" success, connect to check it.
2021-03-01 12:19:41 DB on "[192.168.7.239]" start success.
ERROR: No execute permission for "/usr/sbin/ip"
incorrect command permissions for the virtual ip.
2021-03-01 12:19:42 There is no primary DB running, will do nothing and exit.
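These failures have two causes: the path of the arping executable was not set in the configuration file, and the ip and arping executables did not have the setuid permission.
VIII. Summary
After a KingbaseES V8R6 cluster has been deployed, configuring a vip manually is straightforward and mainly involves editing repmgr.conf. On general-purpose machines, however, a common failure is that the ip and arping commands have the wrong owner or permissions, which prevents the vip from loading.
The owner must be root and the permissions must include u+s (4755).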
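Both failures can be caught before running sys_monitor.sh by reading the path entries back from repmgr.conf and confirming that the executables exist there. The sketch below assumes the simple key='value' layout shown earlier; conf_value and path_key_has_cmd are hypothetical helpers, not KingbaseES tools.

```shell
# Print the value of a key from a repmgr.conf-style file,
# with surrounding single quotes removed. If the key appears
# more than once, the last occurrence is printed.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*'\{0,1\}\([^']*\)'\{0,1\}[[:space:]]*\$/\1/p" "$2" | tail -n 1
}

# Succeed when the directory named by a *_path key contains
# the given executable.
# Arguments: key, command name, config file.
path_key_has_cmd() {
    dir=$(conf_value "$1" "$3")
    [ -n "$dir" ] && [ -x "$dir/$2" ]
}
```

For example, `path_key_has_cmd arping_path arping repmgr.conf || echo "set arping_path"` flags the first error above, and the same call with `ipaddr_path ip` covers the ip command; the setuid bit still has to be checked separately.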