KingbaseES V8R6 Cluster Operations Case Study --- Manually Configuring a Cluster VIP

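Case description:
The cluster was initially deployed without a VIP, but once it was in service the application required one. Adding a VIP by hand to a KingbaseES V8R6 cluster is fairly simple: the only file that has to change is repmgr.conf, provided the ip and arping executables have the correct owner and permissions.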

Applicable version:

KingbaseES V8R6

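Procedure:

      1) Choose the VIP address. It must be in the same subnet as the physical IP and must not be in use.
      2) Check the paths of the arping and ip executables and the arping version.
      3) Set the setuid permission (the s bit) on the ip and arping executables.
      4) Add the VIP settings to repmgr.conf.
      5) Restart the cluster and verify its status.
      6) Test primary/standby switchover.
      7) Test application connections through the VIP.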
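
Step 1 can be checked from the shell before any configuration is touched. The following is a minimal sketch, not part of the KingbaseES tooling: the helper names ip_to_int, same_subnet, and vip_answers_ping are made up for illustration, and the addresses are the ones used later in this case. It assumes IPv4 and the Linux iputils ping.

```shell
#!/bin/sh
# Hypothetical helpers for step 1; not part of KingbaseES.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if addresses $1 and $2 share the network given by prefix length $3.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Return 0 if something answers ping on address $1.
vip_answers_ping() {
    ping -c 2 -W 1 "$1" >/dev/null 2>&1
}

if same_subnet 192.168.7.244 192.168.7.238 24; then
    echo "vip and physical ip are in the same /24"
else
    echo "vip is NOT in the same network as the physical ip"
fi
# On a cluster node, additionally run:
#   vip_answers_ping 192.168.7.244 && echo "address already answers ping"
```

An address that does not answer ping is only probably free, since a host may drop ICMP, so the result is a hint rather than proof.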

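I. Cluster architecture
1. Initial deployment

The cluster was deployed without a VIP, for example because install.conf contained no VIP setting when the deployment script was run.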

2. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 1        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 1        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3. Inspect repmgr.conf
In a cluster, the VIP is configured in repmgr.conf. Whenever the cluster starts or switches over, it reads repmgr.conf to obtain the VIP settings:

 kingbase@uos01:~/cluster/R6HA/kha/kingbase/etc$ cat repmgr.conf 
on_bmj=off
node_id=1
node_name='node238'
promote_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/kha/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/kha/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=3
reconnect_interval=5
failover='automatic'
recovery='manual'
monitoring_history='no'
trusted_servers='192.168.7.1'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/kha/kingbase/hamgrd.pid'
ping_path='/usr/bin'

# The configuration file above contains no virtual_ip setting.
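
Rather than reading the file by eye on every node, the same check can be scripted. This is a small sketch under assumptions: conf_has_key is a hypothetical helper, REPMGR_CONF is an environment variable invented here for overriding the path, and the default path is the one from this case.

```shell
#!/bin/sh
# Return 0 if file $1 contains an uncommented "$2=" setting.
conf_has_key() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*=" "$1"
}

# Path taken from this case; override with REPMGR_CONF for other layouts.
CONF=${REPMGR_CONF:-/home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf}

if [ ! -r "$CONF" ]; then
    echo "cannot read $CONF"
elif conf_has_key "$CONF" virtual_ip; then
    echo "virtual_ip is already configured in $CONF"
else
    echo "virtual_ip is not configured in $CONF"
fi
```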

II. Edit repmgr.conf to configure the VIP (must be done on every node)

1. Identify the network interface for the VIP

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever

# The VIP must be configured on the same device that carries the physical IP.

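2. Confirm the paths and permissions of the ip and arping executables

1) Confirm the paths and attributes of the ip and arping executables (using the arping bundled with the database is recommended)

Set the owner and permissions manually: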

# On general-purpose machines, ip must be owned by root and needs the u+s permission (4755)
[root@node201 ~]# chown root:root /usr/sbin/ip
[root@node201 ~]# chmod 4755 /usr/sbin/ip
[root@node201 ~]# ls -lh /usr/sbin/ip
-rwsr-xr-x 1 root root 460K Oct  1  2020 /usr/sbin/ip

# On general-purpose machines, arping must be owned by root and needs the u+s permission (4755)
[root@node201 ~]# chown root:root /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
[root@node201 ~]# chmod 4755 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
[root@node201 ~]# ls -lh /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
-rwsr-xr-x 1 root root 14K Sep  2  2023 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/arping
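
The owner and mode can also be verified with a script, which is handy when several nodes have to be checked. The sketch below is illustrative only: mode_has_setuid and check_setuid_root are hypothetical helper names, GNU stat is assumed, and the two paths are the ones used in this case and will differ between installations.

```shell
#!/bin/sh
# Return 0 if the octal mode string $1 (as printed by "stat -c %a") has the setuid bit.
mode_has_setuid() {
    [ $(( 0$1 & 04000 )) -ne 0 ]
}

# Report whether file $1 is owned by root and has the setuid bit (assumes GNU stat).
check_setuid_root() {
    if [ ! -e "$1" ]; then
        echo "$1: not found"
        return 1
    fi
    owner=$(stat -L -c %U "$1")
    mode=$(stat -L -c %a "$1")
    if [ "$owner" = "root" ] && mode_has_setuid "$mode"; then
        echo "$1: ok (owner=$owner mode=$mode)"
    else
        echo "$1: needs fixing (owner=$owner mode=$mode)"
        return 1
    fi
}

# Paths from this case; adjust to the local installation.
for f in /usr/sbin/ip /home/kingbase/cluster/R6HA/kha/kingbase/bin/arping; do
    check_setuid_root "$f" || true
done
```

The script only reports; changing owner or mode still has to be done as root with chown and chmod as shown above.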

2) Check the arping version

Version of the arping bundled with the database:
kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./arping -V
arping utility, iputils-s20200808kb

Tips:
1) The ip command is used to load and unload the VIP.
2) The arping command is used to clear and test the ARP cache when the VIP moves.

3. Edit repmgr.conf

Add the following settings to repmgr.conf:

# Configuration for earlier V8R6 versions
virtual_ip='192.168.7.244/24'
net_device='enp0s3'
arping_path='/home/kingbase/cluster/R6HA/kha/kingbase/bin'
ipaddr_path='/usr/sbin'

Tips:
The latest V8R6 versions add the net_device_ip parameter (set it to the physical IP of the local node)

# Configuration for newer V8R6 versions
virtual_ip='192.168.1.88/24'
net_device='enp0s3'
net_device_ip='192.168.1.201'
arping_path='/home/kingbase/cluster/R6C8/HAC8/kingbase/bin'
ipaddr_path='/usr/sbin'
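
Because the same lines have to be added on every node, a small script reduces the chance of a typo. The following is a sketch under assumptions, not a KingbaseES utility: add_vip_settings, the REPMGR_CONF variable, and the script name add_vip.sh are all hypothetical. It writes the four settings of the earlier-version layout, keeps a backup copy, and does nothing if virtual_ip is already set. For newer versions, net_device_ip would have to be added separately because its value differs per node.

```shell
#!/bin/sh
# Append the VIP settings to repmgr.conf $1 unless virtual_ip is already set.
# Arguments: conf-file vip/prefix device arping-dir ip-dir
add_vip_settings() {
    if grep -Eq "^[[:space:]]*virtual_ip[[:space:]]*=" "$1"; then
        echo "virtual_ip already present in $1, nothing changed"
        return 0
    fi
    cp -p "$1" "$1.bak" || return 1
    cat >> "$1" <<EOF
virtual_ip='$2'
net_device='$3'
arping_path='$4'
ipaddr_path='$5'
EOF
    echo "vip settings appended to $1 (backup: $1.bak)"
}

# Runs only when a file is named explicitly, for example:
#   REPMGR_CONF=/home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf sh add_vip.sh
if [ -n "${REPMGR_CONF:-}" ]; then
    add_vip_settings "$REPMGR_CONF" '192.168.7.244/24' enp0s3 \
        /home/kingbase/cluster/R6HA/kha/kingbase/bin /usr/sbin
fi
```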

III. Restart the cluster (using sys_monitor.sh)

The VIP is loaded when the cluster starts:

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
......
server started
......
2021-03-01 12:22:57 Success to load virtual ip [192.168.7.244/24] on primary host [192.168.7.238].
2021-03-01 12:22:57 Try to ping vip on host 192.168.7.238 ...
2021-03-01 12:22:59 Try to ping vip on host 192.168.7.239 ...
.......
2021-03-01 12:23:03 repmgrd on "[192.168.7.239]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node238 | primary | * running |          | running | 15043 | no      | n/a                
 2  | node239 | standby |   running | node238  | running | 6440  | no      | n/a                
2021-03-01 12:23:07 Done.

# The output above shows that the cluster loaded the VIP [192.168.7.244/24] on restart.

IV. Verify the cluster status

1. Check that the VIP is loaded on the system

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.7.244/24 scope global secondary enp0s3:3
       valid_lft forever preferred_lft forever

# The output above shows that the VIP was loaded successfully on the primary node.
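
The same check can be scripted so that it can be repeated on each node after a switchover. This is a minimal sketch: has_vip is a hypothetical helper that scans "ip addr" output for the address, and the device name and VIP are the ones from this case.

```shell
#!/bin/sh
# Return 0 if "ip addr" output on stdin lists address $1 (given without prefix length).
has_vip() {
    grep -qF "inet $1/"
}

# Device and VIP from this case; adjust as needed.
if ip -4 addr show dev enp0s3 2>/dev/null | has_vip 192.168.7.244; then
    echo "vip is loaded on this node"
else
    echo "vip is not loaded on this node"
fi
```

Matching on the trailing slash keeps a shorter address such as 192.168.7.24 from matching 192.168.7.244.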

2. Connect to the database through the VIP and check streaming replication

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./ksql -h 192.168.7.244 -U system test
ksql (V8.0)
Type "help" for help.

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_s
tart         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag |
 replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+
 14935 |    16384 | esrep   | node239          | 192.168.7.239 |                 |       58172 | 2021-03-01 12:22:
51.831920+08 |              | streaming | 0/6000670 | 0/6000670 | 0/6000670 | 0/6000670  |           |           |
            |             1 | quorum     | 2021-03-01 12:24:30.751707+08
(1 row)

V. Primary/standby switchover test

1. Run the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr standby switchover --siblings-follow
NOTICE: executing switchover on node "node239" (ID: 2)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
.......
DETAIL: node "node239" is now primary and node "node238" is attached as standby
INFO: unpausing repmgrd on node "node238" (ID 1)
INFO: unpause node "node238" (ID 1) successfully
INFO: unpausing repmgrd on node "node239" (ID 2)
INFO: unpause node "node239" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully

2. Check where the VIP is loaded after the switchover

kingbase@uos02:~/cluster/R6HA/kha/kingbase/bin$ ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c9:c0:27 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.239/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.7.244/24 scope global secondary enp0s3:3
       valid_lft forever preferred_lft forever

# The output above shows that the VIP has moved to the new primary.

VI. Cluster failover test

After the failover completed:

1. Check the cluster node status

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node238 | primary | * running |          | default  | 100      | 3        | host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node239 | standby |   running | node238  | default  | 100      | 2        | host=192.168.7.239 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

# The output above shows that the cluster nodes are back to a normal state.

2. Check where the VIP is loaded after the failover

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:56:02:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.7.238/24 brd 192.168.7.255 scope global noprefixroute enp0s3
       valid_lft forever preferred_lft forever
    inet 192.168.7.244/24 scope global secondary enp0s3:3
       valid_lft forever preferred_lft forever

# The output above shows that the VIP has been loaded on the new primary.
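
To confirm that the VIP follows the primary after a switchover or failover, the primary's address can be extracted from the cluster show output and compared with the node that holds the VIP. The sketch below assumes the column layout shown in this article (Role in the third column, Status in the fourth, connection string last); primary_host is a hypothetical helper name.

```shell
#!/bin/sh
# Print the host= value of each running primary found in
# "repmgr cluster show" output on stdin.
primary_host() {
    awk -F'|' '$3 ~ /primary/ && $4 ~ /running/ {
        n = split($NF, kv, " ")
        for (i = 1; i <= n; i++)
            if (kv[i] ~ /^host=/) { sub(/^host=/, "", kv[i]); print kv[i] }
    }'
}

# Directory from this case; the call is skipped where repmgr is not installed.
BIN=/home/kingbase/cluster/R6HA/kha/kingbase/bin
if [ -x "$BIN/repmgr" ]; then
    "$BIN/repmgr" cluster show | primary_host
fi
```

If the printed address is the physical IP of the node whose interface lists the VIP, the VIP is on the primary.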

VII. Errors encountered during configuration

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart

the dir "/sbin" has no execute file "arping", please set [arping_path] in /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf

kingbase@uos01:~/cluster/R6HA/kha/kingbase/bin$ ./sys_monitor.sh restart
.......
server started
2021-03-01 12:19:34 execute to start DB on "[192.168.7.238]" success, connect to check it.
2021-03-01 12:19:35 DB on "[192.168.7.238]" start success.
2021-03-01 12:19:35 Try to ping trusted_servers on host 192.168.7.238 ...
2021-03-01 12:19:37 Try to ping trusted_servers on host 192.168.7.239 ...
2021-03-01 12:19:40 begin to start DB on "[192.168.7.239]".
incorrect command permissions for the virtual ip.
waiting for server to start.... done
server started
2021-03-01 12:19:40 execute to start DB on "[192.168.7.239]" success, connect to check it.
2021-03-01 12:19:41 DB on "[192.168.7.239]" start success.
ERROR: No execute permission for "/usr/sbin/ip"
incorrect command permissions for the virtual ip.
2021-03-01 12:19:42 There is no primary DB running, will do nothing and exit.

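The failures above have two causes: the path of the arping executable was not set in the configuration file, and the ip and arping executables did not have the setuid permission.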
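
The first of these errors can be caught before restarting the cluster. The following pre-flight sketch reads arping_path and ipaddr_path from repmgr.conf and checks that each directory contains an executable file of the expected name. It is an assumption-laden illustration: conf_value, preflight, and the REPMGR_CONF variable are hypothetical, and the script does not check the owner or the setuid bit.

```shell
#!/bin/sh
# Print the value of key $2 from repmgr.conf-style file $1,
# without the surrounding single quotes.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" 2>/dev/null |
        tail -n 1 | sed "s/^'//; s/'[[:space:]]*\$//"
}

# Check that arping_path and ipaddr_path in file $1 point at directories
# containing executable arping and ip files. Returns non-zero on any problem.
preflight() {
    rc=0
    for pair in arping_path:arping ipaddr_path:ip; do
        key=${pair%%:*}
        cmd=${pair##*:}
        dir=$(conf_value "$1" "$key")
        if [ -z "$dir" ]; then
            echo "$key is not set in $1"
            rc=1
        elif [ ! -x "$dir/$cmd" ]; then
            echo "$dir/$cmd is missing or not executable"
            rc=1
        else
            echo "$dir/$cmd found"
        fi
    done
    return $rc
}

# Path from this case; override with REPMGR_CONF for other layouts.
CONF=${REPMGR_CONF:-/home/kingbase/cluster/R6HA/kha/kingbase/etc/repmgr.conf}
if [ -r "$CONF" ]; then
    preflight "$CONF" || echo "fix the items above before running sys_monitor.sh restart"
fi
```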

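VIII. Summary
Once a KingbaseES V8R6 cluster has been deployed, configuring a VIP by hand is fairly simple and only requires editing repmgr.conf. On general-purpose machines, however, a common failure is that the ip and arping commands have the wrong owner or permissions, so the VIP cannot be loaded.
The owner of both executables must be root, and the mode must include u+s (4755).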

posted @ 2021-06-28 19:42  天涯客1224