KingbaseES R6 cluster startup failure: 'incorrect command permissions for the virtual ip' (case study)

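Case description:
When the KingbaseES R6 cluster was started, it reported the error "incorrect command permissions for the virtual ip". This case study walks through how the fault was analyzed and the steps taken to resolve it.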

Database version:

test=# select version();
                                                       version                                                        
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0023 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

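Cluster architecture: a two-node cluster, with primary node243 (192.168.7.243), standby node248 (192.168.7.248) and virtual IP 192.168.7.241/24.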

I. Cluster startup failure

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:27:26 Ready to start all DB ...
2021-03-01 13:27:26 begin to start DB on "[192.168.7.243]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:30 execute to start DB on "[192.168.7.243]" success, connect to check it.
2021-03-01 13:27:31 DB on "[192.168.7.243]" start success.
2021-03-01 13:27:32 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:27:34 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:27:37 begin to start DB on "[192.168.7.248]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:40 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 13:27:41 DB on "[192.168.7.248]" start success.
ERROR: No execute permission for "/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin/arping"
incorrect command permissions for the virtual ip.
2021-03-01 13:27:41 There is no primary DB running, will do nothing and exit.

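=The error messages above show that the failure happens when arping is invoked to load the VIP: the script reports that it lacks permission to execute arping.=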
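When the startup output is long, the paths that triggered permission errors can be extracted mechanically. This is a minimal sketch. It assumes the error lines keep the exact `No execute permission for "<path>"` wording shown in the log above. `denied_paths` is a hypothetical helper name and is not part of KingbaseES.

```shell
# Hypothetical helper: read startup/log output on stdin and print each path
# reported as 'No execute permission for "<path>"', once per distinct path.
# Assumes the message wording matches the sys_monitor.sh output above.
denied_paths() {
    sed -n 's/.*No execute permission for "\([^"]*\)".*/\1/p' | sort -u
}
```

For the run above, `./sys_monitor.sh start 2>&1 | denied_paths` should list `/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin/arping`. `2>&1` is used because the output above does not show which stream the message is written to.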

II. Fault analysis

1. Check the repmgr configuration

[kingbase@node3 bin]$ cat ../etc/repmgr.conf 
on_bmj=off
node_id=1
node_name='node243'
promote_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log'
data_directory='/home/kingbase/cluster/R6C5/R6C5R/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=10
reconnect_interval=6
failover='automatic'
recovery='standby'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.241/24'
net_device='enp0s3'
net_device_ip='192.168.7.243'
ipaddr_path='/sbin'
arping_path='/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin'
synchronous='sync'
repmgrd_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off

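=In this version, arping is the tool bundled with the database software package (arping_path points to the kingbase/bin directory).=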
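The arping binary that is actually invoked is `arping_path` followed by `/arping`. The sketch below reads one key from a repmgr.conf-style file. It assumes one `key='value'` or `key=value` entry per line, as in the file above, and plain key names with no regex metacharacters. `conf_get` and `arping_bin` are hypothetical helper names.

```shell
# Hypothetical helper: print the value of KEY ($1) from a repmgr.conf-style
# file ($2). Assumes one key='value' or key=value entry per line; if the key
# appears more than once, the last occurrence wins.
conf_get() {
    local key=$1 file=$2
    # Keep only lines starting with "key=", drop that prefix, then strip
    # one pair of surrounding single quotes if present.
    sed -n "s/^${key}=//p" "$file" | tail -n 1 | sed "s/^'\(.*\)'\$/\1/"
}

# Hypothetical helper: print the full path of the arping binary configured
# in the repmgr.conf file given as $1.
arping_bin() {
    local conf=$1 dir
    dir=$(conf_get arping_path "$conf")
    [ -n "$dir" ] || return 1
    printf '%s/arping\n' "${dir%/}"
}
```

From the `bin` directory, `arping_bin ../etc/repmgr.conf` would print the same path that appears in the error message. The double slash in `R6C5R//kingbase` is harmless, because consecutive slashes inside a path are treated as a single one.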

2. Check the arping version

3. Check the arping permissions

[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase root 11K Nov  5  2021 arping
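The listing shows `rwxr-xr-x`, so the kingbase user can already execute arping. The "No execute permission" message is therefore not about the execute bit. arping sends ARP packets through a raw socket, which requires root privileges (CAP_NET_RAW on Linux). A binary owned by kingbase without the setuid bit runs with kingbase's own privileges. The sketch below separates "executable" from "runs as root via setuid". It assumes GNU `stat`, and `arping_privilege_status` is a hypothetical name. It does not reproduce the actual check in sys_monitor.sh, whose logic is not shown here.

```shell
# Hypothetical helper: report whether FILE ($1) will run with root privileges
# when executed by a non-root user. Prints one of:
#   missing          - FILE does not exist
#   not executable   - the current user cannot execute FILE
#   not setuid root  - executable, but runs with the caller's own privileges
#   ok               - owned by root with the setuid bit set
arping_privilege_status() {
    local f=$1 owner
    if [ ! -e "$f" ]; then echo "missing"; return 1; fi
    if [ ! -x "$f" ]; then echo "not executable"; return 1; fi
    owner=$(stat -c %U "$f") || return 1   # GNU coreutils stat: owner name
    if [ "$owner" != root ] || [ ! -u "$f" ]; then
        echo "not setuid root"; return 1
    fi
    echo "ok"
}
```

For the state shown above (owner kingbase, `rwxr-xr-x`), and equally after step 1 below (kingbase:kingbase), the helper prints `not setuid root`. After step 2 (`rwsr-xr-x`, owner root) it prints `ok`.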

III. Resolution

1. Make the kingbase user the owner of arping

1) Change the ownership

[kingbase@node1 bin]$ chown -R kingbase.kingbase arping
[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase kingbase 11K Nov  5  2021 arping

2) Start the cluster (the fault persists)

2. Make root the owner of arping and set the setuid bit

1) Change the ownership and permissions (as root)

[root@node3 ~]# cd /home/kingbase/cluster/R6C5/R6C5R//kingbase/bin
[root@node3 bin]# chown -R root.root arping
[root@node3 bin]# chmod u+s arping
[root@node3 bin]# ls -lh arping
-rwsr-xr-x 1 root root 11K Nov  5  2021 arping

2) Start the cluster

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:38:04 Ready to start all DB ...
2021-03-01 13:38:04 begin to start DB on "[192.168.7.243]".
2021-03-01 13:38:05 DB on "[192.168.7.243]" already started, connect to check it.
2021-03-01 13:38:06 DB on "[192.168.7.243]" start success.
2021-03-01 13:38:06 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:38:08 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:38:11 begin to start DB on "[192.168.7.248]".
2021-03-01 13:38:12 DB on "[192.168.7.248]" already started, connect to check it.
2021-03-01 13:38:13 DB on "[192.168.7.248]" start success.
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node243 | primary | * running |          | default  | 100      | 3        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node248 | standby |   running | node243  | default  | 100      | 3        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 13:38:13 The primary DB is started.
2021-03-01 13:38:13 check synchronous_standby_names ... 
t
2021-03-01 13:38:24 Success to load virtual ip [192.168.7.241/24] on primary host [192.168.7.243].
2021-03-01 13:38:24 Try to ping vip on host 192.168.7.248 ...
2021-03-01 13:38:26 Try to ping vip on host 192.168.7.243 ...
2021-03-01 13:38:29 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 13:40:52] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:40:52] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"

2021-03-01 13:38:30 execute to start repmgrd on "[192.168.7.248]" failed.
2021-03-01 13:38:30 begin to start repmgrd on "[192.168.7.243]".
[2021-03-01 13:38:30] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:38:30] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"

2021-03-01 13:38:32 repmgrd on "[192.168.7.243]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd     | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+-------------+-------+---------+--------------------
 1  | node243 | primary | * running |          | running     | 12552 | no      | n/a                
 2  | node248 | standby |   running | node243  | not running | n/a   | n/a     | n/a                
[2021-03-01 13:40:56] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"

[2021-03-01 13:38:37] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"

2021-03-01 13:38:39 Done.
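To confirm that the VIP from repmgr.conf (`virtual_ip='192.168.7.241/24'` on `net_device='enp0s3'`) is actually bound on the primary, you can inspect `ip -o -4 addr show dev enp0s3`. The `ip` command is under `ipaddr_path='/sbin'`. This is a minimal sketch. It assumes the one-line-per-address format printed by iproute2 with `-o`, where the 3rd field is `inet` and the 4th is the address/prefix. `vip_present` is a hypothetical helper.

```shell
# Hypothetical helper: succeed (exit 0) if the address/prefix in $1
# (e.g. 192.168.7.241/24) appears in "ip -o -4 addr show" output on stdin.
vip_present() {
    awk -v vip="$1" '
        $3 == "inet" && $4 == vip { found = 1 }
        END { exit !found }
    '
}
```

Usage on the primary: `/sbin/ip -o -4 addr show dev enp0s3 | vip_present 192.168.7.241/24 && echo "VIP loaded"`.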


[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------------------------------
 1  | node243 | primary | * running |          | default  | 100      | 3        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node248 | standby |   running | node243  | default  | 100      | 3        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
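Scripts can check the role layout by parsing `repmgr cluster show`. This sketch assumes the `|`-separated table layout shown above, with Name in the 2nd column and Role in the 3rd. `nodes_by_role` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Name of every node whose Role equals $1
# in "repmgr cluster show" output read on stdin.
nodes_by_role() {
    awk -F'|' -v role="$1" '
        NF >= 3 {
            # Trim blanks around each "|"-separated cell.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($3 == role) print $2
        }
    '
}
```

For the output above, `./repmgr cluster show | nodes_by_role primary` prints `node243`, and `nodes_by_role standby` prints `node248`.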

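=== The output above shows that the VIP 192.168.7.241/24 was loaded on the primary and that the cluster started successfully. (Note that repmgrd on 192.168.7.248 is still reported as not running in this output.) ===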

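IV. Summary

A KingbaseES R6 cluster that uses the arping bundled with the database software generally does not run into version-mismatch problems. The arping binary should be owned by root, not by the kingbase user. Because the cluster scripts run it as the kingbase user, arping must also have the setuid bit set so that it runs with root privileges. Sending ARP packets for the VIP requires raw-socket access, which an ordinary user does not have.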

posted @ 2022-01-26 14:35  KINGBASE研究院