KingbaseES V8R6集群运维案例之---license禁用读写分离模块
案例说明:
KingbaseES V8R6集群,在license中禁用“读写分离模块”后,备库的数据库参数,hot_standby只能配置为off,导致备库处于"starting up"状态,无法正常连接访问;对于repmgr管理的集群,主备流复制正常,但集群状态异常,repmgrd无法正常监控集群节点状态。
Tips:
hot_standby=on ,备库处于热备状态,数据库可以连接访问。
适用版本:
KingbaseES V8R6
一、问题现象
1、备库数据库状态
如下所示,在license中禁用“读写分离模块”后,备库数据库服务启动后,sys_log日志提示“hot_standby is not supported....",此时备库的hot_standby只能配置为off,处于”starting up“状态,无法正常连接访问。
备库sys_log日志:
2、集群状态
如下所示,集群中备库状态异常:
[kingbase@node201 bin]$ ./repmgr cluster show
string
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 67 | | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | ? unreachable | ? node1 | default | 100 | | ? | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[WARNING] following issues were detected
- unable to connect to node "node2" (ID: 2)
- node "node2" (ID: 2) is registered as an active standby but is unreachable
备库数据库日志:
如下所示,备库数据库服务一直在”starting up“状态:
......
CST,,0,LOG,00000,"started streaming WAL from primary at 3/6000000 on timeline 67",,,,,,,,,""
2024-09-06 10:25:10.463 CST,"esrep","esrep",5392,"192.168.1.201:65494",66da6806.1510,1,"",2024-09-06 10:25:10 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.185 CST,"esrep","esrep",5529,"192.168.1.201:9005",66da6821.1599,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.190 CST,"esrep","esrep",5530,"192.168.1.201:9007",66da6821.159a,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
二、问题分析
1、检查流复制状态
如下所示,集群主备流复制状态正常:
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend
_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_l
ag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+----------+-----------+-----------+------------+-----------+--------
---+------------+---------------+------------+-------------------------------
3420 | 16384 | esrep | node2 | 192.168.1.202 | | 30761 | 2024-09-06 10:2
5:09.409266+08 | | streaming | | 3/70000D0 | 3/70000D0 | 3/70000D0 | |
| | 1 | quorum | 2024-09-06 10:29:18.758652+08
(1 row)
2、查看备库hamgr.log
如下所示,备库repmgrd进程检测到数据库服务一直处于”startping up“状态,无法通过pqping()连接:
[2024-09-06 10:32:23] [NOTICE] rwcmgrd (repmgrd 5.0.0) starting up
[2024-09-06 10:32:23] [INFO] connecting to database "host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[2024-09-06 10:32:23] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[2024-09-06 10:32:23] [ERROR] connection to database failed
[2024-09-06 10:32:23] [DETAIL]
FATAL: the database system is starting up
3、启动备库repmgrd进程
如下所示,备库启动repmgrd进程失败:
[kingbase@node202 bin]$ ./repmgrd -d
[2024-09-09 14:48:17] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C8/HAC8/kingbase/log/hamgr.log"
[kingbase@node202 bin]$ ps -ef |grep repmgrd
三、问题总结
读写分离模块被禁用,导致备库hot_standby=off:
1、备库数据库服务只能启动到recovery(starting up)状态,无法建立连接访问。
2、备库repmgrd进程无法启动,失去集群的高可用守候功能。
3、通过repmgr cluster show查看集群状态,备库状态异常。
4、主备流复制正常。