KingbaseES V8R6集群运维案例之---license禁用读写分离模块

案例说明:
KingbaseES V8R6集群,在license中禁用“读写分离模块”后,备库的数据库参数,hot_standby只能配置为off,导致备库处于"starting up"状态,无法正常连接访问;对于repmgr管理的集群,主备流复制正常,但集群状态异常,repmgrd无法正常监控集群节点状态。
Tips:
hot_standby=on ,备库处于热备状态,数据库可以连接访问。

适用版本:
KingbaseES V8R6

一、问题现象
1、备库数据库状态
如下所示,在license中禁用“读写分离模块”后,备库数据库服务启动后,sys_log日志提示“hot_standby is not supported....",此时备库的hot_standby只能配置为off,处于”starting up“状态,无法正常连接访问。

备库sys_log日志:

2、集群状态
如下所示,集群中备库状态异常:

[kingbase@node201 bin]$ ./repmgr cluster show
string                                                                                                               
----+-------+---------+---------------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running     |          | default  | 100      | 67       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby | ? unreachable | ? node1  | default  | 100      |          | ?       | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

[WARNING] following issues were detected
  - unable to connect to node "node2" (ID: 2)
  - node "node2" (ID: 2) is registered as an active standby but is unreachable

备库数据库日志:
如下所示,备库数据库服务一直在”starting up“状态:

......
CST,,0,LOG,00000,"started streaming WAL from primary at 3/6000000 on timeline 67",,,,,,,,,""
2024-09-06 10:25:10.463 CST,"esrep","esrep",5392,"192.168.1.201:65494",66da6806.1510,1,"",2024-09-06 10:25:10 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.185 CST,"esrep","esrep",5529,"192.168.1.201:9005",66da6821.1599,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""
2024-09-06 10:25:37.190 CST,"esrep","esrep",5530,"192.168.1.201:9007",66da6821.159a,1,"",2024-09-06 10:25:37 CST,,0,FATAL,57P03,"the database system is starting up",,,,,,,,,""

二、问题分析
1、检查流复制状态
如下所示,集群主备流复制状态正常:

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend
_start         | backend_xmin |   state   | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_l
ag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+----------+-----------+-----------+------------+-----------+--------
---+------------+---------------+------------+-------------------------------
 3420 |    16384 | esrep   | node2            | 192.168.1.202 |                 |       30761 | 2024-09-06 10:2
5:09.409266+08 |              | streaming |          | 3/70000D0 | 3/70000D0 | 3/70000D0  |           |
   |            |             1 | quorum     | 2024-09-06 10:29:18.758652+08
(1 row)

2、查看备库hamgr.log
如下所示,备库repmgrd进程检测到数据库服务一直处于”startping up“状态,无法通过pqping()连接:

[2024-09-06 10:32:23] [NOTICE] rwcmgrd (repmgrd 5.0.0) starting up
[2024-09-06 10:32:23] [INFO] connecting to database "host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[2024-09-06 10:32:23] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[2024-09-06 10:32:23] [ERROR] connection to database failed
[2024-09-06 10:32:23] [DETAIL]
FATAL:  the database system is starting up

3、启动备库repmgrd进程
如下所示,备库启动repmgrd进程失败:

[kingbase@node202 bin]$ ./repmgrd -d
[2024-09-09 14:48:17] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C8/HAC8/kingbase/log/hamgr.log"

[kingbase@node202 bin]$ ps -ef |grep repmgrd

三、问题总结
读写分离模块被禁用,导致备库hot_standby=off:
1、备库数据库服务只能启动到recovery(starting up)状态,无法建立连接访问。
2、备库repmgrd进程无法启动,失去集群的高可用守候功能。
3、通过repmgr cluster show查看集群状态,备库状态异常。
4、主备流复制正常。

posted @ 2024-09-09 15:07  天涯客1224  阅读(26)  评论(0编辑  收藏  举报