KingbaseES V8R3 Cluster Operations Case Study: All Nodes in standby After Failover

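Case description:
In a KingbaseES V8R3 cluster with a one-primary, one-standby architecture, the database service on the primary went down. After the failover, the cluster reported both the old primary and the new primary as standby, and no streaming replication was running.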

Applicable version:
KingbaseES V8R3

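I. Symptoms
In a one-primary, one-standby cluster, the primary's database service went down and triggered a failover. According to failover.log, the switchover succeeded. However, both nodes showed a status of 'standby', and there was no streaming replication.
The figure below shows the cluster node status: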

The figure below shows the failover process:

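II. Analysis
1. failover.log shows a successful switchover and the new primary accepts reads and writes, so the failover itself appears to have completed. However, the node status was not updated in kingbasecluster; it can be refreshed with pcp_attach_node.
2. Streaming replication: the old primary must first be rebuilt as a standby, after which streaming replication can be re-established.
3. Fix the streaming replication problem first.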

III. Resolution

1. Fixing streaming replication

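1) On the standby (the old primary), create a recovery.conf file in the data directory; you can copy etc/recovery.done to data/recovery.conf.
2) Start the standby's database service and check sys_log to confirm that it started normally.
3) sys_log reports that the replication slot is missing, so streaming replication cannot be established.
4) On the new primary, create the replication slot (its name must match the one in recovery.conf), then restart the standby's database service.
5) Streaming replication is back to normal.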
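The standby-rebuild steps above can be sketched as a few shell helpers. This is a minimal sketch under stated assumptions: the directory argument is a hypothetical path to the database directory that contains etc/ and data/; recovery.conf is assumed to hold a PostgreSQL-style quoted `primary_slot_name = '...'` line; and `sys_create_physical_replication_slot` is assumed to be the V8R3 counterpart of PostgreSQL's `pg_create_physical_replication_slot`. The helpers only prepare the file and print the SQL; starting the service and running the SQL on the new primary are left to the operator.

```shell
#!/usr/bin/env bash
# Sketch: helpers for rebuilding the old primary as a standby.
# Assumptions (hypothetical, adjust to your installation):
#   - $1 points at a directory containing etc/ and data/
#   - recovery.conf uses a quoted "primary_slot_name = '...'" line
#   - sys_create_physical_replication_slot is the V8R3 name of
#     PostgreSQL's pg_create_physical_replication_slot

# Copy etc/recovery.done to data/recovery.conf, refusing to overwrite.
prepare_recovery_conf() {
    local dir="$1"
    if [ -e "$dir/data/recovery.conf" ]; then
        echo "recovery.conf already exists in $dir/data" >&2
        return 1
    fi
    cp "$dir/etc/recovery.done" "$dir/data/recovery.conf"
}

# Print the slot name configured in a recovery.conf file (first match only).
slot_name_from_conf() {
    sed -n "s/^[[:space:]]*primary_slot_name[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# Print the SQL to run on the NEW primary to create the missing slot.
create_slot_sql() {
    printf "SELECT sys_create_physical_replication_slot('%s');\n" "$1"
}
```

A possible order of use: run `prepare_recovery_conf` on the old primary, read the slot name with `slot_name_from_conf`, execute the SQL printed by `create_slot_sql` on the new primary, and then restart the standby's database service.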

2. Fixing the abnormal node status

1) Run pcp_attach_node for the primary and standby nodes.

[kingbase@node201 bin]$ ./pcp_attach_node -h 192.168.1.201 -U kingbase -d 0
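-h: node IP (the primary's and the standby's IP, respectively)
-d: debug mode
node_id (the trailing 0): 0 is the primary's id, 1 is the standby's id (both can be obtained from show pool_nodes).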
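Running pcp_attach_node for each node can be wrapped in a small loop. The sketch below reuses only the options shown in the command above (-h, -U, -d and the trailing node id); the `host:node_id` argument format, the `BIN_DIR` default and the `DRY_RUN` switch are hypothetical, and the pairs must match the ids actually reported by show pool_nodes.

```shell
#!/usr/bin/env bash
# Sketch: run pcp_attach_node once per backend node.
# Assumptions (hypothetical): BIN_DIR holds pcp_attach_node, the PCP user is
# "kingbase", and each argument is "host:node_id" as listed by show pool_nodes.

BIN_DIR="${BIN_DIR:-.}"

# Build the command line for one node using only the options from the
# original command: -h host, -U user, -d (debug output), then the node id.
attach_cmd() {
    local host="$1" node_id="$2"
    printf '%s/pcp_attach_node -h %s -U kingbase -d %s\n' "$BIN_DIR" "$host" "$node_id"
}

# Attach every "host:node_id" pair; with DRY_RUN=1 only print the commands.
attach_nodes() {
    local pair host node_id
    for pair in "$@"; do
        host="${pair%%:*}"
        node_id="${pair##*:}"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            attach_cmd "$host" "$node_id"
        else
            "$BIN_DIR/pcp_attach_node" -h "$host" -U kingbase -d "$node_id" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 attach_nodes 192.168.1.201:0 192.168.1.202:1` prints both commands for review before running them for real.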

2) Check the log output produced by pcp_attach_node

2023-09-19 16:52:24: pid 27572: LOG:  received failback request for node_id: 0 from pid [27572] wd_failover_id [0]
2023-09-19 16:52:24: pid 25528: LOG:  new IPC connection received
2023-09-19 16:52:24: pid 25528: LOG:  watchdog received the failover command from local kingbasecluster on IPC interface
2023-09-19 16:52:24: pid 25528: LOG:  watchdog is processing the failover command [FAILBACK_REQUEST] received from local kingbasecluster on IPC interface
2023-09-19 16:52:24: pid 25528: LOG:  forwarding the failover request [FAILBACK_REQUEST] to all alive nodes
2023-09-19 16:52:24: pid 25528: DETAIL:  watchdog cluster currently has 1 connected remote nodes
2023-09-19 16:52:24: pid 25479: LOG:  Kingbasecluster-II parent process has received failover request
2023-09-19 16:52:24: pid 25528: LOG:  new IPC connection received
2023-09-19 16:52:24: pid 25528: LOG:  received the failover command lock request from remote kingbasecluster node "192.168.1.202:9999 Linux node202"
2023-09-19 16:52:24: pid 25528: LOG:  remote kingbasecluster node "192.168.1.202:9999 Linux node202" is requesting to become a lock holder for failover ID: 24
2023-09-19 16:52:24: pid 25528: LOG:  request to become a lock holder is denied to remote kingbasecluster node "192.168.1.202:9999 Linux node202"
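To confirm that the attach request for a given node actually reached kingbasecluster, one can search the cluster log for the "received failback request for node_id: N" line seen in the excerpt above. This is a minimal sketch: the log file path is a hypothetical argument, and only that one message format, copied from the excerpt, is matched.

```shell
#!/usr/bin/env bash
# Sketch: check whether kingbasecluster logged a failback request for a node.
# The message text comes from the log excerpt above; the log path passed
# as $1 is hypothetical.

# Succeed (exit 0) if the log contains a failback request for node id $2.
# The "( |$)" suffix keeps node 1 from matching node 10.
failback_requested() {
    local log="$1" node_id="$2"
    grep -Eq "received failback request for node_id: ${node_id}( |$)" "$log"
}

# Print the timestamp of the last failback request for node id $2, if any.
last_failback_time() {
    local log="$1" node_id="$2"
    grep -E "received failback request for node_id: ${node_id}( |$)" "$log" \
        | tail -n 1 | cut -d' ' -f1,2 | sed 's/:$//'
}
```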

3) Check the node status with show pool_nodes: after running pcp_attach_node, the node status is back to normal.
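Since the symptom in this case was that both nodes reported standby, a quick sanity check is to count the primaries in the show pool_nodes output. This sketch assumes pgpool-II-style psql output (pipe-separated columns with a header row containing node_id, status and role); the exact columns in Kingbasecluster may differ, so the columns are looked up by header name rather than by position. Feed it the saved output of show pool_nodes on stdin.

```shell
#!/usr/bin/env bash
# Sketch: summarize "show pool_nodes" output read from stdin.
# Assumption: psql-style table with "|" separators and a header row that
# includes node_id, status and role columns (as in pgpool-II); verify this
# against your Kingbasecluster version.

# Print "node_id status role" for every data row.
pool_nodes_summary() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Map header names to column numbers.
        !header && /node_id/ {
            for (i = 1; i <= NF; i++) col[trim($i)] = i
            header = 1; next
        }
        # Data rows contain "|"; separator and "(N rows)" lines do not.
        header && NF > 1 {
            print trim($(col["node_id"])), trim($(col["status"])), trim($(col["role"]))
        }
    '
}

# Succeed only if exactly one node reports the primary role.
has_single_primary() {
    [ "$(pool_nodes_summary | awk '$3 == "primary"' | wc -l | tr -d ' ')" = "1" ]
}
```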

IV. Summary
When a KingbaseES V8R3 cluster fails, fix streaming replication first; once streaming replication is healthy, deal with the kingbasecluster issues.

posted @ 2023-09-20 17:32  天涯客1224