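KingbaseES V8R3 Cluster Operations Case: All Nodes Left in standby State After failover
Case description:
In a KingbaseES V8R3 cluster with one primary and one standby, the database service on the primary went down and a failover was triggered. After the failover, both the original primary and the new primary were reported as standby, and there was no streaming replication.
Applicable version:
KingbaseES V8R3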
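I. Symptoms
In a one-primary, one-standby cluster, the database service on the primary went down and triggered a failover. According to failover.log, the failover succeeded. However, both nodes were reported with the status 'standby', and no streaming replication was established.
The figure below shows the cluster node status:
The figure below shows the failover process: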
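II. Analysis
1. failover.log and the fact that the new primary accepts reads and writes suggest that the failover itself completed, but the node status was not updated in kingbasecluster. The node status can be refreshed with pcp_attach_node.
2. To fix streaming replication, the original primary must first be recovered as a standby, and streaming replication must then be re-established.
3. Fix the streaming replication problem first.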
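III. Resolution
1. Fixing streaming replication
1) On the standby (the original primary), create a recovery.conf file in the data directory. You can copy etc/recovery.done to data/recovery.conf.
2) Start the database service on the standby and check the sys_log log to confirm that it started normally.
3) In this case the sys_log log reported that the replication slot was missing, so streaming replication could not be established.
4) Create the replication slot on the new primary (the slot name must match the one in recovery.conf), then restart the database service on the standby.
5) Streaming replication returns to normal.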
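Step 4 depends on the slot name matching the one in recovery.conf. The following is a minimal sketch of how that name could be read from the file and turned into a creation statement. It assumes the standby's recovery.conf contains a single-quoted `primary_slot_name` entry, and that the slot is created with `sys_create_physical_replication_slot`, the KingbaseES counterpart of the PostgreSQL function. Both are assumptions to check against your installation. The helper names are hypothetical, and the script only prints SQL; it does not connect to a database.

```shell
#!/bin/sh
# Sketch only: derive the slot name the standby expects and print the SQL
# that would create it on the new primary. Nothing is executed here.

# Print the value of primary_slot_name from a recovery.conf-style file.
# Assumes the value is single-quoted; commented-out lines are ignored
# because the pattern must match from the start of the line.
slot_name_from_recovery_conf() {
    sed -n "s/^[[:space:]]*primary_slot_name[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}

# Print the statement to run on the new primary.
# The function name sys_create_physical_replication_slot is an assumption.
create_slot_sql() {
    printf "SELECT * FROM sys_create_physical_replication_slot('%s');\n" "$1"
}
```

For example, `create_slot_sql "$(slot_name_from_recovery_conf data/recovery.conf)"` prints the statement, which can then be reviewed and run on the new primary with your usual SQL client before restarting the standby.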
2. Fixing the abnormal node status
1) Run pcp_attach_node for the primary and standby nodes.
[kingbase@node201 bin]$ ./pcp_attach_node -h 192.168.1.201 -U kingbase -d 0
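-h: node IP (the primary's and the standby's IP, respectively)
-d: debug mode
node-id: 0 is the primary's ID and 1 is the standby's ID (both can be obtained from pool_nodes).
2) Check the log output produced by pcp_attach_node: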
2023-09-19 16:52:24: pid 27572: LOG: received failback request for node_id: 0 from pid [27572] wd_failover_id [0]
2023-09-19 16:52:24: pid 25528: LOG: new IPC connection received
2023-09-19 16:52:24: pid 25528: LOG: watchdog received the failover command from local kingbasecluster on IPC interface
2023-09-19 16:52:24: pid 25528: LOG: watchdog is processing the failover command [FAILBACK_REQUEST] received from local kingbasecluster on IPC interface
2023-09-19 16:52:24: pid 25528: LOG: forwarding the failover request [FAILBACK_REQUEST] to all alive nodes
2023-09-19 16:52:24: pid 25528: DETAIL: watchdog cluster currently has 1 connected remote nodes
2023-09-19 16:52:24: pid 25479: LOG: Kingbasecluster-II parent process has received failover request
2023-09-19 16:52:24: pid 25528: LOG: new IPC connection received
2023-09-19 16:52:24: pid 25528: LOG: received the failover command lock request from remote kingbasecluster node "192.168.1.202:9999 Linux node202"
2023-09-19 16:52:24: pid 25528: LOG: remote kingbasecluster node "192.168.1.202:9999 Linux node202" is requesting to become a lock holder for failover ID: 24
2023-09-19 16:52:24: pid 25528: LOG: request to become a lock holder is denied to remote kingbasecluster node "192.168.1.202:9999 Linux node202"
3) Check the node status (show pool_nodes). After pcp_attach_node has been run, the node status is back to normal.
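The check in step 3 can be made less error-prone by counting roles in the `show pool_nodes` output instead of reading it by eye. The sketch below is a hypothetical helper, not part of KingbaseES. It assumes the output is the usual pipe-separated table with a header row containing a column named `role`, and it finds that column by name rather than by position.

```shell
#!/bin/sh
# Sketch only: count rows with a given role in "show pool_nodes" output
# read from stdin. The role column is located from the header row, so only
# the presence of a column named "role" is assumed, not its position.
count_role() {
    awk -F'|' -v want="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        col == 0 {
            # Still looking for the header: remember which field is "role".
            for (i = 1; i <= NF; i++) if (trim($i) == "role") col = i
            next
        }
        # Separator and "(N rows)" lines have too few fields and are skipped.
        NF >= col && trim($col) == want { n++ }
        END { print n + 0 }
    '
}
```

Save the `show pool_nodes` output to a file and run `count_role primary < file`. In a one-primary, one-standby cluster the count should be 1, whereas the symptom described in this case (both nodes reported as standby) would give 0.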
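IV. Summary
When a KingbaseES V8R3 cluster is in an abnormal state, fix streaming replication first. Only once streaming replication is healthy should you deal with the kingbasecluster problem.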