KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析
案例说明:
KingbaseES V8R3集群failover时两个cluster都会触发,但只有一个cluster会调用脚本去执行真正的切换流程,另一个有对应的打印,但不会调用脚本,只是走相关的流程去修改状态。 那么你需要确认一点,到底是主库还是备库的cluster在执行脚本,关键字:failover_stream.sh ,看cluster.log中有没有这个的名称。本案例通过主库系统宕机,分析了failover 切换过程。
适用版本:
KingbaseES V8R3
集群节点状态信息:
# 集群节点状态信息
TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 1 | true | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | standby | 0 | false | 0
(2 rows)
对于KingbaseES V8R3集群failover切换的分析:
1. 可以通过主备库的cluster.log,获取整个主备切换的过程。
2. 从failover.log日志中获取执行failover过程,注意只有执行failover_stream.sh的节点才会产生failover.log。
3. recovery.log日志记录了原主库failover完成后,从主库recovery为新备库的过程。
一、Failover切换过程分析
关闭主库系统:
[root@node101 ~]# shutdown -h now
备库cluster.log日志分析:(主库系统关机,备库调用failover_stream.sh执行failover切换)
1、主备库处于正常状态时,备库收到主库的heartbeat。
2022-08-30 15:38:03: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 0 seconds ago
2022-08-30 15:38:06: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:38:06: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 0 seconds ago
2、备库kingbasecluster检测到主库集群服务和数据库服务shutdown,kingbasecluster发生主备切换,当前的kingbasecluster由standby切换为master。
2022-08-30 15:38:09: pid 17585: LOG: remote node "192.168.1.101:9999 Linux node101" is shutting down
2022-08-30 15:38:09: pid 17585: LOG: watchdog cluster has lost the coordinator node
2022-08-30 15:38:09: pid 17585: LOG: unassigning the remote node "192.168.1.101:9999 Linux node101" from watchdog cluster master
2022-08-30 15:38:09: pid 17585: LOG: We have lost the cluster master node "192.168.1.101:9999 Linux node101"
2022-08-30 15:38:09: pid 17585: LOG: watchdog node state changed from [STANDBY] to [JOINING]
2022-08-30 15:38:09: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:38:09: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 1 seconds ago
2022-08-30 15:38:10: pid 17547: ERROR: failed to authenticate
2022-08-30 15:38:10: pid 17547: DETAIL: the database system is shutting down
2022-08-30 15:38:12: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:38:12: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 4 seconds ago
3、备库kingbasecluster对主库的数据库服务根据:health_check_max_retries * health_check_retry_delay配置尝试连接。
2022-08-30 15:38:13: pid 17547: LOG: health checking retry count 1
2022-08-30 15:38:13: pid 17585: LOG: watchdog node state changed from [JOINING] to [INITIALIZING]
2022-08-30 15:38:14: pid 17585: LOG: I am the only alive node in the watchdog cluster
2022-08-30 15:38:14: pid 17585: HINT: skipping stand for coordinator state
2022-08-30 15:38:14: pid 17585: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2022-08-30 15:38:14: pid 17585: LOG: I am announcing my self as master/coordinator watchdog node
2022-08-30 15:38:15: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:38:15: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 7 seconds ago
2022-08-30 15:38:18: pid 17585: LOG: I am the cluster leader node
2022-08-30 15:38:18: pid 17585: DETAIL: our declare coordinator message is accepted by all nodes
2022-08-30 15:38:18: pid 17585: LOG: setting the local node "192.168.1.102:9999 Linux node102" as watchdog cluster master
4、备库开启升级进程(escalation process),cluster vip漂移到新的master节点。
2022-08-30 15:38:18: pid 17585: LOG: I am the cluster leader node. Starting escalation process
2022-08-30 15:38:18: pid 17585: LOG: escalation process started with PID:22088
2022-08-30 15:38:18: pid 17547: LOG: connect_inet_domain_socket: select() interrupted by certain signal. retrying...
2022-08-30 15:38:18: pid 22088: LOG: watchdog: escalation started
2022-08-30 15:38:18: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:38:18: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 10 seconds ago
2022-08-30 15:38:21: pid 22088: LOG: successfully acquired the delegate IP:"192.168.1.201"
2022-08-30 15:38:21: pid 22088: DETAIL: 'if_up_cmd' returned with success
.......
5、 备库对原主库数据库服务的连接次数超过阈值后,将原主库的数据库服务设置为down状态。
2022-08-30 15:39:20: pid 17547: LOG: health checking retry count 10
2022-08-30 15:39:22: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:39:22: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 73 seconds ago
2022-08-30 15:39:23: pid 17547: LOG: failed to connect to kingbase server on "192.168.1.101:54321", getsockopt() detected error "
No route to host"
2022-08-30 15:39:23: pid 17547: ERROR: failed to make persistent db connection
2022-08-30 15:39:23: pid 17547: DETAIL: connection to host:"192.168.1.101:54321" failed
2022-08-30 15:39:23: pid 17547: LOG: setting backend node 0 status to NODE DOWN
6、 收到本节点kingbasecluster发出的failover request,并获取到failover lock。
2022-08-30 15:39:23: pid 17547: LOG: received degenerate backend request for node_id: 0 from pid [17547]
2022-08-30 15:39:23: pid 17585: LOG: new IPC connection received
2022-08-30 15:39:23: pid 17585: LOG: watchdog received the failover command from local kingbasecluster on IPC interface
2022-08-30 15:39:23: pid 17585: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local
kingbasecluster on IPC interface
2022-08-30 15:39:23: pid 17585: LOG: I am the only kingbasecluster node in the watchdog cluster
2022-08-30 15:39:23: pid 17585: DETAIL: no need to propagate the failover command [DEGENERATE_BACKEND_REQUEST]
2022-08-30 15:39:23: pid 17585: LOG: new IPC connection received
2022-08-30 15:39:23: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
2022-08-30 15:39:23: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to become a lock
holder for failover ID: 0
7、 获取到failover lock后,开始执行failover_stream.sh进行failover的切换。
2022-08-30 15:39:23: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is the lock holder
2022-08-30 15:39:23: pid 17547: LOG: starting degeneration. shutdown host 192.168.1.101(54321)
2022-08-30 15:39:23: pid 17547: LOG: Restart all children
2022-08-30 15:39:23: pid 17547: LOG: execute command: /home/kingbase/cluster/R3HA/kingbasecluster/bin/failover_stream.sh 192.168.
1.102 0 0 192.168.1.101 192.168.1.101 1 0 /home/kingbase/cluster/R3HA/db/data
2022-08-30 15:39:25: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:39:25: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 76 seconds ago
ssh: connect to host 192.168.1.101 port 22: No route to host
2022-08-30 15:39:31: pid 17612: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2022-08-30 15:39:31: pid 17612: DETAIL: the last heartbeat from "192.168.1.101:9999" received 82 seconds ago
2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FAIL
OVER] lock for failover ID 0
8、执行failover切换,并查找新的primary。(当前备库切换为primary)
2022-08-30 15:39:33: pid 17547: LOG: find_primary_node_repeatedly: waiting for finding a primary node
2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: checking backend no 0
2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: checking backend no 1
2022-08-30 15:39:33: pid 17547: LOG: find_primary_node: primary node id is 1
2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to release [FOLL
OW MASTER] lock for failover ID 0
2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" has released the [FOLLOW MASTE
R] lock for failover ID 0
2022-08-30 15:39:33: pid 17547: LOG: failover: set new primary node: 1
2022-08-30 15:39:33: pid 17547: LOG: failover: set new master node: 1
2022-08-30 15:39:33: pid 21501: LOG: worker process received restart request
2022-08-30 15:39:33: pid 17585: LOG: new IPC connection received
2022-08-30 15:39:33: pid 17585: LOG: received the failover command lock request from local kingbasecluster on IPC interface
2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" is requesting to resign from a
lock holder for failover ID 0
2022-08-30 15:39:33: pid 17585: LOG: local kingbasecluster node "192.168.1.102:9999 Linux node102" has resigned from the lock hol
der
failover done. shutdown host 192.168.1.101(54321)2022-08-30 15:39:33: pid 17547: LOG: failover done. shutdown host 192.168.1.101(
54321)
备库Failover.log分析:
1、执行failover_stream.sh进行failover切换
-----------------2022-08-30 15:39:23 failover beging---------------------------------------
----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/R3HA/db/data].
Tips:
1)new_master : 192.168.1.102
2) new_master_id: 1
3) old_primary: 192.168.1.101
4) old_primary_id : 0
2、ping 网关测试网络连通性
----ping trust ip
ping trust ip 192.168.1.1 success
3、ssh连接远程原主库,执行关闭数据库服务和释放db vip;并将db vip加载到当前主库节点。
----determine whether the faulty db is master or standby
master down, let 192.168.1.102 become new primary.....
2022-08-30 15:39:25 del old primary VIP on 192.168.1.101
ssh connect host:192.168.1.101 failed, can not stop old primary db and del old primary db vip
2022-08-30 15:39:29 add VIP on 192.168.1.102
ADD VIP NOW AT 2022-08-30 15:39:30 ON enp0s3
execute: [/sbin/ip addr add 192.168.1.200/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/R3HA/db/bin/arping -U 192.168.1.200 -I enp0s3 -w 1
ARPING 192.168.1.200 from 192.168.1.200 enp0s3
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
2022-08-30 15:39:31 promote begin...let 192.168.1.102 become master
4、通过连接db vip执行“select 33333”测试当前数据库服务状态,如果正常,执行promote,提升为新主库,将同步模式设为async。
check db if is alive
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
2022-08-30 15:39:31 kingbase is ok , to prepare execute promote
execute promote
server promoting
check db if is alive after promote
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
2022-08-30 15:39:32 after execute promote , kingbase status is ok.
after execute promote, kingbase is ok.
2022-08-30 15:39:32 sync to async
ALTER SYSTEM
SYS_RELOAD_CONF
-----------------
t
(1 row)
5、连接新主库数据库服务,如果成功,在新主库上执行checkpoint,checkpoint成功,则failover切换完成。
2022-08-30 15:39:32 make checkpoint
check the db to see if it is alive
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
2022-08-30 15:39:32 kingbase is ok , to prepare execute checkpoint
execute checkpoint
CHECKPOINT
check the db to see if it is alive after execute checkpoint
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
2022-08-30 15:39:33 after execute checkpoint, kingbase is ok.
after execute checkpoint, kingbase is ok.
-----------------2022-08-30 15:39:33 failover end---------------------------------------
二、原主库recovery为新备库
1、在原主库生成recovery.conf文件(系统通过crond计划任务,调用network_rewind.sh脚本自动recovery为备库)
[kingbase@node101 data]$ cp ../etc/recovery.done ./recovery.conf
2、查看recovery.log日志信息
[kingbase@node101 log]$ more recovery.log
---------------------------------------------------------------------
1)开启recovery,检测存储读写功能是否正常。
2022-08-30 16:42:01 recover beging...
my pid is 11927,officially began to perform recovery
2022-08-30 16:42:01 check read/write on mount point
2022-08-30 16:42:01 check read/write on mount point (1 / 6).
2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ...
2022-08-30 16:42:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ... OK
2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
2022-08-30 16:42:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
2022-08-30 16:42:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
2022-08-30 16:42:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ...
2022-08-30 16:42:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_574940445" ... OK
2022-08-30 16:42:01 success to check read/write on mount point (1 / 6).
2022-08-30 16:42:01 check read/write on mount point ... ok
2)ping 网关检测网络的连通性
2022-08-30 16:42:01 check if the network is ok
ping trust ip 192.168.1.1 success ping times :[3], success times:[2]
3)判断当前节点role(master OR standby),如果是standby节点,并判断是否需要recovery。( I,m status is [1] (1 is down), I will be in recovery.)
determine if i am master or standby
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | down | 0.500000 | standby | 0 | false | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
(2 rows)
i am standby in cluster,determine if recovery is needed
2022-08-30 16:42:03 now will del vip [192.168.1.200/24]
but no 192.168.1.200/24 on my DEV, nothing to do with del
ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[1/10]
......
ksql execute success,but node:node101 does not have correct streaming(or catchup) replication ,will retry ,retry times:[10/10]
primary node/Im node status is changed, primary ip[192.168.1.102], recovery.conf NEED_CHANGE [0] (0 is need ), I,m status is [1] (1 is down), I will be in recovery.
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | down | 0.500000 | standby | 0 | false | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
(2 rows)
4)如果需要做recovery,执行sys_rewind。
if recover node up, let it down , for rewind
2022-08-30 16:42:34 sys_rewind...
sys_rewind --target-data=/home/kingbase/cluster/R3HA/db/data --source-server="host=192.168.1.102 port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST"
datadir_source = /home/kingbase/cluster/R3HA/db/data
rewinding from last common checkpoint at 0/66000028 on timeline 15
find last common checkpoint start time from 2022-08-30 16:42:34.727711 CST to 2022-08-30 16:42:34.839545 CST, in "0.111834" seconds.
reading source file list
reading target file list
reading WAL in target
Rewind datadir file from source
update the control file: minRecoveryPoint is '0/68024E80', minRecoveryPointTLI is '16', and database state is 'in archive recovery'
rewind start wal location 0/66000028 (file 0000000F0000000000000066), end wal location 0/68024E80 (file 000000100000000000000068). time from 2022-08-30 16:42:36.727711 CST to 2022-08-30 16:42:37.201678 CST, in "2.473967" seconds.
Done!
5)sys_rewind成功,修改集群配置文件。
sed conf change #synchronous_standby_names
2022-08-30 16:42:39 file operate
cp recovery.conf...
change recovery.conf ip -> primary.ip
2022-08-30 16:42:39 no need change recovery.conf, primary node is 192.168.1.102
6)启动新备库数据库服务,删除原复制槽并重建。备库数据库服务启动成功后,配置同步为sync。
delete pid file if exist
del the replication_slots if exis
drop the slot [slot_node101].
drop the slot [slot_node102].
2022-08-30 16:42:39 start up the kingbase...
waiting for server to start....LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log".
done
server started
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10" -c "select 33333;"
SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
(slot_node101,)
(1 row)
2022-08-30 16:42:41 create the slot [slot_node101] success.
SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
(slot_node102,)
(1 row)
2022-08-30 16:42:41 create the slot [slot_node102] success.
2022-08-30 16:42:41 start up standby successful!
cluster is sync cluster.
SYNC RECOVER MODE ...
2022-08-30 16:42:41 remote primary node change sync
ALTER SYSTEM
SYS_RELOAD_CONF
-----------------
t
(1 row)
SYNC RECOVER MODE DONE
7)执行pcp_attach_node将当前节点加入到集群。
2022-08-30 16:42:44 attach pool...
IM Node is 0, will try [pcp_attach_node -U kingbase -W MTIzNDU2 -h 192.168.1.201 -n 0]
pcp_attach_node -- Command Successful
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | standby | 0 | false | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
(2 rows)
2022-08-30 16:42:45 attach end..
recovery success,exit script with success
三、总结
通过对原主库执行系统重启服务,如果备库在阈值范围内尝试连接主库数据库服务失败,其failover切换过程和主库系统shutdown是一致的。