KingbaseES 集群运维典型案例 03 --“双主” sys_rewind恢复机制

案例说明:
主库主机系统重启,触发failover切换,原主库系统启动后,数据库服务被启动,出现“双主”,使用“repmgr node rejoin --force-rewind”恢复集群。通过本案例了解,集群“双主”产生的原因及解决方案,并熟悉sys_rewind在集群恢复中的应用。

数据库版本:
KingbaseES V8R6

一、集群架构
如下所示,集群原节点状态:

[kingbase@node202 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                         
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 17       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 17       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

二、主库系统重启
[root@node201 ~]# reboot

三、主库重启后集群状态
如下所示,原主库主机系统重启后,数据库服务自动启动,导致出现“双主”:

1、failover后集群节点状态

[kingbase@node202 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                         
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | primary | * running |          | default  | 100      | 18       |         | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

[WARNING] following issues were detected
  - unable to connect to node "node1" (ID: 1)

2、查看节点上timeline变化

如下所示,新主库触发promote后,timeline为(18=1*16^1 + 2 * 16^0),原主库的timeline为(17):

新主库:
[kingbase@node202 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 10:57 0000001100000000000000EC.partial
-rw------- 1 kingbase kingbase 694 Jan 26 09:41 00000011.history
-rw------- 1 kingbase kingbase 16M Jan 31 11:08 0000001200000000000000EC
-rw------- 1 kingbase kingbase 738 Jan 31 11:07 00000012.history

原主库:
[kingbase@node201 bin]$ ls -lh ../data/sys_wal
-rw------- 1 kingbase kingbase 16M Jan 31 10:52 0000001100000000000000EB
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000EC
-rw------- 1 kingbase kingbase 16M Jan 31 11:06 0000001100000000000000ED
-rw------- 1 kingbase kingbase 694 Jan  3 16:27 00000011.history
drwx------ 2 kingbase kingbase 16K Jan 31 11:06 archive_status

四、模拟“双主”故障

1、触发failover切换

# 集群参数配置
[kingbase@node201 bin]$ cat ../etc/repmgr.conf |grep -E 'recovery|failover'
failover='automatic'
recovery='manual'

# 关闭主库数据库服务
[kingbase@node201 bin]$ ./sys_ctl stop -D ../data/
主库数据库服务被关闭后,将触发集群主备failover切换,在参数recovery=manual情况下,原主库将需要人工参与恢复为备库加入集群。

2、误启动原主库数据库服务
如下所示,如果需要人工将原主库恢复为新的备库加入到集群,需要在原主库data目录下创建standby.signal文件后,再启动数据库服务,数据库服务将启动到备库状态,如果在没有创建standby.signal标识文件下,直接启动原主库数据库服务,将直接进入主节点模式。

[kingbase@node201 bin]$ ./sys_ctl start -D ../data/

2、查看数据库服务进程
如下所示,原主库启动后,数据库服务进程启动为主库,集群出现“双主”。

[kingbase@node201 bin]$ ps -ef |grep kingbase

kingbase  4137     1  0 11:14 ?        00:00:00 /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/kingbase -D ../data
kingbase  4138  4137  0 11:14 ?        00:00:00 kingbase: logger
kingbase  4140  4137  0 11:14 ?        00:00:00 kingbase: checkpointer
kingbase  4141  4137  0 11:14 ?        00:00:00 kingbase: background writer
kingbase  4142  4137  0 11:14 ?        00:00:00 kingbase: walwriter
kingbase  4143  4137  0 11:14 ?        00:00:00 kingbase: autovacuum launcher
kingbase  4144  4137  0 11:14 ?        00:00:00 kingbase: archiver   last was 0000001100000000000000EC
kingbase  4145  4137  0 11:14 ?        00:00:00 kingbase: stats collector
kingbase  4146  4137  0 11:14 ?        00:00:00 kingbase: kwr collector
kingbase  4147  4137  0 11:14 ?        00:00:00 kingbase: ksh writer
kingbase  4148  4137  0 11:14 ?        00:00:00 kingbase: ksh collector
kingbase  4149  4137  0 11:14 ?        00:00:00 kingbase: logical replication launcher

# 数据库状态
test=# select sys_is_in_recovery();
 sys_is_in_recovery()
----------------------
 f
(1 row)

3、查看数据库日志

[kingbase@node201 sys_log]$ tail -1000 kingbase-2024-01-31_111429.csv
2024-01-31 11:14:29.129 CST,,,4137,,65b9bb15.1029,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2024-01-31 11:14:29.149 CST,,,4139,,65b9bb15.102b,1,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system was shut down at 2024-01-31 11:06:32 CST",,,,,,,,,""
2024-01-31 11:14:29.164 CST,,,4137,,65b9bb15.1029,2,,2024-01-31 11:14:29 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
2024-01-31 11:14:31.775 CST,"esrep","esrep",4156,"192.168.1.201:32776",65b9bb17.103c,1,"ALTER SYSTEM",2024-01-31 11:14:31 CST,4/21,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = ''",,,"internal_rwcmgr"
2024-01-31 11:14:31.783 CST,,,4137,,65b9bb15.1029,3,,2024-01-31 11:14:29 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:14:31.784 CST,,,4137,,65b9bb15.1029,4,,2024-01-31 11:14:29 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to """"",,,,,,,,,""

4、集群节点状态
如下所示,集群出现“双主”状态:

[kingbase@node202 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                         
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | primary | * running |          | default  | 100      | 18       |         | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

[WARNING] following issues were detected
  - unable to connect to node "node1" (ID: 1)

五、恢复集群
如下所示,将原主库通过node rejoin方式作为备库加入到集群,以下为日志信息:

1、执行‘repmgr node rejoin’

[kingbase@node201 bin]$ ./repmgr node rejoin  -h 192.168.1.202 -U esrep -d esrep --force-rewind
......
[NOTICE] rejoin target is node "node2" (ID: 2)
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=internal_rwcmgr options=-csearch_path="
[DEBUG] local tli: 17; local_xlogpos: 0/EE000028; follow_target_history->tli: 17; follow_target_history->end: 0/EC000FA0
[NOTICE] sys_rewind execution required for this node to attach to rejoin target node 2
[DETAIL] rejoin target server's timeline 18 forked off current database system timeline 17 before current recovery point 0/EE000028
[NOTICE] executing sys_rewind
[DETAIL] sys_rewind command is "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' --source-server='host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"
sys_rewind: servers diverged at WAL location 0/EC000FA0 on timeline 17
sys_rewind: rewinding from last common checkpoint at 0/EB000088 on timeline 17
sys_rewind: find last common checkpoint start time from 2024-01-31 11:18:07.003968 CST to 2024-01-31 11:18:07.204522 CST, in "0.200554" seconds.
sys_rewind: collect the number of WAL files to be processed:3, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.320593 CST, cost "0.115839" seconds.
sys_rewind: read WAL in target server from 0/EB000088 to 0/EE0000A0 (endpoint 0/EE0000A0)
sys_rewind: read the local Wal file information, start time from 2024-01-31 11:18:07.204754 CST to 2024-01-31 11:18:07.321418 CST, cost "0.000793" seconds.
sys_rewind: file replication start time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:07.827427 CST, cost "0.505974" seconds.
sys_rewind: update the control file: minRecoveryPoint is '0/EC013BE0', minRecoveryPointTLI is '18', and database state is 'in archive recovery'
sys_rewind: we will remove the dir '/home/kingbase/cluster/R6C8/HAC8/kingbase/data/sys_replslot/repmgr_slot_2.rewind' and all the file/dir in it.
sys_rewind: rewind start wal location 0/EB000058 (file 0000001100000000000000EB), end wal location 0/EC013BE0 (file 0000001200000000000000EC). wal data increment:16462(kB). time from 2024-01-31 11:18:07.321453 CST to 2024-01-31 11:18:08.543478 CST, in "1.539510" seconds.
sys_rewind: Done!
[NOTICE] 0 files copied to /home/kingbase/cluster/R6C8/HAC8/kingbase/data
[INFO] creating replication slot as user "esrep"
[DEBUG] CreateSlotBySQL(): creating slot "repmgr_slot_1" on upstream
[NOTICE] setting node 1's upstream to node 2
[WARNING] unable to ping "host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2024-01-31 11:18:08.556667
[NOTICE] starting server using "/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6C8/HAC8/kingbase/data' -l /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/logfile start"
[NOTICE] start server finish at 2024-01-31 11:18:09.265062
[NOTICE] NODE REJOIN successful
[DETAIL] node 1 is now attached to node 2

2、rewind对比原主库和新主库history文件
如下所示,sys_rewind通过对比源和目标库的history文件,查找节点之间的分叉点。

原主库:
[kingbase@node201 sys_wal]$ cat 00000011.history
1       0/180000A0      no recovery target specified
2       0/190000A0      no recovery target specified
3       0/250000A0      no recovery target specified
4       0/260000A0      no recovery target specified
5       0/AB0000A0      no recovery target specified
6       0/AD0000A0      no recovery target specified 
7       0/B10000A0      no recovery target specified
8       0/B30000A0      no recovery target specified
9       0/B60000A0      no recovery target specified
10      0/B70000A0      no recovery target specified
11      0/BE0000A0      no recovery target specified
12      0/BF0000A0      no recovery target specified
13      0/C00000A0      no recovery target specified
14      0/C10000A0      no recovery target specified
15      0/C90000A0      no recovery target specified
16      0/CA0000A0      no recovery target specified

新主库:
[kingbase@node202 bin]$ cat ../data/sys_wal/00000012.history
1       0/180000A0      no recovery target specified
2       0/190000A0      no recovery target specified
3       0/250000A0      no recovery target specified
4       0/260000A0      no recovery target specified
5       0/AB0000A0      no recovery target specified
6       0/AD0000A0      no recovery target specified
7       0/B10000A0      no recovery target specified
8       0/B30000A0      no recovery target specified
9       0/B60000A0      no recovery target specified
10      0/B70000A0      no recovery target specified
11      0/BE0000A0      no recovery target specified
12      0/BF0000A0      no recovery target specified
13      0/C00000A0      no recovery target specified
14      0/C10000A0      no recovery target specified
15      0/C90000A0      no recovery target specified
16      0/CA0000A0      no recovery target specified
17      0/EC000FA0      no recovery target specified  --history分叉点

源库和目标库的分叉点:

如下所示,在rewind过程中,源库和目标库之间通过history文件获取分叉点,然后从分叉点之前最近的checkpoint开始对wal日志执行recovery(必须保证从checkpoint开始,源库上的wal日志必须存在,否则将会因为缺失wal日志,rewind失败。):

3、查看新主库数据库日志

2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,1,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: IDENTIFY_SYSTEM",,,,,,,,,"node1"
2024-01-31 11:18:09.261 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,2,"idle",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"received replication command: START_REPLICATION SLOT ""repmgr_slot_1"" 0/EC000000 TIMELINE 18",,,,,,,,,"node1"
2024-01-31 11:18:11.237 CST,"esrep","esrep",3552,"192.168.1.202:49745",65b9b688.de0,3,"ALTER SYSTEM",2024-01-31 10:55:04 CST,2/4414,0,LOG,XX000,"attention:user esrep is modifying synchronous_standby_names by ALTER SYSTEM SET statement",,,,,,"ALTER SYSTEM SET synchronous_standby_names = 'ANY 1(node1)'",,,"internal_rwcmgr"
2024-01-31 11:18:11.239 CST,,,2823,,65b9b66f.b07,25,,2024-01-31 10:54:39 CST,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,""
2024-01-31 11:18:11.241 CST,,,2823,,65b9b66f.b07,26,,2024-01-31 10:54:39 CST,,0,LOG,00000,"parameter ""synchronous_standby_names"" changed to ""ANY 1(node1)""",,,,,,,,,""
2024-01-31 11:18:11.402 CST,"esrep","",13596,"192.168.1.201:32307",65b9bbf1.351c,3,"streaming 0/EC015300",2024-01-31 11:18:09 CST,4/0,0,LOG,00000,"standby ""node1"" is now a candidate for quorum synchronous standby",,,,,,,,,"node1"

4、集群恢复完成

[kingbase@node202 bin]$ ./repmgr cluster show


 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                         
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 17       | 0 bytes | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | primary | * running |          | default  | 100      | 18       |         | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

六、总结
sys_rewind检查源和目标集群的时间线历史以确定它们的分叉点,从分叉点之前最近的checkpoint位置开始解析WAL,解析出变动的数据页,然后将变动的数据页拷贝过来,并从分叉点最近的checkpoint开始应用wal日志,最终保证源库和目标库数据一致。
sys_rewind 的使用不限于故障转移,例如,可以提升备用服务器主库,运行一些写入事务,然后重新回滚再次成为备用服务器。

posted @ 2024-07-26 11:05  KINGBASE研究院  阅读(6)  评论(0编辑  收藏  举报