KingbaseES V8R6集群案例---一主二备架构单个备库宕机事务影响测试
案例说明:
对于KingbaseES V8R6集群,在sync模式下,对于一主一备架构,如果备库宕机时,主库事务commit,会被hang住,在wal_sender_timeout参数(默认60s)超过阈值后,sync转为async后,事务才能完成commit;本案例在测试一主二备架构下,同步备库宕机后,主库事务处理是否有hang住现象。
适用版本:
KingbaseES V8R6
集群架构:
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 25697 | no | n/a
2 | node2 | standby | running | node1 | running | 582 | no | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 12571 | no | 1 second(s) ago
测试脚本:
[kingbase@node101 shell]$ cat 1.sh
#!/bin/bash
while true
do
/home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/bin/ksql -U system prod <<EOF
insert into test1 values (now());
EOF
done
[kingbase@node101 shell]$ cat 2.sh
#!/bin/bash
while true
do
/home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/bin/ksql -U system prod <<EOF
select * from test1;
EOF
done
一、测试一(synchronous='sync')
1、流复制同步配置
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
synchronous='sync'
[kingbase@node101 bin]$ cat /data/kingbase/hac7/data/es_rep.conf |grep _commit
synchronous_commit = remote_apply
2、集群流复制状态
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_
start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush
_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+-----------------
--------------+--------------+-----------+------------+------------+------------+------------+-----------+------
-----+------------+---------------+------------+-------------------------------
2840 | 16385 | esrep | node2 | 192.168.1.102 | | 44199 | 2023-06-21 14:47
:49.668343+08 | | streaming | 1/A5000680 | 1/A5000680 | 1/A5000680 | 1/A5000680 | |
| | 1 | sync | 2023-06-21 16:51:00.439757+08
2852 | 16385 | esrep | node3 | 192.168.1.103 | | 62335 | 2023-06-21 14:47
:51.035563+08 | | streaming | 1/A5000680 | 1/A5000680 | 1/A5000680 | 1/A5000680 | |
| | 2 | potential | 2023-06-21 16:50:54.103476+08
(2 rows)
---如上所示,node2为流复制中的sync节点。
3、主库执行测试脚本
session 1:
[kingbase@node101 shell]$ sh 1.sh &
session 2:
[kingbase@node101 shell]$ sh 2.sh >1.log 2>&1
4、关闭同步备库数据库服务
node2节点宕机时间:
2023-06-25 13:52:01.037 CST,,,11890,,6497d534.2e72,5,,2023-06-25 13:48:36 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,"
5、查看数据插入记录
如下所示,在同步备库宕机的时间点,并没有造成事务处理的中断(事务被hang住):
.......
2023-06-25 13:52:00.907013
2023-06-25 13:52:00.937465
2023-06-25 13:52:00.966735
2023-06-25 13:52:00.995530
2023-06-25 13:52:01.026226
2023-06-25 13:52:01.055470
2023-06-25 13:52:01.082933
2023-06-25 13:52:01.168686
2023-06-25 13:52:01.238983
2023-06-25 13:52:01.324240
2023-06-25 13:52:01.374047
2023-06-25 13:52:01.427150
2023-06-25 13:52:01.464301
2023-06-25 13:52:01.519301
2023-06-25 13:52:01.573772
2023-06-25 13:52:01.628080
2023-06-25 13:52:01.735475
2023-06-25 13:52:01.804554
2023-06-25 13:52:01.891160
2023-06-25 13:52:01.957588
2023-06-25 13:52:02.070638
2023-06-25 13:52:02.166344
2023-06-25 13:52:02.219404
2023-06-25 13:52:02.264405
2023-06-25 13:52:02.304738
.....
二、测试二(synchronous='quorum')
1、流复制同步配置
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
synchronous='quorum'
[kingbase@node101 bin]$ cat /data/kingbase/hac7/data/es_rep.conf |grep _commit
synchronous_commit = remote_apply
2、集群流复制状态
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_
start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush
_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+-----------------
--------------+--------------+-----------+------------+------------+------------+------------+-----------+------
-----+------------+---------------+------------+-------------------------------
4598 | 16385 | esrep | node2 | 192.168.1.102 | | 41421 | 2023-06-21 14:36
:45.278823+08 | | streaming | 1/A30009F8 | 1/A30009F8 | 1/A30009F8 | 1/A30009F8 | |
| | 1 | quorum | 2023-06-21 15:16:03.583408+08
8945 | 16385 | esrep | node3 | 192.168.1.103 | | 61295 | 2023-06-21 15:14
:39.573214+08 | | streaming | 1/A30009F8 | 1/A30009F8 | 1/A30009F8 | 1/A30009F8 | |
| | 1 | quorum | 2023-06-21 15:16:02.032845+08
(2 rows)
3、主库执行测试脚本
session 1:
[kingbase@node101 shell]$ sh 1.sh &
session 2:
[kingbase@node101 shell]$ sh 2.sh >1.log 2>&1
4、关闭同步备库数据库服务
node2节点宕机时间:
2023-06-25 14:04:55.740 CST,,,18910,,6497d80f.49de,5,,2023-06-25 14:00:47 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,"
5、查看数据插入记录
如下所示,在同步备库宕机的时间点,并没有造成事务处理的中断(事务被hang住):
........
2023-06-25 14:04:54.955449
2023-06-25 14:04:54.986434
2023-06-25 14:04:55.027062
2023-06-25 14:04:55.058955
2023-06-25 14:04:55.092413
2023-06-25 14:04:55.123352
2023-06-25 14:04:55.157153
2023-06-25 14:04:55.193440
2023-06-25 14:04:55.227700
2023-06-25 14:04:55.264961
2023-06-25 14:04:55.301582
2023-06-25 14:04:55.336236
2023-06-25 14:04:55.380388
2023-06-25 14:04:55.419770
2023-06-25 14:04:55.462793
2023-06-25 14:04:55.496123
2023-06-25 14:04:55.530339
2023-06-25 14:04:55.563342
2023-06-25 14:04:55.597350
2023-06-25 14:04:55.632114
2023-06-25 14:04:55.659785
2023-06-25 14:04:55.695025
2023-06-25 14:04:55.726024
2023-06-25 14:04:55.754980
2023-06-25 14:04:55.788829
2023-06-25 14:04:55.818428
2023-06-25 14:04:55.850334
2023-06-25 14:04:55.886098
2023-06-25 14:04:55.916974
2023-06-25 14:04:55.949975
2023-06-25 14:04:55.985593
2023-06-25 14:04:56.017278
2023-06-25 14:04:56.062645
2023-06-25 14:04:56.094456
2023-06-25 14:04:56.129149
.......
三、测试三(synchronous='all')
1、流复制同步配置
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
synchronous='all'
[kingbase@node101 bin]$ cat /data/kingbase/hac7/data/es_rep.conf |grep _commit
synchronous_commit = remote_apply
2、集群流复制状态
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend
_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flus
h_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+------------+------------+------------+------------+-----------+-----
------+------------+---------------+------------+-------------------------------
25468 | 16385 | esrep | node2 | 192.168.1.102 | | 42097 | 2023-06-21 16:2
8:11.888318+08 | | streaming | 1/A4000508 | 1/A4000508 | 1/A4000508 | 1/A4000508 | |
| | 1 | sync | 2023-06-21 16:28:51.499723+08
25480 | 16385 | esrep | node3 | 192.168.1.103 | | 61309 | 2023-06-21 16:2
8:13.300779+08 | | streaming | 1/A4000508 | 1/A4000508 | 1/A4000508 | 1/A4000508 | |
| | 1 | sync | 2023-06-21 16:28:54.034186+08
(2 rows)
---如上所示,node2为流复制中的sync节点。
3、主库执行测试脚本
session 1:
[kingbase@node101 shell]$ sh 1.sh &
session 2:
[kingbase@node101 shell]$ sh 2.sh >1.log 2>&1
4、关闭同步备库数据库服务
node2节点宕机时间:
2023-06-25 14:12:06.618 CST,,,24878,,6497da69.612e,5,,2023-06-25 14:10:49 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,"
5、查看数据插入记录
如下所示,在同步备库宕机的时间点,并没有造成事务处理的中断(事务被hang住):
......
2023-06-25 14:12:06.910576
2023-06-25 14:12:06.944394
2023-06-25 14:12:06.977977
2023-06-25 14:12:07.006902
2023-06-25 14:12:07.027061
2023-06-25 14:12:07.058800
2023-06-25 14:12:07.090143
2023-06-25 14:12:07.121745
2023-06-25 14:12:07.159289
2023-06-25 14:12:07.192978
2023-06-25 14:12:07.216982
2023-06-25 14:12:07.244713
2023-06-25 14:12:07.275212
2023-06-25 14:12:07.305201
2023-06-25 14:12:07.333306
2023-06-25 14:12:07.365943
2023-06-25 14:12:07.396400
2023-06-25 14:12:07.430124
2023-06-25 14:12:07.457527
2023-06-25 14:12:07.487357
2023-06-25 14:12:07.518364
2023-06-25 14:12:07.551942
2023-06-25 14:12:07.580497
2023-06-25 14:12:07.615095
2023-06-25 14:12:07.642226
2023-06-25 14:12:07.667587
2023-06-25 14:12:07.705081
2023-06-25 14:12:07.738520
2023-06-25 14:12:07.777657
2023-06-25 14:12:07.816077
2023-06-25 14:12:07.847995
2023-06-25 14:12:07.888306
2023-06-25 14:12:07.917725
2023-06-25 14:12:07.952045
2023-06-25 14:12:07.984344
2023-06-25 14:12:08.021621
2023-06-25 14:12:08.051968
2023-06-25 14:12:08.082699
2023-06-25 14:12:08.114964
2023-06-25 14:12:08.145950
2023-06-25 14:12:08.175424
......
四、总结
在一主多备的架构下,同步流复制模式中,当同步备库节点宕机时,不会影响主库事务操作的连续性,建议对事务处理性能要求高的业务环境,采用一主多备的集群架构。如Oracle的DataGuard,在最大保护模式,建议采用一主多备的架构。