KingbaseES V8R6 Cluster Operations Case Study: Switchover in a Same-City Dual-Center Deployment

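Case description:
After an online switchover is performed in a same-city dual-center deployment (a production center plus a same-city disaster recovery center), the dual-center architecture stays unchanged; only the primary role moves between nodes.
Applicable version:
KingbaseES V8R6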

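Cluster architecture: node1 (192.168.1.101) and node2 (192.168.1.102) are in the production center; node3 (192.168.1.103) is in the same-city disaster recovery center.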

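I. Cluster node status before the switchover
As shown below, before the switchover the primary is node3, located in the same-city disaster recovery center (location local_disaster). We now run an online switchover to move the primary to node1 in the production center.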

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location       | Priority | Timeline | LSN_Lag | Connection string                                                                                                               
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node3    | production     | 100      | 4        | 0 bytes | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | production     | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | primary | * running |          | local_disaster | 100      | 4        |         | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
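To check the topology from a script rather than by eye, the table above can be parsed with standard text tools. The following is a minimal sketch: the function names cluster_nodes and cluster_primary are hypothetical (not part of repmgr), and it assumes the column layout shown above (ID, Name, Role, Status, Upstream, Location, ...) with "|" as the column separator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of repmgr): parse the text printed by
# "repmgr cluster show", read from stdin. Assumes the column order
# ID | Name | Role | Status | Upstream | Location | ... as in this article.

# Print "name role upstream location" for every node row.
# Node rows are recognised by a numeric ID in the first column, so the
# header and the dashed separator line are skipped.
cluster_nodes() {
  awk -F'|' '
    $1 ~ /^ *[0-9]+ *$/ {
      for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)  # trim each column
      upstream = ($5 == "") ? "-" : $5                     # the primary has no upstream
      print $2, $3, upstream, $6
    }'
}

# Print the name of the node whose Role column is "primary".
cluster_primary() {
  cluster_nodes | awk '$2 == "primary" { print $1 }'
}
```

For example, ./repmgr cluster show | cluster_primary should print node3 in the state shown above. Parsing a human-readable table is fragile if the column order differs between versions, so treat this as a convenience check only.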

II. repmgr.conf configuration

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep failover
failover='automatic'
failover_need_server_alive='none'

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep sync
sync_in_same_location='0'
synchronous='sync'
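The two greps above can be combined into a single pass. This is a minimal sketch, assuming repmgr.conf uses simple key=value lines as shown; show_switchover_params is a hypothetical helper name, and commented-out settings are deliberately not matched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the failover and synchronous-replication settings
# shown above from a repmgr.conf in one pass. Lines starting with "#" are not
# matched. The default path mirrors the "../etc/repmgr.conf" used in this article.
show_switchover_params() {
  local conf=${1:-../etc/repmgr.conf}
  grep -E '^[[:space:]]*(failover|failover_need_server_alive|sync_in_same_location|synchronous)[[:space:]]*=' "$conf"
}
```

Run from the bin directory, show_switchover_params prints the same four lines as the two greps above, in file order.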

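III. Performing the online switchover

The switchover process is shown below. The command is run on node1, the standby to be promoted. As the WARNING indicates, database connection parameters (-h/-U/-d) are not required for STANDBY SWITCHOVER: repmgr promotes the local node (node1), even though -h here points at 192.168.1.102 (node2).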

[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
[WARNING] following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=0
"
[NOTICE] attempting to pause repmgrd on 3 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[INFO] pausing repmgrd on node "node3" (ID 3)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby
[NOTICE] stopping current primary node "node3" (ID: 3)
[NOTICE] issuing CHECKPOINT on node "node3" (ID: 3)
[DETAIL] executing server command "/home/kingbase/cluster/tptc/rh6/kingbase/bin/sys_ctl  -D '/data/kingbase/tptc/rh6/data' -l /home/kingbase/cluster/tptc/rh6/kingbase/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/C000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/C0000A0
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node3" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[INFO] unpausing repmgrd on node "node3" (ID 3)
[INFO] unpause node "node3" (ID 3) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully
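When the switchover is scripted, the log above is the main evidence of success. The following hedged sketch inspects a saved copy of that output, checking the final NOTICE line and the "... is now primary ..." DETAIL line. The helper names and the file name switchover.log are hypothetical, and matching on message text assumes the wording shown above stays the same in your build.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting the switchover. Capture the output first, e.g.
#   ./repmgr standby switchover 2>&1 | tee switchover.log
# (2>&1 because repmgr may write its log messages to stderr), then inspect it.

# Succeeds (exit 0) only if the log contains the final success NOTICE.
switchover_succeeded() {
  grep -q 'STANDBY SWITCHOVER has completed successfully' "$1"
}

# Print the new primary named in a line such as
#   [DETAIL] node "node1" is now primary and node "node3" is attached as standby
switchover_new_primary() {
  sed -n 's/.*node "\([^"]*\)" is now primary.*/\1/p' "$1"
}
```

For the run above, switchover_succeeded switchover.log returns 0 and switchover_new_primary switchover.log prints node1.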

IV. Cluster node status after the switchover

1. Node status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location       | Priority | Timeline | LSN_Lag | Connection string                                                                                                               
----+-------+---------+-----------+----------+----------------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | production     | 100      | 5        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | production     | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | standby |   running | node1    | local_disaster | 100      | 4        | 0 bytes | host=192.168.1.103 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

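As shown above, the primary has moved to node1 in the production center, and the standbys in both centers (node2 in the production center and node3 in the disaster recovery center) now use node1 as their upstream.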
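The condition described above (every standby's Upstream is the new primary) can also be checked mechanically. This is a minimal sketch: check_upstreams is a hypothetical name, not a repmgr command, and it assumes the same cluster show layout as above. In the pre-switchover output node2 was attached to node1 (then a standby) rather than to the primary, so this check would have flagged it; whether such a cascaded layout is intended depends on your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical check (not a repmgr command): read "repmgr cluster show" output
# on stdin and report every standby whose Upstream is not the current primary.
# Exit status: 0 = all standbys follow the primary, 1 = mismatch, 2 = no primary.
check_upstreams() {
  awk -F'|' '
    $1 ~ /^ *[0-9]+ *$/ {
      for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)  # trim each column
      role[$2] = $3
      upstream[$2] = $5
      if ($3 == "primary") primary = $2
    }
    END {
      if (primary == "") { print "no primary found"; exit 2 }
      bad = 0
      for (n in role)
        if (role[n] == "standby" && upstream[n] != primary) {
          print n " follows " upstream[n] " instead of primary " primary
          bad = 1
        }
      exit bad
    }'
}
```

Usage: ./repmgr cluster show | check_upstreams && echo "all standbys follow the primary"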

2. Streaming replication status

test=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 7114 |    16385 | esrep   | node2            | 192.168.1.102 |                 |       38948 | 2023-06-28 10:32:49.507374+08 |              | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8  |           |           |            |             1 | sync       | 2023-06-28 10:35:47.016015+08
 8255 |    16385 | esrep   | node3            | 192.168.1.103 |                 |       27536 | 2023-06-28 10:34:25.418926+08 |              | streaming | 0/C0011A8 | 0/C0011A8 | 0/C0011A8 | 0/C0011A8  |           |           |            |             2 | potential  | 2023-06-28 10:35:49.245420+08
(2 rows)

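---After the switchover, node1 in the production center is the primary. node2, the standby in the same center, is the synchronous standby (sync_state = sync), and node3 in the same-city disaster recovery center is connected to the production primary as an asynchronous standby (sync_state = potential, meaning it would become synchronous if the current synchronous standby failed).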
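To read sync_state without the wide table, the primary can return just the relevant columns, and a small filter can label them. This is a sketch that assumes ksql accepts psql-style -A/-t/-F/-c options (verify against your ksql version); describe_sync_state is a hypothetical helper. The labels follow the standard sync_state meanings: sync is the current synchronous standby, and potential is currently asynchronous but may become synchronous if the synchronous standby fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label each standby's replication mode from lines of the
# form "application_name|sync_state". On the primary such lines could be
# produced with something like (assumes ksql accepts psql-style options):
#   ./ksql -d test -At -F'|' -c "select application_name, sync_state from sys_stat_replication"
describe_sync_state() {
  awk -F'|' 'NF == 2 {
    if ($2 == "sync")           mode = "synchronous"
    else if ($2 == "potential") mode = "asynchronous (candidate to become synchronous)"
    else if ($2 == "async")     mode = "asynchronous"
    else                        mode = $2
    print $1 ": " mode
  }'
}
```

For the output above this would print node2 as synchronous and node3 as an asynchronous candidate.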