KingbaseES Cluster Node Management Series 01 -- Creating a Standby with repmgr standby clone and sys_basebackup

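Case description:
A KingbaseES V8R6 cluster can create a standby with 'repmgr standby clone'. In some scenarios, when the clone command fails, running sys_basebackup directly helps analyze and resolve the failure. This case describes in detail how the two approaches to creating a standby differ.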

Applicable version:
KingbaseES V8R6

Node information:

[kingbase@node202 ~]$ cat /etc/hosts

192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203   # newly added node

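I. Preparing the system environment before creating the standby
Both approaches require the same system environment preparation before the standby is created. See the standby clone section of the official documentation; it is not repeated here.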

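II. The repmgr standby clone approach

1. Set up streaming replication
A primary/standby cluster is built on streaming replication, so the new standby node needs its own replication slot. The query below shows the replication slot of the existing standby node: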

test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
(1 row)

2. Run 'repmgr standby clone'
The debug log of 'repmgr standby clone' is shown below:

[kingbase@node203 bin]$ ./repmgr standby clone -h 192.168.1.201 -U esrep -d esrep --log-level=debug
[NOTICE] destination directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.168.1.201 user=esrep dbname=esrep
[DETAIL] current installation size is 516 MB
[DEBUG] 2 node records returned by source node
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] creating directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data"...
[INFO] creating replication slot as user "esrep"
[DEBUG] CreateSlotBySQL(): creating slot "repmgr_slot_3" on upstream
[NOTICE] starting backup (using sys_basebackup)...
[HINT] this may take some time; consider using the -c/--fast-checkpoint option
[INFO] executing:
  /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup"  -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data -h 192.168.1.201 -p 54321 -U esrep -X stream -S repmgr_slot_3
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[kingbase@node203 bin]$ ./repmgr standby register --force
[INFO] connecting to local node "node3" (ID: 3)
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.203 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=internal_rwcmgr options=-csearch_path="
[ERROR] unable to connect to local node "node3" (ID: 3)
[HINT] to register a standby which is not running, additionally provide the primary connection parameters

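As the log above shows, 'repmgr standby clone' actually calls sys_basebackup to clone the standby, and it creates the replication slot 'repmgr_slot_3' before starting the backup. The 'repmgr standby register' attempt at the end fails only because the standby database has not been started yet, as the hint in the log indicates.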

3. Check the replication slots
As shown below, the replication slot 'repmgr_slot_3' was created automatically for the new node:

test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
 repmgr_slot_3 |        | physical  |        |          | f         | f      |            |       |              |             |
(2 rows)

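4. Check the database configuration
As shown below, data/kingbase.auto.conf was created, with the connection string to the primary node and the replication slot parameter already configured: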

[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
.......
primary_conninfo = 'host=192.168.1.201 user=esrep application_name=node3 connect_timeout=10 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000'
primary_slot_name = 'repmgr_slot_3'

5. Check the standby signal file
As shown below, the standby signal file data/standby.signal was created automatically:

[kingbase@node203 bin]$ cd ../data/
[kingbase@node203 data]$ ls -lh standby.signal
-rw------- 1 kingbase kingbase 20 Jun 12 15:33 standby.signal
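
The two checks above (the settings in kingbase.auto.conf and the presence of standby.signal) can be folded into one small verification helper. This is only a sketch: the function name `check_standby_dir` is hypothetical, and it looks for nothing beyond the file and the two parameter names shown in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a cloned data directory carries the
# standby settings described above. Returns non-zero if anything is missing.
check_standby_dir() {
  local data_dir="$1" rc=0 key
  if [ -f "$data_dir/standby.signal" ]; then
    echo "ok: standby.signal"
  else
    echo "missing: standby.signal"; rc=1
  fi
  for key in primary_conninfo primary_slot_name; do
    # A parameter counts as present when an uncommented "key =" line exists.
    if grep -Eq "^${key}[[:space:]]*=" "$data_dir/kingbase.auto.conf" 2>/dev/null; then
      echo "ok: $key"
    else
      echo "missing: $key"; rc=1
    fi
  done
  return "$rc"
}

# Usage (path taken from the article's environment):
#   check_standby_dir /home/kingbase/cluster/R6C8/HAC8/kingbase/data
```

The helper only checks that the entries exist; whether the host and slot name are the right ones still has to be read from the file.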

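III. The sys_basebackup approach
Create the standby directly with sys_basebackup:

1. Set up streaming replication
A primary/standby cluster is built on streaming replication, so the new standby node needs its own replication slot. The query below shows the replication slot of the existing standby node: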

test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
(1 row)

2. Run sys_basebackup
As shown below, the command fails because the replication slot named with -S does not exist on the primary:

[kingbase@node203 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup"  -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data -h 192.168.1.201 -p 54321 -U esrep -X stream -S repmgr_slot_3
sys_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "repmgr_slot_3" does not exist
sys_basebackup: error: child process exited with exit code 1
sys_basebackup: removing data directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data"

Create the replication slot for the standby node on the primary:

test=# select sys_create_physical_replication_slot('repmgr_slot_3');
 sys_create_physical_replication_slot('repmgr_slot_3')
-------------------------------------------------------
 (repmgr_slot_3,)
(1 row)

test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/7A000058  |
 repmgr_slot_3 |        | physical  |        |          | f         | f      |            |       |              |             |
(2 rows)
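
When this step is repeated during troubleshooting, it helps if creating the slot is safe to run twice. The sketch below prints a statement that calls sys_create_physical_replication_slot only when the slot is not yet listed in sys_replication_slots. The function name `slot_sql` is hypothetical, and the `select ... where not exists` form without a FROM clause is assumed to be accepted by the database mode in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL that creates a physical replication slot
# only if no slot with that name exists yet.
slot_sql() {
  local slot="$1"
  cat <<EOF
select sys_create_physical_replication_slot('${slot}')
where not exists (
  select 1 from sys_replication_slots where slot_name = '${slot}'
);
EOF
}

# Print the statement for the new node's slot; run the printed SQL
# in a session on the primary.
slot_sql repmgr_slot_3
```

If the slot already exists the statement returns zero rows rather than raising an error.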

Run sys_basebackup again:

[kingbase@node203 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup" \
    -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data \
    -h 192.168.1.201 -p 54321 -U esrep -X stream \
    -S repmgr_slot_3

Optional additional parameters:
 -c, --checkpoint=fast|spread
                         set fast or spread checkpointing
 -P, --progress         show progress information
 -v, --verbose          output verbose messages

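3. Check the database configuration
As shown below, the connection string to the primary and the replication slot setting are not updated automatically. The file is simply a copy of the one on the primary, so it has to be edited: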

[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node1'
primary_slot_name = 'repmgr_slot_1'

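The host and application_name values in primary_conninfo and the value of primary_slot_name need to be changed. The content after editing: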

[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
......
primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node3'
primary_slot_name = 'repmgr_slot_3'
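
The edit can also be scripted. The following is a minimal sketch assuming GNU sed and a file laid out like the one above; the function name `fix_standby_conf` is hypothetical. It rewrites only the host and application_name inside primary_conninfo and replaces the primary_slot_name line, keeping a .bak copy of the file as it came from the primary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point a copied kingbase.auto.conf at the real primary.
# Arguments: <conf file> <primary host> <node name> <slot name>
fix_standby_conf() {
  local conf="$1" host="$2" node="$3" slot="$4"
  cp -p "$conf" "$conf.bak"   # keep the file as copied from the primary
  sed -i -E \
    -e "/^primary_conninfo[[:space:]]*=/ s/host=[^ ']+/host=${host}/" \
    -e "/^primary_conninfo[[:space:]]*=/ s/application_name=[^ ']+/application_name=${node}/" \
    -e "s/^primary_slot_name[[:space:]]*=.*/primary_slot_name = '${slot}'/" \
    "$conf"
}

# Demo on a scratch file holding shortened versions of the two lines.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.1.202 port=54321 application_name=node1'
primary_slot_name = 'repmgr_slot_1'
EOF
fix_standby_conf "$demo" 192.168.1.201 node3 repmgr_slot_3
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On the real node the call would take the path of data/kingbase.auto.conf, and it should be run before the standby is started.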

4. Standby signal file
sys_basebackup does not create the signal file standby.signal on the standby node. It must be created manually after the clone:

[kingbase@node203 data]$ touch standby.signal
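
A slightly more defensive version of this step is sketched below. The function name `ensure_standby_signal` is hypothetical. It refuses to run against a missing data directory, leaves an existing file alone, and sets mode 600 to match the -rw------- permissions of the file that repmgr created in section II.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure <data dir>/standby.signal exists.
ensure_standby_signal() {
  local data_dir="$1"
  if [ ! -d "$data_dir" ]; then
    echo "data directory not found: $data_dir" >&2
    return 1
  fi
  if [ -e "$data_dir/standby.signal" ]; then
    echo "standby.signal already present"
  else
    touch "$data_dir/standby.signal"
    chmod 600 "$data_dir/standby.signal"   # same mode as the repmgr-created file
    echo "standby.signal created"
  fi
}

# Usage (path taken from the article's environment):
#   ensure_standby_signal /home/kingbase/cluster/R6C8/HAC8/kingbase/data
```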

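IV. Summary

For adding a standby node to a primary/standby streaming replication cluster, the simplest and most effective method is 'repmgr standby clone'. In scenarios where the clone fails and the failure has to be analyzed and resolved, the standby can be created with sys_basebackup instead. Afterwards, remember to edit the standby's configuration file and to create the standby signal file standby.signal.
Once this is done, start the standby database service and register the standby to complete its creation and round out the cluster architecture.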
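
The two closing steps can be sketched as follows, using the sys_ctl and repmgr invocations that appear in the clone log in section II. The `KB_HOME`, `run`, `finish_standby` and `DRY_RUN` names are hypothetical, and by default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Paths come from the article's environment; adjust for your own cluster.
KB_HOME=/home/kingbase/cluster/R6C8/HAC8/kingbase

# run: print the command, and execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Start the standby first, then register it; registration is skipped
# if the start command fails.
finish_standby() {
  run "$KB_HOME/bin/sys_ctl" -D "$KB_HOME/data" start &&
  run "$KB_HOME/bin/repmgr" standby register --force
}

finish_standby
```

Starting the database before registering avoids the "unable to connect to local node" error seen at the end of the clone log.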
posted @ 2024-07-26 11:05  KINGBASE研究院