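KingbaseES Cluster Node Management Series 01 -- Creating a Standby with repmgr standby clone and sys_basebackup
Case description:
In a KingbaseES V8R6 cluster, a standby can be created by running 'repmgr standby clone'. When the clone command fails, running sys_basebackup directly helps to analyze and resolve the fault. This case describes in detail how the two ways of creating a standby differ.
Applicable version:
KingbaseES V8R6
Node information: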
[kingbase@node202 ~]$ cat /etc/hosts
192.168.1.201 node201
192.168.1.202 node202
192.168.1.203 node203 # new node
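I. Preparing the system environment before creating the standby
Both approaches need the same system environment preparation before the standby is created. See the standby clone section of the official documentation; it is not repeated here.
II. The repmgr standby clone approach
1. Setting up streaming replication
A primary/standby cluster is built on streaming replication, so the new standby node needs its own replication slot. The query below shows the slot of the existing standby node: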
test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
(1 row)
2. Running 'repmgr standby clone'
The debug log of 'repmgr standby clone' follows:
[kingbase@node203 bin]$ ./repmgr standby clone -h 192.168.1.201 -U esrep -d esrep --log-level=debug
[NOTICE] destination directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.168.1.201 user=esrep dbname=esrep
[DETAIL] current installation size is 516 MB
[DEBUG] 2 node records returned by source node
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] creating directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data"...
[INFO] creating replication slot as user "esrep"
[DEBUG] CreateSlotBySQL(): creating slot "repmgr_slot_3" on upstream
[NOTICE] starting backup (using sys_basebackup)...
[HINT] this may take some time; consider using the -c/--fast-checkpoint option
[INFO] executing:
/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data -h 192.168.1.201 -p 54321 -U esrep -X stream -S repmgr_slot_3
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000 fallback_application_name=internal_rwcmgr options=-csearch_path="
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[kingbase@node203 bin]$ ./repmgr standby register --force
[INFO] connecting to local node "node3" (ID: 3)
[DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.203 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=internal_rwcmgr options=-csearch_path="
[ERROR] unable to connect to local node "node3" (ID: 3)
[HINT] to register a standby which is not running, additionally provide the primary connection parameters
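As the log above shows, 'repmgr standby clone' actually calls sys_basebackup to clone the standby, and it creates the replication slot 'repmgr_slot_3' beforehand. The 'repmgr standby register' attempt at the end of the log fails because the cloned database has not been started yet.
3. Checking the replication slots
As shown below, the replication slot 'repmgr_slot_3' was created automatically for the new node: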
test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
 repmgr_slot_3 |        | physical  |        |          | f         | f      |            |       |              |             |
(2 rows)
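4. Checking the database configuration
As shown below, data/kingbase.auto.conf was created with the connection string to the primary node and the replication slot parameters: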
[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
.......
primary_conninfo = 'host=192.168.1.201 user=esrep application_name=node3 connect_timeout=10 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000'
primary_slot_name = 'repmgr_slot_3'
5. Checking the standby signal file
As shown below, the standby signal file data/standby.signal was created automatically:
[kingbase@node203 bin]$ cd ../data/
[kingbase@node203 data]$ ls -lh standby.signal
-rw------- 1 kingbase kingbase 20 Jun 12 15:33 standby.signal
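The checks in steps 4 and 5 can be combined into a small shell helper. This is a minimal sketch: the function name verify_standby_dir and its arguments are hypothetical, and it only inspects files, so it does not prove that replication is actually working.

```shell
#!/bin/sh
# verify_standby_dir DATA_DIR PRIMARY_HOST SLOT_NAME
# Hypothetical helper: returns 0 only if the cloned data directory looks
# like a standby that points at the expected primary and slot.
verify_standby_dir() {
    data_dir=$1
    primary_host=$2
    slot_name=$3
    conf="$data_dir/kingbase.auto.conf"

    # standby.signal must exist in the data directory
    [ -f "$data_dir/standby.signal" ] || { echo "missing standby.signal"; return 1; }

    # primary_conninfo must reference the expected primary host
    grep -q "^primary_conninfo *=.*host=$primary_host[ ']" "$conf" \
        || { echo "primary_conninfo does not point at $primary_host"; return 1; }

    # primary_slot_name must match the slot created for this node
    grep -q "^primary_slot_name *= *'$slot_name'" "$conf" \
        || { echo "primary_slot_name is not $slot_name"; return 1; }

    echo "ok"
}

# Example:
#   verify_standby_dir /home/kingbase/cluster/R6C8/HAC8/kingbase/data 192.168.1.201 repmgr_slot_3
```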
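III. The sys_basebackup approach
Create the standby directly with sys_basebackup:
1. Setting up streaming replication
As before, the new standby node needs its own replication slot. The query below shows the slot of the existing standby node: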
test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/78000058  |
(1 row)
2. Running sys_basebackup
As shown below, the command fails and reports that the replication slot is missing:
[kingbase@node203 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /home/kingbase/cluster/R6C8/HAC8/kingbase/data -h 192.168.1.201 -p 54321 -U esrep -X stream -S repmgr_slot_3
sys_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "repmgr_slot_3" does not exist
sys_basebackup: error: child process exited with exit code 1
sys_basebackup: removing data directory "/home/kingbase/cluster/R6C8/HAC8/kingbase/data"
Create the replication slot for the standby node on the primary:
test=# select sys_create_physical_replication_slot('repmgr_slot_3');
sys_create_physical_replication_slot('repmgr_slot_3')
-------------------------------------------------------
(repmgr_slot_3,)
(1 row)
test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin  | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+-------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |       4935 | 15461 |              | 2/7A000058  |
 repmgr_slot_3 |        | physical  |        |          | f         | f      |            |       |              |             |
(2 rows)
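The slot creation above can be made safe to repeat by creating the slot only when it is missing. The sketch below just prints the SQL. The function name create_slot_sql is hypothetical, and the usage comment assumes the ksql client accepts the same connection options used elsewhere in this article.

```shell
#!/bin/sh
# create_slot_sql SLOT_NAME
# Prints SQL that creates a physical replication slot only when no slot
# with that name exists yet, so the statement can be re-run safely.
create_slot_sql() {
    slot_name=$1
    # Accept only lowercase letters, digits and underscores so the value
    # can be embedded in the SQL text without further quoting.
    case $slot_name in
        ''|*[!a-z0-9_]*) echo "invalid slot name: $slot_name" >&2; return 1 ;;
    esac
    printf "SELECT sys_create_physical_replication_slot('%s') WHERE NOT EXISTS (SELECT 1 FROM sys_replication_slots WHERE slot_name = '%s');\n" \
        "$slot_name" "$slot_name"
}

# Example (assumption: ksql reads SQL from standard input):
#   create_slot_sql repmgr_slot_3 | ksql -h 192.168.1.201 -p 54321 -U esrep -d esrep
```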
Run sys_basebackup again:
[kingbase@node203 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_basebackup -l "repmgr base backup" \
-D /home/kingbase/cluster/R6C8/HAC8/kingbase/data \
-h 192.168.1.201 -p 54321 -U esrep -X stream \
-S repmgr_slot_3
Optional additional parameters:
  -c, --checkpoint=fast|spread
                         set fast or spread checkpointing
  -P, --progress         show progress information
  -v, --verbose          output verbose messages
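As an illustration, the sys_basebackup command above can be wrapped in a function that also passes the optional parameters -c fast, -P and -v. This is a sketch under assumptions: the function name run_basebackup and the BASEBACKUP and DRY_RUN variables are hypothetical, while the port and user are the ones used in this article.

```shell
#!/bin/sh
# run_basebackup DATA_DIR PRIMARY_HOST SLOT_NAME
# Same arguments as the command above, plus -c fast, -P and -v.
# BASEBACKUP (path to the binary) and DRY_RUN are hypothetical variables.
run_basebackup() {
    data_dir=$1
    primary_host=$2
    slot_name=$3
    bin=${BASEBACKUP:-sys_basebackup}

    # Build the argument list in the positional parameters.
    set -- -l "repmgr base backup" -D "$data_dir" \
        -h "$primary_host" -p 54321 -U esrep \
        -X stream -S "$slot_name" \
        -c fast -P -v

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it (quoting is not preserved).
        echo "$bin $*"
    else
        "$bin" "$@"
    fi
}

# Example:
#   DRY_RUN=1 run_basebackup /home/kingbase/cluster/R6C8/HAC8/kingbase/data 192.168.1.201 repmgr_slot_3
```

Running it with DRY_RUN=1 first makes it easy to compare the generated command with the one in the repmgr debug log.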
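3. Checking the database configuration
As shown below, the connection string to the primary and the replication slot settings were not adjusted automatically. sys_basebackup simply copied the file from the primary, so it must be edited: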
[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.1.202 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node1'
primary_slot_name = 'repmgr_slot_1'
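The primary_conninfo and primary_slot_name entries above must be changed to point at the current primary and at the new node's slot. After editing: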
[kingbase@node203 bin]$ cat ../data/kingbase.auto.conf
......
primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.1.201 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node3'
primary_slot_name = 'repmgr_slot_3'
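The edit above can also be scripted. The following sketch uses sed to rewrite the host and application_name inside primary_conninfo and to replace primary_slot_name. The function name retarget_standby_conf and its arguments are hypothetical, and the original file is kept with a .bak suffix.

```shell
#!/bin/sh
# retarget_standby_conf CONF_FILE PRIMARY_HOST NODE_NAME SLOT_NAME
# Hypothetical helper: rewrites host and application_name inside
# primary_conninfo and replaces primary_slot_name; other lines are untouched.
retarget_standby_conf() {
    conf=$1
    primary_host=$2
    node_name=$3
    slot_name=$4

    cp "$conf" "$conf.bak" || return 1   # keep the copied file as a backup

    sed \
        -e "/^primary_conninfo/ s/host=[^ ']*/host=$primary_host/" \
        -e "/^primary_conninfo/ s/application_name=[^ ']*/application_name=$node_name/" \
        -e "s/^primary_slot_name.*/primary_slot_name = '$slot_name'/" \
        "$conf.bak" > "$conf"
}

# Example:
#   retarget_standby_conf ../data/kingbase.auto.conf 192.168.1.201 node3 repmgr_slot_3
```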
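4. Standby signal file
sys_basebackup, as invoked above, does not create the signal file standby.signal on the standby node, so it must be created manually after the clone: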
[kingbase@node203 data]$ touch standby.signal
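IV. Summary
For a primary/standby streaming replication cluster, the simplest and most reliable way to create a standby node is 'repmgr standby clone'. When the clone fails and the fault has to be analyzed, the standby can be created with sys_basebackup instead. In that case, remember to fix the standby configuration file and to create the standby signal file standby.signal.
Once this is done, start the database service on the standby and register the standby to complete its creation and the cluster architecture.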
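The final two steps can be sketched as a short script. It reuses the sys_ctl and 'repmgr standby register --force' commands that appear in the clone log. The function names and the DRY_RUN switch are hypothetical, and the sketch assumes repmgr locates its configuration the same way as in the session shown earlier.

```shell
#!/bin/sh
# run_step CMD [ARGS...]
# Runs the command, or only prints it when DRY_RUN=1 (hypothetical switch).
run_step() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# finish_standby BIN_DIR DATA_DIR
# Starts the standby, then registers it with the cluster.
finish_standby() {
    bin_dir=$1
    data_dir=$2

    # Start first: registration connects to the local node and fails
    # when the database is not running.
    run_step "$bin_dir/sys_ctl" -D "$data_dir" start || return 1
    run_step "$bin_dir/repmgr" standby register --force || return 1
}

# Example:
#   DRY_RUN=1 finish_standby /home/kingbase/cluster/R6C8/HAC8/kingbase/bin \
#       /home/kingbase/cluster/R6C8/HAC8/kingbase/data
```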