KingbaseES V8R3 Cluster Maintenance Case: Adding a Data Node Online
Case description:
A KingbaseES V8R3 cluster consists of two management nodes, Master and Standby (which are also data nodes), plus additional data nodes. In read-heavy, write-light workloads, adding data nodes (standbys) raises the read/write-splitting capacity of the cluster. This case shows how to add a data node online to a KingbaseES V8R3 read/write-splitting cluster, in two phases and three concrete steps.
Phases:
1) Expand streaming replication
2) Expand cluster management
Concrete steps:
1. Create the new standby with sys_basebackup.
2. Add the standby to the cluster's node management, so kingbase_monitor.sh can start and stop it with the rest of the cluster.
3. Test primary/standby failover.
OS and database versions:
1) Operating system
[kingbase@localhost ~]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
2) Database version
KingbaseES V8R3
Prerequisites:
1. Existing nodes: primary (192.168.4.127) and standby (192.168.4.129); new node: 192.168.4.159.
2. Install the same version of KingbaseES on the new node.
3. Configure ssh trust between the new node and the existing nodes (for both root and kingbase).
4. Disable the firewall and selinux on the new node, and start the crond service.
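The preparation on the new node can be sketched as a short script. This is a sketch only: the IP list is taken from this case, the firewalld/systemd/selinux commands assume CentOS 7, and the commands are collected and printed for review rather than executed.

```shell
#!/bin/sh
# Prerequisite setup sketch for the new node (192.168.4.159).
# Commands are collected into CMDS and printed; run them by hand
# once reviewed.
EXISTING_NODES="192.168.4.127 192.168.4.129"
CMDS=""
for node in $EXISTING_NODES; do
    # ssh trust for the current user (repeat as root and as kingbase)
    CMDS="$CMDS
ssh-copy-id $(whoami)@$node"
done
# CentOS 7 specifics: stop the firewall, disable selinux, start crond
CMDS="$CMDS
systemctl disable --now firewalld
setenforce 0
systemctl enable --now crond"
printf '%s\n' "$CMDS"
```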
Part 1: Adding a replication node online
1.1 Check the kingbase cluster node status
=First confirm that the cluster and the KingbaseES services on the existing primary and standby are healthy.=
TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.4.129 | 54321 | up     | 0.500000  | standby | 0          | true              | 0
 1       | 192.168.4.127 | 54321 | up     | 0.500000  | primary | 0          | false             | 0
(2 rows)
1.2 Check the primary/standby replication status
=Make sure streaming replication between the primary and the standby is healthy.=
TEST=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 16851 |       10 | SYSTEM  | srv129           | 192.168.4.129 |                 |       52967 | 2020-10-23 15:11:17.836557+08 |              | streaming | 0/180000D0    | 0/180000D0     | 0/180000D0     | 0/180000D0      |             0 | async
(1 row)
1.3 Add the new standby node online
=To keep the test close to a production environment, run a large transaction on the primary while the standby is being created, then verify that the finished standby holds the same data as the primary.=
Run a transaction on the primary (to simulate a production workload):
prod=# insert into t10 values (generate_series(1,10000000),'tom');
prod=# select count(*) from t10;
count
----------
11000000
(1 row)
1.3.1 Create the standby online with sys_basebackup on the new node
1) Create the data directory for the standby
=Note: keep the directory locations the same as on the existing nodes; this minimizes later configuration changes.=
[kingbase@srv159 v8r3]$ mkdir -p /home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 v8r3]$ chmod 0700 /home/kingbase/cluster/ESHA/db/data
2) Create the standby with sys_basebackup
[kingbase@srv159 v8r3]$ sys_basebackup -h 192.168.4.127 -D /home/kingbase/cluster/ESHA/db/data -F p -x -v -P -U SYSTEM -W 123456 -p 54321
transaction log start point: 0/B20060D0 on timeline 3
2079583/2079583 kB (100%), 1/1 tablespace
transaction log end point: 0/D5C48538
sys_basebackup: base backup completed
1.3.2 Configure the standby's recovery.conf file
[kingbase@srv159 data]$ pwd
/home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.4.127 user=SYSTEM password=MTIzNDU2 application_name=srv159'
recovery_target_timeline='latest'
primary_slot_name ='slot_srv159'
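Each standby's recovery.conf differs only in the node name, so the file can be generated. A minimal sketch, assuming the node name doubles as the application_name and the slot suffix (as it does for srv159 above), and reusing this case's primary address and encoded password:

```shell
#!/bin/sh
# Generate recovery.conf for a standby. NODE_NAME must match both the
# application_name seen in sys_stat_replication and the replication
# slot created on the primary (slot_<NODE_NAME>).
NODE_NAME=srv159
PRIMARY_HOST=192.168.4.127
PRIMARY_PORT=54321

cat > recovery.conf <<EOF
standby_mode='on'
primary_conninfo='port=$PRIMARY_PORT host=$PRIMARY_HOST user=SYSTEM password=MTIzNDU2 application_name=$NODE_NAME'
recovery_target_timeline='latest'
primary_slot_name='slot_$NODE_NAME'
EOF
```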
1.3.3 Configure the standby's kingbase.conf file (excerpt)
# Add settings for extensions here
max_wal_senders=4
wal_keep_segments=256
hot_standby_feedback=on
max_prepared_transactions=100
port=54321
control_file_copy='/home/kingbase/cluster/ESHA/template.bk'
fsync=on
wal_log_hints=on
archive_mode=on
hot_standby=on
wal_level=replica
###synchronous_standby_names='1(srv129)'
archive_dest='/home/kingbase/cluster/ESHA/archivedir/'
max_replication_slots=4
log_directory='/home/kingbase/cluster/ESHA/db/data/sys_log/'
1.3.4 Configure the sys_hba.conf file on all nodes
# "local" is for Unix domain socket connections only
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5
# IPv6 local connections:
host all all ::1/128 md5
host all all ::0/0 md5
# Allow replication connections from localhost, by a user with the
# replication privilege
host all SYSTEM 192.168.4.127/16 md5
host all SYSTEM 192.168.4.129/16 md5
host all SYSTEM 192.168.4.159/16 md5
host replication SYSTEM 192.168.4.127/16 md5
host replication SYSTEM 192.168.4.129/16 md5
host replication SYSTEM 192.168.4.159/16 md5
=Note: after modifying sys_hba.conf on the existing nodes, reload the database for the change to take effect.=
[kingbase@srv129 data]$ sys_ctl reload -D /home/kingbase/cluster/ESHA/db/data
server signaled
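Since every existing node needs the same reload after the sys_hba.conf change, the step can be scripted over ssh. A sketch only: the node IPs and data directory are from this case, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Reload the KingbaseES server on every existing node after editing
# sys_hba.conf. The ssh commands are printed, not executed.
DATA_DIR=/home/kingbase/cluster/ESHA/db/data
CMDS=""
for node in 192.168.4.127 192.168.4.129; do
    CMDS="$CMDS
ssh kingbase@$node sys_ctl reload -D $DATA_DIR"
done
printf '%s\n' "$CMDS"
```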
1.3.5 Copy the following directories and files from the primary to the new standby
Copy the following directories and files, via scp, to the same locations on the new node:
=Note: "/home/kingbase/cluster/ESHA" is the cluster deployment directory, where "ESHA" is the cluster name. archivedir may also simply be created as an empty directory.=
[kingbase@localhost ESHA]$ pwd
/home/kingbase/cluster/ESHA
[kingbase@localhost ESHA]$ ls -lh
total 8.0K
drwxrwxr-x 2 kingbase kingbase    6 Oct 30 14:23 archivedir
drwxrwxr-x 7 kingbase kingbase   84 Oct 30 17:00 db
drwxrwxr-x 3 kingbase kingbase  181 Oct 30 17:23 log
drwxrwxr-x 3 kingbase kingbase   29 Oct 30 14:55 run
-rw------- 1 kingbase kingbase 8.0K Oct 31 13:19 template.bk
[kingbase@srv127 db]$ pwd
/home/kingbase/cluster/ESHA/db
[kingbase@srv127 db]$ ls -lh
total 20K
drwxrwxr-x 2 kingbase kingbase 4.0K Oct 29 17:29 bin
drwxrwxr-x 2 kingbase kingbase  295 Oct 29 17:26 etc
-rw------- 1 kingbase kingbase  151 Aug 18 20:13 kingbase.log
drwxrwxr-x 2 kingbase kingbase 4.0K Jul 27 10:41 lib
drwxrwxr-x 9 kingbase kingbase 4.0K Jul 27 10:41 share
=Note: do NOT copy the kingbasecluster directory to the new node. In a V8R3 cluster, kingbasecluster supports only the two primary/standby management nodes; a newly added node can only be a data node.=
Copy the primary's /etc/cron.d/KINGBASECRON file to the same directory on the standby:
Note: the crond service must be running on the standby.
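The copy step can be scripted from the primary. A sketch only: the item list is inferred from the directory listings above (everything except the kingbasecluster directory and the data directory, which sys_basebackup already created), and the commands are printed rather than executed.

```shell
#!/bin/sh
# Copy cluster directories/files from the primary to the new node.
# Run on the primary as kingbase; the KINGBASECRON copy needs root.
NEW_NODE=192.168.4.159
BASE=/home/kingbase/cluster/ESHA
CMDS=""
for item in db/bin db/etc db/lib db/share template.bk log run; do
    CMDS="$CMDS
scp -r $BASE/$item kingbase@$NEW_NODE:$BASE/$item"
done
# archivedir can simply be created empty on the new node
CMDS="$CMDS
ssh kingbase@$NEW_NODE mkdir -p $BASE/archivedir
scp /etc/cron.d/KINGBASECRON root@$NEW_NODE:/etc/cron.d/"
printf '%s\n' "$CMDS"
```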
1.3.6 Add a replication_slot on the primary
1) List the replication slots that already exist:
TEST=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
slot_srv127 | | physical | | | f | | | | |
slot_srv129 | | physical | | | f | | | | |
(2 rows)
2) Create a replication slot for the new standby
TEST=# select * from sys_create_physical_replication_slot('slot_srv159');
slot_name | xlog_position
-------------+---------------
slot_srv159 |
(1 row)
3) Check the replication slots again
TEST=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
slot_srv127 | | physical | | | f | | | | |
slot_srv129 | | physical | | | f | | | | |
slot_srv159 | | physical | | | f | | | | |
(3 rows)
1.3.7 Start the standby instance and join it to the replication topology
[kingbase@srv159 data]$ sys_ctl start -D /home/kingbase/cluster/ESHA/db/data
server starting
[kingbase@srv159 data]$ LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "/home/kingbase/cluster/ESHA/db/data/sys_log".
Check the standby's processes:
[kingbase@srv159 data]$ ps -ef |grep kingbase
kingbase 5071 1 0 20:01 pts/3 00:00:00 /opt/Kingbase/ES/V8R3/Server/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 5072 5071 0 20:01 ? 00:00:00 kingbase: logger process
kingbase 5073 5071 90 20:01 ? 00:00:19 kingbase: startup process waiting for 0000000300000000000000D6
kingbase 5077 5071 0 20:01 ? 00:00:00 kingbase: checkpointer process
kingbase 5078 5071 0 20:01 ? 00:00:00 kingbase: writer process
kingbase 5199 5071 0 20:01 ? 00:00:00 kingbase: stats collector process
kingbase 5200 5071 0 20:01 ? 00:00:00 kingbase: wal receiver process
kingbase 5201 5071 0 20:01 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.4.159(62391) idle
1.3.8 Check the replication status
1) Check the replication state on the primary
TEST=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 16851 |       10 | SYSTEM  | srv129           | 192.168.4.129 |                 |       52967 | 2020-10-23 15:11:17.836557+08 |              | streaming | 0/ED000000    | 0/ED000000     | 0/ED000000     | 0/ECFFF418      |             0 | async
 12202 |       10 | SYSTEM  | srv159           | 192.168.4.159 |                 |       47993 | 2020-10-23 20:10:32.521960+08 |         2059 | streaming | 0/ED000000    | 0/ED000000     | 0/ED000000     | 0/ECFFF418      |             0 | async
(2 rows)
2) Check the data
On the primary:
prod=# select count(*) from t10;
count
----------
21000000
(1 row)
On the standby:
prod=# select count(*) from t10;
count
----------
21000000
(1 row)
Notes:
1) The replication status on the primary now includes the new node (srv159).
2) The primary and standby row counts match, so the standby stayed consistent with the primary even though a large transaction ran while the base backup was being taken.
3) The new standby has therefore been created successfully.
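The consistency check in note 2 can be wrapped in a small script. This is a hypothetical helper: the function name count_on is introduced here, and it assumes ksql is on PATH, accepts psql-style -t/-c flags, and that password-less authentication is configured.

```shell
#!/bin/sh
# Compare a table's row count on the primary and the new standby.
PRIMARY=192.168.4.127
STANDBY=192.168.4.159
SQL="select count(*) from t10"

count_on() {
    # -t: tuples only, so the output is just the number
    # (assumes ksql follows psql's flag conventions)
    ksql -h "$1" -p 54321 -U SYSTEM -t -c "$SQL" prod | tr -d ' '
}

# Uncomment to run against live nodes:
# [ "$(count_on $PRIMARY)" = "$(count_on $STANDBY)" ] \
#     && echo consistent || echo MISMATCH
```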
Part 2: Add the new node to the cluster's one-click start/stop management
2.1 Edit HAmodule.conf on all nodes
Note: add the new standby's IP to the KB_ALL_IP parameter.
2.2 Edit HAmodule.conf on the new node
Note: set the KB_LOCALHOST_IP parameter to the local IP.
2.3 Edit kingbase_cluster.conf on the existing nodes (the cluster primary and standby)
Note: add the configuration entries for the new node and edit them carefully; otherwise show pool_nodes will not display the new node.
2.4 Edit etc/recovery.done on the new node
Note: etc/recovery.done is read when the cluster performs a failover; if it is not updated, the node name read from it will still be the original primary's.
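Sections 2.1-2.3 can be summarized with the fragments below. These are illustrative only: KB_ALL_IP and KB_LOCALHOST_IP are named in the text above, while the backend_*2 parameter names are an assumption on the premise that kingbase_cluster.conf follows pgpool-II conventions; check your actual files before editing.

```shell
#!/bin/sh
# Config fragments for adding node 192.168.4.159 (not complete files).
NEW_IP=192.168.4.159

# 2.1 on all nodes / 2.2 on the new node (HAmodule.conf)
HA_FRAG="KB_ALL_IP=(192.168.4.127 192.168.4.129 $NEW_IP)
KB_LOCALHOST_IP=$NEW_IP   # on the new node: its own IP"

# 2.3 on the cluster primary/standby (kingbase_cluster.conf),
# ASSUMING pgpool-style numbered backend_* parameters
CLUSTER_FRAG="backend_hostname2 = '$NEW_IP'
backend_port2 = 54321
backend_weight2 = 1
backend_data_directory2 = '/home/kingbase/cluster/ESHA/db/data'"

printf '%s\n\n%s\n' "$HA_FRAG" "$CLUSTER_FRAG"
```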
2.5 Test one-click cluster start/stop with kingbase_monitor.sh:
[kingbase@srv129 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2020-10-29 17:32:33 KingbaseES automation beging...
......................
all started..
...
now we check again
==================================================================
| ip | program| [status]
[ 192.168.4.127]| [kingbasecluster]| [active]
[ 192.168.4.129]| [kingbasecluster]| [active]
[ 192.168.4.127]| [kingbase]| [active]
[ 192.168.4.129]| [kingbase]| [active]
[ 192.168.4.159]| [kingbase]| [active]
==================================================================
Note: the output above shows the cluster architecture: a 2-node kingbasecluster layer and 3 data nodes.
2.6 Check the new node's information
Part 3: Cluster failover test
3.1 Test goal
After the original standby's database service stops (or its host goes down), and the primary's service then also stops (or its host goes down), the new standby should be promoted to primary.
3.2 Test steps
1. First stop the KingbaseES service on the original standby.
2. Then stop the KingbaseES service on the primary.
3. Check whether the new standby has been promoted to primary.
3.3 Perform the failover
1. Stop the KingbaseES service on the original standby
[kingbase@srv129 bin]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
waiting for server to shut down .... done
server stopped
2. Check the replication status
1) With the original standby stopped
TEST=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 3387 |       10 | SYSTEM  | srv159           | 192.168.4.159 |                 |       47333 | 2020-10-31 15:21:05.275911+08 |              | streaming | 1/5603A980    | 1/5603A980     | 1/5603A980     | 1/5603A8D8      |             0 | async
(1 row)
Note: in the resulting one-primary/one-standby topology, the new node's sync_state is "async".
2) Check the pool node information
TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.4.127 | 54321 | up | 0.333333 | primary | 5 | false | 0
1 | 192.168.4.129 | 54321 | down | 0.333333 | standby | 0 | false | 0
2 | 192.168.4.159 | 54321 | up | 0.333333 | standby | 0 | true | 0
(3 rows)
3. Stop the KingbaseES service on the primary
[kingbase@srv127 etc]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
waiting for server to shut down .... done
server stopped
4. After the switchover, check the new primary
1) Check the KingbaseES processes on the new node
[kingbase@localhost log]$ ps -ef |grep kingbase
kingbase 11680 1 0 15:21 pts/1 00:00:01 /home/kingbase/cluster/ESHA/db/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 11681 11680 0 15:21 ? 00:00:00 kingbase: logger process
kingbase 11686 11680 0 15:21 ? 00:00:00 kingbase: checkpointer process
kingbase 11687 11680 0 15:21 ? 00:00:00 kingbase: writer process
kingbase 11689 11680 0 15:21 ? 00:00:00 kingbase: stats collector process
......
kingbase 15518 11680 0 15:33 ? 00:00:00 kingbase: wal writer process
kingbase 15519 11680 0 15:33 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 15520 11680 0 15:33 ? 00:00:00 kingbase: archiver process last was 00000005.history
kingbase 15521 11680 0 15:33 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 16075 11680 0 15:35 ? 00:00:00 kingbase: SUPERMANAGER_V8ADMIN TEST 192.168.4.127(34260) idle
2) Connect to the instance and check the database state
[kingbase@localhost data]$ ksql -U system -W 123456 TEST
ksql (V008R003C002B0160)
Type "help" for help.
TEST=# select sys_is_in_recovery();
sys_is_in_recovery
--------------------
f
(1 row)
Note: sys_is_in_recovery() returns f, so the standby has been promoted to primary.
3) Check the recovery log
[kingbase@localhost log]$ tail -f recovery.log
---------------------------------------------------------------------
2020-10-31 15:36:02 recover beging...
2020-10-31 15:36:02 check if the network is ok
ping trust ip 192.168.4.1 scuccess ping times :[3], scuccess times:[3]
determine if i am master or standby
I,m node is primary, determine whether there is a standby db can be set to synchronous standby
===The log above confirms that the standby has been promoted to primary.===
Part 4: Summary
1. A KingbaseES V8R3 read/write-splitting cluster can be scaled out horizontally by adding new nodes online, by hand.
2. After modifying kingbasecluster.conf, the kingbasecluster service must be restarted. As root, first stop the service on the cluster standby node (kingbasecluster -m fast stop), then stop it on the cluster primary node. Start in the reverse order: start the cluster primary first (kingbasecluster -n > kingbasecluster.log 2>&1 &), and once it is up, start the cluster standby.
3. Test node addition in advance, and perform it during off-peak business hours.
4. This case was tested only on CentOS 7; test offline before applying it to other platforms.
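The kingbasecluster restart order described in summary point 2 can be sketched as follows (the commands are printed for review only; the cluster node IPs are this case's):

```shell
#!/bin/sh
# kingbasecluster restart order after editing kingbasecluster.conf:
# stop the standby first, then the primary; start the primary first,
# then the standby. Run as root.
CLUSTER_PRIMARY=192.168.4.127
CLUSTER_STANDBY=192.168.4.129
CMDS="ssh root@$CLUSTER_STANDBY 'kingbasecluster -m fast stop'
ssh root@$CLUSTER_PRIMARY 'kingbasecluster -m fast stop'
ssh root@$CLUSTER_PRIMARY 'kingbasecluster -n > kingbasecluster.log 2>&1 &'
ssh root@$CLUSTER_STANDBY 'kingbasecluster -n > kingbasecluster.log 2>&1 &'"
printf '%s\n' "$CMDS"
```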