KingbaseES V8R3集群维护案例之--- 在线扩容数据节点

案例说明:

在KingbaseES V8R3集群架构可以由Master和Standby两个管理节点(同时也是数据节点)及多个数据节点组成,在读多写少的应用环境,可以通过扩容数据节点(备库)提升集群读写分离的负载能力,本案例用于kingbaseES V8R3读写分离集群在线对数据节点扩容,主要分为三个步骤。

操作步骤:
1)流复制扩容
2)集群管理扩容

具体步骤:
1、 通过sys_basebackup创建新备库。
2、 将备库加入到Cluster nodes管理,可以用kingbase_monitor.sh一键启停。
3、 主备failover切换测试。

操作系统和数据库版本:

1)操作系统环境
[kingbase@#localhost ~]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
 
2)数据库适用版本
KingbaeES V8R3

前期准备:

1、 原有节点,主库(192.168.4.127)、备库(192.168.4.129),新增节点(192.168.4.159)。
2、 在新增节点安装相同版本的kingbaseES数据库。
3、 配置新增节点到原节点的ssh信任关系(root、kingbase)。
4、 关闭新增节点防火墙和selinux并启动crond服务。

一、在线增加数据库复制节点

1.1 查看kingbase cluster节点状态

=首先确定原主备库的Cluster和kingbaseES服务状态正常。=

TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replicatio
n_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-----------
--------
 0       | 192.168.4.129 | 54321 | up     | 0.500000  | standby | 0          | true              | 0
 1       | 192.168.4.127 | 54321 | up     | 0.500000  | primary | 0          | false             | 0
(2 rows)

1.2 查看主备复制信息

=保证集群中主备复制状态正常。=

TEST=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lo
cation | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-------------
 16851 |       10 | SYSTEM  | srv129           | 192.168.4.129 |                 |       52967 | 2020-10-23 15:11:17.836557+08 |              | streaming | 0/18000
0D0    | 0/180000D0     | 0/180000D0     | 0/180000D0      |             0 | async
(1 row)

1.3 在线增加新的备库节点

=为保证测试环境和接近于生产环境,在创建备库时,在主库端进行大的事务处理,当备库创建完成后查看是否合主库数据一致。=

在主库进行事务处理(模拟生产库):

prod=# insert into t10 values (generate_series(1,10000000),'tom');
 
prod=# select count(*) from t10;
  count
----------
 11000000
(1 row)

1.3.1 在新备库上通过sys_basebackup在线创建备库

1) 创建备库数据存储目录
=注意:目录位置尽量和其他已存在节点的目录位置一致,这样对于后期配置改动较少。=

[kingbase@srv159 v8r3]$ mkdir -p /home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 v8r3]$ chmod 0700 /home/kingbase/cluster/ESHA/db/data

2)通过sys_basebackup创建备库

[kingbase@srv159 v8r3]$sys_basebackup -h 192.168.4.127 -D /home/kingbase/cluster/ESHA/db/data -F p -x -v -P -U SYSTEM -W 123456 -p 54321
事务日志起始于时间点: 0/B20060D0, 基于时间表3
2079583/2079583 kB (100%), 1/1 表空间
transaction log end point: 0/D5C48538
sys_basebackup: base backup completed

1.3.2 配置备库recovery.conf文件

[kingbase@srv159 data]$ pwd
/home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.4.127 user=SYSTEM password=MTIzNDU2 application_name=srv159'
recovery_target_timeline='latest'
primary_slot_name ='slot_srv159'

1.3.3 配置备库kingbase.conf文件(部分内容)

# Add settings for extensions here
max_wal_senders=4
wal_keep_segments=256
hot_standby_feedback=on
max_prepared_transactions=100
port=54321
control_file_copy='/home/kingbase/cluster/ESHA/template.bk'
fsync=on
wal_log_hints=on
archive_mode=on
hot_standby=on
wal_level=replica
###synchronous_standby_names='1(srv129)'
archive_dest='/home/kingbase/cluster/ESHA/archivedir/'
max_replication_slots=4
log_directory='/home/kingbase/cluster/ESHA/db/data/sys_log/'

1.3.4 配置所有节点sys_hba.conf文件

# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
host    all             all             ::0/0                   md5
# Allow replication connections from localhost, by a user with the
# replication private
host all SYSTEM 192.168.4.127/16 md5
host all SYSTEM 192.168.4.129/16 md5
host all SYSTEM 192.168.4.159/16 md5
 
host replication SYSTEM 192.168.4.127/16 md5
host replication SYSTEM 192.168.4.129/16 md5
host replication SYSTEM 192.168.4.159/16 md5

=注意:修改已有节点的sys_hba.conf文件后,需要reload数据库进程才能使配置生效。=

[kingbase@srv129 data]$ sys_ctl reload -D /home/kingbase/cluster/ESHA/db/data
服务器进程发出信号

1.3.5 从主库拷贝以下目录和文件到新备库

通过scp远程拷贝以下目录和文件到新增节点相同的位置:

=说明:“/home/kingbase/cluster/ESHA”是集群部署的目录,其中“ESHA”是集群名称。对于archivedir也可以创建空目录。=

[kingbase@#localhost ESHA]$ pwd
/home/kingbase/cluster/ESHA
[kingbase@#localhost ESHA]$ ls -lh
总用量 8.0K
drwxrwxr-x 2 kingbase kingbase    6 10月 30 14:23 archivedir
drwxrwxr-x 7 kingbase kingbase   84 10月 30 17:00 db
drwxrwxr-x 3 kingbase kingbase  181 10月 30 17:23 log
drwxrwxr-x 3 kingbase kingbase   29 10月 30 14:55 run
-rw------- 1 kingbase kingbase 8.0K 10月 31 13:19 template.bk
 
[kingbase@srv127 db]$ pwd
/home/kingbase/cluster/ESHA/db
[kingbase@srv127 db]$ ls -lh
总用量 20K
drwxrwxr-x  2 kingbase kingbase 4.0K 10月 29 17:29 bin
drwxrwxr-x  2 kingbase kingbase  295 10月 29 17:26 etc
-rw-------  1 kingbase kingbase  151 8月  18 20:13 kingbase.log
drwxrwxr-x  2 kingbase kingbase 4.0K 7月  27 10:41 lib
drwxrwxr-x  9 kingbase kingbase 4.0K 7月  27 10:41 share

=注意:目录和文件拷贝不要拷贝kingbasecluster目录到新增节点,在V8R3的集群中,cluster只支持主备两个节点,新增节点只能是data node。=

拷贝主库/etc/cron.d/KINGBASECRON文件到备库相同目录下:

注意:备库crond服务要被启动

1.3.6 在主库端添加replication_slot

1) 查看已经创建的replication_slot:

TEST=# select * from sys_replication_slots;
  slot_name  | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_srv127 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv129 |        | physical  |        |          | f      |            |      |              |             |
(2 rows)

2) 创建新增备库的replication_slot

TEST=# select * from sys_create_physical_replication_slot('slot_srv159');
  slot_name  | xlog_position
-------------+---------------
 slot_srv159 |
(1 row)

3)查看主备复制的replication_slots

TEST=# select * from sys_replication_slots;
  slot_name  | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_srv127 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv129 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv159 |        | physical  |        |                 | f      |                 |      |              |           |
(3 rows)

1.3.7 启动备库实例加入主备复制架构

[kingbase@srv159 data]$ sys_ctl start -D /home/kingbase/cluster/ESHA/db/data
正在启动服务器进程
[kingbase@srv159 data]$ 日志:  日志输出重定向到日志收集进程
提示:  后续的日志输出将出现在目录 "/home/kingbase/cluster/ESHA/db/data/sys_log"中.
 
查看备库进程:
[kingbase@srv159 data]$ ps -ef |grep kingbase
kingbase  5071     1  0 20:01 pts/3    00:00:00 /opt/Kingbase/ES/V8R3/Server/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase  5072  5071  0 20:01 ?        00:00:00 kingbase: logger process
kingbase  5073  5071 90 20:01 ?        00:00:19 kingbase: startup process   waiting for 0000000300000000000000D6
kingbase  5077  5071  0 20:01 ?        00:00:00 kingbase: checkpointer process
kingbase  5078  5071  0 20:01 ?        00:00:00 kingbase: writer process
kingbase  5199  5071  0 20:01 ?        00:00:00 kingbase: stats collector process
kingbase  5200  5071  0 20:01 ?        00:00:00 kingbase: wal receiver process
kingbase  5201  5071  0 20:01 ?        00:00:00 kingbase: wal sender process SYSTEM 192.168.4.159(62391) idle

1.3.8 查看主备复制信息

1)查看主备复制状态信息

TEST=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lo
cation | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-------------
 16851 |       10 | SYSTEM  | srv129           | 192.168.4.129 |                 |       52967 | 2020-10-23 15:11:17.836557+08 |              | streaming | 0/ED000
000    | 0/ED000000     | 0/ED000000     | 0/ECFFF418      |             0 | async
 12202 |       10 | SYSTEM  | srv159           | 192.168.4.159 |                 |       47993 | 2020-10-23 20:10:32.521960+08 |         2059 | streaming | 0/ED000
000    | 0/ED000000     | 0/ED000000     | 0/ECFFF418      |             0 | async
(2 rows)

2)查看数据库数据信息

主库:
prod=# select count(*) from t10;
  count
----------
 21000000
(1 row)
 
备库:
prod=# select count(*) from t10;
  count
----------
 21000000
(1 row)

备注:

1) 在主备状态信息中已经查看到新增节点(srv159)的复制状态信息。
2) 查看主库和备库同步数据一致,说明在创建基础备份期间,备库和主库保持了数据的一致性。
3) 由以上可知,备库创建完成。

二、配置新节点加入集群服务一键启停

2.1 编辑所有节点HAmodule.conf


注意:在KB_ALL_IP参数中加入新的备库节点的ip。

2.2 在新增节点修改HAmodule.conf

注意:将KB_LOCALHOST_IP参数配置为本机ip。

2.3 配置已经存在节点(Cluster的主备库)kingbase_cluster.conf文件

注意:图中黄色部分为新增节点的配置,仔细修改,否则在show pool_nodes时无法发现新增节点。

2.4 修改新增节点etc/recovery.done配置

注意:etc/recovery.done在集群执行failover切换时会读取,如果不修改,读取后的节点名称仍然是原主库节点名称。

2.5 测试kingbase_monitor.sh一键启停集群:

[kingbase@srv129 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2020-10-29 17:32:33 KingbaseES automation beging...
......................
all started..
...
now we check again
==================================================================
|             ip |                       program|              [status]
[  192.168.4.127]|             [kingbasecluster]|              [active]
[  192.168.4.129]|             [kingbasecluster]|              [active]
[  192.168.4.127]|                    [kingbase]|              [active]
[  192.168.4.129]|                    [kingbase]|              [active]
[  192.168.4.159]|                    [kingbase]|              [active]
==================================================================

注意:从以上可以获得,集群架构为2个节点的cluster,3个节点的data node。

2.6 查看新增节点信息

三、集群切换测试

3.1 测试目标

当原备库数据库服务停止或宕机,在主库服务或宕机时,新的备库切换为主库。

3.2 测试步骤

1、先停止原备库的kingbaseES服务。
2、再停止主库的kingbaseES服务。
3、查看新的备库是否被切换为主库。

3.3 实施主备切换
1、停止原备库的kingbaseES服务

[kingbase@srv129 bin]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
等待服务器进程关闭 .... 完成
服务器进程已经关闭

2、查看复制状态信息

1)原备库kingbaseES被停止

TEST=# select * from sys_stat_replication;
 pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   stat
e   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
------+----------+---------+------------------+---------------+---------------
 3387 |       10 | SYSTEM  | srv159           | 192.168.4.159 |                 |       47333 | 2020-10-31 15:21:05.275911+08 |              | stream
ing | 1/5603A980    | 1/5603A980     | 1/5603A980     | 1/5603A8D8      |             0 | async
(1 row)

注意:在一主一从的架构中,新增节点的sync_stat状态为“async”。

2)查看pool node信息

TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.4.127 | 54321 | up     | 0.333333  | primary | 5          | false             | 0
 1       | 192.168.4.129 | 54321 | down   | 0.333333  | standby | 0          | false             | 0
 2       | 192.168.4.159 | 54321 | up     | 0.333333  | standby | 0          | true              | 0
(3 rows)

3、停止主库kingbaseES数据库服务

[kingbase@srv127 etc]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
等待服务器进程关闭 .... 完成
服务器进程已经关闭

4、切换完成查看备库信息

1)查看新备库的kingbaseES进程

[kingbase@#localhost log]$ ps -ef |grep kingbase
kingbase 11680     1  0 15:21 pts/1    00:00:01 /home/kingbase/cluster/ESHA/db/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 11681 11680  0 15:21 ?        00:00:00 kingbase: logger process
kingbase 11686 11680  0 15:21 ?        00:00:00 kingbase: checkpointer process
kingbase 11687 11680  0 15:21 ?        00:00:00 kingbase: writer process
kingbase 11689 11680  0 15:21 ?        00:00:00 kingbase: stats collector process
......
kingbase 15518 11680  0 15:33 ?        00:00:00 kingbase: wal writer process
kingbase 15519 11680  0 15:33 ?        00:00:00 kingbase: autovacuum launcher process
kingbase 15520 11680  0 15:33 ?        00:00:00 kingbase: archiver process   last was 00000005.history
kingbase 15521 11680  0 15:33 ?        00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 16075 11680  0 15:35 ?        00:00:00 kingbase: SUPERMANAGER_V8ADMIN TEST 192.168.4.127(34260) idle

2)连接实例查看数据库状态

[kingbase@#localhost data]$ ksql -U system -W 123456 TEST
ksql (V008R003C002B0160)
Type "help" for help.
 
TEST=# select sys_is_in_recovery();
 sys_is_in_recovery
--------------------
 f
(1 row)
注意:备库已经切换为主库状态。

3)查看recovery日志

[kingbase@#localhost log]$ tail -f recovery.log
---------------------------------------------------------------------
2020-10-31 15:36:02 recover beging...
2020-10-31 15:36:02 check if the network is ok
ping trust ip 192.168.4.1 scuccess ping times :[3], scuccess times:[3]
determine if i am master or standby
I,m node is primary, determine whether there is a standby db can be set to synchronous standby
===由以上可以获得,备库已经切换为主库。===

四、总结

1、kingbaseES V8R3读写分离集群可以在线手工方式增加新的节点,进行集群架构的横向扩展。
2、在修改kingbasecluster.conf后需要重新启动kingbasecluster服务,通过root用户可以先在cluster备节点通过命令(kingbasecluster -m fast stop关闭服务),然后再关闭cluster主节点服务;先启动cluster主节点服务(kingbasecluster -n >kingbasecluster.log 2>&1 &)方式启动集群服务,主节点启动完成后再启动cluster备节点。
3、增加节点前做好测试,应该在业务低峰时候来完成。
4、本测试文档只是在CentOS 7环境下,其他版本使用前请在线下测试。
posted @ 2021-12-30 18:25  天涯客1224  阅读(305)  评论(0编辑  收藏  举报