KingbaseES V8R6 集群运维系列之 -- 物理copy方式手工添加新备库节点

案例说明:
对于主库数据量比较大的环境,在添加新节点是可以采用在线clone方式创建新的备库节点,也可以在离线的状态下,直接拷贝其中一个备库的所有集群相关目录来创建新的备库节点。本案例介绍了通过离线物理copy目录的方式创建新的备库节点,包括详细的操作步骤。

适用版本:
KingbaseES V8R6

集群架构:

操作步骤:

  1)配置新节点系统环境和集群其他节点保持一致。
  2)手工配置新节点和集群其他节点:root-root、kingbase-kingbase、kingbase-root用户之间的信任关系,在不支持ssh互信的环境,可以使用securecmdd服务。
  3)关闭新节点防火墙和selinux。
  4)关闭集群,从主库拷贝集群目录和相关文件到新节点(包括数据库)。
  5)配置ip和arping可执行文件的setuid权限。
  6)配置新备库repmgr.conf文件。
  7)启动集群,启动新备库数据库服务,将新备库注册到集群。
  8)拷贝.encpw文件到新备库,关闭新备库数据库服务,将新备库节点加入到集群。
  9)验证集群所有节点状态信息和流复制信息。
 10)重新启动集群验证。

一、配置新节点系统环境(和集群其他节点相同)

https://help.kingbase.com.cn/v8/install-updata/install-linux/install-linux-2.html
部署前系统环境的配置可以参考KingbaseES官方文档。

二、查看当前集群状态现象

1、集群节点状态信息

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3          

2、主备流复制状态信息

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+
 13347 |    16384 | esrep   | node249          | 192.168.7.249 |                 |       23228 | 2021-03-01 14:45:03.723296+08 |              | streaming | 1/F205BC90 | 1/F205BC90 | 1/F205BC90 | 1/F205BC90 |           |           |            |             1 | quorum     | 2021-03-01 14:54:58.127023+08
(1 row)

三、物理copy创建新备库节点

1、新备库创建目录(和集群其他节点一致)

[kingbase@node3 .ssh]$ mkdir -p /home/kingbase/cluster/R6HA/KHA/kingbase/data

2、从已有的主库节点拷贝集群相关目录和文件

Tips:
1)先可以在关库前,手工执行checkpoint,然后正常关闭集群。
2)对于数据量很大的,可以在备份了archive日之后,先将归档日志清理后,再拷贝,节省数据传输时间。

[kingbase@node2 KHA]$ scp -r * node3:/home/kingbase/cluster/R6HA/KHA/

3、配置ip和arping可执行文件权限

=如果集群使用vip,需要对ip和arping可执行文件配置setuid权限。=

[root@node3 soft]# chmod 4755 /sbin/ip
[root@node3 soft]# chmod 4755 /sbin/arping
[root@node3 soft]# ls -lh /sbin/ip
-rwsr-xr-x. 1 root root 319K Nov 20  2015 /sbin/ip
[root@node3 soft]# ls -lh /sbin/arping
-rwsr-xr-x. 1 root root 24K Nov 21  2015 /sbin/arping

四、将新备库节点加入集群
1、编辑repmgr.conf文件

[kingbase@node3 etc]$ cat repmgr.conf
on_bmj=off
node_id=3
node_name=node243
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr  standby follow  -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=3
reconnect_interval=5
failover='automatic'
recovery='manual'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
ipaddr_path='/sbin'
arping_path='/sbin'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid'
ping_path='/usr/bin'

2、将集群其他节点的认证文件拷贝到新节点的宿主目录下

[kingbase@node3 ~]$ ls -lha .encpwd 
-rw-------. 1 kingbase kingbase 55 Mar  1 14:33 .encpwd

[kingbase@node3 ~]$ cat .encpwd 
*:*:*:system:MTIzNDU2
*:*:*:esrep:S2luZ2Jhc2VoYTExMA==

=== V8R6集群使用了.encpwd的隐藏文件,用于系统用户免密登录数据库===

3、启动数据库服务并注册standby节点

# 启动数据库
[kingbase@node3 bin]$ chmod 700 ../data
[kingbase@node3 bin]$ ./sys_ctl start -D ../data

# register standby 
[kingbase@node3 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node243" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
WARNING: local node not attached to primary node 1
NOTICE: -F/--force supplied, continuing anyway
INFO: standby registration complete
NOTICE: standby node "node243" (ID: 3) successfully registered

4、关闭备库数据库服务及新节点加入到集群

# 关闭数据库服务
[kingbase@node102 bin]$ ./sys_ctl stop -D ../data

# 新节点加入到集群
[kingbase@node3 bin]$ ./repmgr node rejoin -h 192.168.7.248 -U esrep -d esrep
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/F2055920, rejoin target lsn is 1/F2062AB0
NOTICE: setting node 3's upstream to node 1
WARNING: unable to ping "host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 14:34:19.973116
NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6HA/KHA/kingbase/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 14:34:20.187969
NOTICE: NODE REJOIN successful
DETAIL: node 3 is now attached to node 1

=== 从以上获知,新节点node243作为备库加入到集群中===

5、查看集群节点状态

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

6、查看主备流复制状态

 test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+-------------+
 13347 |    16384 | esrep   | node249          | 192.168.7.249 |                 |       23228 | 2021-03-01 14:45:03.723296+08 |              | streaming | 1/F20659C8 | 1/F20659C8 | 1/F20659C8 | 1/F20659C8 |           |           |            |             1 | quorum     | 2021-03-01 15:35:31.027066+08
 25123 |    16384 | esrep   | node243          | 192.168.7.243 |                 |       49130 | 2021-03-01 15:33:59.607489+08 |              | streaming | 1/F20659C8 | 1/F20659C8 | 1/F20659C8 | 1/F20659C8 |           |           |            |             1 | quorum     | 2021-03-01 14:36:01.384836+08

7、数据同步测试

1) Primary DML操作

test=# \c prod
You are now connected to database "prod" as user "system".

prod=# create table t8 (like t7);
CREATE TABLE
prod=# \d
                     List of relations
 Schema |        Name         |       Type        | Owner  
--------+---------------------+-------------------+--------
 ......
 public | t8                  | table             | system
(16 rows)

2) standby 查看同步数据

 You are now connected to database "prod" as user "system".
prod=# \d
                     List of relations
 Schema |        Name         |       Type        | Owner  
--------+---------------------+-------------------+--------
 .......
 public | t8                  | table             | system
(16 rows)

五、重新启动集群测试

1、重新启动集群

[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 15:37:28 Ready to stop all DB ...
.......
2021-03-01 15:38:19 repmgrd on "[192.168.7.243]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | primary | * running |          | running | 27678 | no      | n/a                
 2  | node249 | standby |   running | node248  | running | 25551 | no      | 1 second(s) ago    
 3  | node243 | standby |   running | node248  | running | 20067 | no      | n/a                
2021-03-01 15:38:31 Done.

=== 从以上获知,新的节点已经可以通过sys_monitor.sh进行管理。===

2、查看新备库数据库服务

[kingbase@node3 bin]$ ps -ef |grep kingbase

kingbase 19538     1  0 14:38 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
kingbase 19548 19538  0 14:38 ?        00:00:00 kingbase: logger   
kingbase 19550 19538  0 14:38 ?        00:00:00 kingbase: startup   recovering 0000000500000001000000F3
kingbase 19558 19538  0 14:38 ?        00:00:00 kingbase: checkpointer   
........

3、查看集群节点状态

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+--------
 1  | node248 | primary | * running |          | default  | 100      | 5        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 3  | node243 | standby |   running | node248  | default  | 100      | 5        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

4、查看主库流复制状态

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+-------------+
 27193 |    16384 | esrep   | node249          | 192.168.7.249 |                 |       25912 | 2021-03-01 15:37:59.330171+08 |              | streaming | 1/F3001CD8 | 1/F3001CD8 | 1/F3001CD8 | 1/F3001CD8 |           |           |            |             1 | quorum     | 2021-03-01 15:39:54.163480+08
 27207 |    16384 | esrep   | node243          | 192.168.7.243 |                 |       49140 | 2021-03-01 15:38:02.326055+08 |              | streaming | 1/F3001CD8 | 1/F3001CD8 | 1/F3001CD8 | 1/F3001CD8 |           |           |            |             1 | quorum     | 2021-03-01 14:40:24.506330+08
(2 rows)

5、在通过sys_monitor.sh启动集群时自动在新节点创建crond服务


[root@node3 cron.d]# cat KINGBASECRON 
*/1 * * * * kingbase . /etc/profile;/home/kingbase/cluster/R6HA/KHA/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf >> /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log 2>&

posted @ 2021-06-29 19:45  KINGBASE研究院  阅读(706)  评论(0编辑  收藏  举报