KingbaseES V8R6 集群修改data目录
案例说明:
本案例是在部署完成KingbaseES R6集群后,由于业务的需求,集群需要修改data(数据存储)目录的测试。本案例分两种修改方式,第一种是离线修改data目录,即关闭整个集群后,修改数据库存储data目录;第二种是在线修改,在集群正常运行的情况下,修改数据存储data目录,一般可用于生产环境。
数据库版本:
集群架构:
案例1:离线修改数据存储data目录
一、查看集群节点状态和配置信息
1、集群节点状态
[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
1 | node248 | primary | * running | | default | 100 | 29 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 110 | 29 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2、集群data目录配置信息
=== 对于R6的集群,data目录的配置非常简单,只在repmgr.conf文件中。===
[kingbase@node3 bin]$ cat ../etc/repmgr.conf |grep -i data
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
二、停止集群和数据库服务
[kingbase@node3 bin]$ ./sys_monitor.sh stop
2021-03-01 12:28:00 Ready to stop all DB ...
Service process "node_export" was killed at process 13139
……
三、迁移data目录到指定位置(所有节点)
1、迁移data目录
#创建新的data目录
[kingbase@node3 kingbase]$ mkdir -p /data/kingbase/cluster/v8r6``
#拷贝data目录到指定位置
[kingbase@node3 kingbase]$ cp -var data /data/kingbase/cluster/v8r6
……
‘data/standby.signal’ -> ‘/data/kingbase/cluster/v8r6/data/standby.signal’
‘data/kingbase.auto.conf’ -> ‘/data/kingbase/cluster/v8r6/data/kingbase.auto.conf’
‘data/kingbase.opts’ -> ‘/data/kingbase/cluster/v8r6/data/kingbase.opts’
‘data/backup_label.old’ -> ‘/data/kingbase/cluster/v8r6/data/backup_label.old’
‘data/current_logfiles’ -> ‘/data/kingbase/cluster/v8r6/data/current_logfiles’
2、编辑repmgr.conf文件
=== 在repmgr.conf文件中,指定新的data目录位置。===
[kingbase@node3 bin]$ cat ../etc/repmgr.conf |grep -i data
data_directory='/data/kingbase/cluster/v8r6/data'
#data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
四、启动集群和数据库服务
[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 12:36:29 Ready to start all DB ...
2021-03-01 12:36:29 begin to start DB on "[192.168.7.243]".
waiting for server to start.... done
server started
......
2021-03-01 12:37:00 repmgrd on "[192.168.7.243]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | primary | * running | | running | 19243 | no | n/a
3 | node243 | standby | running | node248 | running | 10859 | no | 0 second(s) ago
2021-03-01 12:37:06 Done.
五、查看数据库服务进程
=集群启动后,数据库服务已经自动读取到新的data目录配置信息。=
[kingbase@node3 bin]$ ps -ef |grep kingbase
kingbase 10172 1 0 12:36 ? 00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /data/kingbase/cluster/v8r6/data
kingbase 10184 10172 0 12:36 ? 00:00:00 kingbase: logger
kingbase 10185 10172 0 12:36 ? 00:00:00 kingbase: startup recovering 0000001D000000060000000F
kingbase 10186 10172 0 12:36 ? 00:00:00 kingbase: checkpointer
kingbase 10187 10172 0 12:36 ? 00:00:00 kingbase: background writer
kingbase 10188 10172 0 12:36 ? 00:00:00 kingbase: stats collector
kingbase 10494 10172 0 12:36 ? 00:00:00 kingbase: walreceiver streaming 6/F001000
kingbase 10843 10172 0 12:36 ? 00:00:00 kingbase: esrep esrep node3(25765) idle
.....
六、数据访问测试
1、查看流复制状态信息
[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |
reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+---------
18982 | 16384 | esrep | node243 | 192.168.7.243 | node3 | 38092 | 2021-03-01 12:40:41.132419+08 | | streaming | 6/F0010B0 | 6/F0010B0 | 6/F0010B0 | 6/F0010B0 | 00:00:00.001058 | 00:00:00.068286 | 00:00:00.068295 | 1 | sync | 20
21-03-01 12:41:43.769806+08
(1 row)
2、数据库访问测试
主库:
prod=# create table tb1 (id int ,name varchar(10));
CREATE TABLE
prod=# insert into tb1 values (generate_series(1,10000),'usr'||generate_series(1,10000));
INSERT 0 10000
prod=# select count(*) from tb1;
count
-------
10000
(1 row)
备库:
prod=# \d tb1;
Table "public.tb1"
Column | Type | Collation | Nullable | Default
--------+----------------------------+-----------+----------+---------
id | integer | | |
name | character varying(10 char) | | |
prod=# select count(*) from tb1;
count
-------
10000
(1 row)
=== 由以上信息获知, 集群下数据库存储data目录已经修改成功。===
案例2: 在线修改data目录
案例说明:
此案例是先停止备库的集群和数据库服务,修改备库的数据库存储data目录,然后启动备库集群和数据库服务,集群状态正常后,执行集群在线切换(switchover),将主备切换,然后再在原主库(新备库)上进行数据存储data目录的修改,在修改的过程中,保证集群仍然可以正常访问。
一、停止备库集群和数据库服务
1、查看集群节点状态信息
[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
1 | node248 | primary | * running | | default | 100 | 29 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 110 | 29 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2、停止备库集群服务
[kingbase@node3 bin]$ ./sys_monitor.sh stoplocal
Service process "node_export" was killed at process 19315
Service process "postgres_ex" was killed at process 19316
2021-03-01 13:12:07 begin to stop repmgrd on "[localhost]".
2021-03-01 13:12:08 repmgrd on "[localhost]" stop success.
2021-03-01 13:12:08 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2021-03-01 13:12:09 DB on "[localhost]" stop success.
3、主库查看集群状态信息
[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
1 | node248 | primary | * running | | default | 100 | 29 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | - failed | node248 | default | 110 | ? | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
WARNING: following issues were detected
- unable to connect to node "node243" (ID: 3)
=== 由以上信息获知,备库集群服务已被停止。====
二、迁移和修改备库数据存储data目录
1、迁移数据存储data目录
[kingbase@node3 bin]$ cp -var /data/kingbase/cluster/v8r6/data /home/kingbase/cluster/R6HA/KHA/kingbase/
......
‘/data/kingbase/cluster/v8r6/data/kingbase.auto.conf’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/kingbase.auto.conf’
‘/data/kingbase/cluster/v8r6/data/kingbase.opts’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/kingbase.opts’
‘/data/kingbase/cluster/v8r6/data/backup_label.old’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/backup_label.old’
‘/data/kingbase/cluster/v8r6/data/current_logfiles’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/current_logfiles’
2、编辑repmgr.conf
=== 在repmgr.conf文件中,指定新的data目录位置。===
[kingbase@node3 bin]$ cat ../etc/repmgr.conf|grep -i data
#data_directory='/data/kingbase/cluster/v8r6/data'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
三、启动备库数据库和集群服务
1、手工启动备库数据库服务
# 启动数据库服务
[kingbase@node3 bin]$ ./sys_ctl start -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
waiting for server to start....2021-03-01 13:19:50.626 CST [27722] LOG: sepapower extension initialized
2021-03-01 13:19:50.676 CST [27722] LOG: starting KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2021-03-01 13:19:50.676 CST [27722] LOG: listening on IPv4 address "0.0.0.0", port 54321
2021-03-01 13:19:50.676 CST [27722] LOG: listening on IPv6 address "::", port 54321
2021-03-01 13:19:50.772 CST [27722] LOG: listening on Unix socket "/tmp/.s.KINGBASE.54321"
2021-03-01 13:19:50.924 CST [27722] LOG: redirecting log output to logging collector process
2021-03-01 13:19:50.924 CST [27722] HINT: Future log output will appear in directory "sys_log".
done
server started
# 查看数据库服务进程
[kingbase@node3 bin]$ ps -ef |grep kingbase
kingbase 27722 1 0 13:19 ? 00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbas/data
kingbase 27723 27722 0 13:19 ? 00:00:00 kingbase: logger
kingbase 27724 27722 0 13:19 ? 00:00:00 kingbase: startup recovering 0000001D0000000600000011
kingbase 27725 27722 0 13:19 ? 00:00:00 kingbase: checkpointer
kingbase 27726 27722 0 13:19 ? 00:00:00 kingbase: background writer
kingbase 27727 27722 0 13:19 ? 00:00:00 kingbase: stats collector
kingbase 27728 27722 0 13:19 ? 00:00:00 kingbase: walreceiver streaming 6/11006B28
2、注册备库集群信息
# 注册备库
[kingbase@node3 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node243" (ID: 3)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node243" (ID: 3) successfully registered
# 查看集群状态信息
[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
1 | node248 | primary | * running | | default | 100 | 29 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 110 | 29 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3、启动repmgrd服务
[kingbase@node3 bin]$ ./repmgrd -d
[2021-03-01 13:20:53] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
[kingbase@node3 bin]$ ps -ef |grep repmgrd
kingbase 27999 1 0 13:20 ? 00:00:00 ./repmgrd -d
=由以上可知,备库数据存储data目录修改成功。=
四、执行在线主备切换(switchover)
1、执行switchover
# 测试switchover
[kingbase@node3 bin]$ ./repmgr standby switchover -h 192.168.7.248 -U esrep -d esrep --dry-run
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing UNKNOWN ACTION
NOTICE: checking switchover on node "node243" (ID: 3) in --dry-run mode
INFO: SSH connection to host "192.168.7.248" succeeded
INFO: able to execute "repmgr" on remote host "localhost"
INFO: 1 walsenders required, 32 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 11 pending archive files
INFO: replication lag on this standby is 0 seconds
INFO: 1 replication slots required, 10 available
INFO: would pause repmgrd on node "node248" (ID 1)
INFO: would pause repmgrd on node "node243" (ID 3)
NOTICE: local node "node243" (ID: 3) would be promoted to primary; current primary "node248" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "node248":
"/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile -W -m fast stop"
INFO: prerequisites for executing STANDBY SWITCHOVER are met
# 执行switchover
[kingbase@node3 bin]$ ./repmgr standby switchover -h 192.168.7.248 -U esrep -d esrep --siblings-follow
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node243" (ID: 3)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: pausing repmgrd on node "node248" (ID 1)
INFO: pausing repmgrd on node "node243" (ID 3)
NOTICE: local node "node243" (ID: 3) will be promoted to primary; current primary "node248" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node248" (ID: 1)
NOTICE: issuing CHECKPOINT
NOTICE: node (ID: 1) release the virtual ip 192.168.7.240/24 success
DETAIL: executing server command "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 6/12000028
NOTICE: PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.
--- 192.168.7.240 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
WARNING: ping host"192.168.7.240" failed
DETAIL: average RTT value is not greater than zero
NOTICE: new primary node (ID: 3) acquire the virtual ip 192.168.7.240/24 success
NOTICE: promoting standby to primary
DETAIL: promoting server "node243" (ID: 3) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node243" (ID: 3) was successfully promoted to primary
NOTICE: issuing CHECKPOINT
INFO: the file "db_rewind.status" exists, reset the data at first
NOTICE: executing sys_rewind to reset the data
DETAIL: sys_rewind command is "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_rewind --reset -D /data/kingbase/cluster/v8r6/data"
sys_rewind: start to reset the data ...
sys_rewind: status check: restore all the backup temp file, Done!
sys_rewind: finish to reset the data, Done!
INFO: local node 1 can attach to rejoin target node 3
DETAIL: local node's recovery point: 6/12000028; rejoin target node's fork point: 6/120000A0
NOTICE: setting node 1's upstream to node 3
WARNING: unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 13:29:22.430482
NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl -w -t 90 -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 13:29:23.261046
NOTICE: replication slot "repmgr_slot_3" deleted on node 1
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 3
NOTICE: switchover was successful
DETAIL: node "node243" is now primary and node "node248" is attached as standby
INFO: unpausing repmgrd on node "node248" (ID 1)
INFO: unpause node "node248" (ID 1) successfully
INFO: unpausing repmgrd on node "node243" (ID 3)
INFO: unpause node "node243" (ID 3) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully
2、查看集群状态信息
[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | standby | running | node243 | default | 100 | 29 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | primary | * running | | default | 110 | 30 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
注意:
当前原主库已经备切换为新的备库,所以按照以上备库修改数据存储data目录的步骤,对原主库进行修改即可。修改成功后,启动原主库的数据服务和集群服务既可完成修改。
五、重启集群测试(可选)
# 重启集群
[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 13:54:57 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
.......
2021-03-01 13:55:36 repmgrd on "[192.168.7.243]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node248 | standby | running | node243 | running | 23749 | no | 0 second(s) ago
3 | node243 | primary | * running | | running | 1232 | no | n/a
2021-03-01 13:55:42 Done.
# 查看集群状态信息
[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
1 | node248 | standby | running | node243 | default | 100 | 30 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | primary | * running | | default | 110 | 30 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
=== 由以上信息获知,在线修改集群数据存储data目录成功。===
总结:
两种案例都可用于集群数据存储data目录的修改,对于离线修改相对比较简单,但不能用于生产环境。在线修改相对步骤比较繁琐,但可用于生产环境。