KingbaseES V8R6 集群修改data目录

案例说明:
本案例是在部署完成KingbaseES R6集群后,由于业务的需求,集群需要修改data(数据存储)目录的测试。本案例分两种修改方式,第一种是离线修改data目录,即关闭整个集群后,修改数据库存储data目录;第二种是在线修改,在集群正常运行的情况下,修改数据存储data目录,一般可用于生产环境。

数据库版本:

集群架构:

案例1:离线修改数据存储data目录

一、查看集群节点状态和配置信息

1、集群节点状态

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                               
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
 1  | node248 | primary | * running |          | default  | 100      | 29       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | standby |   running | node248  | default  | 110      | 29       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

2、集群data目录配置信息

=== 对于R6的集群,data目录的配置非常简单,只在repmgr.conf文件中。===

[kingbase@node3 bin]$ cat ../etc/repmgr.conf |grep -i data
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'

二、停止集群和数据库服务

[kingbase@node3 bin]$ ./sys_monitor.sh stop
2021-03-01 12:28:00 Ready to stop all DB ...
Service process "node_export" was killed at process 13139
……

三、迁移data目录到指定位置(所有节点)

1、迁移data目录

#创建新的data目录
[kingbase@node3 kingbase]$ mkdir -p /data/kingbase/cluster/v8r6``

#拷贝data目录到指定位置

[kingbase@node3 kingbase]$ cp -var data /data/kingbase/cluster/v8r6
……
‘data/standby.signal’ -> ‘/data/kingbase/cluster/v8r6/data/standby.signal’
‘data/kingbase.auto.conf’ -> ‘/data/kingbase/cluster/v8r6/data/kingbase.auto.conf’
‘data/kingbase.opts’ -> ‘/data/kingbase/cluster/v8r6/data/kingbase.opts’
‘data/backup_label.old’ -> ‘/data/kingbase/cluster/v8r6/data/backup_label.old’
‘data/current_logfiles’ -> ‘/data/kingbase/cluster/v8r6/data/current_logfiles’

2、编辑repmgr.conf文件

=== 在repmgr.conf文件中,指定新的data目录位置。===

[kingbase@node3 bin]$ cat ../etc/repmgr.conf |grep -i data
data_directory='/data/kingbase/cluster/v8r6/data'
#data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'

四、启动集群和数据库服务

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 12:36:29 Ready to start all DB ...
2021-03-01 12:36:29 begin to start DB on "[192.168.7.243]".
waiting for server to start.... done
server started
......
2021-03-01 12:37:00 repmgrd on "[192.168.7.243]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | primary | * running |          | running | 19243 | no      | n/a               
 3  | node243 | standby |   running | node248  | running | 10859 | no      | 0 second(s) ago   
2021-03-01 12:37:06 Done.

五、查看数据库服务进程

=集群启动后,数据库服务已经自动读取到新的data目录配置信息。=

[kingbase@node3 bin]$ ps -ef |grep kingbase
kingbase 10172     1  0 12:36 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /data/kingbase/cluster/v8r6/data
kingbase 10184 10172  0 12:36 ?        00:00:00 kingbase: logger  
kingbase 10185 10172  0 12:36 ?        00:00:00 kingbase: startup   recovering 0000001D000000060000000F
kingbase 10186 10172  0 12:36 ?        00:00:00 kingbase: checkpointer  
kingbase 10187 10172  0 12:36 ?        00:00:00 kingbase: background writer  
kingbase 10188 10172  0 12:36 ?        00:00:00 kingbase: stats collector  
kingbase 10494 10172  0 12:36 ?        00:00:00 kingbase: walreceiver   streaming 6/F001000
kingbase 10843 10172  0 12:36 ?        00:00:00 kingbase: esrep esrep node3(25765) idle
.....

六、数据访问测试

1、查看流复制状态信息

[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.
 
test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |  state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |  
       reply_time          
-------+----------+---------+------------------+---------------+-----------------+-------------+---------
 18982 |    16384 | esrep   | node243          | 192.168.7.243 | node3           |       38092 | 2021-03-01 12:40:41.132419+08 |              | streaming | 6/F0010B0 | 6/F0010B0 | 6/F0010B0 | 6/F0010B0  | 00:00:00.001058 | 00:00:00.068286 | 00:00:00.068295 |             1 | sync       | 20
21-03-01 12:41:43.769806+08
(1 row)

2、数据库访问测试

主库:
prod=#  create table tb1 (id int ,name varchar(10));
CREATE TABLE
prod=# insert into tb1 values (generate_series(1,10000),'usr'||generate_series(1,10000));
INSERT 0 10000
prod=# select count(*) from tb1;
 count
-------
 10000
(1 row)
 
备库:
prod=# \d tb1;
                          Table "public.tb1"
 Column |            Type            | Collation | Nullable | Default
--------+----------------------------+-----------+----------+---------
 id     | integer                    |           |          |
 name   | character varying(10 char) |           |          |
 
prod=# select count(*) from tb1;
 count
-------
 10000
(1 row)

=== 由以上信息获知, 集群下数据库存储data目录已经修改成功。===

案例2: 在线修改data目录

案例说明:
此案例是先停止备库的集群和数据库服务,修改备库的数据库存储data目录,然后启动备库集群和数据库服务,集群状态正常后,执行集群在线切换(switchover),将主备切换,然后再在原主库(新备库)上进行数据存储data目录的修改,在修改的过程中,保证集群仍然可以正常访问。

一、停止备库集群和数据库服务

1、查看集群节点状态信息

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
 1  | node248 | primary | * running |          | default  | 100      | 29       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | standby |   running | node248  | default  | 110      | 29       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

2、停止备库集群服务

[kingbase@node3 bin]$ ./sys_monitor.sh stoplocal
Service process "node_export" was killed at process 19315
Service process "postgres_ex" was killed at process 19316
2021-03-01 13:12:07 begin to stop repmgrd on "[localhost]".
2021-03-01 13:12:08 repmgrd on "[localhost]" stop success.
2021-03-01 13:12:08 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2021-03-01 13:12:09 DB on "[localhost]" stop success.

3、主库查看集群状态信息

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
 1  | node248 | primary | * running |          | default  | 100      | 29       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | standby | - failed  | node248  | default  | 110      | ?        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

WARNING: following issues were detected
  - unable to connect to node "node243" (ID: 3)

=== 由以上信息获知,备库集群服务已被停止。====

二、迁移和修改备库数据存储data目录

1、迁移数据存储data目录

[kingbase@node3 bin]$ cp -var /data/kingbase/cluster/v8r6/data /home/kingbase/cluster/R6HA/KHA/kingbase/
......
‘/data/kingbase/cluster/v8r6/data/kingbase.auto.conf’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/kingbase.auto.conf’
‘/data/kingbase/cluster/v8r6/data/kingbase.opts’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/kingbase.opts’
‘/data/kingbase/cluster/v8r6/data/backup_label.old’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/backup_label.old’
‘/data/kingbase/cluster/v8r6/data/current_logfiles’ -> ‘/home/kingbase/cluster/R6HA/KHA/kingbase/data/current_logfiles’

2、编辑repmgr.conf

=== 在repmgr.conf文件中,指定新的data目录位置。===

[kingbase@node3 bin]$ cat ../etc/repmgr.conf|grep -i data
#data_directory='/data/kingbase/cluster/v8r6/data'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'

三、启动备库数据库和集群服务

1、手工启动备库数据库服务

# 启动数据库服务
[kingbase@node3 bin]$ ./sys_ctl start -D /home/kingbase/cluster/R6HA/KHA/kingbase/data
waiting for server to start....2021-03-01 13:19:50.626 CST [27722] LOG:  sepapower extension initialized
2021-03-01 13:19:50.676 CST [27722] LOG:  starting KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2021-03-01 13:19:50.676 CST [27722] LOG:  listening on IPv4 address "0.0.0.0", port 54321
2021-03-01 13:19:50.676 CST [27722] LOG:  listening on IPv6 address "::", port 54321
2021-03-01 13:19:50.772 CST [27722] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
2021-03-01 13:19:50.924 CST [27722] LOG:  redirecting log output to logging collector process
2021-03-01 13:19:50.924 CST [27722] HINT:  Future log output will appear in directory "sys_log".
 done
server started

# 查看数据库服务进程
[kingbase@node3 bin]$ ps -ef |grep kingbase

kingbase 27722     1  0 13:19 ?        00:00:00 /home/kingbase/cluster/R6HA/KHA/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/KHA/kingbas/data
kingbase 27723 27722  0 13:19 ?        00:00:00 kingbase: logger   
kingbase 27724 27722  0 13:19 ?        00:00:00 kingbase: startup   recovering 0000001D0000000600000011
kingbase 27725 27722  0 13:19 ?        00:00:00 kingbase: checkpointer   
kingbase 27726 27722  0 13:19 ?        00:00:00 kingbase: background writer   
kingbase 27727 27722  0 13:19 ?        00:00:00 kingbase: stats collector   
kingbase 27728 27722  0 13:19 ?        00:00:00 kingbase: walreceiver   streaming 6/11006B28

2、注册备库集群信息

# 注册备库
[kingbase@node3 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node243" (ID: 3)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node243" (ID: 3) successfully registered

# 查看集群状态信息
[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                     
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
 1  | node248 | primary | * running |          | default  | 100      | 29       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | standby |   running | node248  | default  | 110      | 29       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

3、启动repmgrd服务

[kingbase@node3 bin]$ ./repmgrd -d
[2021-03-01 13:20:53] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

[kingbase@node3 bin]$ ps -ef |grep repmgrd
kingbase 27999     1  0 13:20 ?        00:00:00 ./repmgrd -d

=由以上可知,备库数据存储data目录修改成功。=

四、执行在线主备切换(switchover)

1、执行switchover

# 测试switchover
[kingbase@node3 bin]$ ./repmgr standby switchover -h 192.168.7.248 -U esrep -d esrep --dry-run
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing UNKNOWN ACTION
NOTICE: checking switchover on node "node243" (ID: 3) in --dry-run mode
INFO: SSH connection to host "192.168.7.248" succeeded
INFO: able to execute "repmgr" on remote host "localhost"
INFO: 1 walsenders required, 32 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 11 pending archive files
INFO: replication lag on this standby is 0 seconds
INFO: 1 replication slots required, 10 available
INFO: would pause repmgrd on node "node248" (ID 1)
INFO: would pause repmgrd on node "node243" (ID 3)
NOTICE: local node "node243" (ID: 3) would be promoted to primary; current primary "node248" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "node248":
  "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile -W -m fast stop"
INFO: prerequisites for executing STANDBY SWITCHOVER are met

# 执行switchover
[kingbase@node3 bin]$ ./repmgr standby switchover -h 192.168.7.248 -U esrep -d esrep --siblings-follow
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node243" (ID: 3)
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: pausing repmgrd on node "node248" (ID 1)
INFO: pausing repmgrd on node "node243" (ID 3)
NOTICE: local node "node243" (ID: 3) will be promoted to primary; current primary "node248" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node248" (ID: 1)
NOTICE: issuing CHECKPOINT
NOTICE: node (ID: 1) release the virtual ip 192.168.7.240/24 success
DETAIL: executing server command "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 6/12000028
NOTICE: PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.

--- 192.168.7.240 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms


WARNING: ping host"192.168.7.240" failed
DETAIL: average RTT value is not greater than zero
NOTICE: new primary node (ID: 3) acquire the virtual ip 192.168.7.240/24 success
NOTICE: promoting standby to primary
DETAIL: promoting server "node243" (ID: 3) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node243" (ID: 3) was successfully promoted to primary
NOTICE: issuing CHECKPOINT
INFO: the file "db_rewind.status" exists, reset the data at first
NOTICE: executing sys_rewind to reset the data
DETAIL: sys_rewind command is "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_rewind --reset -D /data/kingbase/cluster/v8r6/data"
sys_rewind: start to reset the data ...
sys_rewind: status check: restore all the backup temp file, Done!
sys_rewind: finish to reset the data, Done!
INFO: local node 1 can attach to rejoin target node 3
DETAIL: local node's recovery point: 6/12000028; rejoin target node's fork point: 6/120000A0
NOTICE: setting node 1's upstream to node 3
WARNING: unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2021-03-01 13:29:22.430482
NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl  -w -t 90 -D '/data/kingbase/cluster/v8r6/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start"
NOTICE: start server finish at 2021-03-01 13:29:23.261046
NOTICE: replication slot "repmgr_slot_3" deleted on node 1
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 3
NOTICE: switchover was successful
DETAIL: node "node243" is now primary and node "node248" is attached as standby
INFO: unpausing repmgrd on node "node248" (ID 1)
INFO: unpause node "node248" (ID 1) successfully
INFO: unpausing repmgrd on node "node243" (ID 3)
INFO: unpause node "node243" (ID 3) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully

2、查看集群状态信息

[kingbase@node3 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | standby |   running | node243  | default  | 100      | 29       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | primary | * running |          | default  | 110      | 30       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

注意:
当前原主库已经备切换为新的备库,所以按照以上备库修改数据存储data目录的步骤,对原主库进行修改即可。修改成功后,启动原主库的数据服务和集群服务既可完成修改。

五、重启集群测试(可选)

# 重启集群
[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 13:54:57 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
.......

2021-03-01 13:55:36 repmgrd on "[192.168.7.243]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | standby |   running | node243  | running | 23749 | no      | 0 second(s) ago    
 3  | node243 | primary | * running |          | running | 1232  | no      | n/a                
2021-03-01 13:55:42 Done.


# 查看集群状态信息
[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+------------------------
 1  | node248 | standby |   running | node243  | default  | 100      | 30       | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
 3  | node243 | primary | * running |          | default  | 110      | 30       | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

=== 由以上信息获知,在线修改集群数据存储data目录成功。===

总结:

两种案例都可用于集群数据存储data目录的修改,对于离线修改相对比较简单,但不能用于生产环境。在线修改相对步骤比较繁琐,但可用于生产环境。
posted @ 2022-02-11 17:26  KINGBASE研究院  阅读(368)  评论(0编辑  收藏  举报