随笔- 320 文章- 0 评论- 5 阅读- 34799

KingbaseES V8R6集群运维案例---删除test库测试

案例说明：
KingbaseES V8R6默认初始化实例后，创建test库，用于测试。对于有的生产环境，为安全考虑，需要删除test库；本案例测试了在删除test库后，对集群管理的影响。
适用版本：
KingbaseES V8R6

一、集群节点状态

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                               
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 6        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 6        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

二、删除集群test库
1、集群启动后删除test库

[kingbase@node101 bin]$ ./ksql -U system prod
ksql (V8.0)
Type "help" for help.

prod=# drop database test;
ERROR:  database "test" is being accessed by other users
DETAIL:  There is 1 other session using the database.

如下图所示，有session访问test库，无法删除：

2、查看访问test库的session

prod=#  select * from sys_stat_activity where datname='test';
-[ RECORD 1 ]----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
datid            | 12145
datname          | test
pid              | 4653
usesysid         | 10
usename          | system
application_name |
client_addr      | ::1
client_hostname  |
client_port      | 44336
backend_start    | 2023-04-24 16:33:40.039128+08
xact_start       |
query_start      | 2023-04-24 16:33:40.082766+08
state_change     | 2023-04-24 16:33:40.083743+08
wait_event_type  | Client
wait_event       | ClientRead
state            | idle
backend_xid      |
backend_xmin     |
query            |SELECT *,                                                                                                                                              
                 |(case pg_is_in_recovery() when 't' then null else pg_current_wal_lsn() end) AS pg_current_wal_lsn,  
                 |(case pg_is_in_recovery() when 't' then null else pg_wal_lsn_diff(pg_current_wal_lsn(), pg_lsn('0/0'))::float end) AS pg_current_wal_lsn_bytes,+
                 | (case pg_is_in_recovery() when 't' then null else pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)::float end) AS pg_wal_lsn_diff 
                 | FROM pg_stat_replication                                                                                                                               
backend_type     | client backend

---如上所示，主库节点通过test库访问sys_stat_replication视图。

3、关闭集群启动主库节点数据库服务
1）修改参数synchronous_commit

[kingbase@node101 bin]$ cat /data/kingbase/hac7/data/es_rep.conf |grep -i commit
#synchronous_commit = remote_apply
synchronous_commit = local

2）启动主库节点数据库服务
3）删除主库实例test库

prod=# drop database test;
DROP DATABASE

4）修改主库synchronous_commit = remote_apply
5）启动集群后查看备库test库也已经删除。

三、集群管理测试
1、集群关闭和启动测试

# 执行集群重启
[kingbase@node101 bin]$ ./sys_monitor.sh restart

# 集群启动后节点状态正常
[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                               
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 6        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 6        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2、备库recovery测试
1）配置主备库参数recovery='standby'
2）关闭备库数据库服务
3）查看主库hamgr.log日志
如下图所示，对standby库的recovery成功：

3、集群failover测试
1）配置集群参数recovery='automatic'
2）关闭主库数据库服务
3）查看failover日志信息

4）切换后集群节点状态

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                               
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
1  | node1 | standby |   running | node2    | default  | 100      | 6        | 0 bytes | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2  | node2 | primary | * running |          | default  | 100      | 7        |         | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

4、主备switchover测试

# 备库执行switchover
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
........
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                               
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 8        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 7        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 
 ---如上所示，swithover后集群状态正常。