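KingbaseES V8R6 Cluster Case Study: Deploying a Same-City Dual-Center Cluster
Case description:
This case describes how to deploy a same-city dual-center cluster on KingbaseES V8R6. The deployment uses the installation script, and the procedure is essentially the same as a script-based deployment of an ordinary cluster.
Applicable version: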
KingbaseES V8R6
Cluster architecture:
[kingbase@node101 ~]$ cat /etc/hosts
192.168.1.101 node1
192.168.1.102 node2
192.168.1.103 node3
1. Cluster deployment files
Place the files required for deployment in the working directory, as shown below:
[kingbase@node101 r6_install]$ ls -lh
total 389M
-rwxrwxr-x 1 kingbase kingbase 155K Jun 27 10:25 cluster_install.sh
-rw-rw-r-- 1 kingbase kingbase 386M Jun 27 10:25 db.zip
-rw-rw-r-- 1 kingbase kingbase 14K Jun 27 10:49 install.conf
-rw-r--r-- 1 kingbase kingbase 3.4K Jun 27 10:26 license.dat
-rw-rw-r-- 1 kingbase kingbase 2.5M Jun 27 10:25 securecmdd.zip
-rwxrwxr-x 1 kingbase kingbase 7.3K Jun 27 10:25 trust_cluster.sh
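Before running the installer, it can help to confirm that every file in the listing above is present and non-empty. The following is a minimal bash sketch; check_deploy_files is a hypothetical helper, not part of the KingbaseES tooling, and its file list is simply copied from the directory listing.

```shell
#!/usr/bin/env bash
# check_deploy_files [DIR]: hypothetical pre-flight check that every file
# from the listing above exists in DIR (default: current directory) and is
# not empty. Prints each problem to stderr; non-zero status if any is found.
check_deploy_files() {
    local dir="${1:-.}" missing=0 f
    for f in cluster_install.sh db.zip install.conf license.dat securecmdd.zip trust_cluster.sh; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, running check_deploy_files /home/kingbase/r6_install before cluster_install.sh returns a non-zero status and names each file that still needs to be copied.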
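2. Deployment configuration file
The install.conf file used by the deployment script is shown below (comment lines and blank lines filtered out):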
[kingbase@node101 r6_install]$ cat install.conf |grep -v ^#|grep -v ^$
[install]
on_bmj=0
all_ip=()
witness_ip=""
production_ip=(192.168.1.101 192.168.1.102)
local_disaster_recovery_ip=(192.168.1.103)
remote_disaster_recovery_ip=()
install_dir="/home/kingbase/cluster/tptc/rh6"
zip_package="/home/kingbase/r6_install/db.zip"
license_file=(license.dat)
db_user="system" # the user name of database
db_port="54321" # the port of database, default is 54321
db_mode="oracle" # database mode: pg, oracle
db_auth="scram-sha-256" # database authentication method: scram-sha-256, md5, default is scram-sha-256
db_case_sensitive="no" # database case sensitive settings: yes, no. default is yes - case sensitive; no - case insensitive (NOTE: cannot be set to 'no' when db_mode="pg").
db_checksums="yes" # the checksum for data: yes, no. default is yes - a checksum is calculated for each data block to prevent corruption; no - nothing to do.
archive_mode="on" # enables archiving; off, on, or always
db_encoding="" # Character set encoding to use in the new database. Specify a string constant or an integer encoding number; default value provided by locale command.
db_collate="" # Collation order (LC_COLLATE) to use in the new database. This affects the sort order applied to strings; default value provided by locale command.
db_ctype="" # Character classification (LC_CTYPE) to use in the new database. This affects the categorization of characters; default value provided by locale command.
other_db_init_options="" # additional initdb options, such as "--scenario-tuning"
tcp_keepalives_idle="2" # (integer; default: 7200; since Linux 2.2)
# The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are sent only when the
# SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connection is terminated after approximately an
# additional 11 minutes (9 probes at an interval of 75 seconds) when keep-alive is enabled.
tcp_keepalives_interval="2" # (integer; default: 75; since Linux 2.4)
# The number of seconds between TCP keep-alive probes.
tcp_keepalives_count="3" # (integer; default: 9; since Linux 2.2)
# The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
tcp_user_timeout="9000" # in milliseconds (since Linux 2.6.37)
connection_timeout="10" # connection timeout when using ssh or sys_securecmdd
wal_sender_timeout="30000" # in milliseconds; 0 disables
wal_receiver_timeout="30000" # time that receiver waits for
# communication from master
# in milliseconds; 0 disables
trusted_servers="192.168.1.1"
running_under_failure_trusted_servers='on'
data_directory="/data/kingbase/tptc/rh6/data"
waldir=''
virtual_ip=""
net_device=()
net_device_ip=()
ipaddr_path="/sbin"
arping_path=""
ping_path="/bin"
super_user="root"
execute_user="kingbase"
deploy_by_sshd=1 # choose whether to use sshd when deploy, 0 means not to use (deploy by sys_securecmdd), 1 means to use (deploy by sshd), default value is 1; when on_bmj=1, it will auto set to no(deploy_by_sshd=0)
use_scmd=0 # Is the cluster running on sys_securecmdd or sshd? 1 means yes (on sys_securecmdd), 0 means no (on sshd), default value is 1; when on_bmj=1, it will auto set to yes(use_scmd=1)
reconnect_attempts="10" # the number of retries in the event of an error
reconnect_interval="6" # retry interval
recovery="standby" # the way of cluster recovery: standby/automatic/manual
ssh_port="22" # the port of ssh, default is 22
scmd_port="8890" # the port of sys_securecmdd, default is 8890
auto_cluster_recovery_level='1'
use_check_disk='off'
synchronous='sync'
sync_in_same_location=0
failover_need_server_alive='off'
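What makes this a dual-center deployment is the per-center IP arrays, and the keepalive values decide how quickly a dead peer is noticed: with keepalives enabled, a silent connection is dropped after roughly tcp_keepalives_idle + tcp_keepalives_interval * tcp_keepalives_count seconds, here 2 + 2 * 3 = 8 seconds, next to a tcp_user_timeout of 9000 ms (9 seconds). The bash sketch below reads single keys from install.conf with sed instead of sourcing it (the [install] header line would not source cleanly as shell). conf_value, check_tptc_layout and keepalive_window are hypothetical helpers, and the expectation that all_ip stays empty only mirrors this article's configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting install.conf without sourcing it.

# conf_value KEY FILE: print KEY's value with any trailing "# comment",
# surrounding quotes and array parentheses removed.
conf_value() {
    local key="$1" file="$2"
    sed -n "s/^${key}=//p" "$file" | head -n 1 \
        | sed -e 's/[[:space:]]*#.*$//' -e 's/^[("'\'']//' -e 's/[)"'\'']$//'
}

# check_tptc_layout FILE: succeed if both per-center arrays are filled in
# and all_ip is left empty, as in this article's configuration.
check_tptc_layout() {
    local file="$1" rc=0
    if [ -n "$(conf_value all_ip "$file")" ]; then
        echo "all_ip is set; this article leaves it empty" >&2; rc=1
    fi
    if [ -z "$(conf_value production_ip "$file")" ]; then
        echo "production_ip is empty" >&2; rc=1
    fi
    if [ -z "$(conf_value local_disaster_recovery_ip "$file")" ]; then
        echo "local_disaster_recovery_ip is empty" >&2; rc=1
    fi
    return "$rc"
}

# keepalive_window FILE: approximate seconds before TCP keepalive drops a
# silent connection, computed as idle + interval * count.
keepalive_window() {
    local file="$1" idle interval count
    idle=$(conf_value tcp_keepalives_idle "$file")
    interval=$(conf_value tcp_keepalives_interval "$file")
    count=$(conf_value tcp_keepalives_count "$file")
    echo $(( idle + interval * count ))
}
```

Run against the listing above, conf_value failover_need_server_alive install.conf prints off, whereas the Tips below and the generated repmgr.conf use 'none', so that value is worth checking before deployment.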
Tips:
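As shown in the figure, all_ip is left empty; the nodes are listed per center in production_ip and local_disaster_recovery_ip instead.
Pay attention to the following parameters, which an ordinary cluster deployment does not have: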
## For ha_running_mode='TPTC', if the sync nodes have the same location with primary ?
## 0: some nodes could be sync nodes. (don't care what the location is)
## 1: only the nodes have same location with primary, could be sync nodes.
## the default is 0. (when ha_running_mode='DG' or synchronous='async', this parameter has no effect)
sync_in_same_location=0 ### standbys in the disaster recovery center may also replicate synchronously (sync)
## For ha_running_mode='TPTC', if we can do failover when the standby node has different location with failure primary?
## 'off': can not do failover, if the standby node has different location with primary.
## 'none': can do failover.
## 'any': can do failover, need ANY server alive in primary's location if the standby node has different location with primary.
## 'all': can do failover, need ALL servers alive in primary's location if the standby node has different location with primary.
## the default is off. (when ha_running_mode='DG', this parameter has no effect)
failover_need_server_alive='none' ### allow failover between centers
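The two parameter descriptions above can be read as simple decision rules. The bash sketch below encodes that reading only: it is not taken from the repmgr or kbha source, it assumes ha_running_mode='TPTC' with synchronous='sync', and it assumes that "servers alive in primary's location" means the remaining nodes in the failed primary's center. is_sync_candidate and can_failover are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical model of the documented rules; not KingbaseES source code.
# Assumes ha_running_mode='TPTC' and synchronous='sync'; otherwise these
# parameters have no effect according to the comments above.

# is_sync_candidate SYNC_IN_SAME_LOCATION PRIMARY_LOC STANDBY_LOC
# Succeeds if the standby may be chosen as a synchronous standby.
is_sync_candidate() {
    local same_loc="$1" primary_loc="$2" standby_loc="$3"
    case "$same_loc" in
        0) return 0 ;;                               # location does not matter
        1) [ "$standby_loc" = "$primary_loc" ] ;;    # same location as primary only
        *) echo "unexpected sync_in_same_location: $same_loc" >&2; return 2 ;;
    esac
}

# can_failover SETTING PRIMARY_LOC STANDBY_LOC ALIVE TOTAL
# ALIVE/TOTAL: reachable/total remaining servers in the failed primary's
# location (assumed interpretation of "servers alive in primary's location").
can_failover() {
    local setting="$1" primary_loc="$2" standby_loc="$3" alive="$4" total="$5"
    # The parameter only restricts failover to a standby in another location.
    if [ "$standby_loc" = "$primary_loc" ]; then
        return 0
    fi
    case "$setting" in
        off)  return 1 ;;
        none) return 0 ;;
        any)  [ "$alive" -ge 1 ] ;;
        all)  [ "$alive" -eq "$total" ] ;;
        *)    echo "unexpected failover_need_server_alive: $setting" >&2; return 2 ;;
    esac
}
```

Under this reading, with node1 and node2 in production and node3 in local_disaster, is_sync_candidate 0 production local_disaster succeeds, so node3 may serve as a synchronous standby. If both production nodes are lost, can_failover any production local_disaster 0 1 fails while can_failover none production local_disaster 0 1 succeeds, which is why 'none' is the setting that permits failover between the centers.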
3. Run the deployment
Run the deployment script:
[kingbase@node101 r6_install]$ sh cluster_install.sh
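Because the installer prints a long transcript, keeping a copy of its output makes a failed run easier to diagnose. A minimal bash sketch, assuming cluster_install.sh does not require its standard output to be a terminal; run_logged is a hypothetical wrapper that returns the installer's own exit status rather than tee's.

```shell
#!/usr/bin/env bash
# run_logged SCRIPT [LOGFILE]: run SCRIPT with sh, copy stdout and stderr to
# LOGFILE (default: a timestamped file in the current directory), and return
# SCRIPT's exit status instead of tee's.
run_logged() {
    local script="$1"
    local log="${2:-install_$(date +%Y%m%d_%H%M%S).log}"
    sh "$script" 2>&1 | tee "$log"
    # PIPESTATUS[0] (bash only) holds the exit status of the first command.
    return "${PIPESTATUS[0]}"
}
```

For example, run_logged cluster_install.sh from /home/kingbase/r6_install as the kingbase user.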
4. repmgr.conf configuration
[kingbase@node101 bin]$ cat ../etc/repmgr.conf
use_scmd=off
ha_running_mode='TPTC'
node_id=1
node_name='node1'
conninfo='host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'
data_directory='/data/kingbase/tptc/rh6/data'
log_file='/home/kingbase/cluster/tptc/rh6/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/tptc/rh6/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/tptc/rh6/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'
trusted_servers='192.168.1.1'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/tptc/rh6/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/tptc/rh6/kingbase/etc/kbha.pid'
location='production'
failover='automatic'
failover_need_server_alive='none'
sync_in_same_location='0'
synchronous='sync'
recovery='standby'
auto_cluster_recovery_level='0'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6
promote_command='/home/kingbase/cluster/tptc/rh6/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/tptc/rh6/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/tptc/rh6/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/tptc/rh6/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'
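As shown in the figure, this is where it differs from an ordinary cluster: ha_running_mode is 'TPTC' here, whereas an ordinary cluster uses 'DG'.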
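To confirm on each node that the generated repmgr.conf carries the dual-center settings, the relevant keys can be printed side by side. A bash sketch, assuming the key='value' layout shown above; repmgr_value and show_tptc_settings are hypothetical helpers.

```shell
#!/usr/bin/env bash
# repmgr_value KEY FILE: print KEY's value from repmgr.conf with the
# surrounding single quotes removed (unquoted values are printed as-is).
repmgr_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1 | sed -e "s/^'//" -e "s/'\$//"
}

# show_tptc_settings FILE: print the dual-center keys and succeed only if
# ha_running_mode is 'TPTC' (an ordinary cluster uses 'DG').
show_tptc_settings() {
    local file="$1" key
    for key in ha_running_mode location failover_need_server_alive sync_in_same_location synchronous; do
        printf '%s=%s\n' "$key" "$(repmgr_value "$key" "$file")"
    done
    [ "$(repmgr_value ha_running_mode "$file")" = "TPTC" ]
}
```

Pointed at /home/kingbase/cluster/tptc/rh6/kingbase/etc/repmgr.conf on node1, it should print location=production and failover_need_server_alive=none, matching the listing above.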
5. Cluster status
As shown in the figure, node1 and node2 belong to location 'production' and node3 belongs to location 'local_disaster' (in an ordinary cluster, location defaults to 'default').
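The location each node reports appears to follow from which install.conf array lists its IP. A bash sketch of that mapping using this article's arrays; expected_location is a hypothetical helper, and since this deployment has no remote_disaster_recovery_ip entries, the sketch does not guess a location name for that array and reports unknown instead.

```shell
#!/usr/bin/env bash
# IP arrays copied from this article's install.conf (same bash array syntax).
production_ip=(192.168.1.101 192.168.1.102)
local_disaster_recovery_ip=(192.168.1.103)

# expected_location IP: print the location a node at IP is expected to report,
# or "unknown" (non-zero status) if IP is in neither array.
expected_location() {
    local ip="$1" x
    for x in "${production_ip[@]}"; do
        if [ "$x" = "$ip" ]; then echo production; return 0; fi
    done
    for x in "${local_disaster_recovery_ip[@]}"; do
        if [ "$x" = "$ip" ]; then echo local_disaster; return 0; fi
    done
    echo unknown
    return 1
}
```

For example, expected_location 192.168.1.103 prints local_disaster, which can be compared with the location line in node3's repmgr.conf.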
Streaming replication and replication slot status:
test=# select * from sys_stat_replication;
 pid   | usesysid | usename | application_name | client_addr   | client_hostname | client_port | backend_start                 | backend_xmin | state     | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag      | flush_lag      | replay_lag     | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+----------------+----------------+----------------+---------------+------------+-------------------------------
 13690 | 16385    | esrep   | node2            | 192.168.1.102 |                 | 17911       | 2023-07-03 15:36:33.491733+08 |              | streaming | 0/D000EB0 | 0/D000EB0 | 0/D000EB0 | 0/D000EB0  | 0:00:00.000314 | 0:00:00.001107 | 0:00:00.001110 | 1             | sync       | 2023-07-03 15:36:51.003016+08
 13710 | 16385    | esrep   | node3            | 192.168.1.103 |                 | 60118       | 2023-07-03 15:36:35.228838+08 |              | streaming | 0/D000EB0 | 0/D000EB0 | 0/D000EB0 | 0/D000EB0  | 0:00:00.000385 | 0:00:00.001123 | 0:00:00.001126 | 2             | potential  | 2023-07-03 15:36:56.435935+08
(2 rows)
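In this output node2 (same center) is the sync standby and node3 (local disaster recovery center) is potential, which is consistent with sync_in_same_location=0 allowing either center to provide the synchronous standby. For routine checks, a query returning only application_name and sync_state is easier to parse than the full view. The bash sketch below assumes ksql accepts psql's -A (unaligned), -t (tuples only) and -c options; summarize_sync is a hypothetical helper.

```shell
#!/usr/bin/env bash
# summarize_sync: read "application_name|sync_state" lines on stdin, print a
# one-line summary, and fail if no standby is currently in 'sync' state.
summarize_sync() {
    local name state sync_names="" potential=0 other=0
    while IFS='|' read -r name state; do
        [ -z "$name" ] && continue
        case "$state" in
            sync)      sync_names="$sync_names $name" ;;
            potential) potential=$((potential + 1)) ;;
            *)         other=$((other + 1)) ;;
        esac
    done
    echo "sync:${sync_names:- none} potential:$potential other:$other"
    [ -n "$sync_names" ]
}
```

On the primary, ksql -U system -d test -p 54321 -At -c "select application_name, sync_state from sys_stat_replication" | summarize_sync would, for the state shown above, print sync: node2 potential:1 other:0.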
test=# select * from sys_replication_slots;
 slot_name     | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      | 13690      | 1231 |              | 0/D000EB0   |
 repmgr_slot_3 |        | physical  |        |          | f         | t      | 13710      | 1178 |              | 0/D000EB0   |
(2 rows)
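Both repmgr slots are active, and their active_pid values match the walsender PIDs in sys_stat_replication (13690 for node2, 13710 for node3). As in PostgreSQL, a slot that stays inactive keeps retaining the WAL its standby has not yet consumed, which can eventually fill the WAL disk, so the active column is worth watching. A bash sketch, assuming ksql accepts psql's -A, -t and -c options; inactive_slots is a hypothetical helper.

```shell
#!/usr/bin/env bash
# inactive_slots: read "slot_name|active" lines on stdin, print the name of
# every slot whose active flag is not 't', and fail if any were found.
inactive_slots() {
    local name active bad=0
    while IFS='|' read -r name active; do
        [ -z "$name" ] && continue
        if [ "$active" != "t" ]; then
            echo "$name"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}
```

On the primary: ksql -U system -d test -p 54321 -At -c "select slot_name, active from sys_replication_slots" | inactive_slots prints nothing and succeeds for the state shown above.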
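6. Summary
Deploying a same-city dual-center cluster with the script is essentially the same as deploying an ordinary cluster; the main thing to watch is the set of dual-center-specific parameters, whose behavior will be explained in detail in a later article on failover in cluster management.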