KingbaseES V8R6 Cluster Case Study --- Same-City Dual-Center Cluster Deployment

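Case description:
This case walks through deploying a same-city dual-center cluster on KingbaseES V8R6. The deployment is run from a script, and the procedure is largely the same as a script-based deployment of an ordinary cluster.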

Applicable version:
KingbaseES V8R6

Cluster architecture:

[kingbase@node101 ~]$ cat /etc/hosts
192.168.1.101   node1
192.168.1.102   node2
192.168.1.103   node3

1. Files required for cluster deployment

Place the files needed for deployment in the working directory, as shown below:

[kingbase@node101 r6_install]$ ls -lh
total 389M
-rwxrwxr-x 1 kingbase kingbase 155K Jun 27 10:25 cluster_install.sh
-rw-rw-r-- 1 kingbase kingbase 386M Jun 27 10:25 db.zip
-rw-rw-r-- 1 kingbase kingbase  14K Jun 27 10:49 install.conf
-rw-r--r-- 1 kingbase kingbase 3.4K Jun 27 10:26 license.dat
-rw-rw-r-- 1 kingbase kingbase 2.5M Jun 27 10:25 securecmdd.zip
-rwxrwxr-x 1 kingbase kingbase 7.3K Jun 27 10:25 trust_cluster.sh
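
Before running the deployment script it can help to confirm that every file from the listing above is actually present. The following is a minimal sketch, not part of the KingbaseES tooling: the function name `missing_deploy_files` is hypothetical, and the file names are simply the ones shown in the listing.

```shell
# Hypothetical helper: print the name of each deployment file that is
# missing from the given directory. File names come from the listing above.
missing_deploy_files() {
    local dir="$1" f
    for f in cluster_install.sh db.zip install.conf license.dat \
             securecmdd.zip trust_cluster.sh; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example: check the current directory. Prints nothing if all files exist.
missing_deploy_files .
```

An empty result means the directory matches the listing; any printed name is a file that still has to be copied over.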

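2. Deployment configuration file
The script deployment is driven by install.conf. Its effective settings (comment-only and blank lines filtered out) are shown below: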

[kingbase@node101 r6_install]$ cat install.conf |grep -v ^#|grep -v ^$
[install]
on_bmj=0
all_ip=()
witness_ip=""
production_ip=(192.168.1.101 192.168.1.102)
local_disaster_recovery_ip=(192.168.1.103)
remote_disaster_recovery_ip=()
install_dir="/home/kingbase/cluster/tptc/rh6"
zip_package="/home/kingbase/r6_install/db.zip"
license_file=(license.dat)
db_user="system"                 # the user name of database
db_port="54321"                  # the port of database, defaults is 54321
db_mode="oracle"                 # database mode: pg, oracle
db_auth="scram-sha-256"          # database authority: scram-sha-256, md5, default is scram-sha-256
db_case_sensitive="no"          # database case sensitive settings: yes, no. default is yes - case sensitive; no - case insensitive (NOTE. cannot set to 'no' when db_mode="pg").
db_checksums="yes"               # the checksum for data: yes, no. default is yes - a checksum is calculated for each data block to prevent corruption; no - nothing to do.
archive_mode="on"                # enables archiving; off, on, or always
db_encoding=""                   # Cararcter set encoding to use in the new database.Specify a tring constant,or an integer encoding number, default value provided by locale command.
db_collate=""                    # Collation order(LC_COLLATE) to use in the new database,This affects the sort order applied to strings, default value provided by locale command.
db_ctype=""                      # Character classification(LC_CTYPE) to use int the new database. This affects the categorization of characters, default value provided by locale command.
other_db_init_options=""         # addional initdb options,such as "--scenario-tuning"
tcp_keepalives_idle="2"          # (integer; default: 7200; since Linux 2.2)
                                 # The  number  of  seconds  a  connection  needs to be idle before TCP begins sending out keep-alive counts.  Keep-alives are sent only when the
                                 # SO_KEEPALIVE socket option is enabled.  The default value is 7200 seconds (2 hours).  An idle connection is terminated after approximately  an
                                 # additional 11 minutes (9 counts an interval of 75 seconds apart) when keep-alive is enabled.
tcp_keepalives_interval="2"      # (integer; default: 75; since Linux 2.4)
                                 # The number of seconds between TCP keep-alive counts.
tcp_keepalives_count="3"         # (integer; default: 9; since Linux 2.2)
                                 # The maximum number of TCP keep-alive counts to send before giving up and killing the connection if no response is obtained from the other end.
tcp_user_timeout="9000"          # (since Linux 2.6.37)
connection_timeout="10"          # connection timeout when use ssh or sys_securecmdd
wal_sender_timeout="30000"       # in milliseconds; 0 disables
wal_receiver_timeout="30000"     # time that receiver waits for
                                 # communication from master
                                 # in milliseconds; 0 disables
trusted_servers="192.168.1.1"
running_under_failure_trusted_servers='on'
data_directory="/data/kingbase/tptc/rh6/data"
waldir=''
virtual_ip=""
net_device=()
net_device_ip=()
ipaddr_path="/sbin"
arping_path=""
ping_path="/bin"
super_user="root"
execute_user="kingbase"
deploy_by_sshd=1                 # choose whether to use sshd when deploy, 0 means not to use (deploy by sys_securecmdd), 1 means to use (deploy by sshd), default value is 1; when on_bmj=1, it will auto set to no(deploy_by_sshd=0)
use_scmd=0                       # Is the cluster running on sys_securecmdd or sshd? 1 means yes (on sys_securecmdd), 0 means no (on sshd), default value is 1; when on_bmj=1, it will auto set to yes(use_scmd=1)
reconnect_attempts="10"          # the number of retries in the event of an error
reconnect_interval="6"           # retry interval
recovery="standby"               # the way of cluster recovery: standby/automatic/manual
ssh_port="22"                    # the port of ssh, default is 22
scmd_port="8890"                 # the port of sys_securecmdd, default is 8890
auto_cluster_recovery_level='1'
use_check_disk='off'
synchronous='sync'
sync_in_same_location=0
failover_need_server_alive='off'

Tips:
As the configuration above shows, the all_ip parameter is left unset:
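
In this configuration the nodes are listed per location group (production_ip, local_disaster_recovery_ip, remote_disaster_recovery_ip) instead of in all_ip. The sketch below only illustrates that grouping with the values from the listing; the helper name `list_node_groups` is hypothetical, and it is not how cluster_install.sh itself reads the file.

```shell
# Values copied from the install.conf listing (bash array syntax).
production_ip=(192.168.1.101 192.168.1.102)
local_disaster_recovery_ip=(192.168.1.103)
remote_disaster_recovery_ip=()

# Hypothetical helper: print "ip group" for every configured node, where
# "group" is the name of the install.conf parameter that lists the node.
list_node_groups() {
    local ip
    for ip in "${production_ip[@]}"; do
        echo "$ip production_ip"
    done
    for ip in "${local_disaster_recovery_ip[@]}"; do
        echo "$ip local_disaster_recovery_ip"
    done
    for ip in "${remote_disaster_recovery_ip[@]}"; do
        echo "$ip remote_disaster_recovery_ip"
    done
}

list_node_groups
```

With the values above this prints three lines: the two production nodes followed by the single local disaster-recovery node.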

Pay attention to the following parameters, which an ordinary cluster deployment does not have:

## For ha_running_mode='TPTC', if the sync nodes have the same location with primary ?
##    0:    some nodes could be sync nodes. (don't care what the location is)
##    1:    only the nodes have same location with primary, could be sync nodes.
## the default is 0. (when ha_running_mode='DG' or synchronous='async', this parameter has no effect)

sync_in_same_location=0  ### the standby in the disaster-recovery center may set up streaming replication in synchronous (sync) mode
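
The rule described in the comment block can be restated as a small predicate. This is only a sketch of the documented semantics, assuming synchronous='sync' and ha_running_mode='TPTC'; the function name `may_be_sync` is hypothetical and this is not the cluster manager's actual code.

```shell
# Hypothetical helper: may a standby be chosen as a synchronous standby?
#   $1 = sync_in_same_location (0 or 1)
#   $2 = location of the primary
#   $3 = location of the standby
may_be_sync() {
    if [ "$1" = "0" ] || [ "$2" = "$3" ]; then
        echo yes
    else
        echo no
    fi
}

may_be_sync 0 production local_disaster   # prints: yes
may_be_sync 1 production local_disaster   # prints: no
may_be_sync 1 production production       # prints: yes
```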

## For ha_running_mode='TPTC', if we can do failover when the standby node has different location with failure primary?
##    'off':  can not do failover, if the standby node has different location with primary.
##    'none': can do failover.
##    'any':  can do failover, need ANY server alive in primary's location if the standby node has different location with primary.
##    'all':  can do failover, need ALL servers alive in primary's location if the standby node has different location with primary.
## the default is off. (when ha_running_mode='DG', this parameter has no effect)

failover_need_server_alive='none'   ### allows failover between the two centers
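
The four values can be read as a decision table. The sketch below restates the comment block under two assumptions that the original does not spell out: a standby in the same location as the failed primary is never restricted by this parameter, and the caller supplies the count of servers considered alive. The function name `may_failover` is hypothetical.

```shell
# Hypothetical helper restating the failover_need_server_alive comments.
#   $1 = failover_need_server_alive: off | none | any | all
#   $2 = 1 if the standby shares the failed primary's location, else 0
#   $3 = number of servers alive in the primary's location
#   $4 = total number of servers in the primary's location
may_failover() {
    local mode="$1" same="$2" alive="$3" total="$4"
    # Assumption: the parameter only restricts standbys in another location.
    if [ "$same" = "1" ]; then
        echo yes
        return
    fi
    case "$mode" in
        none) echo yes ;;
        any)  if [ "$alive" -ge 1 ]; then echo yes; else echo no; fi ;;
        all)  if [ "$alive" -eq "$total" ]; then echo yes; else echo no; fi ;;
        *)    echo no ;;   # 'off' (the default) and unrecognized values
    esac
}

may_failover none 0 0 2   # prints: yes
may_failover off  0 2 2   # prints: no
may_failover any  0 1 2   # prints: yes
may_failover all  0 1 2   # prints: no
```

With 'none', as used in this case, a standby in the disaster-recovery center can take over regardless of the state of the production center.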

3. Run the deployment

Run the deployment script:
[kingbase@node101 r6_install]$ sh cluster_install.sh

4. repmgr.conf configuration

[kingbase@node101 bin]$ cat ../etc/repmgr.conf
use_scmd=off
ha_running_mode='TPTC'
node_id=1
node_name='node1'
conninfo='host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'
connection_check_type='mix'

data_directory='/data/kingbase/tptc/rh6/data'
log_file='/home/kingbase/cluster/tptc/rh6/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/tptc/rh6/kingbase/log/kbha.log'
sys_bindir='/home/kingbase/cluster/tptc/rh6/kingbase/bin'
scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22 -o ServerAliveInterval=2 -o ServerAliveCountMax=3'

trusted_servers='192.168.1.1'
running_under_failure_trusted_servers='on'
repmgrd_pid_file='/home/kingbase/cluster/tptc/rh6/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/tptc/rh6/kingbase/etc/kbha.pid'

location='production'
failover='automatic'
failover_need_server_alive='none'
sync_in_same_location='0'
synchronous='sync'
recovery='standby'
auto_cluster_recovery_level='0'
monitoring_history='no'
reconnect_attempts=10
reconnect_interval=6

promote_command='/home/kingbase/cluster/tptc/rh6/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/tptc/rh6/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/tptc/rh6/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/tptc/rh6/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
ping_path='/bin'
use_check_disk='off'

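The difference from an ordinary cluster is ha_running_mode: it is 'TPTC' here, whereas an ordinary cluster uses 'DG'.

5. Cluster status information
In the cluster status, node1 and node2 belong to location 'production' and node3 belongs to location 'local_disaster' (in an ordinary cluster the location defaults to 'default').

Streaming replication and replication slot status: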

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |   write_lag    |   flush_lag    |   replay_lag   | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+----------------+----------------+----------------+---------------+------------+-------------------------------
 13690 |    16385 | esrep   | node2            | 192.168.1.102 |                 |       17911 | 2023-07-03 15:36:33.491733+08 |              | streaming | 0/D000EB0 | 0/D000EB0 | 0/D000EB0 | 0/D000EB0  | 0:00:00.000314 | 0:00:00.001107 | 0:00:00.001110 |             1 | sync       | 2023-07-03 15:36:51.003016+08
 13710 |    16385 | esrep   | node3            | 192.168.1.103 |                 |       60118 | 2023-07-03 15:36:35.228838+08 |              | streaming | 0/D000EB0 | 0/D000EB0 | 0/D000EB0 | 0/D000EB0  | 0:00:00.000385 | 0:00:00.001123 | 0:00:00.001126 |             2 | potential  | 2023-07-03 15:36:56.435935+08
(2 rows)
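
To check the synchronous standby without reading the wide table by eye, the application_name and sync_state columns can be filtered. The sketch below assumes the two columns have already been exported as `name|sync_state` lines (for example from a query run in an unaligned, tuples-only output mode, if the client offers one; that is an assumption not shown in the original). The function name `sync_standbys` is hypothetical.

```shell
# Hypothetical helper: read "application_name|sync_state" lines on stdin
# and print the names of the standbys whose sync_state is "sync".
sync_standbys() {
    awk -F'|' '$2 == "sync" { print $1 }'
}

# Sample input taken from the two rows of sys_stat_replication above.
printf 'node2|sync\nnode3|potential\n' | sync_standbys   # prints: node2
```

Here node2, in the same center as the primary, is the current synchronous standby, while node3 in the disaster-recovery center is a potential one.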

test=# select * from sys_replication_slots;
   slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 repmgr_slot_2 |        | physical  |        |          | f         | t      |      13690 | 1231 |              | 0/D000EB0   |
 repmgr_slot_3 |        | physical  |        |          | f         | t      |      13710 | 1178 |              | 0/D000EB0   |
(2 rows)

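6. Summary
Deploying a same-city dual-center cluster by script is largely the same as deploying an ordinary cluster. Pay attention to the parameters specific to the dual-center setup; their behavior will be explained in detail in a later article on failover switching in cluster management.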

posted @ 2023-07-03 15:53  天涯客1224