KingbaseES V8R6 Cluster Operations Series: Deploying a Cluster with Scripts

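Case description:
A KingbaseES V8R6 cluster is usually deployed quickly with the graphical tool. On production servers, however, the graphical environment is often not enabled or cannot be enabled, so the cluster has to be deployed manually from a text console. This document records the console deployment steps used in a production environment, along with points to watch and a failure case.
Tips:
This case applies to early V8R6 releases, which only support communication between host nodes over ssh trust.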

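    1) This case was deployed on general-purpose machines.
    2) The KingbaseES V8R6 Cluster software package must be installed first.
    3) This case is mainly for environments that cannot provide graphical deployment, or where graphical deployment has failed.
    4) Dedicated-machine environments can use this case as a reference.
    5) On general-purpose machines, nearly all operations are performed by the kingbase user.
    6) Before deploying an R6 cluster with the one-click script, prepare the system environment: ssh trust relationships, firewall, selinux configuration, process resource management configuration, user creation, IP allocation, and so on.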

1. System environment

1.1 Applicable version
KingbaseES V8R6

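1.2 Cluster architecture
Two nodes managed by repmgr: node1 (primary) and node2 (standby), connected by streaming replication, with a virtual IP configured in install.conf.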

2. Configure the system environment (all nodes)

2.1 Create the kingbase user

[root@ECOLABAPP37 ~]# id kingbase
uid=1002(kingbase) gid=1002(kingbase) groups=1002(kingbase)  
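
The transcript above only verifies that the user exists. A minimal sketch of creating it when it is missing, to be run as root on every node; the helper name `ensure_user` is hypothetical, and the numeric ID 1002 simply mirrors the `id` output above rather than being a required value:

```shell
# Create a group and user with the same name and numeric ID if the user
# does not exist yet. Intended to be run as root.
ensure_user() {
    user=$1
    id_num=$2
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
        return 0
    fi
    groupadd -g "$id_num" "$user" &&
        useradd -u "$id_num" -g "$user" -m "$user"
}

# Example (as root): ensure_user kingbase 1002
```

Using the same UID and GID on every node keeps file ownership consistent when files are copied between hosts. A password still has to be set separately, for example with `passwd kingbase`.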

2.2 Disable the host firewall

[root@ECOLABAPP37 Scripts]# systemctl stop firewalld
[root@ECOLABAPP37 Scripts]# systemctl disable firewalld

2.3 Configure selinux

[kingbase@node3 ~]$ cat /etc/sysconfig/selinux |grep -v  ^#|grep -v ^$
SELINUXTYPE=targeted 
SELINUX=disabled
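
The output above shows the desired end state. As a rough sketch of how to reach it, the helpers below read and rewrite the `SELINUX=` line. The function names are hypothetical. The default path assumes the common RHEL/CentOS layout, where /etc/sysconfig/selinux is a symlink to /etc/selinux/config; editing the target directly avoids replacing the symlink with a regular file.

```shell
# Print the SELINUX= value from a config file.
selinux_mode() {
    conf=${1:-/etc/selinux/config}
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$conf"
}

# Set SELINUX=disabled in the config file (GNU sed, run as root).
# The SELINUXTYPE= line is left untouched because the pattern requires
# "=" directly after "SELINUX".
selinux_disable() {
    conf=${1:-/etc/selinux/config}
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Example (as root): selinux_disable && selinux_mode
```

The change in the file only takes effect after a reboot. `setenforce 0` switches a running system to permissive mode in the meantime.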

3. Build the cluster with the script

3.1 Configure the deployment environment

=== The cluster deployment scripts can be found under the cluster software installation directory once the cluster package is installed ===

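As the kingbase user, create a folder in the home directory:
[kingbase@ECOLABAPP37 ~]$ mkdir R6_install
Place the deployment scripts, the configuration file, and the database license.dat file in this directory.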

[kingbase@ECOLABAPP37 ~]$ cd R6_install/
[kingbase@ECOLABAPP37 R6_install]$ ls -lh
total 80K
-rw------- 1 kingbase kingbase 5.0K Apr 19 17:28 install.conf
-rw-r--r-- 1 kingbase kingbase 2.9K Apr 19 17:20 license.dat
-r-xr-xr-x 1 kingbase kingbase 2.1K Apr 19 16:57 trust_cluster.sh
-rw------- 1 kingbase kingbase  32K Apr 19 16:57 V8R6_cluster_install.sh
-rw------- 1 kingbase kingbase  31K Apr 19 16:57 V8R6一键部署集群脚本操作手册.docx

1) View and edit the cluster configuration file (adjust it to your system environment)

 [kingbase@node3 ~]$ cat install.conf |grep -v ^#|grep -v ^$
on_bmj=0
all_ip=(10.248.52.* 10.248.52.*)
install_dir="/home/kingbase/cluster"
zip_package="/opt/Kingbase/ES/V8/DeployTools/zip/Aarch64/db.zip"
license_file=(license.dat)
db_user="system"                 # the user name of database
db_password="123456"             # the password of database
db_port="54321"                  # the port of database, defaults is 54321
db_mode="oracle"                 # database mode: pg, oracle
db_auth="scram-sha-256"          # database authority: scram-sha-256, md5, default is scram-sha-256
trusted_servers="10.248.*.1"
virtual_ip="10.248.52.*/20"
net_device=(nm-bond nm-bond)
ipaddr_path="/sbin"
arping_path="/usr/sbin"
ping_path="/bin"
super_user="root"
execute_user="kingbase"
reconnect_attempts="6"           # the number of retries in the event of an error
reconnect_interval="10"          # retry interval
recovery="manual"                # the way of cluster recovery: automatic/manual
ssh_port="22"                    # the port of ssh, default is 22
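
Before running the deployment, a quick consistency check of install.conf can catch simple editing mistakes. The sketch below rests on an assumption inferred only from the sample above, where `all_ip` and `net_device` both have two entries: that every node needs one entry in each array. The function names are hypothetical.

```shell
# Count the space-separated entries of an array-style setting such as
#   all_ip=(192.0.2.1 192.0.2.2)
conf_array_count() {
    key=$1
    file=$2
    sed -n "s/^${key}=(\(.*\)).*/\1/p" "$file" | wc -w | tr -d ' '
}

# Compare the number of node IPs with the number of network devices.
check_install_conf() {
    file=$1
    ips=$(conf_array_count all_ip "$file")
    devs=$(conf_array_count net_device "$file")
    if [ "$ips" -ne "$devs" ]; then
        echo "all_ip has $ips entries but net_device has $devs"
        return 1
    fi
    echo "all_ip and net_device both list $ips node(s)"
}

# Example: check_install_conf ~/R6_install/install.conf
```

The file is parsed with sed instead of being sourced, so the check does not execute anything from install.conf and does not depend on shell array support.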

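2) Configure ssh trust between the hosts (manually, or with the script below)
ssh trust configuration script: trust_cluster.sh
Tips:
Trust must be configured between the kingbase users, between the root users, and between kingbase and root. After configuring, verify the trust relationships.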
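
One possible way to verify the trust relationships is to attempt a non-interactive login for every user and host combination. This is a sketch, not part of trust_cluster.sh; the function names and the host names in the example are placeholders. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a missing key shows up as FAIL.

```shell
# Print every user@host pair that must accept key-based logins.
trust_targets() {
    users=$1
    hosts=$2
    for u in $users; do
        for h in $hosts; do
            echo "$u@$h"
        done
    done
}

# Try each target without a password prompt and report the result.
check_trust() {
    rc=0
    for target in $(trust_targets "$1" "$2"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            rc=1
        fi
    done
    return $rc
}

# Example, run once as kingbase and once as root on every node:
#   check_trust "kingbase root" "node1 node2"
```

Running the check from both accounts on both nodes covers the kingbase-to-kingbase, root-to-root, and kingbase-to-root combinations named in the tip.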

3) Cluster deployment script (excerpt)

[kingbase@ECOLABAPP37 R6_install]$ ls V8R6_cluster_install.sh

3.2 Run the deployment script

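Note:
The license.dat file must also be placed in the current directory; deployment fails without it.
Directory currently holding the files for manual cluster deployment: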

[kingbase@ECOLABAPP37 ~]$ cd R6_install/
[kingbase@ECOLABAPP37 R6_install]$ ls -lh
total 80K
-rw------- 1 kingbase kingbase 5.0K Apr 19 17:28 install.conf
-rw-r--r-- 1 kingbase kingbase 2.9K Apr 19 17:20 license.dat
-r-xr-xr-x 1 kingbase kingbase 2.1K Apr 19 16:57 trust_cluster.sh
-rw------- 1 kingbase kingbase  32K Apr 19 16:57 V8R6_cluster_install.sh
-rw------- 1 kingbase kingbase  31K Apr 19 16:57 V8R6一键部署集群脚本操作手册.docx
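
Since a missing license.dat only surfaces late in the run, a pre-flight check of the directory is cheap insurance. A minimal sketch, with a hypothetical helper name; the file names are the ones from the listing above:

```shell
# Print the names of expected files that are missing from a directory.
# Prints nothing when everything is present.
missing_files() {
    dir=$1
    shift
    for f in "$@"; do
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Example:
#   missing_files ~/R6_install install.conf license.dat V8R6_cluster_install.sh
```

An empty result means all listed files are in place; any printed name should be copied into the directory before the script is started.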

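Run the deployment script:
Use the log output to identify any failure during deployment. Reading the full log alongside the graphical deployment tool gives a better understanding of how repmgr cluster deployment works.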

[kingbase@ECOLABAPP37 ~]$ cd R6_install/
[kingbase@ECOLABAPP37 R6_install]$ sh V8R6_cluster_install.sh 

=== Read install.conf, obtain the cluster node configuration, and create the cluster data directories ===
[CONFIG_CHECK] file format is correct ... OK
.......
[INSTALL] Copy license to /home/kingbase/cluster/kingbase/../: license.dat
[INSTALL] success to copy /home/kingbase/R6_install/license.dat to /home/kingbase/cluster/kingbase/../ on 10.248.52.*

=== Initialize the database on the primary node and start the database service ===
[INSTALL] begin to init the database on "10.248.52.*" ...
.......
[INSTALL] /home/kingbase/cluster/kingbase/bin/sys_ctl -w -t 60 -l /home/kingbase/cluster/kingbase/logfile -D er to start.... 
server started
[INSTALL] start up the database on "10.248.52.*" ... OK

=== Create the repmgr metadata database ===
[INSTALL] create the database "esrep" and user "esrep" for repmgr ...
CREATE DATABASE
CREATE ROLE
......
NOTICE: "repmgr" extension successfully installed

=== Register the primary, then clone and register the standby ===
NOTICE: PING 10.248.52.* (10.248.52.*) 56(84) bytes of data.
--- 10.248.52.* ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
.......
NOTICE: standby node "node2" (ID: 2) successfully registered
[INSTALL] register the standby on "10.248.52.*" ... OK

=== Start the cluster ===
[INSTALL] start up the whole cluster ...
2021-04-19 17:31:58 Ready to start all DB ...
......

=== Check the cluster node status; deployment is complete ===
2021-04-19 17:32:18 repmgrd on "[10.248.52.*]" start success.
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 62956 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 25769 | no      | 0 second(s) ago    
.......
[INSTALL] start up the whole cluster ... OK
=== The output above shows that the manual cluster deployment succeeded ===
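
Because the log is long, it can help to keep a copy and scan it afterwards. The sketch below assumes the run was captured with `tee` into a file named deploy.log (a name chosen here for illustration). It counts the steps that ended in `... OK` and lists lines containing a few common failure keywords. The keyword list is a heuristic, not something defined by the deployment script.

```shell
# Capture the run first, for example:
#   sh V8R6_cluster_install.sh 2>&1 | tee deploy.log

# Summarize a captured deployment log.
scan_deploy_log() {
    log=$1
    echo "steps reported OK: $(grep -c '\.\.\. OK$' "$log")"
    grep -n -i -E 'error|fatal|could not|fail' "$log" ||
        echo "no failure keywords found"
}

# Example: scan_deploy_log deploy.log
```

A clean scan does not prove the cluster is healthy; the status checks in the next section remain the real confirmation.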

4. Check the cluster status after deployment

4.1 Primary/standby streaming replication status

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 62426 |    16385 | esrep   | node2            | 10.248.52.* |                 |       52926 | 2021-04-19 17:31:57.986053+08 |              | streaming | 0/300B810 | 0/300B810 | 0/300B810 | 0/300B810  |           |           |            |             1 | quorum     | 2021-04-19 15:19:35.941223+08
(1 row)
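
The wide output is easier to check when reduced to the two columns that matter most. The filter below is a sketch that reads `application_name|state` lines and prints any standby whose state is not `streaming`. The function name is hypothetical. The example invocation assumes that ksql accepts the psql-style `-d`, `-A`, `-t`, and `-c` options, which should be confirmed against the installed client.

```shell
# Read "application_name|state" lines on stdin and print the names of
# standbys that are not in the streaming state.
not_streaming() {
    awk -F'|' 'NF && $2 != "streaming" { print $1 }'
}

# Example (option names are an assumption, see above):
#   ksql -U system -d test -At \
#     -c "select application_name, state from sys_stat_replication" | not_streaming
```

No output means every connected standby is streaming. A standby that is not connected at all does not appear in sys_stat_replication, so the row count is worth checking as well.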

4.2 Check cluster node status

[kingbase@ECOLABAPP37 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+-------+---------+-----------+----------+----------+----------+----------+--------
 1  | node1 | primary | * running |          | default  | 100      | 1        | user=esrep dbname=esrep port=54321 host=10.248.52.* connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | user=esrep dbname=esrep port=54321 host=10.248.52.* connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
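
For routine checks, the table can be filtered down to nodes that need attention. The following sketch parses the `repmgr cluster show` layout shown above, taking the node name from the second column and the status from the fourth, and prints every node whose status is anything other than `running`. The function name is hypothetical.

```shell
# Read `repmgr cluster show` output on stdin and print "name status" for
# each node row whose status is not "running". Header, separator, and
# message lines are skipped because their first column is not a number.
down_nodes() {
    awk -F'|' '$1 ~ /^ *[0-9]+ *$/ && NF >= 4 {
        name = $2
        status = $4
        gsub(/^[ *]+|[ ]+$/, "", name)
        gsub(/^[ *]+|[ ]+$/, "", status)
        if (status != "running") print name, status
    }'
}

# Example: repmgr cluster show | down_nodes
```

The leading `*` that marks the primary is stripped before the comparison, so a healthy primary and a healthy standby are both treated as `running`.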

4.3 Test primary/standby streaming replication

DDL and DML operations on the primary:
test=# create database prod;
CREATE DATABASE
test=# \c prod;
You are now connected to database "prod" as user "system".

prod=# create table t1 (id int);
CREATE TABLE
prod=# insert into t1 values (10),(20),(30);
INSERT 0 3
prod=# select * from t1;
 id 
----
 10
 20
 30
(3 rows)

Check the replicated data on the standby:
[kingbase@ECOLABAPP38 ~]$ ksql -U system test
ksql (V8.0)
Type "help" for help.

test=# \c prod
You are now connected to database "prod" as user "system".
prod=# select * from t1;
 id 
----
 10
 20
 30
(3 rows)

5. Deployment failure case

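Failure description:
The license.dat file had not been placed in the directory containing the deployment script. Running the script failed because license.dat could not be accessed. After license.dat was copied into that directory, deployment succeeded.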

[kingbase@ECOLABAPP37 ~]$ cd R6_install/
[kingbase@ECOLABAPP37 R6_install]$ ls -lh
total 80K
-rw------- 1 kingbase kingbase 5.0K Apr 19 17:28 install.conf
-r-xr-xr-x 1 kingbase kingbase 2.1K Apr 19 16:57 trust_cluster.sh
-rw------- 1 kingbase kingbase  32K Apr 19 16:57 V8R6_cluster_install.sh
-rw------- 1 kingbase kingbase  31K Apr 19 16:57 V8R6一键部署集群脚本操作手册.docx
[kingbase@ECOLABAPP37 R6_install]$ sh V8R6_cluster_install.sh 
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] check if the virtual ip "10.248.52.*" already exist ...
......
[INSTALL] /home/kingbase/cluster/kingbase/bin/sys_ctl -w -t 60 -l /home/kingbase/cluster/kingbase/logfile -D /home/kingbase/cluster/kingbase/data start
waiting for server to start.... stopped waiting
sys_ctl: could not start server
Examine the log output.

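Note: the failure above occurred when starting the database service. Running the start command manually (shown below) and checking the log showed that the license file could not be read, which prevented the database from starting. license.dat must therefore be in the current directory as well; the error above was caused by its absence there.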

When troubleshooting, you can run the following command manually and then check the log:

/home/kingbase/cluster/kingbase/bin/sys_ctl -w -t 60 -l /home/kingbase/cluster/kingbase/logfile -D /home/kingbase/cluster/kingbase/data start
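
After the manual start attempt, the server log named by the `-l` option is the place to look. A small sketch for pulling out the relevant part; the function name is hypothetical, the default path is the logfile from the command above, and the search term `license` is just a case-insensitive keyword, not an exact message from the server:

```shell
# Show the tail of the server log, then any lines that mention the license.
show_start_failure() {
    log=${1:-/home/kingbase/cluster/kingbase/logfile}
    tail -n 20 "$log"
    echo "--- lines mentioning license ---"
    grep -n -i 'license' "$log" || echo "(none)"
}

# Example: show_start_failure
```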

posted @ 2021-10-28 20:07  KINGBASE研究院