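HighGo Database Enterprise Edition Cluster (hghac2.0+hgproxy): Deployment Manual for General-Purpose Servers
I. Cluster Architecture
1. Cluster components and their functions
zookeeper/etcd: distributed key-value stores that provide distributed exclusive locks and leader election. zookeeper is implemented in Java; etcd is implemented in Go.
hghac: database cluster management software. Its parameter file can be used to configure automatic database initialization, streaming replication setup, the zookeeper/etcd nodes to use, and so on. It connects to the distributed store (zookeeper/etcd) through the API and writes key-value records such as primary/standby information and connection information; it performs heartbeat checks on the other nodes through zookeeper/etcd; and during a primary/standby switchover or recovery it reads the primary/standby information stored in zookeeper/etcd to determine the state of each node.
hgproxy: HighGo database middleware. The middleware sits between the server and the client, and clients can reach HighGoDB by connecting to the proxy. To a client, connecting to the proxy is the same as connecting directly to the database server; to the backend server, the proxy looks like an ordinary client. The main functions the proxy currently provides are load balancing of client connections and read/write splitting of SQL statements.
2. Common ports of each component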
Component | hgproxy | zookeeper | etcd | hghac | HGDB |
---|---|---|---|---|---|
Common (default) ports | 5888 | 2181, 2888, 3888 | 2379, 2380 | 8008 | 5866 |
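II. Implementation
1. Node allocation
IP (example) | 192.168.197.11 | 192.168.197.12 | 192.168.197.13 |
---|---|---|---|
Role | Primary database, proxy node | Standby database, proxy node | Standby database, proxy node |
Components used | etcd, hgdb, hghac, hgproxy | etcd, hgdb, hghac, hgproxy | etcd, hgdb, hghac, hgproxy |
2. Preparation
Unless otherwise noted, the items in this section must be configured or executed on all nodes.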
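2.1 Installation package check
Confirm the operating system version and CPU architecture:
(1) Example for Red Hat family / Kylin V10 / UOS20 operating systems (on domestic desktop environments this can also be checked via right-click My Computer, then Properties):
# cat /etc/*release
(2) On Kylin V10, the nkvers command shows whether the system is Kylin V10 SP1 or V10 SP2; if it prints nothing, the system is the base Kylin V10 release.
(3) Use the following command to determine which package manager the operating system supports, and therefore whether to use the rpm or the deb package:
# rpm --version
(4) For other Linux distributions, judge according to the actual situation.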
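The package-manager check in step (3) can be scripted. The following is a minimal sketch, not part of the product tooling: the function name detect_pkg_format is hypothetical, and it assumes that the presence of the rpm or dpkg command is a sufficient indicator. If both commands exist on a host, confirm the distribution manually.

```shell
#!/bin/sh
# Print "rpm", "deb", or "unknown" depending on which package manager is present.
# Hypothetical helper: rpm is checked first, mirroring the manual's own check.
detect_pkg_format() {
    if command -v rpm >/dev/null 2>&1; then
        echo rpm
    elif command -v dpkg >/dev/null 2>&1; then
        echo deb
    else
        echo unknown
    fi
}

detect_pkg_format
```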
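Note: before installing, make sure the installation media matches the actual environment; installation packages are not interchangeable between environments. Unless a specific version is mandated, deploy the latest released minor version, with the latest patch set, of the major version agreed in the contract.
If a version has already been adapted for the site, it is recommended to deploy that version.
On general-purpose servers, hgdb, hghac, and hgproxy are each delivered as separate installation packages.
Example package names:
[HighGo database package] hgdb6.0.4-enterprise-xxx-xxx64-xxxxxxxx.rpm
or hgdb6.0.4-enterprise-xxx-xxx64-xxxxxxxx.deb
[HGHAC package] hghac-2.1.4.1.ky10.aarch64-20230117.rpm
[HGPROXY package] hgproxy-4.0.13-0db7e9b.xxx64.rpm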
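2.2 Configure the host name
Reference for Red Hat family / Kylin V10 / UOS20 operating systems:
Command:
# hostnamectl set-hostname <hostname>
It is recommended to give each database server a unique host name and to add the IP address and host name of every node to /etc/hosts.
Suggested naming scheme: yyxtdb01 (application system name + db + sequence number). The name must not exceed 16 characters, must not contain special characters, and must not use underscores.
Example:
# cat /etc/hosts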
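The naming rule above can be checked before running hostnamectl. This is a minimal sketch; the function name valid_hostname is hypothetical, and treating hyphens as allowed (only letters, digits, and hyphens pass) is an assumption, since the manual only says "no special characters" and "no underscores".

```shell
#!/bin/sh
# Succeeds (exit 0) when the candidate host name follows the manual's rule:
# at most 16 characters, no underscores, no special characters.
# Assumption: letters, digits, and hyphens are the only permitted characters.
valid_hostname() {
    name=$1
    # Reject empty names and names longer than 16 characters.
    [ -n "$name" ] || return 1
    [ "${#name}" -le 16 ] || return 1
    # Reject any character outside letters, digits, and hyphen (this covers "_").
    case $name in
        *[!A-Za-z0-9-]*) return 1 ;;
    esac
    return 0
}

# Example: valid_hostname yyxtdb01 && hostnamectl set-hostname yyxtdb01
```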
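2.3 Network check between servers
Make sure the network interfaces between nodes support gigabit or faster networking.
Reference command for Red Hat family / Kylin V10 / UOS20 operating systems (replace the interface name ens33 below with the actual name):
[root@xxdb01 ~]# ethtool ens33
Test the network transfer speed between nodes, in both directions, with the command below. A result below 50 MB/s carries some risk.
# scp local_file remote_username@remote_ip:remote_folder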
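When scp does not report a usable rate, the throughput can be derived from the file size and the elapsed time. The sketch below assumes the 50 MB/s threshold refers to megabytes (1 MB taken as 1048576 bytes); the function names mb_per_sec and speed_is_risky are hypothetical.

```shell
#!/bin/sh
# Convert a transfer of $1 bytes in $2 seconds into MB/s (one decimal place).
mb_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}

# Succeeds (exit 0) when the measured MB/s value in $1 is below 50.
speed_is_risky() {
    awk -v r="$1" 'BEGIN { exit !(r < 50) }'
}

# Example: a 640 MiB file copied in 10 seconds.
rate=$(mb_per_sec 671088640 10)
if speed_is_risky "$rate"; then
    echo "transfer rate ${rate} MB/s is below 50 MB/s"
else
    echo "transfer rate ${rate} MB/s is acceptable"
fi
```

Run the measurement once in each direction, since the manual asks for a two-way test.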
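2.4 File system layout requirements for the database server
The following layout is recommended. Placing the database installation directory, data directory, archive directory, and backup directory on separate file systems reduces disk I/O contention and improves performance. It also prevents the database from hanging when large backup files fill a file system that everything else shares.
File system | Required size | Purpose |
---|---|---|
/ | >50G | |
/tmp | >10G | |
/home | >20G | |
swap | >32G | |
/opt | >100G | Database installation directory |
/hgdbdata | Sized according to application data volume and growth plan | Data directory |
/hgdbarch | Estimated from application workload; 500G or more recommended | Archive log directory |
/hgdbbackup | At least 3 times the data directory | Data backups |
If there is no dedicated large disk for the data directory, report this to the customer promptly and have them arrange for a storage engineer to mount the storage; engineers must not mount storage on their own unless absolutely necessary.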
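Disk performance check:
Examples (delete the generated test file after testing):
(1) Simulate small-block database I/O to test the disk:
# dd if=/dev/zero of=/opt/test bs=32k count=20k oflag=dsync
If the result is below 20 MB/s, disk performance is poor; advise the customer to have a hardware engineer check it, otherwise overall system performance may be significantly affected.
(2) Simulate large file writes to test the disk:
[root@db ~]# dd if=/dev/zero of=/opt/test bs=8192k count=100 oflag=dsync
If the result is below 200 MB/s, disk performance is poor; advise the customer to have a hardware engineer check it, otherwise overall system performance may be significantly affected.
Note: never swap the if and of values, otherwise data may be wiped.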
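To judge the two dd results consistently, it helps to know how much data each test writes and to compare the reported rate against the threshold. This is a small sketch with hypothetical helper names (dd_total_mib, rate_below); the rate itself must still be read from the dd output by hand.

```shell
#!/bin/sh
# Total data written by a dd run, in MiB.
# $1 = block size in KiB (bs=32k -> 32), $2 = block count (count=20k -> 20480)
dd_total_mib() {
    echo $(( $1 * $2 / 1024 ))
}

# Succeeds (exit 0) when the measured rate $1 is below the threshold $2 (both MB/s).
rate_below() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r < t) }'
}

dd_total_mib 32 20480    # small-block test writes 640 MiB
dd_total_mib 8192 100    # large-block test writes 800 MiB

# Example: a small-block result of 15.2 MB/s checked against the 20 MB/s threshold.
if rate_below 15.2 20; then
    echo "small-block write rate is below threshold"
fi
# Remember to remove the test file afterwards: rm -f /opt/test
```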
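2.5 Memory and CPU requirements for the database server
For a test environment there are no hard requirements.
For a production environment, more than 32 GB of memory and more than 8 logical CPUs are recommended; the final sizing depends on the actual workload and hardware configuration.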
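The production recommendation can be checked on a Linux host as sketched below. The function name meets_prod_minimum is hypothetical; the script assumes /proc/meminfo and nproc are available and falls back to 0 when they are not. Because MemTotal excludes memory reserved by firmware and the kernel, a machine with exactly 32 GB installed reports less than 32.

```shell
#!/bin/sh
# Succeeds (exit 0) when memory is above 32 GB and logical CPUs are above 8.
# $1 = memory in GB (integer), $2 = logical CPU count
meets_prod_minimum() {
    [ "$1" -gt 32 ] && [ "$2" -gt 8 ]
}

# Read the values from the running system (Linux only).
mem_gb=$(awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo 2>/dev/null)
cpus=$(nproc 2>/dev/null)

if meets_prod_minimum "${mem_gb:-0}" "${cpus:-0}"; then
    echo "host meets the recommended production minimum"
else
    echo "host is below the recommended production minimum"
fi
```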
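2.6 Operating system rpm dependency installation
The operating system dependency packages currently required by HGDB are already bundled in the database and cluster component installation packages; unless there is a special requirement they need not be installed separately.
2.7 Disable the avahi-daemon and NetworkManager services
The operating system's avahi-daemon may interfere with cluster communication; disabling it is recommended.
Example for Red Hat family / Kylin V10 / UOS20 operating systems:
# systemctl stop avahi-daemon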
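2.8 Adjust limits.conf
Perform this step as the root user on every server.
Example for Red Hat family / Kylin V10 / UOS20 operating systems:
Edit the /etc/security/limits.conf configuration file and add the following content:
vi /etc/security/limits.conf
Edit /etc/pam.d/login and add one line: session required pam_limits.so. Save, exit, and restart the service. (Some operating systems include this line by default, in which case it need not be added.)
Verify after the change:
# ulimit -a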
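2.9 Firewall configuration
On Red Hat family / Kylin V10 operating systems:
* Disable the firewall [requires customer approval]
Disable SELinux:
# sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
If special requirements prevent fully disabling the firewall, open the ports required by the HAC cluster (on every node), as follows:
Open the etcd ports:
firewall-cmd --permanent --add-port=2379/tcp
Open the database port:
firewall-cmd --permanent --add-port=5866/tcp
Open the hghac port:
firewall-cmd --permanent --add-port=8008/tcp
Open the hgproxy port:
firewall-cmd --permanent --add-port=5888/tcp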
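The per-port commands above can be generated in one pass. The sketch below only prints the commands (a dry run) so they can be reviewed before execution; the function name print_firewall_cmds is hypothetical. The port list is taken from the port table in this manual, and including 2380 is based on the etcd peer URLs used later in etcd.yml.

```shell
#!/bin/sh
# Print the firewall-cmd invocations for the cluster ports used in this manual:
# etcd 2379 and 2380, HGDB 5866, hghac 8008, hgproxy 5888.
print_firewall_cmds() {
    for port in 2379 2380 5866 8008 5888; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    # Permanent rules only become active after a reload.
    echo "firewall-cmd --reload"
}

print_firewall_cmds
# To apply on a node, as root: print_firewall_cmds | sh
```

Adjust the port list if the site uses non-default ports or zookeeper instead of etcd.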
On UOS20, use the following commands to stop the firewall and prevent it from starting at boot:
root@highgo:~# systemctl stop ufw.service
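2.10 Configure environment variables
Besides the database environment variables, database nodes generally also need the hac and etcd environment variables. Example (adjust the paths in the parameters to the actual environment):
[root@xxdb01 ~]# vi /opt/HighGo6.0.4-cluster/etc/highgodb.env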
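2.11 Time synchronization requirements between cluster servers
Set the OS time zone according to the customer's standard; in China this is usually UTC+8, "Asia/Shanghai".
Reference for Red Hat family / Kylin V10 / UOS20 operating systems:
Use the timedatectl command to view and change the OS time zone.
View the OS time zone:
# timedatectl status
Change the OS time zone:
# timedatectl set-timezone "Asia/Shanghai"
Change the OS time:
[root@xxdb01 ~]# date -s '20200928 11:07:00'
Risk: if the clocks of the cluster nodes differ significantly, the cluster state may become abnormal or the cluster may fail to start. It is recommended to configure clock synchronization between all cluster nodes against a dedicated time server; if no dedicated time server exists, configure NTP synchronization among the cluster nodes and make sure the time synchronization service starts at boot.
This item must be completed by the operating system vendor's engineer.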
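A quick way to spot clock drift before starting the cluster is to compare epoch seconds between nodes. This is a minimal sketch: the function name clock_skew_ok and the 1-second tolerance in the usage line are assumptions, not values from this manual, and the ssh call assumes passwordless access between nodes.

```shell
#!/bin/sh
# Succeeds (exit 0) when two clocks differ by no more than the allowed number of seconds.
# $1 = local epoch seconds, $2 = remote epoch seconds, $3 = maximum allowed difference
clock_skew_ok() {
    diff=$(( $1 - $2 ))
    # Take the absolute value of the difference.
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -le "$3" ]
}

# Example usage against another node (tolerance of 1 second is an assumed value):
#   clock_skew_ok "$(date +%s)" "$(ssh 192.168.197.12 date +%s)" 1 || echo "clock skew too large"
```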
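3. Installation
The installation consists of the following steps:
- Install the software packages: install the appropriate packages on each node
- Configure etcd or zookeeper and form the etcd or zookeeper cluster
- Initialize the database on the primary
- Configure and start hghac on the primary
- Once hghac on the primary is in a normal state, configure and start hghac on the standbys
- Once the hghac cluster is in a normal state, configure and start hgproxy
- After verifying hgproxy, enable autostart for the relevant components on each node as required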
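3.1 Install the software packages
(Perform this step on every node.)
Verify the MD5 value of the installation package:
[root@xxdb01 ~]# md5sum hgdb6.0.4-enterprise-xxx-xxx64-xxxxxxxx.rpm
Compare the generated MD5 value with the one in the MD5 file. If they match, continue with the steps below; if not, upload the package again and re-check.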
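The comparison can be automated so that a mismatch stops the installation. A minimal sketch, assuming the expected value has been read from the MD5 file delivered with the package; the function name verify_md5 is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) when the MD5 checksum of a file equals the expected value.
# $1 = path to the file, $2 = expected MD5 checksum (hexadecimal)
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example usage (the expected value comes from the accompanying MD5 file):
#   verify_md5 hgdb6.0.4-enterprise-xxx-xxx64-xxxxxxxx.rpm "<expected md5>" \
#       || { echo "MD5 mismatch, upload the package again"; exit 1; }
```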
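RPM package installation:
[root@xxdb01 highgo]# rpm -ivh hgdb6.0.4-enterprise-xxx-xxx64-xxxxxxxx.rpm
Option:
--nodeps install without checking package dependencies
DEB package installation:
# dpkg -i XXXX.deb
Check the state after installation:
# dpkg -l | grep hgdb
Note: if the wrong package was installed, or the package must be removed and reinstalled for another reason, use the following commands. Both take the installed package name, not the package file name.
Remove an RPM package (the installed name can be found with rpm -qa | grep hgdb):
# rpm -e XXX
Note: confirm that the database processes are stopped before removing the package.
Remove a DEB package:
# dpkg -r xxx
Note: confirm that the database processes are stopped before removing the package.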
3.2 Configure and start etcd
Tip: start all etcd nodes together, and configure hghac only after verifying that their state is normal.
3.2.1 Edit the etcd.yml file
Edit /usr/local/hghac/etcd/etcd.yml as follows:
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd_01' ##etcd实例名称,最好用主机名或IP进行区分
# Path to the data directory.
data-dir: /usr/local/hghac/etcd/etcd01 ##etcd数据保存目录;data目录绝对地址
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100 ##检测间隔的时间(毫秒)
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000 ##超时时间(毫秒)
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://192.168.197.11:2380 ##本member侧使用,用于监听其他member发送信息的地址
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://192.168.197.11:2379,http://127.0.0.1:2379 ##本member侧使用,用于监听etcd客户端发送信息的地址,填写127.0.0.1方便使用etcdctl工具
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://192.168.197.11:2380 ##其他member使用,其他member通过该地址与本member交互信息。该参数的value一定要同时在–initial-cluster参数中存在
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://192.168.197.11:2379 ##etcd客户端使用,客户端通过该地址与本member交互信息
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: etcd_01=http://192.168.197.11:2380,etcd_02=http://192.168.197.12:2380,etcd_03=http://192.168.197.13:2380
## 初始集群成员列表
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new' ##指示本次是否为新建集群
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true ##接受etcd V2客户端请求
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# DEPRECATED: Path to the client server TLS CA file.
ca-file:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# DEPRECATED: Path to the peer server TLS CA file.
ca-file:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
peer-client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
# Specify a particular log level for each etcd package (eg: 'etcdmain=CRITICAL,etcdserver=DEBUG'.
log-package-levels:
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-output: default
# Force to create a new one member cluster.
force-new-cluster: false
The configuration for each node is as follows:
192.168.197.11:
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd_01' ##etcd实例名称,最好用主机名或IP进行区分
# Path to the data directory.
data-dir: /usr/local/hghac/etcd/etcd01 ##etcd数据保存目录;data目录绝对地址
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100 ##检测间隔的时间(毫秒)
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000 ##超时时间(毫秒)
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://192.168.197.11:2380 ##本member侧使用,用于监听其他member发送信息的地址
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://192.168.197.11:2379,http://127.0.0.1:2379 ##本member侧使用,用于监听etcd客户端发送信息的地址,填写127.0.0.1方便使用etcdctl工具
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://192.168.197.11:2380 ##其他member使用,其他member通过该地址与本member交互信息。该参数的value一定要同时在–initial-cluster参数中存在
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://192.168.197.11:2379 ##etcd客户端使用,客户端通过该地址与本member交互信息
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: etcd_01=http://192.168.197.11:2380,etcd_02=http://192.168.197.12:2380,etcd_03=http://192.168.197.13:2380
## 初始集群成员列表
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new' ##指示本次是否为新建集群
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true ##接受etcd V2客户端请求
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# DEPRECATED: Path to the client server TLS CA file.
ca-file:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# DEPRECATED: Path to the peer server TLS CA file.
ca-file:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
peer-client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
# Specify a particular log level for each etcd package (eg: 'etcdmain=CRITICAL,etcdserver=DEBUG'.
log-package-levels:
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-output: default
# Force to create a new one member cluster.
force-new-cluster: false
192.168.197.12:
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd_02' ##etcd实例名称,最好用主机名或IP进行区分
# Path to the data directory.
data-dir: /usr/local/hghac/etcd/etcd02 ##etcd数据保存目录;data目录绝对地址
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100 ##检测间隔的时间(毫秒)
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000 ##超时时间(毫秒)
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://192.168.197.12:2380 ##本member侧使用,用于监听其他member发送信息的地址
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://192.168.197.12:2379,http://127.0.0.1:2379 ##本member侧使用,用于监听etcd客户端发送信息的地址,填写127.0.0.1方便使用etcdctl工具
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://192.168.197.12:2380 ##其他member使用,其他member通过该地址与本member交互信息。该参数的value一定要同时在–initial-cluster参数中存在
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://192.168.197.12:2379 ##etcd客户端使用,客户端通过该地址与本member交互信息
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: etcd_01=http://192.168.197.11:2380,etcd_02=http://192.168.197.12:2380,etcd_03=http://192.168.197.13:2380
## 初始集群成员列表
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new' ##指示本次是否为新建集群
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true ##接受etcd V2客户端请求
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# DEPRECATED: Path to the client server TLS CA file.
ca-file:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# DEPRECATED: Path to the peer server TLS CA file.
ca-file:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
peer-client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
# Specify a particular log level for each etcd package (eg: 'etcdmain=CRITICAL,etcdserver=DEBUG'.
log-package-levels:
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-output: default
# Force to create a new one member cluster.
force-new-cluster: false
192.168.197.13:
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd_03' ##etcd实例名称,最好用主机名或IP进行区分
# Path to the data directory.
data-dir: /usr/local/hghac/etcd/etcd03 ##etcd数据保存目录;data目录绝对地址
# Path to the dedicated wal directory.
wal-dir:
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100 ##检测间隔的时间(毫秒)
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000 ##超时时间(毫秒)
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://192.168.197.13:2380 ##本member侧使用,用于监听其他member发送信息的地址
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://192.168.197.13:2379,http://127.0.0.1:2379 ##本member侧使用,用于监听etcd客户端发送信息的地址,填写127.0.0.1方便使用etcdctl工具
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://192.168.197.13:2380 ##其他member使用,其他member通过该地址与本member交互信息。该参数的value一定要同时在–initial-cluster参数中存在
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://192.168.197.13:2379 ##etcd客户端使用,客户端通过该地址与本member交互信息
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: etcd_01=http://192.168.197.11:2380,etcd_02=http://192.168.197.12:2380,etcd_03=http://192.168.197.13:2380
## 初始集群成员列表
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new' ##指示本次是否为新建集群
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true ##接受etcd V2客户端请求
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# DEPRECATED: Path to the client server TLS CA file.
ca-file:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# DEPRECATED: Path to the peer server TLS CA file.
ca-file:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
peer-client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
# Specify a particular log level for each etcd package (eg: 'etcdmain=CRITICAL,etcdserver=DEBUG'.
log-package-levels:
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-output: default
# Force to create a new one member cluster.
force-new-cluster: false
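The initial-cluster value must be identical on all three nodes, and each member name in it must match that node's own name setting. The sketch below builds the string from the node IPs so it can be pasted into every etcd.yml; the function name build_initial_cluster is hypothetical, and the etcd_NN naming and port 2380 follow the configuration shown above.

```shell
#!/bin/sh
# Build the initial-cluster value from a list of node IPs.
# Members are named etcd_01, etcd_02, ... in argument order, on peer port 2380.
build_initial_cluster() {
    i=1
    out=""
    for ip in "$@"; do
        member=$(printf 'etcd_%02d=http://%s:2380' "$i" "$ip")
        # Append with a comma separator, except before the first member.
        out="${out:+$out,}$member"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

build_initial_cluster 192.168.197.11 192.168.197.12 192.168.197.13
```

Each node's initial-advertise-peer-urls value must appear in this list, as noted in the configuration comments.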
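3.2.2 Configure the etcd service
After installation the etcd service is not configured by default; the service file is under /usr/local/hghac/etcd/ and must be copied manually.
cp /usr/local/hghac/etcd/etcd.service /usr/lib/systemd/system/
systemctl daemon-reload
3.2.3 Start etcd
After every node is configured, start etcd (do not leave long gaps between starting the individual nodes).
[root@xxdb01 ~]# systemctl start etcd.service
3.2.4 Check the etcd status
Check the service status:
[root@xxdb01 ~]# systemctl status etcd
Check the cluster status:
[root@xxdb01 ~]# etcdctl cluster-health
In the normal case the service is running on every node, the etcd cluster has one leader, and the cluster as a whole is healthy. At that point etcd has started successfully.
If a node's service is not running properly, investigate with systemctl status etcd -l or journalctl -xe.
If the node services are normal but one node is missing from the cluster, check its etcd.yml configuration file.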
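The health check can be scripted for use in deployment checklists. This sketch parses the text printed by the v2 command etcdctl cluster-health; the assumed output format is one "member ... is healthy: ..." line per healthy member followed by a final "cluster is healthy" line, and the function names are hypothetical. Verify the format against the etcdctl version shipped with hghac before relying on it.

```shell
#!/bin/sh
# Both functions read `etcdctl cluster-health` output on standard input.

# Succeeds (exit 0) when the output contains the line "cluster is healthy".
cluster_is_healthy() {
    grep -qx 'cluster is healthy'
}

# Print the number of members reported as healthy.
healthy_member_count() {
    grep -c ' is healthy: '
}

# Example usage on a node:
#   etcdctl cluster-health | cluster_is_healthy || echo "etcd cluster is not healthy"
```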
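3.2.5 Fixing the etcd API unauthorized-access vulnerability
The etcd API comes in two versions, v2 and v3. Each is described below; follow the steps for the API version actually in use.
Enable basic authentication for the V2 API
① Create the root user
[root@xxdb01 ~]# etcdctl user add root
② Enable authentication
[root@xxdb01 ~]# etcdctl -u root:highgo auth enable
③ Query the role permissions
[root@xxdb01 ~]# etcdctl -u root:highgo role get root
④ Verify after enabling authentication
[root@xxdb01 ~]# etcdctl role get root
Without the password, the information can no longer be retrieved. Basic authentication for the etcd v2 API is now enabled.
Enable basic authentication for the V3 API (the endpoint address in the commands below is an example; replace it with the actual node address)
① Create the root user
[root@xxdb01 ~]# ETCDCTL_API=3 /usr/local/hghac/etcd/etcdctl --endpoints=http://12.0.0.55:2379 user add root
② Enable authentication
[root@xxdb01 ~]# ETCDCTL_API=3 /usr/local/hghac/etcd/etcdctl --endpoints=http://12.0.0.55:2379 --user root:highgo auth enable
③ Query the role permissions
[root@xxdb01 ~]# ETCDCTL_API=3 /usr/local/hghac/etcd/etcdctl --endpoints=http://12.0.0.55:2379 --user root:highgo role get root
④ Verify after enabling authentication
[root@xxdb01 ~]# ETCDCTL_API=3 /usr/local/hghac/etcd/etcdctl --endpoints=http://12.0.0.55:2379 --user root role get root
Basic authentication for the etcd v3 API is now enabled.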
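3.3 Configure the primary database
3.3.1 Initialize the database (run this step on the primary node only)
[highgo@xxdb01 etcd]# initdb
3.3.2 Database parameter configuration
Parameter file precedence is postgresql.auto.conf > hghac.yml > postgresql.conf. When the same parameter is set differently in several files, the highest-precedence setting takes effect. Adjust the parameters below to the actual environment.
listen_addresses = '*'
3.4 Configure the HGHAC component
3.4.1 Configure the hghac file (required on all nodes)
The configuration file template and notes are as follows; modify it according to the actual environment.
[root@xxdb01 etcd]# cat /usr/local/hghac/hac/hghac.yml
Use the hghactl command to set the following two parameters (adding them to hghac.yml by hand may not take effect):
# /usr/local/hghac/hac/hghactl -c /usr/local/hghac/hac/hghac.yml edit-config -s 'synchronous_mode=true'
Parameters set this way are not written to hghac.yml; use the hghactl show-config command to view them. They take effect automatically by default, without a restart.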
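3.4.2 Configure the hac service
After installation the hghac service is not configured by default; the service file is under /usr/local/hghac/hac/ and must be copied manually.
cp /usr/local/hghac/hac/hghac.service /usr/lib/systemd/system/
3.4.3 Start the service (required on all nodes)
Edit the hghac service file:
# vi /usr/lib/systemd/system/hghac.service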
[Unit]
Description=hghac
After=syslog.target network.target
[Service]
Type=simple
# Change the user and group to highgo
User=highgo
Group=highgo
EnvironmentFile=/usr/local/hghac/hac/hghac.env
# Start the hghac process
ExecStart=/usr/local/hghac/hac/hghac /usr/local/hghac/hac/hghac.yml
# Send HUP to reload from hghac.yml
ExecReload=/bin/kill -s HUP $MAINPID
#Stop the hghac process,the main process is killed by default
Restart=no
[Install]
WantedBy=multi-user.target
Reload systemd and start the service:
# systemctl daemon-reload
# systemctl start hghac.service
3.4.4 Check the service status
[root@xxdb01 highgo]# systemctl status hghac
● hghac.service - hghac
Loaded: loaded (/usr/lib/systemd/system/hghac.service; disabled; vendor preset: disabled)
Active: active (running) since 一 2021-05-31 12:34:47 CST; 48s ago
Main PID: 19037 (hghac)
CGroup: /system.slice/hghac.service
├─19037 /usr/local/hghac/hac/hghac /usr/local/hghac/hac/hghac.yml
├─19038 /usr/local/hghac/hac/hghac /usr/local/hghac/hac/hghac.yml
├─19052 /opt/HighGo6.0.4-cluster/bin/postgres -D /opt/HighGo6.0.4-cluster/data --config-file= /opt/HighGo6.0.4-cluster/data/postgresql.conf --listen_addresses=0.0.0...
├─19056 postgres: ha: logger
├─19059 postgres: ha: checkpointer
├─19060 postgres: ha: background writer
├─19061 postgres: ha: stats collector
├─19065 postgres: ha: highgo highgo 127.0.0.1(57790) idle
├─19071 postgres: ha: walwriter
├─19072 postgres: ha: autovacuum launcher
└─19074 postgres: ha: logical replication launcher
3.4.5 Check the cluster status and logs
[root@xxdb01 highgo]# hghactl list
+ Cluster: ha (6968272112388067034) ----+---------+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+--------+---------------------+--------+---------+----+-----------+-----------------+
| xxdb01 | 192.168.197.11:5866 | Leader | running | 2 | | |
+--------+---------------------+--------+---------+----+-----------+-----------------+
If the corresponding environment variables are configured, hghactl list can be run directly, as above.
[root@xxdb01 HighGo4.5.6-see]# tail -f /highgo/patroni.log
2021-05-31 12:44:05,210 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:05,227 INFO: no action. i am the leader with the lock
2021-05-31 12:44:15,209 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:15,222 INFO: no action. i am the leader with the lock
2021-05-31 12:44:25,210 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:25,236 INFO: no action. i am the leader with the lock
2021-05-31 12:44:35,210 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:35,222 INFO: no action. i am the leader with the lock
2021-05-31 12:44:45,210 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:45,223 INFO: no action. i am the leader with the lock
2021-05-31 12:44:55,210 INFO: Lock owner: xxdb01; I am xxdb01
2021-05-31 12:44:55,222 INFO: no action. i am the leader with the lock
The hghac primary node is now in a normal state, and the standby nodes can be set up.
3.4.6 Start hghac on the standbys
① Configure hghac.yml
192.168.197.12:
[root@xxdb02 ~]# cat /usr/local/hghac/hac/hghac.yml
restapi:
  connect_address: 192.168.197.12:8008
  listen: 0.0.0.0:8008
etcd:
  hosts: 192.168.197.11:2379,192.168.197.12:2379,192.168.197.13:2379
proxy:
  weight: 1
  streaming_replication_delay_time: 5000
name: xxdb02
namespace: hgha
scope: ha
bootstrap:
  # initdb:
  #   - encoding: UTF8
  #   - locale: en_US.UTF-8
  #   - data-checksums
  #   - auth: md5
  # If an account needs to be created
  # users:
  #   admin:
  #     password: Hello@123
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 5242880
    retry_timeout: 10
    ttl: 30
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        listen_addresses: '*'
        max_connections: 2000
        max_replication_slots: 5
        max_wal_senders: 5
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_keep_segments: 8
        wal_level: replica
        wal_log_hints: 'on'
postgresql:
  database: highgo
  bin_dir: /opt/HighGo6.0.4-cluster/bin
  data_dir: /data/highgo/data
  pgpass: /data/highgo/.pgpass
  connect_address: 192.168.197.12:5866
  listen: 0.0.0.0:5866
  authentication:
    replication:
      password: Hello@123
      username: highgo
    rewind:
      password: Hello@123
      username: highgo
    superuser:
      password: Hello@123
      username: highgo
  parameters:
    archive_command: ' cp %p /data/hgdbbak/archive/%f'
    log_destination: csvlog
    log_directory: ../hgdb_log # change to the hgdb-specific value
    log_file_mode: '0644'
    log_filename: highgodb-%d.log # change to the hgdb-specific value
    log_rotation_age: 1d
    log_truncate_on_rotation: 'on'
    logging_collector: 'on'
    shared_buffers: 512MB
    ssl: 'off'
  pg_hba:
    - local all all md5
    - host all all 0.0.0.0/0 md5
    - host all all ::1/128 md5
    - local replication all md5
    - host replication all 0.0.0.0/0 md5
    - host replication all ::1/128 md5
  use_unix_socket: false
log:
  level: INFO
  dir: /data/highgo/hghalog # create the log directory in advance
192.168.197.13:
[root@xxdb03 highgo]# cat /usr/local/hghac/hac/hghac.yml
restapi:
  connect_address: 192.168.197.13:8008
  listen: 0.0.0.0:8008
etcd:
  hosts: 192.168.197.11:2379,192.168.197.12:2379,192.168.197.13:2379
proxy:
  weight: 1
  streaming_replication_delay_time: 5000
name: xxdb03
namespace: hgha
scope: ha
bootstrap:
  # initdb:
  #   - encoding: UTF8
  #   - locale: en_US.UTF-8
  #   - data-checksums
  #   - auth: md5
  # If an account needs to be created
  # users:
  #   admin:
  #     password: Hello@123
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 5242880
    retry_timeout: 10
    ttl: 30
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        listen_addresses: '*'
        max_connections: 2000
        max_replication_slots: 5
        max_wal_senders: 5
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_keep_segments: 8
        wal_level: replica
        wal_log_hints: 'on'
postgresql:
  database: highgo
  bin_dir: /opt/HighGo6.0.4-cluster/bin
  data_dir: /data/highgo/data
  pgpass: /data/highgo/.pgpass
  connect_address: 192.168.197.13:5866
  listen: 0.0.0.0:5866
  authentication:
    replication:
      password: Hello@123
      username: highgo
    rewind:
      password: Hello@123
      username: highgo
    superuser:
      password: Hello@123
      username: highgo
  parameters:
    archive_command: ' cp %p /highgo/hgdbbak/archive/%f'
    log_destination: csvlog
    log_directory: ../hgdb_log # change to the hgdb-specific value
    log_file_mode: '0644'
    log_filename: highgodb-%d.log # change to the hgdb-specific value
    log_rotation_age: 1d
    log_truncate_on_rotation: 'on'
    logging_collector: 'on'
    shared_buffers: 512MB
    ssl: 'off'
  pg_hba:
    - local all all md5
    - host all all 0.0.0.0/0 md5
    - host all all ::1/128 md5
    - local replication all md5
    - host replication all 0.0.0.0/0 md5
    - host replication all ::1/128 md5
  use_unix_socket: false
log:
  level: INFO
  dir: /data/highgo/hghalog # create the log directory in advance
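② Start the hghac service
The standbys can be started in any order.
[root@xxdb03 highgo]# systemctl start hghac
③ Check the status
[root@xxdb01 highgo]# /usr/local/hghac/hac/hghactl -c /usr/local/hghac/hac/hghac.yml list
The state is normal when the primary and the standbys are on the same timeline and there is exactly one leader. If the cluster state is abnormal, investigate the cause using the hghac service status, the patroni log, and the database log.
④ Verify cluster switchover; make sure switchover works for every node
[root@xxdb01 highgo]# /usr/local/hghac/hac/hghactl -c /usr/local/hghac/hac/hghac.yml switchover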
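The two criteria from step ③ (exactly one leader, all members on the same timeline) can be checked mechanically from the hghactl list table. This is a sketch under the assumption that the table keeps the column order shown in section 3.4.5 (Member, Host, Role, State, TL); the function name summarize_cluster is hypothetical.

```shell
#!/bin/sh
# Read `hghactl list` output on standard input and print the number of leaders
# and the number of distinct timelines, e.g. "leaders=1 timelines=1".
summarize_cluster() {
    awk -F'|' '
        NF >= 7 {
            role = $4; tl = $6
            gsub(/ /, "", role); gsub(/ /, "", tl)
            if (role == "Role") next            # skip the header row
            if (role == "Leader") leaders++
            if (!(tl in seen)) { seen[tl] = 1; timelines++ }
        }
        END { printf "leaders=%d timelines=%d\n", leaders, timelines }
    '
}

# Example usage:
#   /usr/local/hghac/hac/hghactl -c /usr/local/hghac/hac/hghac.yml list | summarize_cluster
```

Any result other than leaders=1 timelines=1 calls for the log checks described in step ③.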
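3.5 Configure the HGPROXY component (required on all nodes)
Tip:
proxy currently has no high-availability capability of its own, so the current approach is to deploy hgproxy on every node. Applications must list every node in the JDBC connection string, with the nodes separated by commas; if a VIP is configured, the application can use the VIP plus the proxy port directly.
Because configuration and startup are identical on every proxy node, the nodes are not described separately; each node can be started, verified, and troubleshot with the steps below.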
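The comma-separated host list for the application's connection string can be assembled as sketched below. The jdbc:highgo:// prefix and the overall URL layout are assumptions for illustration only; confirm the exact URL format against the HighGo JDBC driver documentation. The function name build_jdbc_url is hypothetical.

```shell
#!/bin/sh
# Build a multi-host connection string that lists every proxy node.
# $1 = database name, $2 = proxy port, remaining arguments = node IPs
# Assumption: the URL scheme "jdbc:highgo://" is illustrative, not confirmed by this manual.
build_jdbc_url() {
    db=$1
    port=$2
    shift 2
    hosts=""
    for ip in "$@"; do
        hosts="${hosts:+$hosts,}${ip}:${port}"
    done
    printf 'jdbc:highgo://%s/%s\n' "$hosts" "$db"
}

build_jdbc_url highgo 5888 192.168.197.11 192.168.197.12 192.168.197.13
```

With a VIP configured, the same function can be called with the single VIP address instead of the node list.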
3.5.1 Configure the proxy file
The configuration template and explanations are as follows; fill it in according to the actual environment.
[root@xxdb01 etc]# cat /opt/HighGo/tools/hgproxy/etc/proxy.conf
3.5.2 Initialize proxy
[highgo@xxdb01 etc]# /opt/HighGo/tools/hgproxy/bin/proxy_ctl init -h <local IP> -p 5866 -U highgo -d highgo
Enter the highgo user's password; initialization is complete when the success message appears. Using 192.168.197.11 as an example:
[highgo@xxdb01 etc]# /opt/HighGo/tools/hgproxy/bin/proxy_ctl init -h 192.168.197.11 -U highgo -d highgo -p 5866
3.5.3 Start the service
[root@xxdb01 etc]# systemctl start hgproxy-4.0.13.service
3.5.4 Verify the status
[root@xxdb01 etc]# systemctl status hgproxy-4.0.13
3.5.5 Try connecting to the database
[root@xxdb01 etc]# psql -U highgo -d highgo -h 192.168.197.11 -p 5888
A successful psql connection means the proxy is working; repeat the steps above to enable it on the remaining nodes.
3.6 Configure vip-manager
3.6.1 Configure the vip-manager files
[root@xxdb01 ~]# cat vip-manager.yml
interval: 1000
trigger-key: "/hgha/ha/leader" ## format: /namespace/scope/leader, using the values from hghac.yml
trigger-value: "hac1" ## the name value from hghac.yml
ip: 192.168.197.15 ## VIP
netmask: 24 ## netmask
interface: enp0s3 ## network interface name
hosting-type: basic
dcs-type: etcd ## DCS type: etcd or zookeeper
dcs-endpoints:
  http://127.0.0.1:2379,http://192.168.197.11:2379,http://192.168.197.12:2379,http://192.168.197.13:2379 ## DCS node addresses
#etcd-user: "hghac"
#etcd-password: "Julian's secret password"
#etcd-ca-file: "/path/to/etcd/trusted/ca/file"
#etcd-cert-file: "/path/to/etcd/client/cert/file"
#etcd-key-file: "/path/to/etcd/client/key/file"
#consul-token: "Julian's secret token"
retry-num: 2
retry-after: 250
verbose: false
Attachment: vip-manager.yml file
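Because trigger-key is derived from the namespace and scope values in hghac.yml, it can be generated rather than typed by hand. This is a minimal sketch with hypothetical helper names; yaml_value is a naive lookup that prints the first value of the first line whose key matches, which is sufficient here only because namespace and scope appear once in hghac.yml.

```shell
#!/bin/sh
# Print the value of a simple "key: value" line from a YAML file (naive lookup).
# $1 = key name, $2 = file path
yaml_value() {
    awk -v k="$1:" '$1 == k { print $2; exit }' "$2"
}

# Build the vip-manager trigger key from the namespace and scope.
# $1 = namespace, $2 = scope
vip_trigger_key() {
    printf '/%s/%s/leader\n' "$1" "$2"
}

vip_trigger_key hgha ha

# Example usage against the real file:
#   f=/usr/local/hghac/hac/hghac.yml
#   vip_trigger_key "$(yaml_value namespace "$f")" "$(yaml_value scope "$f")"
```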
3.6.2 Configure the service
After installation the vip-manager service is not configured by default; the service file is under /usr/local/hghac/vip-manager/ and must be copied manually.
cp /usr/local/hghac/vip-manager/vip-manager.service /usr/lib/systemd/system/
systemctl daemon-reload
3.6.3 Start vip-manager
systemctl start vip-manager
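4. Service autostart management
4.1 Disable autostart of the original database service
(Use the actual service name; skip this step if no autostart service exists.)
[root@xxdb01 system]# systemctl disable hgdb-cluster-6.0.4.service
4.2 Enable autostart of the etcd service
[root@xxdb01 system]# systemctl enable etcd.service
4.3 Enable autostart of the hghac service
[root@xxdb01 system]# systemctl enable hghac.service
4.4 Enable autostart of the hgproxy service
[root@xxdb01 system]# systemctl enable hgproxy-4.0.13.service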
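5. Install postgis
This component is mostly needed by business systems that work with geographic information; if there is no such requirement it need not be installed.
5.1 Install the postgis package
[root@xxdb01 ~]# rpm -ivh postgis3.0-6.0.4-enterprise-centos7-x86-64-20220127.rpm
5.2 Create the postgis extension
[highgo@node1 ~]$ psql -U highgo -d highgo
Note: in a cluster, postgis only needs to be created on the primary; it is replicated to the standbys automatically.
6. Test with development and management tools
If development or management tools are used on site, open one and create a new connection to confirm that the tool works properly.
Example:
6.1 Create a new connection
6.2 Run a test SQL statement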
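7. License activation (required on all nodes)
Note:
- Before activating the license, confirm whether this deployment requires a trial license or a permanent license
- Use the license obtained with the work order; mixing licenses is prohibited
7.1 The remaining validity is shown when the database starts
日志: 00000: This is a trial edition, validate until 2021-09-28 16:09:09, database will not be able to start up after that time,please apply an official license by that time.
7.2 Upload the license, copy it to the data directory, and rename it to hgdb.lic
[highgo@xxdb01 uplaod]# cp hgdb_12m.lic /opt/HighGo6.0.4-ee/data/hgdb.lic
7.3 Activate
[highgo@xxdb01 data]$ pg_ctl reload
7.4 Check the license and confirm that activation succeeded
[highgo@xxdb01 data]$ check_lic -D $PGDATA
8. Automatic database backup deployment (required on all nodes)
For the backup method, refer to the latest version of the "HighGo Database Backup Manual".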