KingbaseES V8R6 Cluster Operations Series -- Deploying a repmgr-Managed Cluster from the Command Line + switchover Testing

This deployment does not use the securecmd/kbha tools, so passwordless trust from the ordinary user to root is not required.

I. Environment Preparation

1. Create the OS user

Create the OS group and user for the database installation; do this on every node.
Log in to the server as root, create the group and user, and set the password.

[root@ora19c ~]# groupadd -g 6000 kes86
[root@ora19c ~]# useradd -G kes86 -g 6000 -u 6000 kes86 -m
[root@ora19c ~]# passwd kes86
Changing password for user kes86.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@ora19c ~]# 

[root@postgres ~]# groupadd -g 6000 kes86
[root@postgres ~]# useradd -G kes86 -g 6000 -u 6000 kes86 -m
[root@postgres ~]# passwd kes86
Changing password for user kes86.
New password: 
BAD PASSWORD: The password is shorter than 7 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@postgres ~]# 
Note: the group and user names can be customized; the kes86 user is used as the example here.

2. Configure SSH mutual trust

Configure the /etc/hosts file

vim /etc/hosts
192.168.57.30  node1      node1      # primary
192.168.57.10  node2      node2      # standby
Note: configuring /etc/hosts is optional; it is done here mainly to tell replication slots apart — in a one-primary-multiple-standby setup it makes it easy to see which host each replication slot belongs to.

Configure SSH mutual trust for the database OS user.
It is best to configure it for both root and the database user (if only the database data directory needs to be accessed, root trust is unnecessary; configuring the database user alone is enough).

-- switch to the kes86 user
[root@ora19c ~]# su - kes86 
-- run ssh-keygen -t rsa
[kes86@ora19c ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/kes86/.ssh/id_rsa): 
Created directory '/home/kes86/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/kes86/.ssh/id_rsa.
Your public key has been saved in /home/kes86/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:c6CPByaWLrQebsIbKBYBmyr+CjEzUmZoAas0M9gSBRg kes86@ora19c
The key's randomart image is:
+---[RSA 3072]----+
|E=.              |
|+B.              |
|BB=     .        |
|===  . . .       |
|X.. + + S .      |
|=*.+ o + o       |
|=++ . . o        |
|=++o   .         |
| ==.             |
+----[SHA256]-----+
ssh-copy-id kes86@192.168.57.30

[kes86@ora19c ~]$ ssh-copy-id kes86@192.168.57.30
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kes86/.ssh/id_rsa.pub"
The authenticity of host '192.168.57.30 (192.168.57.30)' can't be established.
ECDSA key fingerprint is SHA256:P2+7+Mqp10TIbLRkwBQT6lTVBxbLtWrz+zEMRNwYA2Y.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
kes86@192.168.57.30's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'kes86@192.168.57.30'"
and check to make sure that only the key(s) you wanted were added.

ssh-copy-id kes86@192.168.57.10

[kes86@ora19c ~]$ ssh-copy-id kes86@192.168.57.10
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kes86/.ssh/id_rsa.pub"
The authenticity of host '192.168.57.10 (192.168.57.10)' can't be established.
ECDSA key fingerprint is SHA256:jbiSTLyWOSvk9P6Rrum0V/H5PE52Fz48bO69ttVGvjU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
kes86@192.168.57.10's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'kes86@192.168.57.10'"
and check to make sure that only the key(s) you wanted were added.

ssh-copy-id kes86@node1

[kes86@ora19c ~]$ ssh-copy-id kes86@node1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kes86/.ssh/id_rsa.pub"
The authenticity of host 'node1 (192.168.57.30)' can't be established.
ECDSA key fingerprint is SHA256:P2+7+Mqp10TIbLRkwBQT6lTVBxbLtWrz+zEMRNwYA2Y.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
		(if you think this is a mistake, you may want to use -f option)

ssh-copy-id kes86@node2

[kes86@ora19c ~]$ ssh-copy-id kes86@node2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/kes86/.ssh/id_rsa.pub"
The authenticity of host 'node2 (192.168.57.10)' can't be established.
ECDSA key fingerprint is SHA256:jbiSTLyWOSvk9P6Rrum0V/H5PE52Fz48bO69ttVGvjU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
		(if you think this is a mistake, you may want to use -f option)

-- scp the .ssh directory to the node2 node

[kes86@ora19c ~]$ scp -r .ssh/ kes86@192.168.57.10:~
id_rsa                                                                                    100% 2602     1.1MB/s   00:00    
id_rsa.pub                                                                                100%  566     5.9KB/s   00:00    
known_hosts                                                                               100%  684   182.8KB/s   00:00    
authorized_keys                                                                           100%  566   255.8KB/s   00:00    
[kes86@ora19c ~]$ 

-- passwordless login test
-- run on the primary node

[kes86@ora19c ~]$ ssh kes86@node1 date
Tue Dec  6 09:29:02 CST 2022
[kes86@ora19c ~]$ ssh kes86@node2 date
Fri Dec 16 16:10:06 CST 2022

-- run on the standby node

[kes86@postgres ~]$ ssh kes86@node1 date
Tue Dec  6 09:29:51 CST 2022
[kes86@postgres ~]$ ssh kes86@node2 date
Fri Dec 16 16:10:52 CST 2022

Test passed.
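The per-host checks above can be wrapped in a loop; `-o BatchMode=yes` makes ssh fail instead of prompting for a password, so a broken trust relationship shows up immediately. This is a sketch using the hostnames and IPs from this article:

```shell
# Verify passwordless SSH to every node, by name and by IP.
# BatchMode=yes forces a failure instead of a password prompt.
for h in node1 node2 192.168.57.30 192.168.57.10; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 kes86@"$h" date >/dev/null 2>&1; then
    echo "$h: OK"
  else
    echo "$h: FAILED"
  fi
done
```

Run it on both nodes; every line should print OK before continuing.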

3. Operating system kernel parameters

Kernel parameters must be configured on all nodes.
Edit /etc/sysctl.conf; for details, refer to the system manual.

Example: vim /etc/sysctl.conf
fs.aio-max-nr= 1048576
fs.file-max= 6815744
kernel.shmall= 2097152
kernel.shmmax= 2174483648
kernel.shmmni= 4096
kernel.sem= 5010 641280 5010 256
net.ipv4.ip_local_port_range= 9000 65500
net.core.rmem_default= 262144
net.core.rmem_max= 4194304
net.core.wmem_default= 262144
net.core.wmem_max= 1048576
fs.aio-max-nr sets the system-wide limit on the number of concurrent asynchronous I/O requests.
fs.file-max sets the system-wide maximum number of open file handles, which in turn caps the number of concurrent connections.
kernel.shmall sets the total amount of shared memory the system may use at one time (in pages); the default of 2097152 usually needs no change.
kernel.shmmax defines the maximum size of a single shared memory segment (in bytes).
kernel.shmmni sets the system-wide maximum number of shared memory segments; the default is 4096.
kernel.sem sets the semaphore limits:
# 5010 is semmsl, the maximum number of semaphores in one semaphore set.
# 641280 is semmns, the maximum number of semaphores allowed system-wide.
# 5010 is semopm, the maximum number of operations a single semop() call may perform on one semaphore set.
# 256 is semmni, the total number of semaphore sets system-wide.
net.ipv4.ip_local_port_range is the range of IPv4 ports available to applications.
net.core.rmem_default is the default socket receive buffer size.
net.core.rmem_max is the maximum socket receive buffer size.
net.core.wmem_default is the default socket send buffer size.
net.core.wmem_max is the maximum socket send buffer size.

NUMA settings

In most cases NUMA can be disabled at the BIOS level and numa=off added to the OS boot parameters, after which NUMA can be ignored at the OS level.
If NUMA has not been disabled in the OS, the following settings keep the database from using remote NUMA memory when allocating:
vm.zone_reclaim_mode=0
kernel.numa_balancing=0
numactl --interleave=all

Avoiding OOM

vm.swappiness = 0          # 0 is the recommended value
vm.overcommit_memory = 2   # strict overcommit accounting (commit limit = swap + ratio% of RAM)
vm.overcommit_ratio = 50   # permitted overcommit: $((($mem - $swap) * 100 / $mem))
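The formula in the vm.overcommit_ratio comment can be worked through numerically; the memory and swap sizes below are illustrative values, not taken from the hosts in this article:

```shell
# ratio = (mem - swap) * 100 / mem, per the comment above.
# Example: 16 GiB RAM with 8 GiB swap (illustrative values, in KB).
mem=16777216
swap=8388608
ratio=$(( (mem - swap) * 100 / mem ))
echo "vm.overcommit_ratio = $ratio"   # -> vm.overcommit_ratio = 50
```

In practice, read MemTotal and SwapTotal from /proc/meminfo instead of hard-coding them.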

Dirty page writeback policy

vm.dirty_background_ratio = 5      (lowered from the default 10)
vm.dirty_ratio = 10~15
vm.dirty_background_bytes = about 25% of the device write speed (an alternative to the ratio form)

When the page cache becomes dirty, the dirty pages must be flushed out of memory; Linux controls when this writeback happens through the following parameters.

vm.dirty_background_ratio / vm.dirty_background_bytes

The ratio (or absolute amount) of dirty pages allowed in memory before background flushing starts. Depending on storage speed, vm.dirty_background_ratio is commonly set to 5, or vm.dirty_background_bytes to roughly 25% of the device read/write speed.

vm.dirty_ratio / vm.dirty_bytes

Foreground flushing blocks reads and writes, so vm.dirty_ratio is usually set higher than vm.dirty_background_ratio. Setting it ensures the system does not keep too much dirty data in memory, limiting the amount that could be lost.

vm.dirty_expire_centisecs

The maximum time a dirty page may stay in memory.

vm.dirty_writeback_centisecs

The interval at which the flusher threads (pdflush/flush/kdmflush) wake up.

Reserve enough physical memory for the file read/write cache.
Tune the dirty-page flush frequency to the characteristics of the workload and the I/O capability of the hardware:
if physical I/O is strong, lower the flush frequency and keep the dirty-page ratio small; if physical I/O is weak, tune in the opposite direction.

After making these changes, run sysctl -p as root to apply them.
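After sysctl -p, a quick loop can spot-check that the values took effect. Parameter names are from the example above; the fallback keeps the loop harmless on hosts where a key is absent:

```shell
# Print the live value of each configured kernel parameter.
for k in fs.aio-max-nr fs.file-max kernel.shmmax kernel.sem net.ipv4.ip_local_port_range; do
  printf '%s = %s\n' "$k" "$(sysctl -n "$k" 2>/dev/null || echo '(not set)')"
done
```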

4. Adjust system resource limits

Resource limits live in /etc/security/limits.conf and /etc/security/limits.d/20-nproc.conf.
If the system only has /etc/security/limits.conf and no /etc/security/limits.d/20-nproc.conf, edit only /etc/security/limits.conf; if both exist, edit both files.

vim /etc/security/limits.conf
kingbase  soft nofile 65535
kingbase  hard nofile 65536
kingbase  soft nproc 65535
kingbase  hard nproc 65536
kingbase  soft core unlimited
kingbase  hard core unlimited

After configuring, switch to the kingbase user and check with ulimit -a.

# unlimited means no limit
# * applies a line to all users; alternatively, set values only for root and the kingbase installation user
# nofile is the maximum number of open files, nproc the maximum number of processes, core the upper bound on core-file size
# soft is a warning threshold; hard is the real limit that raises errors when exceeded, and both can be raised further if needed
# PAM applies these limits per session: after adjusting, log in again as root and kingbase and check with ulimit -n
# Note: the nofile hard limit must not exceed /proc/sys/fs/nr_open, otherwise the user cannot log in again after logging out
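The nr_open constraint in the last note can be checked mechanically. A sketch, using the nofile value from the example above:

```shell
# The nofile hard limit must not exceed fs.nr_open, or logins will fail.
hard_nofile=65536
nr_open=$(cat /proc/sys/fs/nr_open 2>/dev/null || echo 1048576)  # 1048576 is the usual kernel default
if [ "$hard_nofile" -le "$nr_open" ]; then
  echo "nofile hard limit $hard_nofile <= fs.nr_open $nr_open: OK"
else
  echo "nofile hard limit $hard_nofile exceeds fs.nr_open $nr_open" >&2
fi
```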

5. Change the disk I/O scheduler

Check the current I/O scheduler:

cat /sys/block/{DEVICE-NAME}/queue/scheduler

• Change the I/O scheduler to deadline (the deadline algorithm guarantees that no request starves)
• {DEVICE-NAME} = the disk device name
For mechanical disks, the deadline scheduler is recommended; it suits dedicated, I/O-heavy workloads such as databases.
For SSDs, the noop scheduler is recommended.
List the I/O schedulers supported by the kernel:

dmesg | grep -i scheduler
[    1.203287] io scheduler noop registered
[    1.203289] io scheduler deadline registered (default)
[    1.203311] io scheduler cfq registered
[    1.203314] io scheduler mq-deadline registered
[    1.203316] io scheduler kyber registered

Check the I/O scheduler of a specific disk:

cat /sys/block/{DEVICE-NAME}/queue/scheduler

For ordinary mechanical disks, changing the I/O scheduler to deadline is recommended (the deadline algorithm guarantees that no request starves).

Examples of switching the I/O scheduler to deadline:

Linux 6:

echo deadline > /sys/block/sda/queue/scheduler

It can also be written into grub:

kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/
    elevator=deadline crashkernel=128M@16M  quiet console=tty1
    console=ttyS1,115200 panic=30 transparent_hugepage=never 
    initrd /initrd-2.6.18-274.3.1.el5.img    

Linux 7:

grubby --update-kernel=ALL --args="elevator=deadline"  
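The active scheduler is the bracketed entry in each device's scheduler file; a small loop makes checking every block device quick (a sketch using the standard sysfs paths):

```shell
# Print the active (bracketed) I/O scheduler for every block device,
# e.g. "noop [deadline] cfq" -> "deadline".
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}; dev=${dev%%/*}
  printf '%s: %s\n' "$dev" "$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$f")"
done
```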

Transparent huge pages are not recommended for now; on some heavily loaded, large database systems it is advisable to disable them:

grubby --update-kernel=ALL --args="transparent_hugepage=never"

Disk arrays generally use a write-back cache because they carry a battery; this is usually called a battery-backed write cache (BBC or BBWC).

6. Common kingbase.conf parameters

wal_log_hints=on          # must be on: when a switchover diverges the primary and standby timelines, sys_rewind needs it
                          # to resynchronize them
full_page_writes=on       # must be on, for the same sys_rewind requirement as wal_log_hints
archive_mode=on           # enable archive mode; with archive_mode on, completed WAL segments are sent to archive storage
                          # via archive_command. Besides off/disable there are two modes: on and always. They behave
                          # identically in normal operation, but with always the WAL archiver is also enabled during
                          # archive recovery and in standby mode, so all files restored from the archive or received via
                          # streaming replication are archived again. archive_mode and archive_command are separate
                          # settings, so archive_command can be changed without changing the archive mode. This parameter
                          # can only be set at server start, and archive_mode cannot be enabled when wal_level is minimal.
wal_level=replica         # minimal: log only basic data operations, enough to guarantee the database's ACID properties
                          # replica: on top of minimal, log the additional transaction types and data needed to keep
                          # standbys consistent with the primary
                          # logical: on top of replica, log complete row data (mainly the old rows in updates; lower
                          # levels log only a reference to the old row), as required by logical decoding; logical
                          # decoding serves logical replication, by which two independent primaries synchronize table data
archive_command='test ! -f /home/kingbase/data/sys_wal/archive_status/%f && cp %p /home/kingbase/archive/%f'
                          # the shell command used to archive a WAL segment.
                          # With archive_mode enabled and archive_command set to an empty string, WAL archiving is
                          # temporarily disabled, but the server keeps accumulating WAL segment files.
                          # Setting archive_command to /bin/true disables archiving, but it breaks the archive chain:
                          # recovery from the archive is impossible across the gap, so use it with care.
                          # archive_command = 'test ! -f /mnt/archivedir/%f && cp %p /mnt/archivedir/%f'
                          # archive_command = 'gzip < %p > /mnt/archivedir/%f'
archive_timeout=0         # archive_command is invoked only for completed WAL segments, so on a server that generates
                          # little WAL there can be a long delay between a transaction completing and it being safely
                          # stored in the archive. To bound the amount of unarchived data, set archive_timeout to force
                          # the server to switch to a new WAL segment periodically: when greater than zero, the server
                          # switches whenever that many seconds have passed since the last segment switch and there has
                          # been any database activity (a lone checkpoint is skipped when there has been none). Note that
                          # segments closed early by a forced switch still occupy the same space as full segments, so a
                          # very short archive_timeout bloats the archive storage; a setting of around one minute is
                          # usually reasonable.
synchronous_commit=on     # possible values: on, off, local, remote_write, remote_apply
                          # performance impact under streaming replication, from smallest to largest:
                          # off (async) > on (async) > remote_write (sync) > on|local (sync) > remote_apply (sync)
                          # remote_apply: COMMIT returns only after the standby has applied the WAL (data visible on the
                          # standby); full data synchronization, suited to load-balancing scenarios where the standby
                          # must always hold current data.
                          # on: COMMIT returns after the standby has written the WAL; the usual balance between
                          # performance and reliability.
                          # remote_write: COMMIT returns once the WAL has been transmitted to the standby.
                          # local: COMMIT returns after the WAL has been written on the primary only.
                          # off: COMMIT returns immediately, without waiting for the primary's WAL write to complete.
synchronous_standby_names='node2'
                          # if unset, replication defaults to asynchronous. On the primary, set this parameter to the
                          # standby node name(s); when the node is itself a standby, likewise configure it with the
                          # standby node name(s).
max_wal_size=1GB          # the maximum WAL may grow between two checkpoints; a soft limit.
                          # If WAL volume exceeds max_wal_size, a checkpoint is triggered and unneeded segment files are
                          # removed until usage falls back under the limit; if WAL volume is below max_wal_size, at
                          # least min_wal_size of WAL space is kept. WAL usage normally floats between min_wal_size and
                          # max_wal_size, estimated from a moving average of the number of WAL files used in previous
                          # checkpoint cycles; the average rises immediately if actual usage exceeds the estimate.
min_wal_size=100MB        # WAL retained after a checkpoint for future recycling; ensures enough WAL space is reserved
                          # for usage spikes. If WAL grows abnormally, or stays above max_wal_size and checkpoints do
                          # not shrink it or allow segments to be removed and reused, check the following factors:
                          # wal_keep_segments + 1 of the most recent WAL files are always retained, independently of
                          # max_wal_size; with WAL archiving enabled, old segments cannot be removed or recycled until
                          # they are archived; with replication slots enabled, a slow or failed standby holding a slot
                          # also prevents WAL removal and reuse; an unfinished checkpoint or a long uncommitted
                          # transaction has the same effect.
checkpoint_completion_target=0.5   # e.g. 100GB / (0.5*5*60) * 1024 ≈ 683 MB/s; 100GB / (0.9*5*60) * 1024 ≈ 379 MB/s
checkpoint_timeout=10min  # the lower the write rate, the better the client experience and throughput; conversely, too
                          # low a value can cause I/O spikes and apparent stalls.
shared_buffers=1GB        # a common starting point is about 1/3 of RAM
archive_cleanup_command   # a command that removes archived WAL files no longer needed by the standby server.
                          # %r stands for the WAL file of the last valid restart point: the earliest file that must be
                          # kept so that a restore operation can be restarted, so everything older than %r can be
                          # cleaned up safely. A restart point is the point from which a standby server can restart
                          # recovery; this information can be used to truncate the archive to the minimum needed to
                          # restart the current restore, and is typically used in the archive_cleanup_command of a
                          # single-standby configuration. If the command is terminated by a signal or the shell reports
                          # an error, a fatal error is raised.
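The `test ! -f … && cp …` pattern used by archive_command above can be exercised safely in a scratch directory; note how it refuses to overwrite an already-archived segment, which is exactly the behaviour wanted:

```shell
# Dry-run of the cp-based archive_command pattern in a temp directory.
tmp=$(mktemp -d)
mkdir "$tmp/archive"
echo 'fake WAL bytes' > "$tmp/000000010000000000000001"

# First archive attempt succeeds (same shape as the archive_command above).
test ! -f "$tmp/archive/000000010000000000000001" \
  && cp "$tmp/000000010000000000000001" "$tmp/archive/000000010000000000000001" \
  && echo "archived"

# A second attempt is a no-op: the destination already exists.
test ! -f "$tmp/archive/000000010000000000000001" || echo "already archived, skipped"

rm -rf "$tmp"
```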

About the synchronous_standby_names parameter

This parameter specifies how priority-based or quorum-based multiple synchronous standbys affect transaction commit:
synchronous_standby_names = 'FIRST 2 (*)': of all standbys streaming from this primary, any two are chosen as sync and the rest are in the potential state;
synchronous_standby_names = 'ANY 2 (*)': all standbys streaming from this primary are in the quorum state.

The sync_state column of the pg_stat_replication view shows:
under the FIRST syntax, synchronous nodes show sync and asynchronous nodes show async; nodes that match as synchronous but are excluded by the count limit show potential;
under the ANY syntax, all nodes show quorum: there is no fixed synchronous/asynchronous distinction, and a write transaction returns success to the client as soon as the primary has received feedback from the specified number of standbys.

Multiple synchronous standbys
Synchronous replication supports one or more synchronous standby servers; a transaction waits until all synchronous standbys have confirmed receipt of its data. The number of synchronous standbys a transaction must wait for is given by synchronous_standby_names, which also lists the standby names and the selection method (FIRST or ANY) used to pick the synchronous standbys from the list.

The FIRST method specifies priority-based synchronous replication: transaction commits wait until their WAL records have been replicated to the required number of synchronous standbys, chosen by priority. Standbys appearing earlier in the list are given higher priority and are treated as synchronous standbys; standbys later in the list are potential synchronous standbys. If any current synchronous standby disconnects for any reason, it is immediately replaced by the next-highest-priority standby.

An example of synchronous_standby_names for priority-based multiple synchronous standbys:

synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
In this example, with four standby servers s1, s2, s3 and s4 running, the two standbys s1 and s2 are chosen as synchronous standbys because they appear at the front of the list. s3 is a potential synchronous standby that takes over if either s1 or s2 fails. s4 is an asynchronous standby because its name is not in the list.

The ANY method specifies quorum-based synchronous replication: transaction commits wait until their WAL records have been replicated to at least the required number of standbys in the list.

An example of synchronous_standby_names for quorum-based multiple synchronous standbys:

synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
In this example, with four standby servers s1, s2, s3 and s4 running, transaction commits wait for replies from at least any two of those standbys. s4 is an asynchronous standby because its name is not in the list.

The synchronization state of each standby server can be inspected with the pg_stat_replication view.
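The sync_state values described above can be observed directly. A sketch, assuming the ksql client and the repmgr connection settings used elsewhere in this article:

```shell
# On the primary: one row per standby, with sync/async/potential/quorum state.
ksql -U repmgr -d repmgr -p 54322 -c \
  "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```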

7. repmgr.conf parameters

# Logging

log_level=INFO
log_file='/home/kes86/cluster/log/repmgrd.log'      # repmgrd log file
log_status_interval=10                              # makes repmgrd emit a status log line at this interval (in seconds;
                                                    # default 300) describing repmgrd's current state,
                                                    # e.g.:  [2022-10-30 17:51:15] [INFO] monitoring primary node "node1" (ID: 1) in normal state

# Failover settings

failover=automatic/manual
promote_command='/home/kes86/kes86/cluster/bin/repmgr standby promote -f /home/kes86/kes86/cluster/etc/repmgr.conf'
# executed during failover when repmgrd determines that the current node is to become the new primary
follow_command='/home/kes86/kes86/cluster/bin/repmgr standby follow -f /home/kes86/kes86/cluster/etc/repmgr.conf --upstream-node=%n'
# repmgrd replaces %n with the node ID of the new primary; if it is not provided, repmgr standby follow tries to
# determine the new primary on its own, with the risk that the node keeps following the old primary if it comes back
# online after the new primary has been promoted
repmgrd_pid_file='/home/kes86/kes86/cluster/etc/hamgrd.pid'
# pid file used by the running repmgrd

# High-availability settings

location='location1'                # arbitrary string defining the node's location, used during failover to check the visibility of the current primary
priority=100                        # node priority, consulted during promotion (lsn > priority > node_id);
                                    # 0 means the node will never be promoted to primary
monitoring_history=yes              # whether monitoring data is written to the "monitoring_history" table
reconnect_interval=10               # interval (in seconds) between reconnection attempts before failover
reconnect_attempts=6                # number of reconnection attempts before failover
connection_check_type=ping          # ping: repmgr tests the connection with PQping()
                                    # connection: attempts to open a new connection to the node
                                    # query: executes a SQL statement on the node over the existing connection
monitor_interval_secs=5             # interval at which monitoring data is written
use_replication_slots=true          # whether replication slots are used

8. Cluster-related views

To manage replication clusters effectively, repmgr keeps the information about the cluster services it manages in a dedicated database schema.

The schema is created automatically by the repmgr extension, which is installed during the first step of initializing a repmgr cluster (registering the primary node).

It contains the following objects:

1. Tables

repmgr.events: records events of interest
repmgr.nodes: connection and status information for every server in the replication cluster
repmgr.monitoring_history: historical standby monitoring information written by repmgrd

2. Views

repmgr.show_nodes: based on the repmgr.nodes table, additionally showing the name of each server's upstream node
repmgr.replication_status: when repmgrd monitoring is enabled, shows the current monitoring status of each standby
The repmgr metadata schema can be stored in an existing database or in its own dedicated database.

Note that the repmgr metadata schema cannot be stored on a database server that is not part of the replication cluster repmgr manages.
A database user must be available for repmgr to access this database and perform the necessary changes.
That user does not need to be a superuser, but certain operations, such as the initial installation of the repmgr extension, require a superuser connection (which can be specified when needed with the --superuser command-line option).
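The tables and views above can be queried from any cluster node. A sketch, using the repmgr database and port configured in this article:

```shell
# List every registered node, then the most recent cluster events.
ksql -U repmgr -d repmgr -p 54322 -c \
  "SELECT node_id, node_name, type, active FROM repmgr.nodes;"
ksql -U repmgr -d repmgr -p 54322 -c \
  "SELECT node_id, event, event_timestamp FROM repmgr.events ORDER BY event_timestamp DESC LIMIT 5;"
```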

9. About the KBHA tool:

The kbha tool protects the high availability of a KingbaseES cluster: after a database server goes down due to a power loss or some other fault, kbha automatically recovers the database once the fault has cleared.
If automatic failover is not used, the kbha process does not need to be started and no virtual IP needs to be added.

II. Deployment

1. Configure the primary node:

Upload the db.zip file to the database OS user's home directory.

scp db.zip kes86@192.168.57.10:~

1. Unzip db.zip and initialize the database

Unzip the uploaded db.zip file:

unzip db.zip -d <path>

Initialize the database. Only the primary node is initialized; the standby does not need this step:
initdb -Usystem -Eutf-8 -mpg -D /home/kes86/data -A scram-sha-256 -x system --data-checksums

[kes86@node1 ~]$ initdb -Usystem -Eutf-8 -mpg -D /home/kes86/data -A scram-sha-256 -x system --data-checksums
属于此数据库系统的文件宿主为用户 "kes86".
此用户也必须为服务器进程的宿主.
数据库簇将使用本地化语言 "zh_CN.UTF-8"进行初始化.
initdb: could not find suitable text search configuration for locale "zh_CN.UTF-8"
缺省的文本搜索配置将会被设置到"simple"

The comparision of strings is case-sensitive.
允许生成数据页校验和.

创建目录 /home/kes86/data ... 成功
正在创建子目录 ... 成功
选择动态共享内存实现 ......posix
选择默认最大联接数 (max_connections) ... 100
选择默认共享缓冲区大小 (shared_buffers) ... 128MB
selecting default time zone ... Asia/Shanghai
创建配置文件 ... 成功
Begin setup encrypt device
initializing the encrypt device ... 成功
正在运行自举脚本 ...成功
正在执行自举后初始化 ...成功
create security database ... 成功
load security database ... 成功
同步数据到磁盘...成功

成功。您现在可以用下面的命令开启数据库服务器:

sys_ctl -D /home/kes86/data -l 日志文件 start

2. Edit the database's kingbase.conf configuration file

shared_preload_libraries='repmgr'
listen_addresses = '*'
port = 54322
full_page_writes = on
wal_log_hints = on
max_wal_senders = 32
wal_keep_segments = 512
max_connections = 100
wal_level = replica 
archive_mode = on
archive_command = '/bin/cp -f %p /home/kes86/data/archive/%f'
control_file_copy = '/home/kes86/data/copy_file_bak'
max_replication_slots = 32
hot_standby = on
hot_standby_feedback = on
logging_collector = on
log_destination = 'csvlog'
log_checkpoints = on
log_replication_commands = on
wal_compression = on
synchronous_commit = remote_write
max_prepared_transactions = 100
#shared_buffers = 512MB
fsync = on
#synchronous_standby_names='ANY 1(node2)'

Notes:
The directory used in archive_command must be created in advance (or already exist):
mkdir -p /home/kes86/data/archive
Leave synchronous_standby_names unset for now (or comment it out); if it is configured at this stage, every operation will wait for a response from the standby node.

3. Edit the sys_hba.conf file:

Append the following to the end of sys_hba.conf (for password security, consider either exempting the replication role from password authentication or using encrypted passwords):

# add replication priv

host    all             all             0.0.0.0/0               trust
host    replication     all             0.0.0.0/0               trust
host    replication     all             ::0/0                   trust

4. Start the database; it is best to start it using an absolute path.

Note: if the database was started from a relative path, later database backups will run into problems.

[kes86@node1 ~]$ sys_ctl -D /home/kes86/data/ start
waiting for server to start....2022-10-30 15:38:45.726 CST [7921] 日志:  正在启动 KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-10-30 15:38:45.726 CST [7921] 日志:  正在监听IPv4地址"0.0.0.0",端口 54322
2022-10-30 15:38:45.726 CST [7921] 日志:  正在监听IPv6地址"::",端口 54322
2022-10-30 15:38:45.731 CST [7921] 日志:  在Unix套接字 "/tmp/.s.KINGBASE.54322"上侦听
2022-10-30 15:38:45.759 CST [7921] 日志:  日志输出重定向到日志收集进程
2022-10-30 15:38:45.759 CST [7921] 提示:  后续的日志输出将出现在目录 "sys_log"中.
 done
server started

5. Create the replication user and grant privileges

create user repmgr replication login;    
alter user repmgr password 'repmgr';
alter user repmgr superuser createdb createrole;

6. Create the repmgr database to store repmgr-related information

create database repmgr encoding UTF8;
alter database repmgr owner to repmgr ;

At this point the primary node database is fully configured.

7. Configure the repmgr.conf file; its default path is the etc directory at the same level as the software's bin directory

Note: use_scmd='off' must be set explicitly when securecmdd is not used; if commented out or left at its default, it is on.

[kes86@node1 etc]$ cat repmgr.conf 
node_id=1
node_name='node1'
promote_command='/home/kes86/kes86/cluster/bin/repmgr standby promote -f /home/kes86/kes86/cluster/etc/repmgr.conf'
follow_command='/home/kes86/kes86/cluster/bin/repmgr standby follow -f /home/kes86/kes86/cluster/etc/repmgr.conf  --upstream-node=%n'
conninfo='host=node1 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kes86/kes86/cluster/log/hamgr.log'
log_level=info
#kbha_log_file='/home/kes86/kes86/cluster/log/kbha.log'
data_directory='/home/kes86/data'
sys_bindir='/home/kes86/kes86/cluster/bin'
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'
reconnect_attempts=10
reconnect_interval=6
failover='manual'
recovery='standby'
monitoring_history='no'
#trusted_servers='192.168.57.1'
#virtual_ip='192.168.57.32/24'
#net_device='enp0s17'
#net_device_ip='192.168.57.40'
#ipaddr_path='/sbin'
#arping_path='/home/kes86/kes86/cluster/bin'
synchronous='quorum'
#repmgrd_pid_file='/home/kes86/kes86/cluster/etc/hamgrd.pid'
#kbha_pid_file='/home/kes86/kes86/cluster/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off
use_scmd='off'
#running_under_failure_trusted_servers=on
connection_check_type='mix'
location='location1'                
priority=100                        


If the securecmdd node-communication tool is not deployed, the following parameter does not need to be configured:
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'
and this parameter must be set to off when securecmdd is not deployed:
use_scmd='off'

A manual switchover does not require kbha to be configured.

kbha tool options

[kes86@node1 ~]$ kbha --help
kbha: replication management daemon for Kingbase

kbha starts the repmgrd and do auto-recovery for Kingbase.

Usage:
    kbha [OPTIONS]

General options:
  -f, --config-file=PATH              path to the repmgr configuration file
  -A, --action={rejoin|register|follow|daemon|loadvip|unloadvip|arping|stopdb|startdb|updateinfo|removeinfo|check_ip}
                                      what to do for program, default is 'daemon'
      --dev=device name,              if -A is check_ip, could input net device name
      --ip=ip address,                if -A is check_ip, must be input ip which will be check, support IPV4 and IPV6
      --upstream-node-id=NODE_ID,     if -A is rejoin or follow, this node will follow the node of NODE_ID

Database connection options:
  -d, --dbname=DBNAME                 database to connect to (default: "kes86")
  -h, --host=HOSTNAME                 database server host
  -p, --port=PORT                     database server port (default: "54322")
  -U, --username=USERNAME             database user name to connect as (default: "kes86")

Other options:
  -?, --help                          show this help, then exit
  -V, --version                       output version information, then exit
  -v, --verbose                       display additional log output (useful for debugging)

Adding a virtual IP with the kbha tool

Load the VIP (the address is taken from virtual_ip in repmgr.conf), then announce it:
kbha -A loadvip
kbha -A arping

Remove the VIP:
kbha -A unloadvip

8. Register the primary node with repmgr

[kes86@node1 ~]$ repmgr primary register
[INFO] connecting to primary database...
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[NOTICE] attempting to install extension "repmgr"
[NOTICE] "repmgr" extension successfully installed
[INFO] primary registration complete
[NOTICE] primary node record (ID: 1) registered

Check that the node's IP addresses are up

[kes86@node1 ~]$ ifconfig 
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.30  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::a00:27ff:fede:9422  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:de:94:22  txqueuelen 1000  (Ethernet)
        RX packets 24  bytes 2976 (2.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32  bytes 3616 (3.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.61  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::a00:27ff:fef0:68ca  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:f0:68:ca  txqueuelen 1000  (Ethernet)
        RX packets 255  bytes 33730 (32.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 29  bytes 4466 (4.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp0s17: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.57.40  netmask 255.255.255.0  broadcast 192.168.57.255
        inet6 fe80::a00:27ff:fe59:efcf  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:59:ef:cf  txqueuelen 1000  (Ethernet)
        RX packets 1718  bytes 150972 (147.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 997  bytes 138337 (135.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 756  bytes 280777 (274.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 756  bytes 280777 (274.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:68:17:f3  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

9. Log in to the primary node database and execute the following command:

alter system set synchronous_standby_names='ANY 1(node2)';

10. Reload the configuration file

```
select pg_reload_conf();
```

III. Configure the standby node:

1. Install the database software on the standby node; the standby's software directory must match the primary's. There are two ways:

Copy the software directory to the standby node:

scp -r kes86 kes86@node2:~

or unzip the db.zip archive:

unzip db.zip -d /home/kes86/kes86/cluster

Configure the database user's environment variables (vi .bash_profile):

export KDBHOME=/home/kes86/kes86/cluster
export LANG=zh_CN.UTF-8
export KDBDATA=/home/kes86/data
export PATH=/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:$KDBHOME/bin

Run source .bash_profile to apply the environment variables.

2. Configure the repmgr.conf file; its default path is the etc directory at the same level as the KingbaseES software's bin directory

use_scmd='off' must be set explicitly when securecmdd is not used; if commented out or left at its default, it is on. Note that the standby needs its own node identity: node_id, node_name and conninfo must refer to this node, not to the primary.
[kes86@node2 etc]$ cat repmgr.conf 
node_id=2
node_name='node2'
promote_command='/home/kes86/kes86/cluster/bin/repmgr standby promote -f /home/kes86/kes86/cluster/etc/repmgr.conf'
follow_command='/home/kes86/kes86/cluster/bin/repmgr standby follow -f /home/kes86/kes86/cluster/etc/repmgr.conf  --upstream-node=%n'
conninfo='host=node2 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kes86/kes86/cluster/log/hamgr.log'
log_level=info
#kbha_log_file='/home/kes86/kes86/cluster/log/kbha.log'
data_directory='/home/kes86/data'
sys_bindir='/home/kes86/kes86/cluster/bin'
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'
reconnect_attempts=10
reconnect_interval=6
failover='manual'
recovery='standby'
monitoring_history='no'
#trusted_servers='192.168.57.1'
#virtual_ip='192.168.57.32/24'
#net_device='enp0s17'
#net_device_ip='192.168.57.40'
#ipaddr_path='/sbin'
#arping_path='/home/kes86/kes86/cluster/bin'
synchronous='quorum'
#repmgrd_pid_file='/home/kes86/kes86/cluster/etc/hamgrd.pid'
#kbha_pid_file='/home/kes86/kes86/cluster/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off
use_scmd='off'
#running_under_failure_trusted_servers=on
connection_check_type='mix'
location='location1'                
priority=100

If the securecmdd node-communication tool is not deployed, the following parameter does not need to be configured:
#scmd_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 8890'

When the securecmdd plugin is not used, this parameter must be set to off:
use_scmd='off'

3. Test connectivity from the standby node:

Run the following command on the standby node and check for errors:

repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[INFO] "repmgr" extension is installed in database "repmgr"
[DEBUG] 1 node records returned by source node
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[INFO] parameter "max_replication_slots" set to 32
[INFO] parameter "max_wal_senders" set to 32
[NOTICE] checking for available walsenders on the source node (2 required)
[INFO] sufficient walsenders available on the source node
[DETAIL] 2 required, 32 available
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] required number of replication connections could be made to the source server
[DETAIL] 2 replication connections required
[INFO] replication slots will be created by user "repmgr"
[NOTICE] standby will attach to upstream node 1
[HINT] consider using the -c/--fast-checkpoint option
[INFO] would execute:
  /home/kes86/kes86/cluster/bin/sys_basebackup -l "repmgr base backup"  -D /home/kes86/data -h node1 -p 54322 -U repmgr -X stream -S repmgr_slot_2 
[INFO] all prerequisites for "standby clone" are met

If the primary node has not been registered with repmgr, the command fails with an "unable to retrieve record for upstream node 1" error:

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone --dry-run
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[INFO] "repmgr" extension is installed in database "repmgr"
[DEBUG] 0 node records returned by source node
[DEBUG] upstream_node_id determined as 1
[INFO] parameter "max_replication_slots" set to 32
[INFO] parameter "max_wal_senders" set to 32
[NOTICE] checking for available walsenders on the source node (2 required)
[INFO] sufficient walsenders available on the source node
[DETAIL] 2 required, 32 available
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] required number of replication connections could be made to the source server
[DETAIL] 2 replication connections required
[ERROR] unable to retrieve record for upstream node 1

4. If the dry run on the standby node reports no errors, run standby clone (the replication slot is created automatically during the clone):

repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone

[kes86@node2 ~]$ repmgr -h node1 -U repmgr -d repmgr -p 54322 --upstream-node-id 1 standby clone
[NOTICE] destination directory "/home/kes86/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=node1 user=repmgr port=54322 dbname=repmgr
[DETAIL] current installation size is 45 MB
[DEBUG] 1 node records returned by source node
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[DEBUG] upstream_node_id determined as 1
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] creating directory "/home/kes86/data"...
[INFO] creating replication slot as user "repmgr"
[DEBUG] create_replication_slot_sql(): creating slot "repmgr_slot_2" on upstream
[NOTICE] starting backup (using sys_basebackup)...
[HINT] this may take some time; consider using the -c/--fast-checkpoint option
[INFO] executing:
  /home/kes86/kes86/cluster/bin/sys_basebackup -l "repmgr base backup"  -D /home/kes86/data -h node1 -p 54322 -U repmgr -X stream -S repmgr_slot_2 
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kes86/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"

5. Once standby clone completes without errors, start the database on the standby node:

[kes86@node2 ~]$ sys_ctl -D /home/kes86/data/ start
waiting for server to start....2022-10-30 16:41:28.063 CST [1595] 日志:  正在启动 KingbaseES V008R006C006B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-10-30 16:41:28.063 CST [1595] 日志:  正在监听IPv4地址"0.0.0.0",端口 54322
2022-10-30 16:41:28.063 CST [1595] 日志:  正在监听IPv6地址"::",端口 54322
2022-10-30 16:41:28.068 CST [1595] 日志:  在Unix套接字 "/tmp/.s.KINGBASE.54322"上侦听
2022-10-30 16:41:28.092 CST [1595] 日志:  日志输出重定向到日志收集进程
2022-10-30 16:41:28.092 CST [1595] 提示:  后续的日志输出将出现在目录 "sys_log"中.
 done
server started

6. Register the standby node with repmgr:

repmgr standby register

[kes86@node2 ~]$ repmgr standby register
[INFO] connecting to local node "node2" (ID: 2)
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node2 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[INFO] connecting to primary database
[DEBUG] connecting to: "user=repmgr connect_timeout=10 dbname=repmgr host=node1 port=54322 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[NOTICE] failed to update nodes_info file on primary node.
[INFO] standby registration complete
[NOTICE] standby node "node2" (ID: 2) successfully registered

7. Check cluster node status with repmgr:

[kes86@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                          
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=node1 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=node2 user=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
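The tabular output above can also be consumed by scripts. As a minimal sketch (parsing a captured sample table rather than a live cluster), the current primary's node name can be extracted with awk; in practice you would pipe the live command into the same filter, and repmgr's `--csv` option for `cluster show` produces output that is easier still to parse:

```shell
# Sketch: extract the primary node name from "repmgr cluster show"-style
# output. A captured two-row sample stands in for the live command here.
sample=' 1  | node1 | primary | * running
 2  | node2 | standby |   running'
printf '%s\n' "$sample" | awk -F'|' '$3 ~ /primary/ {gsub(/ /,"",$2); print $2}'
# prints: node1
```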

Configure the .encpwd password file:

When ksql logs in, it reads credentials from the .encpwd password file in the home directory of the database installation user and uses them to connect to the database.
The password file is a plain text file; it can be created with vi, or generated with the sys_encpwd tool shipped with Kingbase.
Usage of the sys_encpwd tool:
The tool takes 5 parameters, and all 5 must be supplied for the configuration to succeed:
[kes86@node2 ~]$ sys_encpwd -H * -P 54322 -D * -U repmgr -W repmgr
[kes86@node2 ~]$ sys_encpwd -H * -P 54322 -D * -U system -W system
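An illustrative sketch of where the file ends up and how it should be protected. The pgpass-style entry layout (host:port:database:user:password) and the literal ENCRYPTED placeholder shown here are assumptions for demonstration only; in practice let sys_encpwd write the entries, since it stores the password in encrypted form:

```shell
# Demo only: .encpwd lives in the database user's home directory and, like a
# pgpass-style file, should be readable by the owner alone. A throwaway HOME
# is used so the permission check can be shown end to end.
demo_home=$(mktemp -d)
printf '*:54322:*:repmgr:ENCRYPTED\n' > "$demo_home/.encpwd"   # assumed layout
chmod 0600 "$demo_home/.encpwd"       # owner-only access
stat -c '%a' "$demo_home/.encpwd"     # prints: 600
rm -rf "$demo_home"
```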

8. Back on the primary node, log in to the primary database and run the following queries to check replication status and replication slot information:

[kes86@node1 ~]$ ksql -Usystem -dtest
ksql (V8.0)
输入 "help" 来获取帮助信息.


test=# \x
扩展显示已打开.
test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 2170
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.40
client_hostname  | 
client_port      | 57184
backend_start    | 2022-12-20 16:52:08.272839+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6001940
write_lsn        | 0/6001940
flush_lsn        | 0/6001940
replay_lsn       | 0/6001940
write_lag        | 00:00:00.000869
flush_lag        | 00:00:00.004688
replay_lag       | 00:00:00.00547
sync_priority    | 0
sync_state       | async
reply_time       | 2022-12-20 16:53:11.721118+08

test=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 2170
xmin                | 577
catalog_xmin        | 
restart_lsn         | 0/6001940
confirmed_flush_lsn | 
As shown above, the cluster is currently replicating in async (asynchronous) mode.

To switch to sync (synchronous) mode, change the following parameter on the primary node:
alter system set synchronous_standby_names='ANY 1(node2)';

Reload the configuration file:
select pg_reload_conf();

[kes86@node1 ~]$ ksql -Usystem -dtest -p54322
ksql (V8.0)
输入 "help" 来获取帮助信息.

test=# alter system set synchronous_standby_names='ANY 1(node2)';
ALTER SYSTEM
test=# select pg_reload_conf();

 pg_reload_conf 
----------------
 t
(1 行记录)
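For reference, synchronous_standby_names follows the standard quorum/priority grammar inherited from PostgreSQL; the variants below are illustrative only (node3 is a hypothetical additional standby):

```
# quorum: any 1 of the listed standbys must confirm each commit
synchronous_standby_names = 'ANY 1 (node2, node3)'
# priority: the first listed connected standby must confirm
synchronous_standby_names = 'FIRST 1 (node2, node3)'
```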

9. Check the replication mode after changing synchronous_standby_names:

test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 2170
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.40
client_hostname  | 
client_port      | 57184
backend_start    | 2022-12-20 16:52:08.272839+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6002418
write_lsn        | 0/6002418
flush_lsn        | 0/6002418
replay_lsn       | 0/6002418
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 1
sync_state       | quorum
reply_time       | 2022-12-20 16:55:56.498748+08

test=# select * from pg_replication_slots ;                      
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 2170
xmin                | 577
catalog_xmin        | 
restart_lsn         | 0/6002418
confirmed_flush_lsn | 

10. Start the repmgrd process (on all nodes). If auto failover is not used, this service can be left off.

The repmgrd daemon actively monitors the servers in the replication cluster and performs the following tasks:

Monitors and records cluster replication performance
Performs failover by detecting primary server failure and promoting the most suitable standby
Sends notifications about cluster events to a user-defined script, which can perform tasks such as sending alerts by email

repmgrd behaves differently depending on the local database role:
Primary: repmgrd monitors only the local database and handles automatic recovery and switching between synchronous and asynchronous replication
Standby: repmgrd monitors both the local database and the primary, and handles automatic switchover and replication slot removal
The location parameter in repmgr.conf must be set to the same value on all nodes; if left unset, the default is already consistent (location='location1').
In addition, repmgrd can only start if shared_preload_libraries='repmgr' is set in the kingbase.conf configuration file.
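The preload requirement can be checked before launching repmgrd. A minimal sketch, run here against a temporary sample file so the check itself is demonstrable; against a real install, point `conf` at kingbase.conf in your data_directory (e.g. /home/kes86/data):

```shell
# Sketch: verify shared_preload_libraries mentions repmgr before starting
# repmgrd. A temporary sample conf stands in for the real kingbase.conf.
conf=$(mktemp)
echo "shared_preload_libraries = 'repmgr'" > "$conf"
if grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*repmgr" "$conf"; then
  echo "repmgr preloaded"     # prints: repmgr preloaded
else
  echo "add shared_preload_libraries='repmgr' to kingbase.conf and restart" >&2
fi
rm -f "$conf"
```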

11. repmgrd start command (run on all nodes):

repmgrd -d -v -f kes86/cluster/etc/repmgr.conf

[kes86@node1 ~]$ repmgrd -d -v -f kes86/cluster/etc/repmgr.conf
[2022-12-20 17:10:57] [NOTICE] using provided configuration file "kes86/cluster/etc/repmgr.conf"
[2022-12-20 17:10:57] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

Start the kbha service (on all nodes). If auto failover is not used, it is recommended not to start this service:

kbha -A daemon -f kes86/cluster/etc/repmgr.conf

[kes86@node1 ~]$ kbha -A daemon -f kes86/cluster/etc/repmgr.conf
[2022-12-20 17:12:57] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

repmgrd log rotation

To keep the current repmgrd log file (the file named by the log_file parameter in repmgr.conf) from growing without bound, configure logrotate on your system to rotate it periodically:

/data/sys_log/repmgr/repmgrd.log {
    missingok
    compress
    rotate 52
    maxsize 500M
    weekly
    create 0600 postgres postgres
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}

IV. Switchover test:

1. Check cluster status on the primary node:

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 7        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 7        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Check service status on the primary node:

[kes86@node1 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 2730 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 2017 | no      | 0 second(s) ago    

2. Check cluster status on the standby node:

[kes86@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 7        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 7        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3


Check service status on the standby node:

[kes86@node2 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 2730 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 2017 | no      | 0 second(s) ago    

3. Run the switchover command on the standby node:

repmgr standby switchover --dry-run

If the dry run produces no warning/error messages, the switchover can proceed normally:

[kes86@node2 ~]$ repmgr standby switchover  --dry-run
[NOTICE] checking switchover on node "node2" (ID: 2) in --dry-run mode
[INFO] ES connection to host "node1" succeeded
[INFO] able to execute "repmgr" on remote host "node1"
[INFO] 1 walsenders required, 32 available
[INFO] demotion candidate is able to make replication connection to promotion candidate
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[INFO] 14 pending archive files
[INFO] replication lag on this standby is 0 seconds
[INFO] 1 replication slots required, 32 available
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] would pause repmgrd on node "node1" (ID: 1)
[INFO] would pause repmgrd on node "node2" (ID: 2)
[NOTICE] local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby
[INFO] following shutdown command would be run on node "node1":
  "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] parameter "shutdown_check_timeout" is set to 60 seconds
[INFO] prerequisites for executing STANDBY SWITCHOVER are met
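Note the "--status=OK --files=14" line in the dry-run output above: `repmgr node check --optformat` emits machine-readable key=value pairs. As a sketch, a wrapper script could pick them apart with plain parameter expansion and sed (parsed here from a captured sample string, not a live node):

```shell
# Sketch: parse the --optformat output of "repmgr node check" into variables.
out='--status=OK --files=14'                 # captured sample line
files=${out##*--files=}                      # strip everything up to --files=
status=$(printf '%s\n' "$out" | sed -n 's/.*--status=\([A-Z]*\).*/\1/p')
echo "status=$status pending_archives=$files"
# prints: status=OK pending_archives=14
```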

Perform the actual switchover.

4. On the standby node, run: repmgr standby switchover

During the switchover, the replication slot on the old primary is removed.

[kes86@node2 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node2" (ID: 2)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
[NOTICE] stopping current primary node "node1" (ID: 1)
[NOTICE] issuing CHECKPOINT on node "node1" (ID: 1) 
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/D000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node2" (ID: 2) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/D0000A0
[INFO] SET synchronous TO "async" on primary host 
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node2" is now primary and node "node1" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully

After the switchover succeeds, check the cluster status:

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | location1 | 100      | 7        | 0 bytes | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | primary | * running |          | location1 | 100      | 8        |         | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Next, switch the primary and standby roles back with another switchover:

[kes86@node1 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=12
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby
[NOTICE] stopping current primary node "node2" (ID: 2)
[NOTICE] issuing CHECKPOINT on node "node2" (ID: 2) 
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/E000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/E0000A0
[INFO] SET synchronous TO "async" on primary host 
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node2" (ID: 2) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node2" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully

5. After the switchback succeeds, check the database cluster status:

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 9        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 8        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

6. Log in to the database with ksql

Check that the replication slot status is normal:

[kes86@node1 ~]$ ksql -Usystem -dtest
ksql (V8.0)
输入 "help" 来获取帮助信息.

test=# \x
扩展显示已打开.
test=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 7121
xmin                | 635
catalog_xmin        | 
restart_lsn         | 0/E001BC8
confirmed_flush_lsn | 

test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 7121
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.43
client_hostname  | 
client_port      | 45080
backend_start    | 2022-12-21 17:19:52.395706+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/E001BC8
write_lsn        | 0/E001BC8
flush_lsn        | 0/E001BC8
replay_lsn       | 0/E001BC8
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 1
sync_state       | quorum
reply_time       | 2022-12-21 17:24:53.675362+08

This completes the switchover test.

Recommendation: if auto failover is not used, do not start the kbha service; starting the repmgrd daemon alone is sufficient.

Command to start the repmgrd daemon (if sys_monitor.sh is used to start and stop the Kingbase database, there is no need to start repmgrd separately; the sys_monitor.sh script starts it automatically):

repmgrd -d -v -f /home/kes86/kes86/cluster/etc/repmgr.conf

Test starting and stopping the database with the sys_monitor.sh script as a non-root user.

--Stop the database

[kes86@node1 ~]$ sys_monitor.sh stop
2022-12-21 17:28:06 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
2022-12-21 17:28:13 begin to stop repmgrd on "[node1]".
2022-12-21 17:28:18 repmgrd on "[node1]" stop success.
2022-12-21 17:28:18 begin to stop repmgrd on "[node2]".
2022-12-21 17:28:19 repmgrd on "[node2]" stop success.
2022-12-21 17:28:19 begin to stop DB on "[node2]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:20 DB on "[node2]" stop success.
2022-12-21 17:28:20 begin to stop DB on "[node1]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:34 DB on "[node1]" stop success.
2022-12-21 17:28:34 Done. 

--Start the database

[kes86@node1 ~]$ sys_monitor.sh start
2022-12-21 17:30:29 Ready to start all DB ...
2022-12-21 17:30:29 begin to start DB on "[node1]".
waiting for server to start.... done
server started
2022-12-21 17:30:57 execute to start DB on "[node1]" success, connect to check it.
2022-12-21 17:30:59 DB on "[node1]" start success.
2022-12-21 17:31:00 Try to ping trusted_servers on host node1 ...
2022-12-21 17:31:00 Try to ping trusted_servers on host node2 ...
2022-12-21 17:31:00 begin to start DB on "[node2]".
waiting for server to start.... done
server started
2022-12-21 17:31:02 execute to start DB on "[node2]" success, connect to check it.
2022-12-21 17:31:03 DB on "[node2]" start success.
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 9        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 9        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2022-12-21 17:31:04 The primary DB is started.
2022-12-21 17:31:04 begin to start repmgrd on "[node1]".
[2022-12-21 17:31:12] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:12] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

2022-12-21 17:31:19 repmgrd on "[node1]" start success.
2022-12-21 17:31:19 begin to start repmgrd on "[node2]".
[2022-12-21 17:31:51] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:51] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

2022-12-21 17:31:22 repmgrd on "[node2]" start success.
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 8637 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 8708 | no      | 1 second(s) ago    
[2022-12-21 17:31:22] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

[2022-12-21 17:31:56] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

2022-12-21 17:31:26 Done.
posted @ 2023-03-02 16:09  KINGBASE研究院