PostgreSQL一主一从高可用搭建

1. PostgreSQL一主一从高可用搭建(VIP)

IP 主机名 角色
192.168.198.146 test11 主服务器
192.168.198.143 test12 备用服务器
192.168.198.144 test13 见证服务器

一、基本环境配置

1、hosts文件配置(所有节点)
vim /etc/hosts
# 新增
192.168.198.146   msp-app01
192.168.198.143   msp-app02
192.168.198.144   msp-app03
2、配置SSH免密登录

root用户下免密登录设置(所有节点都执行)

ssh-keygen
ssh-copy-id 192.168.198.146
ssh-copy-id 192.168.198.143
ssh-copy-id 192.168.198.144

postgres用户下免密登录设置,postgresql需要postgres用户

#创建postgres用户
groupadd postgres
useradd -g postgres postgres
passwd postgres

su - postgres
ssh-keygen
ssh-copy-id 192.168.198.146
ssh-copy-id 192.168.198.143
ssh-copy-id 192.168.198.144
3、添加sudo权限

主节点与备用节点postgres用户添加sudo权限,用于执行VIP自动切换脚本

sudo visudo

postgres ALL=(ALL)       NOPASSWD:ALL

如果上面的方法无法使用:

创建完用户后,需要给用户添加权限。

[root@localhost ~]# chmod -v u+w /etc/sudoers  
mode of '/etc/sudoers' changed from 0440 (r--r-----) to 0640 (rw-r-----)

修改sudoers权限,使其可编辑,编辑sudoers文件。

vim /etc/sudoers
## Allow root to run any commands anywhere 
root    ALL=(ALL)       ALL
postgres ALL=(ALL)       NOPASSWD:ALL

将sudoers文件权限修改为不可编辑。

[root@localhost ~]# chmod -v u-w /etc/sudoers   
mode of "/etc/sudoers" changed from 0640 (rw-r-----) to 0440 (r--r-----)

可能出现问题:

1、使用test用户进行切换(su root)root用户,提示鉴定故障。

可能在权限不足带来的问题。

检查办法如下:

1)检查/etc目录下passwd的权限,该文件应该具备的权限如下:

[root@localhost ~]# ll /etc/passwd
-rw-r--r--. 1 root root 2347 4f  16 21:00 /etc/passwd

如果该权限不足,则使用chmod 补充完成。

2)检查/bin/su文件是否有s位权限 也有可能是所属用户为普通用户,该文件应该具备的权限如下:

[root@localhost ~]# ll /bin/su
-rwsrwxrwx. 1 root root 32128 8f   9 2019 /bin/su

如果该权限不足,则使用chmod u+s /bin/su补充完成即可。

二、安装配置PostgreSQL

1、下载postgresql-16源码包(所有节点)
wget https://ftp.postgresql.org/pub/source/v16.0/postgresql-16.0.tar.gz --no-check-certificate
2、安装依赖包(所有节点)
yum install gcc gcc-c++ readline-devel readline readline-dev zlib-devel
3、解压编译(所有节点)
tar -zxvf postgresql-16.0.tar.gz -C /app

cd /app/postgresql-16.0

./configure --prefix=/app/pgsql
或
./configure --prefix=/app/pgsql --without-icu

make && make install

mv /app/postgresql-16.0 /app/pgsql
4、配置数据、日志目录(所有节点)
mkdir -p /app/pgsql/{data,log}

chown -R postgres:postgres /app/pgsql
5、初始化数据库(主节点操作)

只需要在主数据库服务器上对 PostgreSQL数据库服务进行初始化并且正常启动即可,其余备服务器数据库可利用 repmgr 工具进行初始化搭建部署。

su - postgres

/app/pgsql/bin/initdb -D /app/pgsql/data/
6、启动数据库(主节点操作)
/app/pgsql/bin/pg_ctl -D /app/pgsql/data/ -l logfile start
7、验证测试(主节点操作)
#软链接
ln -s /app/pgsql/bin/psql /bin/psql
ln -s /app/pgsql/bin/pg_config /bin/pg_config

su - postgres

#测试登录
psql -U postgres -d postgres
8、配置环境变量(所有节点)
vim /etc/profile

# PG安装目录
export PGHOME=/app/pgsql/
# PG数据目录
export PGDATA=/app/pgsql/data/
export PATH=$PGHOME/bin:$PATH
export PG_CONFIG=/bin/pg_config

source /etc/profile

三、安装repmgr

1、下载源码包
wget --no-check-certificate https://www.repmgr.org/download/repmgr-5.4.1.tar.gz
2、解压编译
yum install flex libcurl-devel json-c-devel

tar -zxvf repmgr-5.4.1.tar.gz -C /app/

cd /app/repmgr-5.4.1

./configure && make install

make install 成功后,pg_bin_path 里会有 repmgr、repmgrd 两个可执行文件

四、主节点配置

1、postgresql.conf配置文件
vim /app/pgsql/data/postgresql.conf
#直接添加在配置文件最下面就可以
listen_addresses = '*'
port = 5432
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
wal_log_hints = on
2、pg_hba.conf配置文件
vim /app/pgsql/data/pg_hba.conf
#直接在文件最下面添加
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      msp-app01               trust
host    replication   repmgr      msp-app02               trust
host    replication   repmgr      msp-app03               trust
local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      msp-app01               trust
host    repmgr        repmgr      msp-app02               trust
host    repmgr        repmgr      msp-app03               trust
3、创建repmgr用户和库
[root@msp-app01 src]# su - postgres
Last login: Fri Jan 26 10:09:52 CST 2024 on pts/0
[postgres@msp-app01 ~]$ createuser -s repmgr					#创建repmgr用户
[postgres@msp-app01 ~]$ createdb repmgr -O repmgr				#创建repmgr库,并将其所有者设置为repmgr用户
[postgres@msp-app01 ~]$ psql -U repmgr -d repmgr 				#以repmgr身份登录到repmgr库
psql (16.0)
Type "help" for help.

repmgr=# alter user repmgr with password '123456';			#修改repmgr密码
ALTER ROLE
repmgr=#
4、配置用户登录数据库免密
# root用户下添加以下内容到~/.pgpass文件,用户、数据库和密码修改为自己的即可

vim /home/postgres/.pgpass

# ip:port:repmgr:repmgr:repmgr
*:*:repmgr:repmgr:123456

# 修改 ~/.pgpass 文件权限
chmod 600 /home/postgres/.pgpass
5、重启生效
[root@msp-app01 src]#  su - postgres -c "pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop"
waiting for server to shut down.... done
server stopped
[root@msp-app01 src]#  su - postgres -c "pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start"
waiting for server to start.... done
server started
6、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}

chown -R postgres:postgres /etc/repmgr

配置文件内容:

vim /etc/repmgr/conf/repmgr.conf
# 节点ID,各节点标识
node_id=1
# 节点名称
node_name='msp-app01'
# 本节点数据库连接信息
conninfo='host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'

# 日志管理
log_level=INFO
# log文件需要提前创建
log_file='/etc/repmgr/log/repmgrd.log'
# 此设置导致repmgrd以指定的时间间隔(以秒为单位,默认为300)发出状态日志行,描述repmgrd的当前状态
log_status_interval=10
 
# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'

# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd  start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'

# failover设置
failover='automatic'
# 当repmgrd确定当前节点将成为新的主节点时,将在故障转移情况下执行promote_command中定义的程序或脚本
promote_command='repmgr standby promote   --log-to-file'
# %n将被替换repmgrd与新的主节点的ID,如果没有提供,repmgr standby follow将尝试自行确定新的主repmgr standby follow节点,但如果在新主节点提升后原主节点重新上线,则存在导致节点继续跟随原主节点的风险
follow_command='repmgr standby follow   --log-to-file --upstream-node-id=%n'
#发生特定事件时要执行的命令,比如升级备用节点为主节点
event_notification_command='bash /usr/local/bin/repmgr_promote_self.sh %n'
#要通知的事件类型,例如 'standby_promote' 表示备用节点升级为主节点时触发通知
event_notifications='standby_promote'

# 高可用参数设置
# 定义节点位置的任意字符串,在故障转移期间用于检查当前主节点的可见性
location='location1'
# 节点优先级,选主时可能使用到(lsn > priority > node_id),0代表该节点不会被提升为主节点
priority=100
# 是否将监控数据写入monitoring_history表
monitoring_history=yes
# 故障转移之前,尝试重新连接的间隔(以秒为单位)
reconnect_interval=5
# 故障转移之前,尝试重新连接的次数
reconnect_attempts=3
# ping: repmg使用PQPing()方法测试连接
# connection: 尝试与节点建立新的连接
# query:通过现有连接在节点上执行SQL语句
connection_check_type=ping
# 写入监控数据的间隔
monitor_interval_secs=5
use_replication_slots=true
7、VIP自动切换脚本
vim /usr/local/bin/repmgr_promote_self.sh
#!/usr/bin/env bash
#虚拟IP地址
VIP=192.168.198.145
#网络接口名称
IFACE=eno16777736
#postgresql的bin目录
PATH=/app/pgsql/bin:$PATH
VIP_CONF="${VIP}/23 dev ${IFACE} label ${IFACE}"
##IPDOWN
#通过SSH连接到另一台服务器上,删除指定的VIP配置
ssh postgres@msp-app02 sudo ip addr del ${VIP_CONF}
sleep 5;
##IPUP
#本地添加VIP
sudo /usr/sbin/ip addr add ${VIP_CONF}
sudo /usr/sbin/arping -b -A -c 3 -I ${IFACE} ${VIP}
exit 0

脚本授权

chmod +x /usr/local/bin/repmgr_promote_self.sh
8、注册本地节点为主服务器节点
su - postgres

repmgr -f /etc/repmgr/conf/repmgr.conf primary register
9、查看集群信息
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
10、数据库查看集群基础信息
[postgres@msp-app01 ~]$  psql -U repmgr -d repmgr
psql (16.0)
Type "help" for help.

repmgr=# \x 1
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+---------------------------------------------------------------------------
node_id          | 1
upstream_node_id | 
active           | t
node_name        | msp-app01
type             | primary
location         | location1
priority         | 100
conninfo         | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser         | repmgr
slot_name        | repmgr_slot_1
config_file      | /etc/repmgr/conf/repmgr.conf
11、启动repmgrd服务

修改postgresql.conf文件,加入 repmgr 共享库。

vim /app/pgsql/data/postgresql.conf

# 添加如下内容
shared_preload_libraries = 'repmgr'

重启数据库

pg_ctl -D /app/pgsql/data/ -l logfile restart

启动repmgrd服务

[postgres@msp-app01 ~]$ repmgrd -f /etc/repmgr/conf/repmgr.conf start
[2024-01-26 12:50:31] [NOTICE] redirecting logging output to "/etc/repmgr/log/repmgrd.log"

#查看/etc/repmgr/log/repmgrd.log启动是否有报错

12、添加VIP

sudo ip addr add 192.168.198.145/23 dev eno16777736

五、备用节点配置

1、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}

chown -R postgres:postgres /etc/repmgr

配置文件内容:

vim /etc/repmgr/conf/repmgr.conf
# 节点ID,各节点标识
node_id=2
# 节点名称
node_name='msp-app02'
# 本节点数据库连接信息
conninfo='host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'

# 日志管理
log_level=INFO
# log文件需要提前创建
log_file='/etc/repmgr/log/repmgrd.log'
# 此设置导致repmgrd以指定的时间间隔(以秒为单位,默认为300)发出状态日志行,描述repmgrd的当前状态
log_status_interval=10
 
# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'

# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd  start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'

# failover设置
failover='automatic'
# 当repmgrd确定当前节点将成为新的主节点时,将在故障转移情况下执行promote_command中定义的程序或脚本
promote_command='repmgr standby promote   --log-to-file'
# %n将被替换repmgrd与新的主节点的ID,如果没有提供,repmgr standby follow将尝试自行确定新的主repmgr standby follow节点,但如果在新主节点提升后原主节点重新上线,则存在导致节点继续跟随原主节点的风险
follow_command='repmgr standby follow   --log-to-file --upstream-node-id=%n'
#发生特定事件时要执行的命令,比如升级备用节点为主节点
event_notification_command='bash /usr/local/bin/repmgr_promote_self.sh %n'
#要通知的事件类型,例如 'standby_promote' 表示备用节点升级为主节点时触发通知
event_notifications='standby_promote'

# 高可用参数设置
# 定义节点位置的任意字符串,在故障转移期间用于检查当前主节点的可见性
location='location1'
# 节点优先级,选主时可能使用到(lsn > priority > node_id),0代表该节点不会被提升为主节点
priority=100
# 是否将监控数据写入monitoring_history表
monitoring_history=yes
# 故障转移之前,尝试重新连接的间隔(以秒为单位)
reconnect_interval=5
# 故障转移之前,尝试重新连接的次数
reconnect_attempts=3
# ping: repmg使用PQPing()方法测试连接
# connection: 尝试与节点建立新的连接
# query:通过现有连接在节点上执行SQL语句
connection_check_type=ping
# 写入监控数据的间隔
monitor_interval_secs=5
use_replication_slots=true
2、VIP自动切换脚本
vim /usr/local/bin/repmgr_promote_self.sh
#!/usr/bin/env bash
#虚拟IP地址
VIP=192.168.198.145
#网络接口名称
IFACE=eno16777736
#postgresql的bin目录
PATH=/app/pgsql/bin:$PATH
VIP_CONF="${VIP}/23 dev ${IFACE} label ${IFACE}"
##IPDOWN
#通过SSH连接到另一台服务器上,删除指定的VIP配置
ssh postgres@msp-app01 sudo ip addr del ${VIP_CONF}
sleep 5;
##IPUP
#本地添加VIP
sudo /usr/sbin/ip addr add ${VIP_CONF}
sudo /usr/sbin/arping -b -A -c 3 -I ${IFACE} ${VIP}
exit 0

脚本授权

chmod +x /usr/local/bin/repmgr_promote_self.sh
3、检查备库是否可克隆
su - postgres

# --dry-run表示命令测试,并不会实际执行,可用于验证是否会出现一些基本错误
repmgr -h msp-app01 -U repmgr -d repmgr -f /etc/repmgr/conf/repmgr.conf standby clone --dry-run

# 实际执行pg的克隆操作
repmgr -h msp-app01 -U repmgr -d repmgr -f /etc/repmgr/conf/repmgr.conf standby clone
4、启动备库
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start
5、注册从库为备用服务器
repmgr -f /etc/repmgr/conf/repmgr.conf --upstream-node-id=1 standby register
6、查看当前集群信息
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show


[postgres@msp-app02 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
 ID | Name      | Role    | Status    | Upstream  | Location  | Priority | Timeline | Connection string                                                         
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
 1  | msp-app01 | primary | * running |           | location1 | 100      | 1        | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 2  | msp-app02 | standby |   running | msp-app01 | location1 | 100      | 1        | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
7、启动repmgrd服务
# 启动 repmgrd 服务
[postgres@msp-app02 ~]$ repmgrd -f /etc/repmgr/conf/repmgr.conf start
[2024-01-26 13:11:12] [NOTICE] redirecting logging output to "/etc/repmgr/log/repmgrd.log"

#查看/etc/repmgr/log/repmgrd.log启动是否有报错

六、见证节点配置

1、启动节点postgres服务
su - postgres

#初始化数据库
initdb -D /app/pgsql/data

#启动
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log start
2、创建repmgr.conf配置文件
mkdir -p /etc/repmgr/{log,conf}

chown -R postgres:postgres /etc/repmgr

配置文件内容:

vim /etc/repmgr/conf/repmgr.conf

# 基本信息
node_id=3
# 节点名称,高可用集群各节点名称
node_name='msp-app03'
# 本节点数据库连接信息
conninfo='host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2'
# pg数据目录
data_directory='/app/pgsql/data'
# 流复制数据库用户,默认使用repmgr
replication_user='repmgr'
# repmgr软件目录
repmgr_bindir='/app/pgsql/bin'
# pg软件目录
pg_bindir='/app/pgsql/bin'            

# 日志管理
log_level=INFO
log_file='/etc/repmgr/log/repmgrd.log'
log_status_interval=10

# pg、repmgr服务管理命令
service_start_command='pg_ctl -D /app/pgsql/data/ start -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_stop_command='pg_ctl -D /app/pgsql/data/ stop -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_restart_command='pg_ctl -D /app/pgsql/data/ restart -o \'-c config_file=/app/pgsql/data/postgresql.conf\' -l /app/pgsql/log/start.log'
service_reload_command='pg_ctl reload'
# repmgrd运行时的pid文件
repmgrd_pid_file='/etc/repmgr/log/repmgrd.pid'
repmgrd_service_start_command='repmgrd -f /etc/repmgr/conf/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /etc/repmgr/log/repmgrd.pid`'

# failover设置
failover='automatic'
promote_command='repmgr standby promote -f /etc/repmgr/conf/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr/conf/repmgr.conf --log-to-file --upstream-node-id=%n'

# 高可用参数设置
location='location1'
priority=100
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
connection_check_type=ping
monitor_interval_secs=5
use_replication_slots=true
3、启动repmgrd服务

修改postgresql.conf,加入repmgr共享库。

vim /app/pgsql/data/postgresql.conf 

# 添加如下内容
listen_addresses = '*'
shared_preload_libraries = 'repmgr'

修改 pg_hba.conf 文件

vim /app/pgsql/data/pg_hba.conf 

# 添加如下内容
host    repmgr        repmgr      0.0.0.0/0               trust

重启数据库,创建repmgr用户和库。

su - postgres

#重启
pg_ctl -D /app/pgsql/data/ -l logfile restart

#创建repmgr用户和库
[postgres@msp-app03 ~]$ psql -U postgres -d postgres
psql (16.0)
Type "help" for help.

postgres=# create user repmgr with password '123456' superuser replication;
CREATE ROLE
postgres=# create database repmgr owner repmgr;
CREATE DATABASE
postgres=# \q

启动repmgrd服务

repmgrd -f /etc/repmgr/conf/repmgr.conf start
4、注册witness
repmgr -f /etc/repmgr/conf/repmgr.conf witness register -h msp-app01 -U repmgr -d repmgr
5、查看集群
repmgr -f /etc/repmgr/conf/repmgr.conf cluster show

[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
 ID | Name      | Role    | Status    | Upstream  | Location  | Priority | Timeline | Connection string                                                         
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
 1  | msp-app01 | primary | * running |           | location1 | 100      | 1        | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 2  | msp-app02 | standby |   running | msp-app01 | location1 | 100      | 1        | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 3  | msp-app03 | witness | * running | msp-app01 | location1 | 0        | n/a      | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
6、集群最终效果
[postgres@msp-app01 pgsql]$  psql -U repmgr -d repmgr
psql (16.0)
Type "help" for help.

repmgr=# \x 1
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+---------------------------------------------------------------------------
node_id          | 1
upstream_node_id | 
active           | t
node_name        | msp-app01
type             | primary
location         | location1
priority         | 100
conninfo         | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser         | repmgr
slot_name        | repmgr_slot_1
config_file      | /etc/repmgr/conf/repmgr.conf
-[ RECORD 2 ]----+---------------------------------------------------------------------------
node_id          | 2
upstream_node_id | 1
active           | t
node_name        | msp-app02
type             | standby
location         | location1
priority         | 100
conninfo         | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser         | repmgr
slot_name        | repmgr_slot_2
config_file      | /etc/repmgr/conf/repmgr.conf
-[ RECORD 3 ]----+---------------------------------------------------------------------------
node_id          | 3
upstream_node_id | 1
active           | t
node_name        | msp-app03
type             | witness
location         | location1
priority         | 0
conninfo         | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2
repluser         | repmgr
slot_name        | repmgr_slot_3
config_file      | /etc/repmgr/conf/repmgr.conf

七、开启守护进程(所有节点)

su - postgres

/app/pgsql/bin/repmgrd -f /etc/repmgr/conf/repmgr.conf -d  -p /etc/repmgr/log/repmgrd.pid

八、模拟故障自动转换

1、查看集群状态
[postgres@msp-app01 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
 ID | Name      | Role    | Status    | Upstream  | Location  | Priority | Timeline | Connection string                                                         
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
 1  | msp-app01 | primary | * running |           | location1 | 100      | 1        | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 2  | msp-app02 | standby |   running | msp-app01 | location1 | 100      | 1        | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 3  | msp-app03 | witness | * running | msp-app01 | location1 | 0        | n/a      | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2

主节点IP信息

[postgres@msp-app01 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:ad:ac:5b brd ff:ff:ff:ff:ff:ff
    inet 192.168.198.146/24 brd 192.168.198.255 scope global dynamic eno16777736
       valid_lft 1390sec preferred_lft 1390sec
    inet 192.168.198.145/23 scope global eno16777736
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fead:ac5b/64 scope link 
       valid_lft forever preferred_lft forever
2、手动关闭主节点(primary),模拟故障宕机
pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop
3、通过见证服务器查看集群信息变化
[postgres@msp-app01 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
 ID | Name      | Role    | Status               | Upstream    | Location  | Priority | Timeline | Connection string                                                         
----+-----------+---------+----------------------+-------------+-----------+----------+----------+----------------------------------------------------------------------------
 1  | msp-app01 | primary | * running            |             | location1 | 100      | 1        | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 2  | msp-app02 | standby | ! running as primary |             | location1 | 100      | 2        | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 3  | msp-app03 | witness | * running            | ! msp-app02 | location1 | 0        | n/a      | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2

#提升新的primary节点完成后,旧的primary节点被至为failed
[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
 ID | Name      | Role    | Status    | Upstream  | Location  | Priority | Timeline | Connection string                                                         
----+-----------+---------+-----------+-----------+-----------+----------+----------+----------------------------------------------------------------------------
 1  | msp-app01 | primary | - failed  | ?         | location1 | 100      |          | host=msp-app01 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 2  | msp-app02 | primary | * running |           | location1 | 100      | 2        | host=msp-app02 user=repmgr dbname=repmgr password=123456 connect_timeout=2
 3  | msp-app03 | witness | * running | msp-app02 | location1 | 0        | n/a      | host=msp-app03 user=repmgr dbname=repmgr password=123456 connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "msp-app01" (ID: 1)

HINT: execute with --verbose option to see connection error messages
4、重新将旧的 primary 节点以 standby 的身份加入到集群中

测试是否能重新加入集群

pg_ctl -D /app/pgsql/data/ -l /app/pgsql/log/start.log stop
repmgr node rejoin -f /etc/repmgr/conf/repmgr.conf -h msp-app02 -U repmgr -d repmgr --dry-run --force-rewind

加入集群

repmgr node rejoin -f /etc/repmgr/conf/repmgr.conf -h msp-app02 -U repmgr -d repmgr --force-rewind
5、查看最终集群状态
[postgres@msp-app02 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:40:16:78 brd ff:ff:ff:ff:ff:ff
    inet 192.168.198.143/24 brd 192.168.198.255 scope global dynamic eno16777736
       valid_lft 1563sec preferred_lft 1563sec
    inet6 fe80::20c:29ff:fe40:1678/64 scope link 
       valid_lft forever preferred_lft forever

查看集群信息

[postgres@msp-app03 ~]$ repmgr -f /etc/repmgr/conf/repmgr.conf cluster show
ID | Name      | Role    | Status    | Upstream  | Location  | Priority | Timeline | Connection string 
----+-----------+---------+-----------+-----------+-----------+----------+----------+------------------
1  | msp-app01 | standby |   running | msp-app02 | location1 | 100      | 4        | host=msp-app01 user=rep
2  | msp-app02 | primary | * running |           | location1 | 100      | 4        | host=msp-app02 user=rep
3  | msp-app03 | witness | * running | msp-app02 | location1 | 0        | n/a      | host=msp-app03 user=rep
posted @ 2024-08-11 15:24  零の守墓人  阅读(68)  评论(0编辑  收藏  举报