repmgr搭建一主+一从+一witness的PostgreSQL高可用集群
测试环境说明
环境如下:
OS: ubuntu 20
PostgreSQL: PostgreSQL 16.4
repmgr: repmgr 5.3.3
三台测试环境,均已安装好PostgreSQL 16.4,数据库实例信息如下
IP postgresql实例名 身份
192.168.152.100: postgresql9000: 主节点
192.168.152.101: postgresql9000: 从节点
192.168.152.102: postgresql9000: witness节点
主节点上的操作步骤
###1 repmgr源码编译安装
drwxr-xr-x 4 root root 4096 Nov 5 12:45 ./
drwxr-xr-x 14 root root 4096 Sep 29 16:10 ../
-rw-r--r-- 1 root root 2784 Nov 6 11:28 pg_install.sh
drwxrwxrwx 6 1107 1107 4096 Nov 6 11:31 postgresql-16.4/
-rw-r--r-- 1 root root 32660355 Aug 6 04:19 postgresql-16.4.tar.gz
drwxrwxr-x 6 root root 4096 Nov 7 09:20 repmgr-5.3.3/
-rw-r--r-- 1 root root 456931 Oct 21 17:00 repmgr-5.3.3.tar.gz
root@ubuntu01:/usr/local/pg16#
root@ubuntu01:/usr/local/pg16# tar -xzvf repmgr-5.3.3.tar.gz
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3# cd repmgr-5.3.3/
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3# whereis pg_config
pg_config: /usr/local/pgsql16/server/bin/pg_config
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3#
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3#
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3# ./configure --prefix=/usr/local/pgsql16/server
checking for a sed that does not truncate output... /usr/bin/sed
checking for pg_config... /usr/local/pgsql16/server/bin/pg_config
configure: building against PostgreSQL 16.4
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h
config.status: config.h is unchanged
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3#
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3# make && make install
Building against PostgreSQL 16
make: Nothing to be done for 'all'.
Building against PostgreSQL 16
/usr/bin/mkdir -p '/usr/local/pgsql16/server/lib'
/usr/bin/mkdir -p '/usr/local/pgsql16/server/share/extension'
/usr/bin/mkdir -p '/usr/local/pgsql16/server/share/extension'
/usr/bin/mkdir -p '/usr/local/pgsql16/server/bin'
/usr/bin/install -c -m 755 repmgr.so '/usr/local/pgsql16/server/lib/repmgr.so'
/usr/bin/install -c -m 644 .//repmgr.control '/usr/local/pgsql16/server/share/extension/'
/usr/bin/install -c -m 644 .//repmgr--unpackaged--4.0.sql .//repmgr--unpackaged--5.1.sql .//repmgr--unpackaged--5.2.sql .//repmgr--unpackaged--5.3.sql .//repmgr--4.0.sql .//repmgr--4.0--4.1.sql .//repmgr--4.1.sql .//repmgr--4.1--4.2.sql .//repmgr--4.2.sql .//repmgr--4.2--4.3.sql .//repmgr--4.3.sql .//repmgr--4.3--4.4.sql .//repmgr--4.4.sql .//repmgr--4.4--5.0.sql .//repmgr--5.0.sql .//repmgr--5.0--5.1.sql .//repmgr--5.1.sql .//repmgr--5.1--5.2.sql .//repmgr--5.2.sql .//repmgr--5.2--5.3.sql .//repmgr--5.3.sql '/usr/local/pgsql16/server/share/extension/'
/usr/bin/install -c -m 755 repmgr repmgrd '/usr/local/pgsql16/server/bin/'
root@ubuntu01:/usr/local/pg16/repmgr-5.3.3#
注意:
1,编译后会在/usr/local/pgsql16/server/bin路径下生成repmgrd和repmgr两个文件
2,尝试偷懒,将本地编译的repmgrd和repmgr两个文件复制到其他机器上,绕过编译步骤,但是在启动repmgr的时候会一直提示找不到可执行文件,所以还是老老实实逐台机器编译。
3,提示找不到libpq.so.5,尝试建一个软链接后解决
root@ubuntu01:~# repmgr -V
repmgr: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory
root@ubuntu01:~# psql --version
psql (PostgreSQL) 16.4
root@ubuntu01:~# sudo find / -name libpq.so.5
/usr/local/pgsql16/server/lib/libpq.so.5
/usr/local/pg16/postgresql-16.4/src/interfaces/libpq/libpq.so.5
root@ubuntu01:~#
root@ubuntu01:~# ln -s /usr/local/pgsql16/server/lib/libpq.so.5 /usr/lib/x86_64-linux-gnu/libpq.so.5
root@ubuntu01:~# repmgr -V
repmgr 5.3.3
root@ubuntu01:~#
###2 主节点配置文件修改
wal_level = 'replica'或者'logic'
hot_standby = on
archive_mode = on
archive_command = 'test ! -f /postgres/product/archivedir/%f && cp %p /postgres/product/archivedir/%f'
shared_preload_libraries = 'repmgr'
###3 修改属主
由于是root用户编译的repmgr,所以重新授权
chown -R postgres:postgres /usr/local/pgsql16/
chmod 700 -R /usr/local/pgsql16/
###4 重启数据库
systemdctl restart postgresql9000
###5 创建repmgr数据库
--创建repmgr用户
create user repmgr replication password 'password123456';
--alter user repmgr replication password 'password123456';
alter user repmgr superuser;
create database repmgr owner repmgr;
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
alter user repmgr superuser ;
--进入repmgr数据库
create schema repmgr ;
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
--创建repmgr扩展
create extension repmgr;
--检查扩展
SELECT * FROM pg_extension;
###6 pg_hba.conf新增配置
#TYPE DATABASE USER ADDRESS METHOD [OPTIONS]
host repmgr repmgr 192.168.152.0/24 md5
host repmgr repmgr 192.168.152.0/24 md5
host replication repmgr 192.168.152.0/24 md5
host replication repmgr 192.168.152.0/24 md5
###7 编辑repmgr配置文件
###/usr/local/pgsql16/repmgr/repmgr.conf
node_id=1
node_name='ubuntu01'
conninfo='host=192.168.150.100 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100'
data_directory='/usr/local/pgsql16/pg9000/data'
pg_bindir='/usr/local/pgsql16/server/bin'
failover=automatic
promote_command='/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql16/server/bin/repmgr standby follow -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/usr/local/pgsql16/repmgr/repmgr.log'
###8 编辑密码文件
root@ubuntu01:~# su postgres
postgres@ubuntu01:/root$
postgres@ubuntu01:/root$
postgres@ubuntu01:/root$ cd /home/postgres/
postgres@ubuntu01:~$
postgres@ubuntu01:~$ vi ~/.pgpass
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password12345
对密码文件重新授权,一定一定一定要授权,否则后续节点注册的时候会提示connection to server at "192.168.152.100", port 9000 failed: fe_sendauth: no password supplied
chmod 0600 ~/.pgpass
###9 注册主节点
postgres@ubuntu01:~$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf primary register
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 100) registered
postgres@ubuntu01:~$
此时可以从repmgr数据库的表中查询到已注册的节点
select * from repmgr.repmgr.nodes;
node_id|upstream_node_id|active|node_name|type |location|priority|conninfo |repluser|slot_name|config_file |
-------+----------------+------+---------+-------+--------+--------+----------------------------------------------------------------------------------------------------+--------+---------+-------------------------------------+
100| |true |ubuntu01 |primary|default | 100|host=192.168.152.100 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
从节点上的操作步骤
###1 编译安装repmgr
与主节点一直,略
###2 编辑配置文件
node_id=101
node_name='192.168.152.101'
conninfo='host=192.168.152.101 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100'
data_directory='/usr/local/pgsql16/pg9000/data'
pg_bindir='/usr/local/pgsql16/server/bin'
failover=automatic
promote_command='/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql16/server/bin/repmgr standby follow -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/usr/local/pgsql16/repmgr/repmgr.log'
###3 首先停止从节点上的服务
systemctl stop postgresql9000
备份好当前节点的数据文件,清空数据文件目录内容
###4 编辑密码文件
root@ubuntu01:~# su postgres
postgres@ubuntu01:/root$
postgres@ubuntu01:/root$
postgres@ubuntu01:/root$ cd /home/postgres/
postgres@ubuntu01:~$
postgres@ubuntu01:~$ vi ~/.pgpass
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password12345
对密码文件重新授权,一定一定一定要授权,否则会提示connection to server at "192.168.152.100", port 9000 failed: fe_sendauth: no password supplied
chmod 0600 ~/.pgpass
###5 从主节点clone测试
/usr/local/pgsql16/server/bin/repmgr -h 192.168.152.100 -p 9000 -U repmgr -d repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby clone --dry-run
postgres@ubuntu02:/usr/local/pgsql16$ /usr/local/pgsql16/server/bin/repmgr -h 192.168.152.100 -p 9000 -U repmgr -d repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/usr/local/pgsql16/pg9000/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.152.100 port=9000 user=repmgr dbname=repmgr
DETAIL: current installation size is 28 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested; no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 100
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
/usr/local/pgsql16/server/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/pgsql16/pg9000/data -h 192.168.152.100 -p 9000 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
postgres@ubuntu02:/usr/local/pgsql16$
###6 从主节点clone
postgres@ubuntu02:/usr/local/pgsql16$ /usr/local/pgsql16/server/bin/repmgr -h 192.168.152.100 -p 9000 -U repmgr -d repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/usr/local/pgsql16/pg9000/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.152.100 port=9000 user=repmgr dbname=repmgr
DETAIL: current installation size is 28 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
INFO: checking and correcting permissions on existing directory "/usr/local/pgsql16/pg9000/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/local/pgsql16/server/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/pgsql16/pg9000/data -h 192.168.152.100 -p 9000 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /usr/local/pgsql16/pg9000/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
postgres@ubuntu02:/usr/local/pgsql16$
###7 从节点注册
postgres@ubuntu02:/usr/local/pgsql16$ /usr/local/pgsql16/server/bin/repmgr -f /usr/local/pgsql16/repmgr/repmgr.conf standby register
INFO: connecting to local node "192.168.152.101" (ID: 101)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 100)
INFO: standby registration complete
NOTICE: standby node "192.168.152.101" (ID: 101) successfully registered
postgres@ubuntu02:/usr/local/pgsql16$
###8 主节点上:检查节点以及信息
###主节点上:检查主从复制状态
select * from pg_stat_replication ;
pid |usesysid|usename|application_name|client_addr |client_hostname|client_port|backend_start |backend_xmin|state |sent_lsn |write_lsn|flush_lsn|replay_lsn|write_lag|flush_lag|replay_lag|sync_priority|sync_state
----+--------+-------+----------------+---------------+---------------+-----------+-----------------------------+------------+---------+---------+---------+---------+----------+---------+---------+----------+-------------+----------
2898| 16388|repmgr |192.168.152.101 |192.168.152.101| | 53602|2024-11-07 10:09:48.526447+08| |streaming|0/50007F0|0/50007F0|0/50007F0|0/50007F0 | | | | 0|async
###任何一个节点上:检查集群节点信息
select * from repmgr.nodes;
node_id|upstream_node_id|active|node_name |type |location|priority|conninfo |repluser|slot_name|config_file |
-------+----------------+------+---------------+-------+--------+--------+----------------------------------------------------------------------------------------------------+--------+---------+-------------------------------------+
100| |true |ubuntu01 |primary|default | 100|host=192.168.152.100 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
101| 100|true |192.168.152.101|standby|default | 100|host=192.168.152.101 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
witness 节点的操作步骤
###1 编译安装repmgr
与主节点一直,略
###2 编辑repmgr配置文件
node_id=102
node_name='192.168.152.102'
conninfo='host=192.168.152.102 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100'
data_directory='/usr/local/pgsql16/pg9000/data'
pg_bindir='/usr/local/pgsql16/server/bin'
failover=automatic
promote_command='/usr/local/pgsql16/server/bin/repmgr standby promote -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql16/server/bin/repmgr standby follow -f /usr/local/pgsql16/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/usr/local/pgsql16/repmgr/repmgr.log'
###3 编辑密码文件
###编辑密码文件
postgres@ubuntu03:/usr/local/pg_install_package$ vi ~/.pgpass
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:repmgr:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.100:9000:replication:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:repmgr:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.101:9000:replication:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:repmgr:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password123456
192.168.152.102:9000:replication:repmgr:password123456
###4 编辑配置文件
pg_hba.conf postgresql.conf 参考主节点即可
###5 重启数据库
systemdctl restart postgresql9000
###6 创建repmgr用户
create user repmgr replication password 'password123456';
--alter user repmgr replication password 'password123456';
alter user repmgr superuser;
create database repmgr owner repmgr;
ALTER USER repmgr SET search_path TO repmgr, "$user", public;
alter user repmgr superuser ;
###7 注册witness
postgres@ubuntu03:/usr/local/pg_install_package$ /usr/local/pgsql16/server/bin/repmgr -h 192.168.152.100 -U repmgr -d repmgr -p9000 -f /usr/local/pgsql16/repmgr/repmgr.conf witness register
INFO: connecting to witness node "192.168.152.102" (ID: 102)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "192.168.152.102" (ID: 102) successfully registered
postgres@ubuntu03:/usr/local/pg_install_package$
###8 检查节点信息以及复制状态
###主节点上:检查主从复制状态
pid |usesysid|usename|application_name|client_addr |client_hostname|client_port|backend_start |backend_xmin|state |sent_lsn |write_lsn|flush_lsn|replay_lsn|write_lag|flush_lag|replay_lag|sync_priority|sync_state
----+--------+-------+----------------+---------------+---------------+-----------+-----------------------------+------------+---------+---------+---------+---------+----------+---------+---------+----------+-------------+----------
2898| 16388|repmgr |192.168.152.101 |192.168.152.101| | 53602|2024-11-07 10:09:48.526447+08| |streaming|0/5000FA8|0/5000FA8|0/5000FA8|0/5000FA8 | | | | 0|async
###任何一个节点上:检查集群节点信息
select * from repmgr.nodes;
node_id|upstream_node_id|active|node_name |type |location|priority|conninfo |repluser|slot_name|config_file |
-------+----------------+------+---------------+-------+--------+--------+----------------------------------------------------------------------------------------------------+--------+---------+-------------------------------------+
100| |true |ubuntu01 |primary|default | 100|host=192.168.152.100 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
101| 100|true |192.168.152.101|standby|default | 100|host=192.168.152.101 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
102| 100|true |192.168.152.102|witness|default | 0|host=192.168.152.102 user=repmgr dbname=repmgr password=password123456 port=9000 connect_timeout=100|repmgr | |/usr/local/pgsql16/repmgr/repmgr.conf|
至此,一个最简单的基于repmgr的PostgreSQL流复制高可用环境搭建完成,整个过程虽然不复杂,但是比较繁琐。
另外,本文是1主1从1witness的集群结构,如果是1主多从,可以重复步骤2中的从节点上操作步骤即可。