
Automatic Failover with repmgr and PostgreSQL 16

I. Environment Setup

1.1 Operating system

Kylin V10 SP2 Advanced Server Edition, installed with the GUI option.

1.2 Software

Download locations:
- PostgreSQL: https://www.postgresql.org/ftp/source/ (postgresql-16.6.tar.gz)
- repmgr: https://www.repmgr.org/ (repmgr-5.5.0.tar.gz)

1.3 Configure SELinux

setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
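
The sed pattern has to consume the whole old value. A pattern ending in a single `.` replaces only one character of it, while `.*` replaces everything up to the end of the line. A minimal sketch on a scratch file (the temporary file and its contents are assumptions for illustration, not the real /etc/selinux/config):

```shell
# Scratch file standing in for /etc/selinux/config (illustrative only).
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

# '.' matches one character, so only the "e" of "enforcing" is replaced.
one_char=$(sed 's/^SELINUX=./SELINUX=disabled/' "$cfg" | head -n 1)

# '.*' matches the rest of the line, so the whole value is replaced.
whole_value=$(sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" | head -n 1)

echo "$one_char"      # SELINUX=disablednforcing
echo "$whole_value"   # SELINUX=disabled
rm -f "$cfg"
```

The SELINUXTYPE line is untouched in both cases because the pattern requires `=` directly after `SELINUX`.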

1.4 Disable the firewall

systemctl stop firewalld.service
systemctl disable firewalld

1.5 Install build dependencies

yum -y install readline readline-devel

1.6 Configure the hosts file

echo "192.168.128.130 node01" >> /etc/hosts
echo "192.168.128.131 node02" >> /etc/hosts
echo "192.168.128.132 node03" >> /etc/hosts

Note: run all of the steps above on every server.
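
Appending with `echo >>` adds a duplicate line every time the step is rerun. The sketch below shows one possible idempotent variant. It works on a scratch file rather than /etc/hosts, and `add_host_entry` is a hypothetical helper introduced only for this illustration:

```shell
# Scratch file standing in for /etc/hosts (illustrative only).
hosts_file=$(mktemp)

# Hypothetical helper: append "IP NAME" to file $3 only if that exact line is absent.
add_host_entry() {
    grep -qxF "$1 $2" "$3" || echo "$1 $2" >> "$3"
}

# Running the additions twice still leaves exactly three entries.
for run in 1 2; do
    add_host_entry 192.168.128.130 node01 "$hosts_file"
    add_host_entry 192.168.128.131 node02 "$hosts_file"
    add_host_entry 192.168.128.132 node03 "$hosts_file"
done

entry_count=$(grep -c 'node0' "$hosts_file")
echo "entries: $entry_count"
rm -f "$hosts_file"
```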

II. Install PostgreSQL

2.1 Create the user and directories

Create the user:

useradd postgres
echo 'rdjc12#$' | passwd --stdin postgres

mkdir -p /data/pg_archive
mkdir -p /data/pg_data
mkdir -p /data/pg_log
mkdir -p /data/repmgr_log
chown -R postgres:postgres /data

2.2 Install the database software

tar -xf postgresql-16.6.tar.gz
cd postgresql-16.6
./configure --prefix=/usr/local/pgsql
make -j 4 && make install

2.3 Set environment variables

Add the following to /root/.bashrc and /home/postgres/.bashrc:

export PGPORT=5432
export PGDATA=/data/pg_data
export PGHOME=/usr/local/pgsql
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGUSER=postgres
export PGDATABASE=postgres

III. Install repmgr

Install dependencies

json-c-devel is not included on the ISO image, so download it separately:
https://update.cs2c.com.cn/NS/V10/V10SP2/os/adv/lic/base/x86_64/Packages/json-c-devel-0.15-1.ky10.x86_64.rpm

yum install -y flex curl-devel json-c-devel

Install repmgr

tar -xf repmgr-5.5.0.tar.gz
cd repmgr-5.5.0/
./configure  PG_CONFIG=/usr/local/pgsql/bin/pg_config
make -j 4 && make install

Verify the installed binaries

As root on node01, run:

ls /usr/local/pgsql/bin | grep repmgr

The output should list repmgr and repmgrd.

IV. Edit Configuration Files

4.1 PostgreSQL configuration

4.1.1 Initialize the database

Initialize the database on the primary node (node01) only:

[root@node01 ~]# su - postgres 
[postgres@node01 ~]$ initdb -D /data/pg_data -U postgres -W

4.1.2 Configure postgresql.conf

shared_preload_libraries = 'repmgr'
listen_addresses = '*'
archive_mode = on
archive_command = 'test ! -f /data/pg_archive/%f && cp %p /data/pg_archive/%f'
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
wal_log_hints = on
logging_collector = on
log_statement=ddl
log_destination=stderr
log_directory='/data/pg_log'
log_filename='postgres-%d.log'
log_truncate_on_rotation=on
log_rotation_age=1d
log_rotation_size=10MB
log_line_prefix='%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h'
log_checkpoints=on
log_lock_waits=on
log_autovacuum_min_duration=0
log_temp_files=0
lc_messages='C'
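
The archive_command pattern `test ! -f TARGET && cp SOURCE TARGET` copies a WAL segment only when no file of that name exists in the archive yet, and returns a nonzero status otherwise, which PostgreSQL treats as a failed archive attempt. The sketch below reproduces that behavior with scratch directories. The directories, the segment file, and the `archive_wal` wrapper are assumptions for illustration only:

```shell
# Scratch directories standing in for pg_wal and /data/pg_archive (illustrative only).
wal_dir=$(mktemp -d)
archive_dir=$(mktemp -d)
segment=000000010000000000000001
echo "segment-data" > "$wal_dir/$segment"

# Hypothetical wrapper mirroring the archive_command: $1 plays %p, $2 plays %f.
archive_wal() {
    test ! -f "$archive_dir/$2" && cp "$1" "$archive_dir/$2"
}

# First attempt: the target is absent, so the copy runs and the status is 0.
first_status=0
archive_wal "$wal_dir/$segment" "$segment" || first_status=$?

# Second attempt: the target exists, so test fails and nothing is overwritten.
second_status=0
archive_wal "$wal_dir/$segment" "$segment" || second_status=$?

echo "first=$first_status second=$second_status"   # first=0 second=1
rm -rf "$wal_dir" "$archive_dir"
```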

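4.1.3 Create the repmgr user and database

With the database running (start it with pg_ctl start if it is not already up), create the repmgr superuser and database:

createuser -s repmgr
createdb repmgr -O repmgr
psql -Upostgres -c 'ALTER USER repmgr SET search_path TO repmgr, "$user", public;'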

4.1.4 Configure pg_hba.conf

local   all        all                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      192.168.128.0/24          trust
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      192.168.128.0/24          trust

pg_ctl reload

4.2 repmgr configuration

Create the configuration file on the primary and on both standbys.

Primary configuration file (node01):

vim /usr/local/pgsql/repmgr.conf
# Basic settings
node_id=1                                                             # Node ID, unique for each node in the cluster
node_name='node01'                                                    # Node name, unique for each node in the cluster
conninfo='host=node01 user=repmgr dbname=repmgr connect_timeout=2'    # Connection string for this node's database
data_directory='/data/pg_data/'
replication_user='repmgr'
repmgr_bindir='/usr/local/pgsql/bin/'
pg_bindir='/usr/local/pgsql/bin/'
#shutdown_check_timeout=10

# Logging
log_level=INFO
log_file='/data/repmgr_log/repmgrd.log'
log_status_interval=10

# Failover settings
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'

# High-availability parameters
location='location1'                   # Marks the server's location in multi-datacenter setups; used during failover to check visibility of the current primary
priority=100                           # Node priority, used when electing a new primary (lsn > priority > node_id); 0 means the node is never promoted
monitoring_history=yes                 # Whether to write monitoring data to the "monitoring_history" table
reconnect_interval=5                   # Interval in seconds between reconnection attempts before failover
reconnect_attempts=3                   # Number of reconnection attempts before failover
monitor_interval_secs=2
use_replication_slots=true
connection_check_type=ping          # ping: repmgr tests the connection with PQping()
                                    # connection: attempts to open a new connection to the node
                                    # query: executes an SQL statement on the node over the existing connection
#primary_visibility_consensus=false # Primary visibility consensus: polls each standby (and the witness, if any) for when it last saw the primary; if any standby has seen the primary recently, the primary is assumed to still be available and failover is not started

# Service management commands for PostgreSQL and repmgrd
service_start_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data start -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_stop_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data stop'
service_restart_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data restart -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_reload_command='su - postgres -c \'/usr/local/pgsql/bin/pg_ctl -D /data/pg_data reload\' '
repmgrd_pid_file='/tmp/repmgrd.pid'
repmgrd_service_start_command='/usr/local/pgsql/bin/repmgrd -f /usr/local/pgsql/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`'
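
The two reconnect settings together determine roughly how long repmgrd keeps retrying an unreachable primary before it starts a failover. A rough estimate, under the simplifying assumption that connection timeouts and the monitoring interval are ignored:

```shell
# Values copied from the repmgr.conf settings above.
reconnect_attempts=3
reconnect_interval=5

# Approximate time spent retrying the primary before failover begins.
# Connection timeouts and monitor_interval_secs are ignored here (an assumption).
retry_window=$((reconnect_attempts * reconnect_interval))
echo "approximate retry window: ${retry_window}s"   # approximate retry window: 15s
```

Raising either value makes the cluster more tolerant of brief network interruptions at the cost of a longer outage when the primary is really down.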

Standby configuration file (node02):

vim /usr/local/pgsql/repmgr.conf
# Basic settings
node_id=2                                                             # Node ID, unique for each node in the cluster
node_name='node02'                                                    # Node name, unique for each node in the cluster
conninfo='host=node02 user=repmgr dbname=repmgr connect_timeout=2'    # Connection string for this node's database
data_directory='/data/pg_data/'
replication_user='repmgr'
repmgr_bindir='/usr/local/pgsql/bin/'
pg_bindir='/usr/local/pgsql/bin/'
#shutdown_check_timeout=10

# Logging
log_level=INFO
log_file='/data/repmgr_log/repmgrd.log'
log_status_interval=10

# Failover settings
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'

# High-availability parameters
location='location1'                   # Marks the server's location in multi-datacenter setups; used during failover to check visibility of the current primary
priority=100                           # Node priority, used when electing a new primary (lsn > priority > node_id); 0 means the node is never promoted
monitoring_history=yes                 # Whether to write monitoring data to the "monitoring_history" table
reconnect_interval=5                   # Interval in seconds between reconnection attempts before failover
reconnect_attempts=3                   # Number of reconnection attempts before failover
monitor_interval_secs=2
use_replication_slots=true
connection_check_type=ping          # ping: repmgr tests the connection with PQping()
                                    # connection: attempts to open a new connection to the node
                                    # query: executes an SQL statement on the node over the existing connection
#primary_visibility_consensus=false # Primary visibility consensus: polls each standby (and the witness, if any) for when it last saw the primary; if any standby has seen the primary recently, the primary is assumed to still be available and failover is not started

# Service management commands for PostgreSQL and repmgrd
service_start_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data start -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_stop_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data stop'
service_restart_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data restart -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_reload_command='su - postgres -c \'/usr/local/pgsql/bin/pg_ctl -D /data/pg_data reload\' '
repmgrd_pid_file='/tmp/repmgrd.pid'
repmgrd_service_start_command='/usr/local/pgsql/bin/repmgrd -f /usr/local/pgsql/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`'

Standby configuration file (node03):

vim /usr/local/pgsql/repmgr.conf
# Basic settings
node_id=3                                                             # Node ID, unique for each node in the cluster
node_name='node03'                                                    # Node name, unique for each node in the cluster
conninfo='host=node03 user=repmgr dbname=repmgr connect_timeout=2'    # Connection string for this node's database
data_directory='/data/pg_data/'
replication_user='repmgr'
repmgr_bindir='/usr/local/pgsql/bin/'
pg_bindir='/usr/local/pgsql/bin/'
#shutdown_check_timeout=10

# Logging
log_level=INFO
log_file='/data/repmgr_log/repmgrd.log'
log_status_interval=10

# Failover settings
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /usr/local/pgsql/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /usr/local/pgsql/repmgr.conf --log-to-file --upstream-node-id=%n'

# High-availability parameters
location='location1'                   # Marks the server's location in multi-datacenter setups; used during failover to check visibility of the current primary
priority=100                           # Node priority, used when electing a new primary (lsn > priority > node_id); 0 means the node is never promoted
monitoring_history=yes                 # Whether to write monitoring data to the "monitoring_history" table
reconnect_interval=5                   # Interval in seconds between reconnection attempts before failover
reconnect_attempts=3                   # Number of reconnection attempts before failover
monitor_interval_secs=2
use_replication_slots=true
connection_check_type=ping          # ping: repmgr tests the connection with PQping()
                                    # connection: attempts to open a new connection to the node
                                    # query: executes an SQL statement on the node over the existing connection
#primary_visibility_consensus=false # Primary visibility consensus: polls each standby (and the witness, if any) for when it last saw the primary; if any standby has seen the primary recently, the primary is assumed to still be available and failover is not started

# Service management commands for PostgreSQL and repmgrd
service_start_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data start -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_stop_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data stop'
service_restart_command='/usr/local/pgsql/bin/pg_ctl -D /data/pg_data restart -o \'-c config_file=/data/pg_data/postgresql.conf\' '
service_reload_command='su - postgres -c \'/usr/local/pgsql/bin/pg_ctl -D /data/pg_data reload\' '
repmgrd_pid_file='/tmp/repmgrd.pid'
repmgrd_service_start_command='/usr/local/pgsql/bin/repmgrd -f /usr/local/pgsql/repmgr.conf start'
repmgrd_service_stop_command='kill -9 `cat /tmp/repmgrd.pid`'
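
The three repmgr.conf files differ only in node_id, node_name, and the host in conninfo. Everything else is identical. A small sketch of printing just those node-specific lines, so the shared part can be copied unchanged. `node_settings` is a hypothetical helper written for this illustration and is not part of repmgr:

```shell
# Hypothetical helper: print the three node-specific repmgr.conf lines.
# $1 is the node ID, $2 is the host name used for node_name and conninfo.
node_settings() {
    printf "node_id=%s\n" "$1"
    printf "node_name='%s'\n" "$2"
    printf "conninfo='host=%s user=repmgr dbname=repmgr connect_timeout=2'\n" "$2"
}

# Print the node-specific lines for each of the three servers.
id=1
for host in node01 node02 node03; do
    echo "# --- $host ---"
    node_settings "$id" "$host"
    id=$((id + 1))
done
```

Comparing this output against each server's file is a quick way to catch a copied configuration that still carries another node's ID or host name.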

4.3 Configure passwordless SSH

Set up passwordless SSH for the postgres user on node01, node02, and node03:

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@node01
ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@node02
ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@node03

Verify that passwordless login works:

ssh postgres@node01 date
ssh postgres@node02 date
ssh postgres@node03 date

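V. repmgr Cluster Configuration

5.1 Register the primary node

For repmgr to manage the replication cluster, the primary must be registered with repmgr. Registration installs the repmgr extension and its metadata objects and adds a metadata record for the primary server. Run the following on node01.

1. Switch to the postgres user and register the primary:

[root@node01 ~]# su - postgres
[postgres@node01 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf primary register --force

On success, repmgr reports that the extension was installed and that the primary node record was registered.

2. Check the cluster status:

[postgres@node01 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show

At this point only the primary, node01, is listed, with its node ID, name, role, status, and so on.

3. Inspect the repmgr metadata table:

[postgres@node01 ~]$ psql  -Urepmgr repmgr
psql (16.6)
Type "help" for help.
repmgr=# \x
repmgr=# SELECT * FROM repmgr.nodes;

Each server in the replication cluster has its own record. When repmgrd is running, some fields are updated as a node's status or role changes.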

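5.2 Clone the standby (node02)

1. Test the clone (dry run). On the standby node02, first check whether the primary's data can be cloned:

[root@node02 ~]# su - postgres
[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -h node01 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone --dry-run

If the check passes, repmgr reports, for example, that enough walsenders are available and that replication connections can be made, which means the prerequisites for "standby clone" are met.

2. Clone the data:

[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -h node01 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone

repmgr reports progress such as the start and completion of the backup. Under the hood it uses pg_basebackup to copy the primary's data directory, so the primary's configuration files are copied to the standby as well.

3. Start the database on the standby:

[postgres@node02 ~]$ pg_ctl start

If the standby needs no node-specific configuration changes, the service can be started right away.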

5.3 Clone the standby (node03)

The steps are the same as for node02.

1. Test the clone (dry run):

[root@node03 ~]# su - postgres
[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -h node01 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone --dry-run

If the check passes, repmgr reports that the prerequisites for "standby clone" are met.

2. Clone the data:

[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -h node01 -U repmgr -d repmgr -f /usr/local/pgsql/repmgr.conf standby clone

repmgr again reports the backup progress. Once it finishes, start the database on the standby (for example with pg_ctl start).

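5.4 Check streaming replication

Run the following query in the database on the primary to check streaming replication:

postgres=# select * from pg_stat_replication;

The result shows one row per connected standby, with details such as pid, application name, client address, state, and the LSN reached at each stage.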

5.5 Register the standby nodes

1. Register node02 as a standby:

[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby register --force

repmgr prints a success message when registration completes.

2. Register node03 as a standby:

[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby register --force

repmgr again prints a success message. Afterwards, check the cluster status (/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show); node02 and node03 should now appear in the cluster as standbys.

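VI. Switchover and Failover Tests

6.1 Manual switchover (without --siblings-follow)

1. Check the cluster status before the switchover. The role and status of each node can be viewed with:

[postgres@node01 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show

2. Run a switchover check (dry run) on the standby node02 before performing the switchover:

[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover   --dry-run

The check reports on the SSH connection, the number of walsenders, and whether the switchover can proceed.

3. Switch node02 over to primary:

[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover

The output details each step, including stopping the old primary and promoting the standby. After the switchover succeeds, check the cluster status; node03's upstream may still point at the old primary and need to be changed.

4. Run standby follow on node03:

[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby follow

Check the cluster status again to confirm each node's status and upstream node.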

6.2 Manual switchover (with --siblings-follow)

1. Check the cluster status before the switchover:

[postgres@node02 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show

2. Run a switchover check with --siblings-follow (dry run) on the standby node03:

[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover  --siblings-follow --dry-run

The check reports whether the SSH connections and the resources on each node meet the switchover requirements.

3. Switch node03 over to primary with --siblings-follow:

[postgres@node03 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf standby switchover --siblings-follow

The output details each step. After the switchover succeeds, check the cluster status to confirm each node's role, status, and upstream node.

6.3 Automatic failover with repmgrd

6.3.1 Configure repmgrd

Edit postgresql.conf and make sure it contains shared_preload_libraries = 'repmgr'.

Restart the database:

pg_ctl stop
pg_ctl start

Create the log file:

su - postgres
touch /data/repmgr_log/repmgrd.log

Start the repmgrd service:

su - postgres
repmgrd start -f /usr/local/pgsql/repmgr.conf

Run the steps above on every server.

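6.3.2 Test automatic failover by stopping the primary

Check the initial cluster status:

[postgres@node01 ~]$ /usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show

Stop the primary:

pg_ctl stop

After stopping the primary, check the cluster status and the logs:
Check the cluster status (/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show), and read the repmgrd logs on the primary and the standbys for details. The log on the primary records, for example, the lost connection and the reconnection attempts, while the logs on the standbys record the node status changes.

Start the old primary and check the cluster status again:
Start the old primary (pg_ctl start) and read its log to see how it comes back up, then check the cluster status once more to confirm the state of each node (/usr/local/pgsql/bin/repmgr -f /usr/local/pgsql/repmgr.conf cluster show).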

6.4 Rejoin the stopped old primary to the cluster

Stop the old primary:

pg_ctl stop

Note: the old primary must be shut down before it can rejoin the cluster.

Test rejoining the stopped old primary (dry run). The -d option gives the connection string of a node that is still running in the cluster, node02 in this example:

repmgr -f /usr/local/pgsql/repmgr.conf node rejoin -d 'host=node02 user=repmgr dbname=repmgr' --force-rewind --config-files=/data/pg_data/postgresql.conf,/data/pg_data/postgresql.auto.conf --verbose --dry-run

Rejoin the stopped old primary to the cluster:

repmgr -f /usr/local/pgsql/repmgr.conf node rejoin -d 'host=node02 user=repmgr dbname=repmgr' --force-rewind --config-files=/data/pg_data/postgresql.conf,/data/pg_data/postgresql.auto.conf --verbose

Reference: https://www.cnblogs.com/happy-0824/p/17648599.html

Posted 2024-12-27 11:59 by 正在努力的BOY
