PostgreSQL集群运维案例之---构建repmgr集群管理
案例说明:
在构建流复制集群后,可以通过repmgr实现集群的管理,并通过repmgr实现集群节点状态监控、switchover、failover切换等管理。
数据库版本:
PostgreSQL 14
集群节点信息:
[root@node101 ~]# cat /etc/hosts
192.168.1.101 node101 ;primary节点
192.168.1.102 node102 ; standby节点
一、部署和配置repmgr
1、配置postgres用户环境变量
[postgres@node102 ~]$ cat .bashrc
export PATH=$PATH:/usr/local/pg14/bin
export PGDATA=/data/pg14/data
2、解压安装repmgr
[root@node101 soft]# chown -R postgres.postgres /data/soft/repmgr-5.3.3
[postgres@node101 repmgr-5.3.3]$ ./configure
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /bin/pg_config
configure: error: repmgr is not compatible with detected PostgreSQL version: 9.2
[postgres@node101 repmgr-5.3.3]$ /bin/pg_config --version
PostgreSQL 9.2.24
---如上所示,在安装过程中调用pg_config,但是默认/bin下pg_config版本不兼容。
配置pg_config:
[root@node101 ~]# mv /bin/pg_config /bin/pg_config.92
# 链接pg14版本
[root@node101 ~]# ln -s /usr/local/pg14/bin/pg_config /bin/
[root@node101 ~]# ls -lh /bin/pg_config
lrwxrwxrwx 1 root root 29 May 17 15:19 /bin/pg_config -> /usr/local/pg14/bin/pg_config
[root@node102 ~]# /bin/pg_config --version
PostgreSQL 14.2
编译安装repmgr:
# 配置编译环境
[postgres@node101 repmgr-5.3.3]$ ./configure
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /bin/pg_config
configure: building against PostgreSQL 14.2
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h
# 安装
[postgres@node101 repmgr-5.3.3]$ make install
Building against PostgreSQL 14
......
'/usr/local/pg14/share/postgresql/extension/'
/usr/bin/install -c -m 755 repmgr repmgrd '/usr/local/pg14/bin/'
3、配置repmgr
1)查看可执行文件
[postgres@node101 repmgr-5.3.3]$ ls -l /usr/local/pg14/bin/repmgr*
-rwxr-xr-x 1 postgres postgres 528912 May 17 15:19 /usr/local/pg14/bin/repmgr
-rwxr-xr-x 1 postgres postgres 307960 May 17 15:19 /usr/local/pg14/bin/repmgrd
2)创建repmgr.conf文件
Tips:
编译安装默认不会生成repmgr.conf配置文件,可复制源码包的示例repmgr.conf.sample配置文件到指定目录。在使用repmgr命令需要指定repmgr.conf路径,通过-f/--config-file参数指定路径。如果找不到或无法读取该文件,则会引发错误,并且不会尝试检查默认位置。这是为了防止repmgr意外读取错误的配置文件。为了解决这个问题,可以将repmgr.con创建在pg_conf默认的sysconfidir路径下
[postgres@node101 repmgr-5.3.3]$ pg_config --sysconfdir
/usr/local/pg14/etc/postgresql
[postgres@node101 repmgr-5.3.3]$ mkdir -p /usr/local/pg14/etc/postgresql
[postgres@node101 repmgr-5.3.3]$ cp repmgr.conf.sample /usr/local/pg14/etc/postgresql/repmgr.conf
二、配置repmgr集群管理
1、配置repmgr extension
[postgres@node101 data]$ cat postgresql.conf|grep shared_pre
#shared_preload_libraries = '' # (change requires restart)
shared_preload_libraries = 'repmgr'
# 支持pg_rewind
[postgres@node101 data]$ cat postgresql.conf |grep wal_log
wal_log_hints = on # also do full page writes of non-critical updates
[postgres@node101 bin]$ ./pg_ctl restart -D /data/pg14/data/
[postgres=# show shared_preload_libraries ;
shared_preload_libraries
--------------------------
repmgr
(1 row)
二、配置repmgr.conf及主备节点注册
1、repmgr.conf基本配置
主库:
[postgres@node101 postgresql]$ cat repmgr.conf
node_id=1
node_name='node101'
conninfo='host=192.168.1.101 port=5432 user=postgre dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
备库:
[postgres@node102 postgresql]$ cat repmgr.conf
node_id=2
node_name='node102'
conninfo='host=192.168.1.102 port=5432 user=postgre dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
2、注册主备节点
[postgres@node101 bin]$ ./repmgr primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
[postgres@node102 bin]$ ./repmgr standby register
INFO: connecting to local node "node102" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node102" (ID: 2) successfully registered
[postgres@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
三、配置集群switchover和failover切换
1、配置repmgr.conf实现主备切换
主库:
node_id=1
node_name='node101'
conninfo='host=192.168.1.101 port=5432 user=postgre dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
failover='automatic'
promote_command='/usr/local/pg14/bin/repmgr standby promote -f /usr/local/pg14/etc/postgresql/repmgr.conf'
follow_command='/usr/local/pg14/bin/repmgr standby follow -f /usr/local/pg14/etc/postgresql/repmgr.conf --upstream-node-id=%n'
service_start_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data start'
service_stop_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop'
service_restart_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart'
service_reload_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data reload'
repmgrd_service_start_command = '/usr/local/pg14/bin/repmgrd -f /usr/local/pg14/etc/postgresql/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize'
repmgrd_service_stop_command = 'repmgr node service --list-actions --action=stop'
备库:
node_id=2
node_name='node102'
conninfo='host=192.168.1.102 port=5432 user=postgre dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
failover='automatic'
promote_command='/usr/local/pg14/bin/repmgr standby promote -f /usr/local/pg14/etc/postgresql/repmgr.conf'
follow_command='/usr/local/pg14/bin/repmgr standby follow -f /usr/local/pg14/etc/postgresql/repmgr.conf --upstream-node-id=%n'
service_start_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data start'
service_stop_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop'
service_restart_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart'
service_reload_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data reload'
repmgrd_service_start_command = '/usr/local/pg14/bin/repmgrd -f /usr/local/pg14/etc/postgresql/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize'
repmgrd_service_stop_command = 'repmgr node service --list-actions --action=stop'
配置repmgr.conf后重启数据库服务:(主备库)
[postgres@node101 bin]$ repmgr node service --action=restart
DETAIL: executing server command "/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart"
2、启动repmgrd服务(主备库)
Tips:
repmgrd服务会对本节点及primary节点的数据库服务状态执行监控,并在primary节点数据库服务出现故障时,执行failover的切换。
# primary节点
[postgres@node101 bin]$ ./repmgrd -d
[2023-05-17 17:18:25] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-05-17 17:18:25] [INFO] connecting to database "host=192.168.1.101 port=5432 user=postgre dbname=postgres"
[postgres@node101 bin]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-05-17 17:18:25] [NOTICE] starting monitoring of node "node101" (ID: 1)
[2023-05-17 17:18:25] [INFO] "connection_check_type" set to "ping"
[2023-05-17 17:18:25] [NOTICE] monitoring cluster primary "node101" (ID: 1)
[2023-05-17 17:18:25] [INFO] child node "node102" (ID: 2) is attached
# standby节点
[postgres@node102 bin]$ ./repmgrd -d
[2023-05-17 17:18:39] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-05-17 17:18:39] [INFO] connecting to database "host=192.168.1.102 port=5432 user=postgre dbname=postgres"
[postgres@node102 bin]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-05-17 17:18:39] [NOTICE] starting monitoring of node "node102" (ID: 2)
[2023-05-17 17:18:39] [INFO] "connection_check_type" set to "ping"
[2023-05-17 17:18:39] [INFO] monitoring connection to upstream node "node101" (ID: 1)
---如上所示,在standby节点启动repmgrd服务后,除了监控本节点状态,还要监控upstream节点(primary)节点状态。
检测repmgrd服务状态:
[postgres@node101 bin]$ ./repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node101 | primary | * running | | running | 28236 | yes | n/a
2 | node102 | standby | running | node101 | running | 5739 | yes | 1 second(s) ago
[postgres@node101 bin]$ ./repmgr service unpause
NOTICE: node 1 (node101) unpaused
NOTICE: node 2 (node102) unpaused
[postgres@node101 bin]$ ./repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node101 | primary | * running | | running | 28236 | no | n/a
2 | node102 | standby | running | node101 | running | 5739 | no | 1 second(s) ago
3、执行主备switchover切换
1)切换前集群状态
[postgres@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | standby | running | node102 | default | 100 | 2 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | primary | * running | | default | 100 | 2 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
2)执行switchover切换
切换前测试:
[postgres@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U postgre -d postgres --dry-run
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: checking switchover on node "node101" (ID: 1) in --dry-run mode
INFO: SSH connection to host "192.168.1.102" succeeded
INFO: able to execute "repmgr" on remote host "192.168.1.102"
INFO: 1 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node101" (ID: 1) would be promoted to primary; current primary "node102" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "node102":
"/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
执行切换:
[postgres@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U postgre -d postgres
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: executing switchover on node "node101" (ID: 1)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node101" (ID: 1) will be promoted to primary; current primary "node102" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node102" (ID: 2)
NOTICE: issuing CHECKPOINT on node "node102" (ID: 2)
DETAIL: executing server command "/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/16000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
NOTICE: node "node101" (ID: 1) promoted to primary, node "node102" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node101" is now primary and node "node102" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
3)切换后集群状态
# 节点状态
[postgres@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 3 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | standby | running | node101 | default | 100 | 2 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
# 流复制状态
postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_
start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_
lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-----------------
--------------+--------------+-----------+------------+------------+------------+------------+-----------+-------
----+------------+---------------+------------+-------------------------------
24350 | 10 | postgre | node102 | 192.168.1.102 | | 10880 | 2023-05-18 15:27
:49.767836+08 | | streaming | 0/16005BB8 | 0/16005BB8 | 0/16005BB8 | 0/16005BB8 | |
| | 2 | sync | 2023-05-18 15:28:58.361704+08
(1 row)
# 复制槽信息
postgres=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |
restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-
------------+---------------------+------------+---------------+-----------
slot_node101 | | physical | | | f | f | | 758 | |
0/13004170 | | reserved | | f
slot_node102 | | physical | | | f | t | 24350 | 758 | |
0/16005BB8 | | reserved | | f
(2 rows)
4、执行failover切换
1)模拟主库数据库服务故障
[postgres@node101 bin]$ ./pg_ctl stop -D /data/pg14/data/
waiting for server to shut down.... done
server stopped
2)原备库被提升为新主库
[postgres@node102 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | primary | - failed | ? | default | 100 | | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | primary | * running | | default | 100 | 4 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
WARNING: following issues were detected
- unable to connect to node "node101" (ID: 1)
HINT: execute with --verbose option to see connection error messages
3)恢复原主库为新备库
# 创建standby.signal文件:
[postgres@node101 data]$ cat standby.signal
standby_mode=on
# 启动数据库服务:
[postgres@node101 data]$ cd /usr/local/pg14/bin
[postgres@node101 bin]$ ./pg_ctl start -D /data/pg14/data/
......
2023-05-18 16:13:55.215 CST [30199] LOG: started streaming WAL from primary at 0/1B000000 on timeline 3
2023-05-18 16:13:55.215 CST [30199] LOG: replication terminated by primary server
2023-05-18 16:13:55.215 CST [30199] DETAIL: End of WAL reached on timeline 3 at 0/1B0000A0.
2023-05-18 16:13:55.215 CST [30195] LOG: new target timeline is 4
2023-05-18 16:13:55.216 CST [30199] LOG: restarted WAL streaming at 0/1B000000 on timeline 4
2023-05-18 16:13:55.252 CST [30195] LOG: redo starts at 0/1B0000A0
done
server started
4)查看集群节点状态
[postgres@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | standby | running | | default | 100 | 3 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | primary | * running | | default | 100 | 4 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
---如上所示,备库node101的upstream为空。
# 重新注册备库节点
[postgres@node101 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node101" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node101" (ID: 1) successfully registered
# 备库节点恢复正常
[postgres@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node101 | standby | running | node102 | default | 100 | 3 | host=192.168.1.101 port=5432 user=postgre dbname=postgres
2 | node102 | primary | * running | | default | 100 | 4 | host=192.168.1.102 port=5432 user=postgre dbname=postgres
5)查看原备库failover切换日志
如下图所示,在备库repmgr.log中,当备库repmgrd检测到primary数据库服务状态异常超过阈值(6次)后,提升当前standby节点为新主库。
6)failover切换失败案例
如下图所示,repmgrd服务处于pause状态:
四、查看repmgr元数据
postgres=# \d repmgr.*
Table "repmgr.events"
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+-------------------
node_id | integer | | not null |
event | text | | not null |
successful | boolean | | not null | true
event_timestamp | timestamp with time zone | | not null | CURRENT_TIMESTAMP
details | text | | |
Index "repmgr.idx_monitoring_history_time"
Column | Type | Key? | Definition
-------------------+--------------------------+------+-------------------
last_monitor_time | timestamp with time zone | yes | last_monitor_time
standby_node_id | integer | yes | standby_node_id
btree, for table "repmgr.monitoring_history"
Table "repmgr.monitoring_history"
Column | Type | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
primary_node_id | integer | | not null |
standby_node_id | integer | | not null |
last_monitor_time | timestamp with time zone | | not null |
last_apply_time | timestamp with time zone | | |
last_wal_primary_location | pg_lsn | | not null |
last_wal_standby_location | pg_lsn | | |
replication_lag | bigint | | not null |
apply_lag | bigint | | not null |
Indexes:
"idx_monitoring_history_time" btree (last_monitor_time, standby_node_id)
Table "repmgr.nodes"
Column | Type | Collation | Nullable | Default
------------------+-----------------------+-----------+----------+-----------------
node_id | integer | | not null |
upstream_node_id | integer | | |
active | boolean | | not null | true
node_name | text | | not null |
type | text | | not null |
location | text | | not null | 'default'::text
priority | integer | | not null | 100
conninfo | text | | not null |
repluser | character varying(63) | | not null |
slot_name | text | | |
config_file | text | | not null |
Indexes:
"nodes_pkey" PRIMARY KEY, btree (node_id)
Check constraints:
"nodes_type_check" CHECK (type = ANY (ARRAY['primary'::text, 'standby'::text, 'witness'::text, 'bdr'::text]))
Foreign-key constraints:
"nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE
Referenced by:
TABLE "repmgr.nodes" CONSTRAINT "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE
Index "repmgr.nodes_pkey"
Column | Type | Key? | Definition
---------+---------+------+------------
node_id | integer | yes | node_id
primary key, btree, for table "repmgr.nodes"
View "repmgr.replication_status"
Column | Type | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
primary_node_id | integer | | |
standby_node_id | integer | | |
standby_name | text | | |
node_type | text | | |
active | boolean | | |
last_monitor_time | timestamp with time zone | | |
last_wal_primary_location | pg_lsn | | |
last_wal_standby_location | pg_lsn | | |
replication_lag | text | | |
replication_time_lag | interval | | |
apply_lag | text | | |
communication_time_lag | interval | | |
View "repmgr.show_nodes"
Column | Type | Collation | Nullable | Default
--------------------+---------+-----------+----------+---------
node_id | integer | | |
node_name | text | | |
active | boolean | | |
upstream_node_id | integer | | |
upstream_node_name | text | | |
type | text | | |
priority | integer | | |
conninfo | text | | |
Table "repmgr.voting_term"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
term | integer | | not null |
Indexes:
"voting_term_restrict" UNIQUE, btree ((true))
Rules:
voting_term_delete AS
ON DELETE TO repmgr.voting_term DO INSTEAD NOTHING
Index "repmgr.voting_term_restrict"
Column | Type | Key? | Definition
--------+---------+------+------------
bool | boolean | yes | (true)
unique, btree, for table "repmgr.voting_term"
postgres=# select * from repmgr.nodes;
node_id | upstream_node_id | active | node_name | type | location | priority | conni
nfo | repluser | slot_name | config_file
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------
-----------------------------+----------+-----------+--------------------------------------------
1 | | t | node101 | primary | default | 100 | host=192.168.1.101 port=5432 u
ser=postgre dbname=postgres | postgre | | /usr/local/pg14/etc/postgresql/repmgr.conf
2 | 1 | t | node102 | standby | default | 100 | host=192.168.1.102 port=5432 u
ser=postgre dbname=postgres | postgre | | /usr/local/pg14/etc/postgresql/repmgr.conf
(2 rows)
postgres=# select * from repmgr.events;
node_id | event | successful | event_timestamp |
details
---------+----------------------------+------------+-------------------------------+-----------------------------
--------------------------------------------------------------------------------------------
1 | cluster_created | t | 2023-05-17 15:55:40.78454+08 |
1 | primary_register | t | 2023-05-17 15:55:40.788429+08 |
2 | standby_register | t | 2023-05-17 15:58:14.183045+08 | standby registration succeed
ed; upstream node ID is 1
1 | repmgrd_start | t | 2023-05-17 17:18:25.048943+08 | monitoring cluster primary "
node101" (ID: 1)
2 | repmgrd_start | t | 2023-05-17 17:18:39.60289+08 | monitoring connection to ups
tream node "node101" (ID: 1)
2 | repmgrd_upstream_reconnect | t | 2023-05-17 17:27:55.253956+08 | reconnected to upstream node
"node101" (ID: 1) after 189 seconds, resuming monitoring
1 | repmgrd_local_reconnect | t | 2023-05-17 17:27:55.658594+08 | reconnected to primary node
after 190 seconds, resuming monitoring
2 | standby_failure | t | 2023-05-17 17:38:12.140075+08 | unable to connect to local n
ode "node102" (ID: 2), marking inactive
2 | standby_register | t | 2023-05-17 17:44:40.272532+08 | standby registration succeed
ed; upstream node ID is 1 (-F/--force option was used)
1 | repmgrd_start | t | 2023-05-17 17:45:57.278488+08 | monitoring cluster primary "
node101" (ID: 1)
2 | repmgrd_start | t | 2023-05-17 17:46:40.726297+08 | monitoring connection to ups
tream node "node101" (ID: 1)
1 | repmgrd_local_reconnect | t | 2023-05-17 17:49:27.452806+08 | reconnected to primary node
after 94 seconds, resuming monitoring
2 | repmgrd_upstream_reconnect | t | 2023-05-17 17:49:28.283533+08 | reconnected to upstream node
"node101" (ID: 1) after 94 seconds, resuming monitoring
.......
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· DeepSeek 开源周回顾「GitHub 热点速览」
2022-05-19 KingbaseES V8R6集群维护案例之--集群环境wal日志清理