PostgreSQL集群运维案例之---构建repmgr集群管理

案例说明：
在构建流复制集群后，可以通过repmgr实现集群的管理，并通过repmgr实现集群节点状态监控、switchover、failover切换等管理。

数据库版本：
PostgreSQL 14

集群节点信息：

[root@node101 ~]# cat /etc/hosts
192.168.1.101   node101    ；primary节点
192.168.1.102   node102    ； standby节点

一、部署和配置repmgr

1、配置postgres用户环境变量

[postgres@node102 ~]$ cat .bashrc
export PATH=$PATH:/usr/local/pg14/bin
export PGDATA=/data/pg14/data

2、解压安装repmgr

[root@node101 soft]# chown -R postgres.postgres /data/soft/repmgr-5.3.3
[postgres@node101 repmgr-5.3.3]$ ./configure
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /bin/pg_config
configure: error: repmgr is not compatible with detected PostgreSQL version: 9.2

[postgres@node101 repmgr-5.3.3]$ /bin/pg_config --version
PostgreSQL 9.2.24

---如上所示，在安装过程中调用pg_config，但是默认/bin下pg_config版本不兼容。


配置pg_config：
[root@node101 ~]# mv /bin/pg_config /bin/pg_config.92

# 链接pg14版本
[root@node101 ~]# ln -s /usr/local/pg14/bin/pg_config /bin/
[root@node101 ~]# ls -lh /bin/pg_config
lrwxrwxrwx 1 root root 29 May 17 15:19 /bin/pg_config -> /usr/local/pg14/bin/pg_config
[root@node102 ~]# /bin/pg_config --version
PostgreSQL 14.2


编译安装repmgr：
# 配置编译环境
[postgres@node101 repmgr-5.3.3]$ ./configure
checking for a sed that does not truncate output... /bin/sed
checking for pg_config... /bin/pg_config
configure: building against PostgreSQL 14.2
checking for gnused... no
checking for gsed... no
checking for sed... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating Makefile.global
config.status: creating config.h

# 安装
[postgres@node101 repmgr-5.3.3]$ make install
Building against PostgreSQL 14
......
'/usr/local/pg14/share/postgresql/extension/'
/usr/bin/install -c -m 755 repmgr repmgrd '/usr/local/pg14/bin/'

3、配置repmgr

1）查看可执行文件

[postgres@node101 repmgr-5.3.3]$ ls -l /usr/local/pg14/bin/repmgr*
-rwxr-xr-x 1 postgres postgres 528912 May 17 15:19 /usr/local/pg14/bin/repmgr
-rwxr-xr-x 1 postgres postgres 307960 May 17 15:19 /usr/local/pg14/bin/repmgrd

2）创建repmgr.conf文件

Tips：
编译安装默认不会生成repmgr.conf配置文件,可复制源码包的示例repmgr.conf.sample配置文件到指定目录。在使用repmgr命令需要指定repmgr.conf路径，通过-f/--config-file参数指定路径。如果找不到或无法读取该文件，则会引发错误，并且不会尝试检查默认位置。这是为了防止repmgr意外读取错误的配置文件。为了解决这个问题，可以将repmgr.con创建在pg_conf默认的sysconfidir路径下

[postgres@node101 repmgr-5.3.3]$  pg_config --sysconfdir
/usr/local/pg14/etc/postgresql

[postgres@node101 repmgr-5.3.3]$ mkdir -p /usr/local/pg14/etc/postgresql
[postgres@node101 repmgr-5.3.3]$ cp repmgr.conf.sample /usr/local/pg14/etc/postgresql/repmgr.conf

二、配置repmgr集群管理
1、配置repmgr extension

[postgres@node101 data]$ cat postgresql.conf|grep shared_pre
#shared_preload_libraries = ''  # (change requires restart)
shared_preload_libraries = 'repmgr'

# 支持pg_rewind
[postgres@node101 data]$ cat postgresql.conf |grep wal_log
wal_log_hints = on                      # also do full page writes of non-critical updates

[postgres@node101 bin]$ ./pg_ctl restart -D /data/pg14/data/

[postgres=# show shared_preload_libraries ;
 shared_preload_libraries
--------------------------
 repmgr
(1 row)

二、配置repmgr.conf及主备节点注册
1、repmgr.conf基本配置

主库：
[postgres@node101 postgresql]$ cat repmgr.conf
node_id=1
node_name='node101'
conninfo='host=192.168.1.101 port=5432 user=postgre  dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10


备库：
[postgres@node102 postgresql]$ cat repmgr.conf
node_id=2
node_name='node102'
conninfo='host=192.168.1.102 port=5432 user=postgre  dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10

2、注册主备节点

[postgres@node101 bin]$ ./repmgr primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[postgres@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string             
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres

[postgres@node102 bin]$ ./repmgr standby register
INFO: connecting to local node "node102" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node102" (ID: 2) successfully registered

[postgres@node102 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string             
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres

三、配置集群switchover和failover切换

1、配置repmgr.conf实现主备切换

主库：
node_id=1
node_name='node101'
conninfo='host=192.168.1.101 port=5432 user=postgre  dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
failover='automatic'
promote_command='/usr/local/pg14/bin/repmgr standby promote -f /usr/local/pg14/etc/postgresql/repmgr.conf'
follow_command='/usr/local/pg14/bin/repmgr standby follow -f /usr/local/pg14/etc/postgresql/repmgr.conf --upstream-node-id=%n'
service_start_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data start'
service_stop_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop'
service_restart_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart'
service_reload_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data reload'
repmgrd_service_start_command = '/usr/local/pg14/bin/repmgrd -f /usr/local/pg14/etc/postgresql/repmgr.conf  --pid-file /tmp/repmgrd.pid  --daemonize'
repmgrd_service_stop_command = 'repmgr node service --list-actions --action=stop'

备库：
node_id=2
node_name='node102'
conninfo='host=192.168.1.102 port=5432 user=postgre  dbname=postgres'
data_directory='/data/pg14/data'
log_file='/usr/local/pg14/etc/postgresql/repmgr.log'
connection_check_type=ping
reconnect_attempts=6
reconnect_interval=10
failover='automatic'
promote_command='/usr/local/pg14/bin/repmgr standby promote -f /usr/local/pg14/etc/postgresql/repmgr.conf'
follow_command='/usr/local/pg14/bin/repmgr standby follow -f /usr/local/pg14/etc/postgresql/repmgr.conf --upstream-node-id=%n'
service_start_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data start'
service_stop_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop'
service_restart_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart'
service_reload_command = '/usr/local/pg14/bin/pg_ctl -D /data/pg14/data reload'
repmgrd_service_start_command = '/usr/local/pg14/bin/repmgrd -f /usr/local/pg14/etc/postgresql/repmgr.conf  --pid-file /tmp/repmgrd.pid  --daemonize'
repmgrd_service_stop_command = 'repmgr node service --list-actions --action=stop'

配置repmgr.conf后重启数据库服务：（主备库）
[postgres@node101 bin]$ repmgr node service --action=restart
DETAIL: executing server command "/usr/local/pg14/bin/pg_ctl -D /data/pg14/data restart"

2、启动repmgrd服务（主备库）

Tips：
repmgrd服务会对本节点及primary节点的数据库服务状态执行监控，并在primary节点数据库服务出现故障时，执行failover的切换。

# primary节点
[postgres@node101 bin]$ ./repmgrd -d
[2023-05-17 17:18:25] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-05-17 17:18:25] [INFO] connecting to database "host=192.168.1.101 port=5432 user=postgre  dbname=postgres"
[postgres@node101 bin]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-05-17 17:18:25] [NOTICE] starting monitoring of node "node101" (ID: 1)
[2023-05-17 17:18:25] [INFO] "connection_check_type" set to "ping"
[2023-05-17 17:18:25] [NOTICE] monitoring cluster primary "node101" (ID: 1)
[2023-05-17 17:18:25] [INFO] child node "node102" (ID: 2) is attached

# standby节点
[postgres@node102 bin]$ ./repmgrd -d
[2023-05-17 17:18:39] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
[2023-05-17 17:18:39] [INFO] connecting to database "host=192.168.1.102 port=5432 user=postgre  dbname=postgres"
[postgres@node102 bin]$ INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
[2023-05-17 17:18:39] [NOTICE] starting monitoring of node "node102" (ID: 2)
[2023-05-17 17:18:39] [INFO] "connection_check_type" set to "ping"
[2023-05-17 17:18:39] [INFO] monitoring connection to upstream node "node101" (ID: 1)

---如上所示，在standby节点启动repmgrd服务后，除了监控本节点状态，还要监控upstream节点（primary）节点状态。

检测repmgrd服务状态：

[postgres@node101 bin]$ ./repmgr service status
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 28236 | yes     | n/a
 2  | node102 | standby |   running | node101  | running | 5739  | yes     | 1 second(s) ago

[postgres@node101 bin]$ ./repmgr service unpause
NOTICE: node 1 (node101) unpaused
NOTICE: node 2 (node102) unpaused

[postgres@node101 bin]$ ./repmgr service status
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 28236 | no      | n/a
 2  | node102 | standby |   running | node101  | running | 5739  | no      | 1 second(s) ago

3、执行主备switchover切换

1）切换前集群状态

[postgres@node102 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string              
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | standby |   running | node102  | default  | 100      | 2        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | primary | * running |          | default  | 100      | 2        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres

2）执行switchover切换

切换前测试：

[postgres@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U postgre -d postgres --dry-run
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: checking switchover on node "node101" (ID: 1) in --dry-run mode
INFO: SSH connection to host "192.168.1.102" succeeded
INFO: able to execute "repmgr" on remote host "192.168.1.102"
INFO: 1 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node101" (ID: 1) would be promoted to primary; current primary "node102" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "node102":
  "/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met

执行切换：

[postgres@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U postgre -d postgres
WARNING: following problems with command line parameters detected:
  database connection parameters not required when executing STANDBY SWITCHOVER
NOTICE: executing switchover on node "node101" (ID: 1)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node101" (ID: 1) will be promoted to primary; current primary "node102" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node102" (ID: 2)
NOTICE: issuing CHECKPOINT on node "node102" (ID: 2)
DETAIL: executing server command "/usr/local/pg14/bin/pg_ctl -D /data/pg14/data stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/16000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
NOTICE: node "node101" (ID: 1) promoted to primary, node "node102" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node101" is now primary and node "node102" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

3）切换后集群状态

# 节点状态
[postgres@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string              
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 3        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | standby |   running | node101  | default  | 100      | 2        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres
 
 # 流复制状态
 postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_
start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_
lag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-----------------
--------------+--------------+-----------+------------+------------+------------+------------+-----------+-------
----+------------+---------------+------------+-------------------------------
 24350 |       10 | postgre | node102          | 192.168.1.102 |                 |       10880 | 2023-05-18 15:27
:49.767836+08 |              | streaming | 0/16005BB8 | 0/16005BB8 | 0/16005BB8 | 0/16005BB8 |           |
    |            |             2 | sync       | 2023-05-18 15:28:58.361704+08
(1 row)

# 复制槽信息
postgres=# select * from pg_replication_slots;
  slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |
restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-
------------+---------------------+------------+---------------+-----------
 slot_node101 |        | physical  |        |          | f         | f      |            |  758 |              |
0/13004170  |                     | reserved   |               | f
 slot_node102 |        | physical  |        |          | f         | t      |      24350 |  758 |              |
0/16005BB8  |                     | reserved   |               | f
(2 rows)

4、执行failover切换

1）模拟主库数据库服务故障

[postgres@node101 bin]$ ./pg_ctl stop -D /data/pg14/data/
waiting for server to shut down.... done
server stopped

2）原备库被提升为新主库

[postgres@node102 ~]$ repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string              
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | primary | - failed  | ?        | default  | 100      |          | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | primary | * running |          | default  | 100      | 4        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres

WARNING: following issues were detected
  - unable to connect to node "node101" (ID: 1)
HINT: execute with --verbose option to see connection error messages

3）恢复原主库为新备库

# 创建standby.signal文件：
[postgres@node101 data]$ cat standby.signal
standby_mode=on

# 启动数据库服务：
[postgres@node101 data]$ cd /usr/local/pg14/bin
[postgres@node101 bin]$ ./pg_ctl start -D /data/pg14/data/
......
2023-05-18 16:13:55.215 CST [30199] LOG:  started streaming WAL from primary at 0/1B000000 on timeline 3
2023-05-18 16:13:55.215 CST [30199] LOG:  replication terminated by primary server
2023-05-18 16:13:55.215 CST [30199] DETAIL:  End of WAL reached on timeline 3 at 0/1B0000A0.
2023-05-18 16:13:55.215 CST [30195] LOG:  new target timeline is 4
2023-05-18 16:13:55.216 CST [30199] LOG:  restarted WAL streaming at 0/1B000000 on timeline 4
2023-05-18 16:13:55.252 CST [30195] LOG:  redo starts at 0/1B0000A0
 done
server started

4）查看集群节点状态

[postgres@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string              
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | standby |   running |          | default  | 100      | 3        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | primary | * running |          | default  | 100      | 4        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres

---如上所示，备库node101的upstream为空。

# 重新注册备库节点
[postgres@node101 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node101" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node101" (ID: 1) successfully registered

# 备库节点恢复正常
[postgres@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string              
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
 1  | node101 | standby |   running | node102  | default  | 100      | 3        | host=192.168.1.101 port=5432 user=postgre  dbname=postgres
 2  | node102 | primary | * running |          | default  | 100      | 4        | host=192.168.1.102 port=5432 user=postgre  dbname=postgres

5）查看原备库failover切换日志
如下图所示，在备库repmgr.log中，当备库repmgrd检测到primary数据库服务状态异常超过阈值（6次）后，提升当前standby节点为新主库。

6）failover切换失败案例

如下图所示，repmgrd服务处于pause状态：

四、查看repmgr元数据

postgres=# \d repmgr.*
                                 Table "repmgr.events"
     Column      |           Type           | Collation | Nullable |      Default
-----------------+--------------------------+-----------+----------+-------------------
 node_id         | integer                  |           | not null |
 event           | text                     |           | not null |
 successful      | boolean                  |           | not null | true
 event_timestamp | timestamp with time zone |           | not null | CURRENT_TIMESTAMP
 details         | text                     |           |          |

               Index "repmgr.idx_monitoring_history_time"
      Column       |           Type           | Key? |    Definition
-------------------+--------------------------+------+-------------------
 last_monitor_time | timestamp with time zone | yes  | last_monitor_time
 standby_node_id   | integer                  | yes  | standby_node_id
btree, for table "repmgr.monitoring_history"

                           Table "repmgr.monitoring_history"
          Column           |           Type           | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
 primary_node_id           | integer                  |           | not null |
 standby_node_id           | integer                  |           | not null |
 last_monitor_time         | timestamp with time zone |           | not null |
 last_apply_time           | timestamp with time zone |           |          |
 last_wal_primary_location | pg_lsn                   |           | not null |
 last_wal_standby_location | pg_lsn                   |           |          |
 replication_lag           | bigint                   |           | not null |
 apply_lag                 | bigint                   |           | not null |
Indexes:
    "idx_monitoring_history_time" btree (last_monitor_time, standby_node_id)

                               Table "repmgr.nodes"
      Column      |         Type          | Collation | Nullable |     Default
------------------+-----------------------+-----------+----------+-----------------
 node_id          | integer               |           | not null |
 upstream_node_id | integer               |           |          |
 active           | boolean               |           | not null | true
 node_name        | text                  |           | not null |
 type             | text                  |           | not null |
 location         | text                  |           | not null | 'default'::text
 priority         | integer               |           | not null | 100
 conninfo         | text                  |           | not null |
 repluser         | character varying(63) |           | not null |
 slot_name        | text                  |           |          |
 config_file      | text                  |           | not null |
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (node_id)
Check constraints:
    "nodes_type_check" CHECK (type = ANY (ARRAY['primary'::text, 'standby'::text, 'witness'::text, 'bdr'::text]))
Foreign-key constraints:
    "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE
Referenced by:
    TABLE "repmgr.nodes" CONSTRAINT "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE

       Index "repmgr.nodes_pkey"
 Column  |  Type   | Key? | Definition
---------+---------+------+------------
 node_id | integer | yes  | node_id
primary key, btree, for table "repmgr.nodes"

                           View "repmgr.replication_status"
          Column           |           Type           | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
 primary_node_id           | integer                  |           |          |
 standby_node_id           | integer                  |           |          |
 standby_name              | text                     |           |          |
 node_type                 | text                     |           |          |
 active                    | boolean                  |           |          |
 last_monitor_time         | timestamp with time zone |           |          |
 last_wal_primary_location | pg_lsn                   |           |          |
 last_wal_standby_location | pg_lsn                   |           |          |
 replication_lag           | text                     |           |          |
 replication_time_lag      | interval                 |           |          |
 apply_lag                 | text                     |           |          |
 communication_time_lag    | interval                 |           |          |

                   View "repmgr.show_nodes"
       Column       |  Type   | Collation | Nullable | Default
--------------------+---------+-----------+----------+---------
 node_id            | integer |           |          |
 node_name          | text    |           |          |
 active             | boolean |           |          |
 upstream_node_id   | integer |           |          |
 upstream_node_name | text    |           |          |
 type               | text    |           |          |
 priority           | integer |           |          |
 conninfo           | text    |           |          |

            Table "repmgr.voting_term"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 term   | integer |           | not null |
Indexes:
    "voting_term_restrict" UNIQUE, btree ((true))
Rules:
    voting_term_delete AS
    ON DELETE TO repmgr.voting_term DO INSTEAD NOTHING

 Index "repmgr.voting_term_restrict"
 Column |  Type   | Key? | Definition
--------+---------+------+------------
 bool   | boolean | yes  | (true)
unique, btree, for table "repmgr.voting_term"



postgres=# select * from  repmgr.nodes;
 node_id | upstream_node_id | active | node_name |  type   | location | priority |                          conni
nfo                          | repluser | slot_name |                config_file
---------+------------------+--------+-----------+---------+----------+----------+-------------------------------
-----------------------------+----------+-----------+--------------------------------------------
       1 |                  | t      | node101   | primary | default  |      100 | host=192.168.1.101 port=5432 u
ser=postgre  dbname=postgres | postgre  |           | /usr/local/pg14/etc/postgresql/repmgr.conf
       2 |                1 | t      | node102   | standby | default  |      100 | host=192.168.1.102 port=5432 u
ser=postgre  dbname=postgres | postgre  |           | /usr/local/pg14/etc/postgresql/repmgr.conf
(2 rows)

postgres=# select * from repmgr.events;
 node_id |           event            | successful |        event_timestamp        |
                            details
---------+----------------------------+------------+-------------------------------+-----------------------------
--------------------------------------------------------------------------------------------
       1 | cluster_created            | t          | 2023-05-17 15:55:40.78454+08  |
       1 | primary_register           | t          | 2023-05-17 15:55:40.788429+08 |
       2 | standby_register           | t          | 2023-05-17 15:58:14.183045+08 | standby registration succeed
ed; upstream node ID is 1
       1 | repmgrd_start              | t          | 2023-05-17 17:18:25.048943+08 | monitoring cluster primary "
node101" (ID: 1)
       2 | repmgrd_start              | t          | 2023-05-17 17:18:39.60289+08  | monitoring connection to ups
tream node "node101" (ID: 1)
       2 | repmgrd_upstream_reconnect | t          | 2023-05-17 17:27:55.253956+08 | reconnected to upstream node
 "node101" (ID: 1) after 189 seconds, resuming monitoring
       1 | repmgrd_local_reconnect    | t          | 2023-05-17 17:27:55.658594+08 | reconnected to primary node
after 190 seconds, resuming monitoring
       2 | standby_failure            | t          | 2023-05-17 17:38:12.140075+08 | unable to connect to local n
ode "node102" (ID: 2), marking inactive
       2 | standby_register           | t          | 2023-05-17 17:44:40.272532+08 | standby registration succeed
ed; upstream node ID is 1 (-F/--force option was used)
       1 | repmgrd_start              | t          | 2023-05-17 17:45:57.278488+08 | monitoring cluster primary "
node101" (ID: 1)
       2 | repmgrd_start              | t          | 2023-05-17 17:46:40.726297+08 | monitoring connection to ups
tream node "node101" (ID: 1)
       1 | repmgrd_local_reconnect    | t          | 2023-05-17 17:49:27.452806+08 | reconnected to primary node
after 94 seconds, resuming monitoring
       2 | repmgrd_upstream_reconnect | t          | 2023-05-17 17:49:28.283533+08 | reconnected to upstream node
 "node101" (ID: 1) after 94 seconds, resuming monitoring

.......

posted @ 2023-05-19 16:33 天涯客1224 阅读(299) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

天涯客1224

PostgreSQL集群运维案例之---构建repmgr集群管理

公告