
PostgreSQL 14.5 Streaming Replication (Synchronous Replication)

Posted on 2022-09-22 21:57 by 高&玉

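1 Introduction

        The replication feature introduced in PostgreSQL 9.0 supported only asynchronous replication; synchronous replication became available in PostgreSQL 9.1.

        PostgreSQL streaming replication is built on WAL shipping: the primary sends WAL records, and the standby receives and replays them.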

 

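The three main processes in streaming replication:

  • walsender process on the primary: sends WAL to the standby.
  • walreceiver process on the standby: receives the WAL sent by the primary.
  • startup process on the standby: replays the received WAL.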
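
One quick way to see these three processes on a running host is to filter the process list by the titles PostgreSQL gives them. The sketch below assumes a Linux host, the default process title format and no cluster_name setting; the function name repl_procs is made up for this example.

```shell
#!/usr/bin/env bash
# Print only the replication-related PostgreSQL processes from `ps` output on stdin.
# Assumes the default titles "postgres: walsender ...", "postgres: walreceiver ..."
# and "postgres: startup ..." (a cluster_name setting would change the prefix).
repl_procs() {
    grep -E 'postgres: (walsender|walreceiver|startup)( |$)'
}

# Usage on a live host (not run here):
#   ps -eo pid,args | repl_procs
```

On the primary this should list one walsender per connected standby; on a standby it should list one walreceiver and one startup process.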

 

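How do the primary and the standby communicate?

(1) Primary: a backend process calls XLogInsert() and XLogFlush() to write WAL data and flush it to the WAL segment file.

(2) Primary: the walsender process sends the WAL data written to the segment file to the walreceiver process.

(3) Primary: after the WAL data has been sent, the backend process keeps waiting for an ACK from the standby. More precisely, the backend calls the internal function SyncRepWaitForLSN() to acquire a latch and waits for it to be released.

(4) Standby: the walreceiver writes the received WAL data to the standby's WAL segment with the write() system call and returns an ACK to the walsender.

(5) Standby: the walreceiver flushes the WAL data to the WAL segment with a system call such as fsync(), returns another ACK to the walsender, and notifies the startup process that new WAL data is available.

(6) Standby: the startup process replays the WAL data that has been written to the WAL segment.

(7) Primary: after receiving the ACK from the walreceiver, the walsender releases the backend's latch, and the backend then completes its commit or abort. When the latch is released depends on the synchronous_commit parameter: with 'on' (the default) it is released when the ACK of step (5) arrives; with 'remote_write' it is released when the ACK of step (4) arrives.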
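
The relationship between synchronous_commit and the point where the backend stops waiting can be summarised as a small lookup. This is only an illustrative sketch: the function name sync_wait_point is invented here, the values other than 'on' and 'remote_write' come from the PostgreSQL documentation rather than from the steps above, and the table assumes synchronous_standby_names is not empty (otherwise 'on' waits only for the local flush).

```shell
#!/usr/bin/env bash
# Map a synchronous_commit value to the event a committing backend waits for.
# Assumes at least one synchronous standby is configured.
sync_wait_point() {
    case "$1" in
        off)          echo "no wait (not even for the local WAL flush)" ;;
        local)        echo "local WAL flush only" ;;
        remote_write) echo "standby write ACK (step 4)" ;;
        on)           echo "standby flush ACK (step 5)" ;;
        remote_apply) echo "standby replay" ;;
        *)            echo "unknown value: $1" >&2; return 1 ;;
    esac
}
```

For example, `sync_wait_point remote_write` prints the step (4) entry, which trades a little durability on the standby for lower commit latency.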

 

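Each ACK reports the standby's internal state to the primary and contains the following four items:

  • The LSN up to which the latest WAL data has been written.
  • The LSN up to which the latest WAL data has been flushed.
  • The LSN up to which the startup process has replayed the WAL data.
  • The timestamp at which the response was sent.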
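
These four items surface on the primary as columns of the pg_stat_replication view. A minimal sketch of querying them follows, assuming PostgreSQL 12 or later (where reply_time exists) and a psql session that can connect with sufficient privileges; ACK_SQL and show_acks are names chosen for this example.

```shell
#!/usr/bin/env bash
# The four ACK fields as exposed per standby in pg_stat_replication.
ACK_SQL="SELECT application_name, write_lsn, flush_lsn, replay_lsn, reply_time FROM pg_stat_replication;"

# Run the query through psql (-X skips ~/.psqlrc); connection settings are
# taken from the environment (PGHOST, PGUSER, ...), which is an assumption.
show_acks() {
    psql -X -c "$ACK_SQL"
}
```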

2 Installing PostgreSQL 14.5

CentOS 7.6

PostgreSQL 14.5

 

For the installation steps, see: Installing PostgreSQL 14.5 on CentOS

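3 Configuring Streaming Replication (Synchronous Replication)

With synchronous streaming replication, a commit on the primary returns success only after its WAL has been written and flushed on a synchronous standby. As a result, when the only synchronous standby loses its connection, commits on the primary hang. To avoid this, configure at least two standbys for the primary, so that a commit can still be confirmed as long as one of them has received and flushed the WAL.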

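3.1 Configuring a replication slot

        If a standby is down for a long time, it may be unable to resume streaming after a restart because the primary has already removed the WAL it still needs. Replication slots solve this problem; they require max_replication_slots to be greater than 0 (the default in PostgreSQL 14 is 10).

        With a slot in place, the primary keeps its write-ahead log (WAL) files until the standby using the slot has confirmed receipt of the segment; only then does the primary remove the corresponding WAL files. Note that a slot whose standby never reconnects retains WAL indefinitely, so keep an eye on disk usage on the primary.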

 

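Create the replication slot on the primary (a physical slot can serve only one standby at a time, so create one slot per standby)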

SELECT * FROM pg_create_physical_replication_slot('pg_primary');

 

View the replication slots

postgres=# select * from pg_replication_slots;
 slot_name  | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 pg_primary |        | physical  |        |          | t      |     116077 |      |              | 89/A40003E0 | 
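
Because a slot makes the primary hold on to WAL, it is worth checking how much each slot is currently retaining. The sketch below is one possible query wrapped in a shell function; SLOT_SQL and show_slot_retention are hypothetical names, and the query is meant to be run on the primary (pg_current_wal_lsn() is not available during recovery).

```shell
#!/usr/bin/env bash
# WAL retained by each slot: distance from the slot's restart_lsn to the current write position.
SLOT_SQL="SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;"

# Run on the primary; connection settings come from the environment (an assumption).
show_slot_retention() {
    psql -X -c "$SLOT_SQL"
}
```

A slot with active = f and a steadily growing retained_wal value usually belongs to a standby that is down.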

3.2 Enabling archiving

Enable archive mode

Create the archive directory

[postgres]# mkdir $PGDATA/arch_log

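Create the archive script

[postgres]# vi $PGDATA/archive.sh
# $1 = %p (path of the WAL file), $2 = %f (file name); a non-zero exit status makes PostgreSQL keep the file and retry
cp --preserve=timestamps "$1" "$PGDATA/arch_log/$2" || exit 1
find "$PGDATA/arch_log" -type f -mtime +30 -delete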
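
An archive command should report failure whenever the copy did not succeed, and it should never overwrite a file that is already in the archive. A slightly more defensive variant is sketched below as a function; the name archive_wal and its three-argument interface are assumptions made for this example, and `cp -p` is used in place of `--preserve=timestamps` for portability.

```shell
#!/usr/bin/env bash
# archive_wal SRC DEST_DIR NAME
# Copy one WAL segment into DEST_DIR without ever overwriting an existing file.
# Returns non-zero on any failure so that PostgreSQL keeps the segment and retries.
archive_wal() {
    local src=$1 dest_dir=$2 name=$3
    # Refuse to overwrite: a file with this name may already be a good copy.
    [ ! -e "$dest_dir/$name" ] || return 1
    # Copy under a temporary name, then rename, so a partial file is never visible.
    cp -p "$src" "$dest_dir/$name.tmp" || return 1
    mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

Inside archive.sh it could be called as `archive_wal "$1" "$PGDATA/arch_log" "$2"`, with the script exiting with the function's status.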

Set the archiving parameters

[postgres]# vi $PGDATA/postgresql.conf
archive_mode = on
archive_command = '/bin/bash archive.sh %p %f'
archive_timeout = 1800

Restart PostgreSQL so that the archiving parameters take effect

[postgres]# pg_ctl restart

Check that the parameters are in effect

postgres=# select name,setting from pg_settings where name in ('archive_mode','archive_command','archive_timeout');
      name       |          setting           
-----------------+----------------------------
 archive_command | /bin/bash archive.sh %p %f
 archive_mode    | on
 archive_timeout | 1800

Trigger archiving manually by forcing a WAL segment switch

postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 0/4000160

Check that an archived file now exists in the archive directory

[postgres]# ls $PGDATA/arch_log
000000010000000000000004
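
The archived file name can be derived from the LSN returned by the switch. The sketch below reproduces the naming rule (timeline, then the segment number split into two 8-digit hex fields) in plain bash; wal_file_name is a name chosen for this example, and the default of 16 MB assumes wal_segment_size was not changed at initdb time. On a running primary, the built-in function pg_walfile_name() gives the same answer.

```shell
#!/usr/bin/env bash
# wal_file_name TIMELINE LSN [SEGMENT_MB]
# Print the name of the WAL segment file that contains LSN ("hi/lo", hexadecimal).
wal_file_name() {
    local tli=$1 lsn=$2 seg_mb=${3:-16}
    local hi=${lsn%/*} lo=${lsn#*/}
    local bytes=$(( (16#$hi << 32) + 16#$lo ))      # absolute byte position
    local seg_size=$(( seg_mb * 1024 * 1024 ))
    local segno=$(( bytes / seg_size ))             # segment number since the start
    local per_id=$(( 0x100000000 / seg_size ))      # segments per 4 GiB of WAL
    printf '%08X%08X%08X\n' "$tli" $(( segno / per_id )) $(( segno % per_id ))
}
```

`wal_file_name 1 0/4000160` yields 000000010000000000000004, the file listed above.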

3.3 Configuring the primary

Create the replication user

postgres=# create role repl login replication encrypted password 'Your_passwd';
CREATE ROLE

Edit the client authentication file

[postgres]# vi pg_hba.conf
Add the following line:
host replication repl 0.0.0.0/0        md5

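Set the parameters (required)

[postgres]# vi postgresql.conf
listen_addresses = '*'
wal_level = replica
hot_standby = on    # allow read-only queries when this server runs as a standby
synchronous_commit = on    # wait until the synchronous standby has flushed the WAL
synchronous_standby_names = 'host72,host73'    # application_name values; same as FIRST 1 (host72,host73): the first connected standby in the list is synchronous and the next one takes over if it disconnects; use 'ANY 1 (host72,host73)' to accept the ACK of whichever standby answers first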
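
synchronous_standby_names accepts both a priority form (FIRST n (...)) and a quorum form (ANY n (...)). The helper below only assembles such a value from its parts; sync_names is a hypothetical name, and the commented psql calls show one way the value might be applied without a restart, since this parameter can be changed with a configuration reload.

```shell
#!/usr/bin/env bash
# sync_names METHOD NUM NAME...
# Build a synchronous_standby_names value such as "ANY 1 (host72, host73)".
sync_names() {
    local method=$1 num=$2
    shift 2
    local list
    list=$(printf '%s, ' "$@")        # "host72, host73, "
    printf '%s %s (%s)\n' "$method" "$num" "${list%, }"
}

# Applying it on the primary (not run here):
#   psql -X -c "ALTER SYSTEM SET synchronous_standby_names = '$(sync_names ANY 1 host72 host73)'"
#   psql -X -c "SELECT pg_reload_conf()"
```

With FIRST 1 the commit always waits for the highest-priority connected standby; with ANY 1 it returns as soon as either standby has acknowledged.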

Restart PostgreSQL so that the parameters take effect

[postgres]# pg_ctl restart

3.4 Configuring the standby

Remove the existing data on the standby

[postgres]# pg_ctl stop
[postgres]# rm -fr $PGDATA/*

Copy the primary to the standby; the -R option creates a standby.signal file under $PGDATA and writes primary_conninfo to postgresql.auto.conf

[postgres]# pg_basebackup -h 192.168.1.71 -U repl -D $PGDATA -X stream -P -R
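
Before starting the standby it can be useful to confirm that the base backup left the two things -R is supposed to produce. A minimal check, with the function name check_basebackup made up for this sketch:

```shell
#!/usr/bin/env bash
# check_basebackup DATA_DIR
# Verify that `pg_basebackup -R` left a standby.signal file and a
# primary_conninfo entry in postgresql.auto.conf under DATA_DIR.
check_basebackup() {
    local dir=$1
    [ -f "$dir/standby.signal" ] || { echo "missing standby.signal" >&2; return 1; }
    grep -q '^primary_conninfo' "$dir/postgresql.auto.conf" 2>/dev/null \
        || { echo "primary_conninfo not set" >&2; return 1; }
    echo "standby configuration present"
}

# Usage (not run here):  check_basebackup "$PGDATA"
```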

 

Configure the replication parameters on the standby

[postgres]# vi $PGDATA/postgresql.auto.conf
primary_conninfo = 'application_name=host72 user=repl password=Your_passwd channel_binding=disable host=192.168.1.71 port=5432 sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

[postgres]# vi $PGDATA/postgresql.conf
#primary_conninfo = 'host=192.168.1.71 port=5432 user=repl password=Your_passwd'    # already set in postgresql.auto.conf above, so not set here
recovery_target_timeline = 'latest'
primary_slot_name = 'pg_primary'
hot_standby = on    # allow read-only queries on the standby

3.5 Starting streaming replication

Start PostgreSQL on the standby

[postgres]# pg_ctl start

 

View the streaming replication details on the primary

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 22930
usesysid         | 16384
usename          | repl
application_name | host72
client_addr      | 192.168.1.72
client_hostname  | 
client_port      | 54718
backend_start    | 2022-09-22 21:42:14.120187+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6F0003B8
write_lsn        | 0/6F0003B8
flush_lsn        | 0/6F0003B8
replay_lsn       | 0/6F0003B8
write_lag        | 00:00:00.000336
flush_lag        | 00:00:00.000336
replay_lag       | 00:00:00.000336
sync_priority    | 1
sync_state       | sync
reply_time       | 2022-09-22 21:42:14.229005+08
-[ RECORD 2 ]----+------------------------------
pid              | 22930
usesysid         | 16384
usename          | repl
application_name | host72
client_addr      | 192.168.1.73
client_hostname  | 
client_port      | 54718
backend_start    | 2022-09-22 21:42:14.120187+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/6F0003B8
write_lsn        | 0/6F0003B8
flush_lsn        | 0/6F0003B8
replay_lsn       | 0/6F0003B8
write_lag        | 00:00:00.000336
flush_lag        | 00:00:00.000336
replay_lag       | 00:00:00.000336
sync_priority    | 1
sync_state       | sync
reply_time       | 2022-09-22 21:42:14.229005+08

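4 Monitoring streaming replication

On the primary, list the connected standbys; an empty result means streaming replication is broken. (The sample output below comes from an older release, in which the LSN columns are named sent_location, write_location, flush_location and replay_location; PostgreSQL 10 and later name them sent_lsn, write_lsn, flush_lsn and replay_lsn.)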

postgres=# \x on;
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 4007
usesysid         | 16384
usename          | repl
application_name | walreceiver
client_addr      | 192.168.1.72
client_hostname  | host72
client_port      | 34596
backend_start    | 2022-09-21 17:47:09.629914+08
backend_xmin     | 
state            | streaming
sent_location    | 1/F4000060
write_location   | 1/F4000060
flush_location   | 1/F4000060
replay_location  | 1/F4000060
sync_priority    | 0
sync_state       | sync
-[ RECORD 2 ]----+------------------------------
pid              | 116097
usesysid         | 16385
usename          | repl
application_name | walreceiver
client_addr      | 192.168.1.73
client_hostname  | 
client_port      | 45984
backend_start    | 2021-12-20 16:17:58.493769+08
backend_xmin     | 
state            | streaming
sent_location    | 89/A2BE8E10
write_location   | 89/A2BE8E10
flush_location   | 89/A2BE8E10
replay_location  | 89/A2BE8E10
sync_priority    | 0
sync_state       | sync
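
The "empty result means trouble" rule is easy to turn into a scripted check. The sketch below reads unaligned psql output and reports every expected standby that is not streaming; missing_standbys is a hypothetical name, and the standby names host72 and host73 are the application_name values used earlier in this article.

```shell
#!/usr/bin/env bash
# missing_standbys EXPECTED...
# Read "application_name|state" lines on stdin (psql -At output) and print
# every expected standby that is not currently in the "streaming" state.
missing_standbys() {
    local rows name
    rows=$(cat)
    for name in "$@"; do
        printf '%s\n' "$rows" | grep -qxF "$name|streaming" || echo "$name"
    done
}

# Usage on the primary (not run here):
#   psql -X -At -c "SELECT application_name, state FROM pg_stat_replication" \
#       | missing_standbys host72 host73
```

No output means every expected standby is connected and streaming; any printed name is a candidate for an alert.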

 

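On the standby, check whether the server is in recovery: "t" means the current database is a standby; "f" means it is a primary (on a node that is supposed to be a standby, this may also indicate a replication failure that has left two primaries)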

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t

 

Use the pg_controldata command and check "Database cluster state": "in archive recovery" means the current database is a standby; "in production" means it is a primary

[postgres@host72 postgres]$ pg_controldata $PGDATA
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           7122063347532860770
Database cluster state:               in archive recovery
pg_control last modified:             Wed 21 Sep 2022 05:52:09 PM CST
Latest checkpoint location:           1/F4000098
Prior checkpoint location:            1/F0000060
Latest checkpoint's REDO location:    1/F4000060
Latest checkpoint's REDO WAL file:    00000001000000010000003D
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:1762
Latest checkpoint's NextOID:          24576
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1753
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  1762
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Wed 21 Sep 2022 05:51:43 PM CST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     1/F4000108
Min recovery ending loc's timeline:   1
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              500
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
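
For scripted checks only the cluster state line matters. The sketch below extracts it from pg_controldata output and maps it to a role; cluster_state and node_role are names invented for this example, and the parsing assumes the English message locale shown above.

```shell
#!/usr/bin/env bash
# cluster_state: read pg_controldata output on stdin and print the state value.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# node_role: translate the state into "standby", "primary" or "unknown".
node_role() {
    case "$(cluster_state)" in
        "in archive recovery") echo standby ;;
        "in production")       echo primary ;;
        *)                     echo unknown ;;
    esac
}

# Usage (not run here):  pg_controldata "$PGDATA" | node_role
```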

 

On the standby, check the WAL receiver status

postgres=# \x on;
Expanded display is on.
postgres=#  select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 34474
status                | streaming
receive_start_lsn     | 0/70000000
receive_start_tli     | 2
written_lsn           | 0/70000250
flushed_lsn           | 0/70000250
received_tli          | 2
last_msg_send_time    | 2022-09-22 21:55:55.905271+08
last_msg_receipt_time | 2022-09-22 21:55:55.904179+08
latest_end_lsn        | 0/70000250
latest_end_time       | 2022-09-22 21:55:25.866616+08
slot_name             | pg_primary
sender_host           | 192.168.1.71
sender_port           | 5432
conninfo              | user=repl password=******** channel_binding=disable dbname=replication host=192.168.1.71 port=5432 application_name=host72 fallback_application_name=walreceiver sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

 

On the primary, check how many bytes of WAL the standby lags behind; run a sysbench load test to watch the value change

postgres=# select pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) from pg_stat_replication;
 pg_wal_lsn_diff 
-----------------
               0
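
An LSN such as 0/70000250 is a 64-bit byte position written as two hexadecimal halves, so the difference that pg_wal_lsn_diff() reports can be reproduced by hand. A minimal bash sketch follows; lsn_to_bytes and lsn_diff are names chosen for this example.

```shell
#!/usr/bin/env bash
# lsn_to_bytes LSN: convert "hi/lo" (hexadecimal) to an absolute byte position.
lsn_to_bytes() {
    local hi=${1%/*} lo=${1#*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# lsn_diff A B: number of bytes by which A is ahead of B,
# the same quantity pg_wal_lsn_diff(A, B) returns.
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For instance, `lsn_diff 0/70000250 0/70000000` gives 592 bytes, the distance between the written_lsn and receive_start_lsn values shown in the pg_stat_wal_receiver output earlier.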