PostgreSQL 13: source installation, streaming replication (primary/standby) setup, switchover, and pg_rewind

Environment:
OS: CentOS 7
DB: PostgreSQL 13.8
Primary: 192.168.1.134
Standby: 192.168.1.135

 

########################################Install PG on primary and standby############################################

Install PostgreSQL on both the primary and the standby. The installation guide is at:

https://www.cnblogs.com/hxlasky/p/16804949.html

  

########################################Primary/standby deployment############################################

1. Create the streaming replication user on the primary
postgres=# CREATE ROLE replica login replication encrypted password 'replica';

 

2. On the primary, edit pg_hba.conf so the standby's IP can connect as the replication user

vi /opt/pg13/data/pg_hba.conf
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     replica         192.168.1.0/24          md5 ## newly added; the whole subnet is allowed here

 

Or allow one specific IP:

 

# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     replica         192.168.1.135/32        md5 ## a single specific IP

 

 

 

 

The configuration must be reloaded afterwards; otherwise the standby's connection is rejected.
[postgres@host134 ~]$ pg_ctl -D /opt/pg13/data reload
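
Before reloading, it can help to confirm that the file really contains the entry you expect. The following is only a minimal sketch and not part of the original setup: `has_replication_entry` is a hypothetical helper that does a literal text match on the user and address columns. It does no CIDR arithmetic and does not understand `+group` or comma-separated user lists.

```shell
# Hypothetical helper: succeed if FILE has a "host... replication ROLE ADDRESS" line.
# Literal comparison only; it does not check whether an IP falls inside a CIDR range.
has_replication_entry() {
    # $1 = pg_hba.conf path, $2 = role name, $3 = address exactly as written in the file
    awk -v role="$2" -v addr="$3" '
        /^[[:space:]]*#/ { next }     # skip comment lines
        $1 ~ /^host/ && $2 == "replication" && $3 == role && $4 == addr { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example, using the path and values from the steps above:
# has_replication_entry /opt/pg13/data/pg_hba.conf replica 192.168.1.0/24 && echo "entry present"
```

A successful match only shows that the line is in the file; the server still has to be reloaded before the entry takes effect.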

 

3. Stop the standby
su - postgres
pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop

 

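4. Prepare the data directory on the standby
Do not initialize the standby after installing it. If it has already been initialized, or an earlier installation left a data directory behind, move that directory out of the way and create an empty directory with the same name.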
su - postgres
[postgres@host135 ~]$ cd /opt/pg13
[postgres@host135 pg13]$ mv data bakdata
[postgres@host135 pg13]$ mkdir data

 

Create the archive directory, matching the primary:

[postgres@host135 pg13]$ mkdir -p /opt/pg13/archivelog


Make sure ownership and permissions are correct. If they are not, fix them as root:
[root@host135 ~]# chown -R postgres:postgres /opt/pg13
[root@host135 ~]# chmod 0700 /opt/pg13/data

 

5. On the standby, take a base backup of the primary
[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Note that the backup command includes the -R option.

[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Password:
pg_basebackup: error: FATAL: no pg_hba.conf entry for replication connection from host "192.168.1.135", user "replica", SSL off

The cause: pg_hba.conf was modified on the primary but not reloaded. Run the following reload on the primary:
pg_ctl -D /opt/pg13/data reload

[postgres@host135 pg13]$ pg_basebackup -h 192.168.1.134 -p 5432 -U replica --password -X stream -Fp --progress -D /opt/pg13/data -R
Password:
32247/32247 kB (100%), 1/1 tablespace

 

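pg_basebackup also copies the primary's postgresql.conf and pg_hba.conf to the standby,
so at this point both files are identical on the primary and the standby.

If the primary runs in archive mode, the standby needs the same archive directory.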

 

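6. The standby is now ready to be started with pg_ctl start
Because the backup was taken with -R, a standby.signal file has been generated automatically in $PGDATA on the standby, and the primary's connection information for streaming replication has been configured for us:
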
[postgres@host135 data]$ ls -1
backup_label
backup_manifest
base
current_logfiles
global
log
pg_commit_ts
pg_dynshmem
pg_hba.conf
pg_ident.conf
pg_logical
pg_multixact
pg_notify
pg_replslot
pg_serial
pg_snapshots
pg_stat
pg_stat_tmp
pg_subtrans
pg_tblspc
pg_twophase
PG_VERSION
pg_wal
pg_xact
postgresql.auto.conf
postgresql.conf
standby.signal

 

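The primary's pg_hba.conf and postgresql.conf were also copied into $PGDATA on the standby, so the configuration files match the primary. Anything that should differ on the standby, such as the port, can be changed now.

In addition, postgresql.auto.conf now holds the primary's connection information for streaming replication: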
[postgres@host135 data]$ more postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replica password=replica channel_binding=disable host=192.168.1.134 port=5432 sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'

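If the backup was taken without -R, the same result can be achieved by hand: create standby.signal on the standby, then edit postgresql.conf (not postgresql.auto.conf) and set the primary's connection information there.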
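
The manual route described above could look roughly like this. It is a minimal sketch under stated assumptions: `configure_standby` is a hypothetical helper name, and the connection string is assumed to contain no single quotes (they would need escaping inside the quoted value).

```shell
# Hypothetical helper: mark a data directory as a standby and point it at a primary.
configure_standby() {
    # $1 = data directory, $2 = libpq connection string (must not contain single quotes)
    touch "$1/standby.signal" || return 1
    # Appending is enough: if a parameter appears more than once
    # in postgresql.conf, the last occurrence is the one used.
    printf "primary_conninfo = '%s'\n" "$2" >> "$1/postgresql.conf"
}

# Example, with the values used in this article:
# configure_standby /opt/pg13/data 'user=replica password=replica host=192.168.1.134 port=5432'
```

Keep in mind that a primary_conninfo already present in postgresql.auto.conf takes precedence over the value in postgresql.conf.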

 

7. Start the standby
pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

Error:
2022-10-19 10:16:25 CST [32043]: [1-1] user=,db=,app=,client=LOG: redirecting log output to logging collector process
2022-10-19 10:16:25 CST [32043]: [2-1] user=,db=,app=,client=HINT: Future log output will appear in directory "/opt/pg13/log".
2022-10-19 10:57:31 CST [3551]: [1-1] user=,db=,app=,client=FATAL: data directory "/opt/pg13/data" has invalid permissions
2022-10-19 10:57:31 CST [3551]: [2-1] user=,db=,app=,client=DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).

Fix:
As root, correct the ownership and permissions
[root@host135 ~]# chown -R postgres:postgres /opt/pg13
[root@host135 ~]# chmod 0700 /opt/pg13/data
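
To catch this before starting the server, the mode can be checked up front. A minimal sketch, assuming GNU `stat` (as on CentOS 7); `check_pgdata_mode` is a hypothetical helper that only looks at the permission bits named in the error message above and does not check ownership.

```shell
# Hypothetical helper: succeed only if the directory mode is 0700 or 0750,
# the two values the server's error message lists as acceptable.
check_pgdata_mode() {
    # $1 = data directory; relies on GNU stat's -c option
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        700|750) return 0 ;;
        *) echo "$1 has mode $mode, expected 700 or 750" >&2
           return 1 ;;
    esac
}

# Example:
# check_pgdata_mode /opt/pg13/data && pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start
```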

 

8. Check replication status on the primary

[postgres@host134 data]$ psql -xc "select * from pg_stat_replication"
-[ RECORD 1 ]----+------------------------------
pid              | 21407
usesysid         | 16397
usename          | replica
application_name | walreceiver
client_addr      | 192.168.1.135
client_hostname  | 
client_port      | 50736
backend_start    | 2022-10-19 10:59:43.465187+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/1B000148
write_lsn        | 0/1B000148
flush_lsn        | 0/1B000148
replay_lsn       | 0/1B000148
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async
reply_time       | 2022-10-19 11:02:13.760907+08
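
The four *_lsn columns are positions in the WAL stream, so the gap between any two of them is a byte count. As a rough illustration (not something the original steps require), an LSN of the form X/Y can be converted to bytes as X * 2^32 + Y, with both parts read as hexadecimal. The function names below are hypothetical.

```shell
# Convert an LSN such as 0/1B000148 into a byte position.
lsn_to_bytes() {
    # high part shifted by 32 bits, plus low part; both are hexadecimal
    echo $(( (0x${1%%/*} << 32) + 0x${1##*/} ))
}

# Byte distance between two LSNs (first minus second).
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# sent_lsn versus replay_lsn from the output above: the standby is fully caught up.
lsn_diff 0/1B000148 0/1B000148   # prints 0
```

On the server side, the built-in function pg_wal_lsn_diff() gives the same difference directly in SQL.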

 

9. Check the processes
Standby processes

[postgres@host135 data]$ ps -ef|grep postgres
postgres  3815     1  0 10:59 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data
postgres  3816  3815  0 10:59 ?        00:00:00 postgres: logger 
postgres  3817  3815  0 10:59 ?        00:00:00 postgres: startup recovering 00000001000000000000001B
postgres  3818  3815  0 10:59 ?        00:00:00 postgres: checkpointer 
postgres  3819  3815  0 10:59 ?        00:00:00 postgres: background writer 
postgres  3820  3815  0 10:59 ?        00:00:00 postgres: stats collector 
postgres  3821  3815  0 10:59 ?        00:00:00 postgres: walreceiver streaming 0/1B000148
postgres  3864 26618  0 11:00 pts/1    00:00:00 ps -ef
postgres  3865 26618  0 11:00 pts/1    00:00:00 grep --color=auto postgres
root     26617 25114  0 09:26 pts/1    00:00:00 su - postgres
postgres 26618 26617  0 09:26 pts/1    00:00:00 -bash

Primary processes

[postgres@host134 data]$ ps -ef|grep postgres
postgres 11073     1  0 Oct18 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data
postgres 11074 11073  0 Oct18 ?        00:00:00 postgres: logger 
postgres 11077 11073  0 Oct18 ?        00:00:00 postgres: checkpointer 
postgres 11078 11073  0 Oct18 ?        00:00:00 postgres: background writer 
postgres 11079 11073  0 Oct18 ?        00:00:00 postgres: walwriter 
postgres 11080 11073  0 Oct18 ?        00:00:00 postgres: autovacuum launcher 
postgres 11081 11073  0 Oct18 ?        00:00:00 postgres: archiver last was 00000001000000000000001A.00000028.backup
postgres 11082 11073  0 Oct18 ?        00:00:01 postgres: stats collector 
postgres 11083 11073  0 Oct18 ?        00:00:00 postgres: logical replication launcher 
postgres 11294 11073  0 Oct18 ?        00:00:00 postgres: postgres postgres 192.168.1.134(40882) idle
postgres 21407 11073  0 10:59 ?        00:00:00 postgres: walsender replica 192.168.1.135(50736) streaming 0/1B000148

 

 

Primary
[postgres@host134 20221021]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in production

 

 

Standby
[postgres@host135 bin]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in archive recovery
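
For scripting, the same check can be wrapped in a small filter. This is a sketch only: `cluster_role` is a hypothetical name, it maps just the two states shown above, and "in archive recovery" is also what a server doing point-in-time recovery reports, so "standby" here is an assumption that holds for this setup.

```shell
# Hypothetical helper: read pg_controldata output on stdin and print a role name.
cluster_role() {
    # extract the value after "Database cluster state:"
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo "unknown ($state)" ;;
    esac
}

# Demonstration on a literal line; on a real host you would pipe in
# the output of: pg_controldata /opt/pg13/data
printf 'Database cluster state:               in archive recovery\n' | cluster_role   # prints standby
```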

 

 

10. Verify the data

Log in to the standby

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres
Password for user postgres: 
psql (13.8)
Type "help" for help.

postgres=# \c db_test;
You are now connected to database "db_test" as user "postgres".
db_test=# select * from tb_test;
 id | name  |         createtime         |         modifytime         
----+-------+----------------------------+----------------------------
  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901
  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863
  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182
  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843
  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502
(5 rows)

Write on the primary:

[postgres@host134 data]$ psql -h 192.168.1.134 -U postgres 
Password for user postgres: 
psql (13.8)
Type "help" for help.

postgres=# \c db_test;
You are now connected to database "db_test" as user "postgres".
db_test=# select * from tb_test;
 id | name  |         createtime         |         modifytime         
----+-------+----------------------------+----------------------------
  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901
  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863
  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182
  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843
  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502
(5 rows)

db_test=# insert into tb_test(name) values('name6');
INSERT 0 1

Query on the standby:

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres
Password for user postgres: 
psql (13.8)
Type "help" for help.

postgres=# \c db_test;
You are now connected to database "db_test" as user "postgres".
db_test=# select * from tb_test;
 id | name  |         createtime         |         modifytime         
----+-------+----------------------------+----------------------------
  1 | name1 | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901
  2 | name2 | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863
  3 | name3 | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182
  4 | name4 | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843
  5 | name5 | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502
  6 | name6 | 2022-10-19 11:04:56.543939 | 2022-10-19 11:04:56.543939
(6 rows)

Try to write on the standby (rejected, because a standby is read-only)
db_test=# insert into tb_test(name) values('name7');
ERROR: cannot execute INSERT in a read-only transaction

Try to force a WAL switch on the standby (also rejected)
db_test=# select pg_switch_wal();
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.

 

 

#############################################Primary/standby switchover###########################################

 

1. Stop the primary to simulate a failure
Run on 192.168.1.134
## check the status
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data status
pg_ctl: server is running (PID: 24009)
/opt/pg13/bin/postgres "-D" "/opt/pg13/data"

 

[postgres@host134 data]$ pg_controldata /opt/pg13/data/| grep 'Database cluster state'
Database cluster state: in production

 

## stop the database
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop -m fast
waiting for server to shut down.... done
server stopped

 

2. Promote the standby to be the new primary and start serving clients
Run on the standby, 192.168.1.135
[postgres@host135 data]$ pg_ctl promote -D /opt/pg13/data
waiting for server to promote.... done
server promoted


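Important 1: the command that turns a standby into the new primary is pg_ctl promote.
After the promotion, the startup recovering and walreceiver streaming processes are gone from the process list,
and a postgres: walwriter process has appeared.

Important 2: $PGDATA/standby.signal has been removed automatically. Its absence tells PostgreSQL that this server is no longer a standby but a primary.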

 

 

3. Remove the primary_conninfo entry on the new primary
Run on 192.168.1.135

This removes the old replication settings, that is, the primary_conninfo entry in postgresql.auto.conf

[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres -p 5432
Password for user postgres: 
psql (13.8)
Type "help" for help.
postgres=# show primary_conninfo;
                      primary_conninfo                      
------------------------------------------------------------
 user=replica password=replica host=192.168.1.135 port=5432
(1 row)

postgres=# alter system set primary_conninfo='';
ALTER SYSTEM

Or reset it to the default. This deletes the entry from postgresql.auto.conf; if postgresql.conf also defines the parameter, the value from that file is used after a restart:
postgres=# alter system set primary_conninfo=default;

Then reload and check again:
[postgres@host135 data]$ pg_ctl -D /opt/pg13/data reload
[postgres@host135 data]$ psql -h 192.168.1.135 -U postgres -p 5432
postgres=# show primary_conninfo;
 primary_conninfo
------------------

(1 row)

 

4. Write data on the new primary
Run on 192.168.1.135

[postgres@host135 data]$ psql -h 192.168.1.135 -U hxl -d db_test -p 5432

insert into tb_test(name) values('name9');
insert into tb_test(name) values('name10');
insert into tb_test(name) values('name11');
insert into tb_test(name) values('name12');
insert into tb_test(name) values('name13');
insert into tb_test(name) values('name14');
insert into tb_test(name) values('name15');
insert into tb_test(name) values('name16');
insert into tb_test(name) values('name17');
insert into tb_test(name) values('name18');
insert into tb_test(name) values('name19');
insert into tb_test(name) values('name20');


db_test=> select * from tb_test;
 id |  name  |         createtime         |         modifytime         
----+--------+----------------------------+----------------------------
  1 | name1  | 2022-10-18 11:32:33.649901 | 2022-10-18 11:32:33.649901
  2 | name2  | 2022-10-18 11:32:33.665863 | 2022-10-18 11:32:33.665863
  3 | name3  | 2022-10-18 11:32:33.691182 | 2022-10-18 11:32:33.691182
  4 | name4  | 2022-10-18 11:32:33.771843 | 2022-10-18 11:32:33.771843
  5 | name5  | 2022-10-18 11:32:34.496502 | 2022-10-18 11:32:34.496502
  6 | name6  | 2022-10-19 11:04:56.543939 | 2022-10-19 11:04:56.543939
  7 | name7  | 2022-10-19 11:25:52.236651 | 2022-10-19 11:25:52.236651
  8 | name8  | 2022-10-20 09:21:51.977815 | 2022-10-20 09:21:51.977815
 41 | name9  | 2022-10-20 14:22:26.326255 | 2022-10-20 14:22:26.326255
 42 | name10 | 2022-10-20 14:22:26.34316  | 2022-10-20 14:22:26.34316
 43 | name11 | 2022-10-20 14:22:26.359988 | 2022-10-20 14:22:26.359988
 44 | name12 | 2022-10-20 14:22:26.433694 | 2022-10-20 14:22:26.433694
 45 | name13 | 2022-10-20 14:22:26.451945 | 2022-10-20 14:22:26.451945
 46 | name14 | 2022-10-20 14:22:26.469966 | 2022-10-20 14:22:26.469966
 47 | name15 | 2022-10-20 14:22:26.482091 | 2022-10-20 14:22:26.482091
 48 | name16 | 2022-10-20 14:22:26.498319 | 2022-10-20 14:22:26.498319
 49 | name17 | 2022-10-20 14:22:26.524554 | 2022-10-20 14:22:26.524554
 50 | name18 | 2022-10-20 14:22:26.555449 | 2022-10-20 14:22:26.555449
 51 | name19 | 2022-10-20 14:22:26.591774 | 2022-10-20 14:22:26.591774
 52 | name20 | 2022-10-20 14:22:27.587955 | 2022-10-20 14:22:27.587955
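
5. Edit pg_hba.conf on the new primary
Run on 192.168.1.135
Edit $PGDATA/pg_hba.conf on the new primary (the former standby, 192.168.1.135) and add an entry that lets the new standby (the former primary, 192.168.1.134) connect as the replica user.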

vi /opt/pg13/data/pg_hba.conf

host replication all 192.168.1.134/32 md5

If access was already granted for the whole subnet, as below, no change is needed:
host replication replica 192.168.1.0/24 md5

A change to pg_hba.conf does not require a restart; a reload is enough
[postgres@host135 data]$ pg_ctl -D /opt/pg13/data reload
server signaled

 

6. Create $PGDATA/standby.signal on the former primary
Run on 192.168.1.134
[postgres@host134 data]$ cd /opt/pg13/data
[postgres@host134 data]$ touch standby.signal

[postgres@host134 data]$ pwd
/opt/pg13/data
[postgres@host134 data]$ ll standby.signal
-rw-rw-r-- 1 postgres postgres 0 Oct 20 14:27 standby.signal

Note: this step is critical. Without this file, the former primary would come back up as a new, independent primary, outside the primary/standby setup.

 

 

 

7. On the former primary, edit $PGDATA/postgresql.conf and add the replication entry
Run on 192.168.1.134
[postgres@host134 data]$ vi postgresql.conf
Add the following line:
primary_conninfo='user=replica password=replica host=192.168.1.135 port=5432'

 

8. Start the former primary so it becomes the new standby
Run on 192.168.1.134

[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

[postgres@host134 data]$ ps -ef|grep postgres
postgres  6975     1  2 15:34 ?        00:00:00 /opt/pg13/bin/postgres -D /opt/pg13/data
postgres  6976  6975  0 15:34 ?        00:00:00 postgres: logger 
postgres  6977  6975  0 15:34 ?        00:00:00 postgres: startup recovering 000000010000000000000007
postgres  6979  6975  0 15:34 ?        00:00:00 postgres: checkpointer 
postgres  6980  6975  0 15:34 ?        00:00:00 postgres: background writer 
postgres  6981  6975  0 15:34 ?        00:00:00 postgres: stats collector 
postgres  6982  6975  0 15:34 ?        00:00:00 postgres: walreceiver idle

The process shows walreceiver idle, which means the former primary could not join the cluster as a standby. Check the error log:

[postgres@host134 log]$ pwd
/opt/pg13/log
[postgres@host134 log]$ tail -2f postgresql-2022-10-21.log
2022-10-21 15:36:39 CST [6982]: [25-1] user=,db=,app=,client=LOG:  primary server contains no more WAL on requested timeline 1
2022-10-21 15:36:39 CST [6977]: [28-1] user=,db=,app=,client=LOG:  new timeline 2 forked off current database system timeline 1 before current recovery point 0/70000A0

 

Fix: stop the server and run pg_rewind against the new primary

[postgres@host134 pg13]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log stop -m fast
waiting for server to shut down.... done
server stopped

[postgres@host134 pg13]$ pg_rewind -D /opt/pg13/data --source-server='host=192.168.1.135 port=5432 user=postgres dbname=postgres password=postgres'
pg_rewind: servers diverged at WAL location 0/7000000 on timeline 1
pg_rewind: error: could not open file "/opt/pg13/data/pg_wal/000000010000000000000006": No such file or directory
pg_rewind: fatal: could not find previous WAL record at 0/6000410

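pg_rewind reports that WAL segment 000000010000000000000006 is missing. Copy the missing file from the archive into the pg_wal directory. If pg_rewind then reports another missing WAL file, copy that one as well and repeat.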
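
The file name pg_rewind asks for can be derived from the timeline and the LSN in its message. The sketch below assumes the default 16 MB wal_segment_size (an assumption; a cluster initialized with a different segment size needs a different value), and `wal_segment_name` is a hypothetical helper name.

```shell
# Hypothetical helper: print the WAL segment file name that contains a given LSN.
wal_segment_name() {
    # $1 = timeline ID (decimal), $2 = LSN such as 0/6000410
    seg_size=$(( 16 * 1024 * 1024 ))               # assumption: default wal_segment_size
    pos=$(( (0x${2%%/*} << 32) + 0x${2##*/} ))     # byte position in the WAL stream
    segno=$(( pos / seg_size ))                    # overall segment number
    per_id=$(( 0x100000000 / seg_size ))           # segments per 4 GB of WAL
    # name = timeline, then the segment number split into two 8-digit hex parts
    printf '%08X%08X%08X\n' "$1" $(( segno / per_id )) $(( segno % per_id ))
}

# "could not find previous WAL record at 0/6000410" on timeline 1:
wal_segment_name 1 0/6000410   # prints 000000010000000000000006
```

This matches the file named in the pg_rewind error, so the same calculation tells you in advance which archived segment to fetch.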

[postgres@host134 20221021]$ pwd
/opt/pg13/archivelog/20221021

[postgres@host134 20221021]$ cp 000000010000000000000006 /opt/pg13/data/pg_wal/

[postgres@host134 20221021]$ pg_rewind -D /opt/pg13/data --source-server='host=192.168.1.135 port=5432 user=postgres dbname=postgres password=postgres'
pg_rewind: servers diverged at WAL location 0/7000000 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/5000060 on timeline 1
pg_rewind: Done!

pg_rewind copies postgresql.auto.conf and postgresql.conf over from the source server (the new primary). primary_conninfo in postgresql.conf therefore has to be set again; review the other parameters and adjust them as needed.

 

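9. On the former primary, edit $PGDATA/postgresql.conf
Run on 192.168.1.134

Add the entry again after pg_rewind. If pg_rewind was not needed, step 7 already added the entry and this step can be skipped.
[postgres@host134 data]$ vi postgresql.conf
Add the following line: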
primary_conninfo='user=replica password=replica host=192.168.1.135 port=5432'

 

10. Recreate the standby.signal file
After pg_rewind the standby.signal file is gone and has to be created again
[postgres@host134 data]$ cd /opt/pg13/data
[postgres@host134 data]$ touch standby.signal

 

11. Start the new standby again
[postgres@host134 data]$ pg_ctl -D /opt/pg13/data -l /opt/pg13/log/postgres.log start

 

12. Verify the data
New standby
psql -h 192.168.1.134 -U hxl -d db_test -p 5432

New primary
psql -h 192.168.1.135 -U hxl -d db_test -p 5432

 

posted @ slnngk