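PostgreSQL HOT-Standby Switchover

This section describes how to switch the primary and standby roles in a PostgreSQL HOT-Standby setup. Switchover in PostgreSQL is not very convenient: unlike Oracle Data Guard, it provides no switchover command. It can still be done, and the official manual says so, but it does not give detailed step-by-step instructions. I ran into quite a few problems while testing today before finally completing the experiment. The detailed procedure follows.

--1 Environment

PostgreSQL version: PostgreSQL 9.2.4
OS: CentOS 7.5
Hardware: two virtual machines on a laptop
Primary IP: 172.16.1.11
Primary port: 5432
Standby IP: 172.16.1.12
Standby port: 5432

Note: this section does not cover how to set up HOT-Standby. For setup, see the earlier blog post: http://francs3.blog.163.com/blog/static/40576727201108864230/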

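--2 How to tell the primary from the standby

People on forums sometimes ask how to tell a primary from a standby. Here are two methods.

--2.1 Method 1: check the host processes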

[postgres@pg1 pg_root]$ ps -ef | grep "wal"
postgres 17715 17684  0 20:41 ?        00:00:00 postgres: wal writer process                 
postgres 17746 17684  0 20:43 ?        00:00:00 postgres: wal sender process repuser 192.168.1.26(43246) streaming 0/700178A8
postgres 17819 17590  0 21:00 pts/2    00:00:00 grep wal           
 
Note: the WAL sender process ("wal sender process") is listed, so this is the primary.
 
[postgres@pgb pg_xlog]$ ps -ef | grep wal
postgres 29436 29386  0 20:43 ?        00:00:00 postgres: wal receiver process   streaming 0/700178A8
postgres 29460 29289  0 21:00 pts/3    00:00:00 grep wal
 
Note: the WAL receiver process ("wal receiver process") is listed, so this is the standby.
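
Method 1 can be scripted. The sketch below is a hypothetical helper (pg_role_from_ps is not a PostgreSQL tool) that reads ps output and reports the role. It assumes the 9.x process titles shown above and also accepts the "walsender"/"walreceiver" spelling used by newer releases. One limitation of this method: a primary with no standby connected has no WAL sender, so it is reported as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess a node's replication role from `ps -ef` output.
# Matches the 9.x titles ("wal sender process", "wal receiver process") and
# the "walsender"/"walreceiver" spelling of newer releases.
pg_role_from_ps() {
  local ps_out
  ps_out=$(cat)   # read the whole process list from stdin
  if grep -Eq 'wal ?receiver' <<<"$ps_out"; then
    echo standby    # a WAL receiver only runs on a standby
  elif grep -Eq 'wal ?sender' <<<"$ps_out"; then
    echo primary    # a WAL sender is streaming to a connected standby
  else
    echo unknown    # e.g. a primary with no standby connected
  fi
}

# Usage: ps -ef | grep '[p]ostgres' | pg_role_from_ps
```

Checking for the receiver first means a node running both processes is still reported as a standby.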

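--2.2 Method 2: check the pg_controldata output

pg_controldata prints the current state of the database cluster. Check the "Database cluster state:" line: "in production" means the server is a primary, and "in archive recovery" means it is a standby.

pg_controldata output on the primary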
[postgres@pg1 pg_root]$ pg_controldata
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5640897481082175487
Database cluster state:               in production     
...
 
pg_controldata output on the standby
[postgres@pgb pg_xlog]$ pg_controldata
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5640897481082175487
Database cluster state:               in archive recovery
...
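
Method 2 can be scripted the same way. The hypothetical helper below parses the "Database cluster state" line. It assumes English (untranslated) pg_controldata output and reports any other state, for example "shut down", as unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a node from pg_controldata output on stdin.
pg_role_from_controldata() {
  local state
  # Keep the text after "Database cluster state:" and trim trailing blanks.
  state=$(sed -n 's/^Database cluster state:[[:space:]]*//p' | sed 's/[[:space:]]*$//')
  case "$state" in
    "in production")       echo primary ;;
    "in archive recovery") echo standby ;;
    *)                     echo "unknown (${state:-no state line})" ;;
  esac
}

# Usage: pg_controldata "$PGDATA" | pg_role_from_controldata
```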

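--3 The recovery.conf file

recovery.conf is the configuration file that controls recovery and standby behavior, and therefore the parameters used during a primary/standby switchover. Copy recovery.conf.sample from the $PGHOME/share directory into the standby's $PGDATA directory. It contains many parameters; only the key ones for switchover are covered here (comments in this file start with '#', as in postgresql.conf):
standby_mode = ''        # marks the server as a standby
primary_conninfo = ''    # connection info for the primary
trigger_file = ''        # path of the trigger file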
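
As a sketch, the parameters above can be rendered by a small shell function; make_recovery_conf and its arguments are hypothetical. It writes comments with '#', adds recovery_target_timeline = 'latest' as used in step 4.4, and leaves the password out of primary_conninfo on the assumption that it is kept in ~/.pgpass instead. This applies to 9.x-era releases; PostgreSQL 12 removed recovery.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal 9.x recovery.conf to stdout.
# Assumes the replication password lives in ~/.pgpass of the OS user
# running the server, so it is not written into primary_conninfo.
make_recovery_conf() {
  local primary_host=$1 primary_port=$2 repl_user=$3 trigger_file=$4
  cat <<EOF
standby_mode = 'on'                     # run as a standby server
primary_conninfo = 'host=${primary_host} port=${primary_port} user=${repl_user}'  # how to reach the primary
trigger_file = '${trigger_file}'        # creating this file promotes the standby
recovery_target_timeline = 'latest'     # follow a new timeline after a switchover
EOF
}

# Usage (on the standby, as postgres):
#   make_recovery_conf 172.16.1.11 5432 repuser \
#     /home/postgres/data/postgresql.trigger.5432 > "$PGDATA/recovery.conf"
```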

--4 Switchover

--4.1 Create recovery.conf on the standby (on slave)

cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf

Set the following parameters:
standby_mode = 'on'  # run this server as a standby
primary_conninfo = 'host=172.16.1.11 port=5432 user=repuser password=repuser '   # connection info for the primary
trigger_file = '/home/postgres/data/postgresql.trigger.5432'     # trigger file path

Restart the standby database.

--4.2 Shut down the primary (on primary)

[postgres@pg1 pg_root]$ pg_ctl stop -m fast -D $PGDATA
waiting for server to shut down....... done
server stopped
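
Because promoting the standby while the old primary is still up causes problems (see the summary), it can help to confirm that the primary really stopped before creating the trigger file. The sketch below relies on pg_ctl status returning exit code 0 only when a server is running; assert_server_stopped and the PG_CTL override variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail while a server is still running in the data
# directory. Relies on "pg_ctl status" exiting 0 only for a running server.
# PG_CTL may point to a specific pg_ctl binary (defaults to pg_ctl on PATH).
assert_server_stopped() {
  local pgdata=$1
  if "${PG_CTL:-pg_ctl}" status -D "$pgdata" >/dev/null 2>&1; then
    echo "server in $pgdata is still running; do not promote the standby yet" >&2
    return 1
  fi
  echo "server in $pgdata is stopped"
}

# Usage on the old primary, after "pg_ctl stop -m fast":
#   assert_server_stopped "$PGDATA" && echo "safe to create the trigger file on the standby"
```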

--4.3 Promote the standby to primary (on slave)

Promoting the standby only requires creating one file: the trigger file named by the trigger_file parameter in the standby's recovery.conf, for example "touch /home/postgres/data/postgresql.trigger.5432".

[postgres@pgb ]$ touch /home/postgres/data/postgresql.trigger.5432
 
After a moment recovery.conf is renamed to recovery.done, which shows that the standby has been promoted.
 
[postgres@pgb data ]$ ll
total 176K
-rw------- 1 postgres postgres  168 Aug 24 10:24 backup_label.old
drwx------ 5 postgres postgres 4.0K Aug 15 10:03 base
drwx------ 2 postgres postgres 4.0K Aug 24 20:50 global
drwx------ 2 postgres postgres 4.0K Aug 15 10:03 pg_clog
-rw------- 1 postgres postgres 4.5K Aug 24 10:39 pg_hba.conf
-rw------- 1 postgres postgres 1.6K Aug 15 10:03 pg_ident.conf
drwx------ 4 postgres postgres 4.0K Aug 15 10:03 pg_multixact
drwx------ 2 postgres postgres 4.0K Aug 24 20:42 pg_notify
drwx------ 2 postgres postgres 4.0K Aug 15 10:03 pg_serial
drwx------ 2 postgres postgres 4.0K Aug 15 10:03 pg_stat_tmp
drwx------ 2 postgres postgres 4.0K Aug 15 10:03 pg_subtrans
drwx------ 2 postgres postgres 4.0K Aug 21 20:21 pg_tblspc
drwx------ 2 postgres postgres 4.0K Aug 15 10:03 pg_twophase
-rw------- 1 postgres postgres    4 Aug 15 10:03 PG_VERSION
drwx------ 3 postgres postgres 4.0K Aug 24 21:20 pg_xlog
-rw------- 1 postgres postgres  19K Aug 24 10:24 postgresql.conf
-rw------- 1 postgres postgres   51 Aug 24 20:42 postmaster.opts
-rw------- 1 postgres postgres   69 Aug 24 20:42 postmaster.pid
-rw-r--r-- 1 postgres postgres 4.7K Aug 24 20:42 recovery.done

--Check the standby's CSV log (promotion to primary in progress)
2011-08-24 21:20:55.130 CST,,,29388,,4e54f1c5.72cc,11,,2011-08-24 20:42:45 CST,1/0,0,LOG,00000,"selected new timeline ID: 6",,,,,,,,,""
2011-08-24 21:20:58.119 CST,,,29388,,4e54f1c5.72cc,12,,2011-08-24 20:42:45 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
2011-08-24 21:20:58.495 CST,,,29386,,4e54f1c3.72ca,5,,2011-08-24 20:42:43 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""

This shows that the former standby is now open and accepts both reads and writes.
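
The promotion step (create the trigger file, then wait for recovery.conf to become recovery.done) can be wrapped in a function. This is a sketch: promote_and_wait is a hypothetical name, the 60-second default timeout is an arbitrary choice, and a recovery.done left over from an earlier promotion would make it return immediately, so remove any stale copy first. PostgreSQL 9.1 and later also offer pg_ctl promote as an alternative to a trigger file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: promote a 9.x standby via its trigger_file and wait
# until the server renames recovery.conf to recovery.done.
promote_and_wait() {
  local pgdata=$1 trigger_file=$2 timeout=${3:-60} waited=0
  touch "$trigger_file" || return 1
  while [ ! -f "$pgdata/recovery.done" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "no recovery.done after ${timeout}s; check the server log" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "promotion finished: $pgdata/recovery.done exists"
}

# Usage (on the standby, as postgres):
#   promote_and_wait "$PGDATA" /home/postgres/data/postgresql.trigger.5432
```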

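--4.4 Turn the old primary into a standby (run on the old primary)

--Create $PGDATA/recovery.conf with the following parameters
recovery_target_timeline = 'latest'   # follow the new primary's latest timeline
standby_mode = 'on'  # run this server as a standby
primary_conninfo = 'host=172.16.1.12 port=5432 user=repuser password=repuser '   # connection info for the (new) primary
trigger_file = '/home/postgres/data/postgresql.trigger.5432'     # trigger file path

--Edit postgresql.conf on the old primary (now the standby)
max_connections = <a value no lower than the primary's>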
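
The max_connections requirement can be checked before starting the new standby: with hot_standby = on, a standby whose max_connections is lower than the primary's refuses to start. The sketch below is hypothetical and deliberately simple: it only reads "max_connections = N" lines in the given file, ignores include directives, lets the last uncommented line win, and falls back to the built-in default of 100 when the setting is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective max_connections of one
# postgresql.conf (last uncommented assignment wins, default 100).
conf_max_connections() {
  awk -F= '/^[ \t]*max_connections[ \t]*=/ {
             v = $2; sub(/#.*/, "", v); gsub(/[ \t]/, "", v); last = v }
           END { print (last == "" ? 100 : last) }' "$1"
}

# Hypothetical helper: succeed only if standby value >= primary value.
check_max_connections() {
  local primary standby
  primary=$(conf_max_connections "$1")
  standby=$(conf_max_connections "$2")
  if [ "$standby" -lt "$primary" ]; then
    echo "standby max_connections=$standby is lower than primary's $primary" >&2
    return 1
  fi
  echo "ok: standby $standby >= primary $primary"
}

# Usage: check_max_connections primary_postgresql.conf "$PGDATA/postgresql.conf"
```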

--Edit pg_hba.conf on the current primary: add the following entry for the old primary, then reload the configuration
host replication repuser 172.16.1.11/32 md5
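
Since the summary recommends preparing pg_hba.conf in advance, the entry can be added idempotently so that the step is safe to repeat. ensure_hba_line is a hypothetical helper; changes to pg_hba.conf take effect only after a reload, for example with pg_ctl reload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a pg_hba.conf line unless the exact line exists.
ensure_hba_line() {
  local hba_file=$1 line=$2
  if grep -qxF -- "$line" "$hba_file"; then
    echo "already present: $line"
  else
    printf '%s\n' "$line" >> "$hba_file"
    echo "added: $line (reload the server to apply it)"
  fi
}

# Usage (on the new primary):
#   ensure_hba_line "$PGDATA/pg_hba.conf" 'host replication repuser 172.16.1.11/32 md5'
#   pg_ctl reload -D "$PGDATA"
```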

--Start the old primary (now the standby)
[postgres@pg1 pg_root]$ pg_ctl start -D $PGDATA
server starting

--The standby's log shows a flood of FATAL errors
2011-08-24 21:31:59.178 CST,,,17889,,4e54fd4f.45e1,1,,2011-08-24 21:31:59 CST,,0,FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5",,,,,,,,,""
2011-08-24 21:32:04.208 CST,,,17891,,4e54fd54.45e3,1,,2011-08-24 21:32:04 CST,,0,FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5",,,,,,,,,""
2011-08-24 21:32:09.135 CST,,,17892,,4e54fd59.45e4,1,,2011-08-24 21:32:09 CST,,0,FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5",,,,,,,,,""
2011-08-24 21:32:14.136 CST,,,17895,,4e54fd5e.45e7,1,,2011-08-24 21:32:14 CST,,0,FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5",,,,,,,,,""
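Note: the log is flooded with FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5" errors.
This looked like a timeline problem. Searching online turned up nothing useful, so I asked 德哥, who said it is enough to copy one file, the timeline history file, from the new primary into the standby's $PGDATA/pg_xlog. Before PostgreSQL 9.3, streaming replication cannot transfer timeline history files, so the standby cannot learn about timeline 6 on its own.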

--Copy 00000006.history from the primary's pg_xlog to the standby
[postgres@pgb pg_xlog]$ scp 00000006.history postgres@172.16.1.11:/home/postgres/data/pg_xlog
postgres@172.16.1.11's password:
00000006.history

--Check the standby's log again
2011-08-24 21:36:04.819 CST,,,17948,,4e54fe44.461c,1,,2011-08-24 21:36:04 CST,,0,FATAL,XX000,"timeline 6 of the primary does not match recovery target timeline 5",,,,,,,,,""
2011-08-24 21:36:09.742 CST,,,17885,,4e54fd44.45dd,5,,2011-08-24 21:31:48 CST,1/0,0,LOG,00000,"new target timeline is 6",,,,,,,,,""
2011-08-24 21:36:09.824 CST,,,17977,,4e54fe49.4639,1,,2011-08-24 21:36:09 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
 
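Note: the log shows that the standby has switched to timeline 6 and connected to the primary, so replication is working again.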
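
The history file to copy is named after the new timeline ID reported in the promotion log ("selected new timeline ID: 6"): eight uppercase hexadecimal digits followed by .history. The sketch below derives that name and copies the file with scp; history_file_name, copy_history_file and the target format are hypothetical, and the pg_xlog directory is called pg_wal from PostgreSQL 10 on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the timeline fix in step 4.4.

# Timeline history files are named %08X.history (8 uppercase hex digits).
history_file_name() {
  printf '%08X.history\n' "$1"
}

# Copy the history file for timeline $1 from this node's pg_xlog to $2,
# e.g. postgres@172.16.1.11:/home/postgres/data/pg_xlog. Requires PGDATA.
copy_history_file() {
  local tli=$1 target=$2 f
  [ -n "$PGDATA" ] || { echo "PGDATA is not set" >&2; return 1; }
  f=$(history_file_name "$tli")
  scp "$PGDATA/pg_xlog/$f" "$target/"
}

# Usage on the new primary:
#   copy_history_file 6 postgres@172.16.1.11:/home/postgres/data/pg_xlog
```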

--4.5 Test

Create a table on the primary
postgres=# \c skytf skytf
You are now connected to database "skytf" as user "skytf".
skytf=> \d
               List of relations
 Schema |        Name        | Type  |  Owner   
--------+--------------------+-------+----------
 public | pg_stat_statements | view  | postgres
 skytf  | pgbench_accounts   | table | skytf
 skytf  | pgbench_branches   | table | skytf
 skytf  | pgbench_history    | table | skytf
 skytf  | pgbench_tellers    | table | skytf
 skytf  | test_stadnby       | table | skytf
(16 rows)
 
skytf=> create table test_11 (id integer,name varchar(32));
CREATE TABLE
 
skytf=> \d
               List of relations
 Schema |        Name        | Type  |  Owner   
--------+--------------------+-------+----------
 public | pg_stat_statements | view  | postgres
 skytf  | pgbench_accounts   | table | skytf
 skytf  | pgbench_branches   | table | skytf
 skytf  | pgbench_history    | table | skytf
 skytf  | pgbench_tellers    | table | skytf
 skytf  | test_11            | table | skytf
 skytf  | test_stadnby       | table | skytf
(17 rows)
 
Query on the standby
[postgres@pgb ]$ psql
psql (9.1beta3)
Type "help" for help.
 
postgres=# \c skytf skytf
skytf=> \d
               List of relations
 Schema |        Name        | Type  |  Owner   
--------+--------------------+-------+----------
 public | pg_stat_statements | view  | postgres
 skytf  | pgbench_accounts   | table | skytf
 skytf  | pgbench_branches   | table | skytf
 skytf  | pgbench_history    | table | skytf
 skytf  | pgbench_tellers    | table | skytf
 skytf  | test_11            | table | skytf
 skytf  | test_stadnby       | table | skytf
 
Note: table test_11 was replicated from the primary almost immediately. The switchover is complete.

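--5 Summary:

1 A HOT-Standby switchover takes quite a few steps; some configuration, such as .pgpass and pg_hba.conf, can be prepared in advance;
2 Always shut down the primary before promoting the standby. If the standby is promoted while the old primary is still running, both servers accept writes independently (split-brain) and their data diverges;
3 A switchover can also be used to migrate a production database, because it keeps application downtime to a minimum.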

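Script (run as root on the standby host; it stops keepalived when PostgreSQL is not listening on its port, and turns a running standby into a primary by restarting it without recovery.conf):

#!/bin/bash
# Database port
ABASE_PORT=5432
# OS user that owns the database
ABASE_USER=arterybase
# Data directory
ABASE_DATA_FILE=/home/arterybase/data

# If no postgres process listens on the port, stop keepalived
if [ -z "$(netstat -apn | grep postgres | grep "${ABASE_PORT}")" ]; then
  killall keepalived
fi

# Switch this standby to primary (only when a WAL receiver is running)
if [ -n "$(ps -ef | grep postgres | grep 'receiver process')" ]; then
  # Stop the database
  su - "$ABASE_USER" -c "pg_ctl -D ${ABASE_DATA_FILE} stop -m fast"
  # Rename recovery.conf to recovery.bak so the next start is a normal (primary) start
  mv "${ABASE_DATA_FILE}/recovery.conf" "${ABASE_DATA_FILE}/recovery.bak"
  # In postgresql.conf, append hot_standby = off after hot_standby = on
  sed -i -e "/hot_standby = on/a\hot_standby = off" "${ABASE_DATA_FILE}/postgresql.conf"
  # Start the database
  su - "$ABASE_USER" -c "pg_ctl -D ${ABASE_DATA_FILE} -l arterybase.log start"
fi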

posted @ 2024-10-22 15:22 帅帅啊
