postgresql 9.x 的base backup + restore + pitr(point in time recovery)
os: centos6.8
postgresql:9.1.22
base backup
最初使用 tar 花费4个小时,之后改为 pigz 只花费了30分钟。
# vi /backup_scripts/pg_basebackup.sh
#!/bin/bash
DATE=`date +%Y%m%d`
#删除14天前的物理备份
echo "#############################################"
echo "delete expire base backup start...`date` \n"
find /backup/* -mtime +14 -not -wholename "*lost+found*"|xargs -I '{}' rm {}
echo "delete expire base backup end...`date` \n"
#备份PG数据库
echo "#############################################"
echo "base backup start...`date` \n"
su - postgres -c "/usr/bin/psql -c \"select pg_start_backup('$DATE');\""
#tar is a single thread command,too slow
#tar --exclude=/var/lib/postgresql/9.1/main/postmaster.pid --exclude=/var/lib/postgresql/9.1/main/pg_xlog -Pzcf - /var/lib/postgresql/9.1/main/ /pg_tbs/ | split --bytes=10G - /backup/${DATE}_base.tar.gz.
tar --exclude=/var/lib/postgresql/9.1/main/postmaster.pid --exclude=/var/lib/postgresql/9.1/main/pg_xlog -Pcf - /var/lib/postgresql/9.1/main/ /pg_tbs/ | pigz -p 16 | split --bytes=10G - /backup/${DATE}_base.tgz.
su - postgres -c "/usr/bin/psql -c 'select pg_stop_backup();'"
echo "base backup end...`date` \n"
#gpg加密
echo "#############################################"
echo "gpg start...`date` \n"
#并发加密文件,缩短执行时间
for FILE in `ls -1 /backup/${DATE}_base.tgz.*`
do
(gpg -er zlfund -o ${FILE}.gpg ${FILE} )&
done
wait
echo "gpg end...`date` \n"
#rsync异机备份
echo "#############################################"
echo "rsync start...`date` \n"
#同步文件
for GPGFILE in `ls -1 /backup/${DATE}_base.tgz.*.gpg`
do
#拷贝到nas
#rsync -auzP --bwlimit=30.0M ${FILE}.gpg /mnt/nas_dbbackup/basebackup/
rsync -auP --bwlimit=60.0M ${GPGFILE} /mnt/nas_dbbackup/basebackup/
done
echo "rsync end...`date` \n"
#rm删除gpg加密文件
echo "#############################################"
echo "delete gpg start...`date` \n"
#删除加密文件
rm /backup/${DATE}_base.tgz*.gpg
echo "delete gpg end..`date` \n"
restore
restore database
# cd /u02/backup_gpg/
# cp /mnt/nas_dbbackup/basebackup/20180520* /u02/backup_gpg/
# chmod 777 ./*
开8个并发窗口同时解密
gpg -o /u02/backup/20180520_base.tgz.aa -d /u02/backup_gpg/20180520_base.tgz.aa.gpg
gpg -o /u02/backup/20180520_base.tgz.ab -d /u02/backup_gpg/20180520_base.tgz.ab.gpg
gpg -o /u02/backup/20180520_base.tgz.ac -d /u02/backup_gpg/20180520_base.tgz.ac.gpg
gpg -o /u02/backup/20180520_base.tgz.ad -d /u02/backup_gpg/20180520_base.tgz.ad.gpg
gpg -o /u02/backup/20180520_base.tgz.ae -d /u02/backup_gpg/20180520_base.tgz.ae.gpg
gpg -o /u02/backup/20180520_base.tgz.af -d /u02/backup_gpg/20180520_base.tgz.af.gpg
gpg -o /u02/backup/20180520_base.tgz.ag -d /u02/backup_gpg/20180520_base.tgz.ag.gpg
gpg -o /u02/backup/20180520_base.tgz.ah -d /u02/backup_gpg/20180520_base.tgz.ah.gpg
gpg -o /u02/backup/20180520_base.tgz.ai -d /u02/backup_gpg/20180520_base.tgz.ai.gpg
使用 gpg 解密时需要输入解密密码
开始解压
cat /u02/backup/20180520_base.tgz.a* | tar -Pxv -I pigz
结束后就会看到在如下两个路径被填充了文件
/var/lib/postgresql/9.1/main
/pg_tbs
recovery database
# su - postgres
$ cd /var/lib/postgresql/9.1/main
$ mkdir pg_xlog
$ chmod 700 ./archive_status
$ cd pg_xlog
$ mkdir archive_status
$ chmod 700 ./archive_status
配置 recovery.conf,基于wal的 pitr 恢复
$ cd $PGDATA
$ vi recovery.conf
restore_command = 'cp /mnt/nas_dbbackup/archivelog/%f %p'
#archive_cleanup_command = ''
#recovery_end_command = ''
recovery_target_time = '2018-05-24 20:00:00+08'
#recovery_target_xid = ''
recovery_target_inclusive = 'true'
recovery_target_timeline = 'latest'
pause_at_recovery_target = 'true'
standby_mode = on
trigger_file = '/tmp/postgresql.trigger.5432'
编辑启动,关闭脚本
$ vi /u01/pgsql/pgsql9.1_start.sh
pg_ctl start -D /var/lib/postgresql/9.1/main -o "-c config_file=/etc/postgresql/9.1/main/postgresql.conf"
$ vi /u01/pgsql/pgsql9.1_stop.sh
pg_ctl stop -m fast -D /var/lib/postgresql/9.1/main
启动pgsql
$ /u01/pgsql/pgsql9.1_start.sh
观察日志输出,会有类似下面的持续输出
2018-05-24 11:10:09.259 CST,,,5283,,59f2a370.14a3,28,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000053"" from archive",,,,,,,,,""
2018-05-24 11:10:10.792 CST,,,5283,,59f2a370.14a3,29,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000054"" from archive",,,,,,,,,""
2018-05-24 11:10:12.201 CST,,,5283,,59f2a370.14a3,30,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000055"" from archive",,,,,,,,,""
2018-05-24 11:10:13.592 CST,,,5283,,59f2a370.14a3,31,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000056"" from archive",,,,,,,,,""
2018-05-24 11:10:15.144 CST,,,5283,,59f2a370.14a3,32,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000057"" from archive",,,,,,,,,""
2018-05-24 11:10:16.074 CST,,,5283,,59f2a370.14a3,33,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD800000058"" from archive",,,,,,,,,""
达到指定的 recovery_target_time 后,数据库库会暂停住,类似如下日志输出
2018-05-24 11:11:40.809 CST,,,5283,,59f2a370.14a3,120,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"restored log file ""0000000200000CD8000000AF"" from archive",,,,,,,,,""
2018-05-24 11:11:40.961 CST,,,5283,,59f2a370.14a3,121,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"recovery stopping before commit of transaction 1348696203, time 2017-10-23 11:15:00.036417+08",,,,,,,,,""
2018-05-24 11:11:40.961 CST,,,5283,,59f2a370.14a3,122,,2018-05-24 11:09:36 CST,1/1,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,,""
2018-05-24 11:11:40.961 CST,,,5544,,59f2b79c.15a8,1,,2018-05-24 11:09:36 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
这个时候是可以连接上去查询使用,如果需要读写,则需要做如下操作
此时有两种选择
1)touch /tmp/postgresql.trigger.5432
2018-05-24 18:27:28.794 CST,,,6028,,5a8fec90.178c,1,,2018-05-24 18:27:28 CST,,0,LOG,00000,"database system was shut down in recovery at 2018-05-24 18:27:05 CST",,,,,,,,,""
2018-05-24 18:27:28.801 CST,,,6028,,5a8fec90.178c,2,,2018-05-24 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC4000000AB"" (log file 3780, segment 171): No such file or directory",,,,,,,,,""
2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,3,,2018-05-24 18:27:28 CST,,0,LOG,00000,"invalid primary checkpoint record",,,,,,,,,""
2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,4,,2018-05-24 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC400000039"" (log file 3780, segment 57): No such file or directory",,,,,,,,,""
2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,5,,2018-05-24 18:27:28 CST,,0,LOG,00000,"invalid secondary checkpoint record",,,,,,,,,""
2018-05-24 18:27:28.802 CST,,,6028,,5a8fec90.178c,6,,2018-05-24 18:27:28 CST,,0,PANIC,XX000,"could not locate a valid checkpoint record",,,,,,,,,""
2018-05-24 18:27:54.801 CST,,,6026,,5a8fec8f.178a,1,,2018-05-24 18:27:27 CST,,0,LOG,00000,"startup process (PID 6028) was terminated by signal 6: Aborted",,,,,,,,,""
2018-05-24 18:27:54.801 CST,,,6026,,5a8fec8f.178a,2,,2018-05-24 18:27:27 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
再启动时就会提示恢复需要若干个wal日志,可以把需要的wal日志拷贝到 $PGDATA/pg_xlog下。
2)关闭pgsql,然后执行 pg_resetxlog
$ /u01/pgsql/pgsql9.1_stop.sh
$ which pg_resetxlog
$ pg_resetxlog -f /var/lib/postgresql/9.1/main
$ cd /var/lib/postgresql/9.1/main
$ mv recovery.conf recovery.done
$ /u01/pgsql/pgsql9.1_start.sh
官方对pg_resetxlog 的描述还是很谨慎的。
pg_resetxlog会清除预写式日志(WAL)并且有选择地重置存储在 pg_control文件中的一些其他控制信息。如果这些文件已经被损坏, 某些时候就需要这个功能。当服务器由于这样的损坏而无法启动时, 这只应该被用作最后的手段。
在运行这个命令之后,就应该可以启动服务器, 但是记住数据库可能包含由于部分提交事务产生的不一致数据。 你应当立刻转储你的数据、运行initdb并且重新载入。重新载入后, 检查不一致并且根据需要修复之。
这个工具只能被安装服务器的用户运行,因为它要求对数据目录的读写访问。 出于安全原因,你必须在命令行中指定数据目录。pg_resetxlog 不使用环境变量 PGDATA。
如果 pg_resetxlog 抱怨它无法为 pg_control 决定合法数据,你可以通过指定-f(强制)选项强制它继续。 在这种情况下,丢失的数据将被替换为看似合理的值。可以期望大部分域是匹配的, 但是下一个 OID、下一个事务 ID 和纪元、下一个多事务 ID 和偏移以及 WAL 开始地址域可能还是需要人工协助。这些域可以使用下面讨论的选项设置。 如果你不能为所有这些域决定正确的值,-f还是可以被使用, 但是恢复的数据库还是值得怀疑:一次立即的转储和重新载入是势在必行的。 在你转储之前不要在该数据库中执行任何数据修改操作, 因为任何这样的动作都可能使破坏更严重。
$ pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.
Usage:
pg_resetxlog [OPTION]... DATADIR
Options:
-e XIDEPOCH set next transaction ID epoch
-f force update to be done
-l TLI,FILE,SEG force minimum WAL starting location for new transaction log
-m XID set next multitransaction ID
-n no update, just show extracted control values (for testing)
-o OID set next OID
-O OFFSET set next multitransaction offset
-x XID set next transaction ID
--help show this help, then exit
--version output version information, then exit
Report bugs to <pgsql-bugs@postgresql.org>.