postgresql 9.1 使用 pg_resetxlog 处理 invalid primary checkpoint record

postgresql 9.1 做异机 pitr 时,可以read only恢复到指定的时间点

“recovery stopping before commit of transaction 1413611332, time 2018-02-23 17:00:00.320187+08”
“recovery has paused”,,”Execute pg_xlog_replay_resume() to continue.”

2018-02-23 18:17:47.766 CST,,,3945,,5a8fe562.f69,1766,,2018-02-23 17:56:50 CST,1/1,0,LOG,00000,"restored log file ""0000000400000EC4000000AF"" from archive",,,,,,,,,""
2018-02-23 18:17:48.312 CST,,,3945,,5a8fe562.f69,1767,,2018-02-23 17:56:50 CST,1/1,0,LOG,00000,"restored log file ""0000000400000EC4000000B0"" from archive",,,,,,,,,""
2018-02-23 18:17:48.756 CST,,,3945,,5a8fe562.f69,1768,,2018-02-23 17:56:50 CST,1/1,0,LOG,00000,"restored log file ""0000000400000EC4000000B1"" from archive",,,,,,,,,""
2018-02-23 18:17:49.455 CST,,,3945,,5a8fe562.f69,1769,,2018-02-23 17:56:50 CST,1/1,0,LOG,00000,"recovery stopping before commit of transaction 1413611332, time 2018-02-23 17:00:00.320187+08",,,,,,,,,""
2018-02-23 18:17:49.455 CST,,,3945,,5a8fe562.f69,1770,,2018-02-23 17:56:50 CST,1/1,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,,""

但是当移除 recovery.conf ,以 read write 打开数据库是碰到了错误。

invalid primary checkpoint record
invalid secondary checkpoint record
could not locate a valid checkpoint record

2018-02-23 18:27:28.794 CST,,,6028,,5a8fec90.178c,1,,2018-02-23 18:27:28 CST,,0,LOG,00000,"database system was shut down in recovery at 2018-02-23 18:27:05 CST",,,,,,,,,""
2018-02-23 18:27:28.801 CST,,,6028,,5a8fec90.178c,2,,2018-02-23 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC4000000AB"" (log file 3780, segment 171): No such file or directory",,,,,,,,,""
2018-02-23 18:27:28.802 CST,,,6028,,5a8fec90.178c,3,,2018-02-23 18:27:28 CST,,0,LOG,00000,"invalid primary checkpoint record",,,,,,,,,""
2018-02-23 18:27:28.802 CST,,,6028,,5a8fec90.178c,4,,2018-02-23 18:27:28 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC400000039"" (log file 3780, segment 57): No such file or directory",,,,,,,,,""
2018-02-23 18:27:28.802 CST,,,6028,,5a8fec90.178c,5,,2018-02-23 18:27:28 CST,,0,LOG,00000,"invalid secondary checkpoint record",,,,,,,,,""
2018-02-23 18:27:28.802 CST,,,6028,,5a8fec90.178c,6,,2018-02-23 18:27:28 CST,,0,PANIC,XX000,"could not locate a valid checkpoint record",,,,,,,,,""
2018-02-23 18:27:54.801 CST,,,6026,,5a8fec8f.178a,1,,2018-02-23 18:27:27 CST,,0,LOG,00000,"startup process (PID 6028) was terminated by signal 6: Aborted",,,,,,,,,""
2018-02-23 18:27:54.801 CST,,,6026,,5a8fec8f.178a,2,,2018-02-23 18:27:27 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
2018-02-23 18:30:32.787 CST,,,6223,,5a8fed43.184f,2,,2018-02-23 18:30:27 CST,,0,LOG,00000,"database system was not properly shut down; automatic recovery in progress",,,,,,,,,""
2018-02-23 18:30:32.801 CST,,,6223,,5a8fed43.184f,3,,2018-02-23 18:30:27 CST,,0,LOG,00000,"redo starts at EC4/A8A9E1C0",,,,,,,,,""
2018-02-23 18:30:33.053 CST,,,6223,,5a8fed43.184f,4,,2018-02-23 18:30:27 CST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000400000EC4000000A9"" (log file 3780, segment 169): No such file or directory",,,,,,,,,""
2018-02-23 18:30:33.054 CST,,,6223,,5a8fed43.184f,5,,2018-02-23 18:30:27 CST,,0,LOG,00000,"redo done at EC4/A8FFF7E0",,,,,,,,,""
2018-02-23 18:30:33.054 CST,,,6223,,5a8fed43.184f,6,,2018-02-23 18:30:27 CST,,0,LOG,00000,"last completed transaction was at log time 2018-02-23 16:48:59.558324+08",,,,,,,,,""
2018-02-23 18:30:33.090 CST,,,6223,,5a8fed43.184f,7,,2018-02-23 18:30:27 CST,,0,FATAL,XX000,"WAL ends before consistent recovery point",,,,,,,,,""
2018-02-23 18:30:33.092 CST,,,6221,,5a8fed43.184d,1,,2018-02-23 18:30:27 CST,,0,LOG,00000,"startup process (PID 6223) exited with exit code 1",,,,,,,,,""
2018-02-23 18:30:33.092 CST,,,6221,,5a8fed43.184d,2,,2018-02-23 18:30:27 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""

两种解决方法
1、源库如果开启了archive模式,手动拷贝相应的 wal日志
最后以read write打开数据库后,恢复后的时间点会比 recovery_target_time 期望的时间要晚。

2、使用 pg_resetxlog 重置

$ pg_resetxlog -f /var/lib/postgresql/9.1/main
$ mv recovery.conf recovery.done

官方对pg_resetxlog 的描述还是很谨慎的。

pg_resetxlog会清除预写式日志(WAL)并且有选择地重置存储在 pg_control文件中的一些其他控制信息。如果这些文件已经被损坏, 某些时候就需要这个功能。当服务器由于这样的损坏而无法启动时, 这只应该被用作最后的手段。

在运行这个命令之后,就应该可以启动服务器, 但是记住数据库可能包含由于部分提交事务产生的不一致数据。 你应当立刻转储你的数据、运行initdb并且重新载入。重新载入后, 检查不一致并且根据需要修复之。

这个工具只能被安装服务器的用户运行,因为它要求对数据目录的读写访问。 出于安全原因,你必须在命令行中指定数据目录。pg_resetxlog 不使用环境变量 PGDATA。

如果 pg_resetxlog 抱怨它无法为 pg_control 决定合法数据,你可以通过指定-f(强制)选项强制它继续。 在这种情况下,丢失的数据将被替换为看似合理的值。可以期望大部分域是匹配的, 但是下一个 OID、下一个事务 ID 和纪元、下一个多事务 ID 和偏移以及 WAL 开始地址域可能还是需要人工协助。这些域可以使用下面讨论的选项设置。 如果你不能为所有这些域决定正确的值,-f还是可以被使用, 但是恢复的数据库还是值得怀疑:一次立即的转储和重新载入是势在必行的。 在你转储之前不要在该数据库中执行任何数据修改操作, 因为任何这样的动作都可能使破坏更严重。

$ pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.

Usage:
  pg_resetxlog [OPTION]... DATADIR

Options:
  -e XIDEPOCH     set next transaction ID epoch
  -f              force update to be done
  -l TLI,FILE,SEG force minimum WAL starting location for new transaction log
  -m XID          set next multitransaction ID
  -n              no update, just show extracted control values (for testing)
  -o OID          set next OID
  -O OFFSET       set next multitransaction offset
  -x XID          set next transaction ID
  --help          show this help, then exit
  --version       output version information, then exit

Report bugs to <pgsql-bugs@postgresql.org>.

参考
http://postgres.cn/docs/9.6/app-pgresetxlog.html
https://www.postgresql.org/message-id/20060628110623.59543.qmail@web25607.mail.ukl.yahoo.com

posted @ 2018-02-24 10:49  peiybpeiyb  阅读(385)  评论(0编辑  收藏  举报