LOG: invalid primary checkpoint record/PANIC: could not locate a valid checkpoint record恢复方法
正常情况下lightdb/postgresql进程被杀或--mode=immediate模式停止的时候,是会进行正常恢复的。如下:
2022-04-23 08:27:28.085797C [unknown] zjh@postgres ::1(51378) walsender 57P03[2022-04-23 08:27:28 UTC] 0 [82739] FATAL: the database system is in recovery mode 2022-04-23 08:27:58.093597C [unknown] zjh@postgres ::1(52034) walsender 57P03[2022-04-23 08:27:58 UTC] 0 [82851] FATAL: the database system is in recovery mode 2022-04-23 08:28:28.098248C [unknown] zjh@postgres ::1(52690) walsender 57P03[2022-04-23 08:28:28 UTC] 0 [82974] FATAL: the database system is in recovery mode 2022-04-23 08:28:58.106009C [unknown] zjh@postgres ::1(53290) walsender 57P03[2022-04-23 08:28:58 UTC] 0 [83085] FATAL: the database system is in recovery mode 2022-04-23 08:29:28.112561C [unknown] zjh@postgres ::1(53948) walsender 57P03[2022-04-23 08:29:28 UTC] 0 [83212] FATAL: the database system is in recovery mode 2022-04-23 08:29:53.571639C @ postmaster 00000[2022-04-22 05:48:38 UTC] 0 [207357] LOG: all server processes terminated; reinitializing 2022-04-23 08:29:54.990799C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: database system was interrupted; last known up at 2022-04-22 12:23:26 UTC 2022-04-23 08:29:55.023361C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: database system was not properly shut down; automatic recovery in progress 2022-04-23 08:29:55.023954C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: redo starts at 19/7F15410 2022-04-23 08:29:55.023972C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: invalid record length at 19/7F15448: wanted 24, got 0 2022-04-23 08:29:55.023978C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: redo done at 19/7F15410 2022-04-23 08:29:55.024782C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: checkpoint starting: end-of-recovery immediate 2022-04-23 08:29:55.062822C @ startup 00000[2022-04-23 08:29:54 UTC] 0 [83315] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037 s, sync=0.001 s, total=0.039 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB 2022-04-23 08:29:55.067348C @ postmaster 00000[2022-04-22 05:48:38 UTC] 0 [207357] LOG: database system is ready to accept connections
但是某些情况下,异常终止后,可能会导致主检查点记录无效(断电、坏道都有可能),例如:
2022-04-19 19:05:26.104431T [unknown] lightdb@repmgr 127.0.0.1(39248) client backend 57P03[2022-04-19 19:05:26 CST] 0 [45004] FATAL: the database system is shutting down
2022-04-19 19:05:36.105421T [unknown] lightdb@repmgr 127.0.0.1(39250) client backend 57P03[2022-04-19 19:05:36 CST] 0 [45036] FATAL: the database system is shutting down
2022-04-19 19:07:59.027435T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: server process (PID 90426) was terminated by signal 9: 已杀死
2022-04-19 19:07:59.027435T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] DETAIL: Failed process was running: select pg_backend_pid();
2022-04-19 19:07:59.027818T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: terminating any other active server processes
2022-04-19 19:07:59.030308T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: archiver process (PID 47644) exited with exit code 2
2022-04-19 19:07:59.121224T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: abnormal database system shutdown
2022-04-19 19:07:59.212224T @ postmaster 00000[2022-04-18 11:52:58 CST] 0 [47554] LOG: database system is shut down
[lightdb@hs-10-19-36-28 log]$ tail -fn 100 lightdb-2022-04-19_190953.log
2022-04-19 19:09:53.964212T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: starting LightDB 13.3-22.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (NeoKylin 4.8.5-36), 64-bit
2022-04-19 19:09:53.964485T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-04-19 19:09:53.964500T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on IPv6 address "::", port 5432
2022-04-19 19:09:53.965481T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: listening on Unix socket "/tmp/.s.LIGHTDB.5432"
2022-04-19 19:09:53.999301T @ startup 00000[2022-04-19 19:09:53 CST] 0 [45593] LOG: database system was interrupted; last known up at 2022-04-19 18:19:48 CST
2022-04-19 19:09:55.838765T @ startup 00000[2022-04-19 19:09:53 CST] 0 [45593] LOG: invalid primary checkpoint record
2022-04-19 19:09:55.838796T @ startup XX000[2022-04-19 19:09:53 CST] 0 [45593] PANIC: could not locate a valid checkpoint record
2022-04-19 19:09:55.873920T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: startup process (PID 45593) was terminated by signal 6: Aborted
2022-04-19 19:09:55.873956T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: aborting startup due to startup process failure
2022-04-19 19:09:55.879035T @ postmaster 00000[2022-04-19 19:09:52 CST] 0 [45587] LOG: database system is shut down
此时可以通过lt_resetwal -f $PGDATA强制清理wal,然后启动,虽然这样会丢失部分数据(在业务量较大的系统中,总是具有高可用环境或至少使用lt_probackup增量备份到其它服务器是非常重要的)。
https://postgrespro.com/list/thread-id/2559322
https://cdmana.com/2022/03/202203241421490061.html
https://github.com/pgbackrest/pgbackrest/issues/839
https://dockerquestions.com/2020/11/15/docker-postgres-database-invalid-primary-checkpoint-record/
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!
2019-04-23 MapReduce编程模型简介和总结