PostgreSQL 扩展之 WalMiner
1、简介
WalMiner是从PostgreSQL的WAL(write ahead logs)日志的解析工具,旨在挖掘wal日志所有的有用信息,从而提供PG的数据恢复支持,3.0支持PostgreSQL10及其以上版本(pg14.2版本暂不支持,已与作者沟通)。
WalMiner可以从waL日志中解析出SQL,解析出用户执行的DML语句,以及用户执行DDL语句对系统表产生的DML语句,解析出执行的SQL语句的工具,并能生成对应的undo SQL语句,与传统的logical decode插件相比,walminer不要求logical日志级别且解析方式较为灵活。
当数据库被执行了TRUNCATE等不被wal记录的数据清除操作,或者发生磁盘页损坏,可以使用此功能从wal日志中搜索数据,以期尽量挽回数据,达到数据页挽回的效果。
2、安装walminer插件
3、实例演示
(1)--测试表准备
postgres=# show full_page_writes ;
full_page_writes
------------------
on
(1 row)
postgres=# create table walminer_t(id int,col1 varchar(8));
CREATE TABLE
postgres=# insert into walminer_t values(1,'w1'),(2,'w2');
INSERT 0 2
postgres=# select * from walminer_t ;
id | col1
----+------
1 | w1
2 | w2
(2 rows)
postgres=# select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/166EFD8
(1 row)
postgres=# select pg_walfile_name(pg_current_wal_lsn());
pg_walfile_name
--------------------------
000000010000000000000001
(1 row)
(2)
(2)添加需要解析的wal文件
我们可以添加wal目录,也可以添加指定的wal文件(默认情况下此步骤可以省略,因为会直接加载当前数据字典和当前wal路径下的所有wal文件,但不建议省略)。
postgres=# select walminer_wal_add('/data/pgdata13/pg_wal');
walminer_wal_add
--------------------
1 file add success
(1 row)
postgres=# select walminer_wal_list();
walminer_wal_list
--------------------------------------------------
(/data/pgdata13/pg_wal/000000010000000000000001)
(1 row)
(3)
(3)生成及加载数据字典
此步骤常用于异地实例解析wal文件的场景,需要源端数据库与目标数据库的版本相同,源端数据库需要添加walminer扩展后再生成数据字典,将数据字典传输到目标端数据库进行导入。
--源端数据库
create extension walminer;
select walminer_build_dictionary('/tmp/store_dictionary');
--目标端数据库
create extension walminer;
select walminer_load_dictionary('/tmp/store_dictionary');
(4)解析wal文件
wal文件被解析之后,会被删除。
postgres=# select walminer_all();
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-17 23:46:01.345277+08
walminer_all
---------------------
pg_minerwal success
(1 row)
postgres=# select walminer_wal_list();
ERROR: wal list has not been loaded or has been removed.
(5)查看解析结果
walminer_contents是walminer自动生成的unlogged表,用来存储解析结果,在一次解析开始会先create或truncate此表,在表中我们看到这两条数据的xid、执行时间、执行操作、开始/结束lsn,甚至undo sql文本都写好了,很强大。
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
(2 rows)
(6)范围解析
生产上wal的数量可能非常多,解析全部wal文件可能不现实,walminer支持范围解析,这里做一下简单的展示。
1)基于时间范围解析
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
postgres=# select walminer_by_time('2022-03-17 23:39:00','2022-03-17 23:40:00');
NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:46:44.177991+08
walminer_by_time
---------------------
pg_minerwal success
(1 row)
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
(2 rows)
2)基于lsn范围解析
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
postgres=# select walminer_by_lsn('0/16A0828','0/16A08D0');
NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:50:12.706725+08
walminer_by_lsn
---------------------
pg_minerwal success
(1 row)
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(2 rows)
3)基于指定xid解析
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
postgres=# select walminer_by_xid('499');
NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:52:29.379106+08
walminer_by_xid
---------------------
pg_minerwal success
(1 row)
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
4)范围精确解析
在范围解析的基础上,解析的信息更加精确
select walminer_by_time(starttime, endtime,'true');
select walminer_by_lsn(startlsn, endlsn,'true');
select walminer_by_xid(xid,'true');
或
select wal2sql(starttime, endtime,'true');
select wal2sql(startlsn, endlsn,'true');
select wal2sql(xid,'true');
5)单表解析
通过指定reloid来解析指定的表
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
postgres=# select oid,relname from pg_class where relname='walminer_t';
oid | relname
-------+------------
16497 | walminer_t
(1 row)
postgres=# select walminer_by_time('2022-03-17 23:39:00','2022-03-18 01:40:00','true','16497');
NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 01:01:57.94789+08
walminer_by_time
---------------------
pg_minerwal success
(1 row)
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
(2 rows)
6)替身解析
当我们执行drop 或者 truncate 操作后,数据库的数据字典中的relfilenode可能已经被移除,无法解析出原来wal中的一些内容,在知道旧表表结构的基础上,我们可以使用替身的解析方法。
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
postgres=# select pg_relation_filenode('walminer_t'::regclass);
pg_relation_filenode
----------------------
16497
(1 row)
postgres=# drop table walminer_t;
DROP TABLE
postgres=# create table walminer_n(id int,col1 varchar(8));
CREATE TABLE
postgres=# select walminer_table_avatar('walminer_n', 16497);
walminer_table_avatar
------------------------------
MAP[walminer_n:16560]->16497
(1 row)
postgres=# select walminer_all();
NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce
NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 01:11:32.466401+08
walminer_all
---------------------
pg_minerwal success
(1 row)
postgres=# select * from walminer_contents ;
sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn
-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------
1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_n(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_n WHERE id=1 AND col1='w1' | t | public | walminer_n | 0/166EE10 | 0/166EEB8
2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_n(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_n WHERE id=2 AND col1='w2' | t | public | walminer_n | 0/166EE50 | 0/166EEB8
1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0
2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0
(4 rows)
7)删除wal文件
只能删除wal文件,不能像添加一样直接删除整个目录。
postgres=# select walminer_wal_list();
walminer_wal_list
--------------------------------------------------
(/data/pgdata13/pg_wal/000000010000000000000001)
(1 row)
postgres=# select walminer_wal_remove('/data/pgdata13/pg_wal/');
ERROR: Argument can be file only, an not be a directory
postgres=# select walminer_wal_remove('/data/pgdata13/pg_wal/000000010000000000000001');
walminer_wal_remove
-----------------------
1 file remove success
(1 row)
postgres=# select walminer_wal_list();
ERROR: wal list has not been loaded or has been removed.
8)结束walminer
该函数作用为释放内存,结束日志分析。
postgres=# select walminer_stop();
walminer_stop
------------------
walminer stoped!
4、结束语
这个插件现在的功能现在已经足够处理常规的误操作,达到闪回的效果,且此工具还在不断开发完善中,让我们一起期待一个更强大更完善的救火工具。