PostgreSQL 扩展之 WalMiner

1、简介

WalMiner是从PostgreSQL的WAL(write ahead logs)日志的解析工具,旨在挖掘wal日志所有的有用信息,从而提供PG的数据恢复支持,3.0支持PostgreSQL10及其以上版本(pg14.2版本暂不支持,已与作者沟通)。

WalMiner可以从waL日志中解析出SQL,解析出用户执行的DML语句,以及用户执行DDL语句对系统表产生的DML语句,解析出执行的SQL语句的工具,并能生成对应的undo SQL语句,与传统的logical decode插件相比,walminer不要求logical日志级别且解析方式较为灵活。

当数据库被执行了TRUNCATE等不被wal记录的数据清除操作,或者发生磁盘页损坏,可以使用此功能从wal日志中搜索数据,以期尽量挽回数据,达到数据页挽回的效果。

2、安装walminer插件

 3、实例演示

(1)--测试表准备

postgres=# show full_page_writes ;

full_page_writes

------------------

on

(1 row)

postgres=# create table walminer_t(id int,col1 varchar(8));

CREATE TABLE

postgres=# insert into walminer_t values(1,'w1'),(2,'w2');

INSERT 0 2

postgres=# select * from walminer_t ;

id | col1

----+------

1 | w1

2 | w2

(2 rows)

postgres=# select pg_current_wal_lsn();

pg_current_wal_lsn

--------------------

0/166EFD8

(1 row)

postgres=# select pg_walfile_name(pg_current_wal_lsn());

pg_walfile_name

--------------------------

000000010000000000000001

(1 row)

 (2)

(2)添加需要解析的wal文件

我们可以添加wal目录,也可以添加指定的wal文件(默认情况下此步骤可以省略,因为会直接加载当前数据字典和当前wal路径下的所有wal文件,但不建议省略)。

postgres=# select walminer_wal_add('/data/pgdata13/pg_wal');

walminer_wal_add

--------------------

1 file add success

(1 row)

postgres=# select walminer_wal_list();

walminer_wal_list

--------------------------------------------------

(/data/pgdata13/pg_wal/000000010000000000000001)

(1 row)

 (3)

(3)生成及加载数据字典

此步骤常用于异地实例解析wal文件的场景,需要源端数据库与目标数据库的版本相同,源端数据库需要添加walminer扩展后再生成数据字典,将数据字典传输到目标端数据库进行导入。

--源端数据库

create extension walminer;

select walminer_build_dictionary('/tmp/store_dictionary'); 

--目标端数据库

create extension walminer;

select walminer_load_dictionary('/tmp/store_dictionary');

(4)解析wal文件

wal文件被解析之后,会被删除。

postgres=# select walminer_all();

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-17 23:46:01.345277+08

walminer_all

---------------------

pg_minerwal success

(1 row)

postgres=# select walminer_wal_list();

ERROR: wal list has not been loaded or has been removed.

 (5)查看解析结果

walminer_contents是walminer自动生成的unlogged表,用来存储解析结果,在一次解析开始会先create或truncate此表,在表中我们看到这两条数据的xid、执行时间、执行操作、开始/结束lsn,甚至undo sql文本都写好了,很强大。

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

(2 rows)

(6)范围解析

生产上wal的数量可能非常多,解析全部wal文件可能不现实,walminer支持范围解析,这里做一下简单的展示。

1)基于时间范围解析

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

postgres=# select walminer_by_time('2022-03-17 23:39:00','2022-03-17 23:40:00');

NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:46:44.177991+08

walminer_by_time

---------------------

pg_minerwal success

(1 row) 

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

(2 rows)

 2)基于lsn范围解析

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

postgres=# select walminer_by_lsn('0/16A0828','0/16A08D0');

NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:50:12.706725+08

walminer_by_lsn

---------------------

pg_minerwal success

(1 row)

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(2 rows)

 3)基于指定xid解析

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

postgres=# select walminer_by_xid('499');

NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 00:52:29.379106+08

walminer_by_xid

---------------------

pg_minerwal success

(1 row)

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

 4)范围精确解析

在范围解析的基础上,解析的信息更加精确

select walminer_by_time(starttime, endtime,'true');

select walminer_by_lsn(startlsn, endlsn,'true');

select walminer_by_xid(xid,'true');

select wal2sql(starttime, endtime,'true');

select wal2sql(startlsn, endlsn,'true');

select wal2sql(xid,'true');

 5)单表解析

通过指定reloid来解析指定的表

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

postgres=# select oid,relname from pg_class where relname='walminer_t';

oid | relname

-------+------------

16497 | walminer_t

(1 row)

postgres=# select walminer_by_time('2022-03-17 23:39:00','2022-03-18 01:40:00','true','16497');

NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 01:01:57.94789+08

walminer_by_time

---------------------

pg_minerwal success

(1 row)

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+---------------------------------------------------------+--------------------------------------------------------+----------+--------+------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

(2 rows)

 6)替身解析

当我们执行drop 或者 truncate 操作后,数据库的数据字典中的relfilenode可能已经被移除,无法解析出原来wal中的一些内容,在知道旧表表结构的基础上,我们可以使用替身的解析方法。

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_t WHERE id=1 AND col1='w1' | t | public | walminer_t | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_t(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_t WHERE id=2 AND col1='w2' | t | public | walminer_t | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

postgres=# select pg_relation_filenode('walminer_t'::regclass);

pg_relation_filenode

----------------------

16497

(1 row)

postgres=# drop table walminer_t;

DROP TABLE

postgres=# create table walminer_n(id int,col1 varchar(8));

CREATE TABLE

postgres=# select walminer_table_avatar('walminer_n', 16497);

walminer_table_avatar

------------------------------

MAP[walminer_n:16560]->16497

(1 row)

postgres=# select walminer_all();

NOTICE: Add wal from current pg_wal directory, do not suggest use this way in produce

NOTICE: Switch wal to 000000010000000000000001 on time 2022-03-18 01:11:32.466401+08

walminer_all

---------------------

pg_minerwal success

(1 row)

postgres=# select * from walminer_contents ;

sqlno | xid | topxid | sqlkind | minerd | timestamp | op_text | undo_text | complete | schema | relation | start_lsn | commit_lsn

-------+-----+--------+---------+--------+-------------------------------+-----------------------------------------------------------+----------------------------------------------------------+----------+--------+-------------+-----------+------------

1 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_n(id ,col1) VALUES(1 ,'w1') | DELETE FROM public.walminer_n WHERE id=1 AND col1='w1' | t | public | walminer_n | 0/166EE10 | 0/166EEB8

2 | 499 | 0 | 1 | t | 2022-03-17 23:39:08.891413+08 | INSERT INTO public.walminer_n(id ,col1) VALUES(2 ,'w2') | DELETE FROM public.walminer_n WHERE id=2 AND col1='w2' | t | public | walminer_n | 0/166EE50 | 0/166EEB8

1 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(1 ,'ww1') | DELETE FROM public.walminer_t2 WHERE id=1 AND col1='ww1' | t | public | walminer_t2 | 0/16A0828 | 0/16A08D0

2 | 506 | 0 | 1 | t | 2022-03-18 00:43:31.689753+08 | INSERT INTO public.walminer_t2(id ,col1) VALUES(2 ,'ww2') | DELETE FROM public.walminer_t2 WHERE id=2 AND col1='ww2' | t | public | walminer_t2 | 0/16A0868 | 0/16A08D0

(4 rows)

7)删除wal文件

只能删除wal文件,不能像添加一样直接删除整个目录。

postgres=# select walminer_wal_list();

walminer_wal_list

--------------------------------------------------

(/data/pgdata13/pg_wal/000000010000000000000001)

(1 row)

postgres=# select walminer_wal_remove('/data/pgdata13/pg_wal/');

ERROR: Argument can be file only, an not be a directory

postgres=# select walminer_wal_remove('/data/pgdata13/pg_wal/000000010000000000000001');

walminer_wal_remove

-----------------------

1 file remove success

(1 row)

postgres=# select walminer_wal_list();

ERROR: wal list has not been loaded or has been removed.

 8)结束walminer

该函数作用为释放内存,结束日志分析。

postgres=# select walminer_stop();

walminer_stop

------------------

walminer stoped!

4、结束语

这个插件现在的功能现在已经足够处理常规的误操作,达到闪回的效果,且此工具还在不断开发完善中,让我们一起期待一个更强大更完善的救火工具。

posted @ 2023-09-11 14:34  数据库集中营  阅读(409)  评论(0编辑  收藏  举报