博客园  :: 首页  :: 新随笔  :: 联系 :: 订阅 订阅  :: 管理

记一次揪心的MySQL数据恢复过程

Posted on 2018-07-05 23:31  moss_tan_jun  阅读(1457)  评论(0编辑  收藏  举报

https://blog.csdn.net/poxiaonie/article/details/78304699

 

===

先说下背景,公司其中一个项目所有服务都部署在客户的机房内,机房较小,没有UPS。其中一个MySQL实例(单机,无主从,windows server 2008,MySQL5.6.19)存放大量的日志数据,每天几十G的数据,定期清除(保存大概四个月的数据),由于硬盘空间不够,所以没有定期的备份。机房突然断电,启动MySQL server,当时没有注意错误日志,但是访问其中一个表时,server自动挂掉,这才意识到数据库可能因为突然掉电导致无法正常启动,然后查看错误日志:

2017-10-12 18:05:22 bd0 InnoDB: Error: page 756 log sequence number 786184012016
InnoDB: is in the future! Current system log sequence number 786183991367.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6

根据错误提示:数据文件的LSN比redo log的LSN要大,当系统尝试使用Redo Log去修复数据页面的时候,发现Redo Log LSN比数据页面还小,所以导致错误。数据页的LSN在一般情况下,都是小于Redo Log的,因为在事物提交或按照 innodb_trx_commit 设置的方式提交时,先将事物顺序写入Redo Log , 然后后台线程按照 max_prt_dirty_page 参数设置的比例刷新或当系统检测到当10秒内系统会执行刷新脏页操作,所以,数据页的LSN正常情况下永远会比Redo Log 的LSN 小。

因此通过设置innodb_force_recovery大于0 ,重启数据库服务,导出重要的数据,重建数据库。 
首先了解下innodb_force_recovery 设置为不同值对启动数据库服务过程的影响,大的数字包含前面所有数字的影响。 
- 1 (SRV_FORCE_IGNORE_CORRUPT): 忽略检查到的corrupt页。 
- 2 (SRV_FORCE_NO_BACKGROUND): 阻止主线程的运行,如主线程需要执行full purge操作,会导致crash。 
- 3 (SRV_FORCE_NO_TRX_UNDO): 不执行事务回滚操作。 
- 4 (SRV_FORCE_NO_IBUF_MERGE): 不执行插入缓冲的合并操作。 
- 5 (SRV_FORCE_NO_UNDO_LOG_SCAN):不查看重做日志,InnoDB存储引擎会将未提交的事务视为已提交。

当设置参数值大于0后,可以对表进行select,create,drop操作,但insert,update或者delete这类操作是不允许的。当然即使innodb_force_recovery>0 ,你也可以DROP或CREATE表。

以此设置innodb_force_recovery为1到6,启动服务使用mysqldump备份数据库,每次都提示同样的错误:

mysqldump -uuser -ppasswd --skip-ext
ended_insert --hex-blob -y -n -t --routines --events --triggers --databases db_name >> "d:/db_name.sql"
Warning: Using a password on the command line interface can be insecure.
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping
 table `tb_name` at row: 50548
  • 1
  • 2
  • 3
  • 4
  • 5

然后执行:

select id from tb_name limit 50548,1;
  • 1

同样提示:

ERROR 2013 (HY000): Lost connection to MySQL server during query
  • 1

但是执行如下能正常获取数据:

select id from tb_name limit 50547,1;
  • 1

查看错误日志,分析是BLOB字段超过768字节的存在溢出页上的部分数据已经损坏:

End of page dump
2017-10-12 18:16:41 258 InnoDB: uncompressed page, stored checksum in field1 3939709550, calculated checksums for field1: crc32 3646189668, innodb 3963718570, none 25210765039, stored checksum in field2 0, calculated checksums for field2: crc32 3646189668, innodb 1246618578, none 3735928559, page LSN 0 1201607135, low 4 bytes of LSN at page end 0, page number (if stored to page already) 125076, space id (if created with >= MySQL-4.1.1 and stored already) 77
InnoDB: Page may be a BLOB page
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 125076.
  • 1
  • 2
  • 3
  • 4
  • 5

现在无法通过MySQL服务进行正常的逻辑备份恢复了,只能通过工具对idb文件进行记录的解析来获取记录。

下面是使用undrop-for-innodb工具恢复表数据的过程(只可以在64位linux系统下运行,只能恢复MySQL5.6版本下的数据); 
下载地址: 
https://github.com/twindb/undrop-for-innodb

TwinDB Data Recovery Toolkit is a set of tools that operate with MySQL files at low level and allow to recover InnoDB databases after different failure scenarios.
The toolkit is also known as UnDrop for InnoDB, which is more accurate name because the toolkit works with InnoDB tables.
The tool recovers data when backups are not available. It supports recovery from following failures:

 - A table or database was dropped.
 - InnoDB table space corruption.
 - Hard disk failure.
 - File system corruption.
 - Records were deleted from a table.
 - A table was truncated.
 - InnoDB files were accidentally deleted.
 - A table was dropped and created empty one.
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

注:percona公司开源的一款工具percona-data-recovery-tool-for-innodb,使用和undrop-for-innodb几乎一样;

安装undrop-for-innodb

  • 解压master.zip
  • 进入undrop-for-innodb-master,编译make,编译后文件中生成两个工具:c_parser和stream_parser
./stream_parser使用参数
[root@localhost undrop-for-innodb-master]# ./stream_parser -h
Usage: ./stream_parser -f <innodb_datafile> [-T N:M] [-s size] [-t size] [-V|-g]
  Where:
    -h         - Print this help
    -V or -g   - Print debug information
    -s size    - Amount of memory used for disk cache (allowed examples 1G 10M). Default 100M
    -T         - retrieves only pages with index id = NM (N - high word, M - low word of id)
    -t size    - Size of InnoDB tablespace to scan. Use it only if the parser can't determine it by himself.

c_parser使用参数
[root@localhost undrop-for-innodb-master]# ./c_parser -h
Error: Usage: ./c_parser -4|-5|-6 [-dDV] -f <InnoDB page or dir> -t table.sql [-T N:M] [-b <external pages directory>]
  Where
    -f <InnoDB page(s)> -- InnoDB page or directory with pages(all pages should have same index_id)
    -t <table.sql> -- CREATE statement of a table
    -o <file> -- Save dump in this file. Otherwise print to stdout
    -l <file> -- Save SQL statements in this file. Otherwise print to stderr
    -h  -- Print this help
    -d  -- Process only those pages which potentially could have deleted records (default = NO)
    -D  -- Recover deleted rows only (default = NO)
    -U  -- Recover UNdeleted rows only (default = YES)
    -V  -- Verbose mode (lots of debug information)
    -4  -- innodb_datafile is in REDUNDANT format
    -5  -- innodb_datafile is in COMPACT format
    -6  -- innodb_datafile is in MySQL 5.6 format
    -T  -- retrieves only pages with index id = NM (N - high word, M - low word of id)
    -b <dir> -- Directory where external pages can be found. Usually it is pages-XXX/FIL_PAGE_TYPE_BLOB/
    -i <file> -- Read external pages at their offsets from <file>.
    -p prefix -- Use prefix for a directory name in LOAD DATA INFILE command
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30

恢复

  • 创建frs_person_type.sql,将要恢复表的表定义写入frs_person_type.sql中
  • 首先拆分共享表空间ibdata1文件(目的是为了获取每个表中主键ID):
[root@localhost undrop-for-innodb-master]#./stream_parser -f /path/mysql/data/ibdata1
  • 1

此时目录中创建了pages-ibdata文件夹,里面包括了对各种表的索引信息等;

  • 然后拆分db_name.ibd:
[root@localhost undrop-for-innodb-master]# ./stream_parser -f /path/mysql/data/test_db/frs_person_type.ibd
  • 1

此时目录中创建里pages-frs_person_type.ibd文件夹

  • 恢复innodb目录 
    我们需要知道表test_db.frs_person_type的PRIMARY索引的index_id。 查看更多的InnoDB字典。 现在我们将得到test_db.frs_person_type的index_id:
[root@localhost undrop-for-innodb-master]# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000001.page -t dictionary/SYS_TABLES.sql |grep type
00000000060C    D8000001B30110  SYS_TABLES  "test_db/frs\_card\_type"   35  2   1   0   80  ""  21
00000000060F    DB000001780110  SYS_TABLES  " test_db/frs\_person\_type"    36  7   1   0   80  ""  22
00000000060C    D8000001B30110  SYS_TABLES  " test_db/frs\_card\_type"  35  2   1   0   80  ""  21

[root@localhost undrop-for-innodb-master]# ./c_parser -4f pages-ibdata1/FIL_PAGE_INDEX/0000000000000003.page -t dictionary/SYS_INDEXES.sql |grep 36
00000000060F    DB0000017801D9  SYS_INDEXES 36  64  "PRIMARY"   1   3   22  3
000000000615    610000017C01F9  SYS_INDEXES 36  65  "c\_id" 1   0   22  4
00000000061D    670000018001F9  SYS_INDEXES 36  66  "type"  1   0   22  5
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

所以,test_db.frs_person_type表的PRIMARY索引的index_id是64

  • 从表的PRIMARY索引恢复记录 
    c_parser读取InnoDB页面,将它们与给定的表结构进行匹配,并以制表符分隔的值格式转储记录。与InnoDB相对,当c_parser命中损坏的区域时,它会跳过它并继续阅读页面。我们从index_id 64读取记录,这是根据字典的PRIMARY索引。 
    “` 
    [root@localhost undrop-for-innodb-master]# time ./c_parser -5f pages-frs_person_type.ibd/FIL_PAGE_INDEX/0000000000000064.page -t frs_person_type.sql > frs_person_type 2> frs_person_type.sql
上述生成的frs_person_type为数据记录文件,frs_person_type.sql为导入frs_person_type文件数据的SQL语句;

 - 带有BLOB字段的表恢复
 如果表具有BLOB,TEXT或类似的大字段,一些值可能存储在外部页面中。 外部页面通常在stream_parser结果中的目录FIL_PAGE_TYPE_BLOB中:
 ```
[root@localhost undrop-for-innodb-master]# ll pages-frs_grab.ibd/
total 0
drwxr-xr-x 2 root root 42 Oct 20 14:01 FIL_PAGE_INDEX
drwxr-xr-x 2 root root 10 Oct 20 14:01 FIL_PAGE_TYPE_BLOB
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

如果使用上一步语句恢复的时候,frs_person_type.sql文件中会出现大量BLOB页找不到的情况:

#####CannotOpen_./0000000000002021.page;
— print_field_value_with_external(): open(): No such file or directory
  • 1
  • 2

这就需要使用参数-b指定溢出页位置:

time ./c_parser -5f pages-frs_grab/FIL_PAGE_INDEX/0000000000002067.page -b  pages-frs_grab.ibd/FIL_PAGE_ -t frs_grab.sql > dfrs_grab 2> frs_grab.sql
  • 1

在一些罕见的情况下,page-frs_grab.ibd / FIL_PAGE_TYPE_BLOB/为空,因为在某些MySQL版本上,外部页面的类型为FIL_PAGE_INDEX。 这是一种意想不到的行为,但有一个解决方法。 
To read external pages from a file (e.g. ibdata1) option -i is introduced:

-i <file> -- Read external pages at their offsets from <file>.