【转】基于RMAN实现坏块介质恢复(blockrecover)

 

     本文转自:乐沙弥的世界

 

 

对于物理损坏的数据块,我们可以通过RMAN块介质恢复(BLOCK MEDIA RECOVERY)功能来完成受损块的恢复,而不需要恢复整个数据库或所有文件来修复这些少量受损的数据块。恢复整个数据库或数据文件那不是大炮用来打蚊子,有点不值得!但前提条件是你得有一个可用的RMAN备份存在,因此,无论何时备份就是一切。本文演示了产生坏块即使用RMAN实现坏块恢复的全过程。

1、创建演示环境

SQL> select * from v$version where rownum<2;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

--创建用于演示的data file
SQL> create tablespace tbs_tmp datafile '/u02/database/usbo/oradata/tbs_tmp.dbf' size 10m autoextend on;

SQL> conn scott/tiger;

--基于新的数据文件创建对象tb_tmp
SQL> create table tb_tmp tablespace tbs_tmp as select * from dba_objects;

SQL> col file_name format a60
SQL> select file_id,file_name from dba_data_files where tablespace_name='TBS_TMP';

   FILE_ID FILE_NAME
---------- ------------------------------------------------------------
         6 /u02/database/usbo/oradata/tbs_tmp.dbf

--表对象tb_tmp上的信息,包含对应的文件信息,头部块,总块数
SQL> select segment_name , header_file , header_block,blocks      
  2  from dba_segments
  3  where segment_name = 'TB_TMP' and owner='SCOTT';

SEGMENT_NAME                   HEADER_FILE HEADER_BLOCK     BLOCKS
------------------------------ ----------- ------------ ----------
TB_TMP                                   6          130       1152

--首先使用rman备份对应的数据文件
$ $ORACLE_HOME/bin/rman target /
RMAN> backup datafile 6 tag=health;

Starting backup at 2013/08/28 17:03:15
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
channel ORA_DISK_1: starting piece 1 at 2013/08/28 17:03:16
channel ORA_DISK_1: finished piece 1 at 2013/08/28 17:03:17
piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2013/08/28 17:03:17
RMAN> exit

2、单块数据块损坏的恢复处理

--下面使用了linux自带的dd命令来损坏单块数据块
[oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=130 <<EOF
> Corrupted block!
> EOF
0+1 records in
0+1 records out
17 bytes (17 B) copied, 0.000184519 seconds, 92.1 kB/s

--清空buffer cache
SQL> alter system flush buffer_cache;

--查询表对相 tb_tmp,收到ORA-01578
SQL> select count(*) from tb_tmp;
select count(*) from tb_tmp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 130)
ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'

--查询视图v$database_block_corruption,提示有坏块,注意该视图可能不会返回任何数据,如无返回,先执行backup validate
SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         6        129          1                  0 CORRUPT

--也可以使用dbv工具来校验坏块,参考: http://blog.csdn.net/robinson_0612/article/details/6530890   

--下面使用blockrecover来恢复坏块      
RMAN> blockrecover datafile 6 block 130;

Starting recover at 2013/08/28 17:22:25
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=24 device type=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp
channel ORA_DISK_1: piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 2013/08/28 17:22:31

--再次查询表tb_emp正常
SQL> select count(*) from tb_tmp;

  COUNT(*)
----------
     72449

3、多块数据块损坏的恢复处理

--下面使用linux dd命令对不连续块损坏
[oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=133 <<EOF
> New corrupted block!
> EOF
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.000182835 seconds, 115 kB/s
[oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=143 <<EOF 
> New corrupted block!
> EOF
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.000115527 seconds, 182 kB/s
[oracle@linux1 ~]$ dd of=/u02/database/usbo/oradata/tbs_tmp.dbf bs=8192 conv=notrunc seek=153 <<EOF 
> New corrupted block!
> EOF
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.000335781 seconds, 62.5 kB/s

SQL> alter system flush buffer_cache;

--下面提示块133被损坏,注意我们损坏了多块数据块,但查询时,从块号最小的开始提示,如133被修复后还有坏块则继续提示133之后的坏块
SQL> select count(*) from scott.tb_tmp;
select count(*) from scott.tb_tmp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 133)
ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'

--查询视图v$database_block_corruption无任何记录
SQL> select * from v$database_block_corruption;

no rows selected

--下面使用backup validate来校验数据文件
RMAN> backup validate datafile 6;

Starting backup at 2013/08/29 09:42:04
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=22 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6    FAILED 0              223          1408            838489       --字段Status为FAILED
  File Name: /u02/database/usbo/oradata/tbs_tmp.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1029            
  Index      0              0               
  Other      3              156             --有3个Blocks Failing

validate found one or more corrupt blocks
See trace file /u02/database/usbo/diag/rdbms/usbo/usbo/trace/usbo_ora_27874.trc for details
Finished backup at 2013/08/29 09:42:06

--再次查询v$database_block_corruption,表明有3个损坏的块
SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         6        153          1                  0 CORRUPT
         6        143          1                  0 CORRUPT
         6        133          1                  0 CORRUPT

--下面直接使用blockrecover corruption list来恢复,如下所有刚刚被校验的坏块都会被恢复
RMAN> blockrecover corruption list;  

Starting recover at 2013/08/29 10:05:24
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00006
channel ORA_DISK_1: reading from backup piece /u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp
channel ORA_DISK_1: piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_28/o1_mf_nnndf_HEALTH_91vh6ntb_.bkp tag=HEALTH
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 2013/08/29 10:05:28

--校验结果
SQL> select count(*) from scott.tb_tmp;

  COUNT(*)
----------
     72449

4、坏块的对象定位与影响

--下面我们查询块号为163上的对象
SQL> select dbms_rowid.rowid_object(rowid) object_id,dbms_rowid.rowid_relative_fno(rowid) file_id,
  2  dbms_rowid.rowid_block_number(rowid) block_id,owner,object_name,object_id
  3  from scott.tb_tmp where dbms_rowid.rowid_block_number(rowid)=163 and rownum<=2;

 OBJECT_ID    FILE_ID   BLOCK_ID OWNER        OBJECT_NAME                     OBJECT_ID
---------- ---------- ---------- ------------ ------------------------------ ----------
     74555          6        163 SYS          GV_$QUEUEING_MTH                     2439
     74555          6        163 PUBLIC       GV$QUEUEING_MTH                      2440

--使用上面的方法,我们损块块163,173,此处不再列出

a、对于坏块对象无法进行聚合汇总等操作     
SQL> select count(*) from scott.tb_tmp;
select count(*) from scott.tb_tmp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'

b、对于坏块上的记录无法被查询
--我们使用基于之前查询到的OBJECT_ID来查询
SQL> select owner,object_name,object_id from scott.tb_tmp where object_id in(2439,2440);
select owner,object_name,object_id from scott.tb_tmp where object_id in(2439,2440)
                                              *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 6, block # 163)
ORA-01110: data file 6: '/u02/database/usbo/oradata/tbs_tmp.dbf'

--如下面的查询,位于损坏块上的数据无法被查询到,但对于未损坏的依旧可以查询。下面的查询时块161上的对象
SQL> select owner,object_name,object_id from scott.tb_tmp 
  2  where dbms_rowid.rowid_block_number(rowid)=161 and rownum<3;

OWNER                          OBJECT_NAME                     OBJECT_ID
------------------------------ ------------------------------ ----------
PUBLIC                         GV$RECOVERY_LOG                      2285
SYS                            GV_$ARCHIVE_GAP                      2286

--Author : Robinson Cheng
--Blog   : http://blog.csdn.net/robinson_0612
     
c、定位受损块所对应的对象
SQL> run get_obj_name_from_corrupt_block
  1  SELECT tablespace_name,
  2         segment_type,
  3         owner,
  4         segment_name,
  5         partition_name
  6    FROM dba_extents
  7*  WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
Enter value for file_id: 6
Enter value for block_id: 133
old   7:  WHERE file_id = &file_id AND &block_id BETWEEN block_id AND block_id + blocks - 1
new   7:  WHERE file_id = 6 AND 133 BETWEEN block_id AND block_id + blocks - 1

TABLESPACE_NAME                SEGMENT_TYPE       OWNER          SEGMENT_NAME      PARTITION_NAME
------------------------------ ------------------ -------------- ----------------- -----------------
TBS_TMP                        TABLE              SCOTT          TB_TMP 

d、对于损坏的数据文件,缺省情况下,不能对其进行备份,如下
RMAN> backup datafile 6 tag='corruption';                                                    
                                                                                                      
Starting backup at 2013/08/29 10:37:32                                                       
using channel ORA_DISK_1                                                                     
channel ORA_DISK_1: starting full datafile backup set                                        
channel ORA_DISK_1: specifying datafile(s) in backup set                                     
input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf                 
channel ORA_DISK_1: starting piece 1 at 2013/08/29 10:37:32                                  
RMAN-00571: ===========================================================                      
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============                      
RMAN-00571: ===========================================================                      
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/29/2013 10:37:33           
ORA-19566: exceeded limit of 0 corrupt blocks for file /u02/database/usbo/oradata/tbs_tmp.dbf 

--需要设定允许损坏块的数量之后才能进行备份
RMAN> run{
2> set maxcorrupt for datafile 6 to 2;
3> backup datafile 6 tag='corruption';
4> }

executing command: SET MAX CORRUPT

Starting backup at 2013/08/29 10:41:24
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00006 name=/u02/database/usbo/oradata/tbs_tmp.dbf
channel ORA_DISK_1: starting piece 1 at 2013/08/29 10:41:25
channel ORA_DISK_1: finished piece 1 at 2013/08/29 10:41:26
piece handle=/u02/database/usbo/fr_area/USBO/backupset/2013_08_29/o1_mf_nnndf_CORRUPTION_91xf6o18_.bkp tag=CORRUPTION comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2013/08/29 10:41:26       

--查看备份信息如下,应在修复坏块后重新备份以避免由于保留策略导致先前可用的备份被aged out
RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time     #Pieces #Copies Compressed Tag
------- -- -- - ----------- ------------------- ------- ------- ---------- ---
1       B  F  A DISK        2013/08/28 17:03:17 1       1       NO         HEALTH
3       B  F  A DISK        2013/08/29 10:41:25 1       1       NO         CORRUPTION

5、后记

a、对于受损的数据块,仅仅坏块上的数据无法被查询或读取,其余正常块的数据依旧可以使用。
b、对于受损的表对象进行聚合等相关运算时收到错误提示,因为坏块上的数据无法被统计。如果你聚合的是索引列,索引未损坏的情形则可正常返回。
c、可以基于RMAN可用的备份文件实现块介质恢复,其数据文件无需offline,开销最小,影响最小。
d、对于多个数据块的损坏,先执行backup validate校验数据库或相应的数据文件以便标记受损的坏块后,填充v$database_block_corruption以及后续恢复。
e、对于使用backup validate 校验后的情形,坏块恢复时可以直接使用blockrecover corruption list一次性恢复所有的坏块。
f、缺省情况下,存在坏块的数据文件无法成功备份,也会导致自动备份脚本失败。

posted @ 2017-01-09 12:03  三鹿专供  阅读(726)  评论(0编辑  收藏  举报