数据库恢复

数据库异常停止(26号23:46分左右)

Thu May 26 23:46:08 2011
Shutting down instance: further logons disabled
Thu May 26 23:46:10 2011
Stopping background process CJQ0
Thu May 26 23:46:10 2011
Stopping background process QMNC
Thu May 26 23:46:12 2011
Stopping background process MMNL
Thu May 26 23:46:13 2011
Stopping background process MMON
Thu May 26 23:46:14 2011
Shutting down instance (immediate)
License high water mark =


数据库启动后 (1:04分)
Fri May 27 01:04:33 2011
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.

从这里我们可以发现,其中宕机和起实例时间,间隔1个小时15分钟。

随后开始报坏块错误:

Fri May 27 01:04:41 2011
Hex dump of (file 1, block 60842) in trace file /oracle/admin/mqora/bdump/mqora_smon_1143592.trc
Corrupt block relative dba: 0x0040edaa (file 1, block 60842) 
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x0040edaa
 last change scn: 0x0000.07169190 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4eee0601
 check value in block header: 0x785d
 computed block checksum: 0x71d
Reread of rdba: 0x0040edaa (file 1, block 60842) found same corrupted data
Fri May 27 01:04:41 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1143592.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 60842)
ORA-01110: data file 1: '/orafs/oradata/mqora/system01.dbf'
Fri May 27 01:04:42 2011

Fri May 27 01:11:57 2011
Hex dump of (file 10, block 979852) in trace file /oracle/admin/mqora/udump/mqora_ora_1441920.trc
Corrupt block relative dba: 0x028ef38c (file 10, block 979852)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x028ef38c
 last change scn: 0x0000.071690c5 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xa3560601
 check value in block header: 0x1502
 computed block checksum: 0x3393
Reread of rdba: 0x028ef38c (file 10, block 979852) found same corrupted data
Fri May 27 01:11:57 2011

Fri May 27 01:12:11 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1168336.trc:
ORA-00600: internal error code, arguments: [2663], [0], [118897294], [0], [118925709], [], [], []
Fri May 27 01:12:12 2011
Flush retried for xcb 0x700000284552aa8, pmd 0x700000282f8d0a8
DEBUG: Restoring block headers for xcb 0x700000284552aa8, pmd 0x700000282f8d0a8
Fri May 27 01:12:12 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1168336.trc:
ORA-00600: internal error code, arguments: [2663], [0], [118897314], [0], [118925709], [], [], []
ORA-00600: internal error code, arguments: [2663], [0], [118897294], [0], [118925709], [], [], []
Fri May 27 01:12:13 2011


然后大概早上8:14分,有人把数据库拉起来了 但是错误不断:
Fri May 27 08:16:07 2011
Hex dump of (file 1, block 60842) in trace file /oracle/admin/mqora/bdump/mqora_smon_1475126.trc
Corrupt block relative dba: 0x0040edaa (file 1, block 60842)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x0040edaa
 last change scn: 0x0000.07169190 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x4eee0601
 check value in block header: 0x785d
 computed block checksum: 0x71d
Reread of rdba: 0x0040edaa (file 1, block 60842) found same corrupted data
Fri May 27 08:16:07 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1475126.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 60842)
ORA-01110: data file 1: '/orafs/oradata/mqora/system01.dbf'
Fri May 27 08:16:07 2011
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=35, OS id=1589542
Fri May 27 08:16:07 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1475126.trc:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-01578: ORACLE data block corrupted (file # 2, block # 5710)
ORA-01110: data file 2: '/orafs/oradata/mqora/undotbs01.dbf'
Fri May 27 08:16:08 2011
Completed: alter database open
Fri May 27 08:16:08 2011
Hex dump of (file 3, block 2678) in trace file /oracle/admin/mqora/bdump/mqora_mmon_1458778.trc
Corrupt block relative dba: 0x00c00a76 (file 3, block 2678)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00c00a76
 last change scn: 0x0000.071621d6 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x1d220601
 check value in block header: 0x5787
 computed block checksum: 0x3cf4
Reread of rdba: 0x00c00a76 (file 3, block 2678) found same corrupted data
ORA-1578 encountered when generating server alert SMG-4120
Fri May 27 08:16:08 2011
Hex dump of (file 10, block 959007) in trace file /oracle/admin/mqora/bdump/mqora_j002_1155602.trc
Corrupt block relative dba: 0x028ea21f (file 10, block 959007)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x028ea21f
 last change scn: 0x0000.06f0a356 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x88560601
 check value in block header: 0x44af
 computed block checksum: 0x2b00
Reread of rdba: 0x028ea21f (file 10, block 959007) found same corrupted data
Fri May 27 08:16:08 2011
Errors in file /oracle/admin/mqora/bdump/mqora_j002_1155602.trc:
ORA-12012: error on auto execute of job 81
ORA-01578: ORACLE data block corrupted (file # 10, block # 959007)
ORA-01110: data file 10: '/orafs/oradata/mqora/ORAHR03.dbf'
ORA-06512: at "ORAHR.BACKUP_CHANGELOG", line 4
ORA-06512: at line 1
Fri May 27 08:16:09 2011
Hex dump of (file 3, block 39329) in trace file /oracle/admin/mqora/bdump/mqora_m001_1180480.trc
Corrupt block relative dba: 0x00c099a1 (file 3, block 39329)
Fractured block found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x00c099a1
 last change scn: 0x0000.0716223b seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x2a7a0604
 check value in block header: 0x487c
 computed block checksum: 0xa54
Reread of rdba: 0x00c099a1 (file 3, block 39329) found same corrupted data
Fri May 27 08:16:09 2011
ORA-1578 encountered when generating server alert SMG-4121
ORA-1578 encountered when generating server alert SMG-4121
Fri May 27 08:16:09 2011
Errors in file /oracle/admin/mqora/bdump/mqora_j000_745554.trc:
ORA-12012: error on auto execute of job 61
ORA-12048: error encountered while refreshing materialized view "ORAHR"."ADERRLOGNUM"
ORA-01578: ORACLE data block corrupted (file # 10, block # 959007)
ORA-01110: data file 10: '/orafs/oradata/mqora/ORAHR03.dbf'
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2251
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2457
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2426
ORA-06512: at "ORAHR.REFERSH_MV", line 4
ORA-06512: at line 1
Fri May 27 08:16:14 2011
Corrupt Block Found
         TSN = 2, TSNAME = SYSAUX
         RFN = 3, BLK = 2678, RDBA = 12585590
         OBJN = 8853, OBJD = 8853, OBJECT = WRI$_ALERT_HISTORY, SUBOBJECT =
         SEGMENT OWNER = SYS, SEGMENT TYPE = Table Segment
Corrupt Block Found
         TSN = 6, TSNAME = ORAHR
         RFN = 10, BLK = 959007, RDBA = 42902047
         OBJN = 64519, OBJD = 64519, OBJECT = MID_CHANGELOG, SUBOBJECT = HA
         SEGMENT OWNER = ORAHR, SEGMENT TYPE = Table Partition
Fri May 27 08:16:24 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1441938.trc:
ORA-00600: internal error code, arguments: [2032], [8394318], [8394318], [8192], [2], [255], [0], [767]
Fri May 27 08:16:26 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1441938.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2032], [8394318], [8394318], [8192], [2], [255], [0], [767]
Fri May 27 08:16:30 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1573130.trc:
ORA-00600: internal error code, arguments: [2032], [8394318], [8394318], [8192], [2], [255], [0], [767]
Fri May 27 08:16:31 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1573130.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [2032], [8394318], [8394318], [8192], [2], [255], [0], [767]
Fri May 27 08:18:08 2011
Errors in file /oracle/admin/mqora/bdump/mqora_j000_1397136.trc:
ORA-12012: error on auto execute of job 81
ORA-01578: ORACLE data block corrupted (file # 10, block # 959007)
ORA-01110: data file 10: '/orafs/oradata/mqora/ORAHR03.dbf'
ORA-06512: at "ORAHR.BACKUP_CHANGELOG", line 4
ORA-06512: at line 1
Fri May 27 08:18:13 2011

中间类似的错误,太多了 不一一列举。

我经过dbv检查,除undotbs datafile发现3个物理坏块外,其他数据文件均未发现物理坏块。

DBVERIFY - Verification starting : FILE = /orafs/oradata/mqora/undotbs01.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 131200
Total Pages Processed (Data) : 0
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 131105
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 95
Total Pages Marked Corrupt   : 3    ---一共有3个物理坏块
Total Pages Influx           : 0
Highest block SCN            : 119297785 (0.119297785)

根据alert log所报错坏块,类似错误如下:

Fri May 27 09:47:30 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1205144.trc:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-00600: internal error code, arguments: [2032], [8394318], [8394318], [8192], [2], [255], [0], [767]
Fri May 27 09:48:15 2011

经查为如下几个回滚段:
(_SYSSMU1$,_SYSSMU5$,_SYSSMU7$)

根据oracle metalink文档的解释,导致坏块的原因主要有以下几种:

 1) Bad IO hardware / firmware
 2) OS problems
 3) Oracle problems
 3) Recovering through "UNRECOVERABLE" or "NOLOGGING" database actions
 
 
从目前的情况来看,可能的原因为前面2种,即硬件问题或操作系统问题。


处理的过程:

1.   在已经备份的情况下,尝试进行了全库的恢复,如下:

SQL> recover database;
Media recovery complete.
SQL>

Fri May 27 12:23:07 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 5212 Reading mem 0
  Mem# 0: /orafs/oradata/mqora/redo01.log
  Mem# 1: /orafs/oradata/mqora/redo11.log
Fri May 27 12:23:07 2011
Media Recovery Complete (mqora)
Completed: ALTER DATABASE RECOVER  database

 

2.  经过全库恢复成功以后,发现仍然有坏块的提示(undo的3个坏块仍然未恢复成功)。
Fri May 27 12:37:29 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1618192.trc:
ORA-01595: error freeing extent (3) of rollback segment (7))
ORA-01578: ORACLE data block corrupted (file # 2, block # 6098)
ORA-01110: data file 2: '/orafs/oradata/mqora/undotbs01.dbf'
Starting background process QMNC


3.  由于报错是undo block,故尝试使用隐含参数,将其中的3个回滚段offline

_offline_rollback_segments=(_SYSSMU1$,_SYSSMU5$,_SYSSMU7$)
 
 

4.  最后将隐含参数写入pfile中,用pfile启动db,数据库基本正常,但是过一段时间以后,仍有内部错误产生:

Fri May 27 13:11:22 2011
Completed: alter database open
Fri May 27 13:11:55 2011
Thread 1 advanced to log sequence 5220 (LGWR switch)
  Current log# 3 seq# 5220 mem# 0: /orafs/oradata/mqora/redo03.log
  Current log# 3 seq# 5220 mem# 1: /orafs/oradata/mqora/redo13.log
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1491076.trc:
。。。。
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Fri May 27 13:26:55 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1491076.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [518], [4198427], [202], [4198427], [4], []
Fri May 27 13:26:56 2011
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Fri May 27 13:41:18 2011
Errors in file /oracle/admin/mqora/bdump/mqora_m000_1589660.trc:
ORA-00600: internal error code, arguments: [ktspfupdst-1], [], [], [], [], [], [], []
Fri May 27 13:41:21 2011
Thread 1 advanced to log sequence 5221 (LGWR switch)
  Current log# 1 seq# 5221 mem# 0: /orafs/oradata/mqora/redo01.log
  Current log# 1 seq# 5221 mem# 1: /orafs/oradata/mqora/redo11.log
Fri May 27 13:58:10 2011

Fri May 27 14:48:05 2011
Errors in file /oracle/admin/mqora/bdump/mqora_m000_1126866.trc:
ORA-00600: internal error code, arguments: [kcbz_check_objd_typ], [128], [128], [132], [], [], [], []
Fri May 27 14:48:06 2011
Errors in file /oracle/admin/mqora/bdump/mqora_m000_1126866.trc:
ORA-00600: internal error code, arguments: [kcbgcur_3], [65398], [8], [4], [0], [], [], []
Fri May 27 14:53:53 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1515710.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [518], [4198427], [141], [4198427], [3], []
Fri May 27 14:53:54 2011
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
Fri May 27 14:53:55 2011
Errors in file /oracle/admin/mqora/bdump/mqora_smon_1515710.trc:
ORA-00600: internal error code, arguments: [13013], [5001], [518], [4198427], [148], [4198427], [3], []
Non-fatal internal error happenned while SMON was doing flushing of monitored table stats.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Fri May 27 14:53:57 2011

虽然说这样启动db以后,能跑1~2小时,不会宕机,但是错误不断,不过我观察,最后基本上的600错误和
ora-01555错误都是跟那几个offline的回滚段有关系,业务还可以继续跑,可以先不理会。

如下:

ORA-00600: internal error code, arguments: [13013], [5001], [518], [4198427], [152], [4198427], [3], []
ORA-01555 caused by SQL statement below (SQL ID: 4r6xw8nfy1jxd, Query Duration=0 sec, SCN: 0x0000.0721bc3a):
Fri May 27 16:09:27 2011
SELECT count(*) as count FROM mid_changelog where id = :"SYS_B_0"
Fri May 27 16:09:28 2011
Errors in file /oracle/admin/mqora/udump/mqora_ora_1151884.trc:
ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], []

在这里我详细查了一下,根据错误号518 我们定位到具体的object
SQL> select object_name,object_type,owner from dba_objects where data_object_id=518;

OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE         OWNER
------------------- ------------------------------
COL_USAGE$
TABLE               SYS


SQL>
SQL> select dbms_utility.data_block_address_file(4198427) Rfile# ,dbms_utility.data_block_address_block(4198427) "Block#" from dual;

    RFILE#     Block#
---------- ----------
         1       4123

然后dump file 1 block 4123;
*** 2011-05-27 16:37:34.704
Start dump data blocks tsn: 0 file#: 1 minblk 4123 maxblk 4123
buffer tsn: 0 rdba: 0x0040101b (1/4123)
scn: 0x0000.0721b9d1 seq: 0x02 flg: 0x04 tail: 0xb9d10602
frmt: 0x02 chkval: 0xd5bc type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

Block header dump:  0x0040101b
 Object id on Block? Y
 seg/obj: 0x206  csc: 0x00.721b9cf  itc: 2  flg: O  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000d.02c.0000023a  0x00812c4f.0084.04  ----    2  fsc 0x0000.00000000
0x02   0x0009.015.000199f5  0x008012f4.4622.13  ----    2  fsc 0x0000.00000000  --(未提交事务,每个事务锁定2条记录)
 
data_block_dump,data header at 0x11043f05c
===============
tsiz: 0x1fa0
hsiz: 0x1a4
pbl: 0x11043f05c
bdba: 0x0040101b
     76543210
flag=--------
ntab=1
nrow=201
frre=8
fsbo=0x1a4
fseo=0x888
avsp=0x11cd
tosp=0x11cd


根据上面的UBA,又定位到2个块,发现是2个有问题的undo block,继续dump这2个问题undo block:
SQL> select dbms_utility.data_block_address_file(TO_NUMBER('812c4f', 'XXXXXXXX')) file_id,
  2  dbms_utility.data_block_address_block(TO_NUMBER('812c4f', 'XXXXXXXX')) block_id from dual;

   FILE_ID   BLOCK_ID
---------- ----------
         2      76879

SQL>
SQL> select dbms_utility.data_block_address_file(TO_NUMBER('8012f4', 'XXXXXXXX')) file_id,
  2  dbms_utility.data_block_address_block(TO_NUMBER('8012f4', 'XXXXXXXX')) block_id from dual;

   FILE_ID   BLOCK_ID
---------- ----------
         2       4852


Start dump data blocks tsn: 1 file#: 2 minblk 76879 maxblk 76879
buffer tsn: 1 rdba: 0x00812c4f (2/76879)
scn: 0x0000.0721ba25 seq: 0x02 flg: 0x04 tail: 0xba250202
frmt: 0x02 chkval: 0xdaf9 type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1

End dump data blocks tsn: 1 file#: 2 minblk 76879 maxblk 76879
Start dump data blocks tsn: 1 file#: 2 minblk 4852 maxblk 4852
buffer tsn: 1 rdba: 0x008012f4 (2/4852)
scn: 0x0000.07220e37 seq: 0x02 flg: 0x04 tail: 0x0e370202
frmt: 0x02 chkval: 0xe0c0 type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1


从上看,很明显这2个block是逻辑坏块。

难道说就这里就卡住了 ? 到此为止,我个人感觉通过处理undo 表空间,应该可以解决该问题。

posted @ 2011-07-19 15:50  dbblog  阅读(1963)  评论(0编辑  收藏  举报