[原]Oracle Control File 意外情况研究

差不多4年前吧,那时候才刚接触Oracle,只记得一位Oracle老师说过“在你的Oracle DBA生涯中不应该丢失Control File”,可见Control File是如何的重要。

近日在泡ITPUB的时候发现一贴:能否在不停数据库情况下换掉损坏的控制文件 ,经过多位高现身说法,自己也做了点实验,与大家分享一下:

首先,看一下我的机器情况:

OS 情况:

[root@mailserver ~]# uname -a ; cat /etc/redhat-release ; 
Linux mailserver 2.6.9-67.ELsmp #1 SMP Fri Nov 16 12:49:06 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 4.6 (Final)

Oracle 信息:

sys@mydb(10.168.0.202) SQL> select INSTANCE_NAME,version,database_status from v$instance;
INSTANCE_NAME    VERSION           DATABASE_STATUS
---------------- ----------------- -----------------
mydb             10.2.0.1.0        ACTIVE

sys@mydb(10.168.0.202) SQL> select name,open_mode from v$database;
NAME      OPEN_MODE
--------- ----------
MYDB      READ WRITE

sys@mydb(10.168.0.202) SQL> select name,status from v$controlfile;       
NAME                                               STATUS
-------------------------------------------------- -------
/u02/oradata/mydb/controlfile/control01.ctl
/u02/oradata/mydb/controlfile/control02.ctl
/u02/oradata/mydb/controlfile/control03.ctl
/u02/oradata/mydb/controlfile/control04.ctl

从查询中可以看到我的Oracle有 4 个control file。

在OS中查一下control file的情况:

[oracle@mailserver controlfile]$ pwd    
/u02/oradata/mydb/controlfile
[oracle@mailserver controlfile]$ ls -lth
总用量 31M
-rw-r-----  1 oracle oinstall 7.8M 12月  2 14:57 control01.ctl
-rw-r-----  1 oracle oinstall 7.8M 12月  2 14:57 control02.ctl
-rw-r-----  1 oracle oinstall 7.8M 12月  2 14:57 control03.ctl
-rw-r-----  1 oracle oinstall 7.8M 12月  2 14:57 control04.ctl
[oracle@mailserver controlfile]$ watch -n1 md5sum * 
Every 1.0s: md5sum control01.ctl control02.ctl control03.ctl control04.ctl                                  Wed Dec  2 15:11:32 2009

1cbca423c5c18fea3cb1fc13401c49d6  control01.ctl
b6a02759b5896da4305d0f2d324073fb  control02.ctl
b6a02759b5896da4305d0f2d324073fb  control03.ctl
1cbca423c5c18fea3cb1fc13401c49d6  control04.ctl

观察 control file 更新情况,这是个空闲库,我把listener也关掉了。我一直以为control file应该都是一样的内容,但是从md5sum的结果来看不是,第1和第4始终保持一样,第二和第三始终保持一样,很有趣的发现

开始做实验: 实验一,模拟Control File 丢失:

[oracle@mailserver controlfile]$ ls
control01.ctl  control02.ctl  control03.ctl  control04.ctl
[oracle@mailserver controlfile]$ rm -rf control04.ctl 
[oracle@mailserver controlfile]$ ls
control01.ctl  control02.ctl  control03.ctl
[oracle@mailserver controlfile]$ watch -n2 md5sum * 
Every 2.0s: md5sum control01.ctl control02.ctl control03.ctl                                                Wed Dec  2 15:21:12 2009

2ccb97d9a9ab834d75a3a503960a0cbd  control01.ctl
a1fc984f27925f0b1ab2b17928289963  control02.ctl
a1fc984f27925f0b1ab2b17928289963  control03.ctl

可以观察到 control file 还是在更新的,在session#0 中做一些操作:

sys@mydb(10.168.0.202) SQL> alter system checkpoint ;

System altered.

sys@mydb(10.168.0.202) SQL> alter system checkpoint ;

System altered.

sys@mydb(10.168.0.202) SQL> alter system switch logfile ;

System altered.

在session#1 中做一下操作:

zhyw@mydb(10.168.0.202) SQL> update ip_state set state=2 where state=1;   

130 rows updated.

zhyw@mydb(10.168.0.202) SQL> commit;

Commit complete.

zhyw@mydb(10.168.0.202) SQL> select count(*) from ip_state;                      

  COUNT(*)
----------
       189

zhyw@mydb(10.168.0.202) SQL> create table ip_state_bak as select * from ip_state;
create table ip_state_bak as select * from ip_state
                                           *
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

终于看到丢失一个控制文件的影响了 ^_^

尝试复制一个control file 出来,Oracle还在跑啊

[oracle@mailserver controlfile]$ pwd ; ls;
/u02/oradata/mydb/controlfile
control01.ctl  control02.ctl  control03.ctl
[oracle@mailserver controlfile]$ cp control01.ctl  control04.ctl 
[oracle@mailserver controlfile]$ watch -n2 md5sum * 
Every 2.0s: md5sum control01.ctl control02.ctl control03.ctl control04.ctl                                  Wed Dec  2 15:30:53 2009

2998c5d3ffe493312284efc5f87c95e7  control01.ctl
f77690d016d0f1cbd5256cf0bdda0d93  control02.ctl
f77690d016d0f1cbd5256cf0bdda0d93  control03.ctl
49a9c4f48732cded164f474c8464b999  control04.ctl

[oracle@mailserver controlfile]$ watch -n2 md5sum * 
Every 2.0s: md5sum control01.ctl control02.ctl control03.ctl control04.ctl                                  Wed Dec  2 15:31:13 2009

ae6ae06f1395ac9c6783d783be566fb4  control01.ctl
7e56cfb3412a0dc0ed15c55c7e128283  control02.ctl
7e56cfb3412a0dc0ed15c55c7e128283  control03.ctl
49a9c4f48732cded164f474c8464b999  control04.ctl

可见“自作聪明”地Copy一个Control File 出来是“冻结”不动的。过了一阵,Oracle 挂掉了,alert log中出现:

Wed Dec  2 15:32:35 2009
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
Instance terminated by USER, pid = 2335

看来丢失control file不可怕,最怕control file不一致啊。

此时Oracle挂掉了,control file 当然也不会更新了,做以下操作:

[oracle@mailserver controlfile]$ pwd ; ls ;
/u02/oradata/mydb/controlfile
control01.ctl  control02.ctl  control03.ctl  control04.ctl
[oracle@mailserver controlfile]$ rm -rf control04.ctl 
[oracle@mailserver controlfile]$ cp control01.ctl control04.ctl  
[oracle@mailserver controlfile]$ md5sum *
f513df299a0789b6e527a2ef6c571d25  control01.ctl
28fd3396e22604ba55b6cbe9d2e8fd40  control02.ctl
28fd3396e22604ba55b6cbe9d2e8fd40  control03.ctl
f513df299a0789b6e527a2ef6c571d25  control04.ctl

然后尝试启动Oracle

还是可以起来的 :)

sys@mydb(10.168.0.202) SQL> select name,open_mode from v$database;
NAME      OPEN_MODE
--------- ----------
MYDB      READ WRITE

sys@mydb(10.168.0.202) SQL> select name,status from v$controlfile;       
NAME                                               STATUS
-------------------------------------------------- -------
/u02/oradata/mydb/controlfile/control01.ctl
/u02/oradata/mydb/controlfile/control02.ctl
/u02/oradata/mydb/controlfile/control03.ctl
/u02/oradata/mydb/controlfile/control04.ctl

再做一个实验,叫实验二,这次是破坏掉其中一个control file,不是“弄丢”噢 :-)

[oracle@mailserver controlfile]$ date +%x%X ; date > control04.ctl 
2009年12月02日15时57分32秒
###### 在 2009年12月02日15时57分32秒 “搞坏”一个Control File
###### alert file 中出现:
Wed Dec  2 15:57:34 2009
Hex dump of (file 0, block 1) in trace file /u01/app/admin/mydb/bdump/mydb_arc0_2935.trc
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Completely zero block found during control file header read
Wed Dec  2 15:57:34 2009
Errors in file /u01/app/admin/mydb/bdump/mydb_arc0_2935.trc:
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'
Wed Dec  2 15:57:34 2009
Errors in file /u01/app/admin/mydb/bdump/mydb_arc0_2935.trc:
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'

这证明,我“的确搞坏”了一个控制文件

在一个普通用户中执行一些语句:

zhyw@mydb(10.168.0.202) SQL> update ip_state set state=2 where state=1; 

130 rows updated.

zhyw@mydb(10.168.0.202) SQL> commit;

Commit complete.

可以更新,可以提交.

建表

zhyw@mydb(10.168.0.202) SQL> create table a01 as select * from ip_state;
create table a01 as select * from ip_state
                                  *
ERROR at line 1:
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'

看来Control File出问题后,建表是有问题的,不知其他DDL语句有没有影响。

做一个查询:

zhyw@mydb(10.168.0.202) SQL> select sysdate,ip,state from ip_state where rownum<10;    

SYSDATE             IP                   STATE
------------------- --------------- ----------
2009-12-02 16:00:52 10.168.10.132            2
2009-12-02 16:00:52 10.168.10.133            2
2009-12-02 16:00:52 10.168.10.134            2
2009-12-02 16:00:52 10.168.10.135            0
2009-12-02 16:00:52 10.168.10.136            0
2009-12-02 16:00:52 10.168.10.137            0
2009-12-02 16:00:52 10.168.10.138            0
2009-12-02 16:00:52 10.168.10.139            2
2009-12-02 16:00:52 10.168.10.140            0

大家可以看到,Oracle在坏掉一个控制文件的情况下坚持近3分钟,从“2009年12月02日15时57分32秒” 到“2009-12-02 16:00:52”

等不及了,强制checkpoint 一下,让Oracle挂掉

sys@mydb(10.168.0.202) SQL> select sysdate from dual;
SYSDATE
-------------------
2009-12-02 16:00:56

sys@mydb(10.168.0.202) SQL> alter system checkpoint; 
ERROR:
ORA-03114: not connected to ORACLE


alter system checkpoint
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

此时,alert file 出现:

Hex dump of (file 0, block 1) in trace file /u01/app/admin/mydb/bdump/mydb_ckpt_2909.trc
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Completely zero block found during control file header read
Wed Dec  2 16:00:57 2009
Errors in file /u01/app/admin/mydb/bdump/mydb_ckpt_2909.trc:
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'
Wed Dec  2 16:00:57 2009
Errors in file /u01/app/admin/mydb/bdump/mydb_ckpt_2909.trc:
ORA-00227: corrupt block detected in control file: (block 1, # blocks 1)
ORA-00202: control file: '/u02/oradata/mydb/controlfile/control04.ctl'
CKPT: terminating instance due to error 227
Instance terminated by CKPT, pid = 2909
总结一下:
1。Control File丢失(当然不是全部丢失啦),系统还是可以checkpoing 和 switch logfile 的。
2。Control File不一致,Oracle马上就挂掉了。
3。Control File破坏掉后,Oracle还是能够坚持一阵子的,select update都没有问题,checkpoint的时候就挂掉了。
posted @ 2009-12-08 12:35  killkill  阅读(3730)  评论(0编辑  收藏  举报