Simulating the Failure of a Manually Deleted Control File
2016-05-14 16:26 abce
On Linux and Unix, manually deleting one of the control files while the database is running does not stop the database, as shown below:
#### Delete a control file
$ rm control01.ctl
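After the deletion, the alert log reports no errors and the database keeps running normally.
Run the following statements in the database: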
SQL> alter system checkpoint;
SQL> alter system switch logfile;
SQL> alter system switch logfile;
SQL> alter system switch logfile;
The corresponding alert log entries show that the database is still running normally:
Sat May 14 07:52:39 2016
Thread 1 advanced to log sequence 64 (LGWR switch)
  Current log# 1 seq# 64 mem# 0: /u01/app/oracle/oradata/db11/redo01.log
Thread 1 advanced to log sequence 65 (LGWR switch)
  Current log# 2 seq# 65 mem# 0: /u01/app/oracle/oradata/db11/redo02.log
Thread 1 advanced to log sequence 66 (LGWR switch)
  Current log# 3 seq# 66 mem# 0: /u01/app/oracle/oradata/db11/redo03.log
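This works because the database processes still hold open file handles on the deleted file and have not released them, as shown below: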
$ ps -ef|grep ckpt|grep -v grep
ora11     4616     1  0 07:51 ?        00:00:00 ora_ckpt_db11
$ cd /proc/4616/fd
$ ls -ltr |grep control
lrwx------ 1 ora11 oinstall 64 May 14 07:55 257 -> /u01/app/oracle/oradata/db11/control02.ctl
lrwx------ 1 ora11 oinstall 64 May 14 07:55 256 -> /u01/app/oracle/oradata/db11/control01.ctl (deleted)
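The behaviour above is ordinary Unix file semantics rather than anything specific to Oracle. The following is a minimal sketch, assuming a Linux or Unix host and Python 3; the scratch file and the function name `write_after_unlink` are hypothetical stand-ins for control01.ctl and the CKPT process. It shows that positional reads and writes keep working on a descriptor after the file name has been removed.

```python
import os
import tempfile


def write_after_unlink(payload: bytes, offset: int = 0) -> bytes:
    """Write and re-read `payload` through a descriptor whose file was unlinked."""
    # Hypothetical scratch file standing in for control01.ctl.
    fd, path = tempfile.mkstemp(suffix=".ctl")
    try:
        os.unlink(path)                      # same effect as `rm control01.ctl`
        assert not os.path.exists(path)      # the directory entry is gone
        os.pwrite(fd, payload, offset)       # positional write still succeeds
        return os.pread(fd, len(payload), offset)  # and so does the read
    finally:
        os.close(fd)                         # the data is freed only after the last close


if __name__ == "__main__":
    print(write_after_unlink(b"still writable", 16384))
```

The file's data is released only when the last open descriptor is closed, which is why the deleted control file stays usable until the instance is shut down.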
#### Session 1: trace the CKPT process with strace
$ strace -fr -o /tmp/4616.log -p 4616
Process 4616 attached - interrupt to quit
strace stays attached in the foreground at this point until it is interrupted.
#### Session 2: switch the redo logs
SQL> alter system switch logfile;
SQL> alter system switch logfile;
The log switches complete normally:
Sat May 14 07:58:33 2016
Thread 1 advanced to log sequence 67 (LGWR switch)
  Current log# 1 seq# 67 mem# 0: /u01/app/oracle/oradata/db11/redo01.log
Thread 1 advanced to log sequence 68 (LGWR switch)
  Current log# 2 seq# 68 mem# 0: /u01/app/oracle/oradata/db11/redo02.log
#### Stop the trace in session 1 (Ctrl+C)
$ strace -fr -o /tmp/4616.log -p 4616
Process 4616 attached - interrupt to quit
Process 4616 detached
#### Examine the log produced by session 1, /tmp/4616.log
...
4616 0.000036 gettimeofday({1463183881, 895560}, NULL) = 0
4616 0.000035 pwrite(256, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4\214C\0\0\2\0\0\0\0\0\0\0\32\0\0\0"..., 16384, 49152) = 16384
4616 0.040894 gettimeofday({1463183881, 936492}, NULL) = 0
4616 0.000044 gettimeofday({1463183881, 936533}, NULL) = 0
4616 0.000079 pwrite(257, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4\214C\0\0\2\0\0\0\0\0\0\0\32\0\0\0"..., 16384, 49152) = 16384
4616 0.003029 gettimeofday({1463183881, 939643}, NULL) = 0
4616 0.000042 gettimeofday({1463183881, 939697}, NULL) = 0
4616 0.000057 gettimeofday({1463183881, 939740}, NULL) = 0
4616 0.000071 gettimeofday({1463183881, 939815}, NULL) = 0
4616 0.000076 gettimeofday({1463183881, 939888}, NULL) = 0
4616 0.000035 gettimeofday({1463183881, 939922}, NULL) = 0
4616 0.000038 pread(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\212\343\0\0\0\0\0\0\0\4 \v~\227\300U"..., 16384, 16384) = 16384
...
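In this output:
4616 is the process ID.
The second column is a relative timestamp, for example 0.000036. With strace -r, it is the time in seconds since the previous traced system call started.
Now look at this line: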
pread(256, "\25\302\0\0\1\0\0\0\0\0\0\0\0\0\1\4\212\343\0\0\0\0\0\0\0\4 \v~\227\300U"..., 16384, 16384) = 16384
256 is the file descriptor. According to the /proc listing, it points to the deleted control01.ctl:
$ ls -ltr |grep control
lrwx------ 1 ora11 oinstall 64 May 14 07:55 257 -> /u01/app/oracle/oradata/db11/control02.ctl
lrwx------ 1 ora11 oinstall 64 May 14 07:55 256 -> /u01/app/oracle/oradata/db11/control01.ctl (deleted)
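The first 16384 is the number of bytes requested (here, the size of one block).
The second 16384 is the file offset at which the read starts.
The third 16384, after the equals sign, is the return value: the number of bytes actually read. For a pwrite line it is the number of bytes written.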
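The field breakdown above can be applied mechanically to the whole trace file. The following is a minimal sketch, assuming the `strace -fr` line layout shown in this post (PID, relative time, call); the names `PositionalIO` and `parse_positional_io` are hypothetical, and the sample line is the pread call from the trace with its buffer shortened.

```python
import re
from typing import NamedTuple, Optional


class PositionalIO(NamedTuple):
    pid: int
    delta: float   # seconds since the previous traced call (strace -r)
    call: str      # "pread" or "pwrite"
    fd: int
    count: int     # bytes requested
    offset: int    # file offset of the transfer
    result: int    # return value: bytes actually transferred, or -1


# Accepts both the pread/pwrite and the pread64/pwrite64 spellings.
_LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<delta>\d+\.\d+)\s+"
    r"(?P<call>pread|pwrite)(?:64)?\((?P<fd>\d+), .*, "
    r"(?P<count>\d+), (?P<offset>\d+)\)\s+=\s+(?P<result>-?\d+)"
)


def parse_positional_io(line: str) -> Optional[PositionalIO]:
    """Return the parsed fields, or None for any other kind of line."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return PositionalIO(
        int(m["pid"]), float(m["delta"]), m["call"],
        int(m["fd"]), int(m["count"]), int(m["offset"]), int(m["result"]),
    )


if __name__ == "__main__":
    sample = r'4616 0.000038 pread(256, "\25\302\0\0\1"..., 16384, 16384) = 16384'
    print(parse_positional_io(sample))
```

Lines for other calls, such as gettimeofday, return None, so the function can be used to filter /tmp/4616.log down to the control file I/O.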
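What the trace above tells us:
1. Process information is available under /proc, for example /proc/4616/stat.
2. On Linux, the process reads and writes files through the positional system calls pread and pwrite (named pread64 and pwrite64 at the kernel level).
3. Each pwrite is issued twice with the same data and offset, once to fd 256 and once to fd 257, so the deleted control01.ctl keeps being updated alongside control02.ctl.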
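The manual check with `ls -ltr /proc/4616/fd` can also be scripted. The following is a minimal sketch, assuming Linux with /proc mounted and permission to read the target process's fd directory (same user or root); the function name `deleted_open_files` is hypothetical.

```python
import os
import tempfile
from typing import Dict

_SUFFIX = " (deleted)"


def deleted_open_files(pid: int) -> Dict[int, str]:
    """Map fd number -> original path for descriptors whose file was unlinked."""
    fd_dir = f"/proc/{pid}/fd"
    found: Dict[int, str] = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.endswith(_SUFFIX):
            found[int(name)] = target[: -len(_SUFFIX)]
    return found


if __name__ == "__main__":
    # Demonstrate on this very process with a throwaway file.
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    print(deleted_open_files(os.getpid()))
    os.close(fd)
```

Called with the PID from this post, `deleted_open_files(4616)` would be expected to report fd 256 and the control01.ctl path, matching the `ls` output above.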