怎样用调试工具Dump Oracle系统状态

http://www.laoxiong.net/how_debugger_dump_oracle_state.html

如果Oracle数据库hang住了,对Oracle做system dump,或做hang analyze,是研究和解决问题的有效办法,至少在提交SR时能够有更多的有用信息。如果能够连接数据库,并能够进行操作,那么用oradebug是简单快捷的办法。

但有的时候,数据库由于hang住,sqlplus不能连接时(在10g可以尝试用sqlplus -prelim连接数据库),可以使用操作系统上的调试工具来dump oracle系统状态。在记一次Oracle数据库无响应(hang住)故障的处理一文中,就曾使用dbx做systemstate dump,并发现问题所在,并最终解决了问题。下面是当时用dbx做dump的过程:

 

 1 # dbx -a 446910
 2 Waiting to attach to process 446910 3 Successfully attached to oracle.
 4 Type ‘help’ for help.
 5 reading symbolic information …
 6 stopped in iosl.select at 0×9000000000c94d8 ($t2)
 7 0×9000000000c94d8 (select+0xfffffffffff06318) e8410028 ld r2,0×28(r1)
 8 (dbx) print ksudss(10)
 9 Segmentation fault in slrac at 0×100083aa0 ($t2)
10 0×100083aa0 (slrac+0xe4) 88030000 lbz r0,0×0(r3)
11 (dbx) detach

从上面可以看到,使用dbx做dump的过程为:

  • 找到有异常的进程号,比如CPU非常高,HANG住的进程等。如果做系统范围的systemstate dump,可以是其他的进程。
  • dbx -a < 进程号>
  • print ksudss(10) --这里是直接调用ORACLE程序中的ksudss函数,dump level为10,就等同于在sqlplus 中用oradebug dump systemstate 10
  • detach
  • quit

在LINUX下可以使用gdb,下面是一个例子:

 1 [oracle@xty ~]$ ps -ef | grep LOCAL
 2 oracle 3765 3764 1 05:55 ? 00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
 3 oracle 3767 3668 0 05:55 pts/2 00:00:00 grep LOCAL
 4 [oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3765
 5 GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
 6 Copyright 2004 Free Software Foundation, Inc.
 7 GDB is free software, covered by the GNU General Public License, and you are
 8 welcome to change it and/or distribute copies of it under certain conditions.
 9 Type “show copying” to see the conditions.
10 There is absolutely no warranty for GDB. Type “show warranty” for details.
11 This GDB was configured as “i386-redhat-linux-gnu”…(no debugging symbols found)…Using host libthread_db library “/lib/tls/libthread_db.so.1″.
12 Attaching to program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765
13 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so…(no debugging symbols found)…done.
14 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxp10.so
15 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so…done.
16 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libhasgen10.so
17 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so…done.
18 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libskgxn2.so
19 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so…done.
20 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocr10.so
21 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so…done.
22 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrb10.so
23 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so…done.
24 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libocrutl10.so
25 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so…done.
26 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libjox10.so
27 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so…done.
28 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libclsra10.so
29 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so…done.
30 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libdbcfg10.so
31 Reading symbols from /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so…done.
32 Loaded symbols for /u01/app/oracle/product/10.1.0/db_1/lib/libnnz10.so
33 Reading symbols from /usr/lib/libaio.so.1…done.
34 Loaded symbols for /usr/lib/libaio.so.1
35 Reading symbols from /lib/libdl.so.2…done.
36 Loaded symbols for /lib/libdl.so.2
37 Reading symbols from /lib/tls/libm.so.6…done.
38 Loaded symbols for /lib/tls/libm.so.6
39 Reading symbols from /lib/tls/libpthread.so.0…done.
40 [Thread debugging using libthread_db enabled]
41 [New Thread -1219938624 (LWP 3765)]
42 Loaded symbols for /lib/tls/libpthread.so.0
43 Reading symbols from /lib/libnsl.so.1…done.
44 Loaded symbols for /lib/libnsl.so.1
45 Reading symbols from /lib/tls/libc.so.6…done.
46 Loaded symbols for /lib/tls/libc.so.6
47 Reading symbols from /lib/ld-linux.so.2…done.
48 Loaded symbols for /lib/ld-linux.so.2
49 Reading symbols from /lib/libnss_files.so.2…done.
50 Loaded symbols for /lib/libnss_files.so.2
51 0×006967a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
52 (gdb) print ksudss(10)
53 [Switching to Thread -1219938624 (LWP 3765)]
54 $1 = 213658428
55 (gdb) detach
56 Detaching from program: /u01/app/oracle/product/10.1.0/db_1/bin/oracle, process 3765
57 (gdb) quit

然后我们可以找到有dump结果的trace文件:

 

[oracle@xty ~]$ cd $ORACLE_BASE/admin/xty/udump
[oracle@xty udump]$ ls -lrt | grep 3765
-rw-r----- 1 oracle oinstall 599705 Nov 21 05:56 xty_ora_3765.trc

根据debugger工具attach进程的不同,trace文件一般在user_dump_dest或background_dump_dest目录下。

在LINUX下用gdb,在AIX下用dbx,那么在HP-UX下呢,可以用HP的wdb(可以到HP WDB查看HP WDB的详细信息和下载最新的版本。在solaris上,也会有dbx或gdb(各个平台有多种不同的debugger,其他还有adb,mdb等等)。有兴趣的朋友可以用用。

除了上面提到的systemstate dump,还能不能够做其他的dump?答案是肯定的,以下是一些dump相关的函数:

print ksdhng(3,1,0) 相当于oradebug hanganalyze 3
print ksudps(10) 相当于oradebug dump processstate 10
print curdmp() 相当于oradebug call curdmp(也就是oradebug dump cursordump)
print ksdtrc(4) 相当于oradebug dump events 4(这里参数表示level,1--session,2--process,4--system)

以上列出的,不一定对处理HANG有意义,只是这里觉得有些意思。^_^.其他还有意思的包括:

print ksdsel(10046,12) --相当于为attach的进程设置10046事件level 12
print skdxipc() --相当于oradebug ipc
print skdxprst() --相当于oradebug procstat

注意:不要在正常运行的生产系统上运行和测试。

当然,如果能用oradebug,那么就用oradebug,毕竟方便很多,也更安全。这里只是对使用debugger做dump一些扩充,供有兴趣研究的朋友参考。

补充:在HP-UX上使用gdb进行system dump,可能会更复杂,可以参考Oracle Metalink Doc 273324.1

posted @ 2013-03-06 12:10  taowang2016  阅读(279)  评论(0编辑  收藏  举报