[原]记一次处理Oracle死会话的过程

今天检查Oracle的时候发现有一台Oracle的CPU占用率不太正常,现象是某一个进程会话占用很多CPU而且占用时间很长:

[root@mailserver ~]# top -u oracle 
top - 22:24:54 up 186 days,  6:50,  4 users,  load average: 2.57, 2.53, 2.33
Tasks: 179 total,   2 running, 177 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.3% us,  0.0% sy,  0.0% ni, 98.3% id,  1.3% wa,  0.0% hi,  0.0% si
Cpu1  : 16.9% us,  6.0% sy,  0.0% ni,  0.0% id, 77.1% wa,  0.0% hi,  0.0% si
Cpu2  : 73.8% us, 24.5% sy,  0.0% ni,  1.7% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3  :  2.7% us,  0.3% sy,  0.0% ni, 94.0% id,  3.0% wa,  0.0% hi,  0.0% si
Mem:   8165004k total,  8124392k used,    40612k free,     7520k buffers
Swap:  2031608k total,   113680k used,  1917928k free,  6181492k cached

  PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S COMMAND
21752 oracle    25   0   99  37633:19 12.2 2187m 970m 963m R oracle 
27111 oracle    15   0    4   3720:27  4.1 2205m 324m 307m D oracle 
 2045 oracle    16   0    0   0:01.29  0.8 2184m  67m  61m S oracle 
 5456 oracle    16   0    0   0:00.04  0.2 2182m  16m  12m S oracle 
 6057 oracle    16   0    0   4:30.34  0.1 65188   9m 6236 S tnslsnr

从top可以看到PID为21752的进程占用了很多的CPU,而运行的时间很长了。

使用以下语句找出对应的会话:

sys$mydb@mailserver SQL> l
  1  select ses.sid,ses.serial#,ses.username
  2  from v$session ses,v$process pro
  3* where pro.spid=&spid and ses.paddr=pro.addr
sys$mydb@mailserver SQL> /
Enter value for spid: 21752
old   3: where pro.spid=&spid and ses.paddr=pro.addr
new   3: where pro.spid=21752 and ses.paddr=pro.addr

       SID    SERIAL# USERNAME
---------- ---------- ------------------------------
       129      12900 DCB_USER

Elapsed: 00:00:00.01

我们看看这个会话在搞什么飞机,做个10046的跟踪吧。

先到 bdump 目录中将一些无用的日志(*.trc)文件删除。

exec sys.dbms_system.set_sql_trace_in_session(129,12900,true); 
-- 等一段时间
exec sys.dbms_system.set_sql_trace_in_session(129,12900,false); 

郁闷的是竟然没有生成一个trc文件。

再看看该会话正在跑的SQL:

sys$mydb@mailserver SQL> SELECT /*+ ORDERED*/
REPLACE(SQL_TEXT, CHR(13), CHR(10)||CHR(13)) 
  3  FROM v$sqltext a
  4  WHERE (a.HASH_VALUE,a.ADDRESS) IN (
  5           SELECT decode(sql_hash_value,
  6                         0,prev_hash_value,
  7                         sql_hash_value
  8                         ),
  9                  decode(sql_hash_value,0,prev_sql_addr,sql_address)
 10           FROM v$session b
 11           where b.sid=&sid and b.serial#=&serial
 12  )
 13  /
Enter value for sid: 129
Enter value for serial: 12900
old  11:          where b.sid=&sid and b.serial#=&serial
new  11:          where b.sid=129 and b.serial#=12900

REPLACE(SQL_TEXT,CHR(13),CHR(10)||CHR(13))
----------------------------------------------------------------------------------
edball_33_7 where id=:i    ) ) group by vid having count(*)>=6
 from  (select vid from redball where num in ( select num from r
insert into redball_a(id, vid, total) select :id , vid, count(*)

这个是我同事写的一段SQL,嵌套在一个很大的循环中跑的,据说差不多一个月前就已经停止了,看来真是传说中的死而不僵啊。

下一步就是把这个会话kill掉。

sys$mydb@mailserver SQL> alter system kill session '&sid,&serial' ; 
Enter value for sid: 129
Enter value for serial: 12900
old   1: alter system kill session '&sid,&serial'
new   1: alter system kill session '129,12900'

System altered.

Elapsed: 00:00:01.02

再用top看一下:

[root@mailserver ~]# top -u oracle
top - 22:54:55 up 186 days,  7:20,  4 users,  load average: 0.63, 0.96, 1.23
Tasks: 186 total,   1 running, 185 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  :  1.7% us,  0.0% sy,  0.0% ni, 90.0% id,  8.3% wa,  0.0% hi,  0.0% si
Cpu2  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu3  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8165004k total,  8102144k used,    62860k free,    16052k buffers
Swap:  2031608k total,   113440k used,  1918168k free,  6208748k cached

  PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S COMMAND 
 2045 oracle    16   0    0   0:01.29  0.8 2184m  67m  61m S oracle  
 5456 oracle    16   0    0   0:00.04  0.2 2182m  16m  12m S oracle  
 6057 oracle    16   0    0   4:30.39  0.1 65188   9m 6236 S tnslsnr 
 6652 oracle    16   0    0   0:59.26  0.1  7800 6224 1556 S perl   

整个世界都安静了。

posted @ 2010-06-21 23:08  killkill  阅读(1358)  评论(0编辑  收藏  举报