[原]记一次处理Oracle死会话的过程
今天检查Oracle的时候发现有一台Oracle的CPU占用率不太正常,现象是某一个进程会话占用很多CPU而且占用时间很长:
[root@mailserver ~]# top -u oracle top - 22:24:54 up 186 days, 6:50, 4 users, load average: 2.57, 2.53, 2.33 Tasks: 179 total, 2 running, 177 sleeping, 0 stopped, 0 zombie Cpu0 : 0.3% us, 0.0% sy, 0.0% ni, 98.3% id, 1.3% wa, 0.0% hi, 0.0% si Cpu1 : 16.9% us, 6.0% sy, 0.0% ni, 0.0% id, 77.1% wa, 0.0% hi, 0.0% si Cpu2 : 73.8% us, 24.5% sy, 0.0% ni, 1.7% id, 0.0% wa, 0.0% hi, 0.0% si Cpu3 : 2.7% us, 0.3% sy, 0.0% ni, 94.0% id, 3.0% wa, 0.0% hi, 0.0% si Mem: 8165004k total, 8124392k used, 40612k free, 7520k buffers Swap: 2031608k total, 113680k used, 1917928k free, 6181492k cached PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND 21752 oracle 25 0 99 37633:19 12.2 2187m 970m 963m R oracle 27111 oracle 15 0 4 3720:27 4.1 2205m 324m 307m D oracle 2045 oracle 16 0 0 0:01.29 0.8 2184m 67m 61m S oracle 5456 oracle 16 0 0 0:00.04 0.2 2182m 16m 12m S oracle 6057 oracle 16 0 0 4:30.34 0.1 65188 9m 6236 S tnslsnr
从top可以看到PID为21752的进程占用了很多的CPU,而运行的时间很长了。
使用以下语句找出对应的会话:
sys$mydb@mailserver SQL> l 1 select ses.sid,ses.serial#,ses.username 2 from v$session ses,v$process pro 3* where pro.spid=&spid and ses.paddr=pro.addr sys$mydb@mailserver SQL> / Enter value for spid: 21752 old 3: where pro.spid=&spid and ses.paddr=pro.addr new 3: where pro.spid=21752 and ses.paddr=pro.addr SID SERIAL# USERNAME ---------- ---------- ------------------------------ 129 12900 DCB_USER Elapsed: 00:00:00.01
我们看看这个会话在搞什么飞机,做个10046的跟踪吧。
先到 bdump 目录中将一些无用的日志(*.trc)文件删除。
exec sys.dbms_system.set_sql_trace_in_session(129,12900,true); -- 等一段时间 exec sys.dbms_system.set_sql_trace_in_session(129,12900,false);
郁闷的是竟然没有生成一个trc文件。
再看看该会话正在跑的SQL:
sys$mydb@mailserver SQL> SELECT /*+ ORDERED*/ REPLACE(SQL_TEXT, CHR(13), CHR(10)||CHR(13)) 3 FROM v$sqltext a 4 WHERE (a.HASH_VALUE,a.ADDRESS) IN ( 5 SELECT decode(sql_hash_value, 6 0,prev_hash_value, 7 sql_hash_value 8 ), 9 decode(sql_hash_value,0,prev_sql_addr,sql_address) 10 FROM v$session b 11 where b.sid=&sid and b.serial#=&serial 12 ) 13 / Enter value for sid: 129 Enter value for serial: 12900 old 11: where b.sid=&sid and b.serial#=&serial new 11: where b.sid=129 and b.serial#=12900 REPLACE(SQL_TEXT,CHR(13),CHR(10)||CHR(13)) ---------------------------------------------------------------------------------- edball_33_7 where id=:i ) ) group by vid having count(*)>=6 from (select vid from redball where num in ( select num from r insert into redball_a(id, vid, total) select :id , vid, count(*)
这个是我同事写的一段SQL,嵌套在一个很大的循环中跑的,据说差不多一个月前就已经停止了,看来真是传说中的死而不僵啊。
下一步就是把这个会话kill掉。
sys$mydb@mailserver SQL> alter system kill session '&sid,&serial' ; Enter value for sid: 129 Enter value for serial: 12900 old 1: alter system kill session '&sid,&serial' new 1: alter system kill session '129,12900' System altered. Elapsed: 00:00:01.02
再用top看一下:
[root@mailserver ~]# top -u oracle top - 22:54:55 up 186 days, 7:20, 4 users, load average: 0.63, 0.96, 1.23 Tasks: 186 total, 1 running, 185 sleeping, 0 stopped, 0 zombie Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si Cpu1 : 1.7% us, 0.0% sy, 0.0% ni, 90.0% id, 8.3% wa, 0.0% hi, 0.0% si Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si Mem: 8165004k total, 8102144k used, 62860k free, 16052k buffers Swap: 2031608k total, 113440k used, 1918168k free, 6208748k cached PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND 2045 oracle 16 0 0 0:01.29 0.8 2184m 67m 61m S oracle 5456 oracle 16 0 0 0:00.04 0.2 2182m 16m 12m S oracle 6057 oracle 16 0 0 4:30.39 0.1 65188 9m 6236 S tnslsnr 6652 oracle 16 0 0 0:59.26 0.1 7800 6224 1556 S perl
整个世界都安静了。