项目中的一个dead lock分析

发现存在deadlock,并且发现一个有趣的现象,在trace文件中有如下片断(blocker和waiter具有相同的session id)
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00050006-00008b95        24     105     X             24     105           X
session 105: DID 0001-0018-00076390    session 105: DID 0001-0018-00076390
Rows waited on:
Session 105: obj - rowid = 0005E3B8 - AABeO4AAFAAAN2oAAp
  (dictionary objn - 385976, file - 5, block - 56744, slot - 41)

    从这个地方发现,这个deadlock是发生在一个session中的,和自己的理解有差异。以前一直认为,
deadlock应该存在不同的session中,这样才会有两个事务产生依赖。如果在一个session中,不会
同时存在两个不同的事务。一般情况下,trace文件应该是下面这样的(blocker和waiter是不同的)
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00050025-000125d5        64     101     X             72     151           X
TX-0009005e-000124aa        72     151     X             64     101           X
session 101: DID 0001-0040-0000092F     session 151: DID 0001-0048-000005B9
session 151: DID 0001-0048-000005B9     session 101: DID 0001-0040-0000092F
Rows waited on:
Session 151: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAB
  (dictionary objn - 1364748, file - 7, block - 21011, slot - 1)
Session 101: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAA
  (dictionary objn - 1364748, file - 7, block - 21011, slot - 0)
   
    经过分析和测试,发现原来是自治事务产生了另一个事务。于是写了一段代码来模拟
------------------------------------------------------------------------------
drop table ttttt;
create table ttttt(x int, y int);
truncate table ttttt;
insert into ttttt select rownum, rownum from dual connect by rownum <=5;

-- create an autonomous transaction procedure
create or replace procedure ap(p int) is
  pragma autonomous_transaction;
begin
  update ttttt set y = 3 where x = 2;
  update ttttt set y = 5 where x = 1;

  commit;
end;
/

-----------------------------------------------------------------------
-- demonstrate a deadlock causeb by autonomous transaction
-----------------------------------------------------------------------
begin
  dbms_application_info.set_module('Demonstrate','Deadlock with AT');
 
  -- begin the parent transaction
  update ttttt set y = 4 where x = 1;

  -- TAG_1
 
  -- here call the AP, and a child transacion will be spawned
  declare
    p_sid integer;
  begin
    ap(p_sid);
  end;
   
  -- update row 2
  -- NOTICE: since parent trnasaction waits for lock held by AP on row 2,
  -- and AP waits for lock held by parent transaction on row 1
  -- TAG_2
  update ttttt set y = 4 where x = 2;

  rollback;
  --commit;
end;
/

    结果和我预料的一样,执行到TAG_2的地方,产生了deadlock,并且trace文件也显示这个
session把自己block了。
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-0008004a-00012451        69     151     X             69     151           X

    在继续深入的分析一下,在我们的实际环境中使用了Oracle Workflow,自治过程作为一个独立的
任务结点,“应该”在前一个任务commit之后在进行调用。这个“应该”表示在测试代码的TAG_1部分,
会有一个OWF控制的commit操作。通过了解Oracle的事务处理机制,我们知道当这个操作存在的时候,
测试代码是不会发生自己block自己的deadlock。
    既然发生了这个deadlock,那么可能的情况是什么呢?我们讨论后认为有下面几个可能
    1。OWF并没有在每个任务结点后进行commit操作
    2。OWF因为存在Defer的控制机制,在“应该”commit的地方没有立刻执行,而是执行了后面的代码。
   
    我对OWF几乎没有什么概念,于是请教了Brave。他的理解是,OWF会在每个结点后执行commit,但是
可能因为负载的原因,把这个操作延后(上面的情况2)。我们可以通过配置参数的方式来控制defer
的行为,但是没有办法强制OWF在每个结点后进行commit。

    问题到了这里已经比较清楚了,解决的方案也很简单,在适当的地方(TAG_1)加一个commit就行了。但
是衍生出来的一个问题是,我们是否应该在每个功能模块的结尾都进行commit?目前RPS的通常做法是,
在PL/SQL中不进行提交,而依赖OWF来进行提交。


-------------------------------------------------------------------------
-- result in trace file for reference
/*
/apps/oracle/admin/data/udump/data_ora_5043.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /apps/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      ecwise03
Release:        2.6.9-22.ELsmp
Version:        #1 SMP Mon Sep 19 18:32:14 EDT 2005
Machine:        i686
Instance name: data
Redo thread mounted by this instance: 1
Oracle process number: 69
Unix process pid: 5043, image: oracledata@ecwise03

*** ACTION NAME:(Deadlock with AT) 2007-11-07 15:40:08.895
*** MODULE NAME:(Demonstrate) 2007-11-07 15:40:08.895
*** SERVICE NAME:(data) 2007-11-07 15:40:08.895
*** SESSION ID:(151.53110) 2007-11-07 15:40:08.895
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
UPDATE TTTTT SET Y = 5 WHERE X = 1
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0x3eadf55c         7  procedure T2_1_11_29_A.AP
0x3eb0b0b8        16  anonymous block
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-0008004a-00012451        69     151     X             69     151           X
session 151: DID 0001-0045-00001C86     session 151: DID 0001-0045-00001C86
Rows waited on:
Session 151: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAA
  (dictionary objn - 1364748, file - 7, block - 21011, slot - 0)
Information on the OTHER waiting sessions:
End of information on OTHER waiting sessions.
*/

posted on 2011-12-21 13:52  wait4friend  阅读(405)  评论(0编辑  收藏  举报