项目中的一个dead lock分析

发现存在deadlock，并且发现一个有趣的现象，在trace文件中有如下片断（blocker和waiter具有相同的session id）
Deadlock graph:
                       ---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-00050006-00008b95        24     105     X             24     105           X
session 105: DID 0001-0018-00076390    session 105: DID 0001-0018-00076390
Rows waited on:
Session 105: obj - rowid = 0005E3B8 - AABeO4AAFAAAN2oAAp
(dictionary objn - 385976, file - 5, block - 56744, slot - 41)

    从这个地方发现，这个deadlock是发生在一个session中的，和自己的理解有差异。以前一直认为，
deadlock应该存在不同的session中，这样才会有两个事务产生依赖。如果在一个session中，不会
同时存在两个不同的事务。一般情况下，trace文件应该是下面这样的（blocker和waiter是不同的）
Deadlock graph:
                       ---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-00050025-000125d5        64     101     X             72     151           X
TX-0009005e-000124aa        72     151     X             64     101           X
session 101: DID 0001-0040-0000092F     session 151: DID 0001-0048-000005B9
session 151: DID 0001-0048-000005B9     session 101: DID 0001-0040-0000092F
Rows waited on:
Session 151: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAB
(dictionary objn - 1364748, file - 7, block - 21011, slot - 1)
Session 101: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAA
(dictionary objn - 1364748, file - 7, block - 21011, slot - 0)

    经过分析和测试，发现原来是自治事务产生了另一个事务。于是写了一段代码来模拟
------------------------------------------------------------------------------
drop table ttttt;
create table ttttt(x int, y int);
truncate table ttttt;
insert into ttttt select rownum, rownum from dual connect by rownum <=5;

-- create an autonomous transaction procedure
create or replace procedure ap(p int) is
pragma autonomous_transaction;
begin
update ttttt set y = 3 where x = 2;
update ttttt set y = 5 where x = 1;

commit;
end;
/

-----------------------------------------------------------------------
-- demonstrate a deadlock causeb by autonomous transaction
-----------------------------------------------------------------------
begin
dbms_application_info.set_module('Demonstrate','Deadlock with AT');

-- begin the parent transaction
update ttttt set y = 4 where x = 1;

-- TAG_1

-- here call the AP, and a child transacion will be spawned
declare
    p_sid integer;
begin
    ap(p_sid);
end;

-- update row 2
-- NOTICE: since parent trnasaction waits for lock held by AP on row 2,
-- and AP waits for lock held by parent transaction on row 1
-- TAG_2
update ttttt set y = 4 where x = 2;

rollback;
--commit;
end;
/

    结果和我预料的一样，执行到TAG_2的地方，产生了deadlock，并且trace文件也显示这个
session把自己block了。
Deadlock graph:
                       ---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-0008004a-00012451        69     151     X             69     151           X

    在继续深入的分析一下，在我们的实际环境中使用了Oracle Workflow，自治过程作为一个独立的
任务结点，“应该”在前一个任务commit之后在进行调用。这个“应该”表示在测试代码的TAG_1部分，
会有一个OWF控制的commit操作。通过了解Oracle的事务处理机制，我们知道当这个操作存在的时候，
测试代码是不会发生自己block自己的deadlock。
    既然发生了这个deadlock，那么可能的情况是什么呢？我们讨论后认为有下面几个可能
    1。OWF并没有在每个任务结点后进行commit操作
    2。OWF因为存在Defer的控制机制，在“应该”commit的地方没有立刻执行，而是执行了后面的代码。

    我对OWF几乎没有什么概念，于是请教了Brave。他的理解是，OWF会在每个结点后执行commit，但是
可能因为负载的原因，把这个操作延后（上面的情况2）。我们可以通过配置参数的方式来控制defer
的行为，但是没有办法强制OWF在每个结点后进行commit。

    问题到了这里已经比较清楚了，解决的方案也很简单，在适当的地方(TAG_1)加一个commit就行了。但
是衍生出来的一个问题是，我们是否应该在每个功能模块的结尾都进行commit？目前RPS的通常做法是，
在PL/SQL中不进行提交，而依赖OWF来进行提交。

-------------------------------------------------------------------------
-- result in trace file for reference
/*
/apps/oracle/admin/data/udump/data_ora_5043.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /apps/oracle/product/10.2.0/db_1
System name:    Linux
Node name:      ecwise03
Release:        2.6.9-22.ELsmp
Version:        #1 SMP Mon Sep 19 18:32:14 EDT 2005
Machine:        i686
Instance name: data
Redo thread mounted by this instance: 1
Oracle process number: 69
Unix process pid: 5043, image: oracledata@ecwise03

*** ACTION NAME:(Deadlock with AT) 2007-11-07 15:40:08.895
*** MODULE NAME:(Demonstrate) 2007-11-07 15:40:08.895
*** SERVICE NAME:(data) 2007-11-07 15:40:08.895
*** SESSION ID:(151.53110) 2007-11-07 15:40:08.895
DEADLOCK DETECTED
[Transaction Deadlock]
Current SQL statement for this session:
UPDATE TTTTT SET Y = 5 WHERE X = 1
----- PL/SQL Call Stack -----
object      line object
handle    number name
0x3eadf55c         7 procedure T2_1_11_29_A.AP
0x3eb0b0b8        16 anonymous block
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name          process session holds waits process session holds waits
TX-0008004a-00012451        69     151     X             69     151           X
session 151: DID 0001-0045-00001C86     session 151: DID 0001-0045-00001C86
Rows waited on:
Session 151: obj - rowid = 0014D30C - AAFNN5AAHAAAFITAAA
(dictionary objn - 1364748, file - 7, block - 21011, slot - 0)
Information on the OTHER waiting sessions:
End of information on OTHER waiting sessions.
*/

posted on 2011-12-21 13:52 wait4friend 阅读(405) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

wait4friend

项目中的一个dead lock分析

导航

公告