[20250204]21c library cache mutex的深入探究7(_mutex_wait_scheme=2).txt
[20250204]21c library cache mutex的深入探究7(_mutex_wait_scheme=2).txt
--//探究library cache mutex阻塞导致的相关等待事件,分析_mutex_wait_scheme=2的情况的相关细节。
1.环境:
SYS@book> @ver2
==============================
PORT_STRING : x86_64/Linux 2.4.xx
VERSION : 21.0.0.0.0
BANNER : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
BANNER_FULL : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0
BANNER_LEGACY : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
CON_ID : 0
PL/SQL procedure successfully completed.
SYS@book> @ hidez ^_mutex
NUM N_HEX NAME DESCRIPTION DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE ISSES ISSYS_MOD
---- ----- ------------------ ----------------- ------------- ------------- ------------ ----- ---------
3553 DE1 _mutex_wait_time Mutex wait time TRUE 1 1 FALSE IMMEDIATE
3554 DE2 _mutex_spin_count Mutex spin count TRUE 255 255 FALSE IMMEDIATE
3555 DE3 _mutex_wait_scheme Mutex wait scheme TRUE 2 2 FALSE IMMEDIATE
--//缺省_mutex_wait_time=1,时间单位与_mutex_wait_scheme相关,_mutex_wait_scheme=2时时间单位是厘秒,而
--//_mutex_wait_scheme=0,1时,单位时毫秒。
--//_mutex_wait_scheme =2时,_mutex_wait_time>1时sleeps的时间会出现指数回退.
--//缺省_mutex_wait_scheme =2.
--//网上找了一段资料:
* _mutex_spin_count (Integer)
- This sets the number of times to spin before yielding/waiting.
* _mutex_wait_scheme (Integer)
- In 11.2 this controls which wait scheme to use. It can be set to one of the three wait schemes described above thus:
_mutex_wait_scheme = 0 – Always YIELD
_mutex_wait_scheme = 1 & _mutex_wait_time = t – Always SLEEP for t milli-seconds
_mutex_wait_scheme = 2 & _mutex_wait_time = t – EXP BACKOFF with maximum sleep (default)
2.测试前准备:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug dump library_cache 4
Statement processed.
$ grep "^Bucket:" /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_10770.trc | head -4
Bucket: #=0 Mutex=0x6cfa1400(1125281431552, 9, 0, 6)
Bucket: #=5 Mutex=0x6cfa14f0(1125281431552, 4, 0, 6)
Bucket: #=17 Mutex=0x6cfa1730(1125281431552, 8, 0, 6)
Bucket: #=39 Mutex=0x6cfa1b50(1125281431552, 4, 0, 6)
--//以上是前面测试的结果,直接找Bucket: #=0 Mutex=0x6cfa1400的语句测试。
--//该sql语句select /*+ 9 */ count(*) from dept where deptno = 93834;的bucket=0.
--//Bucket: #=0 Mutex=0x6cfa1400
--//0x6cfa1400-0x10 = 0x6cfa13f0
3.测试:
--//session 2:
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 6CFA13F0 00000000 6CFA13F0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
--//前面2个8字节等于0x6cfa13f0,说明没有对象使用该library cache mutex。
--//session 1:
SCOTT@book01p> @ spid
==============================
SID : 281
SERIAL# : 36437
PROCESS : 3397
SERVER : DEDICATED
SPID : 3399
PID : 42
P_SERIAL# : 5
KILL_COMMAND : alter system kill session '281,36437' immediate;
PL/SQL procedure successfully completed.
SCOTT@book01p> alter session set session_cached_cursors=0 ;
Session altered.
--//主要目的避免光标缓存,保证每次执行library cache mutex gets增加。
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
COUNT(*)
----------
0
--//执行多次。
SCOTT@book01p> @ hash
HASH_VALUE SQL_ID CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX SQL_EXEC_START SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
3403546624 as3g00v5dw000 0 0 2236899148 cade0000 2025-02-04 14:49:53 16777216
--//session 2:
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000014 00000000 00000000 00000000 611B9E10 00000000
SYS@book> @ sharepool/shp4z as3g00v5dw000 -1
SYS@book> @ pr
==============================
HANDLE_TYPE : parent handle address
KGLHDADR : 00000000611AFD80
KGLHDPAR : 00000000611AFD80
C40 : select /*+ 9 */ count(*) from dept where
KGLHDLMD : 1
KGLHDPMD : 0
KGLHDIVC : 0
KGLOBHD0 : 00000000610A1760
KGLOBHD6 : 00
KGLOBHS0 : 4064
KGLOBHS6 : 0
KGLOBT16 : 0
N0_6_16 : 4064
N20 : 4064
KGLNAHSH : 3403546624
KGLOBT03 : as3g00v5dw000
KGLOBT09 : 65535
PL/SQL procedure successfully completed.
--//父游标句柄地址00000000611AFD80已经写入mutex地址-0x10处。
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000018 00000000 00000000 00000000 611B9E10 00000000
--//可以发现实际上执行shp4z.sql脚本2次,导致gets增加4次,有时候又没有不确定..
4.继续测试:
--//由于目前版本21c不支持oradebug poke内存信息,采用gdb操作。
--//session 3:
SYS@book> @ spid
==============================
SID : 263
SERIAL# : 53123
PROCESS : 3404
SERVER : DEDICATED
SPID : 3405
PID : 86
P_SERIAL# : 2
KILL_COMMAND : alter system kill session '263,53123' immediate;
PL/SQL procedure successfully completed.
--//window 1:
$ rlgdb -f -p 3405
...
(gdb) x /8wx 0x6cfa1400
0x6cfa1400: 0x00000000 0x00000000 0x00000018 0x00000000
0x6cfa1410: 0x00000000 0x00000000 0x611b9e10 0x00000000
--//当前gets=0x18.
(gdb) set *(int *)0x6cfa1404=0x7777
--//注意intel CPU的字节顺序,修改地址是0x6cfa1404。
(gdb) x /8wx 0x6cfa1400
0x6cfa1400: 0x00000000 0x00007777 0x00000018 0x00000000
0x6cfa1410: 0x00000000 0x00000000 0x611b9e10 0x00000000
--//windows 2,跟踪session 1进程:
$ strace -Ttt -y -f -p 3399 2>&1 | tee mutex0204.txt
--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
--//挂起,等3X秒。
--//window 1:
(gdb) set *(int *)0x6cfa1404=0x0
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000019 000009A2 00000000 00000000 611B9E10 00000000
--//对比前面的gets从0x18->0x19. 增加1次,sleeps次数从0x0->0x9a2 = 2466,增加2466次。
--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
COUNT(*)
----------
0
--//执行完成。
5.分析:
--//session 2:
SYS@book> @ ashtop event,p1,p1raw,p2raw,p3raw 1=1 &5min
Total Distinct Distinct Distinct
Seconds AAS %This EVENT P1 P1RAW P2RAW P3RAW FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
--------- ------- ------- ----------------------------- ----- ----------------- ----------------- ----------------- ------------------- ------------------- ---------- -------- -----------
67 .2 42% | 3 2025-02-04 14:58:00 2025-02-04 15:02:34 1 67 67
48 .2 30% | library cache: bucket mutex X 0 0000000000000000 0000777700000000 000000000000003E 2025-02-04 14:59:32 2025-02-04 15:00:19 1 48 48
13 .0 8% | 20 2025-02-04 14:57:47 2025-02-04 15:01:22 1 13 13
11 .0 7% | 100 2025-02-04 14:57:43 2025-02-04 15:02:11 1 11 11
5 .0 3% | LGWR all worker groups 0 0000000000000000 0000000000000000 0000000000000000 2025-02-04 14:59:37 2025-02-04 15:00:26 1 5 5
4 .0 3% | 300 2025-02-04 14:58:50 2025-02-04 14:59:15 1 4 4
3 .0 2% | 0 2025-02-04 14:59:39 2025-02-04 15:00:24 1 3 3
2 .0 1% | control file parallel write 2 0000000000000002 0000000000000003 0000000000000002 2025-02-04 15:02:30 2025-02-04 15:02:36 1 2 2
1 .0 1% | direct path read 1 0000000000000001 000000000000B880 0000000000000040 2025-02-04 15:02:32 2025-02-04 15:02:32 1 1 1
1 .0 1% | log file parallel write 1 0000000000000001 0000000000000001 0000000000000001 2025-02-04 14:59:09 2025-02-04 14:59:09 1 1 1
1 .0 1% | log file parallel write 1 0000000000000001 000000000000000F 0000000000000001 2025-02-04 14:59:37 2025-02-04 14:59:37 1 1 1
1 .0 1% | 63 2025-02-04 14:59:05 2025-02-04 14:59:05 1 1 1
1 .0 1% | 84 2025-02-04 15:02:23 2025-02-04 15:02:23 1 1 1
1 .0 1% | 291 2025-02-04 14:59:24 2025-02-04 14:59:24 1 1 1
1 .0 1% | 65536 2025-02-04 15:00:54 2025-02-04 15:00:54 1 1 1
15 rows selected.
--//P2RAW=0000777700000000.
SYS@book> @ ev_namezpr "library cache: bucket mutex X"
==============================
EVENT# : 379
EVENT_ID : 2601513493
NAME : library cache: bucket mutex X
PARAMETER1 : idn
PARAMETER2 : value
PARAMETER3 : where
WAIT_CLASS_ID : 3875070507
WAIT_CLASS# : 4
WAIT_CLASS : Concurrency
DISPLAY_NAME : library cache: bucket mutex X
CON_ID : 0
PL/SQL procedure successfully completed.
--//p1 hash=0等于bucket桶号,P2 对应mutex值,注意intel cpu的字节顺序问题,00007777在前面。p2 0x3e = 62
--//window 2:
--//按ctrl+c中断strace跟踪。
$ egrep "sched_yield|getrusage" mutex0204.txt
14:59:31.787265 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332819}, ru_stime={0, 27126}, ...}) = 0 <0.000063>
14:59:31.787491 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332902}, ru_stime={0, 27133}, ...}) = 0 <0.000015>
14:59:31.787549 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332911}, ru_stime={0, 27133}, ...}) = 0 <0.000013>
14:59:31.787671 sched_yield() = 0 <0.000227>
14:59:31.788037 sched_yield() = 0 <0.000062>
14:59:33.279533 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 333892}, ru_stime={0, 31204}, ...}) = 0 <0.000035>
14:59:35.291641 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 339971}, ru_stime={0, 32628}, ...}) = 0 <0.000179>
14:59:37.291983 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 344284}, ru_stime={0, 36240}, ...}) = 0 <0.000077>
14:59:39.285997 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 347436}, ru_stime={0, 38249}, ...}) = 0 <0.000076>
14:59:41.301019 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 350147}, ru_stime={0, 40320}, ...}) = 0 <0.000015>
14:59:43.312911 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 353924}, ru_stime={0, 45159}, ...}) = 0 <0.000015>
14:59:45.307157 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 360457}, ru_stime={0, 46785}, ...}) = 0 <0.000031>
14:59:47.310026 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 365011}, ru_stime={0, 49095}, ...}) = 0 <0.000033>
14:59:49.335822 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 367872}, ru_stime={0, 52553}, ...}) = 0 <0.000029>
14:59:51.329267 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 372229}, ru_stime={0, 53946}, ...}) = 0 <0.000028>
14:59:53.331265 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 374290}, ru_stime={0, 57499}, ...}) = 0 <0.000037>
14:59:55.336138 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 376766}, ru_stime={0, 59718}, ...}) = 0 <0.000081>
14:59:57.343453 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 378673}, ru_stime={0, 62024}, ...}) = 0 <0.000028>
14:59:59.340053 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 379916}, ru_stime={0, 63137}, ...}) = 0 <0.000028>
15:00:01.344914 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 384742}, ru_stime={0, 64305}, ...}) = 0 <0.000015>
15:00:03.361060 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 386351}, ru_stime={0, 68952}, ...}) = 0 <0.000044>
15:00:05.369793 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 388195}, ru_stime={0, 73472}, ...}) = 0 <0.000028>
15:00:07.365355 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 388906}, ru_stime={0, 79941}, ...}) = 0 <0.000062>
15:00:09.366864 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 391246}, ru_stime={0, 82596}, ...}) = 0 <0.000026>
15:00:11.368288 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 393473}, ru_stime={0, 86106}, ...}) = 0 <0.000251>
15:00:13.381993 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 394210}, ru_stime={0, 90971}, ...}) = 0 <0.000040>
15:00:15.390105 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 401516}, ru_stime={0, 92994}, ...}) = 0 <0.000028>
15:00:17.390237 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 403435}, ru_stime={0, 96736}, ...}) = 0 <0.000028>
15:00:19.390711 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 403794}, ru_stime={0, 99023}, ...}) = 0 <0.000232>
15:00:20.383125 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 404769}, ru_stime={0, 100092}, ...}) = 0 <0.000026>
15:00:20.384382 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 404975}, ru_stime={0, 100143}, ...}) = 0 <0.000016>
--//注意看时间间隔,开始sched_yield执行2次之后2秒调用1次getrusage。
$ awk '{print $2}' mutex0204.txt |egrep "semtimedop|getrusage|sched_yield|select" | uniq -c|head
3 getrusage(0x1
2 sched_yield()
83 semtimedop(98304,
1 getrusage(0x1
115 semtimedop(98304,
1 getrusage(0x1
123 semtimedop(98304,
1 getrusage(0x1
86 semtimedop(98304,
1 getrusage(0x1
--//可以看出规律,开始2次sched_yield,接着调用semtimedop许多次(每次0.01秒),2秒之后调用1次getrusage,如此重复。
$ grep "semtimedop" mutex0204.txt | head -4
14:59:31.788149 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.020113>
14:59:31.808359 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.009714>
14:59:31.818220 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.010634>
14:59:31.828961 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.017562>
$ grep semtimedop mutex0204.txt | awk '{print $8}'| uniq -c
2464 10000000})
--//调用semtimedop 2464次,加上2次sched_yield(),sleeps合计2466次,与opeek看到的结果0x000009A2=2466一致。
--//{0, 10000000} 前面参数的时间单位是秒,后面参数的时间单位是纳秒。10000000/10^9 = .01,相当于10毫秒,与后面的执行时间一
--//致。感觉虚拟机器有点不稳定,以前我在11g下测试基本都是18X次。
SYS@book> @ mutexprofz id,loc "ts>=trunc(sysdate)+14/24+59/1440+30/86400"
-- MutexProf by Tanel Poder (http://www.tanelpoder.com)
-- Showing profile of top 20 sleeps...
-- column info : id idn hash hash_value=>hash_value ts=>sleep_timestamp
-- req=>requesting_session blk=>blocking_session val=>mutex_value maddr=>mutex_addr
SUM_SLEEPS GETS_DIFF MUTEX_TYPE ID GET_LOCATION OBJECT_NAME
---------- --------- --------------- ---------- ------------------------- --------------------------------------------------------------------------------
2466 Library Cache 0 kglhdgn1 62 (name not found)
3 Cursor Parent 2276283790 kksLoadChild [KKSPRTLOC4] SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=TRUE) */
"OBJ#","CLASS_OID","JOB_FLAGS","RUN_TIME","PRIORITY","JOB_STATUS","RUNNING_INSTA
NCE","SCHEDULE_LIMIT","JOB_WEIGHT","INSTANCE_ID","AFFINITY","DATABASE_ROLE"
FROM "SYS"."SCHEDULER$_JOB_REFRESH" "SCHEDULER$_JOB_REFRESH" WHERE 1=1
1 Library Cache 2276283790 kglhdgn2 106 SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=TRUE) */
"OBJ#","CLASS_OID","JOB_FLAGS","RUN_TIME","PRIORITY","JOB_STATUS","RUNNING_INSTA
NCE","SCHEDULE_LIMIT","JOB_WEIGHT","INSTANCE_ID","AFFINITY","DATABASE_ROLE"
FROM "SYS"."SCHEDULER$_JOB_REFRESH" "SCHEDULER$_JOB_REFRESH" WHERE 1=1
--//记录的sleeps次数也是2466次。GET_LOCATION 对应的数字62 与前面ashtop 看到的p2 0x3e对应,注意id=0表示bucket桶号。
$ ./lookup.awk kglhdgn1
(kglhdgn)1 : kernel generic library cache management object handle get handle, if it doesn't exist, create it ??
4.小结:
--//缺省_mutex_wait_time=1,_mutex_wait_scheme=2的情况下semtimedop的时间是10毫秒。
--//_mutex_wait_scheme=2模式下开始sched_yield 2次,接着调用semtimedop许多次(每次0.01秒),2秒之后调用1次getrusage,如此重复。
--//这次测试_mutex_wait_time=1的情况,单位厘秒.调用semtimedop每次0.01秒,当大于1时sleeps的时间会出现指数回退,另外写一篇
--//验证这样的情况.另外有机会测试_mutex_wait_scheme=0,1的情况。
--//另外关于_mutex_spin_count缺省255,按照前面理解,在yielding/waiting之前应该存在255次spin,不知道理解是否正确,如何验
--//证不是很清楚.
* _mutex_spin_count (Integer)
- This sets the number of times to spin before yielding/waiting.
--//我做了如下尝试,从时间上看不出来。但是可以通过top -p spid发现CPU使用增加。
SYS@book> alter system set "_mutex_spin_count"=65535 scope=memory;
System altered.
--//探究library cache mutex阻塞导致的相关等待事件,分析_mutex_wait_scheme=2的情况的相关细节。
1.环境:
SYS@book> @ver2
==============================
PORT_STRING : x86_64/Linux 2.4.xx
VERSION : 21.0.0.0.0
BANNER : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
BANNER_FULL : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0
BANNER_LEGACY : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
CON_ID : 0
PL/SQL procedure successfully completed.
SYS@book> @ hidez ^_mutex
NUM N_HEX NAME DESCRIPTION DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE ISSES ISSYS_MOD
---- ----- ------------------ ----------------- ------------- ------------- ------------ ----- ---------
3553 DE1 _mutex_wait_time Mutex wait time TRUE 1 1 FALSE IMMEDIATE
3554 DE2 _mutex_spin_count Mutex spin count TRUE 255 255 FALSE IMMEDIATE
3555 DE3 _mutex_wait_scheme Mutex wait scheme TRUE 2 2 FALSE IMMEDIATE
--//缺省_mutex_wait_time=1,时间单位与_mutex_wait_scheme相关,_mutex_wait_scheme=2时时间单位是厘秒,而
--//_mutex_wait_scheme=0,1时,单位时毫秒。
--//_mutex_wait_scheme =2时,_mutex_wait_time>1时sleeps的时间会出现指数回退.
--//缺省_mutex_wait_scheme =2.
--//网上找了一段资料:
* _mutex_spin_count (Integer)
- This sets the number of times to spin before yielding/waiting.
* _mutex_wait_scheme (Integer)
- In 11.2 this controls which wait scheme to use. It can be set to one of the three wait schemes described above thus:
_mutex_wait_scheme = 0 – Always YIELD
_mutex_wait_scheme = 1 & _mutex_wait_time = t – Always SLEEP for t milli-seconds
_mutex_wait_scheme = 2 & _mutex_wait_time = t – EXP BACKOFF with maximum sleep (default)
2.测试前准备:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug dump library_cache 4
Statement processed.
$ grep "^Bucket:" /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_10770.trc | head -4
Bucket: #=0 Mutex=0x6cfa1400(1125281431552, 9, 0, 6)
Bucket: #=5 Mutex=0x6cfa14f0(1125281431552, 4, 0, 6)
Bucket: #=17 Mutex=0x6cfa1730(1125281431552, 8, 0, 6)
Bucket: #=39 Mutex=0x6cfa1b50(1125281431552, 4, 0, 6)
--//以上是前面测试的结果,直接找Bucket: #=0 Mutex=0x6cfa1400的语句测试。
--//该sql语句select /*+ 9 */ count(*) from dept where deptno = 93834;的bucket=0.
--//Bucket: #=0 Mutex=0x6cfa1400
--//0x6cfa1400-0x10 = 0x6cfa13f0
3.测试:
--//session 2:
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 6CFA13F0 00000000 6CFA13F0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
--//前面2个8字节等于0x6cfa13f0,说明没有对象使用该library cache mutex。
--//session 1:
SCOTT@book01p> @ spid
==============================
SID : 281
SERIAL# : 36437
PROCESS : 3397
SERVER : DEDICATED
SPID : 3399
PID : 42
P_SERIAL# : 5
KILL_COMMAND : alter system kill session '281,36437' immediate;
PL/SQL procedure successfully completed.
SCOTT@book01p> alter session set session_cached_cursors=0 ;
Session altered.
--//主要目的避免光标缓存,保证每次执行library cache mutex gets增加。
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
COUNT(*)
----------
0
--//执行多次。
SCOTT@book01p> @ hash
HASH_VALUE SQL_ID CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX SQL_EXEC_START SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
3403546624 as3g00v5dw000 0 0 2236899148 cade0000 2025-02-04 14:49:53 16777216
--//session 2:
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000014 00000000 00000000 00000000 611B9E10 00000000
SYS@book> @ sharepool/shp4z as3g00v5dw000 -1
SYS@book> @ pr
==============================
HANDLE_TYPE : parent handle address
KGLHDADR : 00000000611AFD80
KGLHDPAR : 00000000611AFD80
C40 : select /*+ 9 */ count(*) from dept where
KGLHDLMD : 1
KGLHDPMD : 0
KGLHDIVC : 0
KGLOBHD0 : 00000000610A1760
KGLOBHD6 : 00
KGLOBHS0 : 4064
KGLOBHS6 : 0
KGLOBT16 : 0
N0_6_16 : 4064
N20 : 4064
KGLNAHSH : 3403546624
KGLOBT03 : as3g00v5dw000
KGLOBT09 : 65535
PL/SQL procedure successfully completed.
--//父游标句柄地址00000000611AFD80已经写入mutex地址-0x10处。
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000018 00000000 00000000 00000000 611B9E10 00000000
--//可以发现实际上执行shp4z.sql脚本2次,导致gets增加4次,有时候又没有不确定..
4.继续测试:
--//由于目前版本21c不支持oradebug poke内存信息,采用gdb操作。
--//session 3:
SYS@book> @ spid
==============================
SID : 263
SERIAL# : 53123
PROCESS : 3404
SERVER : DEDICATED
SPID : 3405
PID : 86
P_SERIAL# : 2
KILL_COMMAND : alter system kill session '263,53123' immediate;
PL/SQL procedure successfully completed.
--//window 1:
$ rlgdb -f -p 3405
...
(gdb) x /8wx 0x6cfa1400
0x6cfa1400: 0x00000000 0x00000000 0x00000018 0x00000000
0x6cfa1410: 0x00000000 0x00000000 0x611b9e10 0x00000000
--//当前gets=0x18.
(gdb) set *(int *)0x6cfa1404=0x7777
--//注意intel CPU的字节顺序,修改地址是0x6cfa1404。
(gdb) x /8wx 0x6cfa1400
0x6cfa1400: 0x00000000 0x00007777 0x00000018 0x00000000
0x6cfa1410: 0x00000000 0x00000000 0x611b9e10 0x00000000
--//windows 2,跟踪session 1进程:
$ strace -Ttt -y -f -p 3399 2>&1 | tee mutex0204.txt
--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
--//挂起,等3X秒。
--//window 1:
(gdb) set *(int *)0x6cfa1404=0x0
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 611AFD80 00000000 611AFD80 00000000 00000000 00000000 00000019 000009A2 00000000 00000000 611B9E10 00000000
--//对比前面的gets从0x18->0x19. 增加1次,sleeps次数从0x0->0x9a2 = 2466,增加2466次。
--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
COUNT(*)
----------
0
--//执行完成。
5.分析:
--//session 2:
SYS@book> @ ashtop event,p1,p1raw,p2raw,p3raw 1=1 &5min
Total Distinct Distinct Distinct
Seconds AAS %This EVENT P1 P1RAW P2RAW P3RAW FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
--------- ------- ------- ----------------------------- ----- ----------------- ----------------- ----------------- ------------------- ------------------- ---------- -------- -----------
67 .2 42% | 3 2025-02-04 14:58:00 2025-02-04 15:02:34 1 67 67
48 .2 30% | library cache: bucket mutex X 0 0000000000000000 0000777700000000 000000000000003E 2025-02-04 14:59:32 2025-02-04 15:00:19 1 48 48
13 .0 8% | 20 2025-02-04 14:57:47 2025-02-04 15:01:22 1 13 13
11 .0 7% | 100 2025-02-04 14:57:43 2025-02-04 15:02:11 1 11 11
5 .0 3% | LGWR all worker groups 0 0000000000000000 0000000000000000 0000000000000000 2025-02-04 14:59:37 2025-02-04 15:00:26 1 5 5
4 .0 3% | 300 2025-02-04 14:58:50 2025-02-04 14:59:15 1 4 4
3 .0 2% | 0 2025-02-04 14:59:39 2025-02-04 15:00:24 1 3 3
2 .0 1% | control file parallel write 2 0000000000000002 0000000000000003 0000000000000002 2025-02-04 15:02:30 2025-02-04 15:02:36 1 2 2
1 .0 1% | direct path read 1 0000000000000001 000000000000B880 0000000000000040 2025-02-04 15:02:32 2025-02-04 15:02:32 1 1 1
1 .0 1% | log file parallel write 1 0000000000000001 0000000000000001 0000000000000001 2025-02-04 14:59:09 2025-02-04 14:59:09 1 1 1
1 .0 1% | log file parallel write 1 0000000000000001 000000000000000F 0000000000000001 2025-02-04 14:59:37 2025-02-04 14:59:37 1 1 1
1 .0 1% | 63 2025-02-04 14:59:05 2025-02-04 14:59:05 1 1 1
1 .0 1% | 84 2025-02-04 15:02:23 2025-02-04 15:02:23 1 1 1
1 .0 1% | 291 2025-02-04 14:59:24 2025-02-04 14:59:24 1 1 1
1 .0 1% | 65536 2025-02-04 15:00:54 2025-02-04 15:00:54 1 1 1
15 rows selected.
--//P2RAW=0000777700000000.
SYS@book> @ ev_namezpr "library cache: bucket mutex X"
==============================
EVENT# : 379
EVENT_ID : 2601513493
NAME : library cache: bucket mutex X
PARAMETER1 : idn
PARAMETER2 : value
PARAMETER3 : where
WAIT_CLASS_ID : 3875070507
WAIT_CLASS# : 4
WAIT_CLASS : Concurrency
DISPLAY_NAME : library cache: bucket mutex X
CON_ID : 0
PL/SQL procedure successfully completed.
--//p1 hash=0等于bucket桶号,P2 对应mutex值,注意intel cpu的字节顺序问题,00007777在前面。p2 0x3e = 62
--//window 2:
--//按ctrl+c中断strace跟踪。
$ egrep "sched_yield|getrusage" mutex0204.txt
14:59:31.787265 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332819}, ru_stime={0, 27126}, ...}) = 0 <0.000063>
14:59:31.787491 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332902}, ru_stime={0, 27133}, ...}) = 0 <0.000015>
14:59:31.787549 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 332911}, ru_stime={0, 27133}, ...}) = 0 <0.000013>
14:59:31.787671 sched_yield() = 0 <0.000227>
14:59:31.788037 sched_yield() = 0 <0.000062>
14:59:33.279533 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 333892}, ru_stime={0, 31204}, ...}) = 0 <0.000035>
14:59:35.291641 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 339971}, ru_stime={0, 32628}, ...}) = 0 <0.000179>
14:59:37.291983 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 344284}, ru_stime={0, 36240}, ...}) = 0 <0.000077>
14:59:39.285997 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 347436}, ru_stime={0, 38249}, ...}) = 0 <0.000076>
14:59:41.301019 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 350147}, ru_stime={0, 40320}, ...}) = 0 <0.000015>
14:59:43.312911 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 353924}, ru_stime={0, 45159}, ...}) = 0 <0.000015>
14:59:45.307157 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 360457}, ru_stime={0, 46785}, ...}) = 0 <0.000031>
14:59:47.310026 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 365011}, ru_stime={0, 49095}, ...}) = 0 <0.000033>
14:59:49.335822 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 367872}, ru_stime={0, 52553}, ...}) = 0 <0.000029>
14:59:51.329267 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 372229}, ru_stime={0, 53946}, ...}) = 0 <0.000028>
14:59:53.331265 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 374290}, ru_stime={0, 57499}, ...}) = 0 <0.000037>
14:59:55.336138 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 376766}, ru_stime={0, 59718}, ...}) = 0 <0.000081>
14:59:57.343453 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 378673}, ru_stime={0, 62024}, ...}) = 0 <0.000028>
14:59:59.340053 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 379916}, ru_stime={0, 63137}, ...}) = 0 <0.000028>
15:00:01.344914 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 384742}, ru_stime={0, 64305}, ...}) = 0 <0.000015>
15:00:03.361060 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 386351}, ru_stime={0, 68952}, ...}) = 0 <0.000044>
15:00:05.369793 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 388195}, ru_stime={0, 73472}, ...}) = 0 <0.000028>
15:00:07.365355 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 388906}, ru_stime={0, 79941}, ...}) = 0 <0.000062>
15:00:09.366864 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 391246}, ru_stime={0, 82596}, ...}) = 0 <0.000026>
15:00:11.368288 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 393473}, ru_stime={0, 86106}, ...}) = 0 <0.000251>
15:00:13.381993 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 394210}, ru_stime={0, 90971}, ...}) = 0 <0.000040>
15:00:15.390105 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 401516}, ru_stime={0, 92994}, ...}) = 0 <0.000028>
15:00:17.390237 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 403435}, ru_stime={0, 96736}, ...}) = 0 <0.000028>
15:00:19.390711 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 403794}, ru_stime={0, 99023}, ...}) = 0 <0.000232>
15:00:20.383125 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 404769}, ru_stime={0, 100092}, ...}) = 0 <0.000026>
15:00:20.384382 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 404975}, ru_stime={0, 100143}, ...}) = 0 <0.000016>
--//注意看时间间隔,开始sched_yield执行2次之后2秒调用1次getrusage。
$ awk '{print $2}' mutex0204.txt |egrep "semtimedop|getrusage|sched_yield|select" | uniq -c|head
3 getrusage(0x1
2 sched_yield()
83 semtimedop(98304,
1 getrusage(0x1
115 semtimedop(98304,
1 getrusage(0x1
123 semtimedop(98304,
1 getrusage(0x1
86 semtimedop(98304,
1 getrusage(0x1
--//可以看出规律,开始2次sched_yield,接着调用semtimedop许多次(每次0.01秒),2秒之后调用1次getrusage,如此重复。
$ grep "semtimedop" mutex0204.txt | head -4
14:59:31.788149 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.020113>
14:59:31.808359 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.009714>
14:59:31.818220 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.010634>
14:59:31.828961 semtimedop(98304, {{46, -1, 0}}, 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.017562>
$ grep semtimedop mutex0204.txt | awk '{print $8}'| uniq -c
2464 10000000})
--//调用semtimedop 2464次,加上2次sched_yield(),sleeps合计2466次,与opeek看到的结果0x000009A2=2466一致。
--//{0, 10000000} 前面参数的时间单位是秒,后面参数的时间单位是纳秒。10000000/10^9 = .01,相当于10毫秒,与后面的执行时间一
--//致。感觉虚拟机器有点不稳定,以前我在11g下测试基本都是18X次。
SYS@book> @ mutexprofz id,loc "ts>=trunc(sysdate)+14/24+59/1440+30/86400"
-- MutexProf by Tanel Poder (http://www.tanelpoder.com)
-- Showing profile of top 20 sleeps...
-- column info : id idn hash hash_value=>hash_value ts=>sleep_timestamp
-- req=>requesting_session blk=>blocking_session val=>mutex_value maddr=>mutex_addr
SUM_SLEEPS GETS_DIFF MUTEX_TYPE ID GET_LOCATION OBJECT_NAME
---------- --------- --------------- ---------- ------------------------- --------------------------------------------------------------------------------
2466 Library Cache 0 kglhdgn1 62 (name not found)
3 Cursor Parent 2276283790 kksLoadChild [KKSPRTLOC4] SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=TRUE) */
"OBJ#","CLASS_OID","JOB_FLAGS","RUN_TIME","PRIORITY","JOB_STATUS","RUNNING_INSTA
NCE","SCHEDULE_LIMIT","JOB_WEIGHT","INSTANCE_ID","AFFINITY","DATABASE_ROLE"
FROM "SYS"."SCHEDULER$_JOB_REFRESH" "SCHEDULER$_JOB_REFRESH" WHERE 1=1
1 Library Cache 2276283790 kglhdgn2 106 SELECT /*+ NO_STATEMENT_QUEUING RESULT_CACHE (SYSOBJ=TRUE) */
"OBJ#","CLASS_OID","JOB_FLAGS","RUN_TIME","PRIORITY","JOB_STATUS","RUNNING_INSTA
NCE","SCHEDULE_LIMIT","JOB_WEIGHT","INSTANCE_ID","AFFINITY","DATABASE_ROLE"
FROM "SYS"."SCHEDULER$_JOB_REFRESH" "SCHEDULER$_JOB_REFRESH" WHERE 1=1
--//记录的sleeps次数也是2466次。GET_LOCATION 对应的数字62 与前面ashtop 看到的p2 0x3e对应,注意id=0表示bucket桶号。
$ ./lookup.awk kglhdgn1
(kglhdgn)1 : kernel generic library cache management object handle get handle, if it doesn't exist, create it ??
4.小结:
--//缺省_mutex_wait_time=1,_mutex_wait_scheme=2的情况下semtimedop的时间是10毫秒。
--//_mutex_wait_scheme=2模式下开始sched_yield 2次,接着调用semtimedop许多次(每次0.01秒),2秒之后调用1次getrusage,如此重复。
--//这次测试_mutex_wait_time=1的情况,单位厘秒.调用semtimedop每次0.01秒,当大于1时sleeps的时间会出现指数回退,另外写一篇
--//验证这样的情况.另外有机会测试_mutex_wait_scheme=0,1的情况。
--//另外关于_mutex_spin_count缺省255,按照前面理解,在yielding/waiting之前应该存在255次spin,不知道理解是否正确,如何验
--//证不是很清楚.
* _mutex_spin_count (Integer)
- This sets the number of times to spin before yielding/waiting.
--//我做了如下尝试,从时间上看不出来。但是可以通过top -p spid发现CPU使用增加。
SYS@book> alter system set "_mutex_spin_count"=65535 scope=memory;
System altered.
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】博客园社区专享云产品让利特惠,阿里云新客6.5折上折
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比
· Plotly.NET 一个为 .NET 打造的强大开源交互式图表库
2024-02-19 [20240219]建立完善sql_idx.sh脚本.txt
2019-02-19 [20190219]那个更快(11g).txt