[20250205]21c library cache mutex的深入探究9(_mutex_wait_scheme=0).txt

[20250205]21c library cache mutex的深入探究9(_mutex_wait_scheme=0).txt

--//探究library cache mutex阻塞导致的相关等待事件,分析_mutex_wait_scheme=0的情况的相关细节。
--//补充说明:注意测试前关闭resmgr cpu quantum,不然getrusage时间间隔不是2秒。

1.环境:
SCOTT@book01p> @ ver2
==============================
PORT_STRING                   : x86_64/Linux 2.4.xx
VERSION                       : 21.0.0.0.0
BANNER                        : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
BANNER_FULL                   : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0
BANNER_LEGACY                 : Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
CON_ID                        : 0
PL/SQL procedure successfully completed.

SYS@book> @ hidez ^_mutex
 NUM N_HEX NAME               DESCRIPTION       DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE ISSES ISSYS_MOD
---- ----- ------------------ ----------------- ------------- ------------- ------------ ----- ---------
3553   DE1 _mutex_wait_time   Mutex wait time   TRUE          1             1            FALSE IMMEDIATE
3554   DE2 _mutex_spin_count  Mutex spin count  TRUE          255           255          FALSE IMMEDIATE
3555   DE3 _mutex_wait_scheme Mutex wait scheme TRUE          2             2            FALSE IMMEDIATE
--//缺省_mutex_wait_time=1,时间单位与_mutex_wait_scheme相关,_mutex_wait_scheme=2时时间单位是厘秒,而
--//_mutex_wait_scheme=0,1时,单位时毫秒。
--//_mutex_wait_scheme =2时,_mutex_wait_time>1时sleeps的时间会出现指数回退.

--//缺省_mutex_wait_scheme =2.

--//网上找了一段资料:
* _mutex_spin_count (Integer)
- This sets the number of times to spin before yielding/waiting.

* _mutex_wait_scheme (Integer)
- In 11.2 this controls which wait scheme to use. It can be set to one of the three wait schemes described above thus:
_mutex_wait_scheme = 0                        – Always YIELD
_mutex_wait_scheme = 1 & _mutex_wait_time = t – Always SLEEP for t milli-seconds
_mutex_wait_scheme = 2 & _mutex_wait_time = t – EXP BACKOFF with maximum sleep (default)

2.测试前准备:
SYS@book> oradebug setmypid
Statement processed.
SYS@book> oradebug dump library_cache 4
Statement processed.

$ grep "^Bucket:" /u01/app/oracle/diag/rdbms/book/book/trace/book_ora_10770.trc | head -4
Bucket: #=0 Mutex=0x6cfa1400(1125281431552, 9, 0, 6)
Bucket: #=5 Mutex=0x6cfa14f0(1125281431552, 4, 0, 6)
Bucket: #=17 Mutex=0x6cfa1730(1125281431552, 8, 0, 6)
Bucket: #=39 Mutex=0x6cfa1b50(1125281431552, 4, 0, 6)
--//以上是前面测试的结果,直接找Bucket: #=0 Mutex=0x6cfa1400的语句测试。
--//该sql语句select /*+ 9 */ count(*) from dept where deptno = 93834;的bucket=0.
--//Bucket: #=0 Mutex=0x6cfa1400
--//0x6cfa1400-0x10 = 0x6cfa13f0

--//昨天测试了_mutex_wait_scheme=2的情况,当_mutex_wait_time>1时才会出现EXP BACKOFF with maximum sleep。
--//今天测试_mutex_wait_scheme=0的情况。

--//session 2:
SYS@book> alter system set "_mutex_wait_scheme"=0 scope=memory;
System altered.

2.测试:
--//session 1:
SCOTT@book01p> @ spid
==============================
SID                           : 148
SERIAL#                       : 21638
PROCESS                       : 3538
SERVER                        : DEDICATED
SPID                          : 3540
PID                           : 49
P_SERIAL#                     : 4
KILL_COMMAND                  : alter system kill session '148,21638' immediate;
PL/SQL procedure successfully completed.

SCOTT@book01p> alter session  set session_cached_cursors=0 ;
Session altered.
--//主要目的避免光标缓存,保证每次执行library cache mutex gets增加。

SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
  COUNT(*)
----------
         0
--//执行多次。

SCOTT@book01p> @ hash
HASH_VALUE SQL_ID        CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX   SQL_EXEC_START      SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
3403546624 as3g00v5dw000            0          0      2236899148  cade0000  2025-02-05 08:40:21    16777218

--//session 2:
SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 68761790 00000000 68761790 00000000 00000000 00000000 0000000F 00000000 00000000 00000000 632EEA58 00000000
--//gets=0x0f

SYS@book> @ sharepool/shp4z as3g00v5dw000 -1
SYS@book> @ pr
==============================
HANDLE_TYPE                   : parent handle address
KGLHDADR                      : 0000000068761790
KGLHDPAR                      : 0000000068761790
C40                           : select /*+ 9 */ count(*) from dept where
KGLHDLMD                      : 0
KGLHDPMD                      : 0
KGLHDIVC                      : 0
KGLOBHD0                      : 00000000620C6AF0
KGLOBHD6                      : 00
KGLOBHS0                      : 4064
KGLOBHS6                      : 0
KGLOBT16                      : 0
N0_6_16                       : 4064
N20                           : 4064
KGLNAHSH                      : 3403546624
KGLOBT03                      : as3g00v5dw000
KGLOBT09                      : 65535
PL/SQL procedure successfully completed.
--//父游标句柄地址0000000068761790已经写入mutex地址-0x10处。

4.继续测试:
--//由于目前版本21c不支持oradebug poke内存信息,采用gdb操作。
--//session 3:
SYS@book> @ spid
==============================
SID                           : 276
SERIAL#                       : 24759
PROCESS                       : 3545
SERVER                        : DEDICATED
SPID                          : 3546
PID                           : 58
P_SERIAL#                     : 3
KILL_COMMAND                  : alter system kill session '276,24759' immediate;
PL/SQL procedure successfully completed.

--//window 1:
$ strace -Ttt -y -f -p 3540 2>&1 | tee  mutex0205a.txt

--//window 2:
$rlgdb -f -p 3546
...

(gdb) x /8wx 0x6cfa1400
0x6cfa1400:     0x00000000      0x00000000      0x0000000f      0x00000000
0x6cfa1410:     0x00000000      0x00000000      0x632eea58      0x00000000

(gdb) set *(int *)0x6cfa1404=0x8888

(gdb) x /8wx 0x6cfa1400
0x6cfa1400:     0x00000000      0x00008888      0x00000010      0x00000000
0x6cfa1410:     0x00000000      0x00000000      0x632eea58      0x00000000
--//gets数量0x10,什么变了。

--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
--//挂起!!等32秒以上。

--//window 1:
(gdb) set *(int *)0x6cfa1404=0x0
(gdb) x /8wx 0x6cfa1400
0x6cfa1400:     0x00000000      0x00000000      0x00000013      0x0000ba64
0x6cfa1410:     0x00000000      0x00000000      0x632eea58      0x00000000
--//gets增加到0x13,实际上按照前面测试应该仅仅增加1次。sleeps 增加0xba64.
--//我估计其他因素影响导致gets增加。

--//session 1:
SCOTT@book01p> select /*+ 9 */ count(*) from dept where deptno = 93834;
  COUNT(*)
----------
         0
--//执行完成。

5.分析:
--//session 2:
SYS@book> @ ashtop event,sid,p1,p1raw,p2raw,p3raw 1=1 "'2025-02-05 08:51:14'" "'2025-02-05 08:52:22'"
    Total                                                                                                                                                               Distinct Distinct    Distinct
  Seconds     AAS %This   EVENT                          SID         P1 P1RAW             P2RAW             P3RAW             FIRST_SEEN          LAST_SEEN           Execs Seen  Tstamps Execs Seen1
--------- ------- ------- ----------------------------- ---- ---------- ----------------- ----------------- ----------------- ------------------- ------------------- ---------- -------- -----------
       66     1.0   83% | library cache: bucket mutex X  148          0 0000000000000000  0000888800000000  000000000000003E  2025-02-05 08:51:15 2025-02-05 08:52:20          1       66          66
        7      .1    9% |                                133        100                                                       2025-02-05 08:51:18 2025-02-05 08:52:21          1        7           7
        3      .0    4% |                                  2          3                                                       2025-02-05 08:51:19 2025-02-05 08:51:21          1        3           3
        1      .0    1% |                                  1        100                                                       2025-02-05 08:52:06 2025-02-05 08:52:06          1        1           1
        1      .0    1% |                                 16         20                                                       2025-02-05 08:52:06 2025-02-05 08:52:06          1        1           1
        1      .0    1% |                                259        100                                                       2025-02-05 08:51:56 2025-02-05 08:51:56          1        1           1
        1      .0    1% |                                388        100                                                       2025-02-05 08:51:18 2025-02-05 08:51:18          1        1           1
7 rows selected.

--//window 2:
--//按ctrl+c中断strace跟踪。
$ egrep "getrusage" mutex0205a.txt | head -20
08:51:14.947298 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 411961}, ru_stime={0, 28243}, ...}) = 0 <0.000017>
08:51:14.947431 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 412021}, ru_stime={0, 28243}, ...}) = 0 <0.000021>
08:51:16.608737 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 523427}, ru_stime={0, 119054}, ...}) = 0 <0.000006>
08:51:18.627918 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 645856}, ru_stime={0, 224028}, ...}) = 0 <0.000007>
08:51:20.760185 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 799263}, ru_stime={0, 334037}, ...}) = 0 <0.000014>
08:51:22.631676 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 941764}, ru_stime={0, 455657}, ...}) = 0 <0.000007>
08:51:24.632237 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 101602}, ru_stime={0, 585192}, ...}) = 0 <0.000006>
08:51:26.632877 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 247521}, ru_stime={0, 736923}, ...}) = 0 <0.000005>
08:51:28.633729 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 375019}, ru_stime={0, 859112}, ...}) = 0 <0.000006>
08:51:30.635505 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 499714}, ru_stime={0, 977453}, ...}) = 0 <0.000007>
08:51:32.636968 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 622990}, ru_stime={1, 90826}, ...}) = 0 <0.000008>
08:51:34.639436 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 799361}, ru_stime={1, 200304}, ...}) = 0 <0.000007>
08:51:36.638991 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={1, 970501}, ru_stime={1, 306695}, ...}) = 0 <0.000020>
08:51:38.640378 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 100057}, ru_stime={1, 414119}, ...}) = 0 <0.000007>
08:51:40.640799 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 254635}, ru_stime={1, 542760}, ...}) = 0 <0.000005>
08:51:42.641496 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 399291}, ru_stime={1, 675920}, ...}) = 0 <0.000006>
08:51:44.641819 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 522889}, ru_stime={1, 846384}, ...}) = 0 <0.000007>
08:51:46.658962 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 653332}, ru_stime={1, 993208}, ...}) = 0 <0.000146>
08:51:48.644690 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 734808}, ru_stime={2, 62349}, ...}) = 0 <0.000008>
08:51:50.646492 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={2, 866758}, ru_stime={2, 173378}, ...}) = 0 <0.000007>
--//调用getrusage是2秒间隔.

$ awk '{print $2}' mutex0205a.txt |egrep "semtimedop|getrusage|sched_yield|select" | uniq -c
      2 getrusage(0x1
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     75 sched_yield()
      1 getrusage(0x1
     24 sched_yield()
...
      1 select(0,
     38 sched_yield()
      1 getrusage(0x1
     61 sched_yield()
      1 select(0,
     99 sched_yield()
...
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     28 sched_yield()
      6 getrusage(0x1
--//75+24 = 99, 38+ 61 = 99
--//99次sched_yield,1次select。遇到getrusage分开,相加次数还是99次,如此反复循环。

$ grep select mutex0205a.txt | head
08:51:14.953021 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001553>
08:51:14.960297 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.002133>
08:51:14.967727 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.008293>
08:51:14.981274 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.005533>
08:51:14.992501 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001128>
08:51:14.999121 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001251>
08:51:15.005823 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001398>
08:51:15.012689 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.023802>
08:51:15.042366 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.013728>
08:51:15.061710 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.005186>
--//休眠的时间单位微秒(1/10^6秒),1000微秒= 1000/10^6  = .001秒。
--//sched_yield()方式非常耗CPU。select休眠的时间很短1毫秒。

SYS@book> @ mutexprofz idn,loc "ts>=trunc(sysdate)+08/24+51/1440+10/86400"
-- MutexProf by Tanel Poder (http://www.tanelpoder.com)
-- Showing profile of top 20 sleeps...
-- column info : id idn hash hash_value=>hash_value ts=>sleep_timestamp
--               req=>requesting_session blk=>blocking_session val=>mutex_value maddr=>mutex_addr

SUM_SLEEPS      GETS_DIFF MUTEX_TYPE             IDN GET_LOCATION                      OBJECT_NAME
---------- -------------- --------------- ---------- --------------------------------- -----------------
     47716                Library Cache            0 kglhdgn1  62                      (name not found)
--//0x0000ba64= 47716
--//记录的sleeps次数也是47716次。GET_LOCATION 对应的数字62 与前面ashtop 看到的p2 0x3e对应,注意id=0表示bucket桶号。

$ grep "select" mutex0205a.txt |wc
  11618  127798  859732

$ grep "sched_yield" mutex0205a.txt |wc
1150210 5751050 63261550

--//1150210+11618 = 1161828,远远大于47716,实际上单次sleeps最大的sleeps的保存是0xffff,
--//1161828 = 0x11ba64,16进制后4位正好是0xba64 = 47716,sleeps计数溢出了。

SYS@book> @opeek 0x6cfa13f0 48 0
[06CFA13F0, 06CFA1420) = 68761790 00000000 68761790 00000000 00000000 00000000 00000014 0000BA64 00000000 00000000 632EEA58 00000000
--//与opeek执行看到的一致。

6.小结:
--//缺省_mutex_wait_time=1,_mutex_wait_scheme=0的情况下select的时间是1毫秒。
--//_mutex_wait_scheme=0模式下开始sched_yield 99次,接着调用select 1次(每次0.001秒),2秒之后调用1次getrusage,如此重复。
--//如果遇到调用getrusage,sched_yield调用次数不足99次,下次补齐99次。
--//_mutex_wait_scheme=0模式调用sched_yield()非常消耗CPU资源,select休眠仅仅0.001秒,时间很短。

--//mutex_wait_scheme=0,完全不受_mutex_wait_time的影响,调用select每次还是0.001秒。
--//有时间测试mutex_wait_scheme=1的情况。

SYS@book> alter system set "_mutex_wait_time"=20 scope=memory;
System altered.

--//重复测试:
$ egrep "getrusage" mutex0205b.txt | head -20
09:35:15.844063 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 3453}, ru_stime={3, 965880}, ...}) = 0 <0.000019>
09:35:15.844235 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 3484}, ru_stime={3, 965905}, ...}) = 0 <0.000014>
09:35:17.407293 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 108244}, ru_stime={4, 39694}, ...}) = 0 <0.000007>
09:35:19.409264 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 201890}, ru_stime={4, 106755}, ...}) = 0 <0.000014>
09:35:21.410438 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 341571}, ru_stime={4, 200818}, ...}) = 0 <0.000005>
09:35:23.412471 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={5, 506516}, ru_stime={4, 340368}, ...}) = 0 <0.000007>
--//间隔还是2秒。

$ awk '{print $2}' mutex0205b.txt |egrep "semtimedop|getrusage|sched_yield|select" | uniq -c | head -10
      2 getrusage(0x1
     71 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()
      1 select(0,
     99 sched_yield()

$ grep "select" mutex0205b.txt |head -4
09:35:15.848202 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001327>
09:35:15.855328 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001124>
09:35:15.861792 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001389>
09:35:15.868375 select(0, [], [], [], {0, 1000}) = 0 (Timeout) <0.001603>
--//与前面测试完全一样。
--//符合前面介绍的_mutex_wait_scheme = 0 – Always YIELD的情况。

posted @   lfree  阅读(4)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 一个费力不讨好的项目,让我损失了近一半的绩效!
· 清华大学推出第四讲使用 DeepSeek + DeepResearch 让科研像聊天一样简单!
· 实操Deepseek接入个人知识库
· CSnakes vs Python.NET:高效嵌入与灵活互通的跨语言方案对比
· Plotly.NET 一个为 .NET 打造的强大开源交互式图表库
历史上的今天:
2024-02-19 [20240219]建立完善sql_idx.sh脚本.txt
2019-02-19 [20190219]那个更快(11g).txt
点击右上角即可分享
微信分享提示