[20240511]测试问题sql语句对数据库性能影响.txt
[20240511]测试问题sql语句对数据库性能影响.txt
--//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql
--//语句,正常情况下应该报ORA-00942: table or view does not exist,很奇怪的同事反映应用前台根本不报错,真不知道开发如何写代
--//码,难道没做例外处理以及记录吗?导致前台无法应用程序无法卡顿,业务停顿接近1个小时.
--//在测试环境模拟看看出现这样的情况,看看如何诊断和分析问题.
1.环境:
SCOTT@test01p> @ver1
PORT_STRING VERSION BANNER CON_ID
-------------------- ---------- ---------------------------------------------------------------------------- ------
IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
2.建立测试环境:
$ cat m1.txt
set verify off
declare
v_id number;
begin
for i in 1 .. &&1 loop
select count(*) into v_id from deptxxx;
end loop;
end ;
/
quit
SCOTT@test01p> @ m1.txt 20
select count(*) into v_id from deptxxx;
*
ERROR at line 5:
ORA-06550: line 5, column 40:
PL/SQL: ORA-00942: table or view does not exist
ORA-06550: line 5, column 9:
PL/SQL: SQL Statement ignored
--//不能通过PL/SQL测试.等于无法编译自动中断执行.
3.换一个测试方式:
$ cat m1.sh
#! /bin/bash
sqlplus -s -l scott/btbtms@test01p <<EOF >/dev/null
set feedback off
select count(*) from deptxxx;
/
... --//snap
/
quit
EOF
--//主:中间的/太长,我加入20000个/,我的测试环境cygwin有点慢.如果是linux环境,你可以写成$(seq 20000|xargs -IQ echo /).
--//这样执行有一个缺点就是先要展开里面/才能执行,这样在测试环境时有点延迟执行,我的测试环境延迟实在是无法忍受.
--//我估计在linux执行也存在类似问题,可以尝试改成$(seq 1000|xargs -IQ echo /).
--//实在不行还是写成20000个/的形式.
SYS@test> @ zzdate
C30 C30 C31
------------------- -------------------------------------- ---------------------------------
2024-05-12 06:45:26 trunc(sysdate)+06/24+45/1440+26/86400 "timestamp'2024-05-12 06:45:26'"
--//开始测试,每次并发10个执行.
$ seq 100 | xargs -IQ -P 10 bash -c ./m1.sh
SYS@test> @ ashtop event 1=1 trunc(sysdate)+06/24+45/1440+26/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- ------- ------- ----------------------------- ------------------- ------------------- ---------- -------- -----------
420 4.2 56% | library cache lock 2024-05-12 06:45:49 2024-05-12 06:47:05 1 52 52
242 2.4 32% | kksfbc child completion 2024-05-12 06:45:47 2024-05-12 06:47:01 1 40 40
41 .4 6% | 2024-05-12 06:45:28 2024-05-12 06:47:01 5 34 37
18 .2 2% | cursor: pin S wait on X 2024-05-12 06:45:49 2024-05-12 06:46:47 1 14 14
17 .2 2% | library cache: mutex X 2024-05-12 06:45:51 2024-05-12 06:47:05 1 11 11
5 .1 1% | SQL*Net break/reset to client 2024-05-12 06:45:55 2024-05-12 06:46:48 1 5 5
1 .0 0% | control file sequential read 2024-05-12 06:46:56 2024-05-12 06:46:56 1 1 1
1 .0 0% | cursor: mutex X 2024-05-12 06:45:58 2024-05-12 06:45:58 1 1 1
8 rows selected.
--//仅仅不到2分钟,出现library cache lock.
SYS@test> @ ashtop sql_id,event 1=1 trunc(sysdate)+06/24+45/1440+26/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This SQL_ID EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ------------- ------------------------------ ------------------- ------------------- ---------- -------- -----------
564 4.2 52% | bkx4nctn9dung library cache lock 2024-05-12 06:45:49 2024-05-12 06:47:39 1 72 72
388 2.9 36% | bkx4nctn9dung kksfbc child completion 2024-05-12 06:45:47 2024-05-12 06:47:40 1 61 61
51 .4 5% | bkx4nctn9dung 2024-05-12 06:45:48 2024-05-12 06:47:40 1 46 46
28 .2 3% | bkx4nctn9dung cursor: pin S wait on X 2024-05-12 06:45:49 2024-05-12 06:47:40 1 22 22
24 .2 2% | bkx4nctn9dung library cache: mutex X 2024-05-12 06:45:51 2024-05-12 06:47:38 1 16 16
11 .1 1% | 2024-05-12 06:45:28 2024-05-12 06:47:29 1 11 11
6 .0 1% | SQL*Net break/reset to client 2024-05-12 06:45:55 2024-05-12 06:47:12 1 6 6
4 .0 0% | f9z1qr21vjkms 2024-05-12 06:45:53 2024-05-12 06:46:36 4 4 4
1 .0 0% | bkx4nctn9dung cursor: mutex X 2024-05-12 06:45:58 2024-05-12 06:45:58 1 1 1
1 .0 0% | gjaap3w3qbf8c library cache lock 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
1 .0 0% | gjaap3w3qbf8c 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
1 .0 0% | control file sequential read 2024-05-12 06:46:56 2024-05-12 06:46:56 1 1 1
1 .0 0% | oracle thread bootstrap 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
13 rows selected.
SYS@test> @ sql_id bkx4nctn9dung
--SQL_ID = bkx4nctn9dung
--//查询不到sql语句,我估计我同事在这里可能有点晕....
--//如果当时查询到sql语句,我估计定位很快,不会走许多弯路....虽然不明白问题为什么^_^?
SYS@test> @ sharepool/shp4x bkx4nctn9dung 0
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ---------------------------- -------- -------- ---------- ---------------- -------- ---------- ---------- ---------- --------- ---------- ---------- ------------- --------
child handle address 000007FF1199DAD8 000007FF007152C8 select count(*) from deptxxx 0 0 21848 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 0
child handle address 000007FF1199D8A8 000007FF007152C8 select count(*) from deptxxx 0 0 9408 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 1
child handle address 000007FEFFDFC4F0 000007FF007152C8 select count(*) from deptxxx 0 0 274 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 2
child handle address 000007FF11C1DEF0 000007FF007152C8 select count(*) from deptxxx 0 0 345 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 3
child handle address 000007FF11F18DC0 000007FF007152C8 select count(*) from deptxxx 0 0 149 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 4
child handle address 000007FF11FA6A78 000007FF007152C8 select count(*) from deptxxx 0 0 85 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 5
....
child handle address 000007FF154DDD90 000007FF007152C8 select count(*) from deptxxx 0 0 18 00 00 0 0 2879573 2879573 2879573 1754720911 bkx4nctn9dung 3537
child handle address 000007FF11D612C0 000007FF007152C8 select count(*) from deptxxx 0 0 4 00 00 0 0 2879573 2879573 2879573 1754720911 bkx4nctn9dung 3538
parent handle address 000007FF007152C8 000007FF007152C8 select count(*) from deptxxx 1 0 187396 000007FF00715108 00 2882976 0 0 2882976 2882976 1754720911 bkx4nctn9dung 65535
3540 rows selected.
--//我的测试产生大量子光标,不知道为什么.出现kksfbc child completion,cursor: pin S wait on X也许与该问题相关.
--//记下KGLNAHSH=1754720911,也就是sql语句hash_value.
--//我觉得如果事先我不知道表没建立,是否会定位这个问题是由于大量子光标产生的,不过总该会手动执行看看.
SYS@test> @ wcx sysdate-3/86400 sysdate
-- Display ASH Wait Chain Signatures script v0.7 by Tanel Poder ( http://blog.tanelpoder.com )
%This SECONDS AAS WAIT_CHAIN FIRST_SEEN LAST_SEEN
----- ------- --- --------------------------------------------------------------------------------------------------------------------------- ------------------- -------------------
6% 2 .7 -> 177,28943,@1=>256,63635,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>17,5013,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>188,25645,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>186,57445,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> ,,@=>16,13196,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
3% 1 .3 -> 177,28943,@1=>99,44579,@1=>cursor: pin S wait on X -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> ,,@=>186,57445,@1=>kksfbc child completion 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>260,19955,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 177,28943,@1=>20,56177,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 177,28943,@1=>20,56177,@1=>cursor: pin S wait on X -> ,,@=>177,28943,@1=> 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> ,,@=>99,44579,@1=>kksfbc child completion 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>177,28943,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>16,13196,@1=>db file sequential read 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 188,25645,@1=>20,56177,@1=>library cache lock -> 260,19955,@1=>188,25645,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 260,19955,@1=>99,44579,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>263,41888,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 17,5013,@1=>269,47730,@1=>library cache lock -> 177,28943,@1=>17,5013,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>269,47730,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>260,19955,@1=>kksfbc child completion 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> ,,@=>256,63635,@1=>kksfbc child completion 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>269,47730,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 260,19955,@1=>263,41888,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 260,19955,@1=>188,25645,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>263,41888,@1=>library cache lock 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>17,5013,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
27 rows selected.
--//出现大量library cache lock阻塞.
SYS@test> @ ashtop event 1=1 sysdate-5/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ----------------------- ------------------- ------------------- ---------- -------- -----------
36 7.2 72% | library cache lock 2024-05-12 06:57:18 2024-05-12 06:57:22 1 5 5
8 1.6 16% | kksfbc child completion 2024-05-12 06:57:18 2024-05-12 06:57:20 1 3 3
6 1.2 12% | library cache: mutex X 2024-05-12 06:57:18 2024-05-12 06:57:21 1 4 4
--// kksfbc child completion 应该与产生大量子光标有关,为什么不是很清楚,生产系统应该没有这个事情或者太靠后没有发现.
SYS@test> @ ashtop sql_id,event,p1,p2,p3 "event like 'library cache%'" sysdate-5/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This SQL_ID EVENT P1 P2 P3 FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ------------- ---------------------- -------------- -------------- ------- ------------------- ------------------- ---------- -------- -----------
4 .8 8% | bkx4nctn9dung library cache lock 8792098174448 8791795839352 5373954 2024-05-12 06:54:19 2024-05-12 06:54:22 1 4 4
4 .8 8% | bkx4nctn9dung library cache lock 8792098174448 8792158716560 5373954 2024-05-12 06:54:18 2024-05-12 06:54:22 1 4 4
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792091899480 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792094103944 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792095087984 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792095603568 5373955 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792098669376 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792142089264 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792148012576 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792148432296 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache: mutex X 1888004588 1099511627776 5374033 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
2 .4 4% | bkx4nctn9dung library cache: mutex X 1888004588 1155346202624 5374033 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8791795841040 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792091899776 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792091900072 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095087096 5373955 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095088448 5373955 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095603568 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095604792 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792098668616 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792142090488 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792142863816 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148013464 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148432928 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148433224 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
25 rows selected.
--//前面记下KGLNAHSH=1754720911,也就是sql语句hash_value,有点奇怪是并没有在P1参数出现.
--//记住library cache lock的P1=8792098174448, library cache: mutex X P1=1888004588.
--//8792098174448 = 0x7ff11e375f0
--//5373954 = /2^16 %2^16 (Type | Mode) = 82,2 = 0x520002 ,82(namespace) 表示SQL AREA BUILD. 2 表示mode.
--//5374033 = /2^16 %2^16 (Type | Mode) = 82,81 = 0x520051
--//intel系列CPU注意大小头问题.
SYS@test> select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 and rownum=1 ;
KGLHDNSP KGLHDNSD KGLOBTYD
-------------- -------------- -------------
82 SQL AREA BUILD CURSOR
--//注意生产系统可能执行很慢!!
SYS@test> @ ev_name "library cache%"
EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS DISPLAY_NAME
------ -------------- --------------------------------- -------------- ------------ ------------------ ------------- ----------- ----------- ----------------------------------
342 2802704141 library cache pin handle address pin address 100*mode+namespace 3875070507 4 Concurrency library cache pin
343 916468430 library cache lock handle address lock address 100*mode+namespace 3875070507 4 Concurrency library cache lock
344 2952162927 library cache load lock object address lock address 100*mask+namespace 3875070507 4 Concurrency library cache load lock
345 1646780882 library cache: mutex X idn value where 3875070507 4 Concurrency library cache: mutex X
346 2601513493 library cache: bucket mutex X idn value where 3875070507 4 Concurrency library cache: bucket mutex X
347 2382371857 library cache: dependency mutex X idn value where 3875070507 4 Concurrency library cache: dependency mutex X
348 2446268751 library cache: mutex S idn value where 3875070507 4 Concurrency library cache: mutex S
1361 800384622 library cache revalidation 1893977003 0 Other library cache revalidation
1362 1578286316 library cache shutdown 1893977003 0 Other library cache shutdown
9 rows selected.
--//library cache lock的P1表示handle address,看看是怎么对象.
--//library cache: mutex X的P1表示idn,我以前理解对应hash_vale.可以是sql语句或者某个对象的hash_value.
--//Tanel Poder有1篇文档,Oracle_Latch_And_Mutex_Contention_Troubleshooting.pdf
library cache: mutex* wait events
1) library cache hash bucket number (if idn <= 131072)
2) idn = hash value of the library cache object under protection (if idn > 131072)
--//一点疑问,如果hash value<=131072(2^17)呢?
--//library cache lock的P1表示handle address,看看是怎么对象,8792098174448 = 0x7ff11e375f0
SYS@test> @ fchaz 7ff11e375f0
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- -------- -------- -------- -------- -------- -------- -------- -----------------
SGA 000007FF11E375C0 1 1 KGLHD 816 recr 80 00 000007FF11E378EF
--//000007FF11E375C0 = 8792098174400
--//8792098174400+48 = 0x7ff11e375f0
--//0x000007FF11E375C0+0x30 = 0x7ff11e375f0 0x7ff11e375f0 (handle address)
--//如果熟悉以前学习可以知道开始的偏移0x30(48字节)就是handle address,顺便说一下我这里是测试环境,
--//如果在生产系统执行tpt fcha.sql脚本会报错或者要么很慢,生产系统共享池很大,扫描会很慢.我使用我改写fchaz,取消执行前的警
--//告提示,但是依旧很慢.
SYS@test> @ sharepool/shp4 7ff11e375f0 0
SYS@test> @ prxx
==============================
TEXT : parent handle address
KGLHDADR : 000007FF11E375F0
KGLHDPAR : 000007FF11E375F0
C40 : b97494666896ea8f
KGLHDLMD : 3
KGLHDPMD : 0
KGLHDIVC : 0
KGLOBHD0 : 00
KGLOBHD6 : 00
KGLOBHS0 : 0
KGLOBHS6 : 0
KGLOBT16 : 0
N0_6_16 : 0
N20 : 0
KGLNAHSH : 1888004588
KGLOBT03 :
KGLOBT09 : 0
PL/SQL procedure successfully completed.
--//有一点点诧异,并不是DEPTXXX对象,看到的对象是b97494666896ea8f.
--//注:如果生产系统执行以上语句可能报错或者很慢,实际上选择hash_value=1888004588,使用shp4x.sql脚本查询,就没有这个问题.
--//后记:实际上测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,可是实际的情况是KGLNAHSH=
--//1888004588.为什么?不过这里出现一个线索,就是KGLNAHSH=1888004588,正好对应前面library cache: mutex X的P1.
--//换一句话讲问题出在hash_value=1888004588,对象是b97494666896ea8f,这个是什么?这个对象字符串长度正好16位,感觉就像
--//full_hash_value取的后16位(16进制),与sql_id类似(注:sql_id为32进制)
--//b97494666896ea8f = 13363469161877400207
--//6896ea8f = 1754720911
--//取后8位计算的结果正好是1754720911,等于前面sql语句的hash_value.注意前面 @sharepool/shp4x bkx4nctn9dung 0的输出,
--//KGLNAHSH=1754720911.
--//可以推测在这个对象b97494666896ea8f存在争用,并且这个值就是等于前面错误sql语句的sql_id值,仅仅sql_id是32进制罢了.
--//可以执行如下验证:
$ ./sql_idz.sh "select count(*) from deptxxx" 0
sql_text = select count(*) from deptxxx\0
full_hash_value(16) = 2A80660E44510A13B97494666896EA8F or 2a80660e44510a13b97494666896ea8f
hash_value(10) = 1754720911
~~~~~~~~~~~~~~~~~~~~~~~~~~
sql_id(32) = bkx4nctn9dung
sql_id(32) = bkx4nctn9dung
sql_id(32) = bkx4nctn9dung
--//full_hash_value(16) = 2A80660E44510A13B97494666896EA8F 后16位就是 b97494666896ea8f
--//执行我以前写查看library cache lock的lcl.sql脚本:
SYS@test> @ lcl
display library cache lock problem
INST_ID SADDR HANDLE MOD REQ OBJECT SQL_ID HASH_VALUE KGLNAOBJ USER_NAME C50
------- ---------------- ---------------- --- --- ---------------- ------------- ---------- ---------------- --------- ------------------------------------------------
1 000007FF09CFB988 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '256,50011' immediate;
1 000007FF0A39C278 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '188,58372' immediate;
1 000007FF09CF22A8 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '260,29617' immediate;
--//注意看这个HASH_VALUE=1888004588,KGLNAOBJ=b97494666896ea8f 对应 library cache: mutex X的P1=1888004588
--//x$kgllk视图存在KGLLKSQLID,对应sql_id并且有值,奇怪x$kglob没有值.
display wait library cache lock
SID SERIAL# SPID PID P_SERIAL# EVENT C50
---- ------- ---- ------- --------- ------------------ --------------------------------------------------
263 6846 5056 43 151 library cache lock alter system kill session '263,6846' immediate;
177 59369 6688 46 30 library cache lock alter system kill session '177,59369' immediate;
14 34381 7096 64 10 library cache lock alter system kill session '14,34381' immediate;
186 34958 5812 66 16 library cache lock alter system kill session '186,34958' immediate;
260 29617 6916 67 10 library cache lock alter system kill session '260,29617' immediate;
17 44622 7136 68 8 library cache lock alter system kill session '17,44622' immediate;
99 51439 6292 69 6 library cache lock alter system kill session '99,51439' immediate;
188 58372 652 70 6 library cache lock alter system kill session '188,58372' immediate;
20 52459 4128 72 9 library cache lock alter system kill session '20,52459' immediate;
9 rows selected.
SYS@test> @ sql_id 1fuagn5s8jagc
--SQL_ID = 1fuagn5s8jagc
--//还是看不到sql语句.也许这是问题的难点,实际上如果像平常那样处理,kill掉这些进程没有任何用处,因为程序一旦登陆问题再现.
--//实际上如果当时查询x$kglob视图,KGLOBT03=bkx4nctn9dung,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立.
--//看看这里的sql_id=1fuagn5s8jagc 如何计算.
--//看看里面的内容:
SYS@test> @ fchaz 7ff11e375f0
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- -------------- -------------- ---------------- -------------- -------- -------------- ---------------- -----------------
SGA 000007FF11E375C0 1 1 KGLHD 816 recr 80 00 000007FF11E378EF
SYS@test> oradebug setmypid
Statement processed.
SYS@test> oradebug peek 0x000007FF11E375C0 816
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946570 00000000 11A4DB60 000007FF 14B840D0 000007FF 00000003 00080050 034EF508 000007FF ..
SYS@test> @ ti
New tracefile_identifier = D:\APP\ORACLE\diag\rdbms\test\test\trace\test_ora_3940_0001.trc
SYS@test> oradebug peek 0x000007FF11E375C0 816 1
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946458 00000000 0A946480 00000000 14B840D0 000007FF 00000003 00080050 034EF508 000007FF ...
--//这样看信息不全,要看跟踪文件,最后的参数1写入跟踪文件.816表示查看长度.
SYS@test> @t
TRACEFILE
-----------------------------------------------------------------
D:\APP\ORACLE\diag\rdbms\test\test\trace\test_ora_3940_0001.trc
*** 2024-05-12T07:34:45.801424+08:00 (CDB$ROOT(1))
Processing Oradebug command 'peek 0x000007FF11E375C0 816 1'
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946458 ...
Dump of memory from 0x7FF11E375D4 to 0x7FF11E378F0
7FF11E375D0 00000000 0A946480 00000000 [.....d......]
7FF11E375E0 14B840D0 000007FF 00000003 00080050 [.@..........P...]
7FF11E375F0 034EF508 000007FF 034EF508 000007FF [..N.......N.....]
7FF11E37600 00000000 00000000 11E37758 000007FF [........Xw......]
7FF11E37610 00030052 00010000 00000000 00000000 [R...............]
7FF11E37620 00010001 00000000 00000000 00000000 [................]
7FF11E37630 00000000 00000000 00000000 00000000 [................]
7FF11E37640 11E377D0 000007FF 00000001 00000000 [.w..............]
7FF11E37650 11E37650 000007FF 11E37650 000007FF [Pv......Pv......]
7FF11E37660 11E37660 000007FF 11E37660 000007FF [`v......`v......]
7FF11E37670 15EBC6E0 000007FF 15EBC6E0 000007FF [................]
7FF11E37680 15C3CE40 000007FF 1198B988 000007FF [@...............]
7FF11E37690 11E37690 000007FF 11E37690 000007FF [.v.......v......]
7FF11E376A0 00000000 00000000 00000000 00000000 [................]
7FF11E376B0 17694FA1 00000000 0000A9EC 00000000 [.Oi.............]
7FF11E376C0 11E37740 000007FF 11E376C8 000007FF [@w.......v......]
7FF11E376D0 11E376C8 000007FF 11E376D8 000007FF [.v.......v......]
7FF11E376E0 11E376D8 000007FF 00000000 00000000 [.v..............]
7FF11E376F0 03081E00 000007FF 00000000 00000000 [................]
7FF11E37700 00000000 00000000 00157212 00000000 [.........r......]
7FF11E37710 00157214 00157214 00000000 00000000 [.r...r..........]
7FF11E37720 00000000 00000000 00000000 00000000 [................]
7FF11E37730 00000000 00000003 AE4581A3 00000000 [..........E.....]
7FF11E37740 00000000 000000B1 00F043A3 001AF790 [.........C......]
7FF11E37750 7088A9EC 00000000 F3B5EABA 290512F5 [...p...........)]
7FF11E37760 17694FA1 7088A9EC 00000000 00000000 [.Oi....p........]
7FF11E37770 00000000 00000000 00000000 00000000 [................]
7FF11E37780 00000010 00000000 00000007 00000700 [................]
7FF11E37790 00000000 00000000 11E377B0 000007FF [.........w......]
7FF11E377A0 00000000 00000000 00000000 00000000 [................]
7FF11E377B0 34373962 36363439 36393836 66386165 [b97494666896ea8f]
7FF11E377C0 49554224 5424444C 30545345 00005031 [$BUILD$TEST01P..]
7FF11E377D0 00000000 00000000 00000000 00000000 [................]
Repeat 17 times
--//注意看后面的信息.
7FF11E377B0 34373962 36363439 36393836 66386165 [b97494666896ea8f]
7FF11E377C0 49554224 5424444C 30545345 00005031 [$BUILD$TEST01P..]
--//就是某个对象.
SYS@test> select * from v$db_object_cache where name = 'b97494666896ea8f'
2 @prxx
==============================
OWNER : $BUILD$
NAME : b97494666896ea8f
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 1
PINS : 0
KEPT : NO
CHILD_LATCH : 43500
INVALIDATIONS : 0
HASH_VALUE : 1888004588
LOCK_MODE : EXCLUSIVE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 2140477
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : f3b5eaba290512f517694fa17088a9ec
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF11E375F0
EDITION :
PL/SQL procedure successfully completed.
--//NAMESPACE=SQL AREA BUILD,select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 order by 1;
--//kglhdnsp=82 , 对应8进制是\01222,16进制0x52(字符R)
--//1888004588%2^17 = 43500,对应CHILD_LATCH =43500.
--//注意:这个视图没有sql_id字段,感觉oracle设计这个视图不合理,应该把sql_id加入进来,对应x$kglob.KGLOBT03.
--//我的感觉这个视图一直存在,而从9i开始,oracle才使用sql_id标识sql语句,oracle为什么不加入该字段.
--//注意OWNER=$BUILD$,开始乱猜使用SCOTT计算sql_id不对.使用我以前写sql_idx.sh脚本.
$ echo -n -e 'b97494666896ea8f.$BUILD$.TEST01P' > bb.txt
$ ./sql_idx.sh bb.txt 1
sql_text = b97494666896ea8f.$BUILD$.TEST01P\0122\0\0\0
full_hash_value(16) = F3B5EABA290512F517694FA17088A9EC
hash_value(10) = 1888004588
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
--//基本都能对上前面select * from v$db_object_cache where name = 'b97494666896ea8f'的输出.
--//注:我改写我原来写的sql_idx.sh脚本,因为如果cat 文件里面包含chr(0),cat的输出就忽略掉,准备重写该脚本
--//有点教条了了,可以执行如下:
$ sql_idz.sh 'b97494666896ea8f.$BUILD$.TEST01P\0122\0\0'
sql_text = b97494666896ea8f.$BUILD$.TEST01P\0122\0\0\0
full_hash_value(16) = F3B5EABA290512F517694FA17088A9EC or f3b5eaba290512f517694fa17088a9ec
hash_value(10) = 1888004588
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
4.总结:
--//我已经基本展示生产系统出现的问题,还是不明白为什么这样会产生大量的子光标.
--//在分析sql语句时或许与产生大量子光标有关,在对象b97494666896ea8f上存在争用,NAMESPACE = SQL AREA BUILD,导致
--//出现library cache: mutex X,library cache lock.
--//另外我做了测试,如果打开几个回话,手工执行select count(*) from deptxxx;当然不是并发执行,并没有产生多个子光标的情况.
--//也许问题就出在密集的执行产生大量子光标上面.
--//在测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,甚至在shared pool latch上存在争用,
--//因为错误的sql语句每次存在硬解析,实际的情况都不是,而在NAMESPACE = SQL AREA BUILD,建立的对象上存在争用.
--//还有一些细节我需要验证,好好消化,写的有点乱...
--//我估计同事看到sql_id后查询不到对应sql语句,而不断的kill进程并不会有助于问题解决,在这上面浪费大量时间,而且实际上
--//阻塞的时间都很短.
--//实际上只要查询底层的x$kglob的,KGLOBT03=bkx4nctn9dung,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立.
--//实际上还可以查询v$db_object_cache的HASH_VALUE(它可以通过sql_id获得),获得对应sql语句,该视图的底层就是x$kglob.
5.补充测试:
--//手工执行4次:
select count(*) from deptxxx;
SYS@test> select * from v$db_object_cache where name = 'b97494666896ea8f'
2 @ prxx
==============================
OWNER : $BUILD$
NAME : b97494666896ea8f
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 43500
INVALIDATIONS : 0
HASH_VALUE : 1888004588
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : f3b5eaba290512f517694fa17088a9ec
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF15958380
EDITION :
PL/SQL procedure successfully completed.
--//只要执行就会出现这个对象.
--//手工执行4次:
select count(*) from dept;
SCOTT@test01p> @ hash
HASH_VALUE SQL_ID CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX SQL_EXEC_START SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
761178024 1jrz3ucqpx9x8 0 42920 3051237957 2d5ea7a8 2024-05-15 21:14:52 16777219
SYS@test> select FULL_HASH_VALUE from v$db_object_cache where hash_value = '761178024' and rownum=1;
FULL_HASH_VALUE
--------------------------------
aba620d08eb373c818dfe3d32d5ea7a8
SYS@test> select * from v$db_object_cache where name='18dfe3d32d5ea7a8'
2 @ prxx
==============================
OWNER : $BUILD$
NAME : 18dfe3d32d5ea7a8
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 45772
INVALIDATIONS : 0
HASH_VALUE : 4236817100
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : 87defd2f856915dc18d747fdfc88b2cc
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF1611C200
EDITION :
PL/SQL procedure successfully completed.
--//无论我执行select count(*) from dept;,LOCKED_TOTAL =1,即使新用户登陆执行LOCKED_TOTAL=1不变.
--//如果建立新的子光标
SCOTT@test01p> alter session set optimizer_index_cost_adj=99;
Session altered.
SCOTT@test01p> select count(*) from dept;
COUNT(*)
----------
4
SYS@test> @prxx
==============================
OWNER : $BUILD$
NAME : 18dfe3d32d5ea7a8
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 45772
INVALIDATIONS : 0
HASH_VALUE : 4236817100
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : 87defd2f856915dc18d747fdfc88b2cc
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF1659A580
EDITION :
PL/SQL procedure successfully completed.
--//LOCKED_TOTAL =2.
--//这样问题就很明显解析的通生产系统遇到的问题,一条不能执行sql语句,密集执行就会在这个对象造成阻塞.
--//即使没有产生子光标这个因素,都是如此.
SYS@test> select count(*) from v$db_object_cache where length(name)=16 and NAMESPACE='SQL AREA BUILD';
COUNT(*)
----------
215
--//有215个类似对象.
SYS@test> select name,substr(name,-8,8),to_number(substr(name,-8,8),lpad('x',8,'x')) hash_value from v$db_object_cache where length(name)=16 and NAMESPACE='SQL AREA BUILD' and rownum<=10;
NAME SUBSTR(NAME,-8,8 HASH_VALUE
-------------------- ---------------- ----------
b4035ec136571959 36571959 911677785
55d317c9fd64f69a fd64f69a 4251252378
7dea5291a4131ac2 a4131ac2 2752715458
b6917fa597638e12 97638e12 2539884050
54aafc527339dee7 7339dee7 1933172455
1ae760adac0ea9ca ac0ea9ca 2886642122
cc52a92b7e2ba318 7e2ba318 2116789016
f2ff6345225ba386 225ba386 576430982
be2e6ed6e2ce0906 e2ce0906 3805153542
06bbaef78b718ad6 8b718ad6 2339474134
10 rows selected.
SYS@test> @ sharepool/shp4x 0 911677785
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ---------------------------------------- ---------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
child handle address 000007FF155A04B8 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF11DB70E8 000007FF15E5B6A8 4032 8088 3915 16035 16035 911677785 b80uys4v5f6at 0
child handle address 000007FF15E0EFF0 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF157D6CA8 000007FF15E5BA40 4032 8088 3915 16035 16035 911677785 b80uys4v5f6at 1
parent handle address 000007FF11D87948 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF11D87788 00 4072 0 0 4072 4072 911677785 b80uys4v5f6at 65535
--//我尝试以上都能找到类似的语句.
6.贴一段生产系统看到的情况.
> @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'"
Total
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN
------- ----- ----- ---------------------- ------------------- -------------------
8164610 913.0 76% library cache lock 2024-04-26 08:23:22 2024-04-26 09:17:44
2587860 289.4 24% library cache: mutex X 2024-04-26 08:23:22 2024-04-26 09:17:44
10 .0 0% library cache pin 2024-04-26 10:00:40 2024-04-26 10:00:40
--//出现问题时间点2024-04-26 08:23:22,到2024-04-26 09:17:44问题解决.后面library cache pin仅仅出现1次,可以不用考虑.
> @ dashtop event,sql_id "event='library cache lock'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 08:30:00'"
Total
Seconds AAS %This EVENT SQL_ID FIRST_SEEN LAST_SEEN
------- ----- ----- ------------------ ------------- ------------------- -------------------
554750 318.3 96% library cache lock arw6k2rf5s29b 2024-04-26 08:23:22 2024-04-26 08:29:55
13880 8.0 2% library cache lock bakhcg5b1749p 2024-04-26 08:23:22 2024-04-26 08:29:55
2580 1.5 0% library cache lock ddd4xgabw2tct 2024-04-26 08:23:33 2024-04-26 08:29:55
2440 1.4 0% library cache lock 306sjy7n4m9bk 2024-04-26 08:23:22 2024-04-26 08:29:55
2300 1.3 0% library cache lock av7t0b1v7jn1m 2024-04-26 08:23:22 2024-04-26 08:29:55
440 .3 0% library cache lock 69zcqdq405z1h 2024-04-26 08:25:31 2024-04-26 08:29:55
420 .2 0% library cache lock dgvzbsr848vam 2024-04-26 08:25:09 2024-04-26 08:29:55
410 .2 0% library cache lock bg1t2pc5ym090 2024-04-26 08:24:15 2024-04-26 08:29:55
350 .2 0% library cache lock 0j7bs1g3fg66x 2024-04-26 08:23:33 2024-04-26 08:29:55
310 .2 0% library cache lock 7yhjdqcg8wzny 2024-04-26 08:23:22 2024-04-26 08:29:55
300 .2 0% library cache lock 4f03m8k9vgb6r 2024-04-26 08:23:43 2024-04-26 08:29:55
250 .1 0% library cache lock 9bsd8hfq3cf81 2024-04-26 08:23:54 2024-04-26 08:29:55
200 .1 0% library cache lock 5xw2vsz1868zh 2024-04-26 08:25:31 2024-04-26 08:29:55
20 .0 0% library cache lock 899c5c7ug4dp4 2024-04-26 08:23:22 2024-04-26 08:23:22
14 rows selected.
> @ sqlhh arw6k2rf5s29b 100
time unit : millisecond
BEGIN_INTERVAL_TIME INST_ID SQL_ID PLAN_HASH_VALUE EXECUTIONS ELA_MS_PER_EXEC CPU_MS_PER_EXEC ROWS_PER_EXEC LIOS_PER_EXEC BLKRD_PER_EXEC IOW_MS_PER_EXEC AVG_IOW_MS CLW_MS_PER_EXEC APW_MS_PER_EXEC CCW_MS_PER_EXEC
------------------- ------- ------------- --------------- ---------- --------------- --------------- ------------- ------------- -------------- --------------- ----------- --------------- --------------- ---------------
2024-04-26 11:00:15 1 arw6k2rf5s29b 3673589541 135628 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 12:00:17 1 arw6k2rf5s29b 3673589541 125952 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 13:00:36 1 arw6k2rf5s29b 3673589541 117259 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 14:00:13 1 arw6k2rf5s29b 3673589541 121757 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 15:00:15 1 arw6k2rf5s29b 3673589541 131134 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 16:00:35 1 arw6k2rf5s29b 3673589541 135313 0 0 0.0 0 0 0 0.0 0 0 0
...
2024-04-29 02:00:20 1 arw6k2rf5s29b 2199259672 54386 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 03:00:39 1 arw6k2rf5s29b 2199259672 52685 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 04:00:08 1 arw6k2rf5s29b 2199259672 52485 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 05:00:17 1 arw6k2rf5s29b 2199259672 52623 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 06:00:37 1 arw6k2rf5s29b 2199259672 51764 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 07:00:04 1 arw6k2rf5s29b 2199259672 56257 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 09:00:35 1 arw6k2rf5s29b 2199259672 127381 0 0 0.0 0 0 0 0.0 0 0 0
47 rows selected.
--//135628/3600 = 37.6次/秒.
--//可以发现1个细节,我无法解析,9:17恢复系统,很奇怪的是9:17-11:00并没有大量执行.而之后有记录.
--//要么就是9,10点执行次数很少,没有写入awr相关历史表中.
--//最奇怪的涉及查询返回0,我查看相关表现在的情况表中没有任何记录,似乎再2024-04-29 09:00:35之后某个时刻发现问题,不再执行.
--//我估计取消了升级与改动,我们团队缺乏必要的交流与沟通,根本不知道开发做了什么.
5.相关脚本:
--//lcl.sql
PROMPT
PROMPT display library cache lock problem
PROMPT
column object format a20
column user_name format a20
SELECT inst_id
,kgllkses saddr
,kgllkhdl handle
,kgllkmod MOD
,kgllkreq REQ
,kglnaobj object
,KGLLKSQLID sql_id
,kglnahsh hash_value
,KGLNAOBJ
,user_name
, 'alter system kill session '''
|| s.sid
|| ','
|| s.serial#
|| ''''
|| ' immediate;'
c50
--,lock_a.*
FROM x$kgllk lock_a, v$session s
WHERE s.saddr = lock_a.kgllkses and
kgllkmod > 0
AND EXISTS
(SELECT lock_b.kgllkhdl
FROM x$kgllk lock_b
WHERE kgllkses IN (SELECT saddr
FROM v$session
WHERE event like 'library cache lock') /* blocked session */
--WHERE event like 'library cache pin') /* blocked session */
AND lock_a.kgllkhdl = lock_b.kgllkhdl
AND kgllkreq > 0);
PROMPT
PROMPT display wait library cache lock
PROMPT
SELECT s.sid
,s.serial#
,p.spid
,p.pid
,p.serial# p_serial#
,s.event
, 'alter system kill session '''
|| s.sid
|| ','
|| s.serial#
|| ''''
|| ' immediate;'
c50
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND s.event= 'library cache lock';
--AND s.event= 'library cache pin';
--//shp4.sql
column N0_6_16 format 99999999
column fcura_addrlen new_value _fcura_addrlen
set termout off
select vsize(addr)*2 fcura_addrlen from x$dual;
set termout on
SELECT DECODE (kglhdadr,
kglhdpar, 'parent handle address',
'child handle address')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
KGLHDLMD,
KGLHDPMD,
kglhdivc,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03 ,
kglobt09
FROM x$kglob
WHERE
KGLHDPAR = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLHDADR = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBHD0 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD1 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD2 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD3 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD4 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD5 = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBHD6 = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBT03 = lower('&1')
or KGLNAHSH= &2;
--//shp4x.sql
column N0_6_16 format 99999999
SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr,
kglhdpar, 'parent handle address',
'child handle address')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
KGLHDLMD,
KGLHDPMD,
kglhdivc,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03,
kglobt09
FROM x$kglob
WHERE kglobt03 = lower('&1') or KGLNAHSH= &2;
--//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql
--//语句,正常情况下应该报ORA-00942: table or view does not exist,很奇怪的同事反映应用前台根本不报错,真不知道开发如何写代
--//码,难道没做例外处理以及记录吗?导致前台无法应用程序无法卡顿,业务停顿接近1个小时.
--//在测试环境模拟看看出现这样的情况,看看如何诊断和分析问题.
1.环境:
SCOTT@test01p> @ver1
PORT_STRING VERSION BANNER CON_ID
-------------------- ---------- ---------------------------------------------------------------------------- ------
IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0
2.建立测试环境:
$ cat m1.txt
set verify off
declare
v_id number;
begin
for i in 1 .. &&1 loop
select count(*) into v_id from deptxxx;
end loop;
end ;
/
quit
SCOTT@test01p> @ m1.txt 20
select count(*) into v_id from deptxxx;
*
ERROR at line 5:
ORA-06550: line 5, column 40:
PL/SQL: ORA-00942: table or view does not exist
ORA-06550: line 5, column 9:
PL/SQL: SQL Statement ignored
--//不能通过PL/SQL测试.等于无法编译自动中断执行.
3.换一个测试方式:
$ cat m1.sh
#! /bin/bash
sqlplus -s -l scott/btbtms@test01p <<EOF >/dev/null
set feedback off
select count(*) from deptxxx;
/
... --//snap
/
quit
EOF
--//主:中间的/太长,我加入20000个/,我的测试环境cygwin有点慢.如果是linux环境,你可以写成$(seq 20000|xargs -IQ echo /).
--//这样执行有一个缺点就是先要展开里面/才能执行,这样在测试环境时有点延迟执行,我的测试环境延迟实在是无法忍受.
--//我估计在linux执行也存在类似问题,可以尝试改成$(seq 1000|xargs -IQ echo /).
--//实在不行还是写成20000个/的形式.
SYS@test> @ zzdate
C30 C30 C31
------------------- -------------------------------------- ---------------------------------
2024-05-12 06:45:26 trunc(sysdate)+06/24+45/1440+26/86400 "timestamp'2024-05-12 06:45:26'"
--//开始测试,每次并发10个执行.
$ seq 100 | xargs -IQ -P 10 bash -c ./m1.sh
SYS@test> @ ashtop event 1=1 trunc(sysdate)+06/24+45/1440+26/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- ------- ------- ----------------------------- ------------------- ------------------- ---------- -------- -----------
420 4.2 56% | library cache lock 2024-05-12 06:45:49 2024-05-12 06:47:05 1 52 52
242 2.4 32% | kksfbc child completion 2024-05-12 06:45:47 2024-05-12 06:47:01 1 40 40
41 .4 6% | 2024-05-12 06:45:28 2024-05-12 06:47:01 5 34 37
18 .2 2% | cursor: pin S wait on X 2024-05-12 06:45:49 2024-05-12 06:46:47 1 14 14
17 .2 2% | library cache: mutex X 2024-05-12 06:45:51 2024-05-12 06:47:05 1 11 11
5 .1 1% | SQL*Net break/reset to client 2024-05-12 06:45:55 2024-05-12 06:46:48 1 5 5
1 .0 0% | control file sequential read 2024-05-12 06:46:56 2024-05-12 06:46:56 1 1 1
1 .0 0% | cursor: mutex X 2024-05-12 06:45:58 2024-05-12 06:45:58 1 1 1
8 rows selected.
--//仅仅不到2分钟,出现library cache lock.
SYS@test> @ ashtop sql_id,event 1=1 trunc(sysdate)+06/24+45/1440+26/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This SQL_ID EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ------------- ------------------------------ ------------------- ------------------- ---------- -------- -----------
564 4.2 52% | bkx4nctn9dung library cache lock 2024-05-12 06:45:49 2024-05-12 06:47:39 1 72 72
388 2.9 36% | bkx4nctn9dung kksfbc child completion 2024-05-12 06:45:47 2024-05-12 06:47:40 1 61 61
51 .4 5% | bkx4nctn9dung 2024-05-12 06:45:48 2024-05-12 06:47:40 1 46 46
28 .2 3% | bkx4nctn9dung cursor: pin S wait on X 2024-05-12 06:45:49 2024-05-12 06:47:40 1 22 22
24 .2 2% | bkx4nctn9dung library cache: mutex X 2024-05-12 06:45:51 2024-05-12 06:47:38 1 16 16
11 .1 1% | 2024-05-12 06:45:28 2024-05-12 06:47:29 1 11 11
6 .0 1% | SQL*Net break/reset to client 2024-05-12 06:45:55 2024-05-12 06:47:12 1 6 6
4 .0 0% | f9z1qr21vjkms 2024-05-12 06:45:53 2024-05-12 06:46:36 4 4 4
1 .0 0% | bkx4nctn9dung cursor: mutex X 2024-05-12 06:45:58 2024-05-12 06:45:58 1 1 1
1 .0 0% | gjaap3w3qbf8c library cache lock 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
1 .0 0% | gjaap3w3qbf8c 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
1 .0 0% | control file sequential read 2024-05-12 06:46:56 2024-05-12 06:46:56 1 1 1
1 .0 0% | oracle thread bootstrap 2024-05-12 06:47:28 2024-05-12 06:47:28 1 1 1
13 rows selected.
SYS@test> @ sql_id bkx4nctn9dung
--SQL_ID = bkx4nctn9dung
--//查询不到sql语句,我估计我同事在这里可能有点晕....
--//如果当时查询到sql语句,我估计定位很快,不会走许多弯路....虽然不明白问题为什么^_^?
SYS@test> @ sharepool/shp4x bkx4nctn9dung 0
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ---------------------------- -------- -------- ---------- ---------------- -------- ---------- ---------- ---------- --------- ---------- ---------- ------------- --------
child handle address 000007FF1199DAD8 000007FF007152C8 select count(*) from deptxxx 0 0 21848 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 0
child handle address 000007FF1199D8A8 000007FF007152C8 select count(*) from deptxxx 0 0 9408 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 1
child handle address 000007FEFFDFC4F0 000007FF007152C8 select count(*) from deptxxx 0 0 274 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 2
child handle address 000007FF11C1DEF0 000007FF007152C8 select count(*) from deptxxx 0 0 345 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 3
child handle address 000007FF11F18DC0 000007FF007152C8 select count(*) from deptxxx 0 0 149 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 4
child handle address 000007FF11FA6A78 000007FF007152C8 select count(*) from deptxxx 0 0 85 00 00 0 0 2357453 2357453 2357453 1754720911 bkx4nctn9dung 5
....
child handle address 000007FF154DDD90 000007FF007152C8 select count(*) from deptxxx 0 0 18 00 00 0 0 2879573 2879573 2879573 1754720911 bkx4nctn9dung 3537
child handle address 000007FF11D612C0 000007FF007152C8 select count(*) from deptxxx 0 0 4 00 00 0 0 2879573 2879573 2879573 1754720911 bkx4nctn9dung 3538
parent handle address 000007FF007152C8 000007FF007152C8 select count(*) from deptxxx 1 0 187396 000007FF00715108 00 2882976 0 0 2882976 2882976 1754720911 bkx4nctn9dung 65535
3540 rows selected.
--//我的测试产生大量子光标,不知道为什么.出现kksfbc child completion,cursor: pin S wait on X也许与该问题相关.
--//记下KGLNAHSH=1754720911,也就是sql语句hash_value.
--//我觉得如果事先我不知道表没建立,是否会定位这个问题是由于大量子光标产生的,不过总该会手动执行看看.
SYS@test> @ wcx sysdate-3/86400 sysdate
-- Display ASH Wait Chain Signatures script v0.7 by Tanel Poder ( http://blog.tanelpoder.com )
%This SECONDS AAS WAIT_CHAIN FIRST_SEEN LAST_SEEN
----- ------- --- --------------------------------------------------------------------------------------------------------------------------- ------------------- -------------------
6% 2 .7 -> 177,28943,@1=>256,63635,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>17,5013,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>188,25645,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> 177,28943,@1=>186,57445,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
6% 2 .7 -> ,,@=>16,13196,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:22
3% 1 .3 -> 177,28943,@1=>99,44579,@1=>cursor: pin S wait on X -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> ,,@=>186,57445,@1=>kksfbc child completion 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>260,19955,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 177,28943,@1=>20,56177,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 177,28943,@1=>20,56177,@1=>cursor: pin S wait on X -> ,,@=>177,28943,@1=> 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> ,,@=>99,44579,@1=>kksfbc child completion 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>177,28943,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>16,13196,@1=>db file sequential read 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 188,25645,@1=>20,56177,@1=>library cache lock -> 260,19955,@1=>188,25645,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 260,19955,@1=>99,44579,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>263,41888,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 17,5013,@1=>269,47730,@1=>library cache lock -> 177,28943,@1=>17,5013,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>269,47730,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>260,19955,@1=>kksfbc child completion 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> ,,@=>256,63635,@1=>kksfbc child completion 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 177,28943,@1=>269,47730,@1=>library cache lock -> ,,@=>177,28943,@1=> 2024-05-12 06:55:21 2024-05-12 06:55:21
3% 1 .3 -> 260,19955,@1=>263,41888,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> 260,19955,@1=>188,25645,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
3% 1 .3 -> ,,@=>263,41888,@1=>library cache lock 2024-05-12 06:55:22 2024-05-12 06:55:22
3% 1 .3 -> 260,19955,@1=>17,5013,@1=>library cache lock -> ,,@=>260,19955,@1=> 2024-05-12 06:55:23 2024-05-12 06:55:23
27 rows selected.
--//出现大量library cache lock阻塞.
SYS@test> @ ashtop event 1=1 sysdate-5/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ----------------------- ------------------- ------------------- ---------- -------- -----------
36 7.2 72% | library cache lock 2024-05-12 06:57:18 2024-05-12 06:57:22 1 5 5
8 1.6 16% | kksfbc child completion 2024-05-12 06:57:18 2024-05-12 06:57:20 1 3 3
6 1.2 12% | library cache: mutex X 2024-05-12 06:57:18 2024-05-12 06:57:21 1 4 4
--// kksfbc child completion 应该与产生大量子光标有关,为什么不是很清楚,生产系统应该没有这个事情或者太靠后没有发现.
SYS@test> @ ashtop sql_id,event,p1,p2,p3 "event like 'library cache%'" sysdate-5/86400 sysdate
Total Distinct Distinct Distinct
Seconds AAS %This SQL_ID EVENT P1 P2 P3 FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1
------- --- ------- ------------- ---------------------- -------------- -------------- ------- ------------------- ------------------- ---------- -------- -----------
4 .8 8% | bkx4nctn9dung library cache lock 8792098174448 8791795839352 5373954 2024-05-12 06:54:19 2024-05-12 06:54:22 1 4 4
4 .8 8% | bkx4nctn9dung library cache lock 8792098174448 8792158716560 5373954 2024-05-12 06:54:18 2024-05-12 06:54:22 1 4 4
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792091899480 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792094103944 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792095087984 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792095603568 5373955 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792098669376 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792142089264 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792148012576 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache lock 8792098174448 8792148432296 5373954 2024-05-12 06:54:20 2024-05-12 06:54:22 1 3 3
3 .6 6% | bkx4nctn9dung library cache: mutex X 1888004588 1099511627776 5374033 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
2 .4 4% | bkx4nctn9dung library cache: mutex X 1888004588 1155346202624 5374033 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8791795841040 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792091899776 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792091900072 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095087096 5373955 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095088448 5373955 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095603568 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792095604792 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792098668616 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792142090488 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792142863816 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148013464 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148432928 5373954 2024-05-12 06:54:19 2024-05-12 06:54:19 1 1 1
1 .2 2% | bkx4nctn9dung library cache lock 8792098174448 8792148433224 5373954 2024-05-12 06:54:18 2024-05-12 06:54:18 1 1 1
25 rows selected.
--//前面记下KGLNAHSH=1754720911,也就是sql语句hash_value,有点奇怪是并没有在P1参数出现.
--//记住library cache lock的P1=8792098174448, library cache: mutex X P1=1888004588.
--//8792098174448 = 0x7ff11e375f0
--//5373954 = /2^16 %2^16 (Type | Mode) = 82,2 = 0x520002 ,82(namespace) 表示SQL AREA BUILD. 2 表示mode.
--//5374033 = /2^16 %2^16 (Type | Mode) = 82,81 = 0x520051
--//intel系列CPU注意大小头问题.
SYS@test> select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 and rownum=1 ;
KGLHDNSP KGLHDNSD KGLOBTYD
-------------- -------------- -------------
82 SQL AREA BUILD CURSOR
--//注意生产系统可能执行很慢!!
SYS@test> @ ev_name "library cache%"
EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS DISPLAY_NAME
------ -------------- --------------------------------- -------------- ------------ ------------------ ------------- ----------- ----------- ----------------------------------
342 2802704141 library cache pin handle address pin address 100*mode+namespace 3875070507 4 Concurrency library cache pin
343 916468430 library cache lock handle address lock address 100*mode+namespace 3875070507 4 Concurrency library cache lock
344 2952162927 library cache load lock object address lock address 100*mask+namespace 3875070507 4 Concurrency library cache load lock
345 1646780882 library cache: mutex X idn value where 3875070507 4 Concurrency library cache: mutex X
346 2601513493 library cache: bucket mutex X idn value where 3875070507 4 Concurrency library cache: bucket mutex X
347 2382371857 library cache: dependency mutex X idn value where 3875070507 4 Concurrency library cache: dependency mutex X
348 2446268751 library cache: mutex S idn value where 3875070507 4 Concurrency library cache: mutex S
1361 800384622 library cache revalidation 1893977003 0 Other library cache revalidation
1362 1578286316 library cache shutdown 1893977003 0 Other library cache shutdown
9 rows selected.
--//library cache lock的P1表示handle address,看看是怎么对象.
--//library cache: mutex X的P1表示idn,我以前理解对应hash_vale.可以是sql语句或者某个对象的hash_value.
--//Tanel Poder有1篇文档,Oracle_Latch_And_Mutex_Contention_Troubleshooting.pdf
library cache: mutex* wait events
1) library cache hash bucket number (if idn <= 131072)
2) idn = hash value of the library cache object under protection (if idn > 131072)
--//一点疑问,如果hash value<=131072(2^17)呢?
--//library cache lock的P1表示handle address,看看是怎么对象,8792098174448 = 0x7ff11e375f0
SYS@test> @ fchaz 7ff11e375f0
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- -------- -------- -------- -------- -------- -------- -------- -----------------
SGA 000007FF11E375C0 1 1 KGLHD 816 recr 80 00 000007FF11E378EF
--//000007FF11E375C0 = 8792098174400
--//8792098174400+48 = 0x7ff11e375f0
--//0x000007FF11E375C0+0x30 = 0x7ff11e375f0 0x7ff11e375f0 (handle address)
--//如果熟悉以前学习可以知道开始的偏移0x30(48字节)就是handle address,顺便说一下我这里是测试环境,
--//如果在生产系统执行tpt fcha.sql脚本会报错或者要么很慢,生产系统共享池很大,扫描会很慢.我使用我改写fchaz,取消执行前的警
--//告提示,但是依旧很慢.
SYS@test> @ sharepool/shp4 7ff11e375f0 0
SYS@test> @ prxx
==============================
TEXT : parent handle address
KGLHDADR : 000007FF11E375F0
KGLHDPAR : 000007FF11E375F0
C40 : b97494666896ea8f
KGLHDLMD : 3
KGLHDPMD : 0
KGLHDIVC : 0
KGLOBHD0 : 00
KGLOBHD6 : 00
KGLOBHS0 : 0
KGLOBHS6 : 0
KGLOBT16 : 0
N0_6_16 : 0
N20 : 0
KGLNAHSH : 1888004588
KGLOBT03 :
KGLOBT09 : 0
PL/SQL procedure successfully completed.
--//有一点点诧异,并不是DEPTXXX对象,看到的对象是b97494666896ea8f.
--//注:如果生产系统执行以上语句可能报错或者很慢,实际上选择hash_value=1888004588,使用shp4x.sql脚本查询,就没有这个问题.
--//后记:实际上测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,可是实际的情况是KGLNAHSH=
--//1888004588.为什么?不过这里出现一个线索,就是KGLNAHSH=1888004588,正好对应前面library cache: mutex X的P1.
--//换一句话讲问题出在hash_value=1888004588,对象是b97494666896ea8f,这个是什么?这个对象字符串长度正好16位,感觉就像
--//full_hash_value取的后16位(16进制),与sql_id类似(注:sql_id为32进制)
--//b97494666896ea8f = 13363469161877400207
--//6896ea8f = 1754720911
--//取后8位计算的结果正好是1754720911,等于前面sql语句的hash_value.注意前面 @sharepool/shp4x bkx4nctn9dung 0的输出,
--//KGLNAHSH=1754720911.
--//可以推测在这个对象b97494666896ea8f存在争用,并且这个值就是等于前面错误sql语句的sql_id值,仅仅sql_id是32进制罢了.
--//可以执行如下验证:
$ ./sql_idz.sh "select count(*) from deptxxx" 0
sql_text = select count(*) from deptxxx\0
full_hash_value(16) = 2A80660E44510A13B97494666896EA8F or 2a80660e44510a13b97494666896ea8f
hash_value(10) = 1754720911
~~~~~~~~~~~~~~~~~~~~~~~~~~
sql_id(32) = bkx4nctn9dung
sql_id(32) = bkx4nctn9dung
sql_id(32) = bkx4nctn9dung
--//full_hash_value(16) = 2A80660E44510A13B97494666896EA8F 后16位就是 b97494666896ea8f
--//执行我以前写查看library cache lock的lcl.sql脚本:
SYS@test> @ lcl
display library cache lock problem
INST_ID SADDR HANDLE MOD REQ OBJECT SQL_ID HASH_VALUE KGLNAOBJ USER_NAME C50
------- ---------------- ---------------- --- --- ---------------- ------------- ---------- ---------------- --------- ------------------------------------------------
1 000007FF09CFB988 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '256,50011' immediate;
1 000007FF0A39C278 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '188,58372' immediate;
1 000007FF09CF22A8 000007FF11E375F0 2 0 b97494666896ea8f 1fuagn5s8jagc 1888004588 b97494666896ea8f SCOTT alter system kill session '260,29617' immediate;
--//注意看这个HASH_VALUE=1888004588,KGLNAOBJ=b97494666896ea8f 对应 library cache: mutex X的P1=1888004588
--//x$kgllk视图存在KGLLKSQLID,对应sql_id并且有值,奇怪x$kglob没有值.
display wait library cache lock
SID SERIAL# SPID PID P_SERIAL# EVENT C50
---- ------- ---- ------- --------- ------------------ --------------------------------------------------
263 6846 5056 43 151 library cache lock alter system kill session '263,6846' immediate;
177 59369 6688 46 30 library cache lock alter system kill session '177,59369' immediate;
14 34381 7096 64 10 library cache lock alter system kill session '14,34381' immediate;
186 34958 5812 66 16 library cache lock alter system kill session '186,34958' immediate;
260 29617 6916 67 10 library cache lock alter system kill session '260,29617' immediate;
17 44622 7136 68 8 library cache lock alter system kill session '17,44622' immediate;
99 51439 6292 69 6 library cache lock alter system kill session '99,51439' immediate;
188 58372 652 70 6 library cache lock alter system kill session '188,58372' immediate;
20 52459 4128 72 9 library cache lock alter system kill session '20,52459' immediate;
9 rows selected.
SYS@test> @ sql_id 1fuagn5s8jagc
--SQL_ID = 1fuagn5s8jagc
--//还是看不到sql语句.也许这是问题的难点,实际上如果像平常那样处理,kill掉这些进程没有任何用处,因为程序一旦登陆问题再现.
--//实际上如果当时查询x$kglob视图,KGLOBT03=bkx4nctn9dung,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立.
--//看看这里的sql_id=1fuagn5s8jagc 如何计算.
--//看看里面的内容:
SYS@test> @ fchaz 7ff11e375f0
LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END
--- ---------------- -------------- -------------- ---------------- -------------- -------- -------------- ---------------- -----------------
SGA 000007FF11E375C0 1 1 KGLHD 816 recr 80 00 000007FF11E378EF
SYS@test> oradebug setmypid
Statement processed.
SYS@test> oradebug peek 0x000007FF11E375C0 816
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946570 00000000 11A4DB60 000007FF 14B840D0 000007FF 00000003 00080050 034EF508 000007FF ..
SYS@test> @ ti
New tracefile_identifier = D:\APP\ORACLE\diag\rdbms\test\test\trace\test_ora_3940_0001.trc
SYS@test> oradebug peek 0x000007FF11E375C0 816 1
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946458 00000000 0A946480 00000000 14B840D0 000007FF 00000003 00080050 034EF508 000007FF ...
--//这样看信息不全,要看跟踪文件,最后的参数1写入跟踪文件.816表示查看长度.
SYS@test> @t
TRACEFILE
-----------------------------------------------------------------
D:\APP\ORACLE\diag\rdbms\test\test\trace\test_ora_3940_0001.trc
*** 2024-05-12T07:34:45.801424+08:00 (CDB$ROOT(1))
Processing Oradebug command 'peek 0x000007FF11E375C0 816 1'
[7FF11E375C0, 7FF11E378F0) = 00000331 80B38F00 11E373C0 000007FF 0A946458 ...
Dump of memory from 0x7FF11E375D4 to 0x7FF11E378F0
7FF11E375D0 00000000 0A946480 00000000 [.....d......]
7FF11E375E0 14B840D0 000007FF 00000003 00080050 [.@..........P...]
7FF11E375F0 034EF508 000007FF 034EF508 000007FF [..N.......N.....]
7FF11E37600 00000000 00000000 11E37758 000007FF [........Xw......]
7FF11E37610 00030052 00010000 00000000 00000000 [R...............]
7FF11E37620 00010001 00000000 00000000 00000000 [................]
7FF11E37630 00000000 00000000 00000000 00000000 [................]
7FF11E37640 11E377D0 000007FF 00000001 00000000 [.w..............]
7FF11E37650 11E37650 000007FF 11E37650 000007FF [Pv......Pv......]
7FF11E37660 11E37660 000007FF 11E37660 000007FF [`v......`v......]
7FF11E37670 15EBC6E0 000007FF 15EBC6E0 000007FF [................]
7FF11E37680 15C3CE40 000007FF 1198B988 000007FF [@...............]
7FF11E37690 11E37690 000007FF 11E37690 000007FF [.v.......v......]
7FF11E376A0 00000000 00000000 00000000 00000000 [................]
7FF11E376B0 17694FA1 00000000 0000A9EC 00000000 [.Oi.............]
7FF11E376C0 11E37740 000007FF 11E376C8 000007FF [@w.......v......]
7FF11E376D0 11E376C8 000007FF 11E376D8 000007FF [.v.......v......]
7FF11E376E0 11E376D8 000007FF 00000000 00000000 [.v..............]
7FF11E376F0 03081E00 000007FF 00000000 00000000 [................]
7FF11E37700 00000000 00000000 00157212 00000000 [.........r......]
7FF11E37710 00157214 00157214 00000000 00000000 [.r...r..........]
7FF11E37720 00000000 00000000 00000000 00000000 [................]
7FF11E37730 00000000 00000003 AE4581A3 00000000 [..........E.....]
7FF11E37740 00000000 000000B1 00F043A3 001AF790 [.........C......]
7FF11E37750 7088A9EC 00000000 F3B5EABA 290512F5 [...p...........)]
7FF11E37760 17694FA1 7088A9EC 00000000 00000000 [.Oi....p........]
7FF11E37770 00000000 00000000 00000000 00000000 [................]
7FF11E37780 00000010 00000000 00000007 00000700 [................]
7FF11E37790 00000000 00000000 11E377B0 000007FF [.........w......]
7FF11E377A0 00000000 00000000 00000000 00000000 [................]
7FF11E377B0 34373962 36363439 36393836 66386165 [b97494666896ea8f]
7FF11E377C0 49554224 5424444C 30545345 00005031 [$BUILD$TEST01P..]
7FF11E377D0 00000000 00000000 00000000 00000000 [................]
Repeat 17 times
--//注意看后面的信息.
7FF11E377B0 34373962 36363439 36393836 66386165 [b97494666896ea8f]
7FF11E377C0 49554224 5424444C 30545345 00005031 [$BUILD$TEST01P..]
--//就是某个对象.
SYS@test> select * from v$db_object_cache where name = 'b97494666896ea8f'
2 @prxx
==============================
OWNER : $BUILD$
NAME : b97494666896ea8f
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 1
PINS : 0
KEPT : NO
CHILD_LATCH : 43500
INVALIDATIONS : 0
HASH_VALUE : 1888004588
LOCK_MODE : EXCLUSIVE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 2140477
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : f3b5eaba290512f517694fa17088a9ec
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF11E375F0
EDITION :
PL/SQL procedure successfully completed.
--//NAMESPACE=SQL AREA BUILD,select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 order by 1;
--//kglhdnsp=82 , 对应8进制是\01222,16进制0x52(字符R)
--//1888004588%2^17 = 43500,对应CHILD_LATCH =43500.
--//注意:这个视图没有sql_id字段,感觉oracle设计这个视图不合理,应该把sql_id加入进来,对应x$kglob.KGLOBT03.
--//我的感觉这个视图一直存在,而从9i开始,oracle才使用sql_id标识sql语句,oracle为什么不加入该字段.
--//注意OWNER=$BUILD$,开始乱猜使用SCOTT计算sql_id不对.使用我以前写sql_idx.sh脚本.
$ echo -n -e 'b97494666896ea8f.$BUILD$.TEST01P' > bb.txt
$ ./sql_idx.sh bb.txt 1
sql_text = b97494666896ea8f.$BUILD$.TEST01P\0122\0\0\0
full_hash_value(16) = F3B5EABA290512F517694FA17088A9EC
hash_value(10) = 1888004588
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
--//基本都能对上前面select * from v$db_object_cache where name = 'b97494666896ea8f'的输出.
--//注:我改写我原来写的sql_idx.sh脚本,因为如果cat 文件里面包含chr(0),cat的输出就忽略掉,准备重写该脚本
--//有点教条了了,可以执行如下:
$ sql_idz.sh 'b97494666896ea8f.$BUILD$.TEST01P\0122\0\0'
sql_text = b97494666896ea8f.$BUILD$.TEST01P\0122\0\0\0
full_hash_value(16) = F3B5EABA290512F517694FA17088A9EC or f3b5eaba290512f517694fa17088a9ec
hash_value(10) = 1888004588
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
sql_id(32) = 1fuagn5s8jagc
4.总结:
--//我已经基本展示生产系统出现的问题,还是不明白为什么这样会产生大量的子光标.
--//在分析sql语句时或许与产生大量子光标有关,在对象b97494666896ea8f上存在争用,NAMESPACE = SQL AREA BUILD,导致
--//出现library cache: mutex X,library cache lock.
--//另外我做了测试,如果打开几个回话,手工执行select count(*) from deptxxx;当然不是并发执行,并没有产生多个子光标的情况.
--//也许问题就出在密集的执行产生大量子光标上面.
--//在测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,甚至在shared pool latch上存在争用,
--//因为错误的sql语句每次存在硬解析,实际的情况都不是,而在NAMESPACE = SQL AREA BUILD,建立的对象上存在争用.
--//还有一些细节我需要验证,好好消化,写的有点乱...
--//我估计同事看到sql_id后查询不到对应sql语句,而不断的kill进程并不会有助于问题解决,在这上面浪费大量时间,而且实际上
--//阻塞的时间都很短.
--//实际上只要查询底层的x$kglob的,KGLOBT03=bkx4nctn9dung,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立.
--//实际上还可以查询v$db_object_cache的HASH_VALUE(它可以通过sql_id获得),获得对应sql语句,该视图的底层就是x$kglob.
5.补充测试:
--//手工执行4次:
select count(*) from deptxxx;
SYS@test> select * from v$db_object_cache where name = 'b97494666896ea8f'
2 @ prxx
==============================
OWNER : $BUILD$
NAME : b97494666896ea8f
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 43500
INVALIDATIONS : 0
HASH_VALUE : 1888004588
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : f3b5eaba290512f517694fa17088a9ec
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF15958380
EDITION :
PL/SQL procedure successfully completed.
--//只要执行就会出现这个对象.
--//手工执行4次:
select count(*) from dept;
SCOTT@test01p> @ hash
HASH_VALUE SQL_ID CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX SQL_EXEC_START SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
761178024 1jrz3ucqpx9x8 0 42920 3051237957 2d5ea7a8 2024-05-15 21:14:52 16777219
SYS@test> select FULL_HASH_VALUE from v$db_object_cache where hash_value = '761178024' and rownum=1;
FULL_HASH_VALUE
--------------------------------
aba620d08eb373c818dfe3d32d5ea7a8
SYS@test> select * from v$db_object_cache where name='18dfe3d32d5ea7a8'
2 @ prxx
==============================
OWNER : $BUILD$
NAME : 18dfe3d32d5ea7a8
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 45772
INVALIDATIONS : 0
HASH_VALUE : 4236817100
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : 87defd2f856915dc18d747fdfc88b2cc
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF1611C200
EDITION :
PL/SQL procedure successfully completed.
--//无论我执行select count(*) from dept;,LOCKED_TOTAL =1,即使新用户登陆执行LOCKED_TOTAL=1不变.
--//如果建立新的子光标
SCOTT@test01p> alter session set optimizer_index_cost_adj=99;
Session altered.
SCOTT@test01p> select count(*) from dept;
COUNT(*)
----------
4
SYS@test> @prxx
==============================
OWNER : $BUILD$
NAME : 18dfe3d32d5ea7a8
DB_LINK :
NAMESPACE : SQL AREA BUILD
TYPE : CURSOR
SHARABLE_MEM : 0
LOADS : 0
EXECUTIONS : 0
LOCKS : 0
PINS : 0
KEPT : NO
CHILD_LATCH : 45772
INVALIDATIONS : 0
HASH_VALUE : 4236817100
LOCK_MODE : NONE
PIN_MODE : NONE
STATUS : UNKOWN
TIMESTAMP :
PREVIOUS_TIMESTAMP :
LOCKED_TOTAL : 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PINNED_TOTAL : 0
PROPERTY :
FULL_HASH_VALUE : 87defd2f856915dc18d747fdfc88b2cc
CON_ID : 3
CON_NAME : TEST01P
ADDR : 000007FF1659A580
EDITION :
PL/SQL procedure successfully completed.
--//LOCKED_TOTAL =2.
--//这样问题就很明显解析的通生产系统遇到的问题,一条不能执行sql语句,密集执行就会在这个对象造成阻塞.
--//即使没有产生子光标这个因素,都是如此.
SYS@test> select count(*) from v$db_object_cache where length(name)=16 and NAMESPACE='SQL AREA BUILD';
COUNT(*)
----------
215
--//有215个类似对象.
SYS@test> select name,substr(name,-8,8),to_number(substr(name,-8,8),lpad('x',8,'x')) hash_value from v$db_object_cache where length(name)=16 and NAMESPACE='SQL AREA BUILD' and rownum<=10;
NAME SUBSTR(NAME,-8,8 HASH_VALUE
-------------------- ---------------- ----------
b4035ec136571959 36571959 911677785
55d317c9fd64f69a fd64f69a 4251252378
7dea5291a4131ac2 a4131ac2 2752715458
b6917fa597638e12 97638e12 2539884050
54aafc527339dee7 7339dee7 1933172455
1ae760adac0ea9ca ac0ea9ca 2886642122
cc52a92b7e2ba318 7e2ba318 2116789016
f2ff6345225ba386 225ba386 576430982
be2e6ed6e2ce0906 e2ce0906 3805153542
06bbaef78b718ad6 8b718ad6 2339474134
10 rows selected.
SYS@test> @ sharepool/shp4x 0 911677785
TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09
--------------------- ---------------- ---------------- ---------------------------------------- ---------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
child handle address 000007FF155A04B8 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF11DB70E8 000007FF15E5B6A8 4032 8088 3915 16035 16035 911677785 b80uys4v5f6at 0
child handle address 000007FF15E0EFF0 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF157D6CA8 000007FF15E5BA40 4032 8088 3915 16035 16035 911677785 b80uys4v5f6at 1
parent handle address 000007FF11D87948 000007FF11D87948 select count(*) from SYS.chnf$_reg_queri 1 0 0 000007FF11D87788 00 4072 0 0 4072 4072 911677785 b80uys4v5f6at 65535
--//我尝试以上都能找到类似的语句.
6.贴一段生产系统看到的情况.
> @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'"
Total
Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN
------- ----- ----- ---------------------- ------------------- -------------------
8164610 913.0 76% library cache lock 2024-04-26 08:23:22 2024-04-26 09:17:44
2587860 289.4 24% library cache: mutex X 2024-04-26 08:23:22 2024-04-26 09:17:44
10 .0 0% library cache pin 2024-04-26 10:00:40 2024-04-26 10:00:40
--//出现问题时间点2024-04-26 08:23:22,到2024-04-26 09:17:44问题解决.后面library cache pin仅仅出现1次,可以不用考虑.
> @ dashtop event,sql_id "event='library cache lock'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 08:30:00'"
Total
Seconds AAS %This EVENT SQL_ID FIRST_SEEN LAST_SEEN
------- ----- ----- ------------------ ------------- ------------------- -------------------
554750 318.3 96% library cache lock arw6k2rf5s29b 2024-04-26 08:23:22 2024-04-26 08:29:55
13880 8.0 2% library cache lock bakhcg5b1749p 2024-04-26 08:23:22 2024-04-26 08:29:55
2580 1.5 0% library cache lock ddd4xgabw2tct 2024-04-26 08:23:33 2024-04-26 08:29:55
2440 1.4 0% library cache lock 306sjy7n4m9bk 2024-04-26 08:23:22 2024-04-26 08:29:55
2300 1.3 0% library cache lock av7t0b1v7jn1m 2024-04-26 08:23:22 2024-04-26 08:29:55
440 .3 0% library cache lock 69zcqdq405z1h 2024-04-26 08:25:31 2024-04-26 08:29:55
420 .2 0% library cache lock dgvzbsr848vam 2024-04-26 08:25:09 2024-04-26 08:29:55
410 .2 0% library cache lock bg1t2pc5ym090 2024-04-26 08:24:15 2024-04-26 08:29:55
350 .2 0% library cache lock 0j7bs1g3fg66x 2024-04-26 08:23:33 2024-04-26 08:29:55
310 .2 0% library cache lock 7yhjdqcg8wzny 2024-04-26 08:23:22 2024-04-26 08:29:55
300 .2 0% library cache lock 4f03m8k9vgb6r 2024-04-26 08:23:43 2024-04-26 08:29:55
250 .1 0% library cache lock 9bsd8hfq3cf81 2024-04-26 08:23:54 2024-04-26 08:29:55
200 .1 0% library cache lock 5xw2vsz1868zh 2024-04-26 08:25:31 2024-04-26 08:29:55
20 .0 0% library cache lock 899c5c7ug4dp4 2024-04-26 08:23:22 2024-04-26 08:23:22
14 rows selected.
> @ sqlhh arw6k2rf5s29b 100
time unit : millisecond
BEGIN_INTERVAL_TIME INST_ID SQL_ID PLAN_HASH_VALUE EXECUTIONS ELA_MS_PER_EXEC CPU_MS_PER_EXEC ROWS_PER_EXEC LIOS_PER_EXEC BLKRD_PER_EXEC IOW_MS_PER_EXEC AVG_IOW_MS CLW_MS_PER_EXEC APW_MS_PER_EXEC CCW_MS_PER_EXEC
------------------- ------- ------------- --------------- ---------- --------------- --------------- ------------- ------------- -------------- --------------- ----------- --------------- --------------- ---------------
2024-04-26 11:00:15 1 arw6k2rf5s29b 3673589541 135628 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 12:00:17 1 arw6k2rf5s29b 3673589541 125952 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 13:00:36 1 arw6k2rf5s29b 3673589541 117259 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 14:00:13 1 arw6k2rf5s29b 3673589541 121757 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 15:00:15 1 arw6k2rf5s29b 3673589541 131134 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-26 16:00:35 1 arw6k2rf5s29b 3673589541 135313 0 0 0.0 0 0 0 0.0 0 0 0
...
2024-04-29 02:00:20 1 arw6k2rf5s29b 2199259672 54386 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 03:00:39 1 arw6k2rf5s29b 2199259672 52685 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 04:00:08 1 arw6k2rf5s29b 2199259672 52485 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 05:00:17 1 arw6k2rf5s29b 2199259672 52623 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 06:00:37 1 arw6k2rf5s29b 2199259672 51764 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 07:00:04 1 arw6k2rf5s29b 2199259672 56257 0 0 0.0 0 0 0 0.0 0 0 0
2024-04-29 09:00:35 1 arw6k2rf5s29b 2199259672 127381 0 0 0.0 0 0 0 0.0 0 0 0
47 rows selected.
--//135628/3600 = 37.6次/秒.
--//可以发现1个细节,我无法解析,9:17恢复系统,很奇怪的是9:17-11:00并没有大量执行.而之后有记录.
--//要么就是9,10点执行次数很少,没有写入awr相关历史表中.
--//最奇怪的涉及查询返回0,我查看相关表现在的情况表中没有任何记录,似乎再2024-04-29 09:00:35之后某个时刻发现问题,不再执行.
--//我估计取消了升级与改动,我们团队缺乏必要的交流与沟通,根本不知道开发做了什么.
5.相关脚本:
--//lcl.sql
PROMPT
PROMPT display library cache lock problem
PROMPT
column object format a20
column user_name format a20
SELECT inst_id
,kgllkses saddr
,kgllkhdl handle
,kgllkmod MOD
,kgllkreq REQ
,kglnaobj object
,KGLLKSQLID sql_id
,kglnahsh hash_value
,KGLNAOBJ
,user_name
, 'alter system kill session '''
|| s.sid
|| ','
|| s.serial#
|| ''''
|| ' immediate;'
c50
--,lock_a.*
FROM x$kgllk lock_a, v$session s
WHERE s.saddr = lock_a.kgllkses and
kgllkmod > 0
AND EXISTS
(SELECT lock_b.kgllkhdl
FROM x$kgllk lock_b
WHERE kgllkses IN (SELECT saddr
FROM v$session
WHERE event like 'library cache lock') /* blocked session */
--WHERE event like 'library cache pin') /* blocked session */
AND lock_a.kgllkhdl = lock_b.kgllkhdl
AND kgllkreq > 0);
PROMPT
PROMPT display wait library cache lock
PROMPT
SELECT s.sid
,s.serial#
,p.spid
,p.pid
,p.serial# p_serial#
,s.event
, 'alter system kill session '''
|| s.sid
|| ','
|| s.serial#
|| ''''
|| ' immediate;'
c50
FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND s.event= 'library cache lock';
--AND s.event= 'library cache pin';
--//shp4.sql
column N0_6_16 format 99999999
column fcura_addrlen new_value _fcura_addrlen
set termout off
select vsize(addr)*2 fcura_addrlen from x$dual;
set termout on
SELECT DECODE (kglhdadr,
kglhdpar, 'parent handle address',
'child handle address')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
KGLHDLMD,
KGLHDPMD,
kglhdivc,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03 ,
kglobt09
FROM x$kglob
WHERE
KGLHDPAR = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLHDADR = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBHD0 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD1 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD2 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD3 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD4 = lpad(upper('&1'), &_fcura_addrlen, '0')
--or KGLOBHD5 = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBHD6 = lpad(upper('&1'), &_fcura_addrlen, '0')
or KGLOBT03 = lower('&1')
or KGLNAHSH= &2;
--//shp4x.sql
column N0_6_16 format 99999999
SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr,
kglhdpar, 'parent handle address',
'child handle address')
text,
kglhdadr,
kglhdpar,
substr(kglnaobj,1,40) c40,
KGLHDLMD,
KGLHDPMD,
kglhdivc,
kglobhd0,
kglobhd6,
kglobhs0,kglobhs6,kglobt16,
kglobhs0+kglobhs6+kglobt16 N0_6_16,
kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
kglnahsh,
kglobt03,
kglobt09
FROM x$kglob
WHERE kglobt03 = lower('&1') or KGLNAHSH= &2;