[20240510]SQL语句存在问题与共享池内存分配.txt

[20240510]SQL语句存在问题与共享池内存分配.txt

--//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql
--//语句.很奇怪的是应用前台根本不报错,真不知道开发如何写代码,难道没做例外处理以及记录吗?
--//测试看看如果sql语句存在问题,是否消耗共享池以及parse的情况以及对性能相关问题.

1.环境:
SCOTT@test01p> @ver1
PORT_STRING                    VERSION        BANNER                                                                               CON_ID
------------------------------ -------------- -------------------------------------------------------------------------------- ----------
IBMPC/WIN_NT64-9.1.0           12.2.0.1.0     Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0

2.测试:
SCOTT@test01p> select sql_id from v$sql where sql_text='select count(1) from deptxxx';
no rows selected
--//相关语句不在共享池,并且deptxxx表不存在.

SCOTT@test01p> select count(1) from deptxxx;
select count(1) from deptxxx
                     *
ERROR at line 1:
ORA-00942: table or view does not exist
--//表deptxxx不存在!!

SCOTT@test01p> @ hash
HASH_VALUE SQL_ID        CHILD_NUMBER KGL_BUCKET PLAN_HASH_VALUE HASH_HEX   SQL_EXEC_START      SQL_EXEC_ID
---------- ------------- ------------ ---------- --------------- ---------- ------------------- -----------
3744802176 07v29cmgma9c0            0      75136       903671040  df352580  2024-05-10 22:36:28    16777216

SCOTT@test01p> @ sql_id 07v29cmgma9c0
--SQL_ID = 07v29cmgma9c0
select sql_id from v$sql where sql_text='select count(1) from deptxxx';
--//是前一次正确执行的sql_id,这样无法获得无法执行sql语句的sql_id.

SCOTT@test01p> select sql_id from v$sqlarea where sql_text='select count(1) from deptxxx';
no rows selected

SCOTT@test01p> select sql_id from v$sql where sql_text='select count(1) from deptxxx';
no rows selected
--//查询v$sqlarea,v$sql视图根本不能发现对应sql_id.

--//使用我写的脚本计算sql_id与hash_value:
$ ./sql_idx.sh 'select count(1) from deptxxx'
sql_text = select count(1) from deptxxx\0
full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F
hash_value(10) = 2945576271
sql_id(32) = 2utgtqyrt3vag
sql_id(32) = 2utgtqyrt3vag
sql_id(32) = 2utgtqyrt3vag

SYS@test> @ sharepool/shp4x 2utgtqyrt3vag 0
TEXT                  KGLHDADR         KGLHDPAR         C40                          KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0         KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16  N20   KGLNAHSH KGLOBT03      KGLOBT09
--------------------- ---------------- ---------------- ---------------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- ------- ---- ---------- ------------- --------
child handle address  000007FF121630F0 000007FF16537BC8 select count(1) from deptxxx        0        0        3 00               00              0        0     3165    3165 3165 2945576271 2utgtqyrt3vag        0
parent handle address 000007FF16537BC8 000007FF16537BC8 select count(1) from deptxxx        1        0        3 000007FF12E82EA0 00           4072        0        0    4072 4072 2945576271 2utgtqyrt3vag    65535
--//可以发现这种情况一样建立父子光标,即使sql语句执行错误.不过子光标的堆0,堆6不存在罢了.
--//注:可以查询v$db_object_cache视图,比如name(对应sql文本)里面的特征字符,不过如果是生产系统,我估计会很慢!!
--//说明即使无法正确执行的sql语句也会消耗共享池内存.

3.看看parse的情况:
--//session 1:
SCOTT@test01p(265,16062)> @ spid
SID SERIAL# PROCESS   SERVER    SPID PID  P_SERIAL# C50
--- ------- --------- --------- ---- --- ---------- --------------------------------------------------
265   16062 9104:6104 DEDICATED 9516  59          8 alter system kill session '265,16062' immediate;

--//session 2:
SYS@test> @ ses2 265 "parse count"
SID NAME                   VALUE
--- ---------------------- -----
265 parse count (total)      729
265 parse count (hard)       266
265 parse count (failures)    31
--//session 1:
--//执行3次,输出略.前面parse count (failures)=31,说明我已经重复执行多次.
select count(1) from deptxxx;

--//session 2:
SYS@test> @ ses2 265 "parse count"
SID NAME                   VALUE
--- ---------------------- -----
265 parse count (total)      732
265 parse count (hard)       269
265 parse count (failures)    34

--//可以发现parse count (failures),parse count (hard),parse count (total)各自增加3次.
--//可以想象,如果密集的执行这些sql语句将是一场"灾难",如果生产系统这些语句执行频繁,至少出现大量分析,并且因为hash_value一样,
--//会在相同bucket上出现争用,出现大量library cache: mutex X,shared pool latch争用,但是不明白生产系统还会出现library
--//cache lock.
--//注:事后测试发现,情况并不是那样,没有出现大量shared pool latch争用,另外写blog分析.

--//生产系统的当时的情况:
> @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'"
  Total
Seconds     AAS %This   EVENT                  FIRST_SEEN          LAST_SEEN
------- ------- ------- ---------------------- ------------------- -------------------
8164610   913.0   76%   library cache lock     2024-04-26 08:23:22 2024-04-26 09:17:44
2587860   289.4   24%   library cache: mutex X 2024-04-26 08:23:22 2024-04-26 09:17:44
     10      .0    0%   library cache pin      2024-04-26 10:00:40 2024-04-26 10:00:40
--//问题出现在2024-04-26 08:23:22,在2024-04-26 09:17:44问题解决.
--//同事分析缺乏经验,实际上知道sql_id,如果能知道sql语句,随便执行看看或者查看对象就可以定位问题.
--//当然定位不是查询gv$sqlarea,gv$sql视图,而是查询x$kglob底层结构.

4.继续:
--//session 1:
SCOTT@test01p(265,16062)> select count(1) from;
select count(1) from
                   *
ERROR at line 1:
ORA-00903: invalid table name

--//执行多次,看看这样不完整的sql语句的情况.

$ ./sql_idx.sh 'select count(1) from'
sql_text = select count(1) from\0
full_hash_value(16) = 17A4F71D1DD4E12BAB0FAF270B2672EB
hash_value(10) = 187069163
sql_id(32) = aq3xg4w5kcwrb
sql_id(32) = aq3xg4w5kcwrb
sql_id(32) = aq3xg4w5kcwrb

SYS@test> @ sharepool/shp4x aq3xg4w5kcwrb 0
TEXT                  KGLHDADR         KGLHDPAR         C40                  KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0         KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16  N20  KGLNAHSH KGLOBT03      KGLOBT09
--------------------- ---------------- ---------------- -------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- ------- ---- --------- ------------- --------
child handle address  000007FF008F2D88 000007FF12F38968 select count(1) from        0        0        4 00               00              0        0     3157    3157 3157 187069163 aq3xg4w5kcwrb        0
parent handle address 000007FF12F38968 000007FF12F38968 select count(1) from        1        0        4 000007FF12B33EE8 00           4072        0        0    4072 4072 187069163 aq3xg4w5kcwrb    65535
--//情况与上面测试类似.

--//做这个测试主要原因在生产系统发现如下:
--//主要原因在生产系统看到如下:
> @ dashtop event,p1,p3 "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 09:30:00'"
  Total
Seconds    AAS %This   EVENT                            P1              P3 FIRST_SEEN          LAST_SEEN
------- ------ ------- ---------------------- ------------ --------------- ------------------- -------------------
8163900 1528.0   76%   library cache lock      81867324184         5373954 2024-04-26 08:23:22 2024-04-26 09:17:44
1545660  289.3   14%   library cache: mutex X   3802446058              82 2024-04-26 08:23:22 2024-04-26 09:17:44
1041590  194.9   10%   library cache: mutex X   3802446058             119 2024-04-26 08:23:33 2024-04-26 09:17:44
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    700     .1    0%   library cache lock      81867324184         5373955 2024-04-26 08:23:22 2024-04-26 09:17:44
    490     .1    0%   library cache: mutex X   3802446058             106 2024-04-26 08:25:31 2024-04-26 09:17:44
     40     .0    0%   library cache: mutex X   3771887287               1 2024-04-26 09:17:11 2024-04-26 09:17:44
     30     .0    0%   library cache: mutex X   3802446058             124 2024-04-26 08:46:19 2024-04-26 08:58:18
     20     .0    0%   library cache: mutex X   3802446058              85 2024-04-26 09:17:11 2024-04-26 09:17:22
     10     .0    0%   library cache lock      81864522800 400827822964738 2024-04-26 08:43:39 2024-04-26 08:43:39
     10     .0    0%   library cache: mutex X        37799              49 2024-04-26 08:36:47 2024-04-26 08:36:47
     10     .0    0%   library cache: mutex X       106013              49 2024-04-26 08:47:04 2024-04-26 08:47:04
     10     .0    0%   library cache: mutex X       121048              49 2024-04-26 08:36:37 2024-04-26 08:36:37
12 rows selected.

> @ sharepool/shp4x 0 3802446058
> @ pr
==============================
TEXT        : parent handle address
KGLHDADR    : 00000013077E9500
KGLHDPAR    : 00000013077E9500
C40         : select count(1) from
KGLHDLMD    : 0
KGLHDPMD    : 0
KGLHDIVC    : 0
KGLOBHD0    : 00
KGLOBHD6    : 00
KGLOBHS0    : 0
KGLOBHS6    : 0
KGLOBT16    : 0
N0_6_16     : 0
N20         : 0
KGLNAHSH    : 3802446058
~~~~~~~~~~~~~~~~~~~~~~
KGLOBT03    :            --//没有值.  
KGLOBT09    : 65535
PL/SQL procedure successfully completed.
--//小心,在生产系统访问x$kglob可能存在严重性能问题,不行快速kill相关进程.包括下面测试使用的fchaz.sql脚本.
--//我的查询可以使用索引估计问题不大!!
--//参数1 sql_id 参数2 hash_value.

> select replace(kglnaobj,chr(13),' ')  c100 ,length(kglnaobj)  n10 from x$kglob where KGLHDPAR=hextoraw('00000013077E9500');
C100                  N10
--------------------- ---
select count(1) from   20
--//长度确实是20,但是KGLNAHSH=3802446058,完全与我的测试对不上,我的测试是KGLNAHSH=187069163.这个问题先放一放.

5.继续看看对象deptxxx:
--//继续昨天的测试,在scottt用户下执行select count(1) from deptxxx;多次.

SYS@test> select * from v$open_cursor where sid=263 and sql_text like '%count%';
no rows selected
--//语句执行有问题,光标不会缓存.

SYS@test> select * from v$db_object_cache where name = 'DEPTXXX'
  2  @ prxx
==============================
OWNER                         : SCOTT
NAME                          : DEPTXXX
DB_LINK                       :
NAMESPACE                     : TABLE/PROCEDURE
TYPE                          : CURSOR
SHARABLE_MEM                  : 0
LOADS                         : 1
EXECUTIONS                    : 0
LOCKS                         : 0
PINS                          : 0
KEPT                          : NO
CHILD_LATCH                   : 77291
INVALIDATIONS                 : 0
HASH_VALUE                    : 1772563947
LOCK_MODE                     : NONE
PIN_MODE                      : NONE
STATUS                        : UNKOWN
TIMESTAMP                     :
PREVIOUS_TIMESTAMP            :
LOCKED_TOTAL                  : 4
PINNED_TOTAL                  : 4
PROPERTY                      :
FULL_HASH_VALUE               : ef603aaa09d90710c71ba16b69a72deb
CON_ID                        : 3
CON_NAME                      : TEST01P
ADDR                          : 000007FF14929138
EDITION                       :
==============================
OWNER                         : PUBLIC
NAME                          : DEPTXXX
DB_LINK                       :
NAMESPACE                     : TABLE/PROCEDURE
TYPE                          : CURSOR
SHARABLE_MEM                  : 0
LOADS                         : 1
EXECUTIONS                    : 0
LOCKS                         : 0
PINS                          : 0
KEPT                          : NO
CHILD_LATCH                   : 104220
INVALIDATIONS                 : 0
HASH_VALUE                    : 2989463324
LOCK_MODE                     : NONE
PIN_MODE                      : NONE
STATUS                        : UNKOWN
TIMESTAMP                     :
PREVIOUS_TIMESTAMP            :
LOCKED_TOTAL                  : 4
PINNED_TOTAL                  : 4
PROPERTY                      :
FULL_HASH_VALUE               : bee0db68379be71263a53e5fb22f971c
CON_ID                        : 3
CON_NAME                      : TEST01P
ADDR                          : 000007FEFFF8AF58
EDITION                       :
PL/SQL procedure successfully completed.
--//可以发现即使对象不存在,也会加载在共享池.一个owner=SCOTT,另外一个PUBLIC.oracle这样的目的是表示对象是否存在.

SYS@test> @ sharepool/shp4x 0 1772563947
TEXT                  KGLHDADR         KGLHDPAR         C40      KGLHDLMD   KGLHDPMD   KGLHDIVC KGLOBHD0         KGLOBHD6           KGLOBHS0   KGLOBHS6   KGLOBT16   N0_6_16        N20   KGLNAHSH KGLOBT03        KGLOBT09
--------------------- ---------------- ---------------- ------- --------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
parent handle address 000007FF14929138 000007FF14929138 DEPTXXX         0          0          0 00               00                        0          0          0         0          0 1772563947                        0

SYS@test> @ sharepool/shp4x 0 2989463324
TEXT                  KGLHDADR         KGLHDPAR         C40      KGLHDLMD   KGLHDPMD   KGLHDIVC KGLOBHD0         KGLOBHD6           KGLOBHS0   KGLOBHS6   KGLOBT16   N0_6_16        N20   KGLNAHSH KGLOBT03        KGLOBT09
--------------------- ---------------- ---------------- ------- --------- ---------- ---------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
parent handle address 000007FEFFF8AF58 000007FEFFF8AF58 DEPTXXX         0          0          0 00               00                        0          0          0         0          0 2989463324                        0
--//实际上v$db_object_cache视图来源就是x$kglob,v$db_object_cache.addr=KGLHDADR.
--//1772563947%2^17 = 77291  对应CHILD_LATCH = 77291
--//2989463324%2^17 = 104220 对应CHILD_LATCH = 104220

SYS@test> @ fchaz 000007FF14929138
LOC KSMCHPTR           KSMCHIDX   KSMCHDUR KSMCHCOM           KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR         KSMCHPTR_END
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ---------------- -----------------
SGA 000007FF14929108          1          1 KGLHD                   816 recr             80 00               000007FF14929437

SYS@test> @ fchaz 000007FEFFF8AF58
LOC KSMCHPTR           KSMCHIDX   KSMCHDUR KSMCHCOM           KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR         KSMCHPTR_END
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ---------------- -----------------
SGA 000007FEFFF8AF28          1          1 KGLHD                   816 recr             80 00               000007FEFFF8B257
--//2个对象消耗816字节.

6.结论:
--//可以看出几个特点即使sql语句存在问题,oracle还是消耗一定的共享池内存,建立父子光标,仅仅子光标堆0,堆6不存在.
--//并且每次执行都会执行产生1次硬分析.
--//相关表也会加载到共享池中.
--//下次测试环境下模拟生产系统大量执行这类sql语句时出现的情况.

7.附上shp4x.sql脚本:
$ cat shp4x.sql
column N0_6_16 format 99999999
SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr,
               kglhdpar, 'parent handle address',
               'child handle address')
    text,
       kglhdadr,
       kglhdpar,
       substr(kglnaobj,1,40) c40,
       KGLHDLMD,
       KGLHDPMD,
       kglhdivc,
       kglobhd0,
       kglobhd6,
       kglobhs0,kglobhs6,kglobt16,
       kglobhs0+kglobhs6+kglobt16 N0_6_16,
       kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
       kglnahsh,
       kglobt03,
       kglobt09  
  FROM x$kglob
 WHERE kglobt03 = lower('&1') or KGLNAHSH= &2;

--//fchaz.sql脚本来源tpt fcha.sql,我仅仅注解了里面的提示信息,增加一个显示字段KSMCHPTR_END.
--//TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
--//注:在生产系统执行要小心!!

-- Copyright 2018 Tanel Poder. All rights reserved. More info at http://tanelpoder.com
-- Licensed under the Apache License, Version 2.0. See LICENSE.txt for terms & conditions.

--------------------------------------------------------------------------------
--
-- File name:   fcha.sql (Find CHunk Address) v0.2
-- Purpose:     Find in which heap (UGA, PGA or Shared Pool) a memory address resides
--              
-- Author:      Tanel Poder
-- Copyright:   (c) http://blog.tanelpoder.com | @tanelpoder
--              
-- Usage:       @fchaz <addr_hex> --
--              @fchaz F6A14448  --
--
-- Other:       This would only report an UGA/PGA chunk address if it belongs
--              to *your* process/session (x$ksmup and x$ksmpp do not see other
--              session/process memory)
--              
--------------------------------------------------------------------------------

--prompt Find in which heap (UGA, PGA or Shared Pool) the memory address &1 resides...
--prompt
--prompt WARNING!!! This script will query X$KSMSP, which will cause heavy shared pool latch contention
--prompt in systems under load and with large shared pool. This may even completely hang
--prompt your instance until the query has finished! You probably do not want to run this in production!
--prompt
--pause  Press ENTER to continue, CTRL+C to cancel...

select
    'SGA' LOC,
    KSMCHPTR,
    KSMCHIDX,
    KSMCHDUR,
    KSMCHCOM,
    KSMCHSIZ,
    KSMCHCLS,
    KSMCHTYP,
    KSMCHPAR,
    TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
    x$ksmsp
where
    to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
    between
        to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
    and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
union all
select
    'UGA',
    KSMCHPTR,
    null,
    null,
    KSMCHCOM,
    KSMCHSIZ,
    KSMCHCLS,
    KSMCHTYP,
    KSMCHPAR,
    TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
    x$ksmup
where
    to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
    between
        to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
    and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
union all
select
    'PGA',
    KSMCHPTR,
    null,
    null,
    KSMCHCOM,
    KSMCHSIZ,
    KSMCHCLS,
    KSMCHTYP,
    KSMCHPAR,
    TO_CHAR(TO_NUMBER(KSMCHPTR,'XXXXXXXXXXXXXXXX') + KSMCHSIZ - 1,'FM0XXXXXXXXXXXXXXX') KSMCHPTR_END
from
    x$ksmpp
where
    to_number(substr('&1', instr(lower('&1'), 'x')+1) ,'XXXXXXXXXXXXXXXX')
    between
        to_number(ksmchptr,'XXXXXXXXXXXXXXXX')
    and to_number(ksmchptr,'XXXXXXXXXXXXXXXX') + ksmchsiz - 1
/
posted @ 2024-05-18 21:36  lfree  阅读(19)  评论(0编辑  收藏  举报