[原]由clob引发的性能问题所想到的

2009年最后一天下午比较闲,于是在生产库做了AWR report,看看生产库的性能如何,发现有一条Top SQL性能很差(貌似性能调优实例的开场白都是这样的),不多说了,上图看吧:

image

imageimage image

从这张图来看最Top 的SQL就是那条“ SELECT ‘receivedoc’,( ”了。

语句摘录如下:

SELECT 
'receivedoc', 
( 
    select COUNT(*) from 
    ( 
        select 
        F_CURACTANDUSERIDLIST, 
        CAST(SUBSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;') - 1), 
        INSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;')), ';', -1) + 1) 
        as VARCHAR2(4000)) as F_ActId from TD_ReceiveDoc 
     ) A 
     join Td_Activity B 
     on A.F_ActId = B.F_Id 
     where dbms_lob.instr(F_CURACTANDUSERIDLIST, '*599;', 1, 1 ) > 0 
) AS ReceiveDoc, 
'senddoc', 
( 
     SELECT COUNT(*) FROM TD_SendDoc 
     WHERE DBMS_LOB.INSTR(F_CuractAndUserIDList, '*599;', 1, 1) > 0 
) AS SendDoc, 
'message', 
( 
     SELECT COUNT(*) FROM TD_PublicInfo 
     WHERE dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0 
     and F_Type = 0 and F_CurrentStatus = 4 AND F_UserID <> 599 
     and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') 
     AND (SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0) 
) AS Message, 
'bulletin', 
( 
     SELECT COUNT(*) FROM TD_PublicInfo 
     WHERE F_Type = 1 and F_CurrentStatus = 4 
     and (F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') 
     AND F_UserID <> 599 
     AND (
          SELECT COUNT(*) FROM TD_ReadPerson 
          WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 
) AS Affiche, 
'email', 
( 
     SELECT COUNT(*) FROM TD_EMail 
     WHERE F_Status=1 and F_IsRead <> '0' 
     and F_UserID=599 and trunc(F_MailTime) <= trunc(SYSDATE) 
) AS EMail, 
'publication', 
( 
     SELECT COUNT(*) FROM TD_PublicInfo 
     WHERE F_Type = 3 and F_CurrentStatus = 4 
     and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') 
     AND dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0) 
     and F_UserID <> 599 AND (SELECT COUNT(*) FROM TD_ReadPerson 
     WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 
) AS InsidePublication 
FROM DUAL

执行几乎和统计信息如下:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                           |     1 |       |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |                           |     1 |   394 |            |          |
|   2 |   NESTED LOOPS               |                           |   685 |   263K|  1029   (1)| 00:00:13 |
|*  3 |    TABLE ACCESS FULL         | TD_RECEIVEDOC             |   685 |   260K|  1029   (1)| 00:00:13 |
|*  4 |    INDEX UNIQUE SCAN         | SYS_C0015018              |     1 |     4 |     0   (0)| 00:00:01 |
|   5 |  SORT AGGREGATE              |                           |     1 |   295 |            |          |
|*  6 |   TABLE ACCESS FULL          | TD_SENDDOC                |    80 | 23600 |   160   (1)| 00:00:02 |
|   7 |  SORT AGGREGATE              |                           |     1 |   507 |            |          |
|*  8 |   FILTER                     |                           |       |       |            |          |
|*  9 |    TABLE ACCESS FULL         | TD_PUBLICINFO             |     2 |  1014 |   133   (1)| 00:00:02 |
|  10 |    SORT AGGREGATE            |                           |     1 |     9 |            |          |
|* 11 |     INDEX RANGE SCAN         | IDX_TDREADPERSON_PUB_USER |     1 |     9 |     3   (0)| 00:00:01 |
|  12 |  SORT AGGREGATE              |                           |     1 |    22 |            |          |
|* 13 |   FILTER                     |                           |       |       |            |          |
|* 14 |    TABLE ACCESS FULL         | TD_PUBLICINFO             |    38 |   836 |   133   (1)| 00:00:02 |
|  15 |    SORT AGGREGATE            |                           |     1 |     9 |            |          |
|* 16 |     INDEX RANGE SCAN         | IDX_TDREADPERSON_PUB_USER |     1 |     9 |     3   (0)| 00:00:01 |
|  17 |  SORT AGGREGATE              |                           |     1 |    16 |            |          |
|* 18 |   TABLE ACCESS BY INDEX ROWID| TD_EMAIL                  |    13 |   208 |  9419   (1)| 00:01:54 |
|* 19 |    INDEX RANGE SCAN          | IDX_TDEMAIL_USERID        |  9987 |       |    23   (0)| 00:00:01 |
|  20 |  SORT AGGREGATE              |                           |     1 |   507 |            |          |
|* 21 |   FILTER                     |                           |       |       |            |          |
|* 22 |    TABLE ACCESS FULL         | TD_PUBLICINFO             |     2 |  1014 |   133   (1)| 00:00:02 |
|  23 |    SORT AGGREGATE            |                           |     1 |     9 |            |          |
|* 24 |     INDEX RANGE SCAN         | IDX_TDREADPERSON_PUB_USER |     1 |     9 |     3   (0)| 00:00:01 |
|  25 |  FAST DUAL                   |                           |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("DBMS_LOB"."INSTR"("F_CURACTANDUSERIDLIST",'*599;',1,1)>0)
   4 - access("B"."F_ID"=TO_NUMBER(CAST(SUBSTR(SUBSTR("F_CURACTANDUSERIDLIST",0,INSTR("F_CURACTAND
              USERIDLIST",'*599;')-1),INSTR(SUBSTR("F_CURACTANDUSERIDLIST",0,INSTR("F_CURACTANDUSERIDLIST",'*599
              ;')),';',(-1))+1) AS VARCHAR2(4000))))
   6 - filter("DBMS_LOB"."INSTR"("F_CURACTANDUSERIDLIST",'*599;',1,1)>0)
   8 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND
              "F_PUBLICID"=:B1)=0)
   9 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=0 AND
              ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND
              "DBMS_LOB"."INSTR"(', '||"F_NOTIFYUSERIDLIST"||', ',', 599, ',1,1)>0 AND "F_USERID"<>599)
  11 - access("F_PUBLICID"=:B1 AND "F_USERID"=599)
  13 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND
              "F_PUBLICID"=:B1)=0)
  14 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=1 AND
              ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND
              "F_USERID"<>599)
  16 - access("F_PUBLICID"=:B1 AND "F_USERID"=599)
  18 - filter(TO_NUMBER("F_STATUS")=1 AND "F_ISREAD"<>'0' AND
              TRUNC(INTERNAL_FUNCTION("F_MAILTIME"))<=TRUNC(SYSDATE@!))
  19 - access("F_USERID"=599)
  21 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND
              "F_PUBLICID"=:B1)=0)
  22 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=3 AND
              ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND
              "DBMS_LOB"."INSTR"(', '||"F_NOTIFYUSERIDLIST"||', ',', 599, ',1,1)>0 AND "F_USERID"<>599)
  24 - access("F_PUBLICID"=:B1 AND "F_USERID"=599)


Statistics
----------------------------------------------------------
          1  recursive calls
     136702  db block gets
      52121  consistent gets
       8182  physical reads
          0  redo size
       1323  bytes sent via SQL*Net to client
        542  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

该查询涉及的表的规模如下:

. . imported "SUSUNOAGC"."TD_EMAIL"                      783.9 MB 1264719 rows
. . imported "SUSUNOAGC"."TD_RECEIVEDOC"                 148.9 MB   13703 rows
. . imported "SUSUNOAGC"."TD_SENDDOC"                    76.55 MB    1605 rows
. . imported "SUSUNOAGC"."TD_PUBLICINFO"                 17.64 MB    5091 rows
. . imported "SUSUNOAGC"."TD_READPERSON"                 26.43 MB  614403 rows
. . imported "SUSUNOAGC"."TD_ACTIVITY"                   20.75 KB      86 rows

经过两天的调试,也就是2010年的1月1日,这个问题查了“两年”啊,终于找到导致 physical reads 如此之高的原因—— clob 。

仔细看以下这三个子句,其中TD_PublicInfo 的 NotifyUserIDList 和 TD_ReceiveDoc的F_CURACTANDUSERIDLIST 都是 clob 类型:

SELECT COUNT(*) FROM TD_PublicInfo 
     WHERE dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0 
     and F_Type = 0 and F_CurrentStatus = 4 AND F_UserID <> 599 
     and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') 
     AND (SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0) 

SELECT COUNT(*) FROM TD_PublicInfo 
     WHERE F_Type = 1 and F_CurrentStatus = 4 
     and (F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') 
     AND F_UserID <> 599 
     AND (
          SELECT COUNT(*) FROM TD_ReadPerson 
          WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 
select COUNT(*) from 
    ( 
        select 
        F_CURACTANDUSERIDLIST, 
        CAST(SUBSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;') - 1), 
        INSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;')), ';', -1) + 1) 
        as VARCHAR2(4000)) as F_ActId from TD_ReceiveDoc 
     ) A 
     join Td_Activity B 
     on A.F_ActId = B.F_Id 
     where dbms_lob.instr(F_CURACTANDUSERIDLIST, '*599;', 1, 1 ) > 0 

由于没有 lob 的 storag in row 属性,所以会产生大量 direct path read ,导致大量的物理读。

我尝试将这个几个表的clob字段缓存在内存中(storage in row),虽然可以大大缓解 physical reads,但是 consistent get 还是不能很好地减少。

思前想后,总觉得奇怪,为什么会在clob字段进行dbms_lob.instr搜索呢?

查看了一下示例数据,原来这些clob都共同的特点,都是保存一个长长的用户ID列表,做到这里我在想这样的设计究竟出了什么问题呢?有没有违反范式呢?如果违反了范式又是违法了哪一级范式呢?

我记得念大学的时候,教材有这么一句话,关系型数据库不允许违反第一范式,否则将无法运行。于是心中总有一个“概念”,这年代能跑的关系数据库都不用讨论是否遵守第一范式啦,于是乎从第二范式到第四范式,逐个对照,发现没有一个对应得上,于是再研究了一下最基础的第一范式 1NF 。

感谢wiki啊!,以下很多内容都是参照wiki

关系数据库之父 Edgar F. Codd 在1970年的时候提出了关系模型(Relational Model),并介绍了数据库范式的概念,其中有我们后来熟知的1NF。

简单地说1NF就是希望我们的数据组织成这样:

image而不是这样

image

有些朋友可能会说,第二张图的数据更像我们喜欢的报表类型啊,注意,从1NF到5NF更加关心的是数据的存储方式和是否会出现更新异常,而不是展现方式。说回正题,Top SQL 中clob的数据类似于Transactions里面的数据,我们要访问其中的一些信息,首先就要将其拆分开(当然dbms_lob.instr不是做拆分),然后再做匹配、统计等,wiki中的原话:

  1. Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be examined, and
  2. Deriving a query result based on the results of the first stage

而事实上,我们的Top SQL就是这样做的,于是乎该情景就要求数据库访问所有的记录(TABLE ACCESS FULL)拆分数据、匹配数据,或者可以称为数据不能被索引化,最显而易见的结果就是查询时需要访问大量无关数据,系统的扩展性不好。

解决方案比较简单,这里就不说了,但是,能不能协调得动开发商就难说咯。

posted @ 2010-01-02 18:52  killkill  阅读(3332)  评论(0编辑  收藏  举报