[原]由clob引发的性能问题所想到的
2009年最后一天下午比较闲,于是在生产库做了AWR report,看看生产库的性能如何,发现有一条Top SQL性能很差(貌似性能调优实例的开场白都是这样的),不多说了,上图看吧:
从这张图来看最Top 的SQL就是那条“ SELECT ‘receivedoc’,( ”了。
语句摘录如下:
SELECT 'receivedoc', ( select COUNT(*) from ( select F_CURACTANDUSERIDLIST, CAST(SUBSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;') - 1), INSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;')), ';', -1) + 1) as VARCHAR2(4000)) as F_ActId from TD_ReceiveDoc ) A join Td_Activity B on A.F_ActId = B.F_Id where dbms_lob.instr(F_CURACTANDUSERIDLIST, '*599;', 1, 1 ) > 0 ) AS ReceiveDoc, 'senddoc', ( SELECT COUNT(*) FROM TD_SendDoc WHERE DBMS_LOB.INSTR(F_CuractAndUserIDList, '*599;', 1, 1) > 0 ) AS SendDoc, 'message', ( SELECT COUNT(*) FROM TD_PublicInfo WHERE dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0 and F_Type = 0 and F_CurrentStatus = 4 AND F_UserID <> 599 and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') AND (SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0) ) AS Message, 'bulletin', ( SELECT COUNT(*) FROM TD_PublicInfo WHERE F_Type = 1 and F_CurrentStatus = 4 and (F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') AND F_UserID <> 599 AND ( SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 ) AS Affiche, 'email', ( SELECT COUNT(*) FROM TD_EMail WHERE F_Status=1 and F_IsRead <> '0' and F_UserID=599 and trunc(F_MailTime) <= trunc(SYSDATE) ) AS EMail, 'publication', ( SELECT COUNT(*) FROM TD_PublicInfo WHERE F_Type = 3 and F_CurrentStatus = 4 and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') AND dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0) and F_UserID <> 599 AND (SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 ) AS InsidePublication FROM DUAL
执行几乎和统计信息如下:
---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 2 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 394 | | | | 2 | NESTED LOOPS | | 685 | 263K| 1029 (1)| 00:00:13 | |* 3 | TABLE ACCESS FULL | TD_RECEIVEDOC | 685 | 260K| 1029 (1)| 00:00:13 | |* 4 | INDEX UNIQUE SCAN | SYS_C0015018 | 1 | 4 | 0 (0)| 00:00:01 | | 5 | SORT AGGREGATE | | 1 | 295 | | | |* 6 | TABLE ACCESS FULL | TD_SENDDOC | 80 | 23600 | 160 (1)| 00:00:02 | | 7 | SORT AGGREGATE | | 1 | 507 | | | |* 8 | FILTER | | | | | | |* 9 | TABLE ACCESS FULL | TD_PUBLICINFO | 2 | 1014 | 133 (1)| 00:00:02 | | 10 | SORT AGGREGATE | | 1 | 9 | | | |* 11 | INDEX RANGE SCAN | IDX_TDREADPERSON_PUB_USER | 1 | 9 | 3 (0)| 00:00:01 | | 12 | SORT AGGREGATE | | 1 | 22 | | | |* 13 | FILTER | | | | | | |* 14 | TABLE ACCESS FULL | TD_PUBLICINFO | 38 | 836 | 133 (1)| 00:00:02 | | 15 | SORT AGGREGATE | | 1 | 9 | | | |* 16 | INDEX RANGE SCAN | IDX_TDREADPERSON_PUB_USER | 1 | 9 | 3 (0)| 00:00:01 | | 17 | SORT AGGREGATE | | 1 | 16 | | | |* 18 | TABLE ACCESS BY INDEX ROWID| TD_EMAIL | 13 | 208 | 9419 (1)| 00:01:54 | |* 19 | INDEX RANGE SCAN | IDX_TDEMAIL_USERID | 9987 | | 23 (0)| 00:00:01 | | 20 | SORT AGGREGATE | | 1 | 507 | | | |* 21 | FILTER | | | | | | |* 22 | TABLE ACCESS FULL | TD_PUBLICINFO | 2 | 1014 | 133 (1)| 00:00:02 | | 23 | SORT AGGREGATE | | 1 | 9 | | | |* 24 | INDEX RANGE SCAN | IDX_TDREADPERSON_PUB_USER | 1 | 9 | 3 (0)| 00:00:01 | | 25 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("DBMS_LOB"."INSTR"("F_CURACTANDUSERIDLIST",'*599;',1,1)>0) 4 - access("B"."F_ID"=TO_NUMBER(CAST(SUBSTR(SUBSTR("F_CURACTANDUSERIDLIST",0,INSTR("F_CURACTAND USERIDLIST",'*599;')-1),INSTR(SUBSTR("F_CURACTANDUSERIDLIST",0,INSTR("F_CURACTANDUSERIDLIST",'*599 ;')),';',(-1))+1) AS VARCHAR2(4000)))) 6 - filter("DBMS_LOB"."INSTR"("F_CURACTANDUSERIDLIST",'*599;',1,1)>0) 8 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND "F_PUBLICID"=:B1)=0) 9 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=0 AND ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND "DBMS_LOB"."INSTR"(', '||"F_NOTIFYUSERIDLIST"||', ',', 599, ',1,1)>0 AND "F_USERID"<>599) 11 - access("F_PUBLICID"=:B1 AND "F_USERID"=599) 13 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND "F_PUBLICID"=:B1)=0) 14 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=1 AND ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND "F_USERID"<>599) 16 - access("F_PUBLICID"=:B1 AND "F_USERID"=599) 18 - filter(TO_NUMBER("F_STATUS")=1 AND "F_ISREAD"<>'0' AND TRUNC(INTERNAL_FUNCTION("F_MAILTIME"))<=TRUNC(SYSDATE@!)) 19 - access("F_USERID"=599) 21 - filter( (SELECT COUNT(*) FROM "TD_READPERSON" "TD_READPERSON" WHERE "F_USERID"=599 AND "F_PUBLICID"=:B1)=0) 22 - filter(TO_NUMBER("F_CURRENTSTATUS")=4 AND TO_NUMBER("F_TYPE")=3 AND ("F_HADREADEDIDLIST"||', ' NOT LIKE '%, 599, %' OR "F_HADREADEDIDLIST"||'A'='A') AND "DBMS_LOB"."INSTR"(', '||"F_NOTIFYUSERIDLIST"||', ',', 599, ',1,1)>0 AND "F_USERID"<>599) 24 - access("F_PUBLICID"=:B1 AND "F_USERID"=599) Statistics ---------------------------------------------------------- 1 recursive calls 136702 db block gets 52121 consistent gets 8182 physical reads 0 redo size 1323 bytes sent via SQL*Net to client 542 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
该查询涉及的表的规模如下:
. . imported "SUSUNOAGC"."TD_EMAIL" 783.9 MB 1264719 rows . . imported "SUSUNOAGC"."TD_RECEIVEDOC" 148.9 MB 13703 rows . . imported "SUSUNOAGC"."TD_SENDDOC" 76.55 MB 1605 rows . . imported "SUSUNOAGC"."TD_PUBLICINFO" 17.64 MB 5091 rows . . imported "SUSUNOAGC"."TD_READPERSON" 26.43 MB 614403 rows . . imported "SUSUNOAGC"."TD_ACTIVITY" 20.75 KB 86 rows
经过两天的调试,也就是2010年的1月1日,这个问题查了“两年”啊,终于找到导致 physical reads 如此之高的原因—— clob 。
仔细看以下这三个子句,其中TD_PublicInfo 的 NotifyUserIDList 和 TD_ReceiveDoc的F_CURACTANDUSERIDLIST 都是 clob 类型:
SELECT COUNT(*) FROM TD_PublicInfo WHERE dbms_lob.instr(', ' || F_NotifyUserIDList || ', ', ', 599, ', 1, 1 ) > 0 and F_Type = 0 and F_CurrentStatus = 4 AND F_UserID <> 599 and ((F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') AND (SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0) SELECT COUNT(*) FROM TD_PublicInfo WHERE F_Type = 1 and F_CurrentStatus = 4 and (F_HadReadedIDList||', ' not like '%, 599, %' or F_HadReadedIDList || 'A' = 'A') AND F_UserID <> 599 AND ( SELECT COUNT(*) FROM TD_ReadPerson WHERE F_PublicID = TD_PublicInfo.F_ID AND F_UserID = 599) = 0 select COUNT(*) from ( select F_CURACTANDUSERIDLIST, CAST(SUBSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;') - 1), INSTR(SUBSTR(F_CURACTANDUSERIDLIST, 0, INSTR(F_CURACTANDUSERIDLIST, '*599;')), ';', -1) + 1) as VARCHAR2(4000)) as F_ActId from TD_ReceiveDoc ) A join Td_Activity B on A.F_ActId = B.F_Id where dbms_lob.instr(F_CURACTANDUSERIDLIST, '*599;', 1, 1 ) > 0
由于没有 lob 的 storag in row 属性,所以会产生大量 direct path read ,导致大量的物理读。
我尝试将这个几个表的clob字段缓存在内存中(storage in row),虽然可以大大缓解 physical reads,但是 consistent get 还是不能很好地减少。
思前想后,总觉得奇怪,为什么会在clob字段进行dbms_lob.instr搜索呢?
查看了一下示例数据,原来这些clob都共同的特点,都是保存一个长长的用户ID列表,做到这里我在想这样的设计究竟出了什么问题呢?有没有违反范式呢?如果违反了范式又是违法了哪一级范式呢?
我记得念大学的时候,教材有这么一句话,关系型数据库不允许违反第一范式,否则将无法运行。于是心中总有一个“概念”,这年代能跑的关系数据库都不用讨论是否遵守第一范式啦,于是乎从第二范式到第四范式,逐个对照,发现没有一个对应得上,于是再研究了一下最基础的第一范式 1NF 。
感谢wiki啊!,以下很多内容都是参照wiki。
关系数据库之父 Edgar F. Codd 在1970年的时候提出了关系模型(Relational Model),并介绍了数据库范式的概念,其中有我们后来熟知的1NF。
简单地说1NF就是希望我们的数据组织成这样:
有些朋友可能会说,第二张图的数据更像我们喜欢的报表类型啊,注意,从1NF到5NF更加关心的是数据的存储方式和是否会出现更新异常,而不是展现方式。说回正题,Top SQL 中clob的数据类似于Transactions里面的数据,我们要访问其中的一些信息,首先就要将其拆分开(当然dbms_lob.instr不是做拆分),然后再做匹配、统计等,wiki中的原话:
- Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be examined, and
- Deriving a query result based on the results of the first stage
而事实上,我们的Top SQL就是这样做的,于是乎该情景就要求数据库访问所有的记录(TABLE ACCESS FULL)拆分数据、匹配数据,或者可以称为数据不能被索引化,最显而易见的结果就是查询时需要访问大量无关数据,系统的扩展性不好。
解决方案比较简单,这里就不说了,但是,能不能协调得动开发商就难说咯。