一种优化的手段 (gather_plan_statistics + cardinality)
转自 http://www.itpub.net/thread-1280336-1-1.html
其实,许多人已经提到过这个东西了,我这里只是举一个简单的例子演示一遍具体的流程而已
比如说优化这个语句:
SELECT MAX(P.PAGEVIEW)
FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
WHERE CATAP.CATALOGID = 291
AND P.ID = CATAP.PRODUCTID
AND PUBLISHSTATUS = 3;
我收集运行时的统计信息:
SELECT /*+ gather_plan_statistics ZHAOSJ1*/max(P.PAGEVIEW)
FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
WHERE CATAP.CATALOGID = 291
AND P.ID = CATAP.PRODUCTID
AND PUBLISHSTATUS = 3;
实际的运行这个SQL语句,gather_plan_statistics是收集运行时的统计信息的提示,ZHAOSJ1 就是一个普通的注释,是为了唯一的标识这个游标的.
SQL> SELECT SQL_ID,CHILD_NUMBER FROM V$SQL WHERE SQL_TEXT LIKE '%ZHAOSJ1%' AND SQL_TEXT NOT LIKE '%V$SQL%';
SQL_ID CHILD_NUMBER
------------- ------------
79gcyuucwuzwg 0
查找刚才的游标.
SET PAGESIZE 200;
SET LINESIZE 200;
COL PLAN_TABLE_OUTPUT FOR A195;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('79gcyuucwuzwg',0,'ALL IOSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 79gcyuucwuzwg, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics ZHAOSJ1*/max(P.PAGEVIEW) FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP WHERE CATAP.CATALOGID =
291 AND P.ID = CATAP.PRODUCTID AND PUBLISHSTATUS = 3
Plan hash value: 745285829
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 19 | | | 1 |00:00:00.08 | 17210 |
| 2 | NESTED LOOPS | | 1 | 50 | 950 | 27 (0)| 00:00:01 | 8557 |00:00:00.08 | 17210 |
|* 3 | INDEX RANGE SCAN| INDEX2_CATALOGRELATEPRODUCT | 1 | 50 | 400 | 2 (0)| 00:00:01 | 8567 |00:00:00.01 | 30 |
|* 4 | INDEX RANGE SCAN| INDEX2_PRODUCT | 8567 | 1 | 11 | 1 (0)| 00:00:01 | 8557 |00:00:00.06 | 17180 |
--------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 - SEL$1 / [email=CATAP@SEL$1]CATAP@SEL$1[/email]
4 - SEL$1 / [email=P@SEL$1]P@SEL$1[/email]
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CATAP"."CATALOGID"=291)
4 - access("P"."ID"="CATAP"."PRODUCTID" AND "PUBLISHSTATUS"=3)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) MAX("P"."PAGEVIEW")[22]
2 - "P"."PAGEVIEW"[NUMBER,22]
3 - "CATAP"."PRODUCTID"[NUMBER,22]
4 - "P"."PAGEVIEW"[NUMBER,22]
这里显示:这个语句总的逻辑IO是:17210(buffers 是实际的逻辑IO数量,这里是累计值,包括子操作的值)
starts 是对应的动作执行的次数
E-ROWS 是优化器估算这一步返回的数据行数
A-Rows 是这一步实际返回的数据行数
明显INDEX RANGE SCAN| INDEX2_CATALOGRELATEPRODUCT 这一步估算返回50行.但实际返回了8567行
因为估算返回50行,所以估算INDEX RANGE SCAN| INDEX2_PRODUCT 这一步会执行50次,但实际它执行了8567次.
显然估算与实际执行上存在着巨大的差异.
那优化器估算INDEX RANGE SCAN| INDEX2_CATALOGRELATEPRODUCT 这一步返回8567行的话,执行计划会是什么呢?实际的执行效果会怎样呢?
使用cardinality(t n) 提示不就可以了吗?!
SQL> SELECT /*+ CARDINALITY(CATAP 8500) gather_plan_statistics ZHAOSJ6*/max(P.PAGEVIEW)
2 FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
3 WHERE CATAP.CATALOGID = 291
4 AND P.ID = CATAP.PRODUCTID
5 AND PUBLISHSTATUS = 3;
MAX(P.PAGEVIEW)
---------------
18524
SQL> SELECT SQL_ID,CHILD_NUMBER FROM V$SQL WHERE SQL_TEXT LIKE '%ZHAOSJ6%' AND SQL_TEXT NOT LIKE '%V$SQL%';
SQL_ID CHILD_NUMBER
------------- ------------
f9nj2kxhphm82 0
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('f9nj2kxhphm82',0,'ALL IOSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f9nj2kxhphm82, child number 0
-------------------------------------
SELECT /*+ CARDINALITY(CATAP 8500) gather_plan_statistics ZHAOSJ6*/max(P.PAGEVIEW) FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP WHERE
CATAP.CATALOGID = 291 AND P.ID = CATAP.PRODUCTID AND PUBLISHSTATUS = 3
Plan hash value: 2173417495
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 19 | | | 1 |00:00:00.09 | 1413 |
|* 2 | HASH JOIN | | 1 | 8511 | 157K| 246 (3)| 00:00:03 | 8557 |00:00:00.09 | 1413 |
|* 3 | INDEX RANGE SCAN | INDEX2_CATALOGRELATEPRODUCT | 1 | 8500 | 68000 | 2 (0)| 00:00:01 | 8567 |00:00:00.01 | 30 |
|* 4 | INDEX FAST FULL SCAN| INDEX2_PRODUCT | 1 | 236K| 2535K| 242 (2)| 00:00:03 | 236K|00:00:00.01 | 1383 |
------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 - SEL$1 / [email=CATAP@SEL$1]CATAP@SEL$1[/email]
4 - SEL$1 / [email=P@SEL$1]P@SEL$1[/email]
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."ID"="CATAP"."PRODUCTID")
3 - access("CATAP"."CATALOGID"=291)
4 - filter("PUBLISHSTATUS"=3)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) MAX("P"."PAGEVIEW")[22]
2 - (#keys=1) "P"."PAGEVIEW"[NUMBER,22]
3 - "CATAP"."PRODUCTID"[NUMBER,22]
4 - "P"."ID"[NUMBER,22], "P"."PAGEVIEW"[NUMBER,22]
明显hash join后估算的行数8511与实际返回的行数8557的差距已经很小了.
执行计划改变了.而且逻辑IO从1.7W下降到了1400.逻辑IO的下降还是很明显的
下面要做的就是通过使用提示使得它可以走这个执行计划:
SELECT /*+ use_hash(p catap)*/max(P.PAGEVIEW)
FROM PRODUCT P, CATALOGRELATEPRODUCT CATAP
WHERE CATAP.CATALOGID = 291
AND P.ID = CATAP.PRODUCTID
AND PUBLISHSTATUS = 3;
但,事实上你要做的是查找一下优化器为什么估算返回的行数,估算错了呢?
SQL> select a.num_distinct,a.num_buckets,a.num_nulls,a.histogram,b.num_rows,round(b.num_rows/a.num_distinct) rows_per_key
2 from user_tab_columns a,user_tables b where a.table_name='CATALOGRELATEPRODUCT' and a.column_name='CATALOGID' and b.table_name='CATALOGRELATEPRODUCT';
NUM_DISTINCT NUM_BUCKETS NUM_NULLS HISTOGRAM NUM_ROWS ROWS_PER_KEY
------------ ----------- ---------- --------------- ---------- ------------
17943 1 0 NONE 904362 50
明显,这个列上并没有收集柱状图统计信息.所以对于min~max内的任意给定值,它估算的返回行数都是:num_rows/num_distinct =50
对于CATAP.CATALOGID = 291,估算确实不够准确.但问题在于两点:
1.应用程序中是使用绑定变量的.
2.对于典型输入值来说,确实返回不了几行数据(nl的执行计划确实是好的,不收集柱状图统计信息时,确实走NL了).291其实并不是一个典型输入值(对于这个非典型输入值来说,hash join确实是一个好的执行计划,而nl不是).所以如果收集了柱状图统计信息的话,每次硬分析的时候,都会peeking,这带有很大的随机性,如果peeking的刚好是291这个非典型输入值,采用hash join的话,对于一般的输入值来说,性能上其实是不好的.所以,其实就不应该收集柱状图统计信息:这样虽然对于极少数的输入值来说,执行计划并不好,但对于绝大多数的输入值来说,执行计划是很好的.对于极少数的非典型输入值来说,如果使用字面值的话,你可以使用use_hash之类的提示来纠正它的执行计划.