[Oracle 9i] Subquery Factoring in 9i (用With语句做公共子查询提取)
Subquery Factoring, 其实就是平常比较熟悉的With语句,Adrian Billington 在他的网站上写了一篇介绍Subquery Factoring很好的文章,见这里。这篇Blog同样是对他的这篇文章的笔记。
With语句一般有两种用途,一种就是用来把复杂的SQL语句简单化:复杂的SQL语句一般都会嵌套很多层次,无论是写起来还是读起来都很困难,通过用With语句,把子查询语句抽取出来,这样可以使得SQL语句“扁平化”,写起来会很方便,读起来也很容易。 With的这种用法其实就是相当于把复杂语句中的inline-view给提取出来作为一个单独的查询语句,并赋予一个名字,这样用起来就跟访问一个视图一样。
With的另外一种用处是可以用来优化SQL语句,如果一个复杂的SQL语句需要重复访问一张表(最好是数据量比较大的表),这个时候如果用With把这部分需要重复访问底层表的SQL语句提取出来(Oracle往往会把这部分数据“物化”到一个临时表中),之后就不用重复多次访问底层表,从而可以提升SQL语句的执行效率。
下面主要看看With在SQL优化方面的一个例子,(注意, 我得到的结果跟Adrian很不一样, 我用的是10g来run他的例子的!)
首先创建测试用例:
SQL> show user
USER is "FRANK"
SQL> CREATE TABLE sales
2 NOLOGGING
3 AS
4 SELECT al.owner AS region
5 , al.object_type AS product
6 , al.object_id AS order_amt
7 FROM all_objects al
8 , all_objects a2
9 WHERE ROWNUM <= 1000000;
Table created.
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'sales');
PL/SQL procedure successfully completed.
首先看看不用With语句的效果,
SQL> set autotrace on
SQL> set timing on
SQL>
SQL> SELECT region
2 , total_sales
3 FROM (
4 SELECT region
5 , NVL(SUM(order_amt), 0) AS total_sales
6 FROM sales
7 GROUP BY
8 region
9 ) ilv
10 WHERE total_sales > ( SELECT SUM(order_amt)/3 AS one_third_sales
11 FROM sales);
REGION TOTAL_SALES
------------------------------ -----------
SYS 27760438
Elapsed: 00:00:01.68
Execution Plan
----------------------------------------------------------
Plan hash value: 302444457
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 753 (26)| 00:00:10 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 7 | 753 (26)| 00:00:10 |
| 3 | TABLE ACCESS FULL| SALES | 1006K| 6877K| 597 (7)| 00:00:08 |
| 4 | SORT AGGREGATE | | 1 | 3 | | |
| 5 | TABLE ACCESS FULL| SALES | 1006K| 2947K| 597 (7)| 00:00:08 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL(SUM("ORDER_AMT"),0)> (SELECT SUM("ORDER_AMT")/3 FROM
"SALES" "SALES"))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5034 consistent gets
18 physical reads
0 redo size
477 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
从执行计划可以看到,确实出现了两次访问底层表sales!
这条SQL语句的执行时间是1.68秒,cost 是753, 物理I/O为18. (这跟Adrian的测试结果差了很多,可能是由于数据量的问题)
然后再看看用With 语句把重复访问表的部分抽出来作为临时表的效果,
SQL> WITH region_sales AS
2 ( SELECT region
3 , NVL(SUM(order_amt), 0) AS total_sales
4 FROM sales
5 GROUP BY
6 region
7 )
8 SELECT region
9 , total_sales
10 FROM region_sales
11 WHERE total_sales > (SELECT SUM(total_sales)/3 AS one_third_sales
12 FROM region_sales);
REGION TOTAL_SALES
------------------------------ -----------
SYS 27760438
Elapsed: 00:00:02.26
Execution Plan
----------------------------------------------------------
Plan hash value: 1988045888
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 757 (26)| 00:00:10 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | HASH GROUP BY | | 1 | 7 | 753 (26)| 00:00:10 |
| 4 | TABLE ACCESS FULL | SALES | 1006K| 6877K| 597 (7)| 00:00:08 |
|* 5 | VIEW | | 1 | 30 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6612_10B3A7 | 1 | 7 | 2 (0)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 13 | | |
| 8 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6612_10B3A7 | 1 | 7 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("TOTAL_SALES"> (SELECT SUM("TOTAL_SALES")/3 FROM (SELECT /*+ CACHE_TEMP_TABLE
("T1") */ "C0" "REGION","C1" "TOTAL_SALES" FROM "SYS"."SYS_TEMP_0FD9D6612_10B3A7" "T1")
"REGION_SALES"))
Statistics
----------------------------------------------------------
102 recursive calls
11 db block gets
2540 consistent gets
43 physical reads
1472 redo size
477 bytes sent via SQL*Net to client
396 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
从执行计划可以看出,这次是访问一次底层表(创建临时表),然后是两次对临时表的访问(代价很明显小多了)
时间花销是2.26秒, 代价是757,物理I/O为43,每项数据都比不用With要多! (这个跟Adrian的差得太远了!)这也说明了一点,With语句并不一定会提升性能,从With语句的执行计划中可以看出有一步叫Temp Table Transformation,Oracle会创建一个global temporary table, 并从底层表sales 装载数据到这个临时表中,这些从执行计划都可以看到。这一步带来的代价是很高的(753),所以如果直接访问底层多次的代价比创建临时表的代价还要小的话,用With语句就没有多大价值了。相反,如果底层表的数据量很大,多次访问地层表的代价就很可能超过一次创建临时表的代价,这样With语句的优势就显现出来了!
还有一点要注意的是,用With语句并不代表Oracle一定会创建临时表,因为查询优化器很“聪明”,它会分析值不值得这么做。这个时候,如果我们想明确告诉Oracle,我们需要它来创建这样一个临时表,我们可以用一个hint – materialize . 但是由于这是一个没有官方文档说明的hint, 所以很难保证这个hint会一直起作用,所以用这个hint也要很小心才行!
关于With语句用法的一些限制性,Adrian说得很清楚,可以参考他的文章。
--------------------------------------
Regards,
FangwenYu