Oracle优化器的工作主要分成3步:
首先, 优化器会尝试把复杂的SQL语句转化较为简单的SQL语句, 通常倾向于转化为表的连接方式.
然后, 优化器会对依据统计信息对SQL语句进行估量, 这些估量主要包括3个方面: Selectivity,
Cardinality和Cost, 这三个方面是相互相关的.
最后, 优化器会尝试各种执行计划并给出一个代价最低的计划.
要想理解优化器如何得到最后的执行计划, 首先就要理解被优化的SQL语句的结构, 这是浅显的道理. 但由于Oracle优化器有时候会对我们的SQL语句进行转化, 这就意味着优化器最终所优化器的对象, 并不一定就是我们最初提供的那个语句. 所以, 现在我们将会开始介绍优化器对SQL语句进行转化时所遵循的一些机制. 由于优化器的复杂远远超出我的认知, 这里我不可能列出所有的机制, 不过我希望至少能够达到这样的效果:
假设您对优化器了解甚少, 在我介绍之前, 您对于某些执行计划的看法可能是:
“我实在不理解这个该死的执行计划是怎么来的”
我希望在我介绍之后, 您的看法会转变为:
“我还是不完全理解这个该死的执行计划是怎么来的, 不过我至少知道这是优化器转化的结果, 我觉得我可以尝试对我的SQL改变一下格式或者使用一些提示”
闲话少叙, 正式开始:
Oracle的优化器对SQL转化的手段包括5种, 我们接下来会对这个方面分别进行说明:
- View Merging
- Predicate Pushing
- Subquery Unnesting
- Query Rewrite with Materialized Views
- OR-expansion
对于每一部分, 我打算都先给大家看一看Oracle官方文档的权威说法 (通常都是摘自Oracle Database Performance Tuning Guide 10g Release2), 然后通过实际的例子进一步说明:
2.1 View Merging
2.1.1 View Merging定义
Each view referenced in a query is expanded by the parser into a separate query block. The query block essentially represents the view definition, and therefore the result of a view. One option for the optimizer is to analyze the view query block separately and generate a view subplan. The optimizer then processes the rest of the query by using the view subplan in the generation of an overall query plan. This technique usually leads to a suboptimal query plan, because the view is optimized separately from rest of the query.
The query transformer then removes the potentially suboptimal plan by merging the view query block into the query block that contains the view. Most types of views are merged. When a view is merged, the query block representing the view is merged into the containing query block. Generating a subplan is no longer necessary, because the view query block is eliminated.
对于SQL里的视图, 优化器可以选择先对视图代表的查询单独优化并生成view本身的执行计划, 在把这个计划和SQL语句其他的部分合并在一起生成最终的执行计划, 当然这种技术往往会产生一个不是最好的计划, 因为视图的优化是单独考虑了, 而没有放到整个SQL环境中通盘考虑.
而Oracle的查询优化器则可以帮助我们把视图代表的查询merge到SQL的其他部分当中, 从而可以更全面的考虑SQL的执行计划.
那么优化器的view merging操作到底是怎么实现的呢? 我们先看一个简单的例子.
首先我们使用的是Oracle 9i提供的一个sample schema: HR, 创建脚本下载:
http://gdcckm.chn.hp.com/teams/ams/asdba/dbcommunity/Shared%20Documents/Others/hr_schema.zip
例一:
我们先对里面employees表视图:
CREATE VIEW emp_10
AS SELECT employee_id, last_name, job_id, manager_id, hire_date, salary,
commission_pct, department_id
FROM employees
WHERE department_id = 10;
我们针对视图emp_10做一个查询:
SELECT employee_id
FROM emp_10
WHERE employee_id > 170;
Execution Plan
----------------------------------------------------------
Plan hash value: 3659898230
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 7 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPLOYEE_ID">170)
2 - access("DEPARTMENT_ID"=10)
我们看到的执行计划里面已经看不到emp_10这个视图的名字, 因为视图已经被转化成了针对employees的查询, 转化后的语句应该是这样的(根据10053的输出):
SELECT "EMPLOYEES"."EMPLOYEE_ID" "EMPLOYEE_ID"
FROM HR."EMPLOYEES" "EMPLOYEES"
WHERE "EMPLOYEES"."DEPARTMENT_ID"=10 AND "EMPLOYEES"."EMPLOYEE_ID">170
如果我们使用hint来要求Oracle不要做view merging的话, 我们看到的执行计划应该是这样的:
SELECT /*+ no_merge(emp_10) */ employee_id
FROM emp_10
WHERE employee_id > 170;
Execution Plan
----------------------------------------------------------
Plan hash value: 1825255554
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
| 1 | VIEW | EMP_10 | 1 | 13 | 2 (0)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 25 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("EMPLOYEE_ID">170)
3 - access("DEPARTMENT_ID"=10)
我们会在10053 trace中看到SQL的结构是:
SELECT /*+ NO_MERGE ("EMP_10") */ "EMP_10"."EMPLOYEE_ID" "EMPLOYEE_ID"
FROM
(SELECT "EMPLOYEES"."EMPLOYEE_ID" "EMPLOYEE_ID" FROM HR
."EMPLOYEES" "EMPLOYEES" WHERE "EMPLOYEES"."DEPARTMENT_ID"=10 AND "EMPLOYEES"."EMPLOYEE_ID">170) "EMP_10"
这里我们看到视图是单独处理的, 不过值得一提的是exployee_id > 170这个条件也被加入视图中事先处理了, 这就涉及到了SQL语句转化另一种技术: Predicate Pushing, 我们会在后面再介绍.
当然, 由于这是一个最简单的例子, 所以你会发现不管做不做merging, 执行计划的cost都是一样的.
但是如果你尝试稍微复杂的例子, 你就比较看到一些区别了.
例二:
CREATE OR REPLACE VIEW avg_salary_view AS
SELECT department_id, AVG(salary) AS avg_sal_dept
FROM employees
GROUP BY department_id;
SELECT departments.location_id, avg_sal_dept
FROM departments, avg_salary_view
WHERE departments.department_id = avg_salary_view.department_id
AND departments.location_id = 2400;
Execution Plan
----------------------------------------------------------
Plan hash value: 2237210116
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 130 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 5 | 130 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID | EMPLOYEES | 10 | 70 | 1 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 5 | 130 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 1 | 19 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | DEPT_LOCATION_IX | 1 | | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 10 | | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("DEPARTMENTS"."LOCATION_ID"=2400)
6 - access("DEPARTMENTS"."DEPARTMENT_ID"="DEPARTMENT_ID")
SELECT /*+ no_merge(avg_salary_view) */ departments.location_id, avg_sal_dept
FROM departments, avg_salary_view
WHERE departments.department_id = avg_salary_view.department_id
AND departments.location_id = 2400;
Execution Plan
----------------------------------------------------------
Plan hash value: 725906304
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 33 | 7 (29)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 33 | 7 (29)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 1 | 7 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | DEPT_LOCATION_IX | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW | AVG_SALARY_VIEW | 11 | 286 | 4 (25)| 00:00:01 |
| 5 | HASH GROUP BY | | 11 | 77 | 4 (25)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMPLOYEES | 107 | 749 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPARTMENTS"."DEPARTMENT_ID"="AVG_SALARY_VIEW"."DEPARTMENT_ID")
3 - access("DEPARTMENTS"."LOCATION_ID"=2400)
我们可以明显看出view merging发生与否导致的cost的区别, 不过这里更重要的是我们需要引入另一个概念:
2.1.2 Mergeable and Nonmergeable Views
并不是所有的视图都是可以做merging操作的, 一个视图是可以merging, 前提是这个视图不包含以下操作中的任何一个:
- Set operators (
UNION , UNION ALL, INTERSECT, MINUS) - A CONNECT BY clause
- A ROWNUM pseudocolumn
- Aggregate functions (AVG, COUNT, MAX, MIN, SUM) in the select list
不包含以上操作的视图称为mergeable view, 否则称为nonmergeable view.
但是我上面的例子中的视图avg_salary_view就包括avg函数, 为什么它就可以merge呢, 这是因为视图中加入了group by