不稳定的执行计划(Oracle执行计划稳定性/不稳定性)
2025-02-10 17:27 潇湘隐者 阅读(20) 评论(0) 编辑 收藏 举报本文是翻译Kerry Osborne的Unstable Plans (Oracle Plan Stability/Instability)[1] 这篇文章,翻译如有不当的地方,敬请谅解,请尊重原创和翻译劳动成果,转载的时候请注明出处。谢谢!文中有些翻译的地方按自己的理解进行了调整,但是尽量 符合原文意思。请知晓!
Oracle基于成本的优化器(Cost Based Optimizer,CBO)有时候表现得非常不稳定,这可能是最令人沮丧的事情之一。它的性能表现有时似乎是随机的。 解决这些问题也颇具挑战性。这有点像把你的车送去修车厂,因为它发出奇怪的声音,但当修车师傅在场时,它又从不发出那种声音。幸运的是,我们有 ASH(Active Session History)和AWR(Automatic Workload Repository)工具,它们可以捕获大量关于数据库当时正在做什么的信息。
这里有两个脚本,我发现它们很有用。
脚本unstable_plans.sql
第一个脚本我命名为unstable_plans.sql. 如下所示
----------------------------------------------------------------------------------------
--
-- File name: unstable_plans.sql
--
-- Purpose: Attempts to find SQL statements with plan instability.
--
-- Author: Kerry Osborne
--
-- Usage: This scripts prompts for two values, both of which can be left blank.
--
-- min_stddev: the minimum "normalized" standard deviation between plans
-- (the default is 2)
--
-- min_etime: only include statements that have an avg. etime > this value
-- (the default is .1 second)
--
-- See http://kerryosborne.oracle-guy.com/2008/10/unstable-plans/ for more info.
---------------------------------------------------------------------------------------
set lines 155
col execs for 999,999,999
col min_etime for 999,999.99
col max_etime for 999,999.99
col avg_etime for 999,999.999
col avg_lio for 999,999,999.9
col norm_stddev for 999,999.9999
col begin_interval_time for a30
col node for 99999
break on plan_hash_value on startup_time skip 1
select * from (
select sql_id, sum(execs), min(avg_etime) min_etime, max(avg_etime) max_etime, stddev_etime/min(avg_etime) norm_stddev
from (
select sql_id, plan_hash_value, execs, avg_etime,
stddev(avg_etime) over (partition by sql_id) stddev_etime
from (
select sql_id, plan_hash_value,
sum(nvl(executions_delta,0)) execs,
(sum(elapsed_time_delta)/decode(sum(nvl(executions_delta,0)),0,1,sum(executions_delta))/1000000) avg_etime
-- sum((buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta))) avg_lio
from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS
where ss.snap_id = S.snap_id
and ss.instance_number = S.instance_number
and executions_delta > 0
group by sql_id, plan_hash_value
)
)
group by sql_id, stddev_etime
)
where norm_stddev > nvl(to_number('&min_stddev'),2)
and max_etime > nvl(to_number('&min_etime'),.1)
order by norm_stddev
/
它可以用来显示那些执行时间存在明显/显著差异的SQL语句(它也可以修改为查找逻辑I/O存在差异的SQL,但我会留给读者来完成这个练习)。 它使用分析函数来计算SQL执行计划的平均执行时间的标准方差。因此,那些具有多个计划且计划之间响应时间差异很大的语句将被该脚本返回。 脚本会提示输入两个值:第一个是最小标准差数量,第二个是最小执行时间(通常,如果一个语句有时执行时间为0.005秒,有时为0.02秒,我并不在意,尽管从统计学上来说这是一个很大的波动)。顺便说一下,这两个输入都有默认值。
脚本awr_plan_change.sql
第二个脚本为awr_plan_change.sql,如下所示:
set lines 155
col execs for 999,999,999
col avg_etime for 999,999.999
col avg_lio for 999,999,999.9
col begin_interval_time for a30
col node for 99999
break on plan_hash_value on startup_time skip 1
select ss.snap_id, ss.instance_number node, begin_interval_time, sql_id, plan_hash_value,
nvl(executions_delta,0) execs,
(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,
(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio
from DBA_HIST_SQLSTAT S, DBA_HIST_SNAPSHOT SS
where sql_id = nvl('&sql_id','4dqs2k5tynk61')
and ss.snap_id = S.snap_id
and ss.instance_number = S.instance_number
and executions_delta > 0
order by 1, 2, 3
/
它显示了给定语句的执行计划随时间的变化信息,以及一些统计数据,例如平均执行时间(average elapsed time)和平均逻辑I/O读写(average lio’s)。
总之,这里有一个使用这两个脚本的示例(顺便说一下,示例是在11gR1数据库上进行的,但这些脚本在10g上也能正常工作)。
> sqlplus / as sysdba
SQL*Plus: Release 11.1.0.6.0 - Production on Tue Oct 7 15:44:20 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter system set optimizer_mode=rule;
System altered.
SQL> @awr_snap
PL/SQL procedure successfully completed.
SQL> @unstable_plans
Enter value for min_stddev:
Enter value for min_etime:
SQL_ID SUM(EXECS) MIN_ETIME MAX_ETIME NORM_STDDEV
------------- ---------- ----------- ----------- -------------
c5by9gdw3814r 14 .03 .12 2.1274
848dyu9288c3h 16 .02 .16 2.1309
2am60vd2kw8ux 11 .05 .22 2.4976
frxg53fac2n8j 5 .03 .30 4.2479
0qa98gcnnza7h 62 25.58 314.34 7.9833
g0jvz8csyrtcf 2 .09 1.19 8.2304
2cn0kc8u4b81w 545 .02 .42 12.8022
9wt62290ah0f7 6 .01 .47 38.5857
d8mayxqw0wnpv 1373 .01 .85 48.3874
9 rows selected.
SQL> /
Enter value for min_stddev:
Enter value for min_etime: 2
SQL_ID SUM(EXECS) MIN_ETIME MAX_ETIME NORM_STDDEV
------------- ---------- ----------- ----------- -------------
0qa98gcnnza7h 62 25.58 314.34 7.9833
SQL> @find_sql
Enter value for sql_text:
Enter value for address:
Enter value for sql_id: 0qa98gcnnza7h
SQL_ID CHILD PLAN_HASH EXECS ETIME AVG_ETIME USERNAME SQL_TEXT
------------- ------ ---------- ------------ ------------- ------------- ------------- -----------------------------------------
0qa98gcnnza7h 0 3723858078 5 356.53 71.31 SYS select avg(pk_col) from kso.skew where co
l1 > 0
0qa98gcnnza7h 1 568322376 1 7.92 7.92 SYS select avg(pk_col) from kso.skew where co
l1 > 0
0qa98gcnnza7h 2 568322376 10 52.14 5.21 SYS select avg(pk_col) from kso.skew where co
l1 > 0
0qa98gcnnza7h 3 568322376 30 1,064.19 35.47 KSO select avg(pk_col) from kso.skew where co
l1 > 0
0qa98gcnnza7h 4 3723858078 10 4,558.62 455.86 KSO select avg(pk_col) from kso.skew where co
l1 > 0
SQL> @awr_plan_change
Enter value for sql_id: 0qa98gcnnza7h
SNAP_ID NODE BEGIN_INTERVAL_TIME SQL_ID PLAN_HASH_VALUE EXECS AVG_ETIME AVG_LIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ --------------
3206 1 02-OCT-08 08.00.38.743 AM 0qa98gcnnza7h 568322376 4 10.359 121,722.8
3235 1 03-OCT-08 01.00.44.932 PM 0qa98gcnnza7h 1 10.865 162,375.0
3235 1 03-OCT-08 01.00.44.932 PM 0qa98gcnnza7h 3723858078 1 127.664 28,913,271.0
3236 1 03-OCT-08 01.28.09.000 PM 0qa98gcnnza7h 568322376 1 7.924 162,585.0
3236 1 03-OCT-08 01.28.09.000 PM 0qa98gcnnza7h 3723858078 1 86.682 27,751,123.0
3305 1 06-OCT-08 10.00.11.988 AM 0qa98gcnnza7h 4 64.138 22,616,931.5
3305 1 06-OCT-08 10.00.11.988 AM 0qa98gcnnza7h 568322376 2 5.710 81,149.0
3306 1 06-OCT-08 11.00.16.490 AM 0qa98gcnnza7h 6 5.512 108,198.5
3307 1 06-OCT-08 12.00.20.716 PM 0qa98gcnnza7h 2 3.824 81,149.0
3328 1 07-OCT-08 08.39.20.525 AM 0qa98gcnnza7h 30 35.473 156,904.7
3335 1 07-OCT-08 03.00.20.950 PM 0qa98gcnnza7h 3723858078 10 455.862 28,902,128.6
11 rows selected.
SQL> @dplan_awr
Enter value for sql_id: 0qa98gcnnza7h
Enter value for plan_hash_value:
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0qa98gcnnza7h
--------------------
select avg(pk_col) from kso.skew where col1 > 0
Plan hash value: 568322376
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 44497 (100)| |
| 1 | SORT AGGREGATE | | 1 | 11 | | |
| 2 | TABLE ACCESS FULL| SKEW | 32M| 335M| 44497 (2)| 00:08:54 |
---------------------------------------------------------------------------
SQL_ID 0qa98gcnnza7h
--------------------
select avg(pk_col) from kso.skew where col1 > 0
Plan hash value: 3723858078
--------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | TABLE ACCESS BY INDEX ROWID| SKEW |
| 3 | INDEX RANGE SCAN | SKEW_COL1 |
--------------------------------------------------
Note
-----
- rule based optimizer used (consider using cbo)
33 rows selected.
几点简短的评论,unstable_plan.sql 脚本显示了存在多个执行计划/执行计划存在切换的SQL语句。 它们中的大多数都是亚秒级响应时间(大多数的响应时间都在一秒以内)。真正引人注目的是sql_id为0qa98gcnnza7h的语句。它被执行了62次, 其中一个执行计划的平均执行时间约为25秒,而另一个计划的平均执行时间则约为314秒。awr_plan_changes.sql 脚本显示此语句一直在两个 计划(3723858078 和 568322376)之间切换。3723858078 是效率较低的计划(每次执行执行 25M 左右的逻辑 I/O),而计划 568322376要好 得多(每次执行只执行大约 120K 的逻辑I/O)。
当绑定变量窥探(bind variable peeking)出现问题时,我们通常会看到这种现象,即在两个或三个选项之间来回切换执行计划。这篇文章不是专门 讨论绑定变量窥探问题的,因此我不会深入介绍所有细节(如果你想了解更多信息,只需在此站点上搜索相关内容)。但我要说的是,无论使用何种 绑定变量(在本例中为 568322376 的执行计划),单个计划通常是可以接受的。使用 Outline、SQL Profile 或 Baseline(取决于您使用的 Oracle 版本) 让 Oracle 使用特定计划非常容易。而我们在这个案例中正是通过这种方式解决了问题。
1: http://kerryosborne.oracle-guy.com/2008/10/08/unstable-plans/
![](https://images.cnblogs.com/cnblogs_com/kerrycode/1913302/o_240731062102_kerrycode.png)
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 本地部署 DeepSeek:小白也能轻松搞定!
· 传国玉玺易主,ai.com竟然跳转到国产AI
· 自己如何在本地电脑从零搭建DeepSeek!手把手教学,快来看看! (建议收藏)
· 我们是如何解决abp身上的几个痛点
· 如何基于DeepSeek开展AI项目
2015-02-10 SSRS ReportServer Database 的Blocking问题
2015-02-10 [转载]DBA的特质第二部分:性格
2015-02-10 [转载]DBA的特质第一部分:技术
2015-02-10 DPA/Ignite由于DNS问题导致连接不上被监控的数据库服务器