Oracle's Solution for Bind Variables on Columns with Skewed Data
2019-08-27 00:32 AlfredZhao
1. Background
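As we know, Oracle strongly recommends bind variables in traditional OLTP (online transaction processing) systems, because they reduce hard parsing and so raise the concurrency the system can sustain. On some legacy systems that were built without bind variables, where concurrency later grew and the application could no longer be changed, the operations DBA may even be forced to set cursor_sharing=force so that the system substitutes bind variables for literals (this is a last resort, not a best practice). Bind variables bring OLTP systems great benefits, but they also bring some thorny problems. The most typical one: because the SQL text contains bind variables, the optimizer cannot know which concrete values they stand for and can only fall back on default selectivities, so it may misjudge the selectivity and choose the wrong execution plan.
Oracle has had an answer to this problem since the 9i era: bind peeking. With the feature enabled, when a SQL statement with bind variables is hard parsed for the first time, the optimizer peeks at the actual values, estimates the selectivity accurately, and chooses the right execution plan. But the feature introduces another thorny problem. After the first hard parse, every later execution is a soft parse or a soft-soft parse, so the actual bind values are never peeked again. If the values in the column are unevenly distributed, this can easily cause performance problems (and the problem is worse if the first peeked value represents the minority case). So although Oracle enables bind peeking by default, many customers' production best practices turn it off.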
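It was not until Oracle 11g that adaptive cursor sharing (ACS) was introduced; combined with bind peeking, it is the first real solution to this problem. It is still not perfect. ACS itself adds extra hard parses and produces more child cursors, so soft parses spend longer scanning the cursor chain and the shared pool needs more space. Early releases also had many bugs, so even though Oracle enables this feature by default as well, many customers disable it in production.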
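Against this background I consulted Zhao Yong, our company's SQL tuning expert. His advice: when bind variables are used on a column with skewed data, talk to the developers promptly about whether bind variables can be dropped on such severely skewed columns. If the column holds many distinct values, so that dropping bind variables would cause a large number of hard parses, the application can inspect the incoming value before issuing the SQL: for an atypical value it issues the SQL with a literal, and for a typical value it issues the statement with a bind variable.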
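The routing idea in this advice can be sketched in PL/SQL. This is only an illustration, not the expert's actual code: it reuses the JINGYU.T_SKEW table from section 2, and it assumes that the application already knows that OBJECT_ID = 2 is the atypical (rare) value. The variable names p_id and l_cnt are hypothetical.

```sql
set serveroutput on
declare
  p_id  number := 2;   -- value passed in by the application (hypothetical)
  l_cnt number;
begin
  if p_id = 2 then
    -- Atypical value: use a literal so the optimizer can consult the histogram.
    -- This statement has its own SQL text and therefore its own cursor.
    execute immediate
      'select count(*) from jingyu.t_skew where object_id = 2'
      into l_cnt;
  else
    -- Typical values: share one cursor through the bind variable.
    execute immediate
      'select count(*) from jingyu.t_skew where object_id = :v1'
      into l_cnt using p_id;
  end if;
  dbms_output.put_line('count = ' || l_cnt);
end;
/
```

The literal branch costs one extra hard parse per atypical value, so the approach only pays off when the set of atypical values is small and known in advance.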
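What if the application cannot be changed? The options I can think of so far are these. One is to sacrifice execution efficiency for atypical values: to keep an atypical value from being peeked first and causing worse performance consequences, the plan can be pinned to the one for typical values. The other is simply to try enabling both bind peeking and ACS, and to verify by actual testing whether that solves the problem without causing other performance issues (on a production system where these features are already disabled, enable them only after careful testing).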
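The text does not say which mechanism is meant by pinning the plan. One possible way on 11g, shown here purely as an assumption, is a SQL plan baseline loaded with DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE. The SQL_ID and plan hash value are the ones that appear in section 3 for the typical value (INDEX FAST FULL SCAN); the plan must still be in the cursor cache when the block runs.

```sql
set serveroutput on
declare
  l_loaded pls_integer;
begin
  -- Load the typical-value plan of the test SQL as a fixed baseline plan.
  l_loaded := dbms_spm.load_plans_from_cursor_cache(
                sql_id          => '7mz2mhz0nq92n',
                plan_hash_value => 2333720604,
                fixed           => 'YES');
  dbms_output.put_line('plans loaded: ' || l_loaded);
end;
/
```

Baselines are only used when optimizer_use_sql_plan_baselines is TRUE, and the caller needs the ADMINISTER SQL MANAGEMENT OBJECT privilege. As with the other options, this should be tested before being applied to production.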
2. Building the test case
The following simple test case illustrates the solution Oracle provides for this scenario (bind peeking + ACS):
--Create table T_SKEW with severely skewed data:
create table jingyu.t_skew as select * from dba_objects;
create index jingyu.idx_t_skew on jingyu.t_skew(object_id);
update jingyu.t_skew set object_id=3 where object_id>3;
commit;
--Check how skewed column OBJECT_ID is:
select object_id, count(*) from jingyu.t_skew group by object_id;
OBJECT_ID COUNT(*)
---------- ----------
2 1
3 86412
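To make the skew easier to read, the same aggregate can also report each value's share of the table. This is an optional sketch; the column aliases cnt and pct are illustrative only.

```sql
-- Row count and percentage share per OBJECT_ID value
select object_id,
       count(*) as cnt,
       round(ratio_to_report(count(*)) over () * 100, 4) as pct
from   jingyu.t_skew
group  by object_id
order  by object_id;
```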
--Gather statistics:
exec dbms_stats.gather_table_stats('JINGYU','T_SKEW');
--Check the histogram information for column OBJECT_ID:
select owner, table_name, column_name, histogram from dba_tab_col_statistics where table_name = 'T_SKEW' and column_name = 'OBJECT_ID';
OWNER Name Name HISTOGRAM
------------------------------------------------------------ --------------- ------------------------- ------------------------------
JINGYU T_SKEW OBJECT_ID FREQUENCY
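The query above only confirms that a FREQUENCY histogram exists. A sketch for looking at the buckets themselves follows; in a frequency histogram ENDPOINT_NUMBER is cumulative, so subtracting the previous endpoint gives the count recorded for each value. The alias bucket_rows is hypothetical, and the counts come from the statistics sample rather than from the full table.

```sql
-- One row per distinct value in the frequency histogram
select endpoint_value as object_id,
       endpoint_number,
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number) as bucket_rows
from   dba_tab_histograms
where  owner = 'JINGYU'
and    table_name = 'T_SKEW'
and    column_name = 'OBJECT_ID'
order  by endpoint_number;
```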
Query the statistics with the script provided in MOS note SCRIPT - Select to show Optimizer Statistics for CBO (Doc ID 31412.1):
SQL> @sosi
SQL> set echo off
Please enter Name of Table Owner (Null = SYS): jingyu
Please enter Table Name to show Statistics for: t_skew
***********
Table Level
***********
Table Number Empty Average Chain Average Global User Sample Date
Name of Rows Blocks Blocks Space Count Row Len Stats Stats Size MM-DD-YYYY
--------------- ------------------ -------- ------------ ------- -------- ------- ------ ------ ------------------ ----------
T_SKEW 86,413 1,262 0 0 0 96 YES NO 86,413 08-26-2019
Column Column Distinct Number Number Global User Sample Date
Name Details Values Density Buckets Nulls Stats Stats Size MM-DD-YYYY
------------------------- ------------------------ ------------ ------- ------- ---------- ------ ------ ------------------ ----------
OWNER VARCHAR2(30) 27 0 1 0 YES NO 86,413 08-26-2019
OBJECT_NAME VARCHAR2(128) 51,864 0 1 0 YES NO 86,413 08-26-2019
SUBOBJECT_NAME VARCHAR2(30) 87 0 1 86,152 YES NO 261 08-26-2019
OBJECT_ID NUMBER(22) 2 0 2 0 YES NO 5,389 08-26-2019
DATA_OBJECT_ID NUMBER(22) 8,670 0 1 77,703 YES NO 8,710 08-26-2019
OBJECT_TYPE VARCHAR2(19) 44 0 1 0 YES NO 86,413 08-26-2019
CREATED DATE 904 0 1 0 YES NO 86,413 08-26-2019
LAST_DDL_TIME DATE 995 0 1 0 YES NO 86,413 08-26-2019
TIMESTAMP VARCHAR2(19) 1,036 0 1 0 YES NO 86,413 08-26-2019
STATUS VARCHAR2(7) 2 1 1 0 YES NO 86,413 08-26-2019
TEMPORARY VARCHAR2(1) 2 1 1 0 YES NO 86,413 08-26-2019
GENERATED VARCHAR2(1) 2 1 1 0 YES NO 86,413 08-26-2019
SECONDARY VARCHAR2(1) 2 1 1 0 YES NO 86,413 08-26-2019
NAMESPACE NUMBER(22) 20 0 1 0 YES NO 86,413 08-26-2019
EDITION_NAME VARCHAR2(30) 0 0 0 86,413 YES NO 08-26-2019
B Average Average
Index Tree Leaf Distinct Number Leaf Blocks Data Blocks Cluster Global User Sample Date
Name Unique Level Blks Keys of Rows Per Key Per Key Factor Stats Stats Size MM-DD-YYYY
--------------- --------- ----- ---- -------------- ------------------ ----------- ----------- ------------ ------ ------ ------------------ ----------
IDX_T_SKEW NONUNIQUE 1 298 2 86,413 149 617 1,234 YES NO 86,413 08-26-2019
Index Column Col Column
Name Name Pos Details
--------------- ------------------------- ---- ------------------------
IDX_T_SKEW OBJECT_ID 1 NUMBER(22)
***************
Partition Level
***************
***************
SubPartition Level
***************
SQL>
3. Scenario test
3.1 First confirm that bind peeking and ACS are both enabled
--Query the hidden parameters:
set linesize 333
col name for a35
col description for a66
col value for a30
SELECT i.ksppinm name,
i.ksppdesc description,
CV.ksppstvl VALUE
FROM sys.x$ksppi i, sys.x$ksppcv CV
WHERE i.inst_id = USERENV ('Instance')
AND CV.inst_id = USERENV ('Instance')
AND i.indx = CV.indx
AND i.ksppinm LIKE '%&param%'
ORDER BY REPLACE (i.ksppinm, '_', '');
--Default values of the relevant hidden parameters (bind peeking and ACS are both enabled):
NAME DESCRIPTION VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_optim_peek_user_binds enable peeking of user binds TRUE
_optimizer_adaptive_cursor_sharing optimizer adaptive cursor sharing TRUE
_optimizer_extended_cursor_sharing optimizer extended cursor sharing UDO
_optimizer_extended_cursor_sharing_ optimizer extended cursor sharing for relational operators SIMPLE
rel
3.2 Test case and results
--1) Test case
alter session set current_schema=jingyu;
alter session set statistics_level=all;
set lines 200 pages 200
var v1 number;
exec :v1 := 2;
select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
exec :v1 := 3;
select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
--2) Test results
SQL> alter system flush shared_pool;
SQL> alter session set current_schema=jingyu;
SQL> alter session set statistics_level=all;
SQL> set lines 200 pages 200
SQL>
--Bind value 2, first execution: uses the INDEX RANGE SCAN plan, Plan hash value: 3167530345:
SQL> var v1 number;
SQL> exec :v1 := 2;
SQL> select count(*) from t_skew where object_id = :v1;
COUNT(*)
----------
1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1
Plan hash value: 3167530345
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 2 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 2 | INDEX RANGE SCAN| IDX_T_SKEW | 1 | 16 | 1 |00:00:00.01 | 2 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=:V1)
--Bind value 3, first execution: reuses the INDEX RANGE SCAN plan, Plan hash value: 3167530345:
SQL> exec :v1 := 3;
SQL> select count(*) from t_skew where object_id = :v1;
COUNT(*)
----------
86412
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1
Plan hash value: 3167530345
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | | 2 |00:00:00.10 | 301 |
| 1 | SORT AGGREGATE | | 2 | 1 | 2 |00:00:00.10 | 301 |
|* 2 | INDEX RANGE SCAN| IDX_T_SKEW | 2 | 16 | 86413 |00:00:00.06 | 301 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=:V1)
--Bind value 3, second execution: switches to the INDEX FAST FULL SCAN plan, Plan hash value: 2333720604:
SQL> select count(*) from t_skew where object_id = :v1;
COUNT(*)
----------
86412
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7mz2mhz0nq92n, child number 1
-------------------------------------
select count(*) from t_skew where object_id = :v1
Plan hash value: 2333720604
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.07 | 502 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.07 | 502 |
|* 2 | INDEX FAST FULL SCAN| IDX_T_SKEW | 1 | 86389 | 86412 |00:00:00.04 | 502 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_ID"=:V1)
SQL>
As you can see, on the second execution with bind value 3 the execution plan was adjusted adaptively.
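One detail when reading the plans in 3.2: the 'allstats' format without 'last' reports statistics accumulated over every execution of the child cursor, which is why the plan for child 0 shows Starts = 2 and A-Rows = 86413 (1 + 86412) after the second run. A sketch of the same step that reports only the most recent execution, using the LAST modifier of DBMS_XPLAN.DISPLAY_CURSOR:

```sql
-- Same session setup as the test case in 3.2
alter session set current_schema=jingyu;
alter session set statistics_level=all;
var v1 number;
exec :v1 := 3;
select count(*) from t_skew where object_id = :v1;
-- 'allstats last' limits Starts, A-Rows and Buffers to the last execution
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
```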
3.3 Deeper analysis of the test
You can use the V$ views for adaptive cursor sharing to see selectivity ranges, cursor information (such as whether a cursor is bind-aware or bind-sensitive), and execution statistics:
V$SQL shows whether a cursor is bind-sensitive or bind-aware
V$SQL_CS_HISTOGRAM shows the distribution of the execution count across a three-bucket execution history histogram
V$SQL_CS_SELECTIVITY shows the selectivity ranges stored for every predicate containing a bind variable if the selectivity was used to check cursor sharing
V$SQL_CS_STATISTICS summarizes the information that the optimizer uses to determine whether to mark a cursor bind-aware.
Use v$sql to view child_number, executions, buffer_gets, bind-sensitive, bind-aware and is_shareable for the SQL (SQL_ID = '7mz2mhz0nq92n'):
SQL> SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BS",
2 IS_BIND_AWARE AS "BA", IS_SHAREABLE AS "SH", PLAN_HASH_VALUE
3 FROM V$SQL
4 WHERE SQL_ID = '7mz2mhz0nq92n';
CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
0 2 348 Y N N 3167530345
1 1 502 Y Y Y 2333720604
--Run the SQL again, first with bind value 3 and then with bind value 2:
SQL> select count(*) from t_skew where object_id = :v1;
COUNT(*)
----------
86412
SQL> exec :v1 := 2;
SQL> select count(*) from t_skew where object_id = :v1;
COUNT(*)
----------
1
--Query v$sql again
CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
0 2 348 Y N N 3167530345
1 2 1004 Y Y Y 2333720604
2 1 2 Y Y Y 3167530345
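The parent cursor of this SQL now has three child cursors (child_number 0, 1 and 2). Children 1 and 2 have SH = Y, meaning they are shareable; child 0 has SH = N, meaning it is no longer shareable.
Query the ACS-related information through the v$sql_cs_* views: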
--V$SQL_CS_HISTOGRAM
SQL> select * from V$SQL_CS_HISTOGRAM where sql_id = '7mz2mhz0nq92n';
ADDRESS HASH_VALUE SQL_ID CHILD_NUMBER BUCKET_ID COUNT
---------------- ---------- -------------------------- ------------ ---------- ----------
0000000087F34700 3242927188 7mz2mhz0nq92n 2 0 1
0000000087F34700 3242927188 7mz2mhz0nq92n 2 1 0
0000000087F34700 3242927188 7mz2mhz0nq92n 2 2 0
0000000087F34700 3242927188 7mz2mhz0nq92n 1 0 0
0000000087F34700 3242927188 7mz2mhz0nq92n 1 1 2
0000000087F34700 3242927188 7mz2mhz0nq92n 1 2 0
0000000087F34700 3242927188 7mz2mhz0nq92n 0 0 1
0000000087F34700 3242927188 7mz2mhz0nq92n 0 1 1
0000000087F34700 3242927188 7mz2mhz0nq92n 0 2 0
--V$SQL_CS_SELECTIVITY
SQL> col PREDICATE for a30
SQL> select * from V$SQL_CS_SELECTIVITY where sql_id = '7mz2mhz0nq92n';
ADDRESS HASH_VALUE SQL_ID CHILD_NUMBER PREDICATE RANGE_ID LOW HIGH
---------------- ---------- -------------------------- ------------ ------------------------------ ---------- -------------------- --------------------
0000000087F34700 3242927188 7mz2mhz0nq92n 2 =V1 0 0.000167 0.000204
0000000087F34700 3242927188 7mz2mhz0nq92n 1 =V1 0 0.899749 1.099694
SQL>
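The LOW and HIGH values above can be related back to the optimizer's estimates. The range is commonly described as the peeked selectivity minus and plus 10%, so the midpoint of LOW and HIGH approximates the selectivity itself. The sketch below assumes that rule and multiplies the midpoint by the table's NUM_ROWS (86,413 in the statistics from section 2); the aliases approx_selectivity and approx_rows are hypothetical.

```sql
-- LOW and HIGH are stored as strings, hence to_number
-- (assumes '.' is the session's decimal separator)
select child_number,
       predicate,
       low,
       high,
       round((to_number(low) + to_number(high)) / 2, 6) as approx_selectivity,
       round((to_number(low) + to_number(high)) / 2 * 86413) as approx_rows
from   v$sql_cs_selectivity
where  sql_id = '7mz2mhz0nq92n'
order  by child_number;
```

If the assumption holds, approx_rows should come out close to the E-Rows values seen in the plans of 3.2 (86389 for the fast full scan child and 16 for the range scan child).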
--V$SQL_CS_STATISTICS
SQL> select * from V$SQL_CS_STATISTICS where sql_id = '7mz2mhz0nq92n';
ADDRESS HASH_VALUE SQL_ID CHILD_NUMBER BIND_SET_HASH_VALUE PE EXECUTIONS ROWS_PROCESSED BUFFER_GETS CPU_TIME
---------------- ---------- -------------------------- ------------ ------------------- -- ---------- -------------- ----------- ----------
0000000087F34700 3242927188 7mz2mhz0nq92n 2 2064090006 Y 1 4 2 0
0000000087F34700 3242927188 7mz2mhz0nq92n 1 2706503459 Y 1 172826 502 0
0000000087F34700 3242927188 7mz2mhz0nq92n 0 2064090006 Y 1 4 49 0
SQL>
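4. Summary
A summary of the points touched on in this experiment:
4.1 Purging the execution plan of a single SQL
--Look up the ADDRESS and HASH_VALUE of the SQL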
SQL> select sql_id, ADDRESS, HASH_VALUE from v$sqlarea where sql_id = '7mz2mhz0nq92n';
SQL_ID ADDRESS HASH_VALUE
-------------------------- ---------------- ----------
7mz2mhz0nq92n 0000000087F34700 3242927188
--Purge the execution plan of the SQL
SQL> exec sys.DBMS_SHARED_POOL.PURGE('0000000087F34700,3242927188','C');
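The two steps can also be combined so that the address and hash value do not have to be copied by hand. A minimal sketch, assuming the SQL_ID is still in the shared pool (otherwise the SELECT raises NO_DATA_FOUND); the variable name l_name is hypothetical.

```sql
declare
  l_name varchar2(100);
begin
  -- Build the 'ADDRESS,HASH_VALUE' string that PURGE expects for a cursor
  select address || ',' || hash_value
  into   l_name
  from   v$sqlarea
  where  sql_id = '7mz2mhz0nq92n';

  -- 'C' marks the object as a cursor
  sys.dbms_shared_pool.purge(l_name, 'C');
end;
/
```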
4.2 Disabling bind peeking and ACS
--All of these are dynamic parameters
--bind peeking
alter system set "_optim_peek_user_binds"=false;
--ACS (adaptive cursor sharing)
alter system set "_optimizer_extended_cursor_sharing_rel"=NONE;
alter system set "_optimizer_extended_cursor_sharing"=NONE;
alter system set "_optimizer_adaptive_cursor_sharing"=false;
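For completeness, a sketch of the reverse operation. It simply sets the four parameters back to the default values listed in 3.1 (TRUE, TRUE, UDO, SIMPLE); those values come from the 11.2.0.4 test environment in this article and should be checked against your own version before use.

```sql
--bind peeking
alter system set "_optim_peek_user_binds"=true;
--ACS (adaptive cursor sharing)
alter system set "_optimizer_adaptive_cursor_sharing"=true;
alter system set "_optimizer_extended_cursor_sharing"=UDO;
alter system set "_optimizer_extended_cursor_sharing_rel"=SIMPLE;
```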
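Important: if bind peeking is disabled, ACS in effect does nothing either. For example, here I set only _optim_peek_user_binds to false and repeated the experiment from 3.2. The query results are as follows; ACS is not used, even though I did not explicitly disable the ACS parameters: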
SQL> SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BS",
2 IS_BIND_AWARE AS "BA", IS_SHAREABLE AS "SH", PLAN_HASH_VALUE
3 FROM V$SQL
4 WHERE SQL_ID = '7mz2mhz0nq92n';
CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
0 3 1506 N N Y 2333720604
--All three executions used the same plan: under OPT_PARAM('_optim_peek_user_binds' 'false') the INDEX FAST FULL SCAN plan was chosen, Plan hash value: 2333720604:
SQL> select * from table(dbms_xplan.display_cursor('7mz2mhz0nq92n',0,'advanced'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1
Plan hash value: 2333720604
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 82 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FAST FULL SCAN| IDX_T_SKEW | 43207 | 126K| 82 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T_SKEW@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
DB_VERSION('11.2.0.4')
OPT_PARAM('_optim_peek_user_binds' 'false')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
INDEX_FFS(@"SEL$1" "T_SKEW"@"SEL$1" ("T_SKEW"."OBJECT_ID"))
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OBJECT_ID"=:V1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
So when checking whether ACS is in effect, also check the bind peeking setting.
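A sketch of a single query that checks all four settings at once, based on the hidden-parameter query from 3.1 with the name filter replaced by an explicit list. Like the original query, it reads the x$ tables and therefore has to be run as SYS.

```sql
set linesize 200
col name for a40
col value for a30
-- Bind peeking and ACS parameters in one result set
select i.ksppinm name,
       cv.ksppstvl value
from   sys.x$ksppi i, sys.x$ksppcv cv
where  i.inst_id = userenv('Instance')
and    cv.inst_id = userenv('Instance')
and    i.indx = cv.indx
and    i.ksppinm in ('_optim_peek_user_binds',
                     '_optimizer_adaptive_cursor_sharing',
                     '_optimizer_extended_cursor_sharing',
                     '_optimizer_extended_cursor_sharing_rel')
order  by i.ksppinm;
```

ACS can only take effect when _optim_peek_user_binds is TRUE in addition to the three ACS parameters being at their enabled values.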