oracle 验证流水存在性火箭试优化
在生产中经常遇到“select * from tbl_IsExist where date=?”的SQL,经与开发人员沟通得知此SQL是验证流水存在性,若不存在则插入,若存在退出。
前台根据list的size值来判断符合条件的数据是否存在。我们构建测试表分析验证流水存在性的优化方向。
1、构建测试表
--创建测试表 drop table tbl_IsExist purge; create table tbl_IsExist (id number not null ,in_create date,in_remark varchar2(1000), primary key(id) ); --插入测试数据 insert into tbl_IsExist select rownum,to_date(trunc(dbms_random.value(to_number(to_char(sysdate-5,'J')),to_number(to_char(sysdate,'J')))),'J'),rpad('*',1000,'*') from dual connect by rownum<20000; --根据生产创建索引 SQL> create index ind_isexist_create on tbl_isexist(in_create); Index created.
由于生产环境date的数据是均匀分配的,对于in_create字段的处理符合生产规则。执行生产中捕获的SQL,查看其执行计划。
2、发现验证存在性的SQL执行计划是全表扫描,代价很高:
--设置输出参数 SQL> set linesize 2000 SQL> set autotrace on; --执行查询脚本 SQL> select * from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS'); 3962 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2271097464 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3269 | 1672K| 777 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL | TBL_ISEXIST | 3269 | 1672K| 777 (1)| 00:00:01 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3098 consistent gets 0 physical reads 0 redo size 90787 bytes sent via SQL*Net to client 3362 bytes received via SQL*Net from client 266 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3962 rows processed
表上有供选择的索引,但是SQL并没有执行,此SQL的执行计划是全表扫描,cost为777,逻辑读3098数据块。若表中的记录很多,全表扫描的执行计划将耗用大量的逻辑读。
3、开始优化,验证流水存在性SQL
①优化思路1:
表中有可用索引,优化器未采纳是否由于代价高,进一步验证:
用hint强制走索引,查看执行计划 SQL> select /*+ INDEX(T IND_ISEXIST_CREATE)*/* from tbl_isexist T where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS'); 3962 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1453123986 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3269 | 1672K| 2004 (1)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TBL_ISEXIST | 3269 | 1672K| 2004 (1)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ISEXIST_CREATE | 3269 | | 10 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2648 consistent gets 0 physical reads 0 redo size 4126831 bytes sent via SQL*Net to client 3397 bytes received via SQL*Net from client 266 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3962 rows processed
强制走索引cost是2004比走全表扫描代价更大,oracle是基于代价的优化器,故舍弃索引扫描的执行计划
②优化思路2:
在行数上做文章,验证流水存在性是否需要将所有记录都返回。因为在java中根据list的值不为0判断存在性,设想满足条件的第一条记录存在,岂不已经验证存在性。
现改写SQL查看其执行计划:
--只取第一条符合条件的记录,加限制条件rownum=1 SQL> select * from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS') and rownum=1; Execution Plan ---------------------------------------------------------- Plan hash value: 1281543153 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 524 | 2 (0)| 00:00:01 | |* 1 | COUNT STOPKEY | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | TBL_ISEXIST | 3269 | 1672K| 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IND_ISEXIST_CREATE | | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM=1) 3 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 1725 bytes sent via SQL*Net to client 471 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
我们由取全部符合条件的记录,改为只取第一行符合条件的记录,cost由777降为2,逻辑读由3098降为3,提速几百倍,甚是欣喜。高兴之余,试想是否还有可优化空间。分析执行计划:INDEX RANGE SCAN花去了1个cost,TABLE ACCESS BY INDEX ROWID BATCHED也就是回表花去了1个cost(2-1)。
回表的原理:查询的列不在索引中,需要通过索引的rowid定位表记录,把需要的字段取出,展示给客户端。
③优化思路3:
不回表降低cost,在SQL返回的列上做文章。只显示索引“IND_ISEXIST_CREATE”的列in_create,既能降低cost也能满足需求,完全可以。
SQL> select in_create from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS') and rownum=1; Execution Plan ---------------------------------------------------------- Plan hash value: 3010836204 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 9 | 1 (0)| 00:00:01| |* 1 | COUNT STOPKEY | | | | | | |* 2 | INDEX RANGE SCAN | IND_ISEXIST_CREATE | 3269 | 29421 | 1 (0)| 00:00:01| ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM=1) 2 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2 consistent gets 0 physical reads 0 redo size 556 bytes sent via SQL*Net to client 479 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
至此优化结束,代价从777降到1,逻辑读从3098降到2,可谓是火箭试的提速!
其他问题请关注目录:https://www.cnblogs.com/handhead/