Oracle Update语句中多表关联中被关联表多次全表扫描问题
Oracle Update语句中多表关联中被关联表多次全表扫描问题
前言
最近优化了一个update语句中,多表关联导致表多次全表扫描的性能问题。
尝试用merge into改写后发现原来不知道多久能运行完的语句达到秒级别执行完,因为merge into可以避免多次的全表扫描。
比较好模拟,接下来模拟一下,也好记录一下有些遇到的小问题。
事故模拟
这个是我实际生产优化的数据和语句的模拟。
![](https://images.cnblogs.com/OutliningIndicators/ContractedBlock.gif)
drop table t1 purge; drop table t2 purge; create table t1 as select * from dba_objects where rownum<=100; create table t2 as select * from dba_objects where rownum<=50;
10:34:30 SYS@zkm(1)> drop table t1 purge; Table dropped. Elapsed: 00:00:00.01 10:35:27 SYS@zkm(1)> drop table t2 purge; Table dropped. Elapsed: 00:00:00.01 10:35:27 SYS@zkm(1)> create table t1 as select * from dba_objects where rownum<=100; Table created. Elapsed: 00:00:00.01 10:35:27 SYS@zkm(1)> create table t2 as select * from dba_objects where rownum<=50; Table created. Elapsed: 00:00:00.01
update的语句如下:
update t1 set (t1.owner, t1.object_name) = (select t2.owner, t2.object_name from t2 where t1.object_id = t2.object_id) where exists (select 1 from t2 where t1.object_id = t2.object_id);
开启statistics_level为all,看看执行后对被关联表t2的执行次数是多少次。
10:43:41 SYS@zkm(1)> set pagesize 9999 long 9999 line 500 timing on 10:45:14 SYS@zkm(1)> alter session set statistics_level=all; Session altered. Elapsed: 00:00:00.00 10:45:14 SYS@zkm(1)> update t1 10:45:19 2 set (t1.owner, t1.object_name) = 10:45:19 3 (select t2.owner, t2.object_name 10:45:19 4 from t2 10:45:19 5 where t1.object_id = t2.object_id) 10:45:19 6 where exists (select 1 from t2 where t1.object_id = t2.object_id); 50 rows updated. Elapsed: 00:00:00.00 10:45:20 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------- SQL_ID aq3ptnvdvmnpq, child number 0 ------------------------------------- update t1 set (t1.owner, t1.object_name) = (select t2.owner, t2.object_name from t2 where t1.object_id = t2.object_id) where exists (select 1 from t2 where t1.object_id = t2.object_id) Plan hash value: 1571583455 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.01 | 209 | | | | | 1 | UPDATE | T1 | 1 | | 0 |00:00:00.01 | 209 | | | | |* 2 | HASH JOIN RIGHT SEMI| | 1 | 100 | 50 |00:00:00.01 | 7 | 2440K| 2440K| 1515K (0)| | 3 | VIEW | VW_SQ_1 | 1 | 50 | 50 |00:00:00.01 | 3 | | | | | 4 | TABLE ACCESS FULL | T2 | 1 | 50 | 50 |00:00:00.01 | 3 | | | | | 5 | TABLE ACCESS FULL | T1 | 1 | 100 | 100 |00:00:00.01 | 4 | | | | |* 6 | TABLE ACCESS FULL | T2 | 50 | 1 | 50 |00:00:00.01 | 150 | | | | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."OBJECT_ID"="ITEM_1") 6 - filter("T2"."OBJECT_ID"=:B1) Note ----- - dynamic sampling used for this statement (level=2) 31 rows selected. Elapsed: 00:00:00.01 10:45:22 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.01 10:45:24 SYS@zkm(1)>
可以看到id=6的starts的值为50,表示id为6的t2全表扫描被执行了50次,在生产事故中,t2表是比较大的,并且扫描次数相对较多,因此update执行时间是非常久的。
这里id=1的子步骤为id=2和id=6,此处的id=1的UPDATE操作类似和“NESTED LOOPS”一样,id=2出有多少条符合条件的记录,则id=6就执行多少次。
比如这里的,t1被update的记录通过“where exists (select 1 from t2 where t1.object_id = t2.object_id)”后,仍有50行符合可以被update,因此这50行中每行都会去全表扫t2一次。
10:56:00 SYS@zkm(1)> select count(*) from t1 where exists (select 1 from t2 where t1.object_id = t2.object_id); COUNT(*) ---------- 50 Elapsed: 00:00:00.00
使用merge into改造后,效率变高(数据量小看不出执行时间的差别,你可以自己构造巨量数据去比较,这里主要看t2全表扫描的次数),
![](https://images.cnblogs.com/OutliningIndicators/ContractedBlock.gif)
merge into t1 using t2 on (t1.object_id = t2.object_id) when matched then update set t1.owner = t2.owner, t1.object_name = t2.object_name;
11:00:23 SYS@zkm(1)> merge into t1 11:00:23 2 using t2 11:00:23 3 on (t1.object_id = t2.object_id) 11:00:23 4 when matched then 11:00:23 5 update set t1.owner = t2.owner, t1.object_name = t2.object_name; 50 rows merged. Elapsed: 00:00:00.00 11:00:23 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------------------------- SQL_ID agt8fym6y8hug, child number 0 ------------------------------------- merge into t1 using t2 on (t1.object_id = t2.object_id) when matched then update set t1.owner = t2.owner, t1.object_name = t2.object_name Plan hash value: 2683531971 ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------ | 0 | MERGE STATEMENT | | 1 | | 0 |00:00:00.01 | 59 | | | | | 1 | MERGE | T1 | 1 | | 0 |00:00:00.01 | 59 | | | | | 2 | VIEW | | 1 | | 50 |00:00:00.01 | 7 | | | | |* 3 | HASH JOIN | | 1 | 50 | 50 |00:00:00.01 | 7 | 870K| 870K| 1289K (0)| | 4 | TABLE ACCESS FULL| T2 | 1 | 50 | 50 |00:00:00.01 | 3 | | | | | 5 | TABLE ACCESS FULL| T1 | 1 | 100 | 100 |00:00:00.01 | 4 | | | | ------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID") Note ----- - dynamic sampling used for this statement (level=2) 27 rows selected. Elapsed: 00:00:00.01 11:00:26 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00
对比可以知道,merge into的t2只扫描了一次,并且其他维度比如Buffers,OMem,1Mem,Used-Mem都减少了。
merge into的限制
重新构造另外的数据环境。
![](https://images.cnblogs.com/OutliningIndicators/ContractedBlock.gif)
drop table t1 purge; drop table t2 purge; create table t1 ( id number,name varchar2(20)); create table t2 ( id number,name varchar2(20)); insert into t1 values (1,'t1a'); insert into t1 values (2,'t1b'); insert into t1 values (3,'t1c'); insert into t1 values (4,'t1d'); insert into t1 values (5,'t1e'); insert into t1 values (6,'t1f'); insert into t2 values (2,'t2a'); insert into t2 values (3,'t2b'); insert into t2 values (4,'t2c'); insert into t2 values (5,'t2d'); insert into t2 values (6,'t2e'); insert into t2 values (7,'t2f'); commit;
11:04:36 SYS@zkm(1)> drop table t1 purge; Table dropped. Elapsed: 00:00:00.01 11:04:42 SYS@zkm(1)> drop table t2 purge; Table dropped. Elapsed: 00:00:00.01 11:04:43 SYS@zkm(1)> create table t1 ( id number,name varchar2(20)); Table created. Elapsed: 00:00:00.01 11:04:49 SYS@zkm(1)> create table t2 ( id number,name varchar2(20)); Table created. Elapsed: 00:00:00.00 11:04:49 SYS@zkm(1)> insert into t1 values (1,'t1a'); 1 row created. Elapsed: 00:00:00.00 11:04:52 SYS@zkm(1)> insert into t1 values (2,'t1b'); 1 row created. Elapsed: 00:00:00.00 11:04:52 SYS@zkm(1)> insert into t1 values (3,'t1c'); 1 row created. Elapsed: 00:00:00.01 11:04:52 SYS@zkm(1)> insert into t1 values (4,'t1d'); 1 row created. Elapsed: 00:00:00.00 11:04:52 SYS@zkm(1)> insert into t1 values (5,'t1e'); 1 row created. Elapsed: 00:00:00.00 11:04:52 SYS@zkm(1)> insert into t1 values (6,'t1f'); 1 row created. Elapsed: 00:00:00.00 11:04:53 SYS@zkm(1)> insert into t2 values (2,'t2a'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> insert into t2 values (3,'t2b'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> insert into t2 values (4,'t2c'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> insert into t2 values (5,'t2d'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> insert into t2 values (6,'t2e'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> insert into t2 values (7,'t2f'); 1 row created. Elapsed: 00:00:00.00 11:04:56 SYS@zkm(1)> commit; Commit complete. Elapsed: 00:00:00.00 11:04:57 SYS@zkm(1)> select * from t1; ID NAME ---------- ------------------------------------------------------------ 1 t1a 2 t1b 3 t1c 4 t1d 5 t1e 6 t1f 6 rows selected. Elapsed: 00:00:00.00 11:05:15 SYS@zkm(1)> select * from t2; ID NAME ---------- ------------------------------------------------------------ 2 t2a 3 t2b 4 t2c 5 t2d 6 t2e 7 t2f 6 rows selected. Elapsed: 00:00:00.00
对于update语句,
update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);
无法改造成merge into,如下:
merge into t1 using t2 on (t1.id = t2.id) when matched then update set t1.id = t2.id, t1.name = t2.name;
因为merge into中不能对匹配字段id进行update,会报ORA-38104,
11:12:44 SYS@zkm(1)> merge into t1 11:13:18 2 using t2 11:13:18 3 on (t1.id = t2.id) 11:13:18 4 when matched then 11:13:18 5 update set t1.id = t2.id, t1.name = t2.name; on (t1.id = t2.id) * ERROR at line 3: ORA-38104: Columns referenced in the ON Clause cannot be updated: "T1"."ID" Elapsed: 00:00:00.00
这种情况下,我只能想到在t2的id字段创建索引,避免多次的全表扫描。
11:15:33 SYS@zkm(1)> set pagesize 9999 long 9999 line 500 timing on 11:15:34 SYS@zkm(1)> alter session set statistics_level=all; Session altered. Elapsed: 00:00:00.00 11:15:34 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id); 5 rows updated. Elapsed: 00:00:00.00 11:15:35 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID c124snsjck4d6, child number 1 ------------------------------------- update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id) Plan hash value: 2611650519 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 1 | | 0 |00:00:00.01 | 22 | | 1 | UPDATE | T1 | 1 | | 0 |00:00:00.01 | 22 | |* 2 | FILTER | | 1 | | 5 |00:00:00.01 | 9 | | 3 | TABLE ACCESS FULL | T1 | 1 | 6 | 6 |00:00:00.01 | 3 | |* 4 | INDEX RANGE SCAN | IDX_ID | 6 | 1 | 5 |00:00:00.01 | 6 | | 5 | TABLE ACCESS BY INDEX ROWID| T2 | 5 | 1 | 5 |00:00:00.01 | 10 | |* 6 | INDEX RANGE SCAN | IDX_ID | 5 | 1 | 5 |00:00:00.01 | 5 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter( IS NOT NULL) 4 - access("T2"."ID"=:B1) 6 - access("T2"."ID"=:B1) Note ----- - dynamic sampling used for this statement (level=2) 31 rows selected. Elapsed: 00:00:00.01 11:15:36 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00
注意点
一开始调试的时候,我以为where条件是可以去掉的,但其实不对。
什么意思呢?比如
update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);
我以为是和
update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id);
等价的,但是其实不等价。
实际执行看看就知道了,
11:19:34 SYS@zkm(1)> drop index idx_id; Index dropped. Elapsed: 00:00:00.01 11:22:37 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id); 5 rows updated. Elapsed: 00:00:00.01 11:22:39 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00 11:22:42 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id); 6 rows updated. Elapsed: 00:00:00.00 11:22:46 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00
为什么?
where exists实际上是锁定其批量更新数据的范围,update的时候是根据被update的表t1的每一条数据去做更新的。
比如语句update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id),没有任何条件的话那么肯定是全部的行都要被进行update的,在子查询中如果不符合t1.id=t2.id的行,被更新字段是会被值为null的,如下:
11:37:31 SYS@zkm(1)> select * from t1; ID NAME ---------- ------------------------------------------------------------ 1 t1a 2 t1b 3 t1c 4 t1d 5 t1e 6 t1f 6 rows selected. Elapsed: 00:00:00.00 11:37:33 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id); 6 rows updated. Elapsed: 00:00:00.00 11:37:36 SYS@zkm(1)> select * from t1; ID NAME ---------- ------------------------------------------------------------ 2 t2a 3 t2b 4 t2c 5 t2d 6 t2e 6 rows selected. Elapsed: 00:00:00.00 11:37:39 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00
因为id=1在t2中匹配不到(可以理解为t2表中为null),但是又必须被更新(前边说了没限定条件是要全部被更新的),所以被设置为null。
可以这么看,当取出t1表中第一行(id=1,name=t1a)后,将该行中t1.id和t2.id进行等价匹配(这次进行一次t2的全表扫描),发现没有符合条件的数据,因此只能都设置为null。
因此,where exists的限定是必须要有的,我们来看看,
11:40:55 SYS@zkm(1)> select * from t1; ID NAME ---------- ------------------------------------------------------------ 1 t1a 2 t1b 3 t1c 4 t1d 5 t1e 6 t1f 6 rows selected. Elapsed: 00:00:00.00 11:41:09 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id); 5 rows updated. Elapsed: 00:00:00.00 11:41:10 SYS@zkm(1)> select * from t1; ID NAME ---------- ------------------------------------------------------------ 1 t1a 2 t2a 3 t2b 4 t2c 5 t2d 6 t2e 6 rows selected. Elapsed: 00:00:00.00 11:41:14 SYS@zkm(1)> rollback; Rollback complete. Elapsed: 00:00:00.00
可以这么看,对t1全表扫描后,取出的数据首先判断是否满足where exists的条件。
比如第一行(id=1,name=t1a)不满足where exists (select t2.id,t2.name from t2 where t1.id=t2.id),因此,不对这一行做update。
剩下的5行全部满足条件,所以进行update。