Oracle :Insert ways.
2011-07-19 11:44 Tracy. 阅读(272) 评论(0) 编辑 收藏 举报You Asked
I'am working on enhancing the performance of some long running queries in our application. The existing code uses the NOT EXISTS clause extensively and I wondered if it would be a good idea to use OUTER JOINS instead, wherever possible. For example, all else being equal, which of the following would execute faster ? Insert into Title(......) select ..... from rollup R where not exists ( select 'Found' from title T where R.source_id = T.Title_ID); Insert into Title(.......) select ...... from title T, rollup R where R.source_id = T.Title_id(+) and T.Title_id is null; Note that Rollup has about 5 million rows and Title about 1 million rows and both are indexed on the join fields. In trials, I did notice that the latter was about 30% faster but does that have a theoritical basis ? Would appreciate an early reply. Thanks, Sunder
and we said...
There are at least 3 ways to do this above (minus might be a 4'th depending on your circumstances). The way with NOT exists works well for small OUTER tables (or outer queries that have other predicates the result in "smallish" result sets before we have to do the correlated subquery). A NOT exists works something like this: for x in ( select * from rollup ) loop if ( not exists ( that query ) ) then OUTPUT end if; end it in effect runs the subquery over and over and over.... The outer join, method number 2, allows us to do more large scale, bulk type operations. Anything we do in bulk, we can probably do faster. So, by not running that subquery over and over, we gained some efficiency. It'll depend on your indexes and such as well how well this goes. The bulk operations we do -- thats your "theroretical basis". A 3'rd alternative is a NOT IN query. As long as you understand that NOT IN and NOT EXISTS are semantically different, this might run even faster yet. NOT IN will not work correctly if the subquery returns a NULL (the result set will be empty if it does). So, we have to add a filter "where title_id is NOT NULL" to the subquery to make this be the same. Here is a very very small test showing the three methods. Existence (or lack of) indexes may effect this test and so on. You might consider the NOT IN with the HASH_AJ hint (although using the CBO should have this happen automatically -- if the NOT IN runs really really slow, its not using the HASH_AJ -- you'd have to hint it in that case). this test is not a benchmark, it is just illustrative of the three methods and contains some comments on their plans. DOC>create table t as select * from all_objects; DOC>create index t_idx on t(object_id); DOC>create table t2 as select * from all_objects where mod(object_id,50) = 0; DOC>create index t2_idx on t2(object_id); DOC>*/ ops$tkyte@DEV8I.WORLD> set autotrace on explain ops$tkyte@DEV8I.WORLD> select count(object_name) 2 from t 3 where NOT EXISTS( select null 4 from t2 5 where t2.object_id = t.object_id ) 6 / COUNT(OBJECT_NAME) ------------------ 22251 Elapsed: 00:00:00.85 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'T' 4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) Here, it full scans T and does that subquery for each row -- thats the index probe we see in there, the subquery running ops$tkyte@DEV8I.WORLD> ops$tkyte@DEV8I.WORLD> select /*+ HASH_AJ */ count(object_name) 2 from t 3 where object_id NOT IN ( select object_id 4 from t2 5 where object_id IS NOT NULL ) 6 / COUNT(OBJECT_NAME) ------------------ 22251 Elapsed: 00:00:00.67 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 INDEX (FAST FULL SCAN) OF 'T_IDX' (NON-UNIQUE) 4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) There is the "anti join" method with may or may not run a little better in your environment. Its worth trying though ops$tkyte@DEV8I.WORLD> select count(t.object_name) 2 from t, t2 3 where t.object_id = t2.object_id(+) 4 and t2.object_id is NULL 5 / COUNT(T.OBJECT_NAME) -------------------- 22251 Elapsed: 00:00:00.75 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 NESTED LOOPS (OUTER) 4 3 TABLE ACCESS (FULL) OF 'T' 5 3 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) And then the outer join method -- in this PARTICULAR example it fell between the not exists and the NOT IN. Your results may vary ;) ops$tkyte@DEV8I.WORLD> set autotrace off
本文来自博客园,作者:Tracy.,转载请注明原文链接:https://www.cnblogs.com/tracy/archive/2011/07/19/2110399.html