Oracle内连接HASH JOIN/半连接HASH JOIN SEMI不处理驱动表连接列null值
Oracle内连接HASH JOIN/半连接HASH JOIN SEMI不处理驱动表连接列null值
先说内连接,比如有sql如下:
select count(*) from t,tt where t.col=tt.col;
测试表t,tt的信息如下:其中t表全是null值。
15:16:18 ZKM@zkm(37)> desc t 15:16:20 ZKM@zkm(37)> desc tt Name Null? Type Name Null? Type ------- -------- ------------- ------- -------- ------------- COL VARCHAR2(20) COL VARCHAR2(20) 15:17:40 ZKM@zkm(37)> select col,count(*) from t group by col; 15:17:43 ZKM@zkm(37)> select col,count(*) from tt group by col; COL COUNT(*) COL COUNT(*) ------ ---------- ------ ---------- 1000000 1 a 1 Elapsed: 00:00:00.16 Elapsed: 00:00:00.00
对于不同驱动表时,执行计划如下:(由于OMem、1Mem根据上次sql执行估算出来的,所以每个sql多执行几次,去最后一次结果)
select /*+ leading(t) */ count(*) from t,tt where t.col=tt.col select count(*) from t,tt where t.col=tt.col Plan hash value: 3743140366 Plan hash value: 1389617266 ----------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ----------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1590 | | | | | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1596 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1590 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1596 | | | | |* 2 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 1590 | 1969K| 1969K| 464K (0)| |* 2 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 1596 | 2616K| 2616K| 786K (0)| |* 3 | TABLE ACCESS FULL| T | 1 | 3 | 0 |00:00:00.01 | 1590 | | | | |* 3 | TABLE ACCESS FULL| TT | 1 | 1 | 1 |00:00:00.01 | 6 | | | | |* 4 | TABLE ACCESS FULL| TT | 0 | 1 | 0 |00:00:00.01 | 0 | | | | |* 4 | TABLE ACCESS FULL| T | 1 | 3 | 0 |00:00:00.01 | 1590 | | | | ----------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): Predicate Information (identified by operation id): --------------------------------------------------- --------------------------------------------------- 2 - access("T"."COL"="TT"."COL") 2 - access("T"."COL"="TT"."COL") 3 - filter("T"."COL" IS NOT NULL) 3 - filter("TT"."COL" IS NOT NULL) 4 - filter("TT"."COL" IS NOT NULL) 4 - filter("T"."COL" IS NOT NULL)
值得一提的是,由于t表全是null值,这个时候t表做驱动表会导致被驱动表tt访问次数为0,看上边第一个执行计划id=4的Starts=0。
这个现象在我另外一篇博文里有提到:Oracle Hash Join被驱动表被访问0次的另外一种情况
这里还看不出来表t的连接列col全是null值是否在id=2中有没有被忽略。
一开始我是直接在t表再insert进去15000000个null值,然后执行同样的sql记录下执行计划,前后比对id=2处的时间时间消耗,OMem,1Mem,Used-Mem。
如下,逻辑读已经多了20多倍,而A-Time变为0.13s几乎无增加还是秒出,多出来的0.1s是消耗在id=3出而不是id=2,由此可证。
Plan hash value: 3743140366 ----------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ----------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.13 | 24388 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.13 | 24388 | | | | |* 2 | HASH JOIN | | 1 | 1 | 0 |00:00:00.13 | 24388 | 1969K| 1969K| 501K (0)| |* 3 | TABLE ACCESS FULL| T | 1 | 3 | 0 |00:00:00.13 | 24388 | | | | |* 4 | TABLE ACCESS FULL| TT | 0 | 1 | 0 |00:00:00.01 | 0 | | | | ----------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T"."COL"="TT"."COL") 3 - filter("T"."COL" IS NOT NULL) 4 - filter("TT"."COL" IS NOT NULL)
后来想到另外一个的办法,直接创建索引即可。
create index idx_col on t(col);
再次执行SQL如下:
可以看出来,执行计划直接连表t都不访问了,而是访问IDX_COL。防小偷。
我们知道,单列普通索引是不存储null值的,也就是在id=3就已经过滤了null,综合对比前边的A-Time,OMem,1Mem,Used-Mem可以得知确实在HASH JOIN步骤,对于null值是不会在pga中计算hash值来做比对的。
select /*+ leading(t) */ count(*) from t,tt where t.col=tt.col Plan hash value: 497415060 -------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | -------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 | | | | |* 2 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 1 | 1969K| 1969K| 460K (0)| |* 3 | INDEX FULL SCAN | IDX_COL | 1 | 3 | 0 |00:00:00.01 | 1 | | | | |* 4 | TABLE ACCESS FULL| TT | 0 | 1 | 0 |00:00:00.01 | 0 | | | | -------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T"."COL"="TT"."COL") 3 - filter("T"."COL" IS NOT NULL) 4 - filter("TT"."COL" IS NOT NULL)
再说半连接,比如有sql如下:
select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt);
同样的道理,有索引和无索引的情况如下:
select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt) select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt) Plan hash value: 1176692913 Plan hash value: 1600321607 ----------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ----------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.14 | 24388 | | | | | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 1 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.14 | 24388 | | | | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 1 | | | | |* 2 | HASH JOIN SEMI | | 1 | 1 | 0 |00:00:00.14 | 24388 | 1969K| 1969K| 501K (0)| |* 2 | HASH JOIN SEMI | | 1 | 1 | 0 |00:00:00.01 | 1 | 1969K| 1969K| 467K (0)| |* 3 | TABLE ACCESS FULL| T | 1 | 3 | 0 |00:00:00.14 | 24388 | | | | |* 3 | INDEX FULL SCAN | IDX_COL | 1 | 3 | 0 |00:00:00.01 | 1 | | | | |* 4 | TABLE ACCESS FULL| TT | 0 | 1 | 0 |00:00:00.01 | 0 | | | | |* 4 | TABLE ACCESS FULL| TT | 0 | 1 | 0 |00:00:00.01 | 0 | | | | ----------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): Predicate Information (identified by operation id): --------------------------------------------------- --------------------------------------------------- 2 - access("T"."COL"="TT"."COL") 2 - access("T"."COL"="TT"."COL") 3 - filter("T"."COL" IS NOT NULL) 3 - filter("T"."COL" IS NOT NULL) 4 - filter("TT"."COL" IS NOT NULL) 4 - filter("TT"."COL" IS NOT NULL)
一样的道理,不再赘述。