SQL JOIN-Hash Join

2012-10-25 23:53  Mike.Jiang

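1 Overview

Like a merge join, a hash join requires at least one equality (equijoin) condition. When the join columns cannot use an index, or when the inputs are large, a nested loops join or a merge join may perform poorly; in those cases a hash join is worth considering.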

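2 Basic algorithm

A hash join runs in two phases, build and probe. In the build phase, one of the two inputs is chosen as the build set; each of its rows is hashed on the join-condition columns and stored in an in-memory hash table (the build hash table). In the probe phase, the other input (the probe set) is read row by row; each row is hashed on its join-condition columns and compared against the rows in the matching bucket of the build hash table, and every pair that satisfies the join condition is returned.

Pseudocode: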

for each row R1 in the build table
    begin
        calculate hash value on R1 join key(s)
        insert R1 into the appropriate hash bucket
    end
for each row R2 in the probe table
    begin
        calculate hash value on R2 join key(s)
        for each row R1 in the corresponding hash bucket
            if R1 joins with R2
                return (R1, R2)
    end
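The pseudocode above can be sketched in Python as follows. This is only a minimal in-memory illustration, not how SQL Server implements the operator: the function name `hash_join`, the key-extractor parameters, and the sample rows are all made up for this example, and the sketch assumes the build input fits in memory. Python's `dict` does both the hashing and the equality check, so the explicit "if R1 joins with R2" test from the pseudocode is implicit in the key lookup.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of rows using an in-memory hash table.

    build_key / probe_key are hypothetical helper callables that
    extract the join key from a row of the respective input.
    """
    # Build phase: place every build row in the bucket for its join key.
    buckets = defaultdict(list)
    for r1 in build_rows:
        buckets[build_key(r1)].append(r1)

    # Probe phase: look up each probe row's key and emit matching pairs.
    result = []
    for r2 in probe_rows:
        for r1 in buckets.get(probe_key(r2), []):
            result.append((r1, r2))
    return result


# Illustrative data only (not the T1/T2/T3 tables from this article).
build_input = [{"a": 0, "name": "x"}, {"a": 2, "name": "y"}, {"a": 2, "name": "z"}]
probe_input = [{"a": 2, "val": 10}, {"a": 3, "val": 20}]

pairs = hash_join(build_input, probe_input,
                  lambda r: r["a"], lambda r: r["a"])
for r1, r2 in pairs:
    print(r1, r2)
```

In this sketch the first argument is the build input, so passing the smaller of the two inputs there keeps the hash table small.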

3 Example

Test data

create table T1 (a int, b int, x char(200))
create table T2 (a int, b int, x char(200))
create table T3 (a int, b int, x char(200))

set nocount on
declare @i int

set @i = 0
while @i < 1000
  begin
    insert T1 values (@i * 2, @i * 5, @i)
    set @i = @i + 1
  end

set @i = 0
while @i < 10000
  begin
    insert T2 values (@i * 3, @i * 7, @i)
    set @i = @i + 1
  end

set @i = 0
while @i < 100000
  begin
    insert T3 values (@i * 5, @i * 11, @i)
    set @i = @i + 1
  end

Query:

SET STATISTICS PROFILE ON
select *
from 
    (
        T1 inner join T2 on T1.a = T2.a
    )
    inner join T3 on T1.b = T3.a
option (hash join)
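As a rough sanity check on how many rows this query should return, the join columns can be regenerated outside the database. The Python sketch below assumes the three loops in the test-data script ran to completion against empty tables; the variable names are made up for this check, and it only models columns `a` and `b`, not the `x` column or the execution plan.

```python
# Rebuild the join columns produced by the test-data loops.
t1 = [(i * 2, i * 5) for i in range(1000)]   # T1 rows as (a, b)
t2_a = {i * 3 for i in range(10000)}         # distinct T2.a values
t3_a = {i * 5 for i in range(100000)}        # distinct T3.a values

# T1.a = T2.a, then T1.b = T3.a. The key values are unique in T2 and T3,
# so each T1 row that survives both conditions yields exactly one output row.
matches = [(a, b) for (a, b) in t1 if a in t2_a and b in t3_a]
row_count = len(matches)
print(row_count)
```

Under those assumptions, T1.a = T2.a holds only when the T1 loop counter is a multiple of 3, and T1.b = T3.a then always finds a match, so the check counts 334 rows.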

Execution result: