对PostgreSQL Merge Join 的理解

开始

伪代码:

http://momjian.us/main/writings/pgsql/optimizer.pdf

sort(outer);            
sort(inner);            
i = 0;            
j = 0;            
save_j = 0;            
while (i < length(outer))            
{            
    if (outer[i] == inner[j])        
        output(outer[i], inner[j]);    
            
            
    if (outer[i] <= inner[j] && j < length(inner))        
    {        
        j++;    
        if (outer[i] < inner[j])    
            save_j = j;
    }        
    else        
    {        
        i++;    
        j = save_j;    
    }        
}            

上述描述中,可以把两列排序好的数组看成 由大到小排列。

 Merge Join 先要对各表各自排序,然后从各自的排序表中抽取数据,到另一个排序表中做匹配。
通常来讲,能够使用merge join的地方,hash join 更快。

验证:

postgres=# EXPLAIN SELECT relname,nspname FROM pg_class join
        pg_namespace ON (pg_class.relnamespace = pg_namespace.oid);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Hash Join  (cost=1.14..16.02 rows=290 width=128)
   Hash Cond: (pg_class.relnamespace = pg_namespace.oid)
   ->  Seq Scan on pg_class  (cost=0.00..10.90 rows=290 width=68)
   ->  Hash  (cost=1.06..1.06 rows=6 width=68)
         ->  Seq Scan on pg_namespace  (cost=0.00..1.06 rows=6 width=68)
(5 rows)

postgres=# 



postgres=# set session enable_hashjoin=false;
SET
postgres=# EXPLAIN SELECT relname,nspname FROM pg_class join
        pg_namespace ON (pg_class.relnamespace = pg_namespace.oid);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Merge Join  (cost=23.90..28.28 rows=290 width=128)
   Merge Cond: (pg_namespace.oid = pg_class.relnamespace)
   ->  Sort  (cost=1.14..1.15 rows=6 width=68)
         Sort Key: pg_namespace.oid
         ->  Seq Scan on pg_namespace  (cost=0.00..1.06 rows=6 width=68)
   ->  Sort  (cost=22.76..23.49 rows=290 width=68)
         Sort Key: pg_class.relnamespace
         ->  Seq Scan on pg_class  (cost=0.00..10.90 rows=290 width=68)
(8 rows)

postgres=# 

[作者:技术者高健@博客园  mail: luckyjackgao@gmail.com ]

改了 join 的顺序 对 结果也没有影响。

postgres=# EXPLAIN SELECT relname,nspname FROM pg_namespace join
        pg_class ON (pg_class.relnamespace = pg_namespace.oid);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Merge Join  (cost=23.90..28.28 rows=290 width=128)
   Merge Cond: (pg_namespace.oid = pg_class.relnamespace)
   ->  Sort  (cost=1.14..1.15 rows=6 width=68)
         Sort Key: pg_namespace.oid
         ->  Seq Scan on pg_namespace  (cost=0.00..1.06 rows=6 width=68)
   ->  Sort  (cost=22.76..23.49 rows=290 width=68)
         Sort Key: pg_class.relnamespace
         ->  Seq Scan on pg_class  (cost=0.00..10.90 rows=290 width=68)
(8 rows)

postgres=# 
postgres=# EXPLAIN SELECT relname,nspname FROM pg_namespace,
        pg_class where (pg_class.relnamespace = pg_namespace.oid);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Merge Join  (cost=23.90..28.28 rows=290 width=128)
   Merge Cond: (pg_namespace.oid = pg_class.relnamespace)
   ->  Sort  (cost=1.14..1.15 rows=6 width=68)
         Sort Key: pg_namespace.oid
         ->  Seq Scan on pg_namespace  (cost=0.00..1.06 rows=6 width=68)
   ->  Sort  (cost=22.76..23.49 rows=290 width=68)
         Sort Key: pg_class.relnamespace
         ->  Seq Scan on pg_class  (cost=0.00..10.90 rows=290 width=68)
(8 rows)

postgres=# 

结束

posted @ 2012-11-08 16:49  健哥的数据花园  阅读(5391)  评论(1编辑  收藏  举报