对PostgreSQL Merge Join 的理解
开始
伪代码:
http://momjian.us/main/writings/pgsql/optimizer.pdf sort(outer); sort(inner); i = 0; j = 0; save_j = 0; while (i < length(outer)) { if (outer[i] == inner[j]) output(outer[i], inner[j]); if (outer[i] <= inner[j] && j < length(inner)) { j++; if (outer[i] < inner[j]) save_j = j; } else { i++; j = save_j; } }
上述描述中,可以把两列排序好的数组看成 由大到小排列。
Merge Join 先要对各表各自排序,然后从各自的排序表中抽取数据,到另一个排序表中做匹配。
通常来讲,能够使用merge join的地方,hash join 更快。
验证:
postgres=# EXPLAIN SELECT relname,nspname FROM pg_class join pg_namespace ON (pg_class.relnamespace = pg_namespace.oid); QUERY PLAN ------------------------------------------------------------------------- Hash Join (cost=1.14..16.02 rows=290 width=128) Hash Cond: (pg_class.relnamespace = pg_namespace.oid) -> Seq Scan on pg_class (cost=0.00..10.90 rows=290 width=68) -> Hash (cost=1.06..1.06 rows=6 width=68) -> Seq Scan on pg_namespace (cost=0.00..1.06 rows=6 width=68) (5 rows) postgres=# postgres=# set session enable_hashjoin=false; SET postgres=# EXPLAIN SELECT relname,nspname FROM pg_class join pg_namespace ON (pg_class.relnamespace = pg_namespace.oid); QUERY PLAN ------------------------------------------------------------------------- Merge Join (cost=23.90..28.28 rows=290 width=128) Merge Cond: (pg_namespace.oid = pg_class.relnamespace) -> Sort (cost=1.14..1.15 rows=6 width=68) Sort Key: pg_namespace.oid -> Seq Scan on pg_namespace (cost=0.00..1.06 rows=6 width=68) -> Sort (cost=22.76..23.49 rows=290 width=68) Sort Key: pg_class.relnamespace -> Seq Scan on pg_class (cost=0.00..10.90 rows=290 width=68) (8 rows) postgres=#
[作者:技术者高健@博客园 mail: luckyjackgao@gmail.com ]
改了 join 的顺序 对 结果也没有影响。
postgres=# EXPLAIN SELECT relname,nspname FROM pg_namespace join pg_class ON (pg_class.relnamespace = pg_namespace.oid); QUERY PLAN ------------------------------------------------------------------------- Merge Join (cost=23.90..28.28 rows=290 width=128) Merge Cond: (pg_namespace.oid = pg_class.relnamespace) -> Sort (cost=1.14..1.15 rows=6 width=68) Sort Key: pg_namespace.oid -> Seq Scan on pg_namespace (cost=0.00..1.06 rows=6 width=68) -> Sort (cost=22.76..23.49 rows=290 width=68) Sort Key: pg_class.relnamespace -> Seq Scan on pg_class (cost=0.00..10.90 rows=290 width=68) (8 rows) postgres=#
postgres=# EXPLAIN SELECT relname,nspname FROM pg_namespace, pg_class where (pg_class.relnamespace = pg_namespace.oid); QUERY PLAN ------------------------------------------------------------------------- Merge Join (cost=23.90..28.28 rows=290 width=128) Merge Cond: (pg_namespace.oid = pg_class.relnamespace) -> Sort (cost=1.14..1.15 rows=6 width=68) Sort Key: pg_namespace.oid -> Seq Scan on pg_namespace (cost=0.00..1.06 rows=6 width=68) -> Sort (cost=22.76..23.49 rows=290 width=68) Sort Key: pg_class.relnamespace -> Seq Scan on pg_class (cost=0.00..10.90 rows=290 width=68) (8 rows) postgres=#
结束