PostgreSQL Technical Lecture Series - Lecture 30: Multi-Table Join Methods
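PostgreSQL from Beginner to Expert is a tutorial series that builds your skills step by step, starting from the basics. It covers PG fundamentals, installation and usage, roles and privileges, maintenance and administration, and more. We hope it helps everyone who loves and studies PG. Stay tuned to the CUUG PG Technical Lecture Series.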
Lecture 30 preview: Saturday, September 23, 19:30-20:30, live on DingTalk (group number: 35822460)
Topic 1: Nested Loop Join
Topic 2: Merge Join
Topic 3: Hash Join
Multi-Table Join Methods
Three join methods:
nested loop join
merge join
hash join
Together they support all join operations:
NATURAL INNER JOIN
INNER JOIN
LEFT/RIGHT OUTER JOIN
FULL OUTER JOIN
Nested Loop Join
The nested loop join is the most fundamental join operation, and it can be used with any join condition.
Nested Loop Join (diagram)
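The basic idea can be sketched in a few lines of Python. This is a toy model, not PostgreSQL's executor code. In-memory lists of made-up (id, data) rows stand in for tables. The sketch shows why the method accepts any join condition: the predicate is simply evaluated for every (outer, inner) pair.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (id, data); a toy row layout


def nested_loop_join(outer: List[Row], inner: List[Row],
                     cond: Callable[[Row, Row], bool]) -> List[Tuple[Row, Row]]:
    """Compare every outer row with every inner row: N_outer * N_inner checks."""
    result = []
    for o in outer:          # read each outer row once
        for i in inner:      # rescan the whole inner relation for it
            if cond(o, i):   # any boolean join condition can be used here
                result.append((o, i))
    return result


tbl_a = [(1, "a1"), (2, "a2"), (3, "a3")]
tbl_b = [(2, "b2"), (3, "b3"), (4, "b4")]
print(nested_loop_join(tbl_a, tbl_b, lambda a, b: a[0] == b[0]))  # equi-join
print(nested_loop_join(tbl_a, tbl_b, lambda a, b: a[0] < b[0]))   # non-equi join
```

The cost of rescanning the inner side for every outer row is what the materialized and indexed variants below try to reduce.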
Materialized Nested Loop Join
The following example shows how the executor processes the plan tree of a materialized nested loop join, and how its cost is estimated.
testdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id;
QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..750230.50 rows=5000 width=16)
   Join Filter: (a.id = b.id)
   ->  Seq Scan on tbl_a a  (cost=0.00..145.00 rows=10000 width=8)
   ->  Materialize  (cost=0.00..98.00 rows=5000 width=8)
         ->  Seq Scan on tbl_b b  (cost=0.00..73.00 rows=5000 width=8)
(5 rows)
Materialize cost estimation
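The Materialize node's numbers in the plan above can be reproduced with the rules used by cost_material() and cost_rescan() in PostgreSQL's costsize.c. This sketch assumes the materialized tuples fit in work_mem, so the disk-spill term is left out, and uses the default cpu_operator_cost = 0.0025. Start-up cost is inherited from the child. Each tuple is charged 2 × cpu_operator_cost, and a rescan costs cpu_operator_cost per tuple.

```python
# PostgreSQL's default cpu_operator_cost
CPU_OPERATOR_COST = 0.0025


def materialize_cost(child_startup: float, child_total: float, n_tuples: int):
    """(start-up, total, rescan) cost of a Materialize node that fits in work_mem."""
    # Start-up cost is inherited from the child node (the Seq Scan here)
    startup = child_startup
    # 2 x cpu_operator_cost per tuple for storing it and later returning it
    total = child_total + 2 * CPU_OPERATOR_COST * n_tuples
    # A rescan only re-reads the stored tuples: cpu_operator_cost per tuple
    rescan = CPU_OPERATOR_COST * n_tuples
    return startup, total, rescan


# Child is "Seq Scan on tbl_b b (cost=0.00..73.00 rows=5000 width=8)"
print(materialize_cost(0.00, 73.00, 5000))
```

The total of 98.0 matches "Materialize (cost=0.00..98.00)" in the plan. The rescan cost of 12.5 is what the outer nested loop pays each time it restarts the inner side.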
(Materialized) Nested Loop cost estimation
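The total cost of 750230.50 in the plan can be checked with the formula commonly used to explain this plan. The start-up cost is 0. The run cost is as follows:

run cost = (cpu_operator_cost + cpu_tuple_cost) × N_outer × N_inner + C_rescan × (N_outer − 1) + C_outer,total + C_materialize,total

This is a sketch under stated assumptions. It uses the defaults cpu_operator_cost = 0.0025 and cpu_tuple_cost = 0.01. It assumes the join filter (a.id = b.id) costs one cpu_operator_cost per evaluated pair. It takes C_rescan = cpu_operator_cost × N_inner for the Materialize node. The other inputs are read straight from the EXPLAIN output.

```python
# PostgreSQL's default planner cost constants
CPU_OPERATOR_COST = 0.0025
CPU_TUPLE_COST = 0.01


def materialized_nestloop_cost(n_outer: int, n_inner: int,
                               outer_total: float, mat_total: float,
                               mat_rescan: float) -> float:
    """Run cost of a nested loop whose inner side is a Materialize node.

    Assumes a single '=' join filter and zero start-up costs, as in the plan.
    """
    # Evaluate the join filter and charge cpu_tuple_cost for every (outer, inner) pair
    per_pair = (CPU_OPERATOR_COST + CPU_TUPLE_COST) * n_outer * n_inner
    # The first pass reads the Materialize node normally; the other N_outer - 1 are rescans
    rescans = mat_rescan * (n_outer - 1)
    # Plus one full run of the outer scan and one full run of the Materialize node
    return per_pair + rescans + outer_total + mat_total


mat_rescan = CPU_OPERATOR_COST * 5000  # 12.5 for the 5000 materialized tbl_b rows
total = materialized_nestloop_cost(10000, 5000, 145.00, 98.00, mat_rescan)
print(f"{total:.2f}")
```

Almost all of the cost comes from the N_outer × N_inner term. This is why the planner rarely picks this plan for two large tables joined by equality.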
Indexed Nested Loop Join
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_b AS b WHERE c.id = b.id;
QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..1935.50 rows=5000 width=16)
   ->  Seq Scan on tbl_b b  (cost=0.00..73.00 rows=5000 width=8)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..0.36 rows=1 width=8)
         Index Cond: (id = b.id)
(4 rows)
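In an indexed nested loop join, the executor does not rescan the whole inner table for every outer tuple. Instead it probes the inner table's index (tbl_c_pkey above) with the outer tuple's key, as the Index Cond (id = b.id) shows. The sketch below models the B-tree as a sorted list of (key, row position) pairs searched with Python's bisect module. This is a simplification for illustration, not how PostgreSQL stores an index. The helper names are hypothetical.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, str]  # (id, data); a toy row layout


def build_index(rows: List[Row]) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for a B-tree: (key, position) pairs sorted by key."""
    return sorted((row[0], pos) for pos, row in enumerate(rows))


def indexed_nested_loop_join(outer: List[Row], inner: List[Row],
                             index: List[Tuple[int, int]]) -> List[Tuple[Row, Row]]:
    keys = [k for k, _ in index]
    result = []
    for o in outer:
        # Probe the index with the outer key instead of scanning all inner rows
        lo = bisect.bisect_left(keys, o[0])
        hi = bisect.bisect_right(keys, o[0])
        for _, pos in index[lo:hi]:
            result.append((o, inner[pos]))  # fetch the matching heap row
    return result


tbl_b = [(2, "b2"), (5, "b5"), (7, "b7")]              # outer: Seq Scan
tbl_c = [(7, "c7"), (1, "c1"), (2, "c2"), (3, "c3")]  # inner: Index Scan
print(indexed_nested_loop_join(tbl_b, tbl_c, build_index(tbl_c)))
```

Each probe costs roughly a logarithmic search plus the matching rows. This corresponds to the tiny per-loop cost (0.29..0.36) of the inner Index Scan in the plan.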
Three variations of the nested loop join with an outer index scan
Merge Join
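A merge join works on an equi-join condition, and both inputs must be sorted on the join key. In the plans below, this order comes either from Sort nodes or from index scans. The two sorted inputs are then walked in step, much like the merge phase of merge sort. The following Python sketch is a toy model with made-up rows, not PostgreSQL's executor code. It handles duplicate keys by joining every outer row with the whole run of inner rows that share its key.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (id, data); a toy row layout


def merge_join(outer: List[Row], inner: List[Row]) -> List[Tuple[Row, Row]]:
    """Equi-join on row[0]; both inputs must already be sorted by that key."""
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        ok, ik = outer[i][0], inner[j][0]
        if ok < ik:
            i += 1            # outer key too small: advance the outer side
        elif ok > ik:
            j += 1            # inner key too small: advance the inner side
        else:
            # Find the run of inner rows sharing this key
            j_end = j
            while j_end < len(inner) and inner[j_end][0] == ok:
                j_end += 1
            # Join every outer row with this key to the whole inner run
            while i < len(outer) and outer[i][0] == ok:
                for k in range(j, j_end):
                    result.append((outer[i], inner[k]))
                i += 1
            j = j_end
    return result


a = [(1, "a1"), (2, "a2"), (2, "a2x"), (4, "a4")]
b = [(2, "b2"), (2, "b2x"), (3, "b3"), (4, "b4")]
print(merge_join(a, b))
```

Each input is read once, so after sorting, the join itself is linear in N_outer + N_inner.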
Merge Join cost estimation
testdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND b.id < 1000;
QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=944.71..984.71 rows=1000 width=16)
   Merge Cond: (a.id = b.id)
   ->  Sort  (cost=809.39..834.39 rows=10000 width=8)
         Sort Key: a.id
         ->  Seq Scan on tbl_a a  (cost=0.00..145.00 rows=10000 width=8)
   ->  Sort  (cost=135.33..137.83 rows=1000 width=8)
         Sort Key: b.id
         ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=1000 width=8)
               Filter: (id < 1000)
(9 rows)
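The two Sort nodes' costs in this plan can be reproduced with the in-memory (quicksort) branch of PostgreSQL's cost_sort(). This sketch assumes each input fits in work_mem and uses the default cpu_operator_cost = 0.0025. Under those assumptions, start-up = input total cost + 2 × cpu_operator_cost × N × log2(N), and total = start-up + cpu_operator_cost × N. The exact merge join formulas (initial_cost_mergejoin() and final_cost_mergejoin()) are more involved and are not modeled here.

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL default


def in_memory_sort_cost(input_total: float, n_tuples: float):
    """(start-up, total) cost of a Sort that fits in work_mem (quicksort branch)."""
    n = max(n_tuples, 2)                      # cost_sort() clamps tiny inputs to 2 tuples
    comparison_cost = 2.0 * CPU_OPERATOR_COST
    # The whole input must be read and sorted before the first tuple is returned
    startup = input_total + comparison_cost * n * math.log2(n)
    # Returning each sorted tuple costs cpu_operator_cost
    total = startup + CPU_OPERATOR_COST * n
    return startup, total


outer_sort = in_memory_sort_cost(145.00, 10000)  # Sort over Seq Scan on tbl_a
inner_sort = in_memory_sort_cost(85.50, 1000)    # Sort over filtered Seq Scan on tbl_b
print([round(x, 2) for x in outer_sort], [round(x, 2) for x in inner_sort])
print(round(outer_sort[0] + inner_sort[0], 2))
```

The results match the plan's 809.39..834.39 and 135.33..137.83. In this plan, the two Sort start-up costs also add up to 944.71, which is the Merge Join's start-up cost.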
Materialized Merge Join
Other Variations
Forcing a merge join (disable hash join and nested loop join for the session)
testdb=# SET enable_hashjoin TO off;
testdb=# SET enable_nestloop TO off;
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_b AS b WHERE c.id = b.id AND b.id < 1000;
QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=135.61..322.11 rows=1000 width=16)
   Merge Cond: (c.id = b.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Sort  (cost=135.33..137.83 rows=1000 width=8)
         Sort Key: b.id
         ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=1000 width=8)
               Filter: (id < 1000)
(7 rows)
Materialized Merge Join with Outer Index Scan
testdb=# SET enable_hashjoin TO off;
testdb=# SET enable_nestloop TO off;
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_b AS b WHERE c.id = b.id AND b.id < 4500;
QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=421.84..672.09 rows=4500 width=16)
   Merge Cond: (c.id = b.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Materialize  (cost=421.55..444.05 rows=4500 width=8)
         ->  Sort  (cost=421.55..432.80 rows=4500 width=8)
               Sort Key: b.id
               ->  Seq Scan on tbl_b b  (cost=0.00..85.50 rows=4500 width=8)
                     Filter: (id < 4500)
(8 rows)
Indexed Merge Join with Outer Index Scan
testdb=# SET enable_hashjoin TO off;
testdb=# SET enable_nestloop TO off;
testdb=# EXPLAIN SELECT * FROM tbl_c AS c, tbl_d AS d WHERE c.id = d.id AND d.id < 1000;
QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join  (cost=0.57..226.07 rows=1000 width=16)
   Merge Cond: (c.id = d.id)
   ->  Index Scan using tbl_c_pkey on tbl_c c  (cost=0.29..318.29 rows=10000 width=8)
   ->  Index Scan using tbl_d_pkey on tbl_d d  (cost=0.28..41.78 rows=1000 width=8)
         Index Cond: (id < 1000)
(5 rows)
Hash Join
In-Memory Hash Join
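Build phase:
Insert all tuples of the inner table into a batch.
Probe phase:
Compare each tuple of the outer table with the inner tuples in the batch, and join them if the join condition is satisfied.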
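The two phases above can be sketched in Python for the in-memory case. As a simplifying assumption, the batch is modeled as a dict that maps each join key to the list of inner tuples with that key. PostgreSQL's hash table instead places tuples in buckets chosen by the hash value of the key, and re-checks the join condition for the tuples in the probed bucket. It also splits the work into several batches when the inner table does not fit in work_mem. That case is not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, data); a toy row layout


def in_memory_hash_join(outer: List[Row], inner: List[Row]) -> List[Tuple[Row, Row]]:
    # Build phase: insert every inner tuple into a single in-memory batch
    batch: Dict[int, List[Row]] = defaultdict(list)
    for row in inner:
        batch[row[0]].append(row)
    # Probe phase: look up each outer tuple's key and emit every match
    result = []
    for row in outer:
        for match in batch.get(row[0], []):
            result.append((row, match))
    return result


tbl_a = [(1, "a1"), (2, "a2"), (3, "a3")]   # outer (probe side)
tbl_b = [(2, "b2"), (2, "b2x"), (4, "b4")]  # inner (build side)
print(in_memory_hash_join(tbl_a, tbl_b))
```

Building on the smaller input keeps the hash table small, and each outer tuple then needs only one lookup instead of a scan of the inner table.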
Hash Join
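Planner Processing and Transformations
Preprocessing
1. Planning and converting CTEs: if the query has a WITH list, the planner processes each WITH query with the SS_process_ctes() function.
2. Pulling up subqueries
A simple subquery in the FROM clause (for example, one without aggregates, GROUP BY, or LIMIT) is pulled up into the parent query and turned into an ordinary join. For example, the first query below is rewritten as the second: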
testdb=# SELECT * FROM tbl_a AS a, (SELECT * FROM tbl_b) as b WHERE a.id = b.id;
testdb=# SELECT * FROM tbl_a AS a, tbl_b as b WHERE a.id = b.id;
3. Converting outer joins into inner joins
Rules available to the optimizer
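The slide only names this step. One common case, handled by reduce_outer_joins() in PostgreSQL's planner, is a WHERE condition that is strict on the nullable side. Consider the hypothetical query SELECT * FROM tbl_a a LEFT JOIN tbl_b b ON a.id = b.id WHERE b.data > 0. The NULL-padded rows that the LEFT JOIN adds can never satisfy b.data > 0, so the result equals that of an inner join. The Python sketch below checks this equivalence on a tiny made-up data set, with None standing in for NULL. It illustrates why the rewrite is safe, not how the planner implements it.

```python
from typing import List, Optional, Tuple

ARow = Tuple[int, int]                      # (id, data) from tbl_a
BRow = Tuple[Optional[int], Optional[int]]  # (id, data) from tbl_b; None models NULL
NULL_ROW: BRow = (None, None)


def left_join(a: List[ARow], b: List[BRow]) -> List[Tuple[ARow, BRow]]:
    out = []
    for ra in a:
        matches = [rb for rb in b if rb[0] == ra[0]]
        if matches:
            out.extend((ra, rb) for rb in matches)
        else:
            out.append((ra, NULL_ROW))  # unmatched outer row is padded with NULLs
    return out


def inner_join(a: List[ARow], b: List[BRow]) -> List[Tuple[ARow, BRow]]:
    return [(ra, rb) for ra in a for rb in b if rb[0] == ra[0]]


def where_b_data_positive(pairs):
    # "b.data > 0" is strict: on NULL it yields NULL, which WHERE treats as false
    return [(ra, rb) for ra, rb in pairs if rb[1] is not None and rb[1] > 0]


tbl_a = [(1, 10), (2, 20), (3, 30)]
tbl_b = [(1, 5), (3, -1)]
print(left_join(tbl_a, tbl_b) == inner_join(tbl_a, tbl_b))
print(where_b_data_positive(left_join(tbl_a, tbl_b)) ==
      where_b_data_positive(inner_join(tbl_a, tbl_b)))
```

Without the WHERE clause, the two joins differ because of the NULL-padded row for id 2. With the strict filter, they agree, which is what allows the planner to treat the join as an inner join and consider more join orders.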
Getting the Cheapest Path
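1. If the query joins fewer than 12 tables, dynamic programming is used to find the optimal plan.
2. If it joins 12 or more tables, the genetic query optimizer (GEQO) is used instead.
The threshold is set by the geqo_threshold parameter (default 12), and GEQO is used only when the geqo parameter is on (the default).
3. The search proceeds level by level: first each single table, then every pair of tables, then every triple, and so on.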
Join order selection for multi-table queries
Getting the Cheapest Path of a Triple-Table Query
testdb=# SELECT * FROM tbl_a AS a, tbl_b AS b, tbl_c AS c
testdb-# WHERE a.id = b.id AND b.id = c.id AND a.data < 40;
Three combinations are considered:
{tbl_a,tbl_b,tbl_c}=min({tbl_a,{tbl_b,tbl_c}},{tbl_b,{tbl_a,tbl_c}},{tbl_c,{tbl_a,tbl_b}}).
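The min() over three combinations above is what a level-by-level dynamic program computes at level 3. It reuses the cheapest plans already found for every pair of tables at level 2. The following Python sketch uses a deliberately simple, hypothetical cost model: scan cost = row count, join cost = input costs + output rows, and one fixed selectivity for every join. This is not PostgreSQL's cost model, and the row counts are made up. Like the formula, it only tries "one table + cheapest plan for the rest" splits. PostgreSQL's join search also considers bushy combinations at higher levels.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Plan = Tuple[float, float, str]  # (cost, estimated rows, plan description)


def cheapest_join_order(rows: Dict[str, float], selectivity: float) -> Dict[FrozenSet[str], Plan]:
    """Level-by-level dynamic programming over sets of tables (toy cost model)."""
    tables = list(rows)
    best: Dict[FrozenSet[str], Plan] = {}
    # Level 1: each single table, costed as one full scan
    for t in tables:
        best[frozenset([t])] = (rows[t], rows[t], t)
    # Level k: combine the cheapest plan for k-1 tables with one more table
    for level in range(2, len(tables) + 1):
        for subset in combinations(tables, level):
            s = frozenset(subset)
            for t in subset:
                rest_cost, rest_rows, rest_plan = best[s - {t}]
                t_cost, t_rows, t_plan = best[frozenset([t])]
                out_rows = rest_rows * t_rows * selectivity
                cost = rest_cost + t_cost + out_rows  # inputs plus output rows
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, out_rows, f"({rest_plan} JOIN {t_plan})")
    return best


# Hypothetical row counts and selectivity, for illustration only
rows = {"tbl_a": 10000, "tbl_b": 5000, "tbl_c": 10000}
best = cheapest_join_order(rows, selectivity=0.001)
print(best[frozenset(rows)])
```

Only the cheapest plan per set of tables is kept at each level, so the number of sets grows as 2^n. This is why PostgreSQL switches to GEQO once the number of tables reaches geqo_threshold.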
Creating the plan tree for a multi-table query. The EXPLAIN output for this query is shown below: