PostgreSQL中的not in、not exists、left join/is null
2021-01-04 16:44 abce 阅读(4272) 评论(0) 编辑 收藏 举报哪种方法是找出在一张表中,而不在另一张表中的记录的最佳方法呢?
1 2 3 4 5 6 | SELECT l.* FROM t_left l LEFT JOIN t_right r ON r.value = l.value WHERE r.value IS NULL ; |
1 2 3 4 5 6 7 | SELECT l.* FROM t_left l WHERE l.value NOT IN ( SELECT value FROM t_right r ); |
1 2 3 4 5 6 7 8 | SELECT l.* FROM t_left l WHERE NOT EXISTS ( SELECT NULL FROM t_right r WHERE r.value = l.value ); |
我们先把环境准备一下:
postgres 11.9
1 2 3 4 5 6 7 8 9 10 | CREATE TABLE t_left ( id INT NOT NULL PRIMARY KEY , value INT NOT NULL , stuffing VARCHAR (200) NOT NULL ); CREATE TABLE t_right ( id INT NOT NULL PRIMARY KEY , value INT NOT NULL , stuffing VARCHAR (200) NOT NULL ); |
1 2 | CREATE INDEX ix_left_value ON t_left (value); CREATE INDEX ix_right_value ON t_right (value); |
1 2 3 4 | INSERT INTO t_left SELECT s, s % 10000, RPAD( 'Value ' || s || ' ' , 200, '*' ) FROM generate_series(1, 100000) s; |
1 2 3 4 | INSERT INTO t_right SELECT s, s % 10000 + 1, RPAD( 'Value ' || s || ' ' , 200, '*' ) FROM generate_series(1, 1000000) s; |
1 2 | ANALYZE t_left; ANALYZE t_right; |
表t_left包含100000行记录,其中有1000个不同的记录;
只是测试
表t_right包含100000行记录,其中有1000个不同的记录。
t_left表有10条记录不存在与t_right表中。
NOT IN
先看看not in
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | postgres=# SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE value NOT IN postgres-# ( postgres(# SELECT value postgres(# FROM t_right r postgres(# ); id | value --------+------- 20000 | 0 10000 | 0 60000 | 0 30000 | 0 40000 | 0 50000 | 0 70000 | 0 80000 | 0 90000 | 0 100000 | 0 (10 rows ) postgres=# explain analyze SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE value NOT IN postgres-# ( postgres(# SELECT value postgres(# FROM t_right r postgres(# ); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..1520921531.00 rows =50000 width=8) (actual time =7329.726..40554.033 rows =10 loops=1) Workers Planned: 1 Workers Launched: 1 -> Parallel Seq Scan on t_left l (cost=0.00..1520915531.00 rows =29412 width=8) (actual time =5506.368..40405.539 rows =5 loops=2) Filter: ( NOT (SubPlan 1)) Rows Removed by Filter: 49995 SubPlan 1 -> Materialize (cost=0.00..49211.00 rows =1000000 width=4) (actual time =0.001..0.534 rows =4084 loops=100000) -> Seq Scan on t_right r (cost=0.00..40304.00 rows =1000000 width=4) (actual time =0.002..0.575 rows =3739 loops=37433) Planning Time : 0.155 ms Execution Time : 40557.112 ms (11 rows ) postgres=# |
这里之所以先讨论not in,是因为其对待null的方式特殊。
不幸的是,postgresql优化器不能利用到t_right.value被定义为not null的事实。因此,不可以返回null值。(即not in不能返回null值)
这里可以做一个小测试:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | postgres=# create table aa(id int ,age int ); postgres=# insert into aa values (1,1); postgres=# insert into aa values (2,2); postgres=# insert into aa(id) values (3); postgres=# select * from aa where age not in (1,2); id | age ----+----- (0 rows ) postgres=# select * from aa where age not in (1); id | age ----+----- 2 | 2 (1 row) postgres=# |
这就是为什么postgresql对not in查询使用一个特殊的访问方法。
NOT EXISTS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | postgres=# SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE NOT EXISTS postgres-# ( postgres(# SELECT select postgres(# FROM t_right r postgres(# WHERE r.value = l.value postgres(# ); id | value --------+------- 10000 | 0 30000 | 0 20000 | 0 40000 | 0 60000 | 0 50000 | 0 80000 | 0 70000 | 0 90000 | 0 100000 | 0 (10 rows ) postgres=# explain analyze SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE NOT EXISTS postgres-# ( postgres(# SELECT value postgres(# FROM t_right r postgres(# WHERE r.value = l.value postgres(# ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.42..34410.22 rows =10 width=8) (actual time =31.555..262.307 rows =10 loops=1) Workers Planned: 1 Workers Launched: 1 -> Nested Loop Anti Join (cost=0.42..33409.22 rows =6 width=8) (actual time =43.314..257.792 rows =5 loops=2) -> Parallel Seq Scan on t_left l (cost=0.00..3619.24 rows =58824 width=8) (actual time =0.007..10.066 rows =50000 loops=2) -> Index Only Scan using ix_right_value on t_right r (cost=0.42..3.53 rows =100 width=4) (actual time =0.005..0.005 rows =1 loops=100000) Index Cond: (value = l.value) Heap Fetches: 99990 Planning Time : 0.328 ms Execution Time : 262.348 ms (10 rows ) postgres=# |
LEFT JOIN / IS NULL
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | postgres=# SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE NOT EXISTS postgres-# ( postgres(# SELECT value postgres(# FROM t_right r postgres(# WHERE r.value = l.value postgres(# ); id | value --------+------- 80000 | 0 10000 | 0 20000 | 0 30000 | 0 40000 | 0 50000 | 0 60000 | 0 70000 | 0 90000 | 0 100000 | 0 (10 rows ) postgres=# explain analyze SELECT l.id, l.value postgres-# FROM t_left l postgres-# WHERE NOT EXISTS postgres-# ( postgres(# SELECT value postgres(# FROM t_right r postgres(# WHERE r.value = l.value postgres(# ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.42..34410.22 rows =10 width=8) (actual time =32.109..259.553 rows =10 loops=1) Workers Planned: 1 Workers Launched: 1 -> Nested Loop Anti Join (cost=0.42..33409.22 rows =6 width=8) (actual time =43.744..255.254 rows =5 loops=2) -> Parallel Seq Scan on t_left l (cost=0.00..3619.24 rows =58824 width=8) (actual time =0.008..9.777 rows =50000 loops=2) -> Index Only Scan using ix_right_value on t_right r (cost=0.42..3.53 rows =100 width=4) (actual time =0.005..0.005 rows =1 loops=100000) Index Cond: (value = l.value) Heap Fetches: 99990 Planning Time : 0.396 ms Execution Time : 259.608 ms (10 rows ) postgres=# |
postgresql按照相同的方式对待left join和not exists,使用相同的执行计划(nested loop anti join)。
至于NOT IN,这在语义上是不同的, PostgreSQL试图考虑这一点,并限制自己对子计划使用过滤器。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· DeepSeek 开源周回顾「GitHub 热点速览」
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· AI与.NET技术实操系列(二):开始使用ML.NET
· .NET10 - 预览版1新功能体验(一)