sql中的exists
2021-08-03 10:42 abce 阅读(1012) 评论(0) 编辑 收藏 举报刚开始工作的开发,很容易过度的使用in、not in。其实,在有些时候可以用exists、not exists来代替in和not in,实现查询性能的提升。
exists操作符时候会和in操作符产生混淆。因为他们都是用于检查某个属性是否包含在某个集合中。但是相同点也仅限于此。
exists的真正目的是检查子查询是否至少包含一条记录。例如,下面的查询会返回行1和2:
1 2 3 4 5 6 7 8 9 | WITH numbers (nr) AS ( SELECT 1 AS nr UNION ALL SELECT 2 AS nr UNION ALL SELECT 3 AS nr ), letters (letter, nr) AS ( SELECT 'A' AS letter, 1 AS nr UNION ALL SELECT 'B' AS letter, 2 AS nr ) SELECT * FROM numbers n WHERE EXISTS ( SELECT nr FROM letters WHERE nr= n.nr); |
当然,你也可以改写成in:
1 2 3 4 5 6 7 8 9 | WITH numbers (nr) AS ( SELECT 1 AS nr UNION ALL SELECT 2 AS nr UNION ALL SELECT 3 AS nr ), letters (letter, nr) AS ( SELECT 'A' AS letter, 1 AS nr UNION ALL SELECT 'B' AS letter, 2 AS nr ) SELECT * FROM numbers n WHERE n.nr IN ( SELECT nr FROM letters); |
这两种写法,都可以返回相同的记录。区别是exists会更快,因为在得到第一条满足条件的记录之后就会停止,而in会查询所有的记录(如果in返回很多行的话)。但是,要注意另一件事,exists使用的关联子查询,它可以使用left join加上一个NULL检查来替换。
执行计划
not in操作符的执行计划:
1 2 3 4 5 6 7 8 9 10 11 | =# EXPLAIN ANALYZE SELECT * FROM products WHERE id NOT IN ( SELECT product_id FROM bought_products); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Seq Scan on products (cost=38.25..50.50 rows =90 width=422) (actual time =0.069..0.072 rows =5 loops=1) Filter: ( NOT (hashed SubPlan 1)) Rows Removed by Filter: 2 SubPlan 1 -> Seq Scan on bought_products (cost=0.00..32.60 rows =2260 width=4) (actual time =0.006..0.009 rows =3 loops=1) Planning Time : 0.180 ms Execution Time : 0.130 ms (7 rows ) |
从执行计划可以看到,因为没有用到任何索引,对表bought_products先走的是全表扫描,然后在参与过滤。
not exists的执行计划:
1 2 3 4 5 6 7 8 9 10 11 | =# EXPLAIN ANALYZE SELECT * FROM products WHERE NOT EXISTS ( SELECT product_id FROM bought_products WHERE id = product_id); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Anti Join (cost=0.15..47.80 rows =90 width=422) (actual time =0.038..0.045 rows =5 loops=1) -> Seq Scan on products (cost=0.00..11.80 rows =180 width=422) (actual time =0.012..0.014 rows =7 loops=1) -> Index Only Scan using bought_products_pkey on bought_products (cost=0.15..0.61 rows =11 width=4) (actual time =0.003..0.003 rows =0 loops=7) Index Cond: (product_id = products.id) Heap Fetches: 2 Planning Time : 0.177 ms Execution Time : 0.080 ms (7 rows ) |
从这两个执行计划来看,最大的区别是执行时间的不同(0.130 ms vs 0.080 ms),计划时间差不多。
所以,当你有些in查询时,可以看看是否可以使用exists来替换。
再来看看exists的执行计划:
1 2 3 4 5 6 7 8 9 10 11 | =# EXPLAIN ANALYZE SELECT * FROM products WHERE EXISTS ( SELECT product_id FROM bought_products WHERE id = product_id); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Semi Join (cost=0.15..47.80 rows =90 width=422) (actual time =0.030..0.046 rows =2 loops=1) -> Seq Scan on products (cost=0.00..11.80 rows =180 width=422) (actual time =0.011..0.013 rows =7 loops=1) -> Index Only Scan using bought_products_pkey on bought_products (cost=0.15..0.61 rows =11 width=4) (actual time =0.003..0.003 rows =0 loops=7) Index Cond: (product_id = products.id) Heap Fetches: 2 Planning Time : 0.261 ms Execution Time : 0.084 ms (7 rows ) |
如果不是关联子查询呢?
1 2 3 4 5 6 7 8 9 10 11 | =# EXPLAIN ANALYZE SELECT * FROM products WHERE EXISTS ( SELECT product_id FROM bought_products); QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Result (cost=0.01..11.81 rows =180 width=422) (actual time =0.013..0.015 rows =7 loops=1) One- Time Filter: $0 InitPlan 1 ( returns $0) -> Seq Scan on bought_products (cost=0.00..32.60 rows =2260 width=0) (actual time =0.007..0.007 rows =1 loops=1) -> Seq Scan on products (cost=0.01..11.81 rows =180 width=422) (actual time =0.002..0.004 rows =7 loops=1) Planning Time : 0.077 ms Execution Time : 0.036 ms (7 rows ) |
这次,子查询使用了顺序扫描,但是计划时间只用了0.077ms。它将只返回1行,因为它是EXISTS的语义。
尽管exists和in看起来类似,但是他们是不同的。exists在子查询返回第一条满足条件的记录后就会停止。如果你的查询很慢,将in换成exists可能会提升很多性能。
此外,exists可以用left join加上null检查来取代。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· AI与.NET技术实操系列(二):开始使用ML.NET
· 记一次.NET内存居高不下排查解决与启示
· 探究高空视频全景AR技术的实现原理
· 理解Rust引用及其生命周期标识(上)
· 浏览器原生「磁吸」效果!Anchor Positioning 锚点定位神器解析
· DeepSeek 开源周回顾「GitHub 热点速览」
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· AI与.NET技术实操系列(二):开始使用ML.NET
· .NET10 - 预览版1新功能体验(一)
2018-08-03 Oracle 18C DBCA建库报ora-01012错误