SQL优化篇之-如何减少耗时查询的调用次数

函数调用次数与性能

在查询语句中,如果 Select 子句调用了较为耗时的函数或子查询,需要特别考虑函数调用次数对于SQL整体执行时间的影响。

一、数据准备,SQL 语句

  • 模拟较耗时的用户函数

确保执行子查询的时长是1秒。

1
2
3
4
5
6
create or replace function f001()
returns int stable language sql
as
$$
  select 1 from pg_sleep(1);
$$;
  • 模拟返回多行数据的子查询

结果集中,关联条件列含有重复值。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
test=# create table t1 as select sn, id from (select generate_series(1, 3) sn),    (select generate_series(3, 8) id);
SELECT 18
test=# create table t2 as select generate_series(1, 5) id;
SELECT 5
 
test=# select * from t1;
 sn | id
----+----
  1 |  3
  2 |  3
  3 |  3
  1 |  4
  2 |  4
  3 |  4
  1 |  5
  2 |  5
  3 |  5
  1 |  6
  2 |  6
  3 |  6
  1 |  7
  2 |  7
  3 |  7
  1 |  8
  2 |  8
  3 |  8
(18 rows)
 
test=# select * from t2;
 id
----
  1
  2
  3
  4
  5
(5 rows)  

二、查询全体数据的优化方案

1、初始SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
with a as (select sn, id  from t1 )
select a.*, (select  id + f001()  sq_sum from t2 b where b.id = a.id)
from a
where 1 = 1;
 
 sn | id | sq_sum
----+----+--------
  1 |  3 |      4
  2 |  3 |      4
  3 |  3 |      4
  1 |  4 |      5
  2 |  4 |      5
  3 |  4 |      5
  1 |  5 |      6
  2 |  5 |      6
  3 |  5 |      6
  1 |  6 |
  2 |  6 |
  3 |  6 |
  1 |  7 |
  2 |  7 |
  3 |  7 |
  1 |  8 |
  2 |  8 |
  3 |  8 |
 
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..101806.05 rows=2260 width=12) (actual time=1000.638..9008.584 rows=18 loops=1)
   SubPlan 1
     ->  Seq Scan on t2 b  (cost=0.00..45.03 rows=13 width=4) (actual time=500.465..500.468 rows=0 loops=18)
           Filter: (id = t1.id)
           Rows Removed by Filter: 4
 Planning Time: 0.149 ms
 Execution Time: 9008.667 ms
(7 rows)

对于t1 表的每条记录,都要访问一次t2 ,如果 t2 表有对应的满足条件的记录,就要调用一次函数。

2、CTE

使用临时表的结果进行连接,避免循环。注意,这里CTE 有materilaized 选项,主要是把耗时的部分先执行出结果,避免与其他部分查询合并,引发多次执行。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
with a as (select sn, id  from t1),
     b as materialized (select id, id + f001() sq_sum from t2 )
select a.*, (select sq_sum from b where b.id = a.id)
from a
where 1 = 1;
 
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=676.75..129868.35 rows=2260 width=12) (actual time=5004.377..5004.392 rows=18 loops=1)
   CTE b
     ->  Seq Scan on t2  (cost=0.00..676.75 rows=2540 width=8) (actual time=1000.562..5004.348 rows=5 loops=1)
   SubPlan 2
     ->  CTE Scan on b  (cost=0.00..57.15 rows=13 width=4) (actual time=166.821..278.021 rows=0 loops=18)
           Filter: (id = t1.id)
           Rows Removed by Filter: 4
 Planning Time: 0.107 ms
 Execution Time: 5004.441 ms
(9 rows)

3、LEFT JOIN 子查询

与上例一样,还是利用materialize 特性,避免了函数的多次调用。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
with a as (select sn, id from t1)
select a.*, b.sq_sum
from a  left join (select id, id + f001() sq_sum  from t2 ) b on b.id = a.id
where 1 = 1 ;
 
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..86821.70 rows=28702 width=12) (actual time=3002.905..5005.627 rows=18 loops=1)
   Join Filter: (t2.id = t1.id)
   Rows Removed by Join Filter: 81
   ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.007..0.011 rows=18 loops=1)
   ->  Materialize  (cost=0.00..689.45 rows=2540 width=8) (actual time=55.621..278.088 rows=5 loops=18)
         ->  Seq Scan on t2  (cost=0.00..676.75 rows=2540 width=8) (actual time=1001.176..5005.568 rows=5 loops=1)
 Planning Time: 0.180 ms
 Execution Time: 5005.650 ms
(8 rows)

三、查询局部数据(过滤主表的条件)的优化方案

1、初始SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
with a as (select sn, id from t1)
select a.*, (select  id + f001() sq_sum from  t2  b where b.id = a.id)
from a
where 1 = 1 and a.id = 3;
 
 sn | id | sq_sum
----+----+--------
  1 |  3 |      4
  2 |  3 |      4
  3 |  3 |      4
(3 rows)
 
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..533.61 rows=11 width=12) (actual time=1000.356..3002.635 rows=3 loops=1)
   Filter: (id = 3)
   Rows Removed by Filter: 15
   SubPlan 1
     ->  Seq Scan on t2 b  (cost=0.00..45.03 rows=13 width=4) (actual time=1000.859..1000.866 rows=1 loops=3)
           Filter: (id = t1.id)
           Rows Removed by Filter: 4
 Planning Time: 0.164 ms
 Execution Time: 3002.657 ms
(9 rows)

2、CTE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
with a as (select sn, id from t1),
     b as materialized (select id, id + f001() sq_sum from t2)
select a.*, (select sq_sum from b where b.id = a.id) sq_sum
from a
where 1 = 1 and a.id=3 ;
 
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=676.75..1343.65 rows=11 width=12) (actual time=5004.958..5004.965 rows=3 loops=1)
   Filter: (id = 3)
   Rows Removed by Filter: 15
   CTE b
     ->  Seq Scan on t2  (cost=0.00..676.75 rows=2540 width=8) (actual time=1001.387..5004.930 rows=5 loops=1)
   SubPlan 2
     ->  CTE Scan on b  (cost=0.00..57.15 rows=13 width=4) (actual time=1001.132..1668.316 rows=1 loops=3)
           Filter: (id = t1.id)
           Rows Removed by Filter: 4
 Planning Time: 0.105 ms
 Execution Time: 5004.983 ms
(11 rows)

3、LEFT JOIN 子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
explain analyse
with a as (select sn, id from t1)
select a.*, b.sq_sum
from a left  join (select id, id + f001() sq_sum from t2) b on b.id = a.id
where 1 = 1 and a.id = 3 ;
 
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..85.46 rows=143 width=12) (actual time=1000.972..1000.980 rows=3 loops=1)
   Join Filter: (t2.id = t1.id)
   ->  Seq Scan on t1  (cost=0.00..38.25 rows=11 width=8) (actual time=0.008..0.009 rows=3 loops=1)
         Filter: (id = 3)
         Rows Removed by Filter: 15
   ->  Materialize  (cost=0.00..45.10 rows=13 width=8) (actual time=333.654..333.656 rows=1 loops=3)
         ->  Seq Scan on t2  (cost=0.00..45.03 rows=13 width=8) (actual time=1000.959..1000.964 rows=1 loops=1)
               Filter: (id = 3)
               Rows Removed by Filter: 4
 Planning Time: 0.107 ms
 Execution Time: 1000.996 ms
(11 rows)

四、查询局部数据(过滤从表的条件)的优化方案

1、初始SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id))
select a.*, (select  id + f001()  sq_sum from  (select  generate_series(1, 5) id) b where b.id = a.id) sq_sum
from a
where 1 = 1 and sq_sum = 6 ;<br>
 sn | id | sq_sum
----+----+--------
  1 |  5 |      6
  2 |  5 |      6
  3 |  5 |      6
(3 行记录)
 
时间:6007.526 ms (00:06.008)
                                                         QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3.43 rows=3 width=12) (actual time=4004.455..6006.681 rows=3 loops=1)
   ->  Subquery Scan on "SYSINTERNAL-4-1"  (cost=0.00..2.27 rows=1 width=4) (actual time=3003.273..3003.289 rows=1 loops=1)
         Filter: ((SubPlan 2) = 6)
         Rows Removed by Filter: 5
         ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4) (actual time=0.001..0.005 rows=6 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
         SubPlan 2
           ->  Subquery Scan on b_1  (cost=0.00..0.36 rows=1 width=4) (actual time=500.541..500.544 rows=0 loops=6)
                 Filter: (b_1.id = "SYSINTERNAL-4-1".id)
                 Rows Removed by Filter: 4
                 ->  ProjectSet  (cost=0.00..0.04 rows=5 width=4) (actual time=0.001..0.003 rows=5 loops=6)
                       ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=6)
   ->  ProjectSet  (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.006 rows=3 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
   SubPlan 1
     ->  Subquery Scan on b  (cost=0.00..0.36 rows=1 width=4) (actual time=1001.116..1001.122 rows=1 loops=3)
           Filter: (b.id = "SYSINTERNAL-4-1".id)
           Rows Removed by Filter: 4
           ->  ProjectSet  (cost=0.00..0.04 rows=5 width=4) (actual time=0.001..0.006 rows=5 loops=3)
                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=3)
 Planning Time: 0.121 ms
 Execution Time: 6006.709 ms
(22 行记录)
 
时间:6007.303 ms (00:06.007)

2、CTE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
explain analyse
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)),
     b as (select id, id + f001() sq_sum from generate_series(1, 5) id)
select a.*, (select sq_sum from b where b.id = a.id) sq_sum
from a
where 1 = 1  and sq_sum = 6 ;
 
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.31..2.54 rows=3 width=12) (actual time=5005.414..5005.422 rows=3 loops=1)
   CTE b
     ->  Function Scan on generate_series id  (cost=0.00..1.31 rows=5 width=8) (actual time=1001.093..5005.376 rows=5 loops=1)
   ->  Subquery Scan on "SYSINTERNAL-4-1"  (cost=0.00..0.80 rows=1 width=4) (actual time=5005.406..5005.410 rows=1 loops=1)
         Filter: ((SubPlan 3) = 6)
         Rows Removed by Filter: 5
         ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.005 rows=6 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
         SubPlan 3
           ->  CTE Scan on b b_1  (cost=0.00..0.11 rows=1 width=4) (actual time=500.540..834.232 rows=0 loops=6)
                 Filter: (id = "SYSINTERNAL-4-1".id)
                 Rows Removed by Filter: 4
   ->  ProjectSet  (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
   SubPlan 2
     ->  CTE Scan on b  (cost=0.00..0.11 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3)
           Filter: (id = "SYSINTERNAL-4-1".id)
           Rows Removed by Filter: 4
 Planning Time: 0.143 ms
 Execution Time: 5005.449 ms
(20 行记录)
 
时间:5006.115 ms (00:05.006)

3、LEFT JOIN

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
explain analyse
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id))
select a.*, b.sq_sum
from a  left  join (select id, id + f001() sq_sum  from generate_series(1, 5) id) b on b.id = a.id
where 1 = 1 and sq_sum = 6 ;
 
                                                                  QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.39..1.62 rows=3 width=12) (actual time=5005.285..5005.291 rows=3 loops=1)
   ->  Hash Join  (cost=1.39..1.53 rows=1 width=8) (actual time=5005.281..5005.285 rows=1 loops=1)
         Hash Cond: ((generate_series(3, 8)) = b.id)
         ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.005 rows=6 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Hash  (cost=1.38..1.38 rows=1 width=8) (actual time=5005.270..5005.270 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Subquery Scan on b  (cost=0.00..1.38 rows=1 width=8) (actual time=5005.264..5005.266 rows=1 loops=1)
                     Filter: (b.sq_sum = 6)
                     Rows Removed by Filter: 4
                     ->  Function Scan on generate_series id  (cost=0.00..1.31 rows=5 width=8) (actual time=1001.089..5005.254 rows=5 loops=1)
   ->  ProjectSet  (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
 Planning Time: 0.113 ms
 Execution Time: 5005.320 ms
(15 行记录)
 
时间:5005.873 ms (00:05.006)

4、LATERAL 连接 CTE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
explain analyse
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)),
     b as (select id, id + f001() sq_sum from generate_series(1, 5) id)
select a.*, b.sq_sum
from a join lateral (select * from b where b.id = a.id ) b on true
where 1 = 1  and sq_sum = 6;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.44..1.67 rows=3 width=12) (actual time=5005.492..5005.499 rows=3 loops=1)
   CTE b
     ->  Function Scan on generate_series id  (cost=0.00..1.31 rows=5 width=8) (actual time=1001.103..5005.440 rows=5 loops=1)
   ->  Hash Join  (cost=0.12..0.27 rows=1 width=8) (actual time=5005.485..5005.489 rows=1 loops=1)
         Hash Cond: ((generate_series(3, 8)) = b.id)
         ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.006 rows=6 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
         ->  Hash  (cost=0.11..0.11 rows=1 width=8) (actual time=5005.472..5005.472 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  CTE Scan on b  (cost=0.00..0.11 rows=1 width=8) (actual time=5005.457..5005.459 rows=1 loops=1)
                     Filter: (sq_sum = 6)
                     Rows Removed by Filter: 4
   ->  ProjectSet  (cost=0.00..0.03 rows=3 width=4) (actual time=0.004..0.005 rows=3 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.205 ms
 Execution Time: 5005.531 ms
(16 行记录)
 
时间:5006.765 ms (00:05.007)

5、LATERAL 连接子查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
explain analyse
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id))
select a.*, b.sq_sum
from a left join lateral (select id, id + f001() sq_sum  from generate_series(1, 5) id where id = a.id ) b on true
where 1 = 1 and sq_sum = 6 ;
   
                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2.44 rows=18 width=12) (actual time=3003.267..3003.280 rows=3 loops=1)
   ->  Nested Loop  (cost=0.00..2.15 rows=6 width=8) (actual time=3003.257..3003.268 rows=1 loops=1)
         ->  ProjectSet  (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.007 rows=6 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.002 rows=1 loops=1)
         ->  Subquery Scan on b  (cost=0.00..0.33 rows=1 width=4) (actual time=500.541..500.542 rows=0 loops=6)
               Filter: (b.sq_sum = 6)
               Rows Removed by Filter: 0
               ->  Function Scan on generate_series id  (cost=0.00..0.32 rows=1 width=8) (actual time=500.539..500.540 rows=0 loops=6)
                     Filter: (id = (generate_series(3, 8)))
                     Rows Removed by Filter: 4
   ->  Materialize  (cost=0.00..0.08 rows=3 width=4) (actual time=0.007..0.009 rows=3 loops=1)
         ->  ProjectSet  (cost=0.00..0.03 rows=3 width=4) (actual time=0.005..0.006 rows=3 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Planning Time: 0.125 ms
 Execution Time: 3003.306 ms
(15 行记录)
 
时间:3003.950 ms (00:03.004)

总结

  • Select 子句中,表达式会逐行运算,总时长与结果集成正比。
  • CTE子句,先计算全部结果,然后关联主表,总时长是可控。
  • 使用子查询和 LATERAL 连接,可以避免结果集的重复计算。

 

posted @   KINGBASE研究院  阅读(151)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· DeepSeek 开源周回顾「GitHub 热点速览」
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· AI与.NET技术实操系列(二):开始使用ML.NET
· 单线程的Redis速度为什么快?
历史上的今天:
2021-12-15 LIKE与等式查询比较
点击右上角即可分享
微信分享提示