递归查询两种写法与性能差异
对于递归查询,KingbaseES 用户可以选择使用connect by ,或者使用 with recursive 。下面,我们举例来看下二者的差别。
一、构造数据
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | create table test_recursive(id integer,pid integer,name varchar,description text); insert into test_recursive(id,name,description) select generate_series( 1 , 100000 ), 'a' ||generate_series( 1 , 100000 ),repeat( 'desc' , 500 ); update test_recursive set pid= 1 where id between 2 and 10 ; update test_recursive set pid=mod(id, 9 )+ 2 where id between 11 and 100 ; update test_recursive set pid=mod(id, 90 )+ 11 where id between 101 and 1000 ; update test_recursive set pid=mod(id, 900 )+ 101 where id between 1001 and 10000 ; update test_recursive set pid=mod(id, 9000 )+ 1001 where id between 10001 and 100000 ; create table test_recursive_random(id integer,pid integer,name varchar,description text); insert into test_recursive_random select * from test_recursive order by random; create index ind_test_recursive_random_id on test_recursive_random(id); create index ind_test_recursive_random_pid on test_recursive_random(pid); vacuum full test_recursive_random; analyze test_recursive_random; |
本例子构造了5层的数据,有排序与非排序两种数据。
二、使用Connect By
connect by的查询性能:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | test=# explain analyze select id,pid, name from test_recursive_random start with id=1 connect by prior id = pid ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Recursive Union (cost=0.29..427.33 rows =101 width=14) (actual time =0.061..574.857 rows =100000 loops=1) -> Index Scan using ind_test_recursive_random_id on test_recursive_random (cost=0.29..8.31 rows =1 width=14) (actual time =0.019..0.020 rows =1 loops=1) Index Cond: (id = 1) -> Nested Loop (cost=4.50..41.80 rows =10 width=14) (actual time =0.003..0.004 rows =1 loops=100000) -> WorkTable Scan on "connect" (cost=0.00..0.02 rows =1 width=4) (actual time =0.000..0.000 rows =1 loops=100000) -> Bitmap Heap Scan on test_recursive_random (cost=4.50..41.68 rows =10 width=14) (actual time =0.002..0.003 rows =1 loops=100000) Recheck Cond: (pid = ( PRIOR test_recursive_random.id)) Heap Blocks: exact=99583 -> Bitmap Index Scan on ind_test_recursive_random_pid (cost=0.00..4.49 rows =10 width=0) (actual time =0.002..0.002 rows =1 loops=100000) Index Cond: (pid = ( PRIOR test_recursive_random.id)) Planning Time : 0.143 ms Execution Time : 578.736 ms (12 rows ) |
三、使用 With Recursive
1、with recursive 采用nestloop 时的性能
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | test=# explain analyze with recursive tmp1 as ( test(# select id,pid, name from test_recursive_random where id=1 test(# union all test(# select a.id,a.pid,a. name from test_recursive_random a inner join tmp1 b on a.pid=b.id ) test-# select * from tmp1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on tmp1 (cost=4208.44..4228.46 rows =1001 width=40) (actual time =0.037..478.930 rows =100000 loops=1) CTE tmp1 -> Recursive Union (cost=0.29..4208.44 rows =1001 width=14) (actual time =0.034..437.604 rows =100000 loops=1) -> Index Scan using ind_test_recursive_random_id on test_recursive_random (cost=0.29..8.31 rows =1 width=14) (actual time =0.033..0.034 rows =1 loops=1) Index Cond: (id = 1) -> Nested Loop (cost=4.50..418.01 rows =100 width=14) (actual time =39.048..69.125 rows =16666 loops=6) -> WorkTable Scan on tmp1 b (cost=0.00..0.20 rows =10 width=4) (actual time =0.004..2.447 rows =16667 loops=6) -> Bitmap Heap Scan on test_recursive_random a (cost=4.50..41.68 rows =10 width=14) (actual time =0.002..0.003 rows =1 loops=100000) Recheck Cond: (pid = b.id) Heap Blocks: exact=99583 -> Bitmap Index Scan on ind_test_recursive_random_pid (cost=0.00..4.49 rows =10 width=0) (actual time =0.002..0.002 rows =1 loops=100000) Index Cond: (pid = b.id) Planning Time : 0.148 ms Execution Time : 484.934 ms (14 rows ) |
2、使用hash join:260ms
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | test=# explain analyze with recursive tmp1 as ( test(# select id,pid, name from test_recursive_random where id=1 test(# union all test(# select a.id,a.pid,a. name from test_recursive_random a inner join tmp1 b on a.pid=b.id ) test-# select * from tmp1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on tmp1 (cost=24211.58..24231.60 rows =1001 width=40) (actual time =0.022..272.399 rows =100000 loops=1) CTE tmp1 -> Recursive Union (cost=0.29..24211.58 rows =1001 width=14) (actual time =0.020..230.228 rows =100000 loops=1) -> Index Scan using ind_test_recursive_random_id on test_recursive_random (cost=0.29..8.31 rows =1 width=14) (actual time =0.018..0.019 rows =1 loops=1) Index Cond: (id = 1) -> Hash Join (cost=0.33..2418.32 rows =100 width=14) (actual time =15.254..34.548 rows =16666 loops=6) Hash Cond: (a.pid = b.id) -> Seq Scan on test_recursive_random a (cost=0.00..2042.00 rows =100000 width=14) (actual time =0.005..8.708 rows =100000 loops=6) -> Hash (cost=0.20..0.20 rows =10 width=4) (actual time =6.263..6.263 rows =16667 loops=6) Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3073kB -> WorkTable Scan on tmp1 b (cost=0.00..0.20 rows =10 width=4) (actual time =0.003..2.379 rows =16667 loops=6) Planning Time : 0.159 ms Execution Time : 277.910 ms (13 rows ) |
四、执行计划差异分析
- Connect by 查询执行逻辑:查询是通过 pid = prior id ,也就是将前条记录的 id 作为值,传给 pid 进行索引扫描。逻辑上可以看做是逐个分支查询,上个分支查询结束,再进行下个分支扫描。loop = 100000,就是表示针对每条记录,都要访问一次索引。只能使用nestloop。
- With recursive 查询逻辑:是按层次查询,上层结果都返回后,再执行下层查询。每层可以根据所有ctid进行排序,也就是 Bitmap Index Scan,将所有ctid都返回,排序,再访问表,效率提高。另外,由于是每层数据返回后,再去关联查找下层数据,可以使用hash join,提升访问效率。 rows=16666 loop = 6,表示需要访问6个批次,每次平均 16666 条记录。
五、Oracle Connect By 查询性能
1、执行计划一:针对每条记录进行一次连接
相当于KingbaseES 的connect by 操作。
以下是同样数据量的情况下,Oracle connect by 查询的性能:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | SQL> select id,pid, name from test_recursive start with id=1 connect by prior id = pid ; 100000 rows selected. Elapsed: 00:00:00.98 Execution Plan ---------------------------------------------------------- Plan hash value: 2099392185 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 12 | 384 | 18 (12)| 00:00:01 | |* 1 | CONNECT BY WITH FILTERING | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | TEST_RECURSIVE | 1 | 32 | 2 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | IND_TEST_RECURSIVE_ID | 1 | | 1 (0)| 00:00:01 | | 4 | NESTED LOOPS | | 11 | 495 | 14 (0)| 00:00:01 | | 5 | CONNECT BY PUMP | | | | | | | 6 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST_RECURSIVE | 11 | 352 | 12 (0)| 00:00:01 | |* 7 | INDEX RANGE SCAN | IND_TEST_RECURSIVE_PID | 11 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access( "PID" = PRIOR "ID" ) 3 - access( "ID" =1) 7 - access( "connect$_by$_pump$_002" . "prior id " = "PID" ) Note ----- - dynamic statistics used: dynamic sampling ( level =2) - this is an adaptive plan Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 101983 consistent gets 0 physical reads 0 redo size 2337649 bytes sent via SQL*Net to client 73769 bytes received via SQL*Net from client 6668 SQL*Net roundtrips to / from client 8 sorts (memory) 0 sorts (disk) 100000 rows processed |
2、执行计划二:以每层为单位,进行一次连接
相当于KingbaseES 的 with recursive 操作。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | Execution Plan ---------------------------------------------------------- Plan hash value: 2244445759 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 452 | 34804 | 6221 (1)| 00:00:01 | |* 1 | CONNECT BY WITH FILTERING| | | | | | |* 2 | TABLE ACCESS FULL | TEST_RECURSIVE | 84 | 6468 | 2073 (1)| 00:00:01 | |* 3 | HASH JOIN | | 368 | 33120 | 4146 (1)| 00:00:01 | | 4 | CONNECT BY PUMP | | | | | | | 5 | TABLE ACCESS FULL | TEST_RECURSIVE | 87690 | 6593K| 2073 (1)| 00:00:01 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access( "PID" = PRIOR "ID" ) 2 - filter( "ID" =1) 3 - access( "connect$_by$_pump$_002" . "prior id " = "PID" ) Statistics ---------------------------------------------------------- 41 recursive calls 0 db block gets 53776 consistent gets 75 physical reads 0 redo size 3026540 bytes sent via SQL*Net to client 73798 bytes received via SQL*Net from client 6668 SQL*Net roundtrips to / from client 12 sorts (memory) 0 sorts (disk) 100000 rows processed |
从 consistent gets 可以大致算出来,表的大小 7652 * (6+1)层 。
KINGBASE研究院
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!