PostgreSQL 与 Oracle 访问分区表执行计划差异

熟悉Oracle 的DBA都知道,Oracle 访问分区表时,对于没有提供分区条件的,也就是在无法使用分区剪枝情况下,优化器会根据全局的统计信息制定执行计划,该执行计划针对所有分区适用。在分析该方法利弊之前,我们先来看个例子,以确保对分区表的执行计划有所了解:

一、Oracle 

构建数据:

复制代码
create table part_tab01(part_key char(1),state char(1),desc_content varchar(4000))
partition by range(part_key)
(
  partition part_0 values less than(1),
  partition part_1 values less than(2)
);

insert into part_tab01 select '0','0',rpad('a',1000,'a') from dba_objects where rownum<10001;
insert into part_tab01 select '1','1',rpad('a',1000,'a') from dba_objects where rownum<10001;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select * from part_tab01;
insert into part_tab01 select '1','0',rpad('a',1000,'a') from dba_objects where rownum<11;
insert into part_tab01 select '0','1',rpad('a',1000,'a') from dba_objects where rownum<11;

create index idx_part_tab01_state on part_tab01(state) local;
复制代码

从数据的分布可以得出结论,最优的访问方法应该是:对于不同的分区、访问不同的state 值,应采用不同的表访问方法。

实际Oracle 执行计划如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
SQL> select * from part_tab01 where state='1';
 
640010 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4116343635
 
--------------------------------------------------------------------------------------------------
| Id  | Operation           | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |            |   640K|   613M| 49576   (1)| 00:00:02 |       |       |
|   1 |  PARTITION RANGE ALL|            |   640K|   613M| 49576   (1)| 00:00:02 |     1 |     2 |
|*  2 |   TABLE ACCESS FULL | PART_TAB01 |   640K|   613M| 49576   (1)| 00:00:02 |     1 |     2 |
--------------------------------------------------------------------------------------------------
<br>结论:在没有指定分区键的情况下,根据全局的统计信息,采用全表访问。<br>
SQL> select * from part_tab01 where state='1' and part_key='0';
 
10 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1952449058
 
-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                      |    10 | 10050 |     5   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |                      |    10 | 10050 |     5   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PART_TAB01           |    10 | 10050 |     5   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                        | IDX_PART_TAB01_STATE |    10 |       |     3   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------------------------------------------
 
SQL> select * from part_tab01 where state='1' and part_key='1';
 
640000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4278184147
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |   640K|   613M| 24793   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|            |   640K|   613M| 24793   (1)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS FULL    | PART_TAB01 |   640K|   613M| 24793   (1)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------------<br><br>结论:在提供分区条件的情况下,优化器可以根据不同的分区统计信息给出不同的执行计划。

可以看到,在没有分区条件的情况下,Oracle 是针对全表采用统一的执行。实际针对该SQL,最好的访问方法应该是:part_0 全表,part_1 索引

二、PostgreSQL 执行计划

构建数据:

复制代码
create table part_tab01(part_key char(1),state char(1),desc_content text)
partition by range(part_key)
(
  partition part_0 values less than(1),
  partition part_1 values less than(2)
);

insert into part_tab01 select '0','0',repeat('a',1000) from generate_series(1,1000000);
insert into part_tab01 select '0','1',repeat('b',1000) from generate_series(1,10);
insert into part_tab01 select '1','1',repeat('a',1000) from generate_series(1,1000000);
insert into part_tab01 select '1','0',repeat('b',1000) from generate_series(1,10);

create index idx_part_tab01_state on part_tab01(state);
复制代码

 执行计划:即使没有提供分区条,针对不同分区,有不同的执行计划。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
test=# explain analyze select * from part_tab01 where state='1';
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..160363.43 rows=1000000 width=1008) (actual time=0.022..484.005 rows=1000010 loops=1)
   ->  Index Scan using part_tab01_part_0_state_idx on part_tab01_part_0  (cost=0.42..4.44 rows=1 width=1008) (actual time=0.022..0.024 rows=10 loops=1)
         Index Cond: (state = '1'::bpchar)
   ->  Seq Scan on part_tab01_part_1  (cost=0.00..155358.99 rows=999999 width=1008) (actual time=0.011..424.713 rows=1000000 loops=1)
         Filter: (state = '1'::bpchar)
         Rows Removed by Filter: 10
 Planning Time: 0.293 ms
 Execution Time: 515.549 ms
(8 rows)
 
test=# explain analyze select * from part_tab01 where state='0';
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..160363.68 rows=1000014 width=1008) (actual time=0.022..517.127 rows=1000010 loops=1)
   ->  Seq Scan on part_tab01_part_0  (cost=0.00..155359.16 rows=1000013 width=1008) (actual time=0.022..451.523 rows=1000000 loops=1)
         Filter: (state = '0'::bpchar)
         Rows Removed by Filter: 10
   ->  Index Scan using part_tab01_part_1_state_idx on part_tab01_part_1  (cost=0.42..4.44 rows=1 width=1008) (actual time=0.032..0.035 rows=10 loops=1)
         Index Cond: (state = '0'::bpchar)
 Planning Time: 0.090 ms
 Execution Time: 547.486 ms
(8 rows)

三、结论

从本例可以看出,在不同分区数据分布不同的场景下,PostgreSQL针对不同分区有独立的执行计划是更优方法。现实中典型的场景,如:按时间分区的工单表,历史分区可能大部分工单是结束状态,而当前分区工单可能大部分是非结束状态,因此,针对历史与当前分区,可能需要不同的执行计划。

结论:PostgreSQL 会针对不同的分区制定不同的执行计划,执行计划可能更合理,但是,由于需要读取每个分区的统计数据,不可避免对于执行计划的生成有影响,特别是在分区数量非常多的情况下,生成执行计划的效率就非常低。

posted @   KINGBASE研究院  阅读(176)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!
历史上的今天:
2021-09-06 KingbaseES 客户端工具安装
2021-09-06 KingbaseFlySync V1R6 管控平台Linux命令行安装
点击右上角即可分享
微信分享提示