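Exploring PostgreSQL Partitioned Tables (pg_pathman)
Use cases
After a few years in production, many systems see their data volume balloon. Once a single table grows past 20 million rows, queries get noticeably slower, while much of the historical data gradually becomes less important. At that point it is worth considering partitioned tables so that hot and cold data are stored in separate partitions.
Typical candidates include log tables used for SQL analysis. Common partition keys are creation time, province, and business type; the right choice depends on the actual requirements.
PostgreSQL's official advice is to consider partitioning once a single table's size exceeds the server's physical memory. (From a rough survey, on modern server hardware that works out to keeping a single table under about 32 GB, or about 20 million rows.)
Partitioning concept
Partitioning splits what is logically one large table into smaller physical pieces. It can improve query performance and also makes maintenance and management easier.
Notes
PostgreSQL already supported partitioning in 9.6 and earlier, but it was implemented with triggers and performance was not great. PG 10 added built-in partitioning, but judging from some tests in the PG community, PG 10 partitioning does not perform as well as pg_pathman. This article mainly tests pg_pathman RANGE partitioning.
Installation
Install the pg_pathman extension: link
Create the extension
-- Create the extension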
create extension pg_pathman;
-- Check that the extension was installed (or use \dx in psql)
select * from pg_extension;
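pg_pathman's documentation asks for the library to be listed in shared_preload_libraries (followed by a server restart) before the extension is created. A minimal check, assuming a psql session on the target database:

```sql
-- The value shown should include 'pg_pathman'
SHOW shared_preload_libraries;

-- Confirm the extension is registered in the current database and see its version
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_pathman';
```

If the second query returns no row, the extension has not been created in this database yet.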
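RANGE partitioning
Note that the partition column must be NOT NULL. Columns such as a case's filing date and closing date, which can be null, therefore cannot be used as the partition key.
-- Check the row count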
db_jcxxzypt=# select count(*) from db_jcxx.t_jcxxzy_tjaj;
count
----------
17507701
-- Add a NOT NULL constraint (the partition column must be NOT NULL)
db_jcxxzypt=# alter table t_jcxxzy_tjaj alter COLUMN d_slrq set not null;
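-- Create the partitions: split the 17 million+ rows by year, using the non-blocking migration approach.
select create_range_partitions(
't_jcxxzy_tjaj'::regclass, -- parent table OID
'd_slrq', -- partition column; must have a NOT NULL constraint
'2000-01-01 00:00:00'::timestamp, -- start value
interval '1 year', -- partition interval: one year
20, -- number of partitions
false -- do not migrate data from the parent table to the partitions right away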
);
-- Migrate the data into the partitions
select partition_table_concurrently('t_jcxxzy_tjaj'::regclass,
10000, -- rows migrated per transaction (1-10000)
1.0);
-- Check the background data migration tasks
select * from pathman_concurrent_part_tasks;
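While the background worker is running, it can help to watch how many rows are still left in the parent table itself. The sketch below assumes the table lives in schema db_jcxx as in the rest of this post; stop_concurrent_part_task is the pg_pathman function for interrupting the task.

```sql
-- Rows still stored in the parent table itself (ONLY excludes child tables)
SELECT count(*) AS rows_left_in_parent
FROM ONLY db_jcxx.t_jcxxzy_tjaj;

-- Stop the background migration task if it has to be interrupted
SELECT stop_concurrent_part_task('db_jcxx.t_jcxxzy_tjaj'::regclass);
```

When the first query returns 0, all rows have been moved into partitions.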
-- Inspect the partitioned table
db_jcxxzypt=# \d+ db_jcxx.t_jcxxzy_tjaj
Table "db_jcxx.t_jcxxzy_tjaj"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------------+-----------+----------+---
c_bh | character(32) | not null | extended | | ID
c_xzdm | character varying(300) | | extended | | Administrative code
(remaining columns omitted...)
Indexes:
"t_jcxxzy_tjaj_new1_pkey" PRIMARY KEY, btree (c_bh)
"idx_jcxxzy_tjaj_ajdsrs" btree (n_ajdsrs)
"idx_ttjaj_cajly" btree (c_ajly)
"idx_ttjaj_dslrq" btree (d_slrq)
"idx_ttjaj_new1_ctwhbm" btree (c_twhbm)
"idx_ttjaj_xylx" btree (c_xylx)
Child tables: db_jcxx.t_jcxxzy_tjaj_1,
db_jcxx.t_jcxxzy_tjaj_2,
db_jcxx.t_jcxxzy_tjaj_3,
db_jcxx.t_jcxxzy_tjaj_4,
db_jcxx.t_jcxxzy_tjaj_5,
db_jcxx.t_jcxxzy_tjaj_6
Options: parallel_workers=2
-- After partitioning completes, disabling the parent table is recommended
select set_enable_parent('t_jcxxzy_tjaj'::regclass,false);
-- Row counts per partition
db_jcxxzypt=# select relname as tablename, reltuples::int as rowCounts from pg_class where relkind = 'r' and relname like 't_jcxxzy_tjaj%' order by rowCounts desc;
tablename | rowcounts
-----------------+-----------
t_jcxxzy_tjaj_4 | 3662374
t_jcxxzy_tjaj_2 | 3661425
t_jcxxzy_tjaj_1 | 3660449
t_jcxxzy_tjaj_3 | 3658622
t_jcxxzy_tjaj_5 | 2864830
t_jcxxzy_tjaj | 0
t_jcxxzy_tjaj_6 | 0
(7 rows)
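Migrating the 17 million rows took a little over an hour. If the table has indexes, you can drop them first and rebuild them after the migration finishes, because when partitions are created each partition gets its own separate copy of every index. This is also why global uniqueness cannot be guaranteed.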
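Since each partition enforces its primary key independently, a c_bh value that already exists in one partition is not rejected in another. A minimal sketch for detecting such duplicates, assuming c_bh is the intended unique key as in the table definition above (the query scans every partition, so it is expensive on a table of this size):

```sql
-- List c_bh values that occur in more than one row across all partitions,
-- together with the partitions they were found in
SELECT c_bh,
       count(*) AS occurrences,
       array_agg(DISTINCT tableoid::regclass::text) AS found_in
FROM db_jcxx.t_jcxxzy_tjaj
GROUP BY c_bh
HAVING count(*) > 1;
```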
Counting rows with c_xylx='02': partitioned vs. unpartitioned
-- Unpartitioned
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=90147.38..90147.39 rows=1 width=8) (actual time=844.279..844.279 rows=1 loops=1)
-> Index Only Scan using idx_ttjaj_xylx on t_jcxxzy_tjaj (cost=0.44..82870.01 rows=2910947 width=0) (actual time=0.041..569.953 rows=2916043 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
Planning time: 0.226 ms
Execution time: 844.334 ms
(6 rows)
-- Unpartitioned execution time
db_jcxxzypt=# \timing
Timing is off.
db_jcxxzypt=# select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE c_xylx = '02';
count
---------
2916043
(1 row)
Time: 543.206 ms
-- Execution plan after partitioning
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=89754.14..89754.15 rows=1 width=8) (actual time=1215.401..1215.401 rows=1 loops=1)
-> Append (cost=0.43..82510.65 rows=2897393 width=0) (actual time=0.039..942.783 rows=2916043 loops=1)
-> Index Only Scan using t_jcxxzy_tjaj_1_c_xylx_idx on t_jcxxzy_tjaj_1 (cost=0.43..17406.09 rows=611295 width=0) (actual time=0.039..127.923 rows=609209 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
-> Index Only Scan using t_jcxxzy_tjaj_2_c_xylx_idx on t_jcxxzy_tjaj_2 (cost=0.43..17105.00 rows=600718 width=0) (actual time=0.023..126.972 rows=609727 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
-> Index Only Scan using idx_ttjaj_c_xylx on t_jcxxzy_tjaj_3 (cost=0.43..16936.90 rows=594770 width=0) (actual time=0.032..124.370 rows=608945 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
-> Index Only Scan using t_jcxxzy_tjaj_4_c_xylx_idx on t_jcxxzy_tjaj_4 (cost=0.43..17313.76 rows=608076 width=0) (actual time=0.037..129.107 rows=611274 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
-> Index Only Scan using t_jcxxzy_tjaj_5_c_xylx_idx on t_jcxxzy_tjaj_5 (cost=0.43..13740.76 rows=482533 width=0) (actual time=0.037..99.022 rows=476888 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
-> Index Only Scan using i_t_jcxxzy_tjaj_h2_6 on t_jcxxzy_tjaj_6 (cost=0.12..8.14 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (c_xylx = '02'::text)
Heap Fetches: 0
Planning time: 0.948 ms
Execution time: 1215.495 ms
(22 rows)
Time: 1236.152 ms
-- Execution time after partitioning
db_jcxxzypt=# \timing
Timing is on.
db_jcxxzypt=# select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE c_xylx = '02';
count
---------
2916043
(1 row)
Time: 592.745 ms
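Rows with c_xylx='02' exist in every partition, so the plan scans all partitions, and the partitioned and unpartitioned timings are about the same.
Counting rows with c_xylx='02' within a date range
-- Unpartitioned execution plan
-- First create a composite index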
create index i_t_jcxxzy_tjaj_h2 on t_jcxxzy_tjaj(d_slrq,c_xylx);
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE d_slrq >='2016-01-01' and d_slrq <'2016-10-31' and c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=368120.65..368120.66 rows=1 width=8) (actual time=799.274..799.274 rows=1 loops=1)
-> Bitmap Heap Scan on t_jcxxzy_tjaj (cost=18338.24..367801.05 rows=127840 width=0) (actual time=137.978..786.398 rows=126533 loops=1)
Recheck Cond: ((d_slrq >= '2016-01-01'::date) AND (d_slrq < '2016-10-31'::date) AND ((c_xylx)::text = '02'::text))
Rows Removed by Index Recheck: 1490760
Heap Blocks: exact=35508 lossy=82085
-> Bitmap Index Scan on i_t_jcxxzy_tjaj_h2 (cost=0.00..18306.28 rows=127840 width=0) (actual time=127.441..127.441 rows=126533 loops=1)
Index Cond: ((d_slrq >= '2016-01-01'::date) AND (d_slrq < '2016-10-31'::date) AND ((c_xylx)::text = '02'::text))
Planning time: 0.383 ms
Execution time: 799.350 ms
(9 rows)
Time: 801.140 ms
-- Unpartitioned execution time
db_jcxxzypt=# select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE d_slrq >='2016-01-01' and d_slrq <'2016-10-31' and c_xylx = '02';
count
--------
126533
(1 row)
Time: 772.393 ms
-- Create the same composite index on each partition
create index i_t_jcxxzy_tjaj_h2_1 on t_jcxxzy_tjaj_1(d_slrq,c_xylx);
create index i_t_jcxxzy_tjaj_h2_2 on t_jcxxzy_tjaj_2(d_slrq,c_xylx);
create index i_t_jcxxzy_tjaj_h2_3 on t_jcxxzy_tjaj_3(d_slrq,c_xylx);
create index i_t_jcxxzy_tjaj_h2_4 on t_jcxxzy_tjaj_4(d_slrq,c_xylx);
create index i_t_jcxxzy_tjaj_h2_5 on t_jcxxzy_tjaj_5(d_slrq,c_xylx);
create index i_t_jcxxzy_tjaj_h2_6 on t_jcxxzy_tjaj_6(d_slrq,c_xylx);
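With more partitions, writing one statement per partition gets tedious. As a sketch, the same index can be created in a loop over the child tables recorded in pg_inherits. The index naming scheme ('i_h2_' plus the partition name) is a hypothetical choice for this example and differs from the names used above, so IF NOT EXISTS will not notice an equivalent index that already exists under another name.

```sql
DO $$
DECLARE
    part record;
BEGIN
    -- Every direct child of the parent table is one partition
    FOR part IN
        SELECT c.oid::regclass AS tbl, c.relname
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        WHERE i.inhparent = 'db_jcxx.t_jcxxzy_tjaj'::regclass
    LOOP
        -- %I quotes the index name; %s inserts the already-quoted table name
        EXECUTE format(
            'CREATE INDEX IF NOT EXISTS %I ON %s (d_slrq, c_xylx)',
            'i_h2_' || part.relname,
            part.tbl
        );
    END LOOP;
END
$$;
```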
-- Execution plan after partitioning
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE d_slrq >='2016-01-01' and d_slrq <'2016-10-31' and c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=17438.16..17438.17 rows=1 width=8) (actual time=106.158..106.158 rows=1 loops=1)
-> Append (cost=0.43..17120.03 rows=127253 width=0) (actual time=0.319..94.105 rows=126533 loops=1)
-> Index Only Scan using i_t_jcxxzy_tjaj_h2_5 on t_jcxxzy_tjaj_5 (cost=0.43..17120.03 rows=127253 width=0) (actual time=0.318..79.701 rows=126533 loops=1)
Index Cond: ((d_slrq < '2016-10-31'::date) AND (c_xylx = '02'::text))
Heap Fetches: 0
Planning time: 0.488 ms
Execution time: 106.216 ms
(7 rows)
Time: 107.383 ms
-- The plan checks only d_slrq < '2016-10-31' and not d_slrq >= '2016-01-01', because this partition's own constraint already starts at d_slrq >= '2016-01-01'
db_jcxxzypt=# \d+ t_jcxxzy_tjaj_5
Table "db_jcxx.t_jcxxzy_tjaj_5"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------------+-----------+----------+---
c_bh | character(32) | not null | extended | |
c_xzdm | character varying(300) | | extended | |
......
Check constraints:
"pathman_t_jcxxzy_tjaj_5_check" CHECK (d_slrq >= '2016-01-01'::date AND d_slrq < '2020-01-01'::date)
Inherits: t_jcxxzy_tjaj
-- Execution time after partitioning
db_jcxxzypt=# select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE d_slrq >='2016-01-01' and d_slrq <'2016-10-31' and c_xylx = '02';
count
--------
126533
(1 row)
Time: 97.369 ms
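The plan shows that after partitioning only t_jcxxzy_tjaj_5 is scanned, with an index only scan, and the query is much faster than on the unpartitioned table.
-- Date range spanning two partitions: data is read from both t_jcxxzy_tjaj_4 and t_jcxxzy_tjaj_5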
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj WHERE d_slrq >='2015-12-01' and d_slrq <'2016-1-31' and c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=3458.09..3458.10 rows=1 width=8) (actual time=25.379..25.380 rows=1 loops=1)
-> Append (cost=0.43..3395.52 rows=25029 width=0) (actual time=0.119..22.684 rows=25622 loops=1)
-> Index Only Scan using i_t_jcxxzy_tjaj_h2_4 on t_jcxxzy_tjaj_4 (cost=0.43..1655.53 rows=12119 width=0) (actual time=0.117..11.829 rows=13032 loops=1)
Index Cond: ((d_slrq >= '2015-12-01'::date) AND (c_xylx = '02'::text))
Heap Fetches: 0
-> Index Only Scan using i_t_jcxxzy_tjaj_h2_5 on t_jcxxzy_tjaj_5 (cost=0.43..1739.99 rows=12910 width=0) (actual time=0.184..7.693 rows=12590 loops=1)
Index Cond: ((d_slrq < '2016-01-31'::date) AND (c_xylx = '02'::text))
Heap Fetches: 0
Planning time: 5.857 ms
Execution time: 25.461 ms
(10 rows)
Time: 32.039 ms
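For this date range, both t_jcxxzy_tjaj_4 and t_jcxxzy_tjaj_5 are scanned to count the rows with d_slrq >='2015-12-01' and d_slrq <'2016-1-31'.
When the date range falls inside a single partition, only that partition (t_jcxxzy_tjaj_5 in the earlier example) needs to be scanned. With partitioned tables, each partition has its own independent indexes, and each index covers only one small partition, so such queries are far more efficient than on the unpartitioned table.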
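Pruning only works when the planner can compare the filter directly against the partition bounds. The two statements below are a sketch of how to check this with EXPLAIN; the second query is a hypothetical rewrite that wraps the partition key in a function call and is expected to lose pruning and scan every partition.

```sql
-- Range predicate directly on the partition key: the plan should list only
-- the partitions whose ranges overlap the requested interval
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM db_jcxx.t_jcxxzy_tjaj
WHERE d_slrq >= DATE '2016-01-01'
  AND d_slrq <  DATE '2016-10-31'
  AND c_xylx = '02';

-- Similar intent, but the key is hidden inside to_char(); the condition can
-- no longer be matched against partition bounds
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM db_jcxx.t_jcxxzy_tjaj
WHERE to_char(d_slrq, 'YYYY-MM') BETWEEN '2016-01' AND '2016-10'
  AND c_xylx = '02';
```

Comparing the number of partitions that appear under the Append node in each plan makes the difference visible without running the queries.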
sum(), avg(), group by comparison
-- Unpartitioned
db_jcxxzypt=# select count(n_ajdsrs),n_ajdsrs from t_jcxxzy_tjaj group by n_ajdsrs;
count | n_ajdsrs
---------+----------
4378357 | 0
4377009 | 1
4374162 | 2
4378172 | 3
(4 rows)
Time: 4770.810 ms
db_jcxxzypt=# select sum(n_ajdsrs) from t_jcxxzy_tjaj;
sum
----------
26259849
(1 row)
Time: 4059.588 ms
db_jcxxzypt=# select avg(n_ajdsrs) from t_jcxxzy_tjaj;
avg
--------------------
1.4999028427491904
(1 row)
Time: 4098.815 ms
-- After partitioning
db_jcxxzypt=# select count(n_ajdsrs),n_ajdsrs from t_jcxxzy_tjaj group by n_ajdsrs;
count | n_ajdsrs
---------+----------
4378357 | 0
4377009 | 1
4374162 | 2
4378172 | 3
(4 rows)
Time: 4050.820 ms
db_jcxxzypt=# select sum(n_ajdsrs) from t_jcxxzy_tjaj;
sum
----------
26259849
(1 row)
Time: 2543.786 ms
db_jcxxzypt=# select avg(n_ajdsrs) from t_jcxxzy_tjaj;
avg
--------------------
1.4999028427491904
(1 row)
Time: 2727.279 ms
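RANGE partitioning performance comparison
Query performance on t_jcxxzy_tjaj (17.5 million rows) after RANGE partitioning, compared by number of partitions:
Query | Unpartitioned | 5 partitions (avg. 3.6M rows) | 20 partitions (avg. 0.9M rows) |
---|---|---|---|
c_xylx = '02' | 543.206 ms | 599.155 ms | 612.299 ms |
d_slrq range + c_xylx = '02' | 772.393 ms | 97.369 ms | 77.807 ms |
group by n_ajdsrs | 4976.328 ms | 4770.810 ms | 4107.329 ms |
avg(n_ajdsrs) | 4098.815 ms | 2727.279 ms | 2643.653 ms |
sum(n_ajdsrs) | 4059.588 ms | 2543.786 ms | 2535.021 ms |
There is little difference between 5 and 20 partitions. Queries on c_xylx='02' that scan every partition perform about the same as on the unpartitioned table, but queries that filter on the partition key are dramatically faster, and some aggregate functions are faster as well.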
Querying a partition directly
-- Query only partition t_jcxxzy_tjaj_5
db_jcxxzypt=# explain analyze select count(*) from db_jcxx.t_jcxxzy_tjaj_5 WHERE d_slrq >='2017-01-01' and d_slrq <'2017-1-31' and c_xylx = '02';
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=4456.74..4456.75 rows=1 width=8) (actual time=29.910..29.911 rows=1 loops=1)
-> Index Only Scan using i_t_jcxxzy_tjaj_h2_5 on t_jcxxzy_tjaj_5 (cost=0.43..4383.39 rows=29342 width=0) (actual time=0.157..26.854 rows=29497 loops=1)
Index Cond: ((d_slrq >= '2017-01-01'::date) AND (d_slrq < '2017-01-31'::date) AND (c_xylx = '02'::text))
Heap Fetches: 0
Planning time: 0.272 ms
Execution time: 29.969 ms
(6 rows)
Time: 30.910 ms
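Partitions can also be queried on their own.
Commonly used functions
-- After the data migration finishes, disabling the parent table is recommended so that it no longer appears in execution plans. In testing, when the parent table was not disabled, most of the scan time could be spent on the parent table.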
select set_enable_parent('t_jcxxzy_tjaj'::regclass,false);
-- Add a partition at the end (append); the new partition extends the existing ranges
db_jcxxzypt=# select append_range_partition('db_jcxx.t_jcxxzy_tjaj'::regclass);
append_range_partition
------------------------
t_jcxxzy_tjaj_9
(1 row)
-- Add a partition at the beginning (prepend)
db_jcxxzypt=# select prepend_range_partition('t_jcxxzy_tjaj'::regclass);
prepend_range_partition
-------------------------
t_jcxxzy_tjaj_11
(1 row)
db_jcxxzypt=# \d+ t_jcxxzy_tjaj_11
Table "db_jcxx.t_jcxxzy_tjaj_11"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------------+-----------+----------+---
c_bh | character(32) | not null | extended | |
-- some columns and indexes omitted...
Check constraints:
"pathman_t_jcxxzy_tjaj_11_check" CHECK (d_slrq >= '1996-01-01'::date AND d_slrq < '2000-01-01'::date)
Inherits: t_jcxxzy_tjaj
-- Drop a single range partition; false means the partition's data is moved to the parent table
db_jcxxzypt=# select drop_range_partition('t_jcxxzy_tjaj_11',false);
NOTICE: 0 rows copied from t_jcxxzy_tjaj_11
drop_range_partition
----------------------
t_jcxxzy_tjaj_11
(1 row)
-- Drop all partitions of the parent table. false means the partitions' data is moved to the parent table first
select drop_partitions('t_jcxxzy_tjaj'::regclass, false);
-- Merge partitions; they must be adjacent
select merge_range_partitions('t_jcxxzy_tjaj_10'::REGCLASS, 't_jcxxzy_tjaj_11'::REGCLASS);
-- Split a range partition into two partitions; only supported for range partitioning
select split_range_partition('t_jcxxzy_tjaj_6'::REGCLASS, -- partition OID
'2022-01-01 00:00:00'::timestamp, -- split value
't_jcxxzy_tjaj_6_1'); -- name of the new partition
-- Enable automatic partition creation
select set_auto('t_jcxxzy_tjaj'::REGCLASS, true);
-- Insert a row whose acceptance date (d_slrq) is 2100-05-19
db_jcxxzypt=# INSERT INTO "db_jcxx"."t_jcxxzy_tjaj" ("c_bh", "d_slrq") VALUES ('7be7f21958e248a1b69a140f1151d4f4', '2100-05-19');
INSERT 0 1
db_jcxxzypt=# \d+ t_jcxxzy_tjaj
Table "db_jcxx.t_jcxxzy_tjaj"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+--------------------------------+-----------+----------+---
c_bh | character(32) | not null | extended | | ID
-- columns omitted...
Child tables: t_jcxxzy_tjaj_1,
t_jcxxzy_tjaj_12,
t_jcxxzy_tjaj_13,
t_jcxxzy_tjaj_14,
t_jcxxzy_tjaj_15,
t_jcxxzy_tjaj_16,
t_jcxxzy_tjaj_17,
t_jcxxzy_tjaj_18,
t_jcxxzy_tjaj_19,
t_jcxxzy_tjaj_2,
t_jcxxzy_tjaj_20,
t_jcxxzy_tjaj_21,
t_jcxxzy_tjaj_22,
t_jcxxzy_tjaj_23,
t_jcxxzy_tjaj_24,
t_jcxxzy_tjaj_25,
t_jcxxzy_tjaj_26,
t_jcxxzy_tjaj_27,
t_jcxxzy_tjaj_28,
t_jcxxzy_tjaj_29,
t_jcxxzy_tjaj_3,
t_jcxxzy_tjaj_4,
t_jcxxzy_tjaj_5,
t_jcxxzy_tjaj_6,
t_jcxxzy_tjaj_6_1,
t_jcxxzy_tjaj_9
Options: parallel_workers=2
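Continuing on from the original t_jcxxzy_tjaj_11, many additional partitions were created automatically: pg_pathman keeps creating partitions until one covers the date of the inserted row. With dirty data this produces a large number of unwanted partitions, so enabling automatic partition creation is not recommended. Use a scheduled job to create partitions at regular intervals instead.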
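One way to set up the scheduled approach is sketched below. It assumes the pathman_partition_list view exposes parent and range_max columns (range_max as text that casts cleanly to date) in the installed pg_pathman version, and the 12-month threshold is a hypothetical value to adjust to the partition interval in use.

```sql
-- Turn automatic partition creation off for this table
SELECT set_auto('db_jcxx.t_jcxxzy_tjaj'::regclass, false);

-- Statement for the scheduled job: append one more partition only when the
-- last existing range ends within the next 12 months
SELECT append_range_partition('db_jcxx.t_jcxxzy_tjaj'::regclass)
WHERE (
    SELECT max(range_max::date)
    FROM pathman_partition_list
    WHERE parent = 'db_jcxx.t_jcxxzy_tjaj'::regclass
) < current_date + interval '12 months';
```

With automatic creation off, a row whose date falls outside every existing range should be rejected with an error instead of silently creating dozens of partitions, which surfaces dirty data early.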
Detaching a partition from the parent table and dropping a partition
-- Detach the partition from the parent table
db_jcxxzypt=# ALTER TABLE t_jcxxzy_tjaj_30 NO INHERIT t_jcxxzy_tjaj;
ALTER TABLE
Time: 2.922 ms
After detaching, the table still exists and can be used on its own.
-- Drop the partition
DROP TABLE t_jcxxzy_tjaj_30;
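If a partition's data has expired and needs to be removed, simply drop the partition. This is faster than DELETE, because DELETE only marks rows as deleted and a VACUUM is still needed afterwards.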
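pg_pathman also ships its own functions for this, which keep the extension's bookkeeping in step with the catalog. The sketch below shows two alternatives for the hypothetical expired partition t_jcxxzy_tjaj_30; use one or the other, not both.

```sql
-- Alternative 1: drop the partition and discard its rows in one call
-- (true = delete the data instead of moving it to the parent table)
SELECT drop_range_partition('db_jcxx.t_jcxxzy_tjaj_30', true);

-- Alternative 2: detach first, keep the table around for archiving,
-- and drop it later once it is no longer needed
SELECT detach_range_partition('db_jcxx.t_jcxxzy_tjaj_30'::regclass);
-- ... archive or export db_jcxx.t_jcxxzy_tjaj_30 here ...
DROP TABLE db_jcxx.t_jcxxzy_tjaj_30;
```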
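Conclusion
1. When partitioning an existing table, it is best to build the indexes after the data migration has finished.
2. If the table already exists, create the partitions first and then use the non-blocking migration function.
3. To take full advantage of partitioning for queries, the partition column must be used in the filter conditions.
4. Note that global uniqueness is lost after partitioning: the same UUID can appear in different partitions.
5. Queries that filter on the partition key are very efficient.
6. The partition column must be NOT NULL; columns such as a case's filing date and closing date, which can be null, cannot be used as the partition key.
7. VACUUM or ANALYZE on t_jcxxzy_tjaj only affects the parent table; to analyze the data, each partition has to be analyzed separately.
8. Partitions can be backed up individually, but to back up all partitions at once you have to back up the whole schema.
9. After migrating the data into the partitions, disable the parent table. If the parent table has not been vacuumed, the plan will do a full scan of the parent table, which is very slow.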
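For point 7, analyzing each partition by hand does not scale. A minimal sketch that loops over the child tables listed in pg_inherits and runs ANALYZE on each, assuming the parent is db_jcxx.t_jcxxzy_tjaj as above:

```sql
DO $$
DECLARE
    part regclass;
BEGIN
    -- Every direct child of the parent table is one partition
    FOR part IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 'db_jcxx.t_jcxxzy_tjaj'::regclass
    LOOP
        EXECUTE format('ANALYZE %s', part);
    END LOOP;
END
$$;
```

VACUUM cannot be executed inside a DO block, so for VACUUM the same pg_inherits query would have to generate the statements for a client to run one at a time.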