pg_pathman 的初步使用
os: centos 7.4
postgresql: 9.6
pg_pathman是postgresql管理分区插件,postgresql 9.6、10 的内置分区管理也一直都在完善。使用哪种方式来管理,用户自己决定。不过pg_pathman 确实很方便。
由于pg_pathman使用了custom scan provider api,所以只支持PostgreSQL 9.5以及以上的版本
The pg_pathman module provides optimized partitioning mechanism and functions to manage partitions.
The extension is compatible with:
PostgreSQL 9.5, 9.6, 10;
Postgres Pro Standard 9.5, 9.6;
Postgres Pro Enterprise;
PostgreSQL <= 10 supports partitioning via table inheritance: each partition must be created as a child table with CHECK CONSTRAINT:
CREATE TABLE test (id SERIAL PRIMARY KEY, title TEXT);
CREATE TABLE test_1 (CHECK ( id >= 100 AND id < 200 )) INHERITS (test);
CREATE TABLE test_2 (CHECK ( id >= 200 AND id < 300 )) INHERITS (test);
PostgreSQL 10 provides native partitioning:
CREATE TABLE test(id int4, value text) PARTITION BY RANGE(id);
CREATE TABLE test_1 PARTITION OF test FOR VALUES FROM (1) TO (10);
CREATE TABLE test_2 PARTITION OF test FOR VALUES FROM (10) TO (20);
下载
$ git clone https://github.com/postgrespro/pg_pathman.git
$ git branch -a
$ git checkout PGPRO
安装
export PGHOME=/usr/pgsql-9.6
export PGDATA=/var/lib/pgsql/9.6/main
export MANPATH=$PGHOME/share/man:$MANPATH
export LD_LIBRARY_PATH=$PGHOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH
$ make USE_PGXS=1
$ make install USE_PGXS=1
$ ls -l /usr/pgsql-9.6/lib/ |grep -i path
-rwxr-xr-x 1 postgres postgres 253688 Jul 3 17:02 pg_pathman.so
$ ls -l /usr/pgsql-9.6/share/extension/ |grep -i path
-rw-r--r-- 1 postgres postgres 55882 Jul 3 17:02 pg_pathman--1.0--1.1.sql
-rw-r--r-- 1 postgres postgres 36973 Jul 3 17:02 pg_pathman--1.1--1.2.sql
-rw-r--r-- 1 postgres postgres 29154 Jul 3 17:02 pg_pathman--1.2--1.3.sql
-rw-r--r-- 1 postgres postgres 43021 Jul 3 17:02 pg_pathman--1.3--1.4.sql
-rw-r--r-- 1 postgres postgres 57125 Jul 3 17:02 pg_pathman--1.4.sql
-rw-r--r-- 1 postgres postgres 131 Jul 3 17:02 pg_pathman.control
$ vi postgresql.conf
shared_preload_libraries = 'pg_pathman, pg_stat_statements'
$ psql
postgres=# \c peiybdb
You are now connected to database "peiybdb" as user "postgres".
peiybdb=# show shared_preload_libraries;
shared_preload_libraries
--------------------------------
pg_pathman, pg_stat_statements
(1 row)
peiybdb=# select * from pg_available_extensions where name like '%path%';
name | default_version | installed_version | comment
------------+-----------------+-------------------+----------------------------------
pg_pathman | 1.4 | 1.4 | Partitioning tool for PostgreSQL
(1 row)
peiybdb=# CREATE EXTENSION pg_pathman;
CREATE EXTENSION
peiybdb=# select * from pg_extension where 1=1 and extname like '%path%';
extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
------------+----------+--------------+----------------+------------+---------------+--------------
pg_pathman | 10 | 2200 | f | 1.4 | {16403,16414} | {"",""}
(1 row)
更新
ALTER EXTENSION pg_pathman UPDATE TO "X.Y";
SET pg_pathman.enable = t;
使用
range 分区
peiybdb=# create table tmp_t1(id int, name text, insert_timestamp timestamp not null);
CREATE TABLE
peiybdb=# insert into tmp_t1 select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
INSERT 0 10000
peiybdb=# select min(insert_timestamp),max(insert_timestamp) from tmp_t1;
min | max
---------------------+---------------------
1991-02-15 00:00:00 | 2018-07-02 00:00:00
(1 row)
分区表有一些限制条件
1、分区列not null约束
2、分区个数必须能覆盖已有的所有记录
创建1000个分区,每个分区包含一个月的数据。
peiybdb=# select create_range_partitions(
'tmp_t1'::regclass, -- 主表oid
'insert_timestamp', -- 分区字段,一定要not null约束
'1990-01-01 00:00:00'::timestamp, -- 开始时间
interval '1 month', -- 分区间隔,一个月
1000, -- 分区表数量
false -- 不立即将数据从主表迁移到分区
);
执行非堵塞迁移
peiybdb=# \df+ partition_table_concurrently
List of functions
-[ RECORD 1 ]-------+--------------------------------------------------------------------------------------------
Schema | public
Name | partition_table_concurrently
Result data type | void
Argument data types | relation regclass, batch_size integer DEFAULT 1000, sleep_time double precision DEFAULT 1.0
Type | normal
Volatility | volatile
Parallel | unsafe
Owner | postgres
Security | invoker
Access privileges |
Language | c
Source code | partition_table_concurrently
Description |
peiybdb=# partition_table_concurrently(
relation REGCLASS, -- 主表OID
batch_size INTEGER DEFAULT 1000, -- 一个事务批量迁移多少记录
sleep_time FLOAT8 DEFAULT 1.0 -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
)
peiybdb=# select partition_table_concurrently('tmp_t1'::regclass,1000,1.0);
禁用主表
peiybdb=# select set_enable_parent('tmp_t1'::regclass, false);
迁移后,查看数据
peiybdb=# select count(*) from tmp_t1;
count
-------
10000
(1 row)
peiybdb=# select count(*) from only tmp_t1;
count
-------
0
(1 row)
pathman_partition_list 展示的是具体的分区表
peiybdb=# select * from pathman_partition_list limit 100;
parent | partition | parttype | expr | range_min | range_max
--------+------------+----------+------------------+---------------------+---------------------
tmp_t1 | tmp_t1_1 | 2 | insert_timestamp | 1990-01-01 00:00:00 | 1990-02-01 00:00:00
tmp_t1 | tmp_t1_2 | 2 | insert_timestamp | 1990-02-01 00:00:00 | 1990-03-01 00:00:00
tmp_t1 | tmp_t1_3 | 2 | insert_timestamp | 1990-03-01 00:00:00 | 1990-04-01 00:00:00
tmp_t1 | tmp_t1_4 | 2 | insert_timestamp | 1990-04-01 00:00:00 | 1990-05-01 00:00:00
tmp_t1 | tmp_t1_5 | 2 | insert_timestamp | 1990-05-01 00:00:00 | 1990-06-01 00:00:00
tmp_t1 | tmp_t1_6 | 2 | insert_timestamp | 1990-06-01 00:00:00 | 1990-07-01 00:00:00
tmp_t1 | tmp_t1_7 | 2 | insert_timestamp | 1990-07-01 00:00:00 | 1990-08-01 00:00:00
tmp_t1 | tmp_t1_8 | 2 | insert_timestamp | 1990-08-01 00:00:00 | 1990-09-01 00:00:00
tmp_t1 | tmp_t1_9 | 2 | insert_timestamp | 1990-09-01 00:00:00 | 1990-10-01 00:00:00
tmp_t1 | tmp_t1_10 | 2 | insert_timestamp | 1990-10-01 00:00:00 | 1990-11-01 00:00:00
这是如果插入的分区值不在分区表的范围内,pg_pathman会自动创建相关的分区表,这个确实很实用、很方便。
hash 分区
peiybdb=# create table tmp_hash(id int not null, name text, insert_timestamp timestamp not null);
CREATE TABLE
peiybdb=# insert into tmp_hash select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
INSERT 0 10000
peiybdb=# select min(id),max(id) from tmp_hash;
min | max
-----+-------
1 | 10000
(1 row)
创建50个hash分区。
peiybdb=# select create_hash_partitions(
'tmp_hash'::regclass, -- 主表oid
'id', -- 分区字段,一定要not null约束
50, -- 分区表数量
false -- 不立即将数据从主表迁移到分区
);
执行非堵塞迁移
peiybdb=# partition_table_concurrently(
relation REGCLASS, -- 主表OID
batch_size INTEGER DEFAULT 1000, -- 一个事务批量迁移多少记录
sleep_time FLOAT8 DEFAULT 1.0 -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
)
peiybdb=# select partition_table_concurrently('tmp_hash'::regclass,1000,1.0);
禁用主表
peiybdb=# select set_enable_parent('tmp_hash'::regclass, false);
查看结果
peiybdb=# select count(1) from tmp_hash;
count
-------
10000
(1 row)
peiybdb=# select count(1) from only tmp_hash;
count
-------
0
(1 row)
查看一些相关信息
peiybdb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------------------------+----------+----------+------------+-------------
public | pathman_cache_stats | view | postgres | 0 bytes |
public | pathman_concurrent_part_tasks | view | postgres | 0 bytes |
public | pathman_config | table | postgres | 16 kB |
public | pathman_config_params | table | postgres | 16 kB |
public | pathman_partition_list | view | postgres | 0 bytes |
建议
- 分区列必须有not null约束
- 分区个数必须能覆盖已有的所有记录
- 建议使用非堵塞式迁移接口(select partition_table_concurrently(‘tmp_t1’::regclass,1000,1.0);)
- 建议数据迁移完成后,禁用主表 (select set_enable_parent(‘tmp_t1’::regclass, false);)
参考:
https://github.com/postgrespro/pg_pathman
https://github.com/postgrespro/pg_pathman/wiki