pg_pathman 的初步使用

os: centos 7.4
postgresql: 9.6

pg_pathman是postgresql管理分区插件,postgresql 9.6、10 的内置分区管理也一直都在完善。使用哪种方式来管理,用户自己决定。不过pg_pathman 确实很方便。
由于pg_pathman使用了custom scan provider api,所以只支持PostgreSQL 9.5以及以上的版本

The pg_pathman module provides optimized partitioning mechanism and functions to manage partitions.

The extension is compatible with:

PostgreSQL 9.5, 9.6, 10;
Postgres Pro Standard 9.5, 9.6;
Postgres Pro Enterprise;

PostgreSQL <= 10 supports partitioning via table inheritance: each partition must be created as a child table with CHECK CONSTRAINT:

CREATE TABLE test (id SERIAL PRIMARY KEY, title TEXT);
CREATE TABLE test_1 (CHECK ( id >= 100 AND id < 200 )) INHERITS (test);
CREATE TABLE test_2 (CHECK ( id >= 200 AND id < 300 )) INHERITS (test);

PostgreSQL 10 provides native partitioning:

CREATE TABLE test(id int4, value text) PARTITION BY RANGE(id);
CREATE TABLE test_1 PARTITION OF test FOR VALUES FROM (1) TO (10);
CREATE TABLE test_2 PARTITION OF test FOR VALUES FROM (10) TO (20);

下载

$ git clone https://github.com/postgrespro/pg_pathman.git
$ git branch -a
$ git checkout PGPRO

安装

export PGHOME=/usr/pgsql-9.6 
export PGDATA=/var/lib/pgsql/9.6/main
export MANPATH=$PGHOME/share/man:$MANPATH
export LD_LIBRARY_PATH=$PGHOME/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export PATH=$PGHOME/bin:$PATH

$ make USE_PGXS=1
$ make install USE_PGXS=1
$ ls -l /usr/pgsql-9.6/lib/ |grep -i path
-rwxr-xr-x 1 postgres postgres 253688 Jul  3 17:02 pg_pathman.so
$ ls -l /usr/pgsql-9.6/share/extension/ |grep -i path
-rw-r--r-- 1 postgres postgres 55882 Jul  3 17:02 pg_pathman--1.0--1.1.sql
-rw-r--r-- 1 postgres postgres 36973 Jul  3 17:02 pg_pathman--1.1--1.2.sql
-rw-r--r-- 1 postgres postgres 29154 Jul  3 17:02 pg_pathman--1.2--1.3.sql
-rw-r--r-- 1 postgres postgres 43021 Jul  3 17:02 pg_pathman--1.3--1.4.sql
-rw-r--r-- 1 postgres postgres 57125 Jul  3 17:02 pg_pathman--1.4.sql
-rw-r--r-- 1 postgres postgres   131 Jul  3 17:02 pg_pathman.control


$ vi postgresql.conf

shared_preload_libraries = 'pg_pathman, pg_stat_statements'

$ psql
postgres=# \c peiybdb
You are now connected to database "peiybdb" as user "postgres".
peiybdb=# show shared_preload_libraries;
    shared_preload_libraries    
--------------------------------
 pg_pathman, pg_stat_statements
(1 row)

peiybdb=# select * from pg_available_extensions where name like '%path%';
    name    | default_version | installed_version |             comment              
------------+-----------------+-------------------+----------------------------------
 pg_pathman | 1.4             | 1.4               | Partitioning tool for PostgreSQL
(1 row)

peiybdb=# CREATE EXTENSION pg_pathman;
CREATE EXTENSION

peiybdb=# select * from pg_extension where 1=1 and extname like '%path%';
  extname   | extowner | extnamespace | extrelocatable | extversion |   extconfig   | extcondition 
------------+----------+--------------+----------------+------------+---------------+--------------
 pg_pathman |       10 |         2200 | f              | 1.4        | {16403,16414} | {"",""}
(1 row)

更新

ALTER EXTENSION pg_pathman UPDATE TO "X.Y";
SET pg_pathman.enable = t;

使用

range 分区

peiybdb=# create table tmp_t1(id int, name text, insert_timestamp timestamp not null);
CREATE TABLE
peiybdb=# insert into tmp_t1 select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
INSERT 0 10000
peiybdb=# select min(insert_timestamp),max(insert_timestamp) from tmp_t1;
         min         |         max         
---------------------+---------------------
 1991-02-15 00:00:00 | 2018-07-02 00:00:00
(1 row)

分区表有一些限制条件
1、分区列not null约束
2、分区个数必须能覆盖已有的所有记录

创建1000个分区,每个分区包含一个月的数据。

peiybdb=# select create_range_partitions(
'tmp_t1'::regclass,                 -- 主表oid
'insert_timestamp',                 -- 分区字段,一定要not null约束
'1990-01-01 00:00:00'::timestamp,   -- 开始时间
interval '1 month',                 -- 分区间隔,一个月
1000,                               -- 分区表数量
false                               -- 不立即将数据从主表迁移到分区
);

执行非堵塞迁移

peiybdb=# \df+ partition_table_concurrently
List of functions
-[ RECORD 1 ]-------+--------------------------------------------------------------------------------------------
Schema              | public
Name                | partition_table_concurrently
Result data type    | void
Argument data types | relation regclass, batch_size integer DEFAULT 1000, sleep_time double precision DEFAULT 1.0
Type                | normal
Volatility          | volatile
Parallel            | unsafe
Owner               | postgres
Security            | invoker
Access privileges   | 
Language            | c
Source code         | partition_table_concurrently
Description         |

peiybdb=# partition_table_concurrently(
relation   REGCLASS,              -- 主表OID
batch_size INTEGER DEFAULT 1000,  -- 一个事务批量迁移多少记录
sleep_time FLOAT8 DEFAULT 1.0     -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
)    

peiybdb=# select partition_table_concurrently('tmp_t1'::regclass,1000,1.0);

禁用主表

peiybdb=# select set_enable_parent('tmp_t1'::regclass, false);

迁移后,查看数据

peiybdb=# select count(*) from  tmp_t1;
 count 
-------
 10000
(1 row)

peiybdb=# select count(*) from only tmp_t1;
 count 
-------
     0
(1 row)

pathman_partition_list 展示的是具体的分区表

peiybdb=# select * from pathman_partition_list limit 100;
 parent | partition  | parttype |       expr       |      range_min      |      range_max      
--------+------------+----------+------------------+---------------------+---------------------
 tmp_t1 | tmp_t1_1   |        2 | insert_timestamp | 1990-01-01 00:00:00 | 1990-02-01 00:00:00
 tmp_t1 | tmp_t1_2   |        2 | insert_timestamp | 1990-02-01 00:00:00 | 1990-03-01 00:00:00
 tmp_t1 | tmp_t1_3   |        2 | insert_timestamp | 1990-03-01 00:00:00 | 1990-04-01 00:00:00
 tmp_t1 | tmp_t1_4   |        2 | insert_timestamp | 1990-04-01 00:00:00 | 1990-05-01 00:00:00
 tmp_t1 | tmp_t1_5   |        2 | insert_timestamp | 1990-05-01 00:00:00 | 1990-06-01 00:00:00
 tmp_t1 | tmp_t1_6   |        2 | insert_timestamp | 1990-06-01 00:00:00 | 1990-07-01 00:00:00
 tmp_t1 | tmp_t1_7   |        2 | insert_timestamp | 1990-07-01 00:00:00 | 1990-08-01 00:00:00
 tmp_t1 | tmp_t1_8   |        2 | insert_timestamp | 1990-08-01 00:00:00 | 1990-09-01 00:00:00
 tmp_t1 | tmp_t1_9   |        2 | insert_timestamp | 1990-09-01 00:00:00 | 1990-10-01 00:00:00
 tmp_t1 | tmp_t1_10  |        2 | insert_timestamp | 1990-10-01 00:00:00 | 1990-11-01 00:00:00

这是如果插入的分区值不在分区表的范围内,pg_pathman会自动创建相关的分区表,这个确实很实用、很方便。

hash 分区

peiybdb=# create table tmp_hash(id int not null, name text, insert_timestamp timestamp not null);
CREATE TABLE
peiybdb=# insert into tmp_hash select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
INSERT 0 10000
peiybdb=# select min(id),max(id) from tmp_hash;
 min |  max  
-----+-------
   1 | 10000
(1 row)

创建50个hash分区。

peiybdb=# select create_hash_partitions(
'tmp_hash'::regclass,               -- 主表oid
'id',                               -- 分区字段,一定要not null约束
50,                                 -- 分区表数量
false                               -- 不立即将数据从主表迁移到分区
);

执行非堵塞迁移

peiybdb=# partition_table_concurrently(
relation   REGCLASS,              -- 主表OID
batch_size INTEGER DEFAULT 1000,  -- 一个事务批量迁移多少记录
sleep_time FLOAT8 DEFAULT 1.0     -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
)    

peiybdb=# select partition_table_concurrently('tmp_hash'::regclass,1000,1.0);

禁用主表

peiybdb=# select set_enable_parent('tmp_hash'::regclass, false);

查看结果

peiybdb=# select count(1) from tmp_hash;
 count 
-------
 10000
(1 row)

peiybdb=# select count(1) from only tmp_hash;
 count 
-------
     0
(1 row)

查看一些相关信息

peiybdb=# \d+
                                   List of relations
 Schema |             Name              |   Type   |  Owner   |    Size    | Description 
--------+-------------------------------+----------+----------+------------+-------------
 public | pathman_cache_stats           | view     | postgres | 0 bytes    | 
 public | pathman_concurrent_part_tasks | view     | postgres | 0 bytes    | 
 public | pathman_config                | table    | postgres | 16 kB      | 
 public | pathman_config_params         | table    | postgres | 16 kB      | 
 public | pathman_partition_list        | view     | postgres | 0 bytes    |

建议

  1. 分区列必须有not null约束
  2. 分区个数必须能覆盖已有的所有记录
  3. 建议使用非堵塞式迁移接口(select partition_table_concurrently(‘tmp_t1’::regclass,1000,1.0);)
  4. 建议数据迁移完成后,禁用主表 (select set_enable_parent(‘tmp_t1’::regclass, false);)

参考:
https://github.com/postgrespro/pg_pathman
https://github.com/postgrespro/pg_pathman/wiki

https://github.com/postgrespro/pg_pathman_build

posted @ 2018-07-03 20:35  peiybpeiyb  阅读(595)  评论(0编辑  收藏  举报