Postgres分表
这个需求就是一个典型的按时间创建分区表,首先看一下步骤:
- 创建父表
先创建一张“父表”,所有分区表都从它继承,这个表中没有数据,也不要在这个表上定义任何检查约束及索引,现在我们就先创建这样一张表,但之前先建一个序列:
CREATE SEQUENCE "public"."control_alarm_info_uid_seq"
INCREMENT 1
MINVALUE 1
MAXVALUE 99999999
START 1
CACHE 1;
ALTER TABLE "public"."control_alarm_info_uid_seq " OWNER TO "postgres";
接下来创建“父表”,
-- Table: control_alarm_info
-- DROP TABLE control_alarm_info;
CREATE TABLE control_alarm_info
(
uid bigserial NOT NULL,
status bigint,
create_time timestamp without time zone,
update_time timestamp without time zone,
creator character varying(32),
store_id integer,
store_name character varying(255),
target_id integer,
person_name character varying(255),
telphone character varying(255),
alarm_time character varying(255),
face_picurl character varying(255),
camera_info_id character varying(255),
camera_name character varying(255),
event_log_id character varying(255),
similarity real,
person_code character varying(64),
alarm_type character varying(255),
bkg_picurl character varying(255),
app_key character varying(64)
)
WITH (
OIDS=FALSE
);
ALTER TABLE control_alarm_info4
OWNER TO postgres;
- 按时间触发,创建n个子表
- 创建n个子表,每个子表都是继承于父表
由于每个分区表都是从父表继承的,所以分区表不会增加任何字段,下面我们按需求创建4张分区子表,分别用于存放9月、10月、11月和12月的日志数据:
create table control_alarm_info_201809
(CHECK (alarm_time >= '2018-09-01' AND alarm_time < '2018-10-01'))
INHERITS (control_alarm_info);
create table control_alarm_info_201810
(CHECK (alarm_time >= '2018-10-01' AND alarm_time < '2018-11-01'))
INHERITS (control_alarm_info);
create table control_alarm_info_201811
(CHECK (alarm_time >= '2018-11-01' AND alarm_time < '2018-12-01'))
INHERITS (control_alarm_info);
接下来在这4张分区表的每个分区键上建立索引:
在上面的语句中我们添加了一个约束表示只允许插入本月的数据,接下来在这4张分区表的每个分区键上建立索引:
create index control_alarm_info_201809_alarm_time ON control_alarm_info_201809 (alarm_time);
create index control_alarm_info_201810_alarm_time ON control_alarm_info_201810 (alarm_time);
create index control_alarm_info_201811_alarm_time ON control_alarm_info_201811(alarm_time);
查询时查询条件中包含这些索引时才会提高查询效率,如果能定位到一张子表内,效率更高。
- 定义一个规则(Rule)或触发器(Trigger),把对主表的数据插入重定向到合适的分区表
如何才能让不同日期的数据自动的插入与其对应的分区子表中呢?有两种解决方案,分别是:规则(Rule)和触发器(Trigger),相比触发器,Rule的开销更大,所以我在这里就不做过多介绍了,下面直接介绍Trigger的方式。
Trigger通常会结合自定义函数(Function)来实现分区插入,Function负责根据条件选择插入,而Trigger则负责Function的自动调用。首先定义Function,功能很简单,即根据日期区间insert数据即可:
Drop trigger control_alarm_info_insert_trigger;
CREATE
OR REPLACE FUNCTION control_alarm_info_insert_trigger () RETURNS TRIGGER AS $$
BEGIN
IF (
NEW .alarm_time >= '2018-09-01'
AND NEW .alarm_time < '2018-10-01'
) THEN
INSERT INTO control_alarm_info_201809
VALUES
(NEW .*) ;
ELSEIF (
NEW .alarm_time >= '2018-10-01'
AND NEW .alarm_time < '2018-11-01'
) THEN
INSERT INTO control_alarm_info_201810
VALUES
(NEW .*) ;
ELSEIF (
NEW .alarm_time >= '2018-11-01'
AND NEW .alarm_time < '2018-12-01'
) THEN
INSERT INTO control_alarm_info_201811
VALUES
(NEW .*) ;
ELSE
RAISE EXCEPTION 'Date out of range!' ;
END
IF ; RETURN NULL ;
END ; $$ LANGUAGE plpgsql;
最后再创建触发器用于执行刚才的Function:
CREATE TRIGGER control_alarm_info_insert_trigger BEFORE INSERT ON control_alarm_info
FOR EACH ROW
EXECUTE PROCEDURE control_alarm_info_insert_trigger();
未分表时200万数据中查询最后一条告警记录,2.505s
分表后200万数据中查询最后一条告警记录,0.221s,提升了10倍
插入性能
未分表时200万数据中插入10万条数据,需要104s