巧用dblink 实现多进程并行查询
概述
对于分区表的大数据统计分析,由于数据量巨大,往往需要采用并行。但是数据库并行的效率相比分进程分表统计还是有比较大的差距。本文通过巧用dblink,实现分进程分分区统计数据。
例子
kingbase=# \d t751
分区表 "public.t751"
栏位 | 类型 | 校对规则 | 可空的 | 预设
------+-----------------------------+----------+----------+------
id | bigint | | not null |
code | text | | |
c1 | timestamp without time zone | | |
c2 | text | | |
c3 | numeric | | |
c4 | integer | | |
c5 | integer | | |
c6 | integer | | |
v2 | numeric | | |
v3 | timestamp without time zone | | |
分区键值: HASH (c4, c5, c6)
分区的数量:1000(可以使用 \d+ 来列出它们)
表空间:"nvmtbs01"
kingbase=# select pg_size_pretty(sum(pg_relation_size(relid))) from pg_partition_tree('t751');
pg_size_pretty
----------------
20 GB
并行方式访问
kingbase=# explain analyze select c4, c5, c6, sum(v2) v2 from t751 group by c4, c5, c6; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize GroupAggregate (cost=5656234.57..14706860.46 rows=8000000 width=44) (actual time=15764.781..19110.336 rows=11781 loops=1) Group Key: t751_872.c4, t751_872.c5, t751_872.c6 -> Gather Merge (cost=5656234.57..13806860.46 rows=64000000 width=44) (actual time=15764.142..19103.239 rows=11902 loops=1) Workers Planned: 8 Workers Launched: 7 -> Partial GroupAggregate (cost=5655234.43..5911484.32 rows=8000000 width=44) (actual time=15478.852..18479.480 rows=1488 loops=8) Group Key: t751_872.c4, t751_872.c5, t751_872.c6 -> Sort (cost=5655234.43..5686484.41 rows=12499991 width=22) (actual time=15478.239..17066.239 rows=12500000 loops=8) Sort Key: t751_872.c4, t751_872.c5, t751_872.c6 Sort Method: quicksort Memory: 1408511kB Worker 0: Sort Method: quicksort Memory: 1371267kB Worker 1: Sort Method: quicksort Memory: 1364497kB Worker 2: Sort Method: quicksort Memory: 1374769kB Worker 3: Sort Method: quicksort Memory: 1355381kB Worker 4: Sort Method: quicksort Memory: 1352989kB Worker 5: Sort Method: quicksort Memory: 1343974kB Worker 6: Sort Method: quicksort Memory: 1386845kB -> Parallel Append (cost=0.00..3156369.51 rows=12499991 width=22) (actual time=0.034..5045.596 rows=12500000 loops=8) -> Parallel Seq Scan on t751_872 (cost=0.00..5849.56 rows=81756 width=22) (actual time=0.025..60.070 rows=196215 loops=1) -> Parallel Seq Scan on t751_470 (cost=0.00..5753.10 rows=80410 width=22) (actual time=0.068..73.100 rows=192984 loops=1) ... -> Parallel Seq Scan on t751_416 (cost=0.00..556.80 rows=10380 width=22) (actual time=0.009..5.736 rows=17646 loops=1) -> Parallel Seq Scan on t751_885 (cost=0.00..315.77 rows=5877 width=22) (actual time=0.010..2.460 rows=9991 loops=1) Planning Time: 26.450 ms Execution Time: 19180.545 ms
注意:这里实际采用的是groupagg , 如果采用hashagg,执行效率会高很多
使用dblink方式
创建函数
create or replace function f_t_part(para int default 0) returns table ( c4 int, c5 int, c6 int, v2 numeric ) language plpgsql immutable parallel safe as $$ declare vtabnam text; atabnam text[]; begin perform dblink_disconnect(conn) from unnest(dblink_get_connections()) conn; perform dblink_connect('conn_' || sn, 'dbname=' || current_database()) from generate_series(1, para) sn; if para <= 0 then for vtabnam in select relid from pg_partition_tree('t751') loop for c4,c5,c6,v2 in execute format('select c4, c5, c6, sum(v2) v2 from only %s group by c4, c5, c6 ;',vtabnam) loop return next ; end loop; end loop; else for atabnam in select string_to_array(string_agg(relid, ','), ',') relid from pg_partition_tree('t751') with ordinality , lateral ( select ordinality / para sn) where isleaf group by sn loop perform dblink_is_busy(conn) = 0 from unnest(dblink_get_connections()) conn; perform conn,tabnam, dblink_send_query(conn,format('select c4, c5, c6, sum(v2) v2 from only %s group by c4, c5, c6 ;',tabnam)) sq from (select unnest(dblink_get_connections()) conn, unnest(atabnam) tabnam) v where tabnam is not null; for c4,c5,c6,v2 in select tab.* from unnest(dblink_get_connections()) conn , dblink_get_result(conn) as tab(c4 int, c5 int, c6 int, v2 numeric) loop return next ; end loop; end loop; end if; perform dblink_disconnect(conn) from unnest(dblink_get_connections()) conn; return; end; $$;
执行效率
ingbase=# explain analyze select * from ft751_01(); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Function Scan on ft751_01 (cost=0.25..10.25 rows=1000 width=44) (actual time=32738.962..32739.288 rows=11781 loops=1) Planning Time: 0.055 ms Execution Time: 32741.655 ms (3 行记录) kingbase=# explain analyze ^Jselect * from ft751_01(16); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Function Scan on ft751_01 (cost=0.25..10.25 rows=1000 width=44) (actual time=5397.644..5398.189 rows=11781 loops=1) Planning Time: 0.044 ms Execution Time: 5398.887 ms
结论
实际验证,二者性能上差距并不大。用dblink 方式主要是可以更灵活的使用并行数量。这里的目的主要是向大家提供灵活使用并行的一种方法。
KINGBASE研究院