Performance and limitations of the LightDB hash index
In open-source PostgreSQL, hash indexes only became genuinely usable in PostgreSQL 10, which added WAL logging for them; earlier versions were either crash-unsafe or heavily restricted in WAL and lock/concurrency handling. Like the hash table of classic data structures, a persistent hash index stores its entries in buckets, as described below:
A hash index uses four kinds of pages:
- Meta page - page number zero, which contains information on what is inside the index.
- Bucket pages - main pages of the index, which store data as "hash code - TID" pairs.
- Overflow pages - structured the same way as bucket pages and used when one page is insufficient for a bucket.
- Bitmap pages - which keep track of overflow pages that are currently clear and can be reused for other buckets.
Each element on a bucket page carries a TID, that is, a reference to a table row.
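This page layout can be inspected directly with the pageinspect extension (a sketch, assuming a PostgreSQL 10+/LightDB instance and an existing hash index named idx_t; the exact field lists of these functions may vary by version):

```sql
create extension if not exists pageinspect;

-- Page 0 is the meta page: bucket count, overflow allocation point, etc.
select magic, version, ntuples, maxbucket, ovflpoint, firstfree
from hash_metapage_info(get_raw_page('idx_t', 0));

-- Classify any page as metapage / bucket / overflow / bitmap.
select hash_page_type(get_raw_page('idx_t', 1));

-- Per-page statistics of a bucket or overflow page.
select live_items, dead_items, free_size
from hash_page_stats(get_raw_page('idx_t', 1));
```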
In addition to btree, LightDB also supports hash indexes, but overall their feature set is narrower than btree's: parallel planning is less capable than with btree indexes, no operator other than = is supported, index-only scans are not available, and unique hash indexes cannot be created (though the same guarantee can be had from a UNIQUE NOT NULL constraint, which is backed by a btree). Still, for keys that are not fixed-length, long strings in particular, hash indexes retain a clear advantage. For example:
lightdb@postgres=# create table t(id int);
WARNING:  LightDB DDL check warn! no primary key!
DETAIL:  If your system do not have data replication requirment, just ignore it
CREATE TABLE
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX
lightdb@postgres=# insert into t select id from generate_series(1,10000000) id;
INSERT 0 10000000
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                        QUERY PLAN
----------------------------------------------------------------------------------------
 Gather  (cost=854748.00..854748.11 rows=1 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Aggregate  (cost=844748.00..844748.01 rows=1 width=8)
         ->  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=0)
               Hash Cond: (t1.id = t2.id)
               ->  Seq Scan on t t1  (cost=0.00..144248.00 rows=10000000 width=4)
               ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=4)
                     ->  Seq Scan on t t2  (cost=0.00..144248.00 rows=10000000 width=4)
(9 rows)

Time: 0.953 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count
----------
 10000000
(1 row)

Time: 10584.519 ms (00:10.585)
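Several of the limitations listed earlier can be reproduced in psql (a hedged sketch against the same table t; error text may differ slightly between versions):

```sql
-- Only the = operator is supported, so a range predicate cannot use the
-- hash index and the planner falls back to a sequential scan:
explain select * from t where id < 10;

-- Unique hash indexes are rejected by the access method:
create unique index idx_t_u on t using hash(id);
-- ERROR:  access method "hash" does not support unique indexes
```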
Below is an OLTP-style example:
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10004.03..10008.49 rows=4 width=4) (actual time=16.539..19.560 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Bitmap Heap Scan on public.t  (cost=4.03..8.09 rows=4 width=4) (actual time=0.378..0.382 rows=4 loops=1)
         Output: id
         Recheck Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Worker 0: actual time=0.378..0.382 rows=4 loops=1
         ->  Bitmap Index Scan on idx_t  (cost=0.00..4.03 rows=4 width=0) (actual time=0.296..0.296 rows=4 loops=1)
               Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
               Worker 0: actual time=0.296..0.296 rows=4 loops=1
 Planning Time: 0.285 ms
 Execution Time: 19.812 ms
(14 rows)

Time: 21.320 ms
===============
For comparison, the same tests with a btree index:
lightdb@postgres=# create index idx_t on t(id);
CREATE INDEX
Time: 7298.174 ms (00:07.298)
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10000.43..10007.21 rows=4 width=4) (actual time=9.571..11.403 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Only Scan using idx_t on public.t  (cost=0.43..6.81 rows=4 width=4) (actual time=0.176..0.185 rows=4 loops=1)
         Output: id
         Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Heap Fetches: 4
         Worker 0: actual time=0.176..0.185 rows=4 loops=1
 Planning Time: 0.380 ms
 Execution Time: 11.588 ms
(12 rows)

Time: 12.863 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count
----------
 10000000
(1 row)

Time: 6334.176 ms (00:06.334)
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=525399.81..525399.82 rows=1 width=8)
   ->  Gather  (cost=525399.69..525399.80 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=515399.69..515399.70 rows=1 width=8)
               ->  Merge Join  (cost=0.87..500693.81 rows=5882353 width=0)
                     Merge Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Index Only Scan using idx_t on t t2  (cost=0.43..221670.43 rows=10000000 width=4)
(8 rows)

Time: 1.441 ms

====== switch to hash join ======

lightdb@postgres=# set enable_mergejoin to off;
SET
Time: 0.681 ms
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=644560.95..644560.96 rows=1 width=8)
   ->  Gather  (cost=644560.83..644560.94 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=634560.83..634560.84 rows=1 width=8)
               ->  Parallel Hash Join  (cost=277001.81..619854.95 rows=5882353 width=0)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Parallel Hash  (cost=180493.96..180493.96 rows=5882353 width=4)
                           ->  Parallel Index Only Scan using idx_t on t t2  (cost=0.43..180493.96 rows=5882353 width=4)
(9 rows)

Time: 1.492 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count
----------
 10000000
(1 row)

Time: 9897.255 ms (00:09.897)
With or without a unique constraint, the timing is about the same:
lightdb@postgres=# alter table t add unique(id);
ALTER TABLE
Time: 6454.915 ms (00:06.455)
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX
Time: 24519.773 ms (00:24.520)
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count
----------
 10000000
(1 row)

Time: 10515.582 ms (00:10.516)
The tests above used an int key; next, consider a string key with relatively long values.
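One reason long keys favor hashing: the index stores and compares only a fixed-size hash code, not the key itself. The internal text hash function can be called directly to see this (illustrative only; hashtext() is an internal function and its values are not guaranteed stable across versions or architectures):

```sql
-- Both calls return a plain int4 hash code, regardless of input length;
-- a btree, by contrast, must compare full strings (with collation rules)
-- at every level of the tree.
select hashtext('x');
select hashtext(repeat('a-long-uuid-like-key-', 20));
```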
-- hash index
lightdb@postgres=# alter table t add column v varchar(100);
ALTER TABLE
Time: 1.614 ms
lightdb@postgres=# update t set v = uuid();
UPDATE 10000000
Time: 74764.381 ms (01:14.764)
lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 358.055 ms
lightdb@postgres=# create index idx_v on t using hash(v);
CREATE INDEX
Time: 26212.556 ms (00:26.213)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=16827.289..16843.024 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=16806.915..16806.986 rows=1 loops=1)
         Output: count(1)
         Worker 0: actual time=16806.915..16806.986 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5346.963..16158.927 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0: actual time=5346.963..16158.927 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.208..2068.391 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0: actual time=0.208..2068.391 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5334.495..5334.558 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 65536  Batches: 256  Memory Usage: 3149kB
                     Worker 0: actual time=5334.495..5334.558 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.054..2165.095 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0: actual time=0.054..2165.095 rows=10000000 loops=1
 Planning Time: 1.016 ms
 Execution Time: 16843.381 ms
(23 rows)

Time: 16845.926 ms (00:16.846)

-- work_mem was small and the join needed many batches; increase work_mem to eliminate the I/O overhead.
lightdb@postgres=# set work_mem to '128MB';
SET
Time: 0.608 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=22352.372..22388.694 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=22335.328..22335.363 rows=1 loops=1)
         Output: count(1)
         Worker 0: actual time=22335.328..22335.363 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5627.623..21684.815 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0: actual time=5627.623..21684.815 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.262..1901.155 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0: actual time=0.262..1901.155 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5600.002..5600.028 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 2097152  Batches: 8  Memory Usage: 100708kB
                     Worker 0: actual time=5600.002..5600.028 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.188..2076.519 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0: actual time=0.188..2076.519 rows=10000000 loops=1
 Planning Time: 0.415 ms
 Execution Time: 22388.947 ms
(23 rows)

Time: 22390.522 ms (00:22.391)
lightdb@postgres=# set work_mem to '1280MB';
SET
Time: 0.651 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=796915.94..796916.05 rows=1 width=8) (actual time=18028.588..18412.918 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=786915.94..786915.95 rows=1 width=8) (actual time=18017.839..18017.861 rows=1 loops=1)
         Output: count(1)
         Worker 0: actual time=18017.839..18017.861 rows=1 loops=1
         ->  Hash Join  (cost=318457.98..761915.94 rows=9999999 width=0) (actual time=7028.907..17394.653 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0: actual time=7028.907..17394.653 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.141..1507.013 rows=10000000 loops=1)
                     Output: t1.id, t1.v
                     Worker 0: actual time=0.141..1507.013 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=6881.732..6881.746 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 16777216  Batches: 1  Memory Usage: 804901kB
                     Worker 0: actual time=6881.732..6881.746 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.128..1781.326 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0: actual time=0.128..1781.326 rows=10000000 loops=1
 Planning Time: 0.327 ms
 Execution Time: 18413.120 ms
(23 rows)

Time: 18414.684 ms (00:18.415)
-- btree index
lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 133.610 ms
lightdb@postgres=# create index idx_v on t(v);
CREATE INDEX
Time: 99610.086 ms (01:39.610)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=635535.74..635535.76 rows=1 width=8) (actual time=55059.701..55482.026 rows=1 loops=1)
   Output: count(1)
   ->  Gather  (cost=635535.53..635535.74 rows=2 width=8) (actual time=54912.085..55481.978 rows=3 loops=1)
         Output: (PARTIAL count(1))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=625535.53..625535.54 rows=1 width=8) (actual time=54888.937..54888.960 rows=1 loops=3)
               Output: PARTIAL count(1)
               Worker 0: actual time=54886.263..54886.305 rows=1 loops=1
               Worker 1: actual time=54868.875..54868.890 rows=1 loops=1
               ->  Parallel Hash Join  (cost=309296.10..615118.86 rows=4166666 width=0) (actual time=26238.000..54499.870 rows=3333333 loops=3)
                     Hash Cond: ((t1.v)::text = (t2.v)::text)
                     Worker 0: actual time=26224.367..54500.826 rows=3357480 loops=1
                     Worker 1: actual time=26223.631..54484.862 rows=3362240 loops=1
                     ->  Parallel Index Only Scan using idx_v on public.t t1  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.112..23401.164 rows=3333333 loops=3)
                           Output: t1.v
                           Heap Fetches: 10000000
                           Worker 0: actual time=0.114..23476.367 rows=3357480 loops=1
                           Worker 1: actual time=0.100..23249.643 rows=3362240 loops=1
                     ->  Parallel Hash  (cost=257212.22..257212.22 rows=4166666 width=37) (actual time=26176.344..26176.362 rows=3333333 loops=3)
                           Output: t2.v
                           Buckets: 16777216  Batches: 1  Memory Usage: 835968kB
                           Worker 0: actual time=26223.984..26224.022 rows=3279920 loops=1
                           Worker 1: actual time=26223.158..26223.169 rows=3353980 loops=1
                           ->  Parallel Index Only Scan using idx_v on public.t t2  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.198..23040.507 rows=3333333 loops=3)
                                 Output: t2.v
                                 Heap Fetches: 10000000
                                 Worker 0: actual time=0.136..23046.634 rows=3279920 loops=1
                                 Worker 1: actual time=0.327..22978.942 rows=3353980 loops=1
 Planning Time: 0.559 ms
 Execution Time: 55482.322 ms
(31 rows)

Time: 55484.995 ms (00:55.485)
As the results above show, with string keys the hash index outperforms the btree index by more than 3x in this test (roughly 17 s versus 55 s for the self-join). Hash indexes are well worth considering.
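The gap is also visible on disk: for varchar keys the hash index stores only hash codes plus TIDs, so it is usually much smaller than the equivalent btree. A quick size check (a sketch against the test table above; the index names are hypothetical):

```sql
-- Hypothetical names: build both index types on the same varchar column.
create index idx_v_hash  on t using hash(v);
create index idx_v_btree on t (v);

-- Compare on-disk sizes.
select relname, pg_size_pretty(pg_relation_size(oid))
from pg_class
where relname in ('idx_v_hash', 'idx_v_btree');
```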
For other performance tests, see: https://www.pgcon.org/2017/schedule/attachments/458_durable-hash-indexes-postgresql.pdf.
For an implementation-level walkthrough: https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/pg-hashing/slides.html