lightdb hash index的性能和限制

　　在开源pg中，pg 11才正式比较接近可用的支持哈希索引，之前的版本在wal、锁并发方面要么不可用、要么限制较多。和数据结构中的hash特性类似，持久化的hash索引也是分桶存储的，如下：

Hash index uses four kinds of pages (gray rectangles):

Meta page - page number zero, which contains information on what is inside the index.
Bucket pages - main pages of the index, which store data as "hash code - TID" pairs.
Overflow pages - structured the same way as bucket pages and used when one page is insufficient for a bucket.
Bitmap pages - which keep track of overflow pages that are currently clear and can be reused for other buckets.

Down arrows starting at index page elements represent TIDs, that is, references to table rows.

　　除了btree外，lightdb也是支持hash index的，但是总体来说支持的特性范围均不如btree索引，比如parallel没有btree索引智能，不支持=之外的操作，不支持bitmap index scan，不支持哈希唯一索引（但可以通过unique not null约束变量实现）等等，但在内存存储不使用固定长度的场景下还是很有优势的。如下：

lightdb@postgres=# create table t(id int);
WARNING:  LightDB DDL check warn! no primary key!
DETAIL:  If your system do not have data replication requirment, just ignore it
CREATE TABLE
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX

lightdb@postgres=# insert into t select id from generate_series(1,10000000) id;
INSERT 0 10000000

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Gather  (cost=854748.00..854748.11 rows=1 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Aggregate  (cost=844748.00..844748.01 rows=1 width=8)
         ->  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=0)
               Hash Cond: (t1.id = t2.id)
               ->  Seq Scan on t t1  (cost=0.00..144248.00 rows=10000000 width=4)
               ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=4)
                     ->  Seq Scan on t t2  (cost=0.00..144248.00 rows=10000000 width=4)
(9 rows)

Time: 0.953 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10584.519 ms (00:10.585)

下面是OLTP的示例：

lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10004.03..10008.49 rows=4 width=4) (actual time=16.539..19.560 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Bitmap Heap Scan on public.t  (cost=4.03..8.09 rows=4 width=4) (actual time=0.378..0.382 rows=4 loops=1)
         Output: id
         Recheck Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Worker 0:  actual time=0.378..0.382 rows=4 loops=1
         ->  Bitmap Index Scan on idx_t  (cost=0.00..4.03 rows=4 width=0) (actual time=0.296..0.296 rows=4 loops=1)
               Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
               Worker 0:  actual time=0.296..0.296 rows=4 loops=1
 Planning Time: 0.285 ms
 Execution Time: 19.812 ms
(14 rows)

Time: 21.320 ms

===============

对应的btree如下：

lightdb@postgres=# create index idx_t on t(id);
CREATE INDEX
Time: 7298.174 ms (00:07.298)
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10000.43..10007.21 rows=4 width=4) (actual time=9.571..11.403 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Only Scan using idx_t on public.t  (cost=0.43..6.81 rows=4 width=4) (actual time=0.176..0.185 rows=4 loops=1)
         Output: id
         Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Heap Fetches: 4
         Worker 0:  actual time=0.176..0.185 rows=4 loops=1
 Planning Time: 0.380 ms
 Execution Time: 11.588 ms
(12 rows)

Time: 12.863 ms

lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 6334.176 ms (00:06.334)

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=525399.81..525399.82 rows=1 width=8)
   ->  Gather  (cost=525399.69..525399.80 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=515399.69..515399.70 rows=1 width=8)
               ->  Merge Join  (cost=0.87..500693.81 rows=5882353 width=0)
                     Merge Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Index Only Scan using idx_t on t t2  (cost=0.43..221670.43 rows=10000000 width=4)
(8 rows)

Time: 1.441 ms

====== 换成hash join
lightdb@postgres=# set enable_mergejoin to off;
SET
Time: 0.681 ms
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=644560.95..644560.96 rows=1 width=8)
   ->  Gather  (cost=644560.83..644560.94 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=634560.83..634560.84 rows=1 width=8)
               ->  Parallel Hash Join  (cost=277001.81..619854.95 rows=5882353 width=0)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Parallel Hash  (cost=180493.96..180493.96 rows=5882353 width=4)
                           ->  Parallel Index Only Scan using idx_t on t t2  (cost=0.43..180493.96 rows=5882353 width=4)
(9 rows)

Time: 1.492 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 9897.255 ms (00:09.897)

无论是否添加唯一索引都差不多。如下：

lightdb@postgres=# alter table t add unique(id);
ALTER TABLE
Time: 6454.915 ms (00:06.455)
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX
Time: 24519.773 ms (00:24.520)
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10515.582 ms (00:10.516)

上面是int类型的情况，下面来看下字符串类型值相对比较长的情况。

-- hash索引
lightdb@postgres=# alter table t add column v varchar(100);
ALTER TABLE
Time: 1.614 ms
lightdb@postgres=# update t set v = uuid();
UPDATE 10000000
Time: 74764.381 ms (01:14.764)

lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 358.055 ms
lightdb@postgres=# create index idx_v on t using hash(v);
CREATE INDEX
Time: 26212.556 ms (00:26.213)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=16827.289..16843.024 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=16806.915..16806.986 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=16806.915..16806.986 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5346.963..16158.927 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=5346.963..16158.927 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.208..2068.391 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0:  actual time=0.208..2068.391 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5334.495..5334.558 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 65536  Batches: 256  Memory Usage: 3149kB
                     Worker 0:  actual time=5334.495..5334.558 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.054..2165.095 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.054..2165.095 rows=10000000 loops=1
 Planning Time: 1.016 ms
 Execution Time: 16843.381 ms
(23 rows)

Time: 16845.926 ms (00:16.846)

-- 因为内存小，batches交互多，所以增加work_mem消除I/O影响。
lightdb@postgres=# set work_mem to '128MB';
SET
Time: 0.608 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=22352.372..22388.694 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=22335.328..22335.363 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=22335.328..22335.363 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5627.623..21684.815 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=5627.623..21684.815 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.262..1901.155 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0:  actual time=0.262..1901.155 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5600.002..5600.028 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 2097152  Batches: 8  Memory Usage: 100708kB
                     Worker 0:  actual time=5600.002..5600.028 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.188..2076.519 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.188..2076.519 rows=10000000 loops=1
 Planning Time: 0.415 ms
 Execution Time: 22388.947 ms
(23 rows)

Time: 22390.522 ms (00:22.391)
lightdb@postgres=# set work_mem to '1280MB';
SET
Time: 0.651 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=796915.94..796916.05 rows=1 width=8) (actual time=18028.588..18412.918 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=786915.94..786915.95 rows=1 width=8) (actual time=18017.839..18017.861 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=18017.839..18017.861 rows=1 loops=1
         ->  Hash Join  (cost=318457.98..761915.94 rows=9999999 width=0) (actual time=7028.907..17394.653 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=7028.907..17394.653 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.141..1507.013 rows=10000000 loops=1)
                     Output: t1.id, t1.v
                     Worker 0:  actual time=0.141..1507.013 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=6881.732..6881.746 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 16777216  Batches: 1  Memory Usage: 804901kB
                     Worker 0:  actual time=6881.732..6881.746 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.128..1781.326 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.128..1781.326 rows=10000000 loops=1
 Planning Time: 0.327 ms
 Execution Time: 18413.120 ms
(23 rows)

Time: 18414.684 ms (00:18.415)

-- b树索引
lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 133.610 ms
lightdb@postgres=# create index idx_v on t(v);
CREATE INDEX
Time: 99610.086 ms (01:39.610)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=635535.74..635535.76 rows=1 width=8) (actual time=55059.701..55482.026 rows=1 loops=1)
   Output: count(1)
   ->  Gather  (cost=635535.53..635535.74 rows=2 width=8) (actual time=54912.085..55481.978 rows=3 loops=1)
         Output: (PARTIAL count(1))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=625535.53..625535.54 rows=1 width=8) (actual time=54888.937..54888.960 rows=1 loops=3)
               Output: PARTIAL count(1)
               Worker 0:  actual time=54886.263..54886.305 rows=1 loops=1
               Worker 1:  actual time=54868.875..54868.890 rows=1 loops=1
               ->  Parallel Hash Join  (cost=309296.10..615118.86 rows=4166666 width=0) (actual time=26238.000..54499.870 rows=3333333 loops=3)
                     Hash Cond: ((t1.v)::text = (t2.v)::text)
                     Worker 0:  actual time=26224.367..54500.826 rows=3357480 loops=1
                     Worker 1:  actual time=26223.631..54484.862 rows=3362240 loops=1
                     ->  Parallel Index Only Scan using idx_v on public.t t1  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.112..23401.164 rows=3333333 loops=3)
                           Output: t1.v
                           Heap Fetches: 10000000
                           Worker 0:  actual time=0.114..23476.367 rows=3357480 loops=1
                           Worker 1:  actual time=0.100..23249.643 rows=3362240 loops=1
                     ->  Parallel Hash  (cost=257212.22..257212.22 rows=4166666 width=37) (actual time=26176.344..26176.362 rows=3333333 loops=3)
                           Output: t2.v
                           Buckets: 16777216  Batches: 1  Memory Usage: 835968kB
                           Worker 0:  actual time=26223.984..26224.022 rows=3279920 loops=1
                           Worker 1:  actual time=26223.158..26223.169 rows=3353980 loops=1
                           ->  Parallel Index Only Scan using idx_v on public.t t2  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.198..23040.507 rows=3333333 loops=3)
                                 Output: t2.v
                                 Heap Fetches: 10000000
                                 Worker 0:  actual time=0.136..23046.634 rows=3279920 loops=1
                                 Worker 1:  actual time=0.327..22978.942 rows=3353980 loops=1
 Planning Time: 0.559 ms
 Execution Time: 55482.322 ms
(31 rows)

Time: 55484.995 ms (00:55.485)

从上可知，在字符串下，hash索引相比btree索引性能高3倍以上。hash索引还是很有价值的。

其它性能测试可参考：https://www.pgcon.org/2017/schedule/attachments/458_durable-hash-indexes-postgresql.pdf。

实现层面解读：https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/pg-hashing/slides.html

posted @ 2024-06-07 17:25 zhjh256 阅读(5) 评论(0) 编辑收藏举报

刷新页面返回顶部

lightdb hash index的性能和限制

公告