KingbaseES BitmapScan 解读

bitmap scan 解读

什么是bitmap scan?

bitmap scan是pg中表的扫描计划的一种。

bitmap scan的作用就是通过建立位图的方式,将index scan 回表过程中对页访问的随机性IO转换为顺行性行为,从而减少查询过程中IO的消耗。

其优化理念是通过bitmap 的生成过程中增加内存和CPU字段消耗来减少IO消耗。

bitmap scan 如何工作?

bitmap scan 一般是发生在对多个单列索引,使用组合查询的SQL中。

我们构建一个测试表来复现bitmap scan :

1、创建测试表
drop table app_family2;
CREATE TABLE app_family2 (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char));
insert into app_family2 select generate_series(1,10000000),
generate_series(1,10000000),'aaaa','aaa','bbb' from dual ;
create index  idx_test1 on app_family2(application_id);

kingbase=# explain analyze select *  from app_family2 where family_id>'5000' and application_id<'50000' ;
                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family2  (cost=83319.69..224331.61 rows=2471187 width=27) (actual time=1405.862..1405.863 rows=0 loops=1)
   Recheck Cond: ((application_id)::text < '50000'::text)
   Filter: ((family_id)::text > '5000'::text)
   Rows Removed by Filter: 4444449
   Heap Blocks: exact=32924
   ->  Bitmap Index Scan on idx_test1  (cost=0.00..82701.89 rows=4463128 width=0) (actual time=576.156..576.157 rows=4444449 loops=1)
         Index Cond: ((application_id)::text < '50000'::text)
 Planning Time: 0.100 ms
 Execution Time: 1405.939 ms
(9 rows)

Bitmap SCAN是对查询条件中的每个索引构造一个bitmap串。bitmap串中,每个bit位对应一个heap page, heap page中有符合条件的行,该bit位就会被标记为1。

当所有索引列的位图创建完成,就对它们执行位与运算(BitmapAnd)。

注:下面的演示中为了方便,只演示了两个索引列的情况,另外不一定都是位与运算,还可能是位或运算(BitmapOR)。

比如:

+---------------------------------------------+    
|100000000001000000010000000000000111100000000| bitmap 1    
|000001000001000100010000000001000010000000010| bitmap 2    
 &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&    
|000000000001000000010000000000000010000000000| Combined bitmap    
+-----------+-------+--------------+----------+    
            |       |              |    
            v       v              v    
Used to scan the heap only for matching pages:    
+---------------------------------------------+    
|___________X_______X______________X__________|    
+---------------------------------------------+
————————————————

当所有索引列的位图创建完成,就对它们执行位与运算(BitmapAnd)或者还可能是位或运算(BitmapOR)。

位图堆扫描(Bitmap Heap Scan)然后寻找每一页的开头,并读取符合条件的页面。

对比index scan

index scan : 是对于给定的查询,先扫描一遍索引,从索引中找到符合要求的记录的指针,再定位到表中具体的page去取。

Bitmap scan:普通的index scan每次从索引中获取一个元组指针,并立即访问表中的元组,那么一个 PAGE 有可能被多次访问,而位图扫描一次性从索引中获取所有的元组指针,使用内存中的“位图”数据结构对它们进行排序,然后按物理元组位置顺序访问表元组。它的核心思想是单个page在扫描期间只访问一次。

影响bitmapscan效率的因素

从bitmap scan的原理上分析,

首先bitmap index的生成步骤相当于是对索引中获取元组指针,然后根据元组指针构建bitmap index ,bitmap index 里面就是按照元组的物理顺序排序的指针。

然后 bitmap heap scan 就需要根据bitmap index 的中元组顺序,顺序读取page。

所以这里面涉及两个性能点 1、bitmap index 中元组的排序,如果从索引获取的元组乱序就会导致排序过程会耗时。2,元组的顺序读取,如果表因为大量的update和delete 导致碎片化严重,再进行扫描的时候一次io扫描范围可能只能扫描到很少的有效page, 这样就需要更多的IO次数,导致性能下降。

下面我们做一个测试来验证以上猜想。

再原有的例子上我们可以看到同样的sql语句,新建的表和索引 执行10次sql的平均执行时间是:

次数 时间(ms)
1 1405.939
2 1371.683
3 1758.093
4 1347.242
5 1377.847
6 1410.497
7 1359.287
8 1399.621
9 1366.159
10 1366.151
平均 1416.2519
kingbase=# explain analyze select *  from app_family2 where family_id>'5000' and application_id<'50000' ;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family2  (cost=83319.69..224331.61 rows=2471187 width=27) (actual time=1366.118..1366.119 rows=0 loops=1)
   Recheck Cond: ((application_id)::text < '50000'::text)
   Filter: ((family_id)::text > '5000'::text)
   Rows Removed by Filter: 4444449
   Heap Blocks: exact=32924
   ->  Bitmap Index Scan on idx_test1  (cost=0.00..82701.89 rows=4463128 width=0) (actual time=541.293..541.293 rows=4444449 loops=1)
         Index Cond: ((application_id)::text < '50000'::text)
 Planning Time: 0.127 ms
 Execution Time: 1366.151 ms
(9 rows)
模拟几次update 操作
declare
begin
for i in 1..100 loop
update app_family2 set family_number='te2' where application_id between i*11000 and i*12000 ;
commit;
end loop;
end;
/

同样的sql 执行10次可以看到执行时间普遍变长

1 1655.903
2 1588.9
3 1560.934
4 1565.523
5 1567.24
6 1589.487
7 1560.742
8 1572.691
9 1679.959
10 1695.155
平均 1603.6534

从执行计划对比可以看到bitmap index scan 和bitmap heap scan 阶段时间都变长了。

kingbase=# explain analyze select *  from app_family2 where family_id>'5000' and application_id<'50000';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family2  (cost=228502.40..495304.62 rows=2475421 width=26) (actual time=1695.119..1695.121 rows=0 loops=1)
   Recheck Cond: ((application_id)::text < '50000'::text)
   Filter: ((family_id)::text > '5000'::text)
   Rows Removed by Filter: 4444449
   Heap Blocks: exact=32971
   ->  Bitmap Index Scan on idx_test1  (cost=0.00..227883.54 rows=4493748 width=0) (actual time=788.648..788.648 rows=4444449 loops=1)
         Index Cond: ((application_id)::text < '50000'::text)
 Planning Time: 0.128 ms
 Execution Time: 1695.155 ms
(9 rows)
进行vacuum 然后对比时间可以看到时间并没有提升。
kingbase=# explain analyze select *  from app_family2 where family_id>'5000' and application_id<'50000';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family2  (cost=228502.63..495305.30 rows=2475437 width=26) (actual time=1558.251..1558.253 rows=0 loops=1)
   Recheck Cond: ((application_id)::text < '50000'::text)
   Filter: ((family_id)::text > '5000'::text)
   Rows Removed by Filter: 4444449
   Heap Blocks: exact=32971
   ->  Bitmap Index Scan on idx_test1  (cost=0.00..227883.77 rows=4493778 width=0) (actual time=760.155..760.156 rows=4444449 loops=1)
         Index Cond: ((application_id)::text < '50000'::text)
 Planning Time: 0.097 ms
 Execution Time: 1558.317 ms
(9 rows)
进行vacuum full 然后对比时间,可以看到执行时间基本和新建表时一样。
kingbase=# explain analyze select *  from app_family2 where family_id>'5000' and application_id<'50000';
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family2  (cost=83844.63..225243.96 rows=2474139 width=26) (actual time=1346.106..1346.107 rows=0 loops=1)
   Recheck Cond: ((application_id)::text < '50000'::text)
   Filter: ((family_id)::text > '5000'::text)
   Rows Removed by Filter: 4444449
   Heap Blocks: exact=32912
   ->  Bitmap Index Scan on idx_test1  (cost=0.00..83226.10 rows=4491422 width=0) (actual time=535.140..535.140 rows=4444449 loops=1)
         Index Cond: ((application_id)::text < '50000'::text)
 Planning Time: 0.104 ms
 Execution Time: 1346.175 ms
(9 rows)

分析原因:进行vaccum full 之后整个物理元组都是顺序且密集,一次IO可以扫描更多的page。

总结

总结下来, bitmap scan 对中块的物理顺序是有依赖的。如果执行计划必须要执行bitmap scan 的前提下,再进行了大量update之后。只有vacuum full 才能提升bitmap 的查询效率。vacuum 并不能提升bitmap 的查询效率。

posted @ 2024-04-03 17:01  KINGBASE研究院  阅读(53)  评论(0编辑  收藏  举报