KingbaseES BitmapScan 解读
bitmap scan 解读
什么是bitmap scan?
bitmap scan是pg中表的扫描计划的一种。
bitmap scan的作用就是通过建立位图的方式,将index scan 回表过程中对页访问的随机性IO转换为顺行性行为,从而减少查询过程中IO的消耗。
其优化理念是通过bitmap 的生成过程中增加内存和CPU字段消耗来减少IO消耗。
bitmap scan 如何工作?
bitmap scan 一般是发生在对多个单列索引,使用组合查询的SQL中。
我们构建一个测试表来复现bitmap scan :
1、创建测试表
drop table app_family2;
CREATE TABLE app_family2 (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char));
insert into app_family2 select generate_series(1,10000000),
generate_series(1,10000000),'aaaa','aaa','bbb' from dual ;
create index idx_test1 on app_family2(application_id);
kingbase=# explain analyze select * from app_family2 where family_id>'5000' and application_id<'50000' ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family2 (cost=83319.69..224331.61 rows=2471187 width=27) (actual time=1405.862..1405.863 rows=0 loops=1)
Recheck Cond: ((application_id)::text < '50000'::text)
Filter: ((family_id)::text > '5000'::text)
Rows Removed by Filter: 4444449
Heap Blocks: exact=32924
-> Bitmap Index Scan on idx_test1 (cost=0.00..82701.89 rows=4463128 width=0) (actual time=576.156..576.157 rows=4444449 loops=1)
Index Cond: ((application_id)::text < '50000'::text)
Planning Time: 0.100 ms
Execution Time: 1405.939 ms
(9 rows)
Bitmap SCAN是对查询条件中的每个索引构造一个bitmap串。bitmap串中,每个bit位对应一个heap page, heap page中有符合条件的行,该bit位就会被标记为1。
当所有索引列的位图创建完成,就对它们执行位与运算(BitmapAnd)。
注:下面的演示中为了方便,只演示了两个索引列的情况,另外不一定都是位与运算,还可能是位或运算(BitmapOR)。
比如:
+---------------------------------------------+
|100000000001000000010000000000000111100000000| bitmap 1
|000001000001000100010000000001000010000000010| bitmap 2
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
|000000000001000000010000000000000010000000000| Combined bitmap
+-----------+-------+--------------+----------+
| | |
v v v
Used to scan the heap only for matching pages:
+---------------------------------------------+
|___________X_______X______________X__________|
+---------------------------------------------+
————————————————
当所有索引列的位图创建完成,就对它们执行位与运算(BitmapAnd)或者还可能是位或运算(BitmapOR)。
位图堆扫描(Bitmap Heap Scan)然后寻找每一页的开头,并读取符合条件的页面。
对比index scan
index scan : 是对于给定的查询,先扫描一遍索引,从索引中找到符合要求的记录的指针,再定位到表中具体的page去取。
Bitmap scan:普通的index scan每次从索引中获取一个元组指针,并立即访问表中的元组,那么一个 PAGE 有可能被多次访问,而位图扫描一次性从索引中获取所有的元组指针,使用内存中的“位图”数据结构对它们进行排序,然后按物理元组位置顺序访问表元组。它的核心思想是单个page在扫描期间只访问一次。
影响bitmapscan效率的因素
从bitmap scan的原理上分析,
首先bitmap index的生成步骤相当于是对索引中获取元组指针,然后根据元组指针构建bitmap index ,bitmap index 里面就是按照元组的物理顺序排序的指针。
然后 bitmap heap scan 就需要根据bitmap index 的中元组顺序,顺序读取page。
所以这里面涉及两个性能点 1、bitmap index 中元组的排序,如果从索引获取的元组乱序就会导致排序过程会耗时。2,元组的顺序读取,如果表因为大量的update和delete 导致碎片化严重,再进行扫描的时候一次io扫描范围可能只能扫描到很少的有效page, 这样就需要更多的IO次数,导致性能下降。
下面我们做一个测试来验证以上猜想。
再原有的例子上我们可以看到同样的sql语句,新建的表和索引 执行10次sql的平均执行时间是:
次数 | 时间(ms) |
---|---|
1 | 1405.939 |
2 | 1371.683 |
3 | 1758.093 |
4 | 1347.242 |
5 | 1377.847 |
6 | 1410.497 |
7 | 1359.287 |
8 | 1399.621 |
9 | 1366.159 |
10 | 1366.151 |
平均 | 1416.2519 |
kingbase=# explain analyze select * from app_family2 where family_id>'5000' and application_id<'50000' ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family2 (cost=83319.69..224331.61 rows=2471187 width=27) (actual time=1366.118..1366.119 rows=0 loops=1)
Recheck Cond: ((application_id)::text < '50000'::text)
Filter: ((family_id)::text > '5000'::text)
Rows Removed by Filter: 4444449
Heap Blocks: exact=32924
-> Bitmap Index Scan on idx_test1 (cost=0.00..82701.89 rows=4463128 width=0) (actual time=541.293..541.293 rows=4444449 loops=1)
Index Cond: ((application_id)::text < '50000'::text)
Planning Time: 0.127 ms
Execution Time: 1366.151 ms
(9 rows)
模拟几次update 操作
declare
begin
for i in 1..100 loop
update app_family2 set family_number='te2' where application_id between i*11000 and i*12000 ;
commit;
end loop;
end;
/
同样的sql 执行10次可以看到执行时间普遍变长
1 | 1655.903 |
2 | 1588.9 |
3 | 1560.934 |
4 | 1565.523 |
5 | 1567.24 |
6 | 1589.487 |
7 | 1560.742 |
8 | 1572.691 |
9 | 1679.959 |
10 | 1695.155 |
平均 | 1603.6534 |
从执行计划对比可以看到bitmap index scan 和bitmap heap scan 阶段时间都变长了。
kingbase=# explain analyze select * from app_family2 where family_id>'5000' and application_id<'50000';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family2 (cost=228502.40..495304.62 rows=2475421 width=26) (actual time=1695.119..1695.121 rows=0 loops=1)
Recheck Cond: ((application_id)::text < '50000'::text)
Filter: ((family_id)::text > '5000'::text)
Rows Removed by Filter: 4444449
Heap Blocks: exact=32971
-> Bitmap Index Scan on idx_test1 (cost=0.00..227883.54 rows=4493748 width=0) (actual time=788.648..788.648 rows=4444449 loops=1)
Index Cond: ((application_id)::text < '50000'::text)
Planning Time: 0.128 ms
Execution Time: 1695.155 ms
(9 rows)
进行vacuum 然后对比时间可以看到时间并没有提升。
kingbase=# explain analyze select * from app_family2 where family_id>'5000' and application_id<'50000';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family2 (cost=228502.63..495305.30 rows=2475437 width=26) (actual time=1558.251..1558.253 rows=0 loops=1)
Recheck Cond: ((application_id)::text < '50000'::text)
Filter: ((family_id)::text > '5000'::text)
Rows Removed by Filter: 4444449
Heap Blocks: exact=32971
-> Bitmap Index Scan on idx_test1 (cost=0.00..227883.77 rows=4493778 width=0) (actual time=760.155..760.156 rows=4444449 loops=1)
Index Cond: ((application_id)::text < '50000'::text)
Planning Time: 0.097 ms
Execution Time: 1558.317 ms
(9 rows)
进行vacuum full 然后对比时间,可以看到执行时间基本和新建表时一样。
kingbase=# explain analyze select * from app_family2 where family_id>'5000' and application_id<'50000';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family2 (cost=83844.63..225243.96 rows=2474139 width=26) (actual time=1346.106..1346.107 rows=0 loops=1)
Recheck Cond: ((application_id)::text < '50000'::text)
Filter: ((family_id)::text > '5000'::text)
Rows Removed by Filter: 4444449
Heap Blocks: exact=32912
-> Bitmap Index Scan on idx_test1 (cost=0.00..83226.10 rows=4491422 width=0) (actual time=535.140..535.140 rows=4444449 loops=1)
Index Cond: ((application_id)::text < '50000'::text)
Planning Time: 0.104 ms
Execution Time: 1346.175 ms
(9 rows)
分析原因:进行vaccum full 之后整个物理元组都是顺序且密集,一次IO可以扫描更多的page。
总结
总结下来, bitmap scan 对中块的物理顺序是有依赖的。如果执行计划必须要执行bitmap scan 的前提下,再进行了大量update之后。只有vacuum full 才能提升bitmap 的查询效率。vacuum 并不能提升bitmap 的查询效率。