kingbase ES like 语句优化
KingbaseES 产品支持基于 Like 表达式的通配符模式匹配,通过合理的应用索引,可以提高查询性能。
like子句分以下四种情况
1、精确匹配型 like 'values'
2、右%号匹配like 'values%'
3、左%号匹配like '%values'
4、双%号匹配like '%values%'
针对这四种情况优化使用的索引也不一样
精确匹配型 like 'values'
创建测试表
drop table app_family;
CREATE TABLE app_family (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char),
CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));
insert into app_family select generate_series(1,1000000),generate_series(1,1000000),'aaaa','aaa','bbb' from dual ;
insert into app_family values('88888888','88888888','liketest','likefulltest','likerigthtest');
不加索引查看执行计划,
可以看到走了全表扫描。
select * from app_family where family_number like 'liketest';
kingbase=# explain analyze select * from app_family where family_number like 'liketest';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..13606.44 rows=1 width=25) (actual time=102.347..104.901 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on app_family (cost=0.00..12606.34 rows=1 width=25) (actual time=96.376..96.377 rows=0 loops=3)
Filter: ((family_number)::text ~~ 'liketest'::text)
Rows Removed by Filter: 333333
Planning Time: 0.120 ms
Execution Time: 104.928 ms
(8 rows)
创建btree索引对比执行计划,
通过btree索引like 精准匹配走了索引用时更少。
kingbase=# create index idx_like_test on app_family(family_number);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liketest';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_like_test on app_family (cost=0.42..4.45 rows=1 width=25) (actual time=0.029..0.030 rows=1 loops=1)
Index Cond: ((family_number)::text = 'liketest'::text)
Filter: ((family_number)::text ~~ 'liketest'::text)
Planning Time: 0.297 ms
Execution Time: 0.045 ms
(5 rows)
kingbase=#
右%号匹配like 'values%'
在之前实验的基础上,看一下右%查询是否会使用btree索引,可以看到并没有走索引,而是通过全表扫描
kingbase=# explain analyze select * from app_family where family_number like 'liket%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..13606.44 rows=1 width=25) (actual time=87.354..88.903 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on app_family (cost=0.00..12606.34 rows=1 width=25) (actual time=83.612..83.613 rows=0 loops=3)
Filter: ((family_number)::text ~~ 'liket%'::text)
Rows Removed by Filter: 333333
Planning Time: 0.080 ms
Execution Time: 88.920 ms
(8 rows)
kingbase=#
为了解决这个问题,我们可以通过以下方式来创建索引。
创建索引使用“C”collate
通过在创建索引时指定collate "C" 可以看到like 重新走了索引,效率大大提升
kingbase=# drop index idx_like_test;
DROP INDEX
kingbase=# create index idx_like_test on app_family(family_number collate "C");
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liket%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_like_test on app_family (cost=0.42..8.45 rows=1 width=25) (actual time=0.023..0.024 rows=1 loops=1)
Index Cond: (((family_number)::text >= 'liket'::text) AND ((family_number)::text < 'likeu'::text))
Filter: ((family_number)::text ~~ 'liket%'::text)
Planning Time: 0.292 ms
Execution Time: 0.036 ms
(5 rows)
kingbase=#
指定操作符
kingbase=# drop index idx_like_test;
DROP INDEX
kingbase=# create index idx_like_test_ops on app_family(family_number text_pattern_ops);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liket%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_like_test_ops on app_family (cost=0.42..8.45 rows=1 width=25) (actual time=0.024..0.025 rows=1 loops=1)
Index Cond: (((family_number)::text ~>=~ 'liket'::text) AND ((family_number)::text ~<~ 'likeu'::text))
Filter: ((family_number)::text ~~ 'liket%'::text)
Planning Time: 0.306 ms
Execution Time: 0.040 ms
(5 rows)
可以看到通过指定text_pattern_ops 操作符 like 也是走了索引,执行效率和“C”collate一样快
当然操作符还可以根据不同的类型进行区分,可以是text_pattern_ops、varchar_pattern_ops、bpchar_pattern_ops
左%号匹配like '%values'和双%号匹配like '%values%'
这两种like 查询放在一起是因为这两种情况的like查询btree索引已经使用 无论采用上面的那种方式都不会走索引。
这时候我们据需要引入gin索引。
kingbase ES 创建gin索引的示例如下:
kingbase=# create extension sys_trgm;
CREATE EXTENSION
kingbase=# create index idx_like_test_gin on app_family using gin(family_number gin_trgm_ops);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like '%ikete%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family (cost=500.00..504.01 rows=1 width=25) (actual time=0.057..0.058 rows=1 loops=1)
Recheck Cond: ((family_number)::text ~~ '%ikete%'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_like_test_gin (cost=0.00..500.00 rows=1 width=0) (actual time=0.052..0.052 rows=1 loops=1)
Index Cond: ((family_number)::text ~~ '%ikete%'::text)
Planning Time: 0.142 ms
Execution Time: 0.075 ms
(7 rows)
kingbase=# explain analyze select * from app_family where family_number like '%iketest';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on app_family (cost=1000.00..1004.01 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=1)
Recheck Cond: ((family_number)::text ~~ '%iketest'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on idx_like_test_gin (cost=0.00..1000.00 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: ((family_number)::text ~~ '%iketest'::text)
Planning Time: 0.080 ms
Execution Time: 0.036 ms
(7 rows)
可以看到通过使用gin索引无论是like '%values%' 还是like '%values',都走了索引,执行效率也明显提升。
通过以上几个实验可以看到
Btree 索引应用于 like 模糊查询时,只能做前匹配(a like‘abc%’)或者精确匹配(a like ‘abc’),不能做 后匹配(a like ‘%abc’)和中间匹配(a like ‘%abc%’),这是由 btree 索引只能做大于、大于等于、等于、小 于、小于等于等操作决定的。 从执行计划中也可以看出,btree 针对like 还是将like 查询转换成了>= 和<= 查询。