kingbase ES like 语句优化

KingbaseES 产品支持基于 Like 表达式的通配符模式匹配,通过合理的应用索引,可以提高查询性能。

like子句分以下四种情况

1、精确匹配型 like 'values'

2、右%号匹配like 'values%'

3、左%号匹配like '%values'

4、双%号匹配like '%values%'

针对这四种情况优化使用的索引也不一样

精确匹配型 like 'values'

创建测试表

drop table app_family;
CREATE TABLE app_family (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char),
CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));
insert into app_family select generate_series(1,1000000),generate_series(1,1000000),'aaaa','aaa','bbb' from dual ;

insert into app_family values('88888888','88888888','liketest','likefulltest','likerigthtest');

不加索引查看执行计划,

可以看到走了全表扫描。

select * from app_family where family_number like 'liketest';

kingbase=# explain analyze select * from app_family where family_number like 'liketest';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13606.44 rows=1 width=25) (actual time=102.347..104.901 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on app_family  (cost=0.00..12606.34 rows=1 width=25) (actual time=96.376..96.377 rows=0 loops=3)
         Filter: ((family_number)::text ~~ 'liketest'::text)
         Rows Removed by Filter: 333333
 Planning Time: 0.120 ms
 Execution Time: 104.928 ms
(8 rows)

创建btree索引对比执行计划,

通过btree索引like 精准匹配走了索引用时更少。

kingbase=# create index idx_like_test on app_family(family_number);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liketest';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_like_test on app_family  (cost=0.42..4.45 rows=1 width=25) (actual time=0.029..0.030 rows=1 loops=1)
   Index Cond: ((family_number)::text = 'liketest'::text)
   Filter: ((family_number)::text ~~ 'liketest'::text)
 Planning Time: 0.297 ms
 Execution Time: 0.045 ms
(5 rows)

kingbase=#

右%号匹配like 'values%'

在之前实验的基础上,看一下右%查询是否会使用btree索引,可以看到并没有走索引,而是通过全表扫描

kingbase=# explain analyze select * from app_family where family_number like 'liket%';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13606.44 rows=1 width=25) (actual time=87.354..88.903 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on app_family  (cost=0.00..12606.34 rows=1 width=25) (actual time=83.612..83.613 rows=0 loops=3)
         Filter: ((family_number)::text ~~ 'liket%'::text)
         Rows Removed by Filter: 333333
 Planning Time: 0.080 ms
 Execution Time: 88.920 ms
(8 rows)

kingbase=#

为了解决这个问题,我们可以通过以下方式来创建索引。

创建索引使用“C”collate

通过在创建索引时指定collate "C" 可以看到like 重新走了索引,效率大大提升

kingbase=# drop index idx_like_test;
DROP INDEX
kingbase=# create index idx_like_test on app_family(family_number collate "C");
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liket%';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_like_test on app_family  (cost=0.42..8.45 rows=1 width=25) (actual time=0.023..0.024 rows=1 loops=1)
   Index Cond: (((family_number)::text >= 'liket'::text) AND ((family_number)::text < 'likeu'::text))
   Filter: ((family_number)::text ~~ 'liket%'::text)
 Planning Time: 0.292 ms
 Execution Time: 0.036 ms
(5 rows)

kingbase=#

指定操作符

kingbase=# drop index idx_like_test;
DROP INDEX
kingbase=# create index idx_like_test_ops on app_family(family_number text_pattern_ops);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like 'liket%';
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_like_test_ops on app_family  (cost=0.42..8.45 rows=1 width=25) (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: (((family_number)::text ~>=~ 'liket'::text) AND ((family_number)::text ~<~ 'likeu'::text))
   Filter: ((family_number)::text ~~ 'liket%'::text)
 Planning Time: 0.306 ms
 Execution Time: 0.040 ms
(5 rows)

可以看到通过指定text_pattern_ops 操作符 like 也是走了索引,执行效率和“C”collate一样快

当然操作符还可以根据不同的类型进行区分,可以是text_pattern_ops、varchar_pattern_ops、bpchar_pattern_ops

左%号匹配like '%values'和双%号匹配like '%values%'

这两种like 查询放在一起是因为这两种情况的like查询btree索引已经使用 无论采用上面的那种方式都不会走索引。

这时候我们据需要引入gin索引。

kingbase ES 创建gin索引的示例如下:

kingbase=# create extension sys_trgm;
CREATE EXTENSION
kingbase=# create index idx_like_test_gin on app_family using gin(family_number gin_trgm_ops);
CREATE INDEX
kingbase=# explain analyze select * from app_family where family_number like '%ikete%';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family  (cost=500.00..504.01 rows=1 width=25) (actual time=0.057..0.058 rows=1 loops=1)
   Recheck Cond: ((family_number)::text ~~ '%ikete%'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_like_test_gin  (cost=0.00..500.00 rows=1 width=0) (actual time=0.052..0.052 rows=1 loops=1)
         Index Cond: ((family_number)::text ~~ '%ikete%'::text)
 Planning Time: 0.142 ms
 Execution Time: 0.075 ms
(7 rows)

kingbase=# explain analyze select * from app_family where family_number like '%iketest';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on app_family  (cost=1000.00..1004.01 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=1)
   Recheck Cond: ((family_number)::text ~~ '%iketest'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_like_test_gin  (cost=0.00..1000.00 rows=1 width=0) (actual time=0.010..0.010 rows=1 loops=1)
         Index Cond: ((family_number)::text ~~ '%iketest'::text)
 Planning Time: 0.080 ms
 Execution Time: 0.036 ms
(7 rows)

可以看到通过使用gin索引无论是like '%values%' 还是like '%values',都走了索引,执行效率也明显提升。

通过以上几个实验可以看到

Btree 索引应用于 like 模糊查询时,只能做前匹配(a like‘abc%’)或者精确匹配(a like ‘abc’),不能做 后匹配(a like ‘%abc’)和中间匹配(a like ‘%abc%’),这是由 btree 索引只能做大于、大于等于、等于、小 于、小于等于等操作决定的。 从执行计划中也可以看出,btree 针对like 还是将like 查询转换成了>= 和<= 查询。

posted @ 2023-10-19 19:49  KINGBASE研究院  阅读(60)  评论(0编辑  收藏  举报