PostgreSQL模糊查询
1、创建测试表、插入数据
postgres=# create table test_like(id int, name varchar);
CREATE TABLE
postgres=# insert into test_like select generate_series(1,1000000),md5(random()::varchar);
INSERT 0 1000000
postgres=# analyze test_like ;
ANALYZE
2、未创建索引的执行计划
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
--------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=33 width=37)
Filter: ((name)::text ~~ '111%'::text)
(4 rows)
3、普通btree索引不走索引
postgres=# create index idx_test_like_name on test_like (name);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
--------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=33 width=37)
Filter: ((name)::text ~~ '111%'::text)
(4 rows)
4、前模糊匹配查询
1)collate "C"
postgres=# create index idx_test_like_name_c on test_like (name collate "C");
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
--------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=2.99..108.88 rows=33 width=37)
Filter: ((name)::text ~~ '111%'::text)
-> Bitmap Index Scan on idx_test_like_name_c (cost=0.00..2.98 rows=56 width=0)
Index Cond: (((name)::text >= '111'::text) AND ((name)::text < '112'::text))
(6 rows)
2)操作符类varchar_pattern_ops方式
postgres=# create index idx_test_like_name_ops on test_like (name varchar_pattern_ops);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=2.97..105.20 rows=33 width=37)
Filter: ((name)::text ~~ '111%'::text)
-> Bitmap Index Scan on idx_test_like_name_ops (cost=0.00..2.96 rows=54 width=0)
Index Cond: (((name)::text ~>=~ '111'::text) AND ((name)::text ~<~ '112'::text))
(6 rows)
但后匹配、中间匹配不支持:
postgres=# explain select * from test_like where name like '%111%1' ;
QUERY PLAN
--------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=33 width=37)
Filter: ((name)::text ~~ '%111%1'::text)
(4 rows)
postgres=# explain select * from test_like where name like '%111' ;
QUERY PLAN
--------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=33 width=37)
Filter: ((name)::text ~~ '%111'::text)
(4 rows)
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=2.98..107.04 rows=33 width=37)
Filter: ((name)::text ~~ '111%'::text)
-> Bitmap Index Scan on idx_test_like_name_ops (cost=0.00..2.97 rows=55 width=0)
Index Cond: (((name)::text ~>=~ '111'::text) AND ((name)::text ~<~ '112'::text))
(6 rows)
postgres=# explain select * from test_like where name like '%111%' ;
QUERY PLAN
--------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=33 width=37)
Filter: ((name)::text ~~ '%111%'::text)
(4 rows)
5、方式二:使用pg_trim插件
The pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.
A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.
Note: pg_trgm ignores non-word characters (non-alphanumerics) when extracting trigrams from a string. Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string. For example, the set of trigrams in the string "cat" is " c", " ca", "cat", and "at ". The set of trigrams in the string "foo|bar" is " f", " fo", "foo", "oo ", " b", " ba", "bar", and "ar ".
postgres=# create extension pg_trgm;
CREATE EXTENSION
postgres=# create index idx_test_like_name_trgm on test_like using gin (name gin_trgm_ops);
CREATE INDEX
postgres=# analyze test_like ;
ANALYZE
postgres=# explain select * from test_like where name like '111%' ;
QUERY PLAN
----------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=18.26..81.59 rows=33 width=37)
Recheck Cond: ((name)::text ~~ '111%'::text)
-> Bitmap Index Scan on idx_test_like_name_trgm (cost=0.00..18.25 rows=33 width=0)
Index Cond: ((name)::text ~~ '111%'::text)
(6 rows)
postgres=# explain select * from test_like where name like '%111%' ;
QUERY PLAN
------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=62.16..3816.37 rows=6731 width=37)
Recheck Cond: ((name)::text ~~ '%111%'::text)
-> Bitmap Index Scan on idx_test_like_name_trgm (cost=0.00..60.48 rows=6731 width=0)
Index Cond: ((name)::text ~~ '%111%'::text)
(6 rows)
postgres=# explain select * from test_like where name like '%111' ;
QUERY PLAN
----------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=12.26..75.59 rows=33 width=37)
Recheck Cond: ((name)::text ~~ '%111'::text)
-> Bitmap Index Scan on idx_test_like_name_trgm (cost=0.00..12.25 rows=33 width=0)
Index Cond: ((name)::text ~~ '%111'::text)
(6 rows)
postgres=# explain select * from test_like where name like '%111%1' ;
QUERY PLAN
---------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=6.26..69.59 rows=33 width=37)
Recheck Cond: ((name)::text ~~ '%111%1'::text)
-> Bitmap Index Scan on idx_test_like_name_trgm (cost=0.00..6.25 rows=33 width=0)
Index Cond: ((name)::text ~~ '%111%1'::text)
(6 rows)
缺陷 :就像上面描述的那样,pg_trgm把字符串切成多个三元组,然后做匹配,当匹配关键词少于三个的时候,只有前缀匹配会走索引:
postgres=# explain select * from test_like where name like '%11%' ;
QUERY PLAN
-----------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Seq Scan on test_like (cost=0.00..7950.56 rows=40384 width=37)
Filter: ((name)::text ~~ '%11%'::text)
(4 rows)
postgres=# explain select * from test_like where name like '%111%' ;
QUERY PLAN
------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: dn001, dn002, dn003
-> Bitmap Heap Scan on test_like (cost=34.08..2908.28 rows=3365 width=37)
Recheck Cond: ((name)::text ~~ '%111%'::text)
-> Bitmap Index Scan on idx_test_like_name_trgm (cost=0.00..33.24 rows=3365 width=0)
Index Cond: ((name)::text ~~ '%111%'::text)
(6 rows)