GIN and RUM 索引性能比较
gin索引字段entry构造的TREE,在末端posting tree|list 里面存储的是entry对应的行号. 别无其他信息。rum索引,与GIN类似,但是在posting list|tree的每一个ctid(itempoint)后面会追加一些属性值。因此,有些场景,使用rum 索引,性能会优很多。以下举个例子比较下。
Note: KingbaseES v8r6c5b0023 版本,附带了rum插件。
一、构造数据
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | create table t1 as select name ,short_desc from pg_settings; alter table t1 add column tsv tsvector; update t1 set tsv=to_tsvector(short_desc); --number for 紧邻的只有一条 test=# select short_desc from t1 where to_tsvector(short_desc) @@ to_tsquery( 'number <-> for' ); short_desc ------------------------- Top SQL number for kddm (1 row) --同时包含number and for 的有7条 test=# select short_desc from t1 where to_tsvector(short_desc) @@ to_tsquery( 'number & for' ); short_desc ------------------------------------------------------------------------------- Sets the number of digits displayed for floating-point values . Sets the maximum number of simultaneously open files for each server process. Sets the number of connection slots reserved for superusers. Top SQL number for kddm Sets the number of disk-page buffers in shared memory for WAL. Sets the number of WAL files held for standby servers. Sets the number of locks used for concurrent xlog insertions. (7 rows ) |
二、例子1:距离查询
1、gin 索引
创建gin 索引:
1 | create index ind_t1_gin on t1 using gin(tsv); |
查看gin索引执行计划:通过索引返回7条记录,也就是索引没有包含位置的信息,需要访问表数据。
1 2 3 4 5 6 7 8 9 10 11 12 | test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery( 'number <-> for' ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=12.32..30.68 rows =9 width=55) (actual time =0.111..0.195 rows =1 loops=1) Recheck Cond: (tsv @@ to_tsquery( 'number <-> for' ::text)) Rows Removed by Index Recheck: 6 Heap Blocks: exact=4 -> Bitmap Index Scan on ind_t1_gin (cost=0.00..12.32 rows =9 width=0) (actual time =0.058..0.058 rows =7 loops=1) Index Cond: (tsv @@ to_tsquery( 'number <-> for' ::text)) Planning Time : 0.199 ms Execution Time : 0.227 ms (8 rows ) |
2、rum 索引
创建索引:
1 2 3 4 | test=# create extension rum; CREATE EXTENSION test=# create index ind_t1_rum on t1 using rum(tsv); CREATE INDEX |
查看执行计划:可以看到索引返回的记录就一条,也就是索引包含有位置信息。
1 2 3 4 5 6 7 8 9 10 11 | test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery( 'number <-> for' ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on t1 (cost=12.32..30.68 rows =9 width=55) (actual time =0.041..0.042 rows =1 loops=1) Recheck Cond: (tsv @@ to_tsquery( 'number <-> for' ::text)) Heap Blocks: exact=1 -> Bitmap Index Scan on ind_t1_rum (cost=0.00..12.32 rows =9 width=0) (actual time =0.038..0.039 rows =1 loops=1) Index Cond: (tsv @@ to_tsquery( 'number <-> for' ::text)) Planning Time : 0.297 ms Execution Time : 0.068 ms (7 rows ) |
三、例子2:相关性排序
1、gin 索引
需要所有符合的数据,再进行排序
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | test=# create index ind_t1_gin on t1 using gin(tsv); CREATE INDEX test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery( 'number & for' ) order by tsv <=> to_tsquery( 'number & for' ) limit 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Limit (cost=33.00..33.00 rows =1 width=59) (actual time =0.134..0.135 rows =1 loops=1) -> Sort (cost=33.00..33.02 rows =9 width=59) (actual time =0.133..0.134 rows =1 loops=1) Sort Key : ((tsv <=> to_tsquery( 'number & for' ::text))) Sort Method: top -N heapsort Memory: 25kB -> Bitmap Heap Scan on t1 (cost=12.32..32.95 rows =9 width=59) (actual time =0.102..0.125 rows =7 loops=1) Recheck Cond: (tsv @@ to_tsquery( 'number & for' ::text)) Heap Blocks: exact=4 -> Bitmap Index Scan on ind_t1_gin (cost=0.00..12.32 rows =9 width=0) (actual time =0.084..0.084 rows =7 loops=1) Index Cond: (tsv @@ to_tsquery( 'number & for' ::text)) Planning Time : 0.237 ms Execution Time : 0.177 ms (11 rows ) |
2、rum 索引
1 2 3 4 5 6 7 8 9 10 | test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery( 'number & for' ) order by tsv <=> to_tsquery( 'number & for' ) limit 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Limit (cost=8.25..12.52 rows =1 width=59) (actual time =0.077..0.077 rows =1 loops=1) -> Index Scan using ind_t1_rum on t1 (cost=8.25..46.68 rows =9 width=59) (actual time =0.076..0.076 rows =1 loops=1) Index Cond: (tsv @@ to_tsquery( 'number & for' ::text)) Order By : (tsv <=> to_tsquery( 'number & for' ::text)) Planning Time : 0.236 ms Execution Time : 0.101 ms (6 rows ) |
KINGBASE研究院
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
· Linux系列:如何用 C#调用 C方法造成内存泄露
· AI与.NET技术实操系列(二):开始使用ML.NET
· 被坑几百块钱后,我竟然真的恢复了删除的微信聊天记录!
· 没有Manus邀请码?试试免邀请码的MGX或者开源的OpenManus吧
· 【自荐】一款简洁、开源的在线白板工具 Drawnix
· 园子的第一款AI主题卫衣上架——"HELLO! HOW CAN I ASSIST YOU TODAY
· Docker 太简单,K8s 太复杂?w7panel 让容器管理更轻松!