mysql千万级内模糊查询的实现方式

昨晚辗转反侧，灵光闪现，突然想到了覆盖索引+主动回表的方式，管你几个字段，我只要一个普通索引。

所以千万级大表的like模糊查询能不能做？

能

废话不多说，那就搞一搞。

建表

create table emp
(
    id       int unsigned auto_increment
        primary key,
    empno    mediumint unsigned default 0  not null,
    ename    varchar(20)        default '' not null,
    job      varchar(9)         default '' not null,
    mgr      mediumint unsigned default 0  not null,
    hiredate date                          not null,
    sal      decimal(7, 2)                 not null,
    comm     decimal(7, 2)                 not null,
    deptno   mediumint unsigned default 0  not null
)
    charset = utf8;

导入千万级数据

方法在这里

bigdata> select count(*) from emp
[2021-08-19 11:08:25] 1 row retrieved starting from 1 in 2 s 900 ms (execution: 2 s 874 ms, fetching: 26 ms)

未建索引下的模糊查询

bigdata> select ename, empno, job from emp where ename like '%S%'
[2021-08-19 11:14:25] 2,765,363 rows retrieved starting from 1 in 9 s 360 ms (execution: 8 ms, fetching: 9 s 352 ms)

仅右模糊的就不考虑了，都知道是走索引的。

上法宝，覆盖索引

不幸的是，直接卡在了创建索引这一步，因为表已经千万数据了，直接建索引机器就卡死了，顺便搜索了一下解决方案，总结的很好，但是我不用😄我直接truncate删除表索引和数据

检查索引/表是否删除干净

use information_schema;
# 查看指定库的指定表的大小
select concat(round(sum(DATA_LENGTH/1024/1024),2),'MB') as data  from TABLES where table_schema='bigdata' and table_name='emp';
# 查看指定库的指定表的索引大小
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), 'MB') AS 'Total Index Size' FROM TABLES  WHERE table_schema = 'bigdata' and table_name='emp';

创建索引

create index emp_ename_idx on emp (ename);

再次导入数据

Call insert_emp10000(0,10000000);
[2021-08-19 14:18:53] completed in 2 h 22 m 37 s 90 ms

时间有够长的。。

尝试一下有索引的like模糊

bigdata> select ename from emp where ename like '%S%'
[2021-08-19 14:37:40] 2,093,321 rows retrieved starting from 1 in 5 s 128 ms (execution: 34 ms, fetching: 5 s 94 ms)

覆盖索引，性能提升

可以用desc/explain确认一下走了索引，原理不说了吧，覆盖索引

对比上面可以发现，使用覆盖索引后性能提升了一倍

但你可能说：就这？就这？这有卵用，谁查询时也不可能只查一个字段呀，但是把要查询的字段都加上索引又不现实，毕竟索引也需要空间存储的，给要返回的字段都加上索引，可能光索引就比表数据大N倍了。

那咋整？

实不相瞒，这就是我昨晚思考到的，以至于兴奋得夜不能寐。

关键在于这样：

bigdata> select id, ename from emp where ename like '%S%'
[2021-08-19 14:48:11] 2,093,321 rows retrieved starting from 1 in 4 s 685 ms (execution: 9 ms, fetching: 4 s 676 ms)

没错，就多了个id（或者直接返回id也是可以的，不不不，理论上应该仅返回id，可避免mysql回表）

id有什么用？id能精确查找数据鸭！（有没有觉得很像外部的“主动回表”）

就像这样，二次查询

bigdata> select id, ename... from emp where id in (497723, 670849, 1371884, 1934742, 1960444, 2165983)
[2021-08-19 15:45:23] 6 rows retrieved starting from 1 in 78 ms (execution: 23 ms, fetching: 55 ms)

这速度不就有了，数据也有了。

基于此，还可以实现内存分页，且基本不用担心内存溢出问题

再搞个缓存，性能又能进一步提升，不过代价也很明显，复杂度进一步提升

posted @ 2021-08-19 16:12 夜旦阅读(2383) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

夜旦

生如蝼蚁当有鸿鹄之志。