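Subquery optimization, prompted by a single SQL statement

I. Tracking down the problem

1. Symptom

Opening one of the list modules in the system was very slow. With logging turned on, it turned out that the following SQL statement was being used to count the total number of matching records for pagination: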

select count(*) from (select 
            schedule_id, schedule_code ,resource_code, schedule_type, schedule.oper_id, schedule.oper_time, 
            start_date, end_date, start_time, end_time, img_id, video_id, display_time, 
            schedule_color, terrace_code, stb_types, district_codes, user_group_codes, 
            igroup_code, schedule_status, schedule_description, step_id, owner_id, aud.description, so.oper_name
        from schedule_record as schedule 
            left join auditing_desc_record as aud 
            on schedule.schedule_code = aud.code 
            and aud.is_last_auditing = 1
            left join system_oper as so
            on owner_id = so.oper_id
        where 1=1  and schedule_status = 7  
        order by schedule.schedule_code desc) myCount ;

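2. Analysis with explain

Running the statement by hand took a surprising 21.18 seconds. I suspected a missing index or an oversized table, so I analyzed it with explain: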

explain
select count(*) from (select 
            schedule_id, schedule_code ,resource_code, schedule_type, schedule.oper_id, schedule.oper_time, 
            start_date, end_date, start_time, end_time, img_id, video_id, display_time, 
            schedule_color, terrace_code, stb_types, district_codes, user_group_codes, 
            igroup_code, schedule_status, schedule_description, step_id, owner_id, aud.description, so.oper_name
        from schedule_record as schedule 
            left join auditing_desc_record as aud 
            on schedule.schedule_code = aud.code 
            and aud.is_last_auditing = 1
            left join system_oper as so
            on owner_id = so.oper_id
        where 1=1  and schedule_status = 7  
        order by schedule.schedule_code desc) myCount ;

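3. Rewriting the SQL

Looking at the explain output, it is easy to see that missing indexes are causing full table scans (three rows have type ALL). Checking the indexes confirms it: neither of the join columns, schedule.schedule_code and aud.code, has an index.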

show index from schedule_record;
show index from auditing_desc_record;
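
As a cross-check on show index, the same information can be read from information_schema in a single result set. This is a minimal sketch, assuming both tables live in the currently selected database (the database() filter is my assumption; the table names are the ones above):

```sql
-- List every indexed column of the two joined tables in the current database.
-- If schedule_code or code does not appear in column_name, that join column has no index.
select table_name, index_name, seq_in_index, column_name, non_unique
from information_schema.statistics
where table_schema = database()
  and table_name in ('schedule_record', 'auditing_desc_record')
order by table_name, index_name, seq_in_index;
```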

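What really caught my attention, though, was this: the subquery (inner query) reports only 1827 and 11265 rows scanned, so why does the outer select count(*) report 1827*11265=20581155 rows? I suspected the subquery was the cause, so I decided to rewrite the SQL and see how it behaves without one: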

select 
            count(schedule_code)
        from schedule_record as schedule 
            left join auditing_desc_record as aud 
            on schedule.schedule_code = aud.code 
            and aud.is_last_auditing = 1
            left join system_oper as so
            on owner_id = so.oper_id
        where 1=1  and schedule_status = 7  
        order by schedule.schedule_code desc;

Is the subquery only slow because the indexes are missing? Let's add the indexes and try again.

4. Adding indexes

ALTER TABLE auditing_desc_record ADD INDEX index_code (code);
ALTER TABLE schedule_record ADD INDEX index_schedule_code (schedule_code);

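Querying again, the version without the subquery is still somewhat faster than the version with it.

From this comparison it is clear that, in this case, the subquery really is less efficient. Each time the subquery runs, a temporary table has to be built and the whole result set is stored in it. The outer select count(*) then scans that temporary table again, so the query takes longer and scans less efficiently.

But concluding from this alone that subqueries are slow seems too hasty. To check my thinking, I looked up some material online.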
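
One way to check the temporary-table explanation rather than take it on faith is to read the session's temporary-table counters before and after the query. The following is only a sketch: the select list is shortened to a single column for readability (my simplification, which does not change the count), and whether the derived table is actually materialized depends on the MySQL version and optimizer settings, so the numbers have to be read on your own server.

```sql
-- Snapshot the session's temporary-table counters
-- (Created_tmp_tables, Created_tmp_disk_tables, Created_tmp_files).
show session status like 'Created_tmp%';

-- Subquery form of the count, with the select list shortened to one column.
select count(*) from (
    select schedule.schedule_code
    from schedule_record as schedule
        left join auditing_desc_record as aud
        on schedule.schedule_code = aud.code
        and aud.is_last_auditing = 1
        left join system_oper as so
        on owner_id = so.oper_id
    where schedule_status = 7
    order by schedule.schedule_code desc) myCount;

-- Read the counters again and compare with the first snapshot.
show session status like 'Created_tmp%';
```

Repeating the same measurement around the rewritten query (the one without the subquery) gives a like-for-like comparison, which also cancels out anything the show statements themselves add to the counters.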

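II. More on subquery performance

High Performance MySQL, section 4.4 "Limitations of the MySQL Query Optimizer", subsection 4.4.1 "Correlated Subqueries", covers exactly this issue.

MySQL sometimes optimizes subqueries very poorly, especially IN() subqueries in the WHERE clause. As with the case I ran into above, one would expect MySQL to turn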

select * from abc_number_prop where number_id in (select number_id from abc_number_phone where phone = '82306839');

into the following

select * from abc_number_prop where number_id in (8585, 10720, 148644, 151307, 170691, 221897);

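Unfortunately, the opposite happens. MySQL tries to "help" optimize the query by correlating the subquery with the outer table, because it considers the following exists form more efficient: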

select * from abc_number_prop where exists (select * from abc_number_phone where phone = '82306839' and number_id = abc_number_prop.number_id);

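Seen this way, subqueries really are a poor fit in these two situations. The book does add, however, that it is also wrong to assume subqueries are always slow; sometimes a subquery is the better choice. How can you tell? Decide by measurement (run the queries and inspect them with desc/explain).

Also: in the SQL example above, the order by is expensive and unnecessary, so it can be removed.

Reference: https://www.cnblogs.com/zishengY/p/9070725.html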
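
A common candidate to measure against the IN() form is a plain join. The sketch below reuses the tables from the example; the aliases prop and phone are mine, and it assumes that abc_number_phone holds at most one row per number_id for a given phone. If that does not hold, the join returns duplicate abc_number_prop rows and is no longer equivalent to the IN() query.

```sql
-- Join form of the IN() query.
-- Assumes (phone, number_id) pairs are unique in abc_number_phone.
select prop.*
from abc_number_prop as prop
    inner join abc_number_phone as phone
    on phone.number_id = prop.number_id
where phone.phone = '82306839';

-- Compare the plans of the two forms on your own server and data.
explain
select * from abc_number_prop
where number_id in (select number_id from abc_number_phone where phone = '82306839');

explain
select prop.*
from abc_number_prop as prop
    inner join abc_number_phone as phone
    on phone.number_id = prop.number_id
where phone.phone = '82306839';
```

Which form wins depends on the MySQL version, the indexes and the data, so the explain output and actual timings are what decide it.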
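
For reference, the count query with the order by dropped could look like the sketch below. It uses count(*) so that the result matches the original subquery count; count(schedule_code), as in the rewrite above, would skip any rows where schedule_code is NULL. The redundant 1=1 is also left out, which does not change the result.

```sql
-- Count of matching rows; sorting cannot change a count, so order by is omitted.
select count(*)
from schedule_record as schedule
    left join auditing_desc_record as aud
    on schedule.schedule_code = aud.code
    and aud.is_last_auditing = 1
    left join system_oper as so
    on owner_id = so.oper_id
where schedule_status = 7;
```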

posted @ 2019-02-01 14:21 字节悦动