MySQL Execution Plan--IN子查询包含超多值引发的查询异常
问题描述
版本:MySQL 5.7.24
SQL语句:
SELECT wave_no, SUM(IF(picking_qty IS NULL, 0, picking_qty)) AS PICKED_QTY, SUM(IF(differ_qty IS NULL, 0, differ_qty)) AS PICKED_DIFFER_QTY, SUM(IF(relocate_qty IS NULL, 0, relocate_qty)) AS PICKED_RELOCATE_QTY FROM picking_locate_d WHERE yn = 0 AND wave_no IN ( 'BC76361213164811', 'BC76361213164810', ... 'BC76361213158692' ) AND org_No = '661' AND distribute_No = '763' AND warehouse_No = '612' GROUP BY wave_no;
该SQL在慢日志中记录执行信息为:
# Query_time: 558.604238 Lock_time: 0.070967 Rows_sent: 16094 Rows_examined: 46389900
表picking_locate_d上索引如下:
表picking_locate_d上索引如下: PRIMARY KEY (`id`,`warehouse_no`,`org_no`,`distribute_no`), KEY `FK_locate_d_REFERENCE_task_m` (`task_page_no`) USING BTREE, KEY `index_outbound_no` (`outbound_no`), KEY `idx_wave_no` (`wave_no`), KEY `idx_update_time` (`update_time`), KEY `ix_OUTBOUND_NO` (`outbound_no`), KEY `idx_occupy_uuid` (`occupy_uuid`), KEY `idx_opt_status` (`opt_status`)
表picking_locate_d上数据量为:
show table status like 'picking_locate_d' \G *************************** 1. row *************************** Name: picking_locate_d Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 41628604 Avg_row_length: 812 Data_length: 33825914880 Max_data_length: 0 Index_length: 17470521344 Data_free: 154140672 Auto_increment: 48876000 Create_time: 2019-04-18 00:15:20 Update_time: 2019-04-22 19:29:30 Check_time: NULL Collation: utf8_general_ci Checksum: NULL Create_options: Comment:
当IN查询内部的值数量小于等于14238时,查询时间0.31秒,执行计划为:
+----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+------------------------------------+ | 1 | SIMPLE | picking_locate_d | NULL | range | idx_wave_no | idx_wave_no | 153 | NULL | 14238 | 0.01 | Using index condition; Using where | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+------------------------------------+
当IN查询内部的值数量大于等于14239时,查询时间超过558秒
+----+-------------+------------------+------------+-------+---------------+-------------+---------+------+----------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+----------+----------+-------------+ | 1 | SIMPLE | picking_locate_d | NULL | index | idx_wave_no | idx_wave_no | 153 | NULL | 41612750 | 0.01 | Using where | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+----------+----------+-------------+
通过对比执行计划发现,两者出现性能问题的主要原因在于:
1、 前者使用INDEX SEEK查找特定键值,预估影响行数为14238。
2、 后者使用INDEX SCAN扫描全索引数据,预估影响行数为41612750。
将IN查询内部的值调整为16000个,通过调整SQL语句进行测试:
1、 去除WHERE语句中yn = 0条件,查询使用INDEX SCAN,预估影响行数为41612750
2、 去除WHERE语句中org_No/distribute_No/warehouse_No三列其中任意一列或多列,查询使用INDEX SEEK, 预估影响行数为16000
3、 修改WHERE语句中org_No/distribute_No/warehouse_No三列其中任意一列或多列为函数操作,如将org_No = '661'修改为CONCAT(org_No,’’) = '661', 查询使用INDEX SEEK, 预估影响行数为16000。
4、 修改GROUP BY语句中wave_no 为CONCAT(wave_no,’’), 查询使用INDEX SEEK, 预估影响行数为16000。
5、 关闭MySQL ICP(index condition pushdown)特性,查询使用INDEX SCAN,预估影响行数为41612750
原因分析:
1、 由于主键索引为 (`id`,`warehouse_no`,`org_no`,`distribute_no`),而索引idx_wave_no定义为(`wave_no`),因此索引idx_wave_no实际包含列为:(`wave_no`,`id`,`warehouse_no`,`org_no`,`distribute_no`),因此推测调整WHERE语句中org_No/distribute_No/warehouse_No三列其中任意一列或多列会影响查询预估。
2、 由于修改GROUP BY语句,使得查询的GROUP BY部分无法通过索引来避免排序分钟,导致查询会先完成WHERE条件过滤后使用临时表和文件排序,GROUP BY语句不会影响WHERE语句的预估,使得查询优化器更偏向于使用INDEX SEEK。
优化建议:
建议将SQL语句调整为:
SELECT CONCAT(wave_no,'') AS wave_no, SUM(IF(picking_qty IS NULL, 0, picking_qty)) AS PICKED_QTY, SUM(IF(differ_qty IS NULL, 0, differ_qty)) AS PICKED_DIFFER_QTY, SUM(IF(relocate_qty IS NULL, 0, relocate_qty)) AS PICKED_RELOCATE_QTY FROM picking_locate_d WHERE yn = 0 AND wave_no IN ( 'BC76361213164811', 'BC76361213164810', ... 'BC76361213158692' ) AND org_No = '661' AND distribute_No = '763' AND warehouse_No = '612' GROUP BY CONCAT(wave_no,'');
调整后执行时间为0.47秒,执行计划为:
+----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------------------+ | 1 | SIMPLE | picking_locate_d | NULL | range | idx_wave_no | idx_wave_no | 153 | NULL | 16000 | 0.10 | Using index condition; Using where; Using temporary; Using filesort | +----+-------------+------------------+------------+-------+---------------+-------------+---------+------+-------+----------+---------------------------------------------------------------------+
调整后的执行计划中增加Using temporary和Using filesort,导致执行时间0.47秒略高于之前的0.35秒,但会使得查询使用INDEX SEEK概率更高,有利于系统稳定。
其他优化建议:
1、 建议控制IN查询中值数量,避免传入过多值导致查询预估异常。
2、 建议主键尽量使用单列索引,使用多列符合索引为主键会影响非聚集索引的索引长度和查询预估准确率。