一次SQL慢查询的优化处理
背景
在开发完成,数据验证的过程中,需要对两个非常大的表进行query,运行了十几分钟都跑不完。如下:
select * from big_a a, big_b b
where a.some_id = b.some_id
and a.value_one = 'a_111'
and a.value_two = 'a_222'
and b.value_one = 'b_111'
and b.value_two = 'b_222'
假设两个表的数据量非别为M,N,那么查询时间相当于M*N。对两表进行全排列全表搜索,非常耗时。
优化
经同事指点,说可以基于日期加一些filter的条件,这样会快很多。因为该表根据日期做了partition,避免了全表搜索,加快了速度。如下:
select * from big_a a, big_b b
where a.some_id = b.some_id
and a.some_date = 20180101
and b.some_date = 20180101
and a.value_one = 'a_111'
and a.value_two = 'a_222'
and b.value_one = 'b_111'
and b.value_two = 'b_222'
对于某些日期数据量少,查询速度还可以,但是有些日期数据做了full load,量非常大,查询还是很慢。
思考了一下,假设某一天的两表数据量分为m,n,那么查询时间为m*n。因为m,n还是比较大的,所以执行时间还是很长。
解决的思路是对两个表进一步拆分。如下:
select * from
(select * from big_a a
where a.some_date = 20180101
and a.value_one = 'a_111'
and a.value_two = 'a_222')
tmpa
inner join
(select * from big_b b
where b.some_date = 20180101
and b.value_one = 'b_111'
and b.value_two = 'b_222')
tmpb
on tmpa.some_id = tmpb.some_id;
先分别对两表做查询,生成两个tmp表,假设其数据量为m', n',那么总体的执行时间为M+N+m'*n'。一般m', n’非常小,可以忽略不计,即总时间为M+N。
由相乘的数量级变为相加的数量级,速度快了很多。
验证结果
在调优之后,怎么判断调优之后的结果要优于之前的结果?最直接的方法是看运行时间,除此之外,还可以看语句执行的其他信息。如下:
1.直接看软件上显示的时间。
All Rows Fetched: 50 in 1.23 seconds
2.看执行计划。
explain plan for select * from big_big_table;
SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE'));
3.看储存语句运行情况数据的表。
v&sql
or v&sqlarea
"v&sql lists statistics on shared SQL area without the GROUP BY clause and contains one row for each child of the original SQL text entered.
Statistics displayed in V$SQL are normally updated at the end of query execution. However, for long running queries, they are updated every 5 seconds. This makes it easy to see the impact of long running SQL statements while they are still in progress."
-- Oracle
select
sql_text,
executions,
elapsed_time/1000 total_time,
elapsed_time/nvl(executions,1)/1000 avg_time,
parse_calls,
disk_reads,
buffer_gets,
u.username
from v$sql
left join all_users u
on parsing_user_id = u.user_id
where sql_text like '%select * from big_big_table where id = 12345;%'
order by avg_time desc;