In a join query, selecting a column that has no index can seriously hurt performance

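Suppose there are two tables, TableA and TableB, each with two columns: an ID column (AID in TableA, BID in TableB) and Created_Time. The ID column is the primary key in both tables. TableB.Created_Time is indexed; TableA.Created_Time is not.
The two ID columns are related by business logic (a.AID matches b.BID).
On the production server, TableA and TableB each hold about 5 million rows.
The task is to return the rows whose TableB.Created_Time falls within a given time range.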
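
For readers who want to reproduce the setup, here is a minimal sketch of the schema described above. The column names AID and BID come from the queries in this post; the data types, the index name IX_TableB_Created_Time, and the choice to let the primary keys default to clustered are assumptions, since the post does not state them.

```sql
-- Assumed types: the post only names the columns, not their types.
CREATE TABLE TableA (
    AID          int      NOT NULL PRIMARY KEY,  -- primary key (clustered by default)
    Created_Time datetime NOT NULL               -- no index on this column
);

CREATE TABLE TableB (
    BID          int      NOT NULL PRIMARY KEY,  -- primary key (clustered by default)
    Created_Time datetime NOT NULL
);

-- Only TableB.Created_Time is indexed. The index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_TableB_Created_Time
    ON TableB (Created_Time);
```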

Scenario 1: the original stored procedure looked like this:

select a.AID, a.Created_Time

from TableA a with (nolock)

join TableB b on a.AID = b.BID

where b.Created_Time between @dtStartTime and @dtEndTime

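In testing, the test server had about 500,000 rows and showed no performance problem: the result came back in 1 second. After deployment to the production server there was a severe performance problem: no result after 10 minutes.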

Scenario 2: the stored procedure was later changed to this:

select BID into #tempTable

from TableB with (nolock) where Created_Time between @dtStartTime and @dtEndTime

select a.AID, a.Created_Time

from TableA a with (nolock)

join #tempTable b on a.AID = b.BID

truncate table #tempTable

drop table #tempTable

On the production server this returns the result in 2 seconds.

Scenario 3: I then ran another test. Without a.Created_Time in the select list, the result comes back in about 10 seconds.

select a.AID

from TableA a with (nolock)

join TableB b on a.AID = b.BID

where b.Created_Time between @dtStartTime and @dtEndTime

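Scenario 4: with b.Created_Time in the select list instead (for testing only, not a business requirement), performance is not affected and the result still comes back in about 10 seconds.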

select a.AID, b.Created_Time

from TableA a with (nolock)

join TableB b on a.AID = b.BID

where b.Created_Time between @dtStartTime and @dtEndTime

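I then tried two other forms, expecting them to perform like scenario 2, but both performed like scenario 1. Presumably, once SQL Server's optimizer has rewritten them, they end up much the same as scenario 1.
Scenario 5: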

select a.AID, a.Created_Time

from TableA a with (nolock)

join (select BID from TableB with (nolock) where Created_Time between @dtStartTime and @dtEndTime ) b on a.AID = b.BID

Scenario 6:

select a.AID, a.Created_Time

from TableA a with (nolock)

where a.AID in (select BID from TableB with (nolock) where Created_Time between @dtStartTime and @dtEndTime)
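
One way to check whether scenarios 5 and 6 really compile to the same plan as scenario 1 is to compare their estimated plans. The sketch below uses SET SHOWPLAN_TEXT, which returns the plan without running the query. The date values are hypothetical, and GO is the batch separator used by SQL Server Management Studio and sqlcmd, not a T-SQL statement.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Hypothetical time range; pick values that match your data.
DECLARE @dtStartTime datetime, @dtEndTime datetime;
SET @dtStartTime = '20080101';
SET @dtEndTime   = '20080102';

-- Scenario 1: plain join
select a.AID, a.Created_Time
from TableA a with (nolock)
join TableB b on a.AID = b.BID
where b.Created_Time between @dtStartTime and @dtEndTime;

-- Scenario 5: derived table
select a.AID, a.Created_Time
from TableA a with (nolock)
join (select BID from TableB with (nolock)
      where Created_Time between @dtStartTime and @dtEndTime) b on a.AID = b.BID;

-- Scenario 6: IN subquery
select a.AID, a.Created_Time
from TableA a with (nolock)
where a.AID in (select BID from TableB with (nolock)
                where Created_Time between @dtStartTime and @dtEndTime);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

If the three plans show the same operators in the same order, that would support the guess that the optimizer treats these forms alike.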

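My previous experience was that indexes only matter for the where clause or the on clause of a join, and that with identical conditions the number of columns in the select list would not change performance by an order of magnitude.
What the results above show is that, in a join query, selecting a column that has no index can seriously hurt performance.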
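
If the slowdown really comes from TableA.Created_Time not being available in any index the optimizer wants to use, one thing to try is a covering index that carries that column. This is only a sketch under assumptions the post does not confirm: it assumes SQL Server 2005 or later (for INCLUDE), the index name is hypothetical, and if the primary key on AID is already the clustered index this extra index may make no difference.

```sql
-- Hypothetical covering index: AID is the join key, and Created_Time is
-- stored at the leaf level so the query can read it without touching
-- the base table rows.
CREATE NONCLUSTERED INDEX IX_TableA_AID_Created_Time
    ON TableA (AID)
    INCLUDE (Created_Time);
```

Building an index on a table with about 5 million rows takes time and space, so it is worth trying on the test server first.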

Judging from the results, scenarios 3 and 1 differ greatly in performance, but I do not yet understand why. I hope an expert can explain.
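
Until someone explains it, the difference can at least be measured. The sketch below turns on SET STATISTICS IO and SET STATISTICS TIME so that each run reports logical reads per table and CPU/elapsed time. It also adds option (recompile) as an experiment: one possibility, not confirmed by anything in this post, is that the optimizer cannot see the values of the time variables when it compiles the plan and works from a generic row estimate, whereas the temp table in scenario 2 gives it an actual row count. The date values are hypothetical.

```sql
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Hypothetical time range; pick values that match your data.
DECLARE @dtStartTime datetime, @dtEndTime datetime;
SET @dtStartTime = '20080101';
SET @dtEndTime   = '20080102';

-- Scenario 1, recompiled at execution time so the optimizer may use
-- the current variable values when estimating row counts.
select a.AID, a.Created_Time
from TableA a with (nolock)
join TableB b on a.AID = b.BID
where b.Created_Time between @dtStartTime and @dtEndTime
option (recompile);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads on TableA with and without option (recompile), and between scenarios 1 and 3, should show whether the extra time goes into scanning TableA.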

posted @ 2008-01-18 03:42 zhkn
