表关联键上创建索引的重要性

     很久没有写SQL相关的文章了,主要是现在技术部分工比以前明确了。网站部门并不自己写SQL查询数据,数据有其它部门提供服务。但并不是所有情况都是这样,有些项目由于之前没有管理,所以只能自己完成。在这次写的一个SQL查询中,体会到了在做join联接时,关联的键需要创建索引的重要性。
    
      说明:
         1:free_room,freeroom这两个表数据量都不大,小于5000行。
         2:room_type_num,这个表有十多万数据,room_type_id没有创建索引。
    

      查询语句:
       

代码
FROM      free_room f WITH ( NOLOCK )
                            LEFT JOIN freeroom fr WITH ( NOLOCK ) ON f.id 
= fr.free_room_id
                            INNER JOIN room_type_num r WITH ( NOLOCK ) ON r.room_type_id 
= f.room_type_id

 

     

      执行以上结果,发现在关联room_type_num表时,系统选择了hash join,最优的情况绝不应该是hash join,因为不太符合hash join的要求。 我之前的文章简单说明了loop join以及hash join的概念,先贴出来看下:


      第一种算法:NESTED LOOP:

                      定义: 对于被连接的数据子集较小的情况,嵌套循环连接是个较好的选择。在嵌套循环中,内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行,因此整个查询返回的结果集不能太大(大于1 万不适合),要把返回子集较小表的作为外表。
                                                   
      第二种算法:HASH JOIN :


                     定义: 散列连接是做大数据集连接时的常用方式,优化器使用两个表中较小的表(或数据源)利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。这种方式适用于较小的表完全可以放于内存中的情况,这样总成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器会将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段,此时要有较大的临时段从而尽量提高I/O 的性能。


       dba建议在room_type_id创建索引,然后在SQL脚本中把inner join 改写成 inner loop join room_type_num。修改后的执行效率比较如下:

 

 

        问题:我在测试后,发现room_type_id创建索引后,SQL查询优化器就不再采用hash join room_type_num了,自动选择了最优的loop join。在SQL脚本中人为的写优化提示,并不需要特别指出,只要我们索引创建到位,优化器也会配合我们选择最优的执行计划进行查询,否则只能程序员自己走下后门了。

 

        图一为没有创建索引前的执行计划:

                   

        

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

SQL Server parse and compile time:

   CPU time = 15 ms, elapsed time = 189 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

SQL Server Execution Times:

   CPU time = 94 ms,  elapsed time = 101 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

SQL Server Execution Times:

   CPU time = 16 ms,  elapsed time = 6 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

(0 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'free_room'. Scan count 2, logical reads 16, physical reads 1, read-ahead reads 48, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'free_room_use'. Scan count 1, logical reads 8, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(20 row(s) affected)

 

(1 row(s) affected)

 

SQL Server Execution Times:

   CPU time = 15 ms,  elapsed time = 100 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

       图二为增加索引后的执行计划:

                   

 

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

SQL Server parse and compile time:

   CPU time = 16 ms, elapsed time = 45 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

SQL Server Execution Times:

   CPU time = 109 ms,  elapsed time = 102 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

(0 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'free_room'. Scan count 2, logical reads 16, physical reads 1, read-ahead reads 48, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'free_room_use'. Scan count 1, logical reads 8, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

(20 row(s) affected)

 

(1 row(s) affected)

 

SQL Server Execution Times:

  CPU time = 0 ms,  elapsed time = 27 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

 

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 1 ms.

 

       总结:在关联表的关联键上创建合适的索引会影响到SQL优化器对执行计划的选择。数据量的大小以及数据分布情况都会影响执行计划,所以对于SQL优化来说,不能一味套用概念,要实实在在的看到效果才算优化成功。

 

posted on 2010-06-03 17:15  min.jiang  阅读(8175)  评论(8编辑  收藏  举报