【Complete Showplan Operators】Chapter 4: BookMark/Key Lookup

It's time to talk about a flm star amongst(prep. 在…之中;在…当中(等于among)) operators – Key Lookup is such a famous operator that I couldn't write about it without giving it the red carpet(n. 地毯;) treatment.
Get yourself comfortable, grab(vt. 攫取;霸占;将…深深吸引) some popcorn(n. 爆米花,爆玉米花), and relax. I realize that you know this smooth operator, but I hope you discover something new about Key Lookup behavior and learn some good tips here. I hope to give you a fuller understanding of the Bookmark/Key Lookup by the time you fnish the article.
As if seeking anonymity(n. 匿名;匿名者;无名之辈), this operator has changed its identity three times in the last three versions of SQL Server. In SQL 2000 it was called BookMark Lookup. SQL 2005 comes, and it was renamed and showed as a simple Clustered Index Seek operation. Since SQL Server 2005 SP2 it has been called a Key Lookup, which makes more sense to me. Ina text execution plan, it is represented as a Clustered Index Seek.
First, I'll explain a bit about the way that SQL Server uses an index, and why unordered bookmark operations are so very expensive. I'll also create a procedure(n. 程序,手续;步骤) to return some information about when lookup operations are good, and when a scan turns out to be better than a lookup.  

Before you get distracted(adj. 心烦意乱的;思想不集中的) into thinking about RIDs/Heaps, let me say one thing – I'll be talking about RID Lookup another time. The main focus here is about the Key Lookup
operator.

In summary, the Key Lookup is used to fetch(提取,读取) values via a clustered index, when the required data isn't in a non-clustered index. Let's take a look further.
I once heard Kimberly Tripp give a very good and practical(adj. 实际的;实用性的) analogy(n. 类比;类推;类似): imagine a book, and suppose we have two indexes, the frst is a clustered index, the Table of Contents that
appears in beginning of the book; and the non-clustered index is the index in the back of the book. When you need to search for some information in the index at the back,
you have a pointer, the page number, to the page that mentions the subject. This "page number" is what we call a BookMark, in SQL terms, that is the Cluster Key.

This BookMark action goes to the "back" index, fnds the page number that contains the information, and goes to the page. This is what a "bookmark lookup" does.
Suppose I want to know more about "bookmark lookups"; well, since I have the Inside Microsoft SQL Server 2005 – Query Tuning and Optimization book, I can go to the back
index and see if there is information somewhere in the book. At the "B" word I have a text "BookMark Lookup" and the number of the page that talks about the subject. The index
at the end of the book is very useful.

But wait a minute, there is a snag. A Key lookup is a very expensive operation because it performs a random I/O into the clustered index. For every row of the non-clustered
index, SQL Server has to go to the Clustered Index to read their data. We can take advantage of knowing this to improve the query performance. To be honest, when I see
an execution plan that is using a Key Lookup it makes me happy, because I know I've a good chance of improving the query performance just creating a covering index. A
Covering index is a non-clustered "composite(复合)" index which contains all columns required by the query.

Let's demonstrate(vt. 证明;展示;论证) this with some code.
The following script will create a table called TestTable and will insert 100,000 rows with garbage data. An index called ix_Test will be created to be used in our lookup samples.  

IF OBJECT_ID('TestTable') IS NOT NULL
    BEGIN
        DROP TABLE TestTable
    END
GO
CREATE TABLE TestTable
    (
      ID INT IDENTITY(1, 1)
             PRIMARY KEY ,
      Col1 VARCHAR(250) NOT NULL
                        DEFAULT NEWID() ,
      Col2 VARCHAR(250) NOT NULL
                        DEFAULT NEWID() ,
      Col3 VARCHAR(250) NOT NULL
                        DEFAULT NEWID() ,
      Col4 DATETIME NOT NULL
                    DEFAULT GETDATE()
    )
GO
SET NOCOUNT ON
GO
INSERT  INTO TestTable
        DEFAULT VALUES
GO 100000
CREATE NONCLUSTERED INDEX ix_Test ON TestTable(Col1, Col2, Col3)
GO

Now suppose the following query: 

SET STATISTICS IO ON
    SELECT *
    FROM TestTable
    WHERE Col1 like 'A%'
SET STATISTICS IO OFF

(6223 row(s) affected)
Table 'TestTable'. Scan count 1, logical reads 1895, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As we can see in the execution plan above, the Query Optimizer chose not to use the ix_Test index, but instead chose to scan the clustered index to return the rows. 6,223 rows were returned and SQL made 1895 logical reads to return that data. That means the table had 1,895 pages, because SQL made a Scan on the table.
Now, let's suppose that I don't trust the Query Optimizer to create a good plan (don't hit me, but, do you?), and I decide add a hint(n. 暗示;线索;提示) to say how SQL Server should ccess the data. What will happen?

SET STATISTICS IO ON
SELECT  *
FROM    TestTable WITH ( INDEX = ix_Test )
WHERE   Col1 LIKE 'A%'
SET STATISTICS IO OFF

(6223 row(s) affected)
Table 'TestTable'. Scan count 1, logical reads 19159, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Here the Key lookup operator is used to read the value of the column Col4 by referencing the clustered index, since that column is not into the non-clustered index ix_Test. Cool, now our smartest plan uses the index, and return the same 6,223 rows, but, wait a minute, it read 19,159 pages! In other words, it has read 1,7264 more than the frst plan, or ten times the table size.
As you can see, a Key Lookup is a good choice only when a few rows are returned; but the retrieval(n. 检索;恢复;取回;) cost grows with the quantity of returned rows. As Jason Massie once said, there is a point where the threshold(n. 入口;门槛;开始;极限;临界值) is crossed and a scan becomes more efcient than a lookup.
A scan uses a sequential IO while a lookup does random IO.

 Here, we can raise a question. How many rows before a key lookup stops being a good strategy? When is this threshold crossed, and a scan performs better?
To help you with that, I've created a procedure called st_TestLookup (you will get it as follow).

IF OBJECT_ID('st_TestLookup') IS NOT NULL
BEGIN
  DROP PROCEDURE dbo.st_TestLookup
END
GO
CREATE PROCEDURE dbo.st_TestLookup @Table_Name   VarChar(200),
                                   @Lookup_Index VarChar(200),
                                   @Trace_Path   NVarChar(200)

AS
BEGIN

/*
Author: Fabiano Neves Amorim
E-Mail: fabiano_amorim@bol.com.br
http://fabianosqlserver.spaces.live.com/
http://www.simple-talk.com/author/fabiano-amorim/

Use:

EXEC dbo.st_TestLookup @Table_Name   = 'CONCP010',
                       @Lookup_Index = 'ix_ID_Cota',
                       @Trace_Path   = 'F:\TesteTrace.trc'

*/

  BEGIN TRY
    SET NOCOUNT ON;
    
    IF OBJECT_ID(@Table_Name) IS NULL
    BEGIN
      RAISERROR (30002,11,1,'Specified table do not exists');
    END

    IF INDEXPROPERTY(OBJECT_ID(@Table_Name), @Lookup_Index, 'IsClustered') <> 0
    BEGIN
      RAISERROR (30002,12,1,'Specified index must be a nonclustered index');
    END    
    ---------------------------------------------------------
    --------------- Drop temp tables ------------------------
    ---------------------------------------------------------

    IF OBJECT_ID('tempdb.dbo.#tmp') IS NOT NULL
    BEGIN
      DROP TABLE #tmp
    END
    IF OBJECT_ID('tempdb.dbo.#tmp_1') IS NOT NULL
    BEGIN
      DROP TABLE #tmp_1
    END
    IF OBJECT_ID('tempdb.dbo.#tmp_2') IS NOT NULL
    BEGIN
      DROP TABLE #tmp_2
    END
    IF OBJECT_ID('tempdb.dbo.#tb_Results') IS NOT NULL
    BEGIN
      DROP TABLE #tb_Results
    END
    CREATE TABLE #tmp_2 (Total_Rows Int)

    ---------------------------------------------------------
    --------------- Create trace file -----------------------
    ---------------------------------------------------------
    DECLARE @TraceID  Int, 
            @Str      VarChar(MAX),
            @Msg      VarChar(500),
            @CmdShell VarChar(200)
            
    -- Delete the tracefile
    SET @CmdShell = 'del ' + @Trace_Path
    EXEC xp_cmdShell @CmdShell, no_output

    SET @Trace_Path = REPLACE(@Trace_Path, '.trc','')
    
    -- Start Trace
    EXEC sys.sp_trace_create @TraceID output, 0, @Trace_Path
    
    SET @Trace_Path = @Trace_Path + '.trc'
    
    -- Set Events
    -- 10 is RPC:Completed event. 1 is TextData column
    EXEC sys.sp_trace_setevent @TraceID, 45, 16, 1
    -- 13 is SQL:BatchStarting, 1 is TextData column
    EXEC sys.sp_trace_setevent @TraceID, 45, 1, 1 
    -- Set Filter to actual session
    SET @Str = 'EXEC sys.sp_trace_setfilter '+ Convert(VarChar, @TraceID) +' , 12, 0, 0, ' + Convert(VarChar, @@SPID)
    EXEC (@str)
    -- Start Trace (status 1 = start)
    EXEC sys.sp_trace_setstatus @TraceID, 1


    ---------------------------------------------------------
    --------------- Start fullscan read ---------------------
    ---------------------------------------------------------
    DECLARE @i             Int,
            @percent       Int,
            @Total_Rows    Int,
            @Scan_IO       Int,
            @Key_Lookup_IO Int,
            @Points        VarChar(MAX),
            @Str_Col       VarChar(500),
            @Col           VarChar(200)
            
    -- Looking for a column that are not in nonclustered and clustered index
    SELECT TOP 1 
           @Str_Col = 'DECLARE @' + c.name + ' ' + t.name,
           @Col     = c.Name
      FROM sys.columns c
     INNER JOIN sys.types t
        ON c.system_type_id = t.system_type_id
     WHERE c.column_id NOT IN (SELECT ic.column_id
                                 FROM sys.indexes i
                                INNER JOIN sys.index_columns ic
                                   ON i.object_id = ic.object_id
                                  AND i.index_id = ic.index_id
                                WHERE (i.name = @Lookup_Index OR i.type = 1)
                                  AND OBJECT_NAME(i.object_id) = @Table_Name)
       AND OBJECT_NAME(c.object_id) = @Table_Name       
    
    IF ISNULL(@Col,'') = ''
    BEGIN
      RAISERROR (30002,12,1,'There is no "nonclustered/clustered" column');
    END
           
    -- Run a scan at @Table_Name
    SET @Str = 'SELECT COUNT(*) AS Count_Rows FROM ' + QUOTENAME(@Table_Name) + 'WITH(INDEX=0) OPTION (MAXDOP 1) -- ;fullscan;';

    INSERT INTO #tmp_2(Total_Rows)
    EXEC (@Str)

    SELECT @Total_Rows = Total_Rows 
      FROM #tmp_2

    -- Get the number of Reads
    SELECT @Scan_IO = Reads
      FROM ::fn_trace_gettable(@Trace_Path, 1)
     WHERE TextData like '%;fullscan;%'
     
    ---------------------------------------------------------
    --------------- Start lookup tests ----------------------
    ---------------------------------------------------------
    SET @Msg = 'Logical Reads to Scan ' + Convert(VarChar, @Total_Rows) + ' rows of table ' + @Table_Name + ': ' + Convert(VarChar, @Scan_IO)
    RAISERROR (@Msg, 0,0) WITH NOWAIT
    
    SET @Points = CHAR(13) + REPLICATE('*', SubString(Convert(VarChar,@Scan_IO),1, LEN(Convert(VarChar,@Scan_IO)) -2)) + ' Scan'
    SET @Key_Lookup_IO = 0;
    SET @i = 0;
    SET @percent = 1;

    WHILE @Scan_IO >= @Key_Lookup_IO
    BEGIN
      SET @i = CONVERT(Numeric(18,2),(@Total_Rows * @percent) / 1000,2)

      -- Run partial scan at @Table_Name
      SET @Str = @Str_Col +
                 ' SELECT TOP '+ Convert(VarChar,@i) + ' @' + @Col +' = ' + @Col +' FROM ' + QUOTENAME(@Table_Name) + 
                 ' WITH(INDEX='+ QUOTENAME(@Lookup_Index) +') OPTION (MAXDOP 1) -- ;partialscan ' + Convert(VarChar,@i) + ';';
      EXEC (@Str)

      -- Get the number of Reads
      SELECT @Key_Lookup_IO = Reads
        FROM ::fn_trace_gettable(@Trace_Path, 1)
       WHERE TextData like '%;partialscan ' + Convert(VarChar,@i) + ';%'

      IF @Scan_IO > @Key_Lookup_IO
      BEGIN
        SET @Msg = 'GoodPlan - Logical Reads to Lookup ' + Convert(VarChar,@i) + ' rows ' +
                   'of table : ' + Convert(VarChar,@Key_Lookup_IO) 
        RAISERROR (@Msg, 0,0) WITH NOWAIT
        IF LEN(Convert(VarChar,@Key_Lookup_IO)) > 2
        BEGIN
          SET @Points = @Points + CHAR(13) + REPLICATE('*', SubString(Convert(VarChar,@Key_Lookup_IO),1, LEN(Convert(VarChar,@Key_Lookup_IO)) -2))
        END
      END
      ELSE
      BEGIN
        SET @Msg = 'BadPlan - Logical Reads to Lookup ' + Convert(VarChar,@i) + ' rows ' +
                   'of table : ' + Convert(VarChar,@Key_Lookup_IO) 
        RAISERROR (@Msg, 0,0) WITH NOWAIT
        IF LEN(Convert(VarChar,@Key_Lookup_IO)) > 2
        BEGIN
          SET @Points = @Points + CHAR(13) + REPLICATE('*', SubString(Convert(VarChar,@Key_Lookup_IO),1, LEN(Convert(VarChar,@Key_Lookup_IO)) -2))
        END
      END

      SET @percent = @percent + 1;
    END
    PRINT @Points    


    -----------------------------------------------------------------
    --------------- Stop and Close trace ----------------------------
    -----------------------------------------------------------------

    -- Populate a variable with the trace_id of the current trace
    SELECT  @TraceID = TraceID 
      FROM ::fn_trace_getinfo(default) WHERE VALUE = @Trace_Path

    -- First stop the trace. 
    EXEC sp_trace_setstatus @TraceID, 0

    -- Close and then delete its definition from SQL Server. 
    EXEC sp_trace_setstatus @TraceID, 2
    
    -- Delete the tracefile
    SET @CmdShell = 'del ' + @Trace_Path
    EXEC xp_cmdShell @CmdShell, no_output

  END TRY
  BEGIN CATCH
    -----------------------------------------------------------------
    --------------- Stop and Close trace ----------------------------
    -----------------------------------------------------------------
    -- Delete the tracefile
    SET @CmdShell = 'del ' + @Trace_Path
    EXEC xp_cmdShell @CmdShell, no_output
    
    -- Populate a variable with the trace_id of the current trace
    SELECT @TraceID = TraceID 
      FROM ::fn_trace_getinfo(default) WHERE VALUE = @Trace_Path
    
    IF ISNULL(@TraceID,0) <> 0
    BEGIN
      -- First stop the trace. 
      EXEC sp_trace_setstatus @TraceID, 0

      -- Close and then delete its definition from SQL Server. 
      EXEC sp_trace_setstatus @TraceID, 2
    END

    -- Execute error retrieval routine.
    SELECT ERROR_NUMBER()    AS ErrorNumber,
           ERROR_SEVERITY()  AS ErrorSeverity,
           ERROR_STATE()     AS ErrorState,
           ERROR_PROCEDURE() AS ErrorProcedure,
           ERROR_LINE()      AS ErrorLine,
           ERROR_MESSAGE()   AS ErrorMessage;
  END CATCH;
END
GO

I'll not explain what I did, because this is not the intention, but you need to enable xp_cmdShell to run this code so, obviously, I don't recommend that you to run this in a production environment. 

The code below runs the procedure to TestTable, the input parameters are three:
@Table_Name where you input the table that you want to analyze
@Lookup_Index where you input the non-clustered index that will be analyzed
a valid Path to SQL Server to create a profler trace to check the amount of logical IOs for each query.
Let's look at the results.

EXEC dbo.st_TestLookup @Table_Name = 'TestTable',
@Lookup_Index = 'ix_Test',
@Trace_Path = 'C:\TesteTrace.trc'
GO
Logical Reads to Scan 100000 rows of table TestTable: 1902
GoodPlan - Logical Reads to Lookup 100 rows of table : 339
GoodPlan - Logical Reads to Lookup 200 rows of table : 659
GoodPlan - Logical Reads to Lookup 300 rows of table : 981
GoodPlan - Logical Reads to Lookup 400 rows of table : 1301
GoodPlan - Logical Reads to Lookup 500 rows of table : 1620
BadPlan - Logical Reads to Lookup 600 rows of table : 1940
******************* Scan
***
******
*********
*************
****************
*******************

The frst result line shows us how many rows are in the table, and how many logical reads are used to scan the table. This is our start point. Based on that value, I then do a loop, reading values of the table. When the number of lookup IOs cross the Scan, I write a line starting with "BadPlan", and I show the number of IOs to read "x" rows using the Key Lookup operator.
The lines with "*" are just a tentative(adj. 试验性的,暂定的;) way to show these results in a graphical mode. Based on the procedure results, we know that, when we need more than 600 rows, a Scan is better than a Lookup. 

posted @ 2017-07-12 18:12  FH1004322  阅读(210)  评论(0)    收藏  举报