博客园  :: 首页  :: 新随笔  :: 联系 :: 订阅 订阅  :: 管理

4.0 Query Parameterization

Posted on 2012-02-17 14:11  nzperfect  阅读(540)  评论(0编辑  收藏  举报

原文:http://blogs.msdn.com/b/sqlprogrammability/archive/2007/01/11/4-0-query-parameterization.aspx

Using parameters or parameter markers in queries increases the ability of SQL Server to reuse compiled plans. There are two places where parameterization of queries can be done: on the client side application (or mid tier) or on the server side.

 

4.1 Client Side Parameterization

 

Client side parameterization (via ADO.NET, ODBC, or sp_executesql) is the recommended approach for query parameterization. Based on the degree of re-execution of queries (once versus multiple executions), client applications can decide to parameterize queries by specifying parameter holders, their data types and lengths in the queries. Therefore between query executions the only thing that changes is the actual value of parameters. This allows for re-use of one parameterized compiled plan. The server will not attempt to parameterize these queries further. It is important to parameterize queries completely and correctly in the client code in order to get the full benefits of parameterized queries. Consider the example below where the client submits a parameterized query using sp_executesql:

 

declare @param_value int, @sqlstring nvarchar(500),

@param_definition nvarchar(500), @col2 int;

set @sqlstring = N'select @col2_out = t1.col2 from t1 join t2 on t1.col1 = t2.col1 where t2.col2 = @param';

set @param_definition = N'@param int, @col2_out int output';

 

set @param_value = 5;

exec sp_executesql @sqlstring, @param_definition, @param = @param_value, @col2_out = @col2 output;

 

set @param_value = 10;

exec sp_executesql @sqlstring, @param_definition, @param = @param_value, @col2_out = @col2 output;

 

The adhoc query in this case is fully parameterized, and for different values of the parameter the same compiled plan is re-used which gives significant performance gains. The compiled plan is as below:

 

Text

Plan_handle

Cacheobjtype

Objtype

(@param int, @col2_out int output)select @col2_out = t1.col2 from t1 join t2 on t1.col1 = t2.col1 where t2.col2 = @param

0x06000100

6639AD2BB8

4181040000

0000000000

0000000000

Compiled Plan 

Prepared

 

The next section describes the best practices for query parameterization and highlights how several common omissions in client side parameterization can manifest as performance degradations.

 

4.2 Server Side Parameterization: Simple Parameterization

 

If the query submitted for execution has no parameters specified, then SQL server attempts to parameterize this query. This process is called simple or auto parameterization. The parameterization of queries is done on the server side and is on by default. As the name simple parameterization suggests, only a relatively small set of queries do get parameterized using simple parameterization. Consider the following example:

 

select col2 from t1 where col1 > 5

go

 

select col2 from t1 where col1 > 6

go

 

The 2 queries only differ in the literal values in the where clause. When the first query is submitted for execution, SQL Server parameterizes the first and the entries in the cache look like:

 

Text

Plan_handle

Cacheobjtype

Objtype

select col2 from t1 where col1 > 5   

0x06000100

87711124B8

614C040000

0000000000

0000000000

Compiled Plan 

Adhoc

(@1 tinyint)SELECT [col2] FROM [t1] WHERE [col1]>@1

0x06000100

533AC103B8

81CF030000

0000000000

0000000000

Compiled Plan

Prepared

 

The second row contains what we call a parameterized query and has objtype of prepared. The parameterized (or prepared) query is the actual compiled plan for the query with the literal values replaced with parameters. The first row contains what we call the shell query and has objtype adhoc. As the name suggests, this entry just saves the exact adhoc batch text in cache and points to the parameterized query plan (the prepared compiled plan). In other words the shell query has a dummy compiled plan which just points to the parameterized compiled plan. The shell query is relatively small in size compared to the parameterized (or prepared query).

 

There are benefits in caching the shell query: If the same query were to be re-executed, then we would compute the hash value of the sql text of the query and find an exact match in the cache i.e. the shell query. Since this shell query points to the compiled plan, the compiled plan is executed and we are done. If we had not cached this shell query and if the same query was re-executed then the steps followed would be slightly different: first we would compute the hash of the sql text of the query and not find an exact match in the cache. Next, the query is auto-parameterized. Now for this auto-parameterized query we will search the cache and find an exact match in the cache avoiding the need to go to the query optimizer. Finally we execute this compiled plan and are done. Clearly there are performance gains from caching the shell query, especially for applications that re-execute the same query with the same literal values as well. Note that we do not cache insert shell queries because the probability of re-using the exact same adhoc query is low.

 

SQL Server attempts to auto-parameterize queries and sends the auto-parameterized query to the query processor. Now the query processor decides if it can generate a potentially better plan given the constant values instead of the auto-parameterized query. If the query processor cannot find a better plan with the constant value plugged in, it means that the parameterized plan can be used across different parameter values. This parameterization is considered ‘safe’. In this case we cache both the parameterized and the shell query. However if the query processor can generate a better plan with the literal values, then this parameterization is considered ‘unsafe’. The query is parameterized and the shell query is cached. The parameterized query is not cached and is not visible through any of the plan cache DMV’s. The shell query points to the parameterized query and this is the only way to get to the parameterized query. In other words, unless the exact same query (and parameter value) is re-executed there is no chance of re-using this ‘unsafe’ parameterized query. Queries that already have explicitly defined parameters are not auto-parameterized.  

 

It is important to note that SQL Server will not even attempt to auto-parameterize certain types of queries. A non-exhaustive list of such queries with certain constructs is as below:

 

1.       BULK INSERT

2.       IN Expressions (Ex. IN (1, 2, 3))

3.       UPDATE with SET clause that has variables

4.       SELECT with UNION

5.       SELECT with INTO

6.       SELECT/UPDATE with FOR BROWSE

7.       SELECT/UPDATE/DELETE with OPTION <query hints>

8.       SELECT with DISTINCT

9.       SELECT/UPDATE/DELETE/INSERT with TOP

10.   WAITFOR statements

11.   SELECT with GROUP BY, HAVING, COMPUTE

12.   DELETE/UPDATE with FROM CLAUSE

13.   full text predicates

14.   Sub queries

15.   FROM Clause of a SELECT statement has table valued method or full text table or OPENROWSET or OPENXML or OPENQUERY or OPENDATASOURCE

16.   Comparison predicate of the form EXPR <> a non null constant

17.   Queries with JOINS

 

A general rule of thumb would be that the server auto-parameterizes a query when only one feasible plan for that query is possible. It follows that when auto-parameterization is done, no cost based query optimization is performed by the query optimizer.

 

4.3 Server Side Parameterization: Forced Parameterization

 

Clearly since simple parameterization encompasses a small subset of query types, a more powerful option called forced parameterization is available in SQL Server 2005. There are two levels of granularity with which forced parameterization can be enabled: for the entire database or at the query level.

 

Between the 2 options of database and query level granularity, query level parameterization is the recommended approach. With this alternative, the benefits of parameterization are retained for specific queries and any potential performance problems that may emerge from applying it at the database level are eliminated. Forced parameterization at the query level can be specified using plan guides. Consider the example below:

 

select t1.col2, t2.col2 from t1 join t2 on t1.col1 = t2.col1 and t1.col2 > 5

go

 

This query does not get parameterized when the database parameterized option is set to simple. Now let us create a plan guide for queries of this template that differ only in the where clause parameter value:

 

declare @stmt nvarchar(max), @params nvarchar(max);

exec sp_get_query_template N'select t1.col2, t2.col2 from t1 join t2 on t1.col1 = t2.col1 and t1.col2 > 5', @stmt output, @params output

exec sp_create_plan_guide N'JoinGuide', @stmt, N'Template', NULL, @params, N'OPTION(PARAMETERIZATION FORCED)'

go

 

select t1.col2, t2.col2 from t1 join t2 on t1.col1 = t2.col1 and t1.col2 > 15

go

 

Now if we look at the cached plans we have the parameterized and the shell queries cached.

 

Text

Plan_handle

Cacheobjtype

Objtype

select t1.col2, t2.col2 from t1 join t2 on t1.col1 = t2.col1 and t1.col2 > 15       

0x060001009

DE67C01B8C1

B0040000000

00000000000

000000

Compiled Plan 

Adhoc

(@0 int)select t1 . col2 , t2 . col2 from t1 join t2 on t1 . col1 = t2 . col1 and t1 . col2 > @0

0x060001009

647E335B881

70040000000

00000000000

000000

Compiled Plan

Prepared

 

BOL has more information on plan guides.

 

Next let us consider the database level option. Forced parameterization at the database level can be enabled using the alter database DDL:

 

alter database <database_name> set parameterization forced

go

 

For some databases with a high volume of queries it may significantly reduce the frequency of query compilations. In general, a broad class of DML queries are parameterized and literal values are replaced with parameters during query compilation. BOL article ‘Forced Parameterization’ has a list of query clauses and in literals in query constructs that will not be forced parameterized.

 

There are some factors to consider before turning on forced parameterization for a database. Examine these factors carefully along with query level forced parameterization options before turning on forced parameterization at the database level. Turning on (or off) forced parameterization flushes the plan cache of all entries, except those that were then compiling or executing. Plans that were then compiling or executing are parameterized the next time the query is executed. The database parameterization setting is preserved when re-attaching or restoring a database. Distributed queries may also be parameterized as long these queries are executing in a database where forced parameterization is enabled. Since parameterization replaces the literal values in the query with parameters, the query optimizer may very well choose a sub optimal plan. Specifically, the query optimizer is less likely to match the query to an index, indexed view or a computed column. It is also less likely to choose optimal plans for partitioned tables and distributed partitioned views. Thus forced parameterization can have some serious performance ramifications and should be used with caution.

 

Note that it is possible to turn on forced parameterization at the database level and then use plan guides to specify simple parameterization for specific queries that suffer from poor performance due to forced parameterization. However the reverse approach i.e. specifying forced parameterization for specific queries, with the database parameterization option simple is what is recommended.

 

Using forced parameterization the performance problems associated with not re-using cached plans seems to have to been solved largely. However in some cases it can give rise to another set of problems emerging from parameter sniffing. Parameter sniffing can be explained as follows: If the parameter values are specified during the first execution of a query then the query optimizer attempts to pick the best plan based on the current parameter values provided. Parameter values are sniffed for stored procedure executions, queries submitted for sp_executesql, and prepared queries. While in most cases this works well, in some cases where the optimal plan for a query varies greatly with the range of parameter values, there can be some performance degradation. This is especially true if the first set of parameter values are atypical, because it causes a sub-optimal plan (for other parameter values) to be inserted into cache and re-used as well. Consider the following example: Sales table has three columns: transaction id, country id, and total_amount. We also know that majority of the sales occur in country with id = 1, and country with id = 2 has far fewer rows in the sales table. If we had a stored procedure that reported sales by country id as below:

 

create proc sales_report @country_id int

as begin

      select * from sales where country_id = @country_id

end

go

 

If we now executed the stored proc with country id 2 first, then the query_plan can be obtained by:

 

select text, query_plan, cacheobjtype, objtype

from sys.dm_exec_cached_plans cp cross apply sys.dm_exec_sql_text(plan_handle)

outer apply sys.dm_exec_query_plan(plan_handle)

where cacheobjtype = 'Compiled Plan' and objtype = 'Proc'

and text not like '%select text%'

go

 

The query plan shows that the query optimizer does an index seek when we execute the procedure with country id 2 first, where as the query optimizer does a clustered index scan with country id 1 on first execution of the stored procedure. This difference in plan for the stored procedure execution comes from parameter sniffing and cause poor query execution time for more frequently executed parameter values if the first parameter was not a representative value. Or if the parameterized plan removed from cache due a cache flush and the next stored procedure execution was with atypical value. In such cases one way to bypass parameter sniffing would be to modify the stored procedure as follows using local variables:

 

create proc sales_report @country_id int, @country_id_default int = 1

as begin

      set @country_id_default = @country_id;

      select * from sales where country_id = @country_id_default;

end

go

 

Now irrespective of what order the parameters are supplied, the query optimizer does a clustered index scan. This is because the optimizer sniffs the @country_id_default variable which is always set to the more commonly used value (in this case 1), leading to the same plan irrespective of what parameter we pass to the procedure. Notice that the parameter we pass to the stored procedure is not part of the select statement where clause. Also note that the set statement cannot affect the compiled plan chosen by the query optimizer because it is evaluated only at execution time. There are other options like creating the procedure with the ‘with recompile’ option. More details are available in this white paper and in this blog post.