SQL Server性能调教系列(4)--Profiler(下)

接上篇:

SQL Server性能调教系列(4)--Profiler(上)

 

3.分析跟踪记录

在跟踪了一段时间之后,在文件中就会保存有跟踪的数据(包括IO,Duration,CPU,Reads,Writes,RowCounts等计数器),接下来就是把跟踪的数据加载到表并分析这些数据。可以选择在Profile中打开并检查这些跟踪数据,会有些限制,如不能完成太多的操作,大量重复的SQL语句,没有汇总。

3.1 加载数据到表(使用函数fn_trace_gettable返回表格形式的数据,作为范例只选择分析T-SQL代码和Duration查询的运行时间)

select CAST(textdata as nvarchar(max)) as tsql_code,duration
into Workload
from sys.fn_trace_gettable('C:\test\performancetrace_20100802.trc',NULL) as TT

3.2 汇总相同的SQL项

select tsql_code,SUM(duration) as total_duration from workload group by tsql_code
(由于我是在Production上面做的trace,考虑到系统的安全性,在此不便透露分析的SQL代码,实在很抱歉,各位朋友如有兴趣可在自己的测试环境中测试,讨论测试的结果)
 

问题:分组聚合后会看到逻辑上相同(参数不同)的查询会被分到不同的组,因为在筛选器中使用了不同的值。因为这些相同逻辑的SQL会使用相同的执行计划,应该聚合在一起才能准备的分析总的查询运行的时间。

 

3.3 问题处理方案一(大致分段截取)

通常情况下SQL语句都是Select+栏位,左边有很大一部分是相同的,根据SQL字符的长度,截取前一段来聚合。如取前50,100,150. 方法简单,容易操作,会聚合一部分数据,但是长度不太好取值,只能调整前缀的长度去测试。

select left(tsql_code,50) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,50)

--or

select left(tsql_code,100) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,100)

--or

select left(tsql_code,150) as t_sql,SUM(duration) as total_duration from workload group by left(tsql_code,150)

 

3.4 问题处理方案二(复杂,精确,逻辑上相同的SQL,参数用通配符替代),这个方法是T-SQL查询技术内幕中介绍的方法,如果需要更加详细的说明,请阅读这本书,你会得到更多的启发。

(1) 模式化查询,它对于相同模式的查询是一样的。

  • T-SQL函数实现

 建立函数:

CREATE FUNCTION [dbo].[fn_SQLSigTSQL] 
  (@p1 NTEXT, @parselength INT = 4000)
RETURNS NVARCHAR(4000)

-- This function will replace the parameters with '#'
-- This function is provided "AS IS" with no warranties,
-- and confers no rights. 
-- Use of included script samples are subject to the terms specified at
-- http://www.microsoft.com/info/cpyright.htm
-- 
-- Strips query strings
AS
BEGIN 
  DECLARE @pos AS INT;
  DECLARE @mode AS CHAR(10);
  DECLARE @maxlength AS INT;
  DECLARE @p2 AS NCHAR(4000);
  DECLARE @currchar AS CHAR(1), @nextchar AS CHAR(1);
  DECLARE @p2len AS INT;

  SET @maxlength = LEN(RTRIM(SUBSTRING(@p1,1,4000)));
  SET @maxlength = CASE WHEN @maxlength > @parselength 
                     THEN @parselength ELSE @maxlength END;
  SET @pos = 1;
  SET @p2 = '';
  SET @p2len = 0;
  SET @currchar = '';
  set @nextchar = '';
  SET @mode = 'command';

  WHILE (@pos <= @maxlength)
  BEGIN
    SET @currchar = SUBSTRING(@p1,@pos,1);
    SET @nextchar = SUBSTRING(@p1,@pos+1,1);
    IF @mode = 'command'
    BEGIN
      SET @p2 = LEFT(@p2,@p2len) + @currchar;
      SET @p2len = @p2len + 1 ;
      IF @currchar IN (',','(',' ','=','<','>','!')
        AND @nextchar BETWEEN '0' AND '9'
      BEGIN
        SET @mode = 'number';
        SET @p2 = LEFT(@p2,@p2len) + '#';
        SET @p2len = @p2len + 1;
      END 
      IF @currchar = ''''
      BEGIN
        SET @mode = 'literal';
        SET @p2 = LEFT(@p2,@p2len) + '#''';
        SET @p2len = @p2len + 2;
      END
    END
    ELSE IF @mode = 'number' AND @nextchar IN (',',')',' ','=','<','>','!')
      SET @mode= 'command';
    ELSE IF @mode = 'literal' AND @currchar = ''''
      SET @mode= 'command';

    SET @pos = @pos + 1;
  END
  RETURN @p2;
END

 

该函数参数为一个查询字符串和要分析的代码的长度,但会输入查询的签名,并用井号(#)替换所有的参数。测试结果如下:

select dbo.fn_SQLSigTSQL('select * from Sales.SalesOrderHeader where SalesOrderID=''43659'' and Status=''5'' ',500)

image

 

  • CLR实现

CLR在处理迭代/过程逻辑和字符串处理时比T-SQL效率高,下面介绍用CLR实现模式化查询。

a. 建立C#版的Classs Libary,函数如下:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
 
public partial class SQLSignature
{
    // fn_SQLSigCLR
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlString fn_SQLSigCLR(SqlString querystring)
    {
        return (SqlString)Regex.Replace(
            querystring.Value,
            @"([\s,(=<>!](?![^\]]+[\]]))(?:(?:(?:(?#    expression coming
             )(?:([N])?(')(?:[^']|'')*('))(?#           character
             )|(?:0x[\da-fA-F]*)(?#                     binary
             )|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)(?#     precise number
             )(?:[eE]?[\d]*)))(?#                       imprecise number
             )|(?:[~]?[-+]?(?:[\d]+))(?#                integer
             ))(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+(?#   operators
             ))",
            @"$1$2$3#$4");
    }
    // fn_RegexReplace - for generic use of RegEx-based replace
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlString fn_RegexReplace(
        SqlString input, SqlString pattern, SqlString replacement)
    {
        return (SqlString)Regex.Replace(
            input.Value, pattern.Value, replacement.Value);
    }
}

 

b. 加载.dll中间语言代码到DB

USE master;
CREATE ASSEMBLY SQLSignature
FROM 'C:\SQLSignature\SQLSignature\bin\Debug\SQLSignature.dll';

 

c. 注册函数fn_SQLSigCLR和fn_RegexReplace

CREATE FUNCTION dbo.fn_SQLSigCLR(@querystring AS NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
WITH RETURNS NULL ON NULL INPUT 
EXTERNAL NAME SQLSignature.SQLSignature.fn_SQLSigCLR;
GO

CREATE FUNCTION dbo.fn_RegexReplace(
  @input       AS NVARCHAR(MAX),
  @pattern     AS NVARCHAR(MAX),
  @replacement AS NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
WITH RETURNS NULL ON NULL INPUT 
EXTERNAL NAME SQLSignature.SQLSignature.fn_RegexReplace;
GO

 

d. 注册完成之后,用下面代码测试:

SELECT
  dbo.fn_SQLSigCLR(tsql_code) AS sig_sql,
  duration
FROM dbo.Workload;

 

结果的SQL全被模式化,井号(#)替代所有的参数。

 

(2) 以用上面建立的函数,模式化追踪的T-SQL语句,并分类汇总。

a. 以用查询签名,为每个字符串生成整数的校验和(CheckSum),方便以后的汇总计算,提高效率:

ALTER TABLE dbo.Workload ADD cs INT NOT NULL DEFAULT (0);
GO
UPDATE dbo.Workload
  SET cs = CHECKSUM(dbo.fn_SQLSigCLR(tsql_code));

CREATE CLUSTERED INDEX idx_cl_cs ON dbo.Workload(cs);
 

b. 用每个签名的检验和计算运行时间填充临时表#AggQueries,包括运行时间的百分比,以及运行时间降序的行号。

IF OBJECT_ID('tempdb..#AggQueries') IS NOT NULL
  DROP TABLE #AggQueries;
GO

SELECT cs, SUM(duration) AS total_duration,
  100. * SUM(duration) / SUM(SUM(duration)) OVER() AS pct,
  ROW_NUMBER() OVER(ORDER BY SUM(duration) DESC) AS rn
INTO #AggQueries
FROM dbo.Workload
GROUP BY cs;

CREATE CLUSTERED INDEX idx_cl_cs ON #AggQueries(cs);

 

查询聚合之后临时表的内容,数据量会大大的减少,包含签名,总的运行时间,运行时间占总运行时间的半分比,排序序号。

 

c.筛选并匹配,使用APPLY运算符得到查询模式和一个示例查询。

WITH RunningTotals AS
(
  SELECT AQ1.cs,
    CAST(AQ1.total_duration / 1000.
      AS DECIMAL(12, 2)) AS total_s, 
    CAST(SUM(AQ2.total_duration) / 1000.
      AS DECIMAL(12, 2)) AS running_total_s, 
    CAST(AQ1.pct AS DECIMAL(12, 2)) AS pct, 
    CAST(SUM(AQ2.pct) AS DECIMAL(12, 2)) AS run_pct, 
    AQ1.rn
  FROM #AggQueries AS AQ1
    JOIN #AggQueries AS AQ2
      ON AQ2.rn <= AQ1.rn
  GROUP BY AQ1.cs, AQ1.total_duration, AQ1.pct, AQ1.rn
  HAVING SUM(AQ2.pct) - AQ1.pct <= 90 -- percentage threshold
)
SELECT RT.rn, RT.pct, S.sig, S.tsql_code AS sample_query
FROM RunningTotals AS RT
  CROSS APPLY
    (SELECT TOP(1) tsql_code, dbo.fn_SQLSigCLR(tsql_code) AS sig
     FROM dbo.Workload AS W
     WHERE W.cs = RT.cs) AS S
ORDER BY RT.rn;
 

4. 有了查询模式,示例查询,和占用时间的百分比例和排序。然后就可以着手优化。也可以通过类似的方式,找到造成大量结果集,大多数的I/O问题的查询模式。

 

四:总结

Perfiler是一个很好用的工具来追踪系统的性能和工作的负荷,从而准确的找到值得优化的SQL,提高效率,大大减少工作量。

 

附件下载:Server性能计数器.rar

PerformanceMonitor.rar

 

参考:Microsoft SQL Server 2005技术内幕:T-SQL查询

 

>>>SQL Server性能调校系列入口地址

posted @ 2010-08-04 08:16  changbluesky  阅读(8515)  评论(4编辑  收藏  举报
----------------------------------------------------------- 本人博客园地址:http://www.cnblogs.com/changbluesky 请多多支持,谢谢!