数据库标量用户定义函数误用作常量
SQL Prompt是一款实用的SQL语法提示工具。SQL Prompt根据数据库的对象名称、语法和代码片段自动进行检索,为用户提供合适的代码选择。自动脚本设置使代码简单易读--当开发者不大熟悉脚本时尤其有用。SQL Prompt安装即可使用,能大幅提高编码效率。此外,用户还可根据需要进行自定义,使之以预想的方式工作。
下载SQL Prompt正式版慧都网
标量UDF作为全局数据库常量的错误使用是一个主要的性能问题,每当SQL Prompt在任何生产代码中发现此错误时,都应进行调查。除非您需要在计算列或约束中使用这些全局常量,否则通常将值存储在内联表值函数中或使用视图会更安全,更方便。
开发人员倾向于期望能够在数据库中设置全局值,以提供诸如Pi的值之类的常量,或诸如税率,语言,文件URN或URL之类的变量。用户定义的标量函数返回单个值,因此似乎提供了理想的方法。这对于不经常执行的功能以及处理相对较小的数据集是很好的,但是在其他情况下,它可能会导致严重的查询性能问题。出现问题是因为SQL Server不相信未经模式验证的标量函数是精确的和确定性的,因此在执行它们时会选择最安全(尽管最慢)的选项。
调用具有BEGIN…END块的任何SQL Server函数都会产生开销,因为除非允许SQL Server通过使用架构绑定创建该函数来验证其输出,否则它将在过滤数据之前重新为每一行执行该函数。 ,即使您很明显每次都会返回相同的值。这是一个比较隐蔽的问题,尽管扩展事件会话将揭示实际发生的情况,但是它并没有真正显示其在执行计划中的全部意义。
简而言之,请勿在JOIN条件,WHERE搜索条件或SELECT列表中使用标量用户定义函数(UDF),除非该函数是架构绑定的。SQL Prompt实现了静态代码分析规则PE017,该规则专门用于帮助您检测和纠正此问题。除非您对模式绑定及其对数据库更改的后果有信心,否则最好使用将值传输到变量,或者使用诸如视图或内联表值函数之类的模块。
解决问题
如果SQL Prompt检测到您的代码出现PE017,该怎么办?
我们将设置所有可能的选项,运行一些性能测试,并提出一些建议。
符合模式的UDF
通过添加架构绑定以确保其经过系统验证,可以正确使用标量函数。清单1创建了相同的简单Wordcount函数的两个版本,首先不带模式绑定,然后带模式绑定,这两个版本都简单地返回一个常量。在每一种情况下,我们检查IsDeterministic,IsPrecise以及IsSystemVerified属性值对每个对象。
最后,它创建了第三个版本,该版本仅返回其参数值,仅用于检查这是否是SQL Server验证过程中的一个因素。
IF Object_Id('dbo.Wordcount') IS NOT NULL DROP FUNCTION dbo.Wordcount GO CREATE FUNCTION dbo.Wordcount() /** Summary: > A simple scalar multi-statement function without schemabinding that returns a constant Author: PhilFactor Date: 01/02/2018 Returns: > the integer value 5 **/ RETURNS INT AS BEGIN RETURN 5 END GO /* we now test it to see whether SQL Server trusts it */ SELECT ObjectPropertyEx( Object_Id('dbo.Wordcount'), N'IsDeterministic') AS deterministic; --Is a precise number returned? Whenever floating point operations are used in resolving --expressions, the results are not precise, by the very nature of the way that the datatype is stored. SELECT ObjectPropertyEx(Object_Id('dbo.Wordcount'), N'IsPrecise') AS precise; --Can SQL Server verify that the function is precise and deterministic? SELECT ObjectPropertyEx( Object_Id('dbo.Wordcount'), N'IsSystemVerified') AS verified; GO IF Object_Id('dbo.WordcountSchemaBound') IS NOT NULL DROP FUNCTION dbo.WordcountSchemaBound GO CREATE FUNCTION dbo.WordcountSchemaBound() /** Summary: > A second version of a simple scalar multi-statement function with schemabinding that returns a constant Author: PhilFactor Date: 01/02/2018 Returns: > the integer value 5 **/ RETURNS INT WITH SCHEMABINDING AS BEGIN RETURN 5 END GO /* Repeat tests to see if SQL Server trusts dbo.WordcountSchemaBound*/ SELECT ObjectPropertyEx( Object_Id('dbo.WordcountSchemaBound'), N'IsDeterministic') AS deterministic; SELECT ObjectPropertyEx( Object_Id('dbo.WordcountSchemaBound'), N'IsPrecise') AS precise; SELECT ObjectPropertyEx( Object_Id('dbo.WordcountSchemaBound'), N'IsSystemVerified') AS verified; GO IF Object_Id('dbo.Wordcounter') IS NOT NULL DROP FUNCTION dbo.Wordcounter GO CREATE FUNCTION dbo.Wordcounter /** Summary: > A third version of a simple scalar multi-statement function without schemabinding that merely returns its parameter To test whether the absense of any parameter is a determining Factor -- whether adding a parameter here gets round the problem Author: PhilFactor Returns: > the integer value passed to it **/ (@howMany INT) RETURNS INT AS BEGIN RETURN @howMany END GO /* Repeat tests to see if SQL Server trusts dbo.Wordcounter*/ SELECT ObjectPropertyEx( Object_Id('dbo.Wordcounter'), N'IsDeterministic') AS deterministic; SELECT ObjectPropertyEx(Object_Id('dbo.Wordcounter'), N'IsPrecise') AS precise; SELECT ObjectPropertyEx( Object_Id('dbo.Wordcounter'), N'IsSystemVerified') AS verified; GO
清单1
如果运行清单1,您将看到该函数的第二个版本WordCountSchemaBound,对这三个属性返回true。稍后我们将看到这对调用这些函数的所有查询的性能有多大影响。
尽管模式绑定具有许多优点,但是在这种情况下,这意味着您将被明确禁止将常量视为变量,这并不是一件坏事。如果您更改了“常量”函数(即您已在表中的约束或计算列中使用的函数),它将被证明很复杂。另外,如果尝试在数据库工作时更改常量,则使用正在执行的功能的计划将在功能上放置模式稳定性锁,这将防止您更改常量的值,因为它们需要更改模式。锁。
标量UDF的替代品
清单2展示了标量UDF的几种替代选择,它们可以在不需要或不想对其进行模式绑定的情况下保存数据库范围的值。首先是视图,然后是表值函数。
IF Object_Id('dbo.WordCountView') IS NOT NULL DROP VIEW dbo.WordCountView GO CREATE VIEW dbo.WordCountView AS /** Summary: > A very simple view that returns a single row with one column Author: PhilFactor Date: 01/02/2018 Returns: > a single row with a column called 'wordcount' **/ SELECT 5 AS wordcount GO IF Object_Id('dbo.WordCountTVF') IS NOT NULL DROP FUNCTION dbo.WordCountTVF GO CREATE FUNCTION dbo.WordCountTVF() /** Summary: > A table valued function that returns a single row with a column called 'wordcount' Author: PhilFactor Date: 01/02/2018 Returns: > a single row with a column called 'wordcount' **/ RETURNS TABLE AS RETURN (SELECT 5 AS wordcount) GO
清单2
视图定义中引用的对象不能以使视图定义非法或强制SQL Server在视图上重新创建索引的方式进行更改。
尽管有额外的CHECK约束保护,但我没有使用表来保存常量的方法。表格的设计并非一成不变!作为破坏者,我会告诉您它们的表现和视图一样好。
性能测试
排列所有候选解决方案后,让我们看看它们的性能如何。我将测试每个选项能多快找出英语中常用的五个字母的单词。这些测试要求我们创建一个简单Commonwords表,其中包含所有常见单词的单列(主键)。要填充它,您需要下载commonwords 文件,然后运行清单3,并使用正确的文件路径。
DECLARE @AllCommonWords XML = (SELECT * FROM OpenRowset(BULK 'C:\MyPath\commonwords.XML', SINGLE_BLOB) AS x ); IF Object_Id('commonwords', 'U') IS NOT NULL DROP TABLE commonwords; CREATE TABLE commonwords (word VARCHAR(40) NOT NULL PRIMARY KEY); INSERT INTO commonwords(word) SELECT word = word.value('@el', 'varchar(40)') FROM @AllCommonWords.nodes('/commonwords/row') AS CommonWords(word);
清单3
对于计时,我将使用我的文章如何使用SQL Prompt片段记录T-SQL执行时间中描述的简单测试工具。
-- create a temporary table variable to hold timings DECLARE @log TABLE ( TheOrder INT IDENTITY(1, 1), WhatHappened VARCHAR(200), WhenItDid DATETIME2 DEFAULT GetDate() ) ----start of timing INSERT INTO @log(WhatHappened) SELECT 'Starting the test run'--place at the start --first we see how fast it is with a literal number, as a benchmark SELECT Count(*) FROM commonWords WHERE Len(word)=5 INSERT INTO @log(WhatHappened) SELECT 'simple Query with literal number'--place at the start --then see how long it takes if you transfer the 'constant' to a local variable DECLARE @NumberOfLetters INT=dbo.wordcount() SELECT Count(*) FROM commonWords WHERE Len(word)=@NumberOfLetters INSERT INTO @log(WhatHappened) SELECT 'Same query but with value transferred from UDF to variable ' --Now we see how long with the scalar UDF that just passes the value back SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcounter(5) iNSERT INTO @log(WhatHappened) SELECT 'Same but using a scalar function with parameter ' -- And now with the use of a scalar UDF function as a global constant SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcount() iNSERT INTO @log(WhatHappened) SELECT 'Same with a ''constant'' UD scalar function ' -- And now with the use of a schema-bound scalar UDF as a global constant SELECT Count(*) FROM commonWords WHERE Len(word)=dbo.wordcountSchemaBound() iNSERT INTO @log(WhatHappened) SELECT 'Same with schema-bound ''constant'' UD scalar function' --We'll now use a view to do the same thing SELECT Count(*) FROM commonWords INNER JOIN dbo.WordCountView ON Len(word)=wordcount iNSERT INTO @log(WhatHappened) SELECT 'Using a view containing a constant with inner join ' --We'll now use a view with a cross join to do the same thing SELECT Count(*) FROM commonWords CROSS JOIN dbo.WordCountView where Len(word)=wordcount iNSERT INTO @log(WhatHappened) SELECT 'Using a ''constant'' view with a cross join ' --and now with an inline table-valued function. Some functions are OK! SELECT Count(*) FROM commonWords INNER JOIN dbo.WordCountTVF() ON Len(word)=wordcount iNSERT INTO @log(WhatHappened) SELECT 'Using an inline TVF to provide a constant' --we see if a different syntax makes a difference SELECT Count(*) FROM commonWords cross JOIN dbo.WordCountTVF() WHERE Len(word)=wordcount iNSERT INTO @log(WhatHappened) SELECT 'Using an inline TVF and cross join to provide a constant' SELECT ending.whathappened AS test, DateDiff(ms, starting.whenItDid,ending.WhenItDid) [Time in ms] FROM @log starting INNER JOIN @log ending ON ending.theorder=starting.TheOrder+1 --list out all the timings GO /* this is the end of the test section */
清单4
运行此命令时,我们验证查询中使用常量的所有形式均产生相同的结果。时代非常清楚地表明问题是什么,问题的严重程度
几乎不需要图表来强调PE017的恐怖-const UDF的不正确使用。SQL Server执行不受架构限制的未经验证的标量UDF的方式非常谨慎(每行询问“是否仍返回5? ”),速度慢了五十倍。
除了避免使用不受模式约束的标量UDF外,该测试还表明,平均而言,在其他任何向查询中获取恒定值的方法之间,性能没有真正的区别。在每种情况下,查询执行计划都是相同的。
现在,我们通过使用拆卸部分结束,将所有内容整理整齐放在我们的测试数据库中。
IF Object_Id('dbo.Wordcount') IS NOT NULL DROP function dbo.Wordcount GO IF Object_Id('dbo.WordcountSchemaBound') IS NOT NULL DROP function dbo.WordcountSchemaBound GO IF Object_Id('dbo.Wordcounter') IS NOT NULL DROP function dbo.Wordcounter GO IF Object_Id('dbo.WordCountView') IS NOT NULL DROP view dbo.WordCountView GO IF Object_Id('dbo.WordCountTVF') IS NOT NULL DROP function dbo.WordCountTVF GO
清单5
推荐建议
如果您使用未经验证的标量函数,那么查询将非常缓慢,因为无论是否具有参数,您都将在每一行上执行该查询。
如果您面对大量使用标量UDF作为全局常量的继承代码,则可以使用架构绑定重做它们。但是,如果这些是全局变量,并且在实时系统中很少更改,那么我不会想到这种选择,因为如果不临时更改在约束或计算列中使用它的每个表,就无法更改模式绑定函数,要删除它们,请更改函数,然后替换约束和计算列。
视图或TVF的用途更加广泛,所以我更倾向于使用它们来保存“全局”值。如果更改了这些内容,则会记录该更改,因为它们需要DDL更改。唯一的问题是在约束或计算列中只能使用标量函数。如果使用表,那很好,但是请记住,更改常量不是DDL更改,因此您必须设置访问权限以拒绝任何人更改权限,例如税率!