Comparing Inline and Multi-Statement Table valued UDFs

refer: http://blog.waynesheffield.com/wayne/archive/2012/02/comparing-inline-and-multistatement-table-valued-functions/

Table-Valued Functions. What a wonderful addition to SQL they make. They take parameters, do some work, and return a result set that can be used in queries. You can select directly against them, or utilize them with the APPLY operator. These are truly versatile additions to SQL -and  since you can pass parameters to them, they are like a parametrized view. And we have two different types to work with: Inline Table-Valued Functions (ITVF) and Multi-Statement Table-Valued Functions (MTVF).

But how do they compare with each other? Well, let’s start off by looking at the syntax of each:

CREATE FUNCTION Util.MyITVFunction (@Parameters INT)
RETURNS TABLE
AS
RETURN
SELECT TOP (@Parameters) N
  FROM Util.Tally
 ORDER BY N;
GO

  

CREATE FUNCTION Util.MyMTVFunction (@Parameters INT)
RETURNS @FunctionResultTableVariable TABLE (N INT)
AS
BEGIN
  INSERT INTO @FunctionResultTableVariable (N)
  SELECT TOP (@Parameters) N
    FROM Util.Tally
   ORDER BY N;
  RETURN;
END
GO

The changes in the syntax are that MTVFs must first declare a table variable that is to be returned. Secondly, the MTVF must have a BEGIN/END block. Third, inside the BEGIN/END block you need code that populates the table variable. And finally, you return from the function. In comparison, the ITVF just returns a select statement – there is no table variable to mess around with, no inserts, no code blocks. Just a SELECT statement.

So, how do these perform? For this comparison, let’s use the example that Microsoft supplies in Books Online (BOL) for the APPLY operator (http://technet.microsoft.com/en-us/library/ms175156.aspx):

First, make and populate two tables: Employees and Departments:

Sample Data    
--Create Employees table and insert values.
CREATE TABLE Employees
(
    empid   int         NOT NULL
    ,mgrid   int         NULL
    ,empname varchar(25) NOT NULL
    ,salary  money       NOT NULL
    CONSTRAINT PK_Employees PRIMARY KEY(empid)
);
GO
INSERT INTO Employees VALUES(1 , NULL, 'Nancy'   , $10000.00);
INSERT INTO Employees VALUES(2 , 1   , 'Andrew'  , $5000.00);
INSERT INTO Employees VALUES(3 , 1   , 'Janet'   , $5000.00);
INSERT INTO Employees VALUES(4 , 1   , 'Margaret', $5000.00);
INSERT INTO Employees VALUES(5 , 2   , 'Steven'  , $2500.00);
INSERT INTO Employees VALUES(6 , 2   , 'Michael' , $2500.00);
INSERT INTO Employees VALUES(7 , 3   , 'Robert'  , $2500.00);
INSERT INTO Employees VALUES(8 , 3   , 'Laura'   , $2500.00);
INSERT INTO Employees VALUES(9 , 3   , 'Ann'     , $2500.00);
INSERT INTO Employees VALUES(10, 4   , 'Ina'     , $2500.00);
INSERT INTO Employees VALUES(11, 7   , 'David'   , $2000.00);
INSERT INTO Employees VALUES(12, 7   , 'Ron'     , $2000.00);
INSERT INTO Employees VALUES(13, 7   , 'Dan'     , $2000.00);
INSERT INTO Employees VALUES(14, 11  , 'James'   , $1500.00);
GO
--Create Departments table and insert values.
CREATE TABLE Departments
(
    deptid    INT NOT NULL PRIMARY KEY
    ,deptname  VARCHAR(25) NOT NULL
    ,deptmgrid INT NULL REFERENCES Employees
);
GO
INSERT INTO Departments VALUES(1, 'HR',           2);
INSERT INTO Departments VALUES(2, 'Marketing',    7);
INSERT INTO Departments VALUES(3, 'Finance',      8);
INSERT INTO Departments VALUES(4, 'R&D',          9);
INSERT INTO Departments VALUES(5, 'Training',     4);
INSERT INTO Departments VALUES(6, 'Gardening', NULL);

In this example, most (but not all) of the departments in the Departments table have a manager ID that corresponds to an employee in the Employees table. The following table-valued function accepts an employee ID as an argument and returns that employee and all of his/her subordinates.

CREATE FUNCTION dbo.fn_getsubtree(@empid AS INT)
    RETURNS @TREE TABLE
(
    empid   INT NOT NULL
    ,empname VARCHAR(25) NOT NULL
    ,mgrid   INT NULL
    ,lvl     INT NOT NULL
)
AS
BEGIN
  WITH Employees_Subtree(empid, empname, mgrid, lvl)
  AS
  (
    -- Anchor Member (AM)
    SELECT empid, empname, mgrid, 0
    FROM Employees
    WHERE empid = @empid
 
    UNION ALL
 
    -- Recursive Member (RM)
    SELECT e.empid, e.empname, e.mgrid, es.lvl+1
    FROM Employees AS e
      JOIN Employees_Subtree AS es
        ON e.mgrid = es.empid
  )
  INSERT INTO @TREE
    SELECT * FROM Employees_Subtree;
 
  RETURN
END
GO

To return all of the subordinates in all levels for the manager of each department, use the following query:

Source code    
SELECT D.deptid,
       D.deptname,
       D.deptmgrid,
       ST.empid,
       ST.empname,
       ST.mgrid
  FROM Departments AS D
       CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST;

  Which returns this result set:

deptiddeptnamedeptmgridempidempnamemgrid
1 HR 2 2 Andrew 1
1 HR 2 5 Steven 2
1 HR 2 6 Michael 2
2 Marketing 7 7 Robert 3
2 Marketing 7 11 David 7
2 Marketing 7 12 Ron 7
2 Marketing 7 13 Dan 7
2 Marketing 7 14 James 11
3 Finance 8 8 Laura 3
4 R&D 9 9 Ann 3
5 Training 4 4 Margaret 1
5 Training 4 10 Ina 4

This is a MTVF. Now, let’s convert it to an ITVF by removing the table variable declaration, the begin/end block, the insert statement, and move the RETURN to the start:

CREATE FUNCTION dbo.fn_getsubtreeITVF(@empid AS INT)
    RETURNS TABLE
AS
RETURN
  WITH Employees_Subtree(empid, empname, mgrid, lvl)
  AS
  (
    -- Anchor Member (AM)
    SELECT empid, empname, mgrid, 0
    FROM Employees
    WHERE empid = @empid
 
    UNION ALL
 
    -- Recursive Member (RM)
    SELECT e.empid, e.empname, e.mgrid, es.lvl+1
    FROM Employees AS e
      JOIN Employees_Subtree AS es
        ON e.mgrid = es.empid
  )
    SELECT * FROM Employees_Subtree;
GO

As you would expect, this function returns exactly the same result set. So, let’s look at how they perform.

Let’s get the statistics of each by running SET STATISTICS IO,TIME ON before the two queries. We’ll also grab the actual execution plans, and capture the activity with Profiler. The following statistics are returned:

MTVF:

Table '#15502E78'. Scan count 6, logical reads 6...
Table 'Departments'. Scan count 1, logical reads 2...
SQL Server Execution Times:
CPU time = 15 ms,  elapsed time = 12 ms.

ITVF:

Table 'Worktable'. Scan count 7, logical reads 85...
Table 'Employees'. Scan count 1, logical reads 35...
Table 'Departments'. Scan count 1, logical reads 2...
SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 114 ms.

Note that for the MTVF, that the Employees table doesn’t appear to have been touched. Instead, what we are seeing is some read activity on a table variable – some very low read activity. Also notice that the elapsed CPU time for the MTVF is substantially greater than the ITVF. (The total elapsed time is related to how long it takes to return the information to the client, so I disregard this value.) Since the function is being called 6 times, the table variable is built, populated, and then read from 6 times, hence the value of 6. But how many reads were being performed against the Employees table? We know that the function is accessing the Employees table, but we have no clue as to what the IO statistics are for that table. Instead, the statistics show only the reading of the data from the MTVF table variable, not the reading and inserting of data into it.

In looking at the statistics, it appears that the ITVF is doing a lot more work than the MTVF – but it is running considerably faster. (For this small set of data, both are extremely fast, but you can see that the ITVF is so fast that it can’t be registered at the millisecond level.)

Let’s compare this to what Profiler caught:

As you can see, they have the same CPU time that the statistics caught. But here we can see the total read activity that the MTVF is doing, and that the ITVF is doing 2/3 the reads of the MTVF. Again, I’m ignoring the Duration column since it can be impacted by other activity going on across the network / client computer.

Finally, let’s look at the execution plans:

MTVF:

ITVF:

In the MTVF, you see only an operation called “Table Valued Function”. Everything that it is doing is essentially a black box – something is happening, and data gets returned. For MTVFs, SQL can’t “see” what it is that the MTVF is doing since it is being run in a separate context. What this means is that SQL has to run the MTVF as it is written, without being able to make any optimizations in the query plan to optimize it.

In the ITVF, everything that it is doing is being shown… just like a view, it’s activity is being “inlined” into the query plan. Since SQL now can see everything that is going on (across the entire query), it can make optimizations in the query plan to be more efficient.

This example is just on a small handful of records in both tables, but we can already see a performance difference. When you expand these tables to tens of thousands of records, this difference is really magnified. I have seen performance improvements that take a query from running in tens of minutes to seconds, simply by converting a MTVF to an ITVF.

Another thing to note in these execution plans – look at the query cost (relative to the batch) percentages. They show that the MTVF is 20%, and the ITVF is 80%. So, that portion of the execution plan is also misleading.

By the way… scalar functions work nearly essentially the same as MTVFs. You can get a similar substantial performance boost by converting these to ITVFs, or possibly by just JOINing to the table.

So, in closing, use those ITVFs. And try to keep away from MTVFs and scalar functions. Those functions are necessary sometimes, but use the ITVFs if possible… your SQL Server will thank you.

PS – another closing note. When you have MTVFs, remember that both the IO statistics, and the execution plan total cost percentages are misleading.

posted @ 2014-08-31 19:36  Cathy Lee  阅读(5557)  评论(0编辑  收藏  举报