Returning Ranked Results with Microsoft SQL Server 2005

2010-07-22 16:41 Tracy. 阅读(394) 评论(0) 编辑收藏举报

Introduction
In early November 2005, Microsoft launched three new product lines: Visual Studio 2005, SQL Server 2005, and the .NET Framework version 2.0 (which includes ASP.NET version 2.0). SQL Server 2005 is the newest version of Microsoft's database platform, with the last major SQL Server release dating back to SQL Server 2000. Over the past five years of development, SQL Server has added a vast array of new features and capabilities, which are summed up on Microsoft's website in What's New in SQL Server 2005? The new features that will likely be of particular interest to developers building web-based applications that use SQL Server 2005 as a backend data store, include the new Transact-SQL (T-SQL) enhancements, better Visual Studio integration, integration with the CLR/.NET Framework, and the SQL Server 2005 Management Studio application, which is a slicker version of SQL Server 2000's Enterprise Manager.

SQL Server 2005's T-SQL enhancements make writing certain types of queries significantly easier than in SQL Server 2000. While the new additions to 2005's T-SQL syntax could be replicated either in SQL Server 2000 or in the application layer, with 2005 the syntax is much more terse, readable, and sensible. In this article we'll look at SQL Server 2005's rank-related functions, which simplify the process of assigning ranks to query results, retrieve results with a particular rank, and other rank-related tasks common to reporting. Read on to learn more!
Before we look at how to accomplish these common query patterns, let's first create a data model against which we can run these queries. I am going to be using the free Express version of SQL Server 2005 for my demos, and have included the database and an ASP.NET 2.0 website for download at the end of this article. (Like Visual Studio, SQL Server ships with a number of different "versions." The Express version is a free version that is designed for hobbyists, students, and the like. If you download and install Visual Web Developer - the express version of Visual Studio geared toward web developers - you can indicate to install SQL Server 2005 Express edition as well.)

For these examples we'll use a database that has products, salespersons (employees), customers, and orders. This is modeled using five tables: Customers, Employees, Products, Orders, and OrderItems. The Customers, Employees, and Products tables contain a row for each customer, employee, and product, respectively. Each time a customer makes a purchase, a record is added to the Orders table indicating the customer that made the purchase, the employee that made the sale, and the date the order was placed. The OrderItems maps each product purchased for an order, the quantity of product, and the total price billed (assuming that larger purchases may be discounted). The following figure shows these tables, their fields, and the relationships among them.

As the diagram illustrates, the OrderItems establishes a many-to-many join between the Orders and Products tables.

When building reports or analyzing data, often users or management are interested in seeing the data ranked in some manner. For example, your boss may want to have a report that shows the top 10 selling items, or the three people in the sales team who have generated the greatest revenue in the third quarter. More complicated examples might be returning just the 3rd through 5th ranked salespersons. With SQL Server 2000 queries returning the top ranking items could be accomplished using the TOP or ROWCOUNT keywords, as discussed in Retrieving the Top N Records from a SQL Query. To retrieve a particular ranked subset, you needed to use a derived tables- or view-based approach.

SQL Server 2005 provides four new ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. While these are definitely a step above what SQL Server 2000 offers, the ranking functions still have some limitations that require the use of dervied tables or views for more interesting applications. Let's look at each of these individually.

Numbering Rows with ROW_NUMBER
The ROW_NUMBER function assigns an ordinal value with each record returned, with the ordinal values depending on a particular ORDER BY clause used in tandem with the function. The syntax for ROW_NUMBER is: ROW_NUMBER() OVER([partition] order-by-clause). For example, the following query would return the products from most expensive to least expensive, with a rank assigned to each product based on price:


SELECT ProductID, Name, Price, 
       ROW_NUMBER() OVER(ORDER BY Price DESC) As PriceRank
FROM Products

The results would be:

ProductID	Name	Price	PriceRank
8	Desk	495.0000	1
10	Executive Chair	295.0000	2
9	Chair	125.0000	3
5	Mouse	14.9500	4
6	Mousepad	9.9900	5
11	Scissors	8.5000	6
4	Stapler	7.9500	7
3	Binder	1.9500	8
...

By default, the ROW_NUMBER function assigns a increasing value to each record in the resultset. With the optional partition parameter, you can instruct the ROW_NUMBER function to start renumbering whenever there's a change in the value of the column for partitioning. To demonstrate this, I created a view called vwTotalAmountBilledPerOrder that returns each OrderID and the total amount billed for that order using the following query syntax:

SELECT OrderID, SUM(AmountBilled) AS TotalOrderAmount
FROM OrderItems
GROUP BY OrderID

This returns each unique order in the OrderItems table along with the sum of AmountBilled values pertaining to that order. With this view we can use the ROW_NUMBER method to rank those orders by most amount of money spent using the following:


SELECT c.Name, o.DateOrdered, tab.TotalOrderAmount, 
       ROW_NUMBER() OVER (ORDER BY TotalOrderAmount DESC) AS BestCustomer
FROM vwTotalAmountBilledPerOrder AS tab 
       INNER JOIN Orders AS o ON 
           o.OrderID = tab.OrderID 
       INNER JOIN Customers AS c ON 
           c.CustomerID = o.CustomerID

This will return results like:

Name	DateOrdered	TotalOrderAmount	BestCustomer
Bob	12/1/2005	12649.9900	1
Darren	1/2/2006	620.0000	2
Bob	12/19/2005	265.8500	3
Tito	12/22/2005	14.9500	4
Bruce	1/5/2006	14.9500	5
Tito	12/18/2005	12.4400	6
Bruce	1/4/2006	9.9900	7
Lee Ann	1/3/2006	8.5000	8
...

Notice that certain customers are on this list multiple times (Bob, Tito, and Bruce). Perhaps rather than seeing all orders, ordered by sales amount, we are interested in seeing the top orders per customer. We can accomplish this with the PARTITION BY clause in the ROW_NUMBER function like so:


SELECT c.Name, o.DateOrdered, tab.TotalOrderAmount, 
       ROW_NUMBER() OVER (PARTITION BY c.CustomerID ORDER BY TotalOrderAmount DESC) AS BestCustomer
FROM vwTotalAmountBilledPerOrder AS tab 
       INNER JOIN Orders AS o ON 
           o.OrderID = tab.OrderID 
       INNER JOIN Customers AS c ON 
           c.CustomerID = o.CustomerID

This will return the following results:

Name	DateOrdered	TotalOrderAmount	BestCustomer
Bob	12/1/2005	12649.9900	1
Bob	12/19/2005	265.8500	2
Tito	12/22/2005	14.9500	1
Tito	12/18/2005	12.4400	2
Darren	1/2/2006	620.0000	1
Bruce	1/5/2006	14.9500	1
Bruce	1/4/2006	9.9900	2
Lee Ann	1/3/2006	8.5000	1
...

While these results are definitely cool, the bad news is that you can't use the ROW_NUMBER() (or any of the other ranking functions) in the WHERE clause. That is, you might want to say, "Give me the products ranked by price whose rank is between 5 and 8." To accomplish this you'd need to use a derived table or view. For example, you could put the above query in a view called vwPriceRankedProducts and then get back just the 5th through 8th ranked products use the query:


SELECT ProductID, Name, Price, PriceRank
FROM vwPriceRankedProducts
WHERE PriceRank BETWEEN 5 AND 8

Handling Ties with RANK and DENSE_RANK
The ROW_NUMBER function blindly increments the value for each returned result based on the optional partition clause and the required order by clause. However, sometimes you may wish to handle ties differently, instead assigning the same value to ties. For example, earlier when displaying the list of total order amounts the amount of the order placed by Tito on 12/22/2005 is the same as the amount of the order placed by Bruce on 1/5/2006. However, the ROW_NUMBER function ranks these two rows as #4 and #5, rather than using #4 for both.

Both the RANK and DENSE_RANK functions will number ties with the same ranking. For example, using the following query:


SELECT c.Name, o.DateOrdered, tab.TotalOrderAmount, 
       RANK() OVER (ORDER BY TotalOrderAmount DESC) AS BestCustomer
FROM vwTotalAmountBilledPerOrder AS tab 
       INNER JOIN Orders AS o ON 
           o.OrderID = tab.OrderID 
       INNER JOIN Customers AS c ON 
           c.CustomerID = o.CustomerID

The results will be:

Name	DateOrdered	TotalOrderAmount	BestCustomer
Bob	12/1/2005	12649.9900	1
Darren	1/2/2006	620.0000	2
Bob	12/19/2005	265.8500	3
Tito	12/22/2005	14.9500	4
Bruce	1/5/2006	14.9500	4
Tito	12/18/2005	12.4400	6
Bruce	1/4/2006	9.9900	7
Lee Ann	1/3/2006	8.5000	8
...

Notice how the two orders with the same amount are both ranked #4. The difference between RANK and DENSE_RANK is in how they begin renumbering after the tied results. RANK bypasses as many numbers as there are tied results. In our example above, since there are two results tied at #4, following that rank #5 is skipped and the ranking resumes with #6. DENSE_RANK, on the other hand, picks up with the next number. Had we replaced the function name RANK with DENSE_RANK in the above query, Tito's order on 12/18/2005 for $12.44 would have the rank #5.

Like ROW_NUMBER both RANK and DENSE_RANK can use the optional PARTITION BY clause.

Grouping Ranked Results with NTILE
The final ranking-related T-SQL function new to SQL Server 2005 is NTILE(int). NTILE operates like the other ranking functions, but breaks down the results into int number of groups, assigning the same rankings for all records in the specified group. You can use NTILE to break results down into halves, thirds, quarters, and so on, as this example shows:

SELECT ProductID, Name, Price, NTILE(4) OVER (ORDER BY Price DESC) as Quartile
FROM Produts

With the results being:

ProductID	Name	Price	Quartile
8	Desk	495.0000	1
10	Executive Chair	295.0000	1
9	Chair	125.0000	2
5	Mouse	14.9500	2
6	Mousepad	9.9900	3
11	Scissors	8.5000	3
4	Stapler	7.9500	4
3	Binder	1.9500	4

In the downloadable demo I have an example that uses a view with NTILE to allow the user to selectively view various quartiles of the highest order amounts.

Conclusion
In this article we examined four functions new to SQL Server 2005. These four functions - ROW_NUMBER, RANK, DENSE_RANK, and NTILE - make it easier to rank results than was previously available with SQL Server 2000. These functions are but a small sample of the new T-SQL features available with SQL Server 2005. Future articles will highlight other useful facets!

努力加载评论中...

刷新页面返回顶部

☜ Tracy ☞ ☜天佑老爸☞

Returning Ranked Results with Microsoft SQL Server 2005

About