SQL COOKBOOK (Ch.11-Appendix B)
---Chapter 11 Advanced Searching---
introduces recipes that are crucial for everyday development and yet sometimes slip through the cracks. These recipes are not any more difficult than others, yet I see many developers making very inefficient attempts at solving the problems these recipes solve. Examples from this chapter include finding knight values, paginating through a result set, skipping rows from a table, finding reciprocals, selecting the top n records, and ranking results.
11.12. Generating Simple Forecasts
Problem
Based on current data, you want to return addition rows and columns representing future actions. For example, consider the following result set:
Ultimately you want to transform the result set above to the following result set:
Solution
The key is to use a Cartesian product to generate two additional rows for each order then simply use CASE expressions to create the required column values.
DB2 and SQL Server
Use the recursive WITH clause to generate rows needed for your Cartesian product. The DB2 and SQL Server solutions are identical except for the function used to retrieve the current date. DB2 uses CURRENT_DATE and SQL Server uses GET-DATE. The SQL Server solution is shown below:
1 with nrows(n) as ( 2 select 1 from t1 union all 3 select n+1 from nrows where n+1 <= 3 4 ) 5 select id, 6 order_date, 7 process_date, 8 case when nrows.n >= 2 then process_date+1 else null end as verified, 9 case when nrows.n = 3 then process_date+2 else null end as shipped 10 from ( 11 select nrows.n id, getdate()+nrows.n as order_date, getdate()+nrows.n+2 as process_date 12 from nrows 13 ) orders, nrows 14 order by 1
---Chapter 12 Reporting and Warehousing---
introduces queries typically used in warehousing or generating complex reports. This chapter was meant to be the majority of the book as it existed in my original vision. Examples include converting rows into columns and vice versa (cross-tab reports), creating buckets or groups of data, creating histograms, calculating simple and complete subtotals, performing aggregations over a moving window of rows, and grouping rows based on given units of time.
12.12. Calculating Simple Subtotals
Problem
Solution
The ROLLUP extension to the GROUP BY clause solves this problem perfectly. If ROLLUP is not available for your RDBMS, you can solve the problem, albeit with more difficulty, using a scalar subquery or a UNION query.
SQL Server and MySQL
Use the aggregate function SUM to sum the salaries, and use WITH ROLLUP to organize the results into subtotals (by JOB) and a grand total (for the whole table). Then use COALESCE to supply the label ‘TOTAL’ for the grand total row (which will otherwise have a NULL in the job column):
1 select coalesce(job,'TOTAL') job, 2 sum(sal) sal 3 from emp 4 group by job with rollup
---Chapter 13 Hierarchical Queries---
introduces hierarchical recipes. Regardless of how your data is modeled, at some point you will be asked to format data such that it represents a tree or parent-child relationship. This chapter provides recipes accomplishing these tasks. Creating tree-structured result sets can be cumbersome with traditional SQL, so vendor-supplied functions are particularly useful in this chapter. Examples include expressing a parent-child relationship, traversing a hierarchy from root to leaf, and rolling up a hierarchy.
13.2. Expressing a Child-Parent-Grandparent Relationship
Problem
As you can see, employee MILLER works for CLARK who in turn works for KING. You want to express the full hierarchy from MILLER to KING. You want to return the following result set:
However, the single self-join approach from the previous recipe will not suffice to show the entire relationship from top to bottom. You could write a query that does two self joins, but what you really need is a general approach for traversing such hierarchies.
Solution
DB2 and SQL Server
Use the recursive WITH clause to find MILLER’s manager, CLARK, then CLARK’s manager, KING. The SQL Server string concatenation operator + is used in this solution:
1 with x (tree,mgr,depth) 2 as ( 3 select cast(ename as varchar(100)), mgr, 0 4 from emp 5 where ename = 'MILLER' 6 union all 7 select cast(x.tree+'-->'+e.ename as varchar(100)), e.mgr, x.depth+1 8 from emp e, x 9 where x.mgr = e.empno 10 ) 11 select tree leaf___branch___root 12 from x 13 where depth = 2
The approach here is to start at the leaf node and walk your way up to the root (as useful practice, try walking in the other direction). The upper part of the UNION ALL simply finds the row for employee MILLER (the leaf node). The lower part of the UNION ALL finds the employee who is MILLER’s manager, then finds that person’s manager, and this process of finding the “manager’s manager” repeats until processing stops at the highest-level manager (the root node). The value for DEPTH starts at 0 and increments automatically by 1 each time a manager is found.
with clause
先看下面一个嵌套的查询语句:
select * from person.StateProvince where CountryRegionCode in
(select CountryRegionCode from
person.CountryRegion where Name like 'C%')
现在使用公用表表达式(CTE)来解决上面的问题:
with
cr as
(
select CountryRegionCode from person.CountryRegion where
Name like 'C%'
)
select * from person.StateProvince where CountryRegionCode in (select * from cr)
---Chapter 14 Odds ‘n’ Ends---
is a collection of miscellaneous recipes that didn’t seem to fit into any other problem domain, but that nevertheless are interesting 18 and useful. This chapter is different from the rest in that it focuses on vendorspecific solutions only. This is the only chapter of the book where each recipe highlights only one vendor. The reasons are twofold: first, this chapter was meant to serve as more of a fun, geeky chapter. Second, some recipes exist only to highlight a vendor-specific function that has no equivalent in the other RDBMSs (examples include SQL Server’s PIVOT/UNPIVOT operators and Oracle’s MODEL clause). In some cases, though, you’ll be able to easily tweak a solution provided in this chapter to work for a platform not covered in the recipe.
(skip)
---Appendix A Window Function Refresher---
is a window function refresher along with a solid discussion of groups in SQL. Window functions are new to most, so it is appropriate that this appendix serves as a brief tutorial. Additionally, in my experience I have noticed that the use of GROUP BY in queries is a source of confusion for many developers. This chapter defines exactly what a SQL group is, and then proceeds to use various queries as proofs to validate that definition. The chapter then goes into the effects of NULLs on groups, aggregates, and partitions. Lastly, you’ll find discussion on the more obscure and yet extremely powerful syntax of the window function’s OVER clause (i.e., the “framing” or “windowing” clause).
A.1. Grouping
Simply stated, grouping is a way to organize like rows together. When you use GROUP BY in a query, each row in the result set is a group and represents one or more rows with the same values in one or more columns that you specify. That’s the gist of it.
Relationship Between SELECT and GROUP BY
It is important to keep in mind when using aggregate functions such as COUNT that any item in your SELECT list that is not used as an argument to an aggregate function must be part of your group. For example, if you write a SELECT clause such as:
select deptno, count(*) as cnt from emp
then you must list DEPTNO in your GROUP BY clause:
select deptno, count(*) as cnt from emp group by deptno
Now it’s fair to ask: exactly what items in a SELECT list can change a grouping or the value returned by an aggregate function? The answer is simple: other columns from the table(s) you are selecting from. Consider the prospect of adding the JOB column to the query we’ve been looking at:
select deptno, job, count(*) as cnt from emp group by deptno, job
By listing another column, JOB, from table EMP, we are changing the group and changing the result set; thus we must now include JOB in the GROUP BY clause along with DEPTNO, otherwise the query will fail. The inclusion of JOB in the SELECT/GROUP BY clauses changes the query from “How many employees are in each department?” to “How many different types of employees are in each department?”
A.2. Windowing
Window functions, like aggregate functions, perform an aggregation on a defined set (a group) of rows, but rather than returning one value per group, window functions can return multiple values for each group. The group of rows to perform the aggregation on is the window (hence the name "window functions”). DB2 actually calls such functions online analytic processing (OLAP) functions, and Oracle calls them analytic functions, but the ISO SQL standard calls them window functions, so that's the term I use in this book.
select ename, deptno, count(*) over() as cnt from emp order by 2
The presence of the OVER keyword indicates that the invocation of COUNT will be treated as a window function, not as an aggregate function. In general, the SQL standard allows for all aggregate functions to also be window functions, and the keyword OVER is how the language distinguishes between the two uses.
Order of Evaluation
Window functions are performed as the last step in SQL processing prior to the ORDER BY clause.
select ename, deptno, count(*) over() as cnt from emp where deptno = 10 order by 2
The value for CNT for each row is no longer 14, it is now 3. In this example, it is the WHERE clause that restricts the result set to three rows, hence the window function will count only three rows
Partitions
Use the PARTITION BY clause to define a partition or group of rows to perform an aggregation over.
select ename, deptno, count(*) over(partition by deptno) as cnt from emp order by 2
Additionally, what’s nice about the PARTITION BY clause is that it performs its computations independently of other window functions, partitioning by different columns in the same SELECT statement.
select ename, deptno, count(*) over(partition by deptno) as dept_cnt, job, count(*) over(partition by job) as job_cnt from emp order by 2
By now it should be clear that the PARTITION BY clause works like a GROUP BY clause, but it does so without being affected by the other items in the SELECT clause and without requiring you to write a GROUP BY clause.
When Order Matters
select deptno, ename, hiredate, sal, sum(sal)over(partition by deptno) as total1, sum(sal)over() as total2, sum(sal)over(order by hiredate) as running_total from emp where deptno=10
why did including an ORDER BY in the OVER clause create a running total in the first place? The reason is, when you use ORDER BY in the OVER clause you are specify a default “moving” or “sliding” window within the partition even though you don’t see it. The ORDER BY HIREDATE clause terminates summation at the HIREDATE in the current row.
---Appendix B Rozenshtein Revisited---
Rozenshtein’s book, The Essence of SQL (Coriolis Group Books) was the first book I purchased on SQL that was not required by a class. It was from that book that I learned how to “think in SQL.” To this day I attribute much of my understanding of how SQL works to David’s book. It truly is different from any other SQL book I’ve read, and I’m grateful that it was the first one I picked up on my own volition. Appendix B focuses on some of the queries presented in The Essence of SQL, and provides alternative solutions using window functions (which weren’t available when The Essence of SQL was written) for those queries.
B.2. Answering Questions Involving Negation
Question 1
You want to find students who do not take CS112
MySQL and PostgreSQL
Use a CASE expression with the aggregate function MAX to flag CS112 if it exists for a particular student:
1 select s.sno,s.sname,s.age 2 from student s left join take t 3 on (s.sno = t.sno) 4 group by s.sno,s.sname,s.age 5 having max(case when t.cno = 'CS112' then 1 else 0 end) = 0
DB2 and SQL Server
Use a CASE expression with the window function MAX OVER to flag CS112 if it exists for a particular student:
1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 max(case when t.cno = 'CS112' then 1 else 0 end) 5 over(partition by s.sno,s.sname,s.age) as takes_CS112 6 from student s left join take t 7 on (s.sno = t.sno) 8 ) x 9 where takes_CS112 = 0
Discussion
Despite the different syntax for each solution, the technique is the same. The idea is to create a “Boolean” column in the result set to denote whether or not a student takes CS112. If a student takes CS112, then return 1 in that column; otherwise, return 0.
B.3. Answering Questions Involving “at Most”
Question 4
You want to find the students who take at most two courses.
MySQL and PostgreSQL
Use the aggregate function COUNT to determine which students take no more than two courses:
1 select s.sno,s.sname,s.age 2 from student s, take t 3 where s.sno = t.sno 4 group by s.sno,s.sname,s.age 5 having count(*) <= 2
DB2, Oracle, and SQL Server
Use the window function COUNT OVER, again to determine which students take no more than two courses:
1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 count(*) over ( 5 partition by s.sno,s.sname,s.age 6 ) as cnt 7 from student s, take t 8 where s.sno = t.sno 9 ) x 10 where cnt <= 2
Question 5
You want to find students who are older than at most two other students.
MySQL and PostgreSQL
Use the aggregate function COUNT and a correlated subquery to find the students who are older than zero, one, or two other students:
1 select s1.* 2 from student s1 3 where 2 >= ( select count(*) 4 from student s2 5 where s2.age < s1.age )
DB2, Oracle, and SQL Server
Use the window function DENSE_RANK to find the students who are older than zero, one, or two other students:
1 select sno,sname,age 2 from ( 3 select sno,sname,age, 4 dense_rank()over(order by age) as dr 5 from student 6 ) x 7 where dr <= 3