CMU15-445/645 2019 Advanced SQL
Advanced SQL
在 Relational Model 下构建查询语句的方式分为两种:Procedural 和 Non-Procedural。第一节课中已经介绍了 Relational Algebra,它属于 Procedural 类型,而本节将介绍的 SQL 属于 Non-Procedural 类型。使用 SQL 构建查询时,用户只需要指定它们想要的数据,而不需要关心数据获取的方式,DBMS 负责理解用户的查询语义,选择最优的方式构建查询计划。
SQL 的历史
- "SEQUAL" from IBM's System R prototype
- Structured English Query Language
- Adopted by Oracle in the 1970s
-
IBM releases DB2 in 1983
-
ANSI Standard in 1986. ISO in 1987
- Structured Query Language
当前 SQL 的标准是 SQL 2016,而目前大部分 DBMSs 至少支持 SQL-92 标准,具体的系统对比信息可以到这里查询。
SQL主要特性
摘要
-
Aggregations + Group By
-
String / Date / Time Operations
-
Output Control + Redirection
-
Nested Queries
-
Common Table Expressions
-
Window Functions
本节使用的示例数据库如下所示:
student(sid, name, login, gpa)
sid | name | login | age | gpa |
53666 | Kanye | kayne@cs | 39 | 4.0 |
53668 | Bieber | jbieber@cs | 22 | 3.9 |
enrolled(sid, cid, grade)
sid | cid | grade |
53666 | 15-445 | C |
53688 | 15-721 | A |
course(cid, name)
cid | name |
15-445 | Database Systems |
15-721 | Advanced Database Systems |
Aggregates
聚合函数有如下:
-
AVG(col)
-
MIN(col)
-
MAX(col)
-
SUM(col)
-
COUNT(col)
SELECT COUNT(login) AS cnt
FROM student WHERE login LIKE '%@cs';
SELECT COUNT(*) AS cnt
FROM student WHERE login LIKE '%@cs';
SELECT COUNT(1) AS cnt
FROM student WHERE login LIKE '%@cs';
SELECT AVG(gpa), COUNT(sid)
FROM student WHERE login LIKE '%@cs';
SELECT COUNT(DISTINCT login)
FROM student WHERE login LIKE '%@cs';
aggregate不能与其他查询行昏庸
String Operations
如在 condition 中判断两个字符串忽略大小写后是否相等:
/* SQL-92 */
WHERE UPPER(name) = UPPER('KaNyE')
/* MySQL */
WHERE name = "KaNyE"
String Matching
SELECT * FROM enrolled AS e
WHERE e.cid LIKE '15-%';
SELECT * FROM student AS s
WHERE s.login LIKE '%@c_';
String Operations
SQL-92 定义了一些 string 函数,如
SELECT SUBSTRING(name, 0, 5) AS abbrv_name
FROM student WHERE sid = 53688;
SELECT * FROM student AS s
WHERE UPPER(e.name) LIKE 'KAN%';
不同 DBMS 有不同的 string 函数(没有完全遵从 SQL-92 标准),如连接两个 strings
/* SQL-92 */
SELECT name FROM student
WHERE login = LOWER(name) || '@cs';
/* MySQL */
SELECT name FROM student
WHERE login = LOWER(name) + '@cs';
SELECT name FROM student
WHERE login = CONCAT(LOWER(name), '@cs')
Data/Time Operations
不同的DBMS的定义用法不一样。这种不同标准带来的问题非常明显
#---------postgresql---------
SELECT NOW();
#----------------------------
#-----------mysql------------
SELECT NOW();
#----------------------------
#-----------sqlite-----------
SELECT CURRENT_TIMESTAMP;
#----------------------------
#---------postgresql---------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days;# 240 Correct result#----------------------------#-----------mysql------------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days; # 728 error resultSELECT ROUND((UNIX_TIMESTAMP(DATE('2018-08-29')) - (UNIX_TIMESTAMP(DATE('2018-01-01'))) / (60*60*24), 0) AS days;# 240 Correct resultSELECT DATEDIFF(DATE('2018-08-29'), DATE('2018-01-01')) AS days;#----------------------------#-----------sqlite-----------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days;# 0 error resultSELECT CAST((julianday(CURRENT_TIMESTAMP) - julianday('2018-01-01')) AS INT) AS days;#----------------------------
Output Redirection
将查询的结果存在另外的表中
-
表必须是已经存在的表
-
表必须有输入相同的列数和数据类型
#---------SQL-92.sql---------SELECT DISTINCT cid INTO CourseIds FROM enrolled;INSERT INTO CourseIds ( SELECT DISTINCT cid FROM enrolled);#----------------------------#-----------mysql------------CREATE TABLE CourseIds ( SELECT DISTINCT cid FROM enrolled);#----------------------------
Output Control
Order By
语法为ORDER BY <column*> [ASC|DESC]
默认为升序
SELECT sid, grade FROM enrolled WHERE cid = '15-721' ORDER BY grade;
按多个字段分别排序
SELECT sid FROM enrolled WHERE cid = '15-721' ORDER BY grade DESC, sid ASC;
同时可以在Order BY中放入表达式
Limit
语法为LIMIT <count> [offset]
这里的offset是要跳过前offset条数据再显示数据
SELECT sid, name FROM student WHERE login LIKE '%@cs' LIMIT 10;SELECT sid, name FROM student WHERE login LIKE '%@cs' LIMIT 20 OFFSET 10;
NESTED QUERIES
nested queries 包含 inner queries 和 outer queries,前者可以出现在 query 的任何位置,且 inner queries 可以引用 outer queries 中的表信息。
例 1:获取所有参与 '15-445' 这门课所有学生的姓名:
SELECT name FROM student WHERE sid IN ( SELECT sid FROM enrolled WHERE cid = '15-445' );SELECT (SELECT S.name FROM student AS S WHERE S.sid = E.sid) AS sname FROM enrolled AS E WHERE cid = '15-445';
语法中支持的谓词包括:
-
ALL: 所有 inner queries 返回的记录都必须满足条件
-
ANY :任意 inner queries 返回的记录满足条件即可
-
IN :与 ANY 等价
-
EXISTS :inner queries 返回的表不为空
SELECT name FROM student WHERE sid = ANY( SELECT sid FROM enrolled WHERE cid = '15-445' );
还可以这样写
SELECT (SELECT S.name FROM student AS S WHERE S.sid = E.sid) AS sname FROM enrolled AS EWHERE cid = '15-445'
例 2:找到至少参与一门课程的所有学生中,id 最大的
SELECT sid, name FROM student WHERE sid >= ALL ( SELECT sid FROM enrolled );SELECT sid, name FROM student WHERE sid IN ( SELECT MAX(sid) FROM enrolled );SELECT sid, name FROM student WHERE sid IN ( SELECT sid FROM enrolled ORDER BY sid DESC LIMIT 1 );
例 3:找到所有没有学生参与的课程
SELECT * FROM course WHERE NOT EXISTS ( SELECT * FROM enrolled WHERE course.cid = enrolled.cid );
Window Functions
主要与聚合函数的区别是在保留原有数据的前提下,增添了一列作为计算的结果.
两类特殊的windows function
-
ROW_NUMBER() 当前的行数
-
RANK() 当前的顺序位置
例1
SELECT *, ROW_NUMBER() OVER () AS row_num FROM enrolled;
得到如下表
OVER关键字可以将window function计算出的tuples分组
SELECT cid, sid,ROW_NUMBER() OVER (PARTITION BY cid) FROM enrolled ORDER BY cid;
可以得到如下表
例 3:找到每门课获得最高分的学生
SELECT * FROM ( #rank()生成了排序后的排名 SELECT *, #遍历enrolled按照cid分组后以grade升序排列 RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS rank FROM enrolled ) AS rankingWHERE ranking.rank = 1
Common Table Expreesion
提供一种在你大型查询时的方法替换掉繁琐的嵌套查询
WITH cteName AS ( SELECT 1)SELECT * FROM cteNameWITH cteName (col1, col2) AS ( SELECT 1, 2)SELECT col1 + col2 FROM cteName
例 1:找到所有参与课程的学生中 id 最大的
WITH cteSource(maxId) AS ( SELECT MAX(sid) FROM enrolled)SELECT name FROM student, cteSource WHERE student.sid = cteSource.maxId
利用递归关键字RECURSIVE来打印 1-10
WITH RECURSIVE cteSource (counter) AS ( (SELECT 1) UNION ALL (SELECT counter + 1 FROM cteSource WHERE counter < 10))SELECT * FROM cteSource;
递归的语法为
WITH RECURSIVE name_cte AS (SELECT statement /* non-recursive statement */UNION [ALL]SELECT statement /*recursive statement referencing the above select statement */)SELECT * FROM name_cte;
SQLite递归查询与PostgreSQL 类似,过程如下:
-
执行非递归部分并创建临时表
-
执行递归部分并增加结果至临时表
-
重复第二步直到工作表为空
参考链接: