CMU15-445/645 2019 Advanced SQL

Advanced SQL

在 Relational Model 下构建查询语句的方式分为两种:Procedural 和 Non-Procedural。第一节课中已经介绍了 Relational Algebra,它属于 Procedural 类型,而本节将介绍的 SQL 属于 Non-Procedural 类型。使用 SQL 构建查询时,用户只需要指定它们想要的数据,而不需要关心数据获取的方式,DBMS 负责理解用户的查询语义,选择最优的方式构建查询计划。

SQL 的历史

  • "SEQUAL" from IBM's System R prototype

    - Structured English Query Language

    - Adopted by Oracle in the 1970s

  • IBM releases DB2 in 1983

  • ANSI Standard in 1986. ISO in 1987

    - Structured Query Language

当前 SQL 的标准是 SQL 2016,而目前大部分 DBMSs 至少支持 SQL-92 标准,具体的系统对比信息可以到这里查询。

SQL主要特性

摘要

  • Aggregations + Group By

  • String / Date / Time Operations

  • Output Control + Redirection

  • Nested Queries

  • Common Table Expressions

  • Window Functions

本节使用的示例数据库如下所示:

student(sid, name, login, gpa)

sid name login age gpa
53666 Kanye kayne@cs 39 4.0
53668 Bieber jbieber@cs 22 3.9

enrolled(sid, cid, grade)

sid cid grade
53666 15-445 C
53688 15-721 A

course(cid, name)

cid name
15-445 Database Systems
15-721 Advanced Database Systems

Aggregates

聚合函数有如下:

  • AVG(col)

  • MIN(col)

  • MAX(col)

  • SUM(col)

  • COUNT(col)

SELECT COUNT(login) AS cnt
  FROM student WHERE login LIKE '%@cs';

SELECT COUNT(*) AS cnt
  FROM student WHERE login LIKE '%@cs';

SELECT COUNT(1) AS cnt
  FROM student WHERE login LIKE '%@cs';
SELECT AVG(gpa), COUNT(sid)
  FROM student WHERE login LIKE '%@cs';
SELECT COUNT(DISTINCT login)
  FROM student WHERE login LIKE '%@cs';

aggregate不能与其他查询行昏庸

String Operations

image

如在 condition 中判断两个字符串忽略大小写后是否相等:

/* SQL-92 */
WHERE UPPER(name) = UPPER('KaNyE')

/* MySQL */
WHERE name = "KaNyE"

String Matching

SELECT * FROM enrolled AS e
 WHERE e.cid LIKE '15-%';

SELECT * FROM student AS s
 WHERE s.login LIKE '%@c_';

String Operations

SQL-92 定义了一些 string 函数,如

SELECT SUBSTRING(name, 0, 5) AS abbrv_name
  FROM student WHERE sid = 53688;

SELECT * FROM student AS s
 WHERE UPPER(e.name) LIKE 'KAN%';

不同 DBMS 有不同的 string 函数(没有完全遵从 SQL-92 标准),如连接两个 strings

/* SQL-92 */
SELECT name FROM student
 WHERE login = LOWER(name) || '@cs';

/* MySQL */
SELECT name FROM student
 WHERE login = LOWER(name) + '@cs';
SELECT name FROM student
 WHERE login = CONCAT(LOWER(name), '@cs')

Data/Time Operations

不同的DBMS的定义用法不一样。这种不同标准带来的问题非常明显

#---------postgresql---------
SELECT NOW();
#----------------------------

#-----------mysql------------
SELECT NOW();
#----------------------------

#-----------sqlite-----------
SELECT CURRENT_TIMESTAMP;
#----------------------------
#---------postgresql---------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days;# 240 Correct result#----------------------------#-----------mysql------------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days; # 728 error resultSELECT ROUND((UNIX_TIMESTAMP(DATE('2018-08-29')) -  (UNIX_TIMESTAMP(DATE('2018-01-01'))) / (60*60*24), 0) AS days;# 240 Correct resultSELECT DATEDIFF(DATE('2018-08-29'), DATE('2018-01-01')) AS days;#----------------------------#-----------sqlite-----------SELECT DATE('2018-08-29') - DATE('2018-01-01') AS days;# 0 error resultSELECT CAST((julianday(CURRENT_TIMESTAMP) - julianday('2018-01-01')) AS INT) AS days;#----------------------------

Output Redirection

将查询的结果存在另外的表中

  • 表必须是已经存在的表

  • 表必须有输入相同的列数和数据类型

#---------SQL-92.sql---------SELECT DISTINCT cid INTO CourseIds  FROM enrolled;INSERT INTO CourseIds (    SELECT DISTINCT cid FROM enrolled);#----------------------------#-----------mysql------------CREATE TABLE CourseIds (    SELECT DISTINCT cid FROM enrolled);#----------------------------

Output Control

Order By

语法为ORDER BY <column*> [ASC|DESC] 默认为升序

SELECT sid, grade FROM enrolled WHERE cid = '15-721' ORDER BY grade;

按多个字段分别排序

SELECT sid FROM enrolled WHERE cid = '15-721' ORDER BY grade DESC, sid ASC;

同时可以在Order BY中放入表达式

Limit

语法为LIMIT <count> [offset]

这里的offset是要跳过前offset条数据再显示数据

SELECT sid, name FROM student WHERE login LIKE '%@cs' LIMIT 10;SELECT sid, name FROM student WHERE login LIKE '%@cs' LIMIT 20 OFFSET 10;

NESTED QUERIES

nested queries 包含 inner queries 和 outer queries,前者可以出现在 query 的任何位置,且 inner queries 可以引用 outer queries 中的表信息。

例 1:获取所有参与 '15-445' 这门课所有学生的姓名:

SELECT name FROM student WHERE sid IN (   SELECT sid FROM enrolled    WHERE cid = '15-445' );SELECT (SELECT S.name FROM student AS S         WHERE S.sid = E.sid) AS sname  FROM enrolled AS E WHERE cid = '15-445';

语法中支持的谓词包括:

  • ALL: 所有 inner queries 返回的记录都必须满足条件

  • ANY :任意 inner queries 返回的记录满足条件即可

  • IN :与 ANY 等价

  • EXISTS :inner queries 返回的表不为空

SELECT name FROM student WHERE sid = ANY(   SELECT sid FROM enrolled    WHERE cid = '15-445' );

还可以这样写

SELECT (SELECT S.name FROM student AS S WHERE S.sid = E.sid) AS sname FROM enrolled AS EWHERE cid = '15-445'

例 2:找到至少参与一门课程的所有学生中,id 最大的

SELECT sid, name FROM student WHERE sid >= ALL (   SELECT sid FROM enrolled );SELECT sid, name FROM student WHERE sid IN (   SELECT MAX(sid) FROM enrolled );SELECT sid, name FROM student WHERE sid IN (   SELECT sid FROM enrolled    ORDER BY sid DESC LIMIT 1 );

例 3:找到所有没有学生参与的课程

SELECT * FROM course WHERE NOT EXISTS (   SELECT * FROM enrolled    WHERE course.cid = enrolled.cid );

Window Functions

主要与聚合函数的区别是在保留原有数据的前提下,增添了一列作为计算的结果.

两类特殊的windows function

  • ROW_NUMBER() 当前的行数

  • RANK() 当前的顺序位置

例1

SELECT *, ROW_NUMBER() OVER () AS row_num  FROM enrolled;

得到如下表

image

OVER关键字可以将window function计算出的tuples分组

SELECT cid, sid,ROW_NUMBER() OVER (PARTITION BY cid)  FROM enrolled ORDER BY cid;

可以得到如下表

image

例 3:找到每门课获得最高分的学生

SELECT * FROM (  #rank()生成了排序后的排名  SELECT *, #遍历enrolled按照cid分组后以grade升序排列         RANK() OVER (PARTITION BY cid ORDER BY grade ASC)          AS rank    FROM enrolled  ) AS rankingWHERE ranking.rank = 1

Common Table Expreesion

提供一种在你大型查询时的方法替换掉繁琐的嵌套查询

WITH cteName AS (    SELECT 1)SELECT * FROM cteNameWITH cteName (col1, col2) AS (    SELECT 1, 2)SELECT col1 + col2 FROM cteName

例 1:找到所有参与课程的学生中 id 最大的

WITH cteSource(maxId) AS (    SELECT MAX(sid) FROM enrolled)SELECT name FROM student, cteSource WHERE student.sid = cteSource.maxId

利用递归关键字RECURSIVE来打印 1-10

WITH RECURSIVE cteSource (counter) AS (    (SELECT 1)    UNION ALL    (SELECT counter + 1 FROM cteSource      WHERE counter < 10))SELECT * FROM cteSource;

递归的语法为

WITH RECURSIVE name_cte AS (SELECT statement  /* non-recursive statement */UNION [ALL]SELECT statement  /*recursive statement referencing the above select statement */)SELECT * FROM name_cte;

SQLite递归查询与PostgreSQL 类似,过程如下:

  1. 执行非递归部分并创建临时表

  2. 执行递归部分并增加结果至临时表

  3. 重复第二步直到工作表为空

参考链接:

https://blog.csdn.net/neweastsun/article/details/119987245

https://zhenghe.gitbook.io/open-courses/cmu-15-445-645-database-systems/advanced-sql#common-table-expressions

posted @ 2021-09-30 22:20  lizhi334  阅读(89)  评论(0编辑  收藏  举报