MySQL-CTE基础

CTE(Common Table Exp)

官网:https://dev.mysql.com/doc/refman/8.0/en/with.html'

CTE可以用于解决以下问题:

  1. 数据呈树状结构,要查询多层数据时,可以使用递归CTE
  2. 一个子查询的结果需要多次被引用,可以定义一个CTE,并多次对其进行引用
  3. 统计时,统计维度的部分数据不存在,可以使用递归CTE生成其默认值


1. 基本语法

CTE使用WITH关键字来进行定义

with_clause:
    WITH [RECURSIVE]
        cte_name [(col_name [, col_name] ...)] AS (subquery)
        [, cte_name [(col_name [, col_name] ...)] AS (subquery)] ...

如果一个CTE中引用了它自己的别名,那么这就是一个递归的CTE(RECURSIVE)。



2. CTE列名

  • 紧跟WITH子句定义列名

    WITH cte (col1, col2) AS
    (
      SELECT 1, 2
      UNION ALL
      SELECT 3, 4
    )
    SELECT col1, col2 FROM cte;
    
  • 在第一个SELECT子句中定义列名

    WITH cte AS
    (
      SELECT 1 AS col1, 2 AS col2
      UNION ALL
      SELECT 3, 4
    )
    SELECT col1, col2 FROM cte;
    


3. WITH子句的使用范围

  • UPDATEDELETESELECT之前

    WITH ... SELECT ...
    WITH ... UPDATE ...
    WITH ... DELETE ...
    
  • 在子查询中

    SELECT ... WHERE id IN (WITH ... SELECT ...) ...
    SELECT * FROM (WITH ... SELECT ...) AS dt ...
    
  • 特定的包含SELECT关键字的查询中,SELECT之前

    INSERT ... WITH ... SELECT ...
    REPLACE ... WITH ... SELECT ...
    CREATE TABLE ... WITH ... SELECT ...
    CREATE VIEW ... WITH ... SELECT ...
    DECLARE CURSOR ... WITH ... SELECT ...
    EXPLAIN ... WITH ... SELECT ...
    
  • 在同一层查询中,并列定义多个CTE需要用,分割

    WITH cte1 AS (...) WITH cte2 AS (...) SELECT ...   //不合法
    
    WITH cte1 AS (...), cte2 AS (...) SELECT ... //合法
    

引用其他CTE

  • 如果CTE引用自己,则成为递归CTE
  • CTE只能引用在自己之前定义的其他CTE,这个限制避免了循环引用
  • CTE可以引用外层查询的CTE,但不能引用内层查询的CTE


4. 递归CTE

递归CTE在子查询中引用它本身

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;

+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
+------+

递归CTE有以下特点:

  • 在CTE引用自身时,必须使用关键字WITH RECURSIVE。如果只使用WITH,则会报错

    ERROR 1146 (42S02): Table 'cte_name' doesn't exist
    
  • 递归CTE的子查询包含两个部分,这两个部分被UNION [ALL]或者UNION DISTINCT连接。

    • 第一个查询用于初始化数据,不会引用CTE
    • 第二个查询根据CTE生成新的数据,当这个部分的查询不会产生新数据时,递归结束
    SELECT ...      -- return initial row set
    UNION ALL
    SELECT ...      -- return additional row sets
    
  • 递归CTE返回的列和数据类型(包括长度)只根据第一部分SELECT判断,递归SELECT中不会使用类型推断

  • 使用UNION DISTINCT在一定程度上能避免死循环

  • 每次迭代的输入为上次迭代生成的数据(新生成的)。如果递归部分有多个查询块,每个查询块的迭代不一定按照顺序执行,并且每个查询块的输入为上一个查询(可能是上次迭代生成的行、可能是当前迭代的上一次查询生成的行

    Each iteration of the recursive part operates only on the rows produced by the previous iteration. If the recursive part has multiple query blocks, iterations of each query block are scheduled in unspecified order, and each query block operates on rows that have been produced either by its previous iteration or by other query blocks since that previous iteration's end.


递归执行逻辑

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte WHERE n < 5
)
SELECT * FROM cte;
  1. 第一次迭代:创建初始行,生成数据1 + 1 = 2。
  2. 第二次迭代:根据第一次迭代生成的2,生成新的数据2 + 1 = 3
  3. 直到生成的数据n=5,不满足条件不会返回任何数据,此时迭代结束

自定义返回数据类型

如果在严格模式下执行上面的SQL会报错,非严格模式下才会出现如下结果。是因为

  • 初始化的SELECT定义了str的类型,即char(3)
WITH RECURSIVE cte AS
(
  SELECT 1 AS n, 'abc' AS str
  UNION ALL
  SELECT n + 1, CONCAT(str, str) FROM cte WHERE n < 3
)
SELECT * FROM cte;

# 非严格
+------+------+
| n    | str  |
+------+------+
|    1 | abc  |
|    2 | abc  |
|    3 | abc  |
+------+------+

# 严格
ERROR 1406 (22001): Data too long for column 'str' at row 1

为了避免这个问题,需要使用CAST指定数据类型。

WITH RECURSIVE cte AS
(
  SELECT 1 AS n, CAST('abc' AS CHAR(20)) AS str
  UNION ALL
  SELECT n + 1, CONCAT(str, str) FROM cte WHERE n < 3
)
SELECT * FROM cte;

列顺序

递归SELECT中访问初始化SELECT,是使用别名访问的,因此可以不保证顺序

WITH RECURSIVE cte AS
(
  SELECT 1 AS n, 1 AS p, -1 AS q
  UNION ALL
	# q , p顺序无所谓
  SELECT n + 1, q * 2, p * 2 FROM cte WHERE n < 5
)
SELECT * FROM cte;

+------+------+------+
| n    | p    | q    |
+------+------+------+
|    1 |    1 |   -1 |
|    2 |   -2 |    2 |
|    3 |    4 |   -4 |
|    4 |   -8 |    8 |
|    5 |   16 |  -16 |
+------+------+------+

递归CTE语法约束

在递归的SELECT中不能使用以下语法:

  • 聚合函数
  • 窗口函数
  • GROUP BY
  • ORDER BY
  • DISTINCT
  • LIMIT(8.0.19之前不能用):只能使用在递归CTE中,并且当递归CTE返回行达到目标数量后,立即停止迭代
  • 递归CTE只能在FROM部分引用CTE一次,不能在子查询中再次使用CTE。但可以使用其他表,此时CTE如果需要和其他表进行JOIN,CTE不能在LEFT JOIN的右边(即CTE必须是主表)。

The recursive [SELECT](https://dev.mysql.com/doc/refman/8.0/en/select.html) part must reference the CTE only once and only in its FROM clause, not in any subquery. It can reference tables other than the CTE and join them with the CTE. If used in a join like this, the CTE must not be on the right side of a LEFT JOIN.

EXPLAIN返回分析数据为每次迭代的估计耗时,由于不知道每次迭代的数据量和迭代总次数,和实际结果可能有比较大的出入。


控制递归CTE

为了避免死循环,可以用以下方法:

设置session或global参数

  • [cte_max_recursion_depth](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_cte_max_recursion_depth):可以限制CTE的最大执行深度,其默认值是1000
  • [max_execution_time](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_execution_time):当前SESSION可执行的最长时间

在CTE中设置optimizer hint

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte
)
SELECT /*+ SET_VAR(cte_max_recursion_depth = 1M) */ * FROM cte;

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte
)
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM cte;

如果版本允许,可以使用LIMIT限制返回的最大长度

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte LIMIT 10000
)
SELECT * FROM cte;

结合使用LIMIToptimizer hint

WITH RECURSIVE cte (n) AS
(
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte LIMIT 10000
)
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM cte;

斐波拉契数列

WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS
(
  SELECT 1, 0, 1
  UNION ALL
  SELECT n + 1, next_fib_n, fib_n + next_fib_n
    FROM fibonacci WHERE n < 10
)
SELECT * FROM fibonacci;

数据补完

对数据进行统计后,日期之间存在“空洞”,想填充这个空洞可以用CTE实现

mysql> SELECT date, SUM(price) AS sum_price
       FROM sales
       GROUP BY date
       ORDER BY date;
+------------+-----------+
| date       | sum_price |
+------------+-----------+
| 2017-01-03 |    300.00 |
| 2017-01-06 |     50.00 |
| 2017-01-08 |    180.00 |
| 2017-01-10 |      5.00 |
+------------+-----------+

首先用CTE生成连续日期的数据

  • 以sales的最小日期开始
  • 每次使用dates CTE生成一条新数据
  • 直到sales的最大日期结束
WITH RECURSIVE dates (date) AS
(
  SELECT MIN(date) FROM sales
  UNION ALL
  SELECT date + INTERVAL 1 DAY FROM dates
  WHERE date + INTERVAL 1 DAY <= (SELECT MAX(date) FROM sales)
)
SELECT * FROM dates;

+------------+
| date       |
+------------+
| 2017-01-03 |
| 2017-01-04 |
| 2017-01-05 |
| 2017-01-06 |
| 2017-01-07 |
| 2017-01-08 |
| 2017-01-09 |
| 2017-01-10 |
+------------+

然后用LEFT JOIN将CTE生成的数据和原始数据关联

  • COALESCE()会返回第一个不为NULL的数据
WITH RECURSIVE dates (date) AS
(
  SELECT MIN(date) FROM sales
  UNION ALL
  SELECT date + INTERVAL 1 DAY FROM dates
  WHERE date + INTERVAL 1 DAY <= (SELECT MAX(date) FROM sales)
)
SELECT dates.date, COALESCE(SUM(price), 0) AS sum_price
FROM dates LEFT JOIN sales ON dates.date = sales.date
GROUP BY dates.date
ORDER BY dates.date;

+------------+-----------+
| date       | sum_price |
+------------+-----------+
| 2017-01-03 |    300.00 |
| 2017-01-04 |      0.00 |
| 2017-01-05 |      0.00 |
| 2017-01-06 |     50.00 |
| 2017-01-07 |      0.00 |
| 2017-01-08 |    180.00 |
| 2017-01-09 |      0.00 |
| 2017-01-10 |      5.00 |
+------------+-----------+

层级数据遍历

数据如下,Yasmina为最上层的管理者

mysql> SELECT * FROM employees ORDER BY id;
+------+---------+------------+
| id   | name    | manager_id |
+------+---------+------------+
|   29 | Pedro   |        198 |
|   72 | Pierre  |         29 |
|  123 | Adil    |        692 |
|  198 | John    |        333 |
|  333 | Yasmina |       NULL |
|  692 | Tarek   |        333 |
| 4610 | Sarah   |         29 |
+------+---------+------------+

查询每个人从上到下的路径

WITH RECURSIVE employee_paths (id, name, path) AS
(
  SELECT id, name, CAST(id AS CHAR(200))
    FROM employees
    WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, CONCAT(ep.path, ',', e.id)
    FROM employee_paths AS ep JOIN employees AS e
      ON ep.id = e.manager_id
)
SELECT * FROM employee_paths ORDER BY path;

+------+---------+-----------------+
| id   | name    | path            |
+------+---------+-----------------+
|  333 | Yasmina | 333             |
|  198 | John    | 333,198         |
|   29 | Pedro   | 333,198,29      |
| 4610 | Sarah   | 333,198,29,4610 |
|   72 | Pierre  | 333,198,29,72   |
|  692 | Tarek   | 333,692         |
|  123 | Adil    | 333,692,123     |
+------+---------+-----------------+

如果不进行排序,结果如下。可以看出使用的是广度优先遍历

+----+-------+---------------+
|id  |name   |path           |
+----+-------+---------------+
|333 |Yasmina|333            |
|198 |John   |333,198        |
|692 |Tarek  |333,692        |
|29  |Pedro  |333,198,29     |
|123 |Adil   |333,692,123    |
|72  |Pierre |333,198,29,72  |
|4610|Sarah  |333,198,29,4610|
+----+-------+---------------+


CTE和派生表对比

# CTE
WITH cte AS (SELECT 1) SELECT * FROM cte;
# 派生表
SELECT * FROM (SELECT 1) AS dt;

CTE相比于派生表有以下优势:

  • 在同一个SQL中CTE可以被引用多次,而派生表只能被引用一次。要实现同样的效果,需要多次生成派生表
  • CTE可以被自己引用,即递归CTE
  • CTE可以引用其他CTE
  • CTE的可读性更好

CTE类似于[CREATE [TEMPORARY] TABLE](https://dev.mysql.com/doc/refman/8.0/en/create-table.html),但它不需要创建表的权限。



[]()

其他资料

posted @ 2022-07-23 15:01  hikari_1994  阅读(249)  评论(0编辑  收藏  举报