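MySQL Crash Course #06# Chapter 13, 14: GROUP BY and Subqueries
Contents
- Understanding GROUP BY
- Filtering data vs. filtering groups
- The unwritten rule of GROUP BY and ORDER BY
- Subqueries vs. joins
- Correlated vs. uncorrelated subqueries; building complex queries incrementally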
- Always More Than One Solution As explained earlier in this chapter, although the sample code shown here works, it is often not the most efficient way to perform this type of data retrieval. You will revisit this example in a later chapter.
Understanding Data Grouping
mysql> SELECT COUNT(*) AS num_prods
    -> FROM products
    -> WHERE vend_id=1003;
+-----------+
| num_prods |
+-----------+
|         7 |
+-----------+
1 row in set (0.00 sec)
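By changing the value that vend_id is compared with in the WHERE clause (1003, 1004, 1005, ...) we can get the product count of each vendor one at a time, but there is no way to list them all at once. GROUP BY solves exactly this problem: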
mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> GROUP BY vend_id;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1001 |         3 |
|    1002 |         2 |
|    1003 |         7 |
|    1005 |         2 |
+---------+-----------+
4 rows in set (0.00 sec)
Grouping divides the data into multiple logical groups so that an aggregate calculation can be performed on each group.
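The difference between one query per vendor and a single grouped query can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the product ids are placeholders invented so that the per-vendor counts match the output shown above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; prod_id values are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_id TEXT PRIMARY KEY, vend_id INTEGER)")
vendor_counts = {1001: 3, 1002: 2, 1003: 7, 1005: 2}
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(f"P{vend}-{i}", vend) for vend, n in vendor_counts.items() for i in range(n)],
)

# Without GROUP BY: one COUNT(*) query per vendor id.
one_by_one = [
    (vend, conn.execute(
        "SELECT COUNT(*) FROM products WHERE vend_id = ?", (vend,)
    ).fetchone()[0])
    for vend in sorted(vendor_counts)
]

# With GROUP BY: a single query returns one row per vendor.
grouped = conn.execute(
    "SELECT vend_id, COUNT(*) AS num_prods "
    "FROM products GROUP BY vend_id ORDER BY vend_id"
).fetchall()

print(grouped)  # [(1001, 3), (1002, 2), (1003, 7), (1005, 2)]
```

Both approaches give the same numbers; the grouped form simply needs no prior knowledge of which vendor ids exist.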
mysql> SELECT vend_id, COUNT(*) AS num_prods
    -> FROM products
    -> GROUP BY vend_id WITH ROLLUP;
+---------+-----------+
| vend_id | num_prods |
+---------+-----------+
|    1001 |         3 |
|    1002 |         2 |
|    1003 |         7 |
|    1005 |         2 |
|    NULL |        14 |
+---------+-----------+
5 rows in set (0.00 sec)
↑ With the WITH ROLLUP keyword you also get the grand total (the NULL row) along with the per-group values.
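To see where the extra NULL row comes from, the summary row can be rebuilt by hand. This sketch does not use WITH ROLLUP (SQLite, used here as a stand-in for MySQL, does not support it); it swaps in a UNION ALL of the grouped counts and an overall count. The product ids are again placeholders.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; WITH ROLLUP is emulated with UNION ALL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_id TEXT PRIMARY KEY, vend_id INTEGER)")
vendor_counts = {1001: 3, 1002: 2, 1003: 7, 1005: 2}
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(f"P{vend}-{i}", vend) for vend, n in vendor_counts.items() for i in range(n)],
)

rollup_rows = conn.execute("""
    SELECT vend_id, num_prods FROM (
        SELECT vend_id, COUNT(*) AS num_prods FROM products GROUP BY vend_id
        UNION ALL
        SELECT NULL, COUNT(*) FROM products   -- the summary row
    )
    ORDER BY vend_id IS NULL, vend_id         -- keep the summary row last
""").fetchall()

for row in rollup_rows:
    print(row)
```

The last row pairs a NULL vend_id with the total of all groups, which is the row WITH ROLLUP adds in the MySQL output above.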
The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.
Filtering Groups
mysql> SELECT cust_id, COUNT(*) AS orders
    -> FROM orders
    -> GROUP BY cust_id
    -> HAVING COUNT(*) >= 2;
+---------+--------+
| cust_id | orders |
+---------+--------+
|   10001 |      2 |
+---------+--------+
1 row in set (0.00 sec)
If there is a WHERE clause, it must come before GROUP BY.
WHERE filters before data is grouped, and HAVING filters after data is grouped.
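The ordering of WHERE and HAVING changes the result, which a small experiment makes visible. This is a sketch on SQLite (standing in for MySQL); orders 20005 to 20007 come from the sample rows in these notes, while order 20009 for customer 10001 is an assumption added so that this customer has two orders.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; order 20009 is an assumed sample row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, cust_id INTEGER);
INSERT INTO orders VALUES
  (20005, 10001), (20006, 10003), (20007, 10004), (20009, 10001);
""")

# HAVING alone: groups are formed from all rows, then filtered.
having_only = conn.execute("""
    SELECT cust_id, COUNT(*) AS num_orders
    FROM orders
    GROUP BY cust_id
    HAVING COUNT(*) >= 2
""").fetchall()

# WHERE first removes order 20005, so customer 10001 is left with one row
# and no longer passes the HAVING test.
where_then_having = conn.execute("""
    SELECT cust_id, COUNT(*) AS num_orders
    FROM orders
    WHERE order_num >= 20006
    GROUP BY cust_id
    HAVING COUNT(*) >= 2
""").fetchall()

print(having_only)        # [(10001, 2)]
print(where_then_having)  # []
```

Rows eliminated by WHERE never reach the groups, so they cannot contribute to any aggregate that HAVING tests.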
Grouping and Sorting
mysql> SELECT order_num, SUM(quantity*item_price) AS ordertotal
    -> FROM orderitems
    -> GROUP BY order_num
    -> HAVING SUM(quantity*item_price) >= 50
    -> ORDER BY ordertotal; # Finally, the output is sorted using the ORDER BY clause.
+-----------+------------+
| order_num | ordertotal |
+-----------+------------+
|     20006 |      55.00 |
|     20008 |     125.00 |
|     20005 |     149.87 |
|     20007 |    1000.00 |
+-----------+------------+
4 rows in set (0.00 sec)
Don't Forget ORDER BY As a rule, anytime you use a GROUP BY clause, you should also specify an ORDER BY clause. That is the only way to ensure that data is sorted properly. Never rely on GROUP BY to sort your data.
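In short, whenever you use GROUP BY it is best to add an ORDER BY as well, unless you do not care about the order at all.
For GROUP BY on multiple columns, see this article.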
Understanding Subqueries
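Database design follows normal forms, and the basic way to normalize is to split tables, so data inevitably ends up spread across several tables. In many cases a subquery makes things simpler. Here is a simple example:
The tables used below ↓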
mysql> SELECT *
    -> FROM orders
    -> LIMIT 3;
+-----------+---------------------+---------+
| order_num | order_date          | cust_id |
+-----------+---------------------+---------+
|     20005 | 2005-09-01 00:00:00 |   10001 |
|     20006 | 2005-09-12 00:00:00 |   10003 |
|     20007 | 2005-09-30 00:00:00 |   10004 |
+-----------+---------------------+---------+
3 rows in set (0.00 sec)

mysql> SELECT *
    -> FROM orderitems
    -> LIMIT 3;
+-----------+------------+---------+----------+------------+
| order_num | order_item | prod_id | quantity | item_price |
+-----------+------------+---------+----------+------------+
|     20005 |          1 | ANV01   |       10 |       5.99 |
|     20005 |          2 | ANV02   |        3 |       9.99 |
|     20005 |          3 | TNT2    |        5 |      10.00 |
+-----------+------------+---------+----------+------------+
3 rows in set (0.00 sec)

mysql> SELECT cust_id, cust_name
    -> FROM customers
    -> LIMIT 3;
+---------+-------------+
| cust_id | cust_name   |
+---------+-------------+
|   10001 | Coyote Inc. |
|   10002 | Mouse House |
|   10003 | Wascals     |
+---------+-------------+
3 rows in set (0.00 sec)
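Suppose you want a list of the customers who bought TNT2. This can be broken down into the following queries:
First find all order numbers that involve 'TNT2', then use those order numbers to find the corresponding customer IDs, and finally use the customer IDs to look up the customer information: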
mysql> SELECT order_num
    -> FROM orderitems
    -> WHERE prod_id = 'TNT2';
+-----------+
| order_num |
+-----------+
|     20005 |
|     20007 |
+-----------+
2 rows in set (0.00 sec)

mysql> SELECT cust_id
    -> FROM orders
    -> WHERE order_num IN (20005,20007);
+---------+
| cust_id |
+---------+
|   10001 |
|   10004 |
+---------+
2 rows in set (0.00 sec)
... / These separate queries can be written together as one:
mysql> SELECT cust_name, cust_contact
    -> FROM customers
    -> WHERE cust_id IN (SELECT cust_id
    ->                   FROM orders
    ->                   WHERE order_num IN (SELECT order_num
    ->                                       FROM orderitems
    ->                                       WHERE prod_id = 'TNT2'));
+----------------+--------------+
| cust_name      | cust_contact |
+----------------+--------------+
| Coyote Inc.    | Y Lee        |
| Yosemite Place | Y Sam        |
+----------------+--------------+
2 rows in set (0.00 sec)
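Reading the statement above purely in terms of fetching data: (x) the final goal is cust_name and cust_contact, so start with SELECT cust_name, cust_contact. Where from? FROM customers. Under what constraint? cust_id must belong to some set. Producing that set brings us back to step (x), and the process repeats, one level of nesting at a time ...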
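The nested statement can be tried end to end with a small sketch. It assumes SQLite as a stand-in for MySQL and a trimmed-down schema; the rows follow the sample output in these notes, except that the contacts not shown there are left NULL and the product on order 20006 is a placeholder.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; 'OTHER' and the NULL contacts
# are placeholders, the remaining rows follow the sample output above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_contact TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, cust_id INTEGER);
CREATE TABLE orderitems (order_num INTEGER, order_item INTEGER, prod_id TEXT);
INSERT INTO customers VALUES
  (10001, 'Coyote Inc.', 'Y Lee'), (10002, 'Mouse House', NULL),
  (10003, 'Wascals', NULL), (10004, 'Yosemite Place', 'Y Sam');
INSERT INTO orders VALUES (20005, 10001), (20006, 10003), (20007, 10004);
INSERT INTO orderitems VALUES
  (20005, 1, 'ANV01'), (20005, 2, 'ANV02'), (20005, 3, 'TNT2'),
  (20006, 1, 'OTHER'), (20007, 1, 'TNT2');
""")

# Each level answers "which set must the column belong to?" with another SELECT.
tnt2_customers = conn.execute("""
    SELECT cust_name, cust_contact
    FROM customers
    WHERE cust_id IN (SELECT cust_id
                      FROM orders
                      WHERE order_num IN (SELECT order_num
                                          FROM orderitems
                                          WHERE prod_id = 'TNT2'))
    ORDER BY cust_name
""").fetchall()

print(tnt2_customers)  # [('Coyote Inc.', 'Y Lee'), ('Yosemite Place', 'Y Sam')]
```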
On efficiency: Subqueries and Performance The code shown here works, and it achieves the desired result. However, using subqueries is not always the most efficient way to perform this type of data retrieval, although it might be. More on this is in Chapter 15, "Joining Tables," where you will revisit this same example.
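For comparison, the same customer list can be produced with a join instead of nested subqueries. This is only a sketch of the alternative the note points to, not the book's Chapter 15 code; it assumes SQLite as a stand-in for MySQL, and the placeholder rows ('OTHER', NULL contacts) are invented. No performance claim is made here, the sketch only checks that both forms return the same rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; 'OTHER' and the NULL contacts are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_contact TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, cust_id INTEGER);
CREATE TABLE orderitems (order_num INTEGER, order_item INTEGER, prod_id TEXT);
INSERT INTO customers VALUES
  (10001, 'Coyote Inc.', 'Y Lee'), (10002, 'Mouse House', NULL),
  (10003, 'Wascals', NULL), (10004, 'Yosemite Place', 'Y Sam');
INSERT INTO orders VALUES (20005, 10001), (20006, 10003), (20007, 10004);
INSERT INTO orderitems VALUES
  (20005, 1, 'ANV01'), (20005, 2, 'ANV02'), (20005, 3, 'TNT2'),
  (20006, 1, 'OTHER'), (20007, 1, 'TNT2');
""")

via_subquery = conn.execute("""
    SELECT cust_name, cust_contact FROM customers
    WHERE cust_id IN (SELECT cust_id FROM orders
                      WHERE order_num IN (SELECT order_num FROM orderitems
                                          WHERE prod_id = 'TNT2'))
    ORDER BY cust_name
""").fetchall()

# DISTINCT guards against duplicates if a customer bought TNT2 on several orders.
via_join = conn.execute("""
    SELECT DISTINCT c.cust_name, c.cust_contact
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    JOIN orderitems AS oi ON oi.order_num = o.order_num
    WHERE oi.prod_id = 'TNT2'
    ORDER BY c.cust_name
""").fetchall()

print(via_join == via_subquery)  # True
```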
Using Subqueries As Calculated Fields
A correlated subquery is like a nested for loop ...
SELECT cust_name,
       cust_state,
       (SELECT COUNT(*)
        FROM orders
        WHERE orders.cust_id = customers.cust_id) AS orders
FROM customers
ORDER BY cust_name;
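Fully qualified column names are required here (this can be seen as a hallmark of correlated subqueries); otherwise MySQL reads cust_id = cust_id as the inner table's column being compared with itself.
The outer query runs the subquery once for every row it finds. This is similar to SELECT x - 2017, where a computation (here, subtracting 2017) is performed for each row found.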
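Both points, the need for qualified names and the nested-loop analogy, can be checked with a short sketch. It assumes SQLite as a stand-in for MySQL (its name resolution behaves the same way for this case), drops the cust_state column because no state values appear in these notes, and adds an assumed order 20009 for customer 10001.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; order 20009 is an assumed sample row
# and cust_state is omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, cust_id INTEGER);
INSERT INTO customers VALUES
  (10001, 'Coyote Inc.'), (10002, 'Mouse House'),
  (10003, 'Wascals'), (10004, 'Yosemite Place');
INSERT INTO orders VALUES
  (20005, 10001), (20006, 10003), (20007, 10004), (20009, 10001);
""")

# Qualified names: the subquery is correlated with the current outer row.
qualified = conn.execute("""
    SELECT cust_name,
           (SELECT COUNT(*) FROM orders
            WHERE orders.cust_id = customers.cust_id) AS num_orders
    FROM customers ORDER BY cust_name
""").fetchall()

# Unqualified names: both sides resolve to orders.cust_id, so every order matches.
unqualified = conn.execute("""
    SELECT cust_name,
           (SELECT COUNT(*) FROM orders WHERE cust_id = cust_id) AS num_orders
    FROM customers ORDER BY cust_name
""").fetchall()

# The nested-for-loop reading of the correlated subquery.
customer_rows = conn.execute(
    "SELECT cust_id, cust_name FROM customers ORDER BY cust_name").fetchall()
order_cust_ids = [row[0] for row in conn.execute("SELECT cust_id FROM orders")]
looped = []
for cust_id, cust_name in customer_rows:       # outer query: one pass per customer
    count = 0
    for order_cust_id in order_cust_ids:       # inner query: scan the orders
        if order_cust_id == cust_id:
            count += 1
    looped.append((cust_name, count))

print(qualified)
print(unqualified)
```

With unqualified names every customer is reported with the total number of orders in the table, which is the self-comparison described above.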
Build Queries with Subqueries Incrementally Testing and debugging queries with subqueries can be tricky, particularly as these statements grow in complexity. The safest way to build (and test) queries with subqueries is to do so incrementally, in much the same way as MySQL processes them. Build and test the innermost query first. Then build and test the outer query with hard-coded data, and only after you have verified that it is working embed the subquery. Then test it again. And keep repeating these steps as for each additional query. This will take just a little longer to construct your queries, but doing so saves you lots of time later (when you try to figure out why queries are not working) and significantly increases the likelihood of them working the first time.
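The incremental procedure can be written down as a sequence of checks. This is a sketch only, assuming SQLite as a stand-in for MySQL and the sample rows from these notes ('OTHER' and the NULL contacts are placeholders): each step is verified with hard-coded values before the subquery is embedded.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; 'OTHER' and the NULL contacts are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_contact TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, cust_id INTEGER);
CREATE TABLE orderitems (order_num INTEGER, order_item INTEGER, prod_id TEXT);
INSERT INTO customers VALUES
  (10001, 'Coyote Inc.', 'Y Lee'), (10002, 'Mouse House', NULL),
  (10003, 'Wascals', NULL), (10004, 'Yosemite Place', 'Y Sam');
INSERT INTO orders VALUES (20005, 10001), (20006, 10003), (20007, 10004);
INSERT INTO orderitems VALUES
  (20005, 1, 'ANV01'), (20005, 2, 'ANV02'), (20005, 3, 'TNT2'),
  (20006, 1, 'OTHER'), (20007, 1, 'TNT2');
""")

def column(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Step 1: build and test the innermost query on its own.
inner = "SELECT order_num FROM orderitems WHERE prod_id = 'TNT2'"
step1 = column(inner)

# Step 2: test the next level with hard-coded data, then embed the subquery.
step2_hard = column("SELECT cust_id FROM orders WHERE order_num IN (20005, 20007)")
middle = f"SELECT cust_id FROM orders WHERE order_num IN ({inner})"
step2_embedded = column(middle)

# Step 3: repeat for the outermost query.
step3_hard = column("SELECT cust_name FROM customers WHERE cust_id IN (10001, 10004)")
step3_embedded = column(f"SELECT cust_name FROM customers WHERE cust_id IN ({middle})")

print(step1, step2_embedded, step3_embedded)
```

If the hard-coded and embedded variants of a step disagree, the problem lies in the subquery that was just embedded, which narrows the search to one level.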