一次执行10天的SQL(数据库练习)

一次执行10天的SQL

原创薛晓刚四海内皆兄弟 2023-09-15 07:01 发表于上海

收录于合集

#开发理念54个

#Oracle57个

#MYSQL46个

子查询和关联查询过多的危害（从前标量子查询那篇文章的延续）

子查询过多（很有可能不少都是不必要的，或者设计导致的）会导致SQL变得很慢。下面实验采用MySQL演示但是结论适用于MySQL、PostgreSQL以及Oracle都所有关系型数据库。

创建两张样例表，a和b。a表的id列与b表的aid作为关联条件。

mysql> create table a (id int, name varchar(10));
Query OK, 0 rows affected (0.03 sec)

mysql> create table b (id int,aid int, name varchar(10),c int,s int);
Query OK, 0 rows affected (0.04 sec)

初始化数据：

mysql> insert into a values (1,'A');
Query OK, 1 row affected (0.01 sec)

mysql> insert into a values (2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (3,'C');
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (1,1,'a',1,1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (2,1,'a',1,2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (3,2,'b',1,3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (4,2,'b',1,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (5,3,'c',1,5);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (6,3,'c',1,6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (4,'D');
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (5,'E');
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (7,4,'d',1,7);
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (8,4,'d',1,8);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (9,5,'e',1,9);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (10,5,'e',1,10);
Query OK, 1 row affected (0.00 sec)

经过简单的初始化后：

mysql> select * from a;
+------+------+
| id | name |
+------+------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+------+------+
5 rows in set (0.00 sec)

mysql> select * from b;
+------+------+------+------+------+
| id | aid | name | c | s |
+------+------+------+------+------+
| 1 | 1 | a | 1 | 1 |
| 2 | 1 | a | 1 | 2 |
| 3 | 2 | b | 1 | 3 |
| 4 | 2 | b | 1 | 4 |
| 5 | 3 | c | 1 | 5 |
| 6 | 3 | c | 1 | 6 |
| 7 | 4 | d | 1 | 7 |
| 8 | 4 | d | 1 | 8 |
| 9 | 5 | e | 1 | 9 |
| 10 | 5 | e | 1 | 10 |
+------+------+------+------+------+
10 rows in set (0.00 sec)

给关联列创建索引。
mysql> create index a_id on a (id);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> create index b_id on b (aid);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

进行多个子查询（标量子查询）的查询

SQL1 SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a where a.id=1; 由于需要计算b表的字段的sum和count所以动用到了标量子查询，而且还是两个。

从后台日志中看到 SQL1 查询了5行数据返回1行。这5行是 A表的id=1扫描了1行， B表的count子查询计算时候扫描了2行，B表的sum子查询计算时候扫描了2行。

# User@Host: root[root] @ localhost [] Id: 18
# Query_time: 0.001879 Lock_time: 0.000013 Rows_sent: 1 Rows_examined: 5
SET timestamp=1694589980;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

SQL2，不带任何条件，全表查询。

从后台日志中看到 SQL2 查询了25行数据返回5行。这25行是 A表的全表扫描了5行， B表的count子查询计算时候扫描了10行，B表的sum子查询计算时候扫描了10行。

# User@Host: root[root] @ localhost [] Id: 18
# Query_time: 0.001282 Lock_time: 0.000013 Rows_sent: 5 Rows_examined: 25
SET timestamp=1694589992;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;

可见结果集越多，扫描行数成线性增加。

SQL3：由于需求又变了，需要用到b表的一个字段，于是在id=1的情况下，（与前两个不同，这次只取字段没有sum和count）继续增加子查询。变成了这样;

SELECT a.ID,a.NAME,(SELECT max(b.name) FROM b WHERE a.ID=b.AID) AS a_n,(SELECT max(c) FROM b WHERE a.ID=b.AID) AS a_c,(select max(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

凑后台日志看到SQL3：查询了7行数据返回1行。这5行是 A表的id=1扫描了1行， B表的count子查询计算时候扫描了2行，B表的sum子查询计算时候扫描了2行，B表需要name字段又扫描了2行。

# User@Host: root[root] @ localhost [] Id: 21
# Query_time: 0.004694 Lock_time: 0.000037 Rows_sent: 1 Rows_examined: 7
SET timestamp=1694670424;
SELECT a.ID,a.NAME,(SELECT max(b.name) FROM b WHERE a.ID=b.AID) AS a_n,(SELECT max(c) FROM b WHERE a.ID=b.AID) AS a_c,(select max(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

可见子查询越多，扫描行数成线性增加。

以上还是关联列有索引且用得到的情况下，而且where条件只限定返回1行，在真实环境中几乎都没有这样的理想条件。所以可能是几十万行，几百万行，甚至上亿行的扫描。比如SQL2的语句在去掉索引的环境下。

mysql> alter table a alter index a_id invisible;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table b alter index b_id invisible;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table a\G
*************************** 1. row ***************************
Table: a
Create Table: CREATE TABLE `a` (
`id` int DEFAULT NULL,
`name` varchar(10) DEFAULT NULL,
KEY `a_id` (`id`) /*!80000 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> show create table b\G
*************************** 1. row ***************************
Table: b
Create Table: CREATE TABLE `b` (
`id` int DEFAULT NULL,
`aid` int DEFAULT NULL,
`name` varchar(10) DEFAULT NULL,
`c` int DEFAULT NULL,
`s` int DEFAULT NULL,
KEY `b_id` (`aid`) /*!80000 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;
+------+------+------+------+
| ID | NAME | a_c | a_s |
+------+------+------+------+
| 1 | A | 2 | 3 |
| 2 | B | 2 | 7 |
| 3 | C | 2 | 11 |
| 4 | D | 2 | 15 |
| 5 | E | 2 | 19 |
+------+------+------+------+
5 rows in set (0.01 sec)

从后台看到的数据就是查询了105行才返回5行。比起带索引时候的25行多了4倍多的扫描行数正式环境的表如果行数多（全表5行，没有索引，多了4行就多4倍。那么如果全表1万行就是一万倍）。

# Query_time: 0.002229 Lock_time: 0.000012 Rows_sent: 5 Rows_examined: 105
SET timestamp=1694680508;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;

所以这个3500亿行的扫描就能理解了吧？

那么解决方案就是减少不必要的关联。

例如SQL3可以改写成这样。

扫描行数从7行变成了2行。

# Query_time: 0.003672 Lock_time: 0.000024 Rows_sent: 1 Rows_examined: 2
SET timestamp=1694671011;
SELECT a.ID,a.NAME,b.name, b.c,b.s from a,b where a.id=b.aid and a.id=1 limit 1;

在工作中有时候见到了太多的不能理解的无关紧要的子查询关联，但是很多开发对此问题不以为然。希望更多的人能理解优化就是能不干的就不干。