一次执行10天的SQL(数据库练习)

一次执行10天的SQL

    子查询和关联查询过多的危害(从前标量子查询那篇文章的延续)    

    子查询过多(很有可能不少都是不必要的,或者设计导致的)会导致SQL变得很慢。下面实验采用MySQL演示但是结论适用于MySQL、PostgreSQL以及Oracle都所有关系型数据库。

    创建两张样例表,a和b。a表的id列与b表的aid作为关联条件。

     mysql> create table a (id int, name varchar(10));
     Query OK, 0 rows affected (0.03 sec)

     mysql> create table b (id int,aid int, name varchar(10),c int,s int);
     Query OK, 0 rows affected (0.04 sec)

 

初始化数据:

mysql> insert into a values (1,'A');
Query OK, 1 row affected (0.01 sec)

mysql> insert into a values (2,'B');
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (3,'C');
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (1,1,'a',1,1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (2,1,'a',1,2);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (3,2,'b',1,3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (4,2,'b',1,4);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (5,3,'c',1,5);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (6,3,'c',1,6);
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (4,'D');
Query OK, 1 row affected (0.00 sec)

mysql> insert into a values (5,'E');
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (7,4,'d',1,7);
Query OK, 1 row affected (0.01 sec)

mysql> insert into b values (8,4,'d',1,8);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (9,5,'e',1,9);
Query OK, 1 row affected (0.00 sec)

mysql> insert into b values (10,5,'e',1,10);
Query OK, 1 row affected (0.00 sec)

 

经过简单的初始化后:

mysql> select * from a;
+------+------+
| id | name |
+------+------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | E |
+------+------+
5 rows in set (0.00 sec)

mysql> select * from b;
+------+------+------+------+------+
| id | aid | name | c | s |
+------+------+------+------+------+
| 1 | 1 | a | 1 | 1 |
| 2 | 1 | a | 1 | 2 |
| 3 | 2 | b | 1 | 3 |
| 4 | 2 | b | 1 | 4 |
| 5 | 3 | c | 1 | 5 |
| 6 | 3 | c | 1 | 6 |
| 7 | 4 | d | 1 | 7 |
| 8 | 4 | d | 1 | 8 |
| 9 | 5 | e | 1 | 9 |
| 10 | 5 | e | 1 | 10 |
+------+------+------+------+------+
10 rows in set (0.00 sec)


给关联列创建索引。
mysql> create index a_id on a (id);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> create index b_id on b (aid);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

进行多个子查询(标量子查询)的查询

SQL1 SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;  由于需要计算b表的字段的sum和count所以动用到了标量子查询,而且还是两个。

图片

   从后台日志中看到 SQL1 查询了5行数据返回1行。   这5行是 A表的id=1扫描了1行, B表的count子查询计算时候扫描了2行,B表的sum子查询计算时候扫描了2行。

# User@Host: root[root] @ localhost [] Id: 18
# Query_time: 0.001879 Lock_time: 0.000013 Rows_sent: 1 Rows_examined: 5
SET timestamp=1694589980;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

 

SQL2,不带任何条件,全表查询。

图片

 

  从后台日志中看到 SQL2 查询了25行数据返回5行。   这25行是 A表的全表扫描了5行, B表的count子查询计算时候扫描了10行,B表的sum子查询计算时候扫描了10行。

# User@Host: root[root] @ localhost [] Id: 18
# Query_time: 0.001282 Lock_time: 0.000013 Rows_sent: 5 Rows_examined: 25
SET timestamp=1694589992;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;

 

可见结果集越多,扫描行数成线性增加。

 

SQL3:由于需求又变了,需要用到b表的一个字段,于是在id=1的情况下,(与前两个不同,这次只取字段没有sum和count)继续增加子查询。变成了这样;

SELECT a.ID,a.NAME,(SELECT max(b.name) FROM b WHERE a.ID=b.AID) AS a_n,(SELECT max(c) FROM b WHERE a.ID=b.AID) AS a_c,(select max(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

图片

凑后台日志看到SQL3:查询了7行数据返回1行。   这5行是 A表的id=1扫描了1行, B表的count子查询计算时候扫描了2行,B表的sum子查询计算时候扫描了2行,B表需要name字段又扫描了2行。

# User@Host: root[root] @ localhost [] Id: 21
# Query_time: 0.004694 Lock_time: 0.000037 Rows_sent: 1 Rows_examined: 7
SET timestamp=1694670424;
SELECT a.ID,a.NAME,(SELECT max(b.name) FROM b WHERE a.ID=b.AID) AS a_n,(SELECT max(c) FROM b WHERE a.ID=b.AID) AS a_c,(select max(s) from b where a.id=b.aid) as a_s FROM a where a.id=1;

 

可见子查询越多,扫描行数成线性增加。

 

以上还是关联列有索引且用得到的情况下,而且where条件只限定返回1行,在真实环境中几乎都没有这样的理想条件。所以可能是几十万行,几百万行,甚至上亿行的扫描。比如SQL2的语句在去掉索引的环境下。

mysql> alter table a alter index a_id invisible;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table b alter index b_id invisible;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> show create table a\G
*************************** 1. row ***************************
Table: a
Create Table: CREATE TABLE `a` (
`id` int DEFAULT NULL,
`name` varchar(10) DEFAULT NULL,
KEY `a_id` (`id`) /*!80000 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> show create table b\G
*************************** 1. row ***************************
Table: b
Create Table: CREATE TABLE `b` (
`id` int DEFAULT NULL,
`aid` int DEFAULT NULL,
`name` varchar(10) DEFAULT NULL,
`c` int DEFAULT NULL,
`s` int DEFAULT NULL,
KEY `b_id` (`aid`) /*!80000 INVISIBLE */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;
+------+------+------+------+
| ID | NAME | a_c | a_s |
+------+------+------+------+
| 1 | A | 2 | 3 |
| 2 | B | 2 | 7 |
| 3 | C | 2 | 11 |
| 4 | D | 2 | 15 |
| 5 | E | 2 | 19 |
+------+------+------+------+
5 rows in set (0.01 sec)

 

从后台看到的数据就是查询了105行才返回5行。比起带索引时候的25行多了4倍多的扫描行数正式环境的表如果行数多(全表5行,没有索引,多了4行就多4倍。那么如果全表1万行就是一万倍)。

# Query_time: 0.002229 Lock_time: 0.000012 Rows_sent: 5 Rows_examined: 105
SET timestamp=1694680508;
SELECT a.ID,a.NAME,(SELECT COUNT(*) FROM b WHERE a.ID=b.AID) AS a_c,(select sum(s) from b where a.id=b.aid) as a_s FROM a;

 

所以这个3500亿行的扫描就能理解了吧?

图片

那么解决方案就是减少不必要的关联。

 

例如SQL3可以改写成这样。

图片

扫描行数从7行变成了2行。

# Query_time: 0.003672 Lock_time: 0.000024 Rows_sent: 1 Rows_examined: 2
SET timestamp=1694671011;
SELECT a.ID,a.NAME,b.name, b.c,b.s from a,b where a.id=b.aid and a.id=1 limit 1;

 

在工作中有时候见到了太多的不能理解的无关紧要的子查询关联,但是很多开发对此问题不以为然。希望更多的人能理解优化就是能不干的就不干。

图片

 

薛晓刚
收录于合集 #开发理念
 54
上一篇这些SQL看着就违法下一篇早就说了连接数别太多呀
 
阅读 2497
四海内皆兄弟
 
 
 
 
posted @ 2023-09-28 09:13  往事已成昨天  阅读(11)  评论(0编辑  收藏  举报