MySQL Crash Course #07# Chapter 15. Relational Databases. INNER JOIN vs. nested subquery
Index
I found that the official MySQL documentation includes a tutorial.
The SQL Tutorial for Data Analysis | SQL Tutorial - Mode Analytics
Understanding Relational Tables
The key here is that having multiple occurrences of the same data is never a good thing, and that principle is the basis for relational database design. Relational tables are designed so information is split into multiple tables, one for each data type. The tables are related to each other through common values (and thus the relational in relational design).
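The book uses a products table and a vendors table as its example. One vendor can supply many products, and there are several reasons not to store the vendor's details in every product row:
- Many products share the same vendor, so repeating the same information wastes space
- If a vendor's details change, you have to update every product row for that vendor
- The data is very likely to become inconsistent
So products and vendors should be stored in two separate tables, and each table should have a primary key. The vendors table holds only vendor information and the products table holds only product information. Apart from a vendor id column, a product row should contain no other vendor information. That column is called a foreign key (it refers to the primary key of the vendors table). This design has several benefits:
- No data is repeated, which saves time and space
- Changing a vendor's details means updating a single place
- Because nothing is duplicated, the data stays consistent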
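The "update in one place" benefit can be tried out with a small sketch. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, the prod_id column and the sample rows (Acme, Widget, Gadget) are made up, and only vend_id, vend_name, prod_name and prod_price come from the queries in these notes.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (
        vend_id   INTEGER PRIMARY KEY,
        vend_name TEXT NOT NULL
    );
    CREATE TABLE products (
        prod_id    TEXT PRIMARY KEY,          -- hypothetical column
        vend_id    INTEGER NOT NULL REFERENCES vendors(vend_id),
        prod_name  TEXT NOT NULL,
        prod_price REAL NOT NULL
    );
    -- Hypothetical sample data: one vendor, two products.
    INSERT INTO vendors  VALUES (1, 'Acme');
    INSERT INTO products VALUES ('P1', 1, 'Widget', 2.50);
    INSERT INTO products VALUES ('P2', 1, 'Gadget', 4.00);
""")

# The vendor name is stored once, so renaming it touches a single row.
updated = conn.execute(
    "UPDATE vendors SET vend_name = ? WHERE vend_id = ?",
    ("Acme Ltd", 1),
).rowcount

# Every product of that vendor picks up the new name through vend_id.
rows = conn.execute("""
    SELECT prod_name, vend_name
    FROM products
    INNER JOIN vendors ON vendors.vend_id = products.vend_id
    ORDER BY prod_name
""").fetchall()

print("rows updated:", updated)
for prod_name, vend_name in rows:
    print(prod_name, "<-", vend_name)
```

If the vendor name were copied into every product row instead, the same rename would need one update per product, and any missed row would leave the data inconsistent.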
Why Use Joins?
As just explained, breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. But these benefits come with a price.
If data is stored in multiple tables, how can you retrieve that data with a single SELECT statement?
The answer is to use a join.
It is important to understand that a join is not a physical entity; in other words, it does not exist in the actual database tables. A join is created by MySQL as needed, and it persists for the duration of the query execution.
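- Maintaining referential integrity means that, once the foreign key constraint is declared, MySQL only allows valid data (rows whose foreign key value exists in the parent table) to be inserted into the related table.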
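A rough sketch of what referential integrity looks like in practice. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the prod_id column and sample rows are hypothetical. Note one difference: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas in MySQL the check applies to FOREIGN KEY constraints declared on InnoDB tables.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE vendors (
        vend_id   INTEGER PRIMARY KEY,
        vend_name TEXT NOT NULL
    );
    CREATE TABLE products (
        prod_id    TEXT PRIMARY KEY,          -- hypothetical column
        vend_id    INTEGER NOT NULL REFERENCES vendors(vend_id),
        prod_name  TEXT NOT NULL,
        prod_price REAL NOT NULL
    );
    INSERT INTO vendors VALUES (1, 'Acme');   -- hypothetical vendor
""")

# Valid: vend_id 1 exists in vendors.
conn.execute("INSERT INTO products VALUES ('P1', 1, 'Widget', 2.50)")

# Invalid: no vendor has vend_id 999, so the insert should be refused.
try:
    conn.execute("INSERT INTO products VALUES ('P2', 999, 'Orphan', 1.00)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("insert rejected:", exc)

product_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print("products stored:", product_count)
```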
Creating a Join
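-- No join condition: every vendor row is paired with every product row (Cartesian product)
SELECT vend_name, prod_name, prod_price FROM vendors, products ORDER BY vend_name, prod_name;
-- Explicit INNER JOIN: only rows whose vend_id matches in both tables are returned
SELECT vend_name, prod_name, prod_price FROM vendors INNER JOIN products ON vendors.vend_id = products.vend_id;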
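- Although the default join type is an inner join (see this), it is still best to write INNER JOIN ... ON explicitly, so that you never forget which kind of join you are using.
- An inner join without a condition is a Cartesian product; only with a join condition does it return the intersection, that is, the matching rows (see this).
- Joins are performed on the fly at run time, and the more tables you join, the more resources are consumed, so do not join tables unless you need to.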
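To see the difference between the two queries above in row counts, here is a small sketch. Assumptions: SQLite through Python's sqlite3 stands in for MySQL, and the prod_id column plus the two vendors and three products are invented sample data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (
        vend_id   INTEGER PRIMARY KEY,
        vend_name TEXT NOT NULL
    );
    CREATE TABLE products (
        prod_id    TEXT PRIMARY KEY,          -- hypothetical column
        vend_id    INTEGER NOT NULL REFERENCES vendors(vend_id),
        prod_name  TEXT NOT NULL,
        prod_price REAL NOT NULL
    );
    INSERT INTO vendors  VALUES (1, 'Acme'), (2, 'Bolt Co');
    INSERT INTO products VALUES
        ('P1', 1, 'Widget', 2.50),
        ('P2', 1, 'Gadget', 4.00),
        ('P3', 2, 'Bolt',   0.10);
""")

# No join condition: each of the 2 vendors is paired with each of the 3 products.
cross = conn.execute("""
    SELECT vend_name, prod_name, prod_price
    FROM vendors, products
    ORDER BY vend_name, prod_name
""").fetchall()

# With a join condition: only pairs whose vend_id matches survive.
inner = conn.execute("""
    SELECT vend_name, prod_name, prod_price
    FROM vendors
    INNER JOIN products ON vendors.vend_id = products.vend_id
""").fetchall()

print("cartesian rows:", len(cross))
print("inner join rows:", len(inner))
```

The inner join result is a subset of the Cartesian product: it keeps exactly the pairs where a product is matched with its own vendor.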
It Pays to Experiment: As you can see, there is often more than one way to perform any given SQL operation. And there is rarely a definitive right or wrong way. Performance can be affected by the type of operation, the amount of data in the tables, whether indexes and keys are present, and a whole slew of other criteria. Therefore, it is often worth experimenting with different selection mechanisms to find the one that works best for you.
Whether a join or a subquery is faster depends on the specific situation, so run a test when it matters. The open question is how to test. (To be updated.)
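One possible way to approach the testing question, offered as a sketch and not as the book's method: write the same lookup once as a join and once as a subquery, check that both return the same rows, then time each several times and keep the fastest run. The assumptions are loud here: SQLite (Python's sqlite3) stands in for MySQL, the table sizes and generated names are arbitrary, and `best_time` is a hypothetical helper. Timings from SQLite say nothing about MySQL itself, so against a real server the same harness would be pointed at a MySQL connection, and prefixing a query with EXPLAIN shows how MySQL plans to execute it.

```python
import sqlite3
import time

# Assumption: SQLite stands in for MySQL; sizes and names are arbitrary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (
        vend_id   INTEGER PRIMARY KEY,
        vend_name TEXT NOT NULL
    );
    CREATE TABLE products (
        prod_id    INTEGER PRIMARY KEY,       -- hypothetical column
        vend_id    INTEGER NOT NULL REFERENCES vendors(vend_id),
        prod_name  TEXT NOT NULL,
        prod_price REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [(v, f"vendor-{v}") for v in range(200)],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(p, p % 200, f"product-{p}", 1.0 + p % 10) for p in range(5000)],
)

JOIN_SQL = """
    SELECT prod_name
    FROM products
    INNER JOIN vendors ON vendors.vend_id = products.vend_id
    WHERE vend_name = ?
    ORDER BY prod_name
"""
SUBQUERY_SQL = """
    SELECT prod_name
    FROM products
    WHERE vend_id IN (SELECT vend_id FROM vendors WHERE vend_name = ?)
    ORDER BY prod_name
"""

def best_time(sql, params, repeats=5):
    """Run the query several times; return its rows and the fastest run in seconds."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best

join_rows, join_secs = best_time(JOIN_SQL, ("vendor-7",))
sub_rows, sub_secs = best_time(SUBQUERY_SQL, ("vendor-7",))

# Only compare timings once both forms are known to return the same rows.
print("same result:", join_rows == sub_rows)
print(f"join:     {join_secs:.6f} s")
print(f"subquery: {sub_secs:.6f} s")
```

Taking the fastest of several runs reduces noise from caching and other processes. Which form wins depends on data volume and indexes, so the numbers are only meaningful on data that resembles the real tables.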