SQL JOIN 的常见用法-MySQL、Hive、Spark、Presto
无论是关系型数据库,还是SQL on Hadoop类的大数据技术组件,都有SQL JOIN功能,join大致分为内连接(inner join)、左外连接(left outer join)、右外连接(right outer join)、全外连接(full outer join)。
笛卡尔积
要理解各种JOIN首先要理解笛卡尔积。笛卡尔积就是将A表的每一条记录与B表的每一条记录进行关联。所以,如果A表有n条记录,B表有m条记录,笛卡尔积产生的结果就会产生n*m条记录。比如,如果t_1表记录3条,B表记录3条,笛卡尔积结果就是9条。而JOIN的结果,可以简单理解为对笛卡尔积结果的再处理。
笛卡尔积结果
mysql> select * from t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 2 | lisi |
| 4 | wangwu |
+----+----------+
mysql> select * from t2;
+----+--------+
| id | gender |
+----+--------+
| 1 | men |
| 2 | women |
| 3 | men |
+----+--------+
mysql> select * from t1,t2 order by t1.id;
+----+----------+----+--------+
| id | name | id | gender |
+----+----------+----+--------+
| 1 | zhangsan | 1 | men |
| 1 | zhangsan | 2 | women |
| 1 | zhangsan | 3 | men |
| 2 | lisi | 1 | men |
| 2 | lisi | 2 | women |
| 2 | lisi | 3 | men |
| 4 | wangwu | 2 | women |
| 4 | wangwu | 3 | men |
| 4 | wangwu | 1 | men |
+----+----------+----+--------+
内连接:INNER JOIN
内连接INNER JOIN是最常用的连接操作。从数学的角度讲就是求两个表的交集,从笛卡尔积的角度讲就是从笛卡尔积结果中挑选出“t1.id =t2.id”的结果。
内连接结果
mysql> select * from t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 2 | lisi |
| 4 | wangwu |
+----+----------+
mysql> select * from t2;
+----+--------+
| id | gender |
+----+--------+
| 1 | men |
| 2 | women |
| 3 | men |
+----+--------+
mysql> select * from t1 inner join t2 on t1.id =t2.id order by t1.id;
+----+----------+----+--------+
| id | name | id | gender |
+----+----------+----+--------+
| 1 | zhangsan | 1 | men |
| 2 | lisi | 2 | women |
+----+----------+----+--------+
左外连接:LEFT OUTER JOIN
左外连接LEFT JOUTER OIN的含义就是求两个表的交集外加左表剩下的数据。从笛卡尔积的角度讲,就是先从笛卡尔积中挑出子句条件“t1.id =t2.id”成立的记录,然后加上左表中剩余的记录。
mysql> select * from t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 2 | lisi |
| 4 | wangwu |
+----+----------+
mysql> select * from t2;
+----+--------+
| id | gender |
+----+--------+
| 1 | men |
| 2 | women |
| 3 | men |
+----+--------+
mysql> select * from t1 left outer join t2 on t1.id =t2.id order by t1.id;
+----+----------+------+--------+
| id | name | id | gender |
+----+----------+------+--------+
| 1 | zhangsan | 1 | men |
| 2 | lisi | 2 | women |
| 4 | wangwu | NULL | NULL |
+----+----------+------+--------+
右外连接:RIGHT OUTER JOIN
同理右外连接RIGHT OUTER JOIN就是求两个表的交集外加右表剩下的数据。再次从笛卡尔积的角度描述,右连接就是从笛卡尔积中挑出子句条件“t1.id =t2.id”成立的记录,然后加上右表中剩余的记录。
mysql> select * from t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 2 | lisi |
| 4 | wangwu |
+----+----------+
mysql> select * from t2;
+----+--------+
| id | gender |
+----+--------+
| 1 | men |
| 2 | women |
| 3 | men |
+----+--------+
mysql> select * from t1 right outer join t2 on t1.id =t2.id order by t1.id;
+------+----------+----+--------+
| id | name | id | gender |
+------+----------+----+--------+
| NULL | NULL | 3 | men |
| 1 | zhangsan | 1 | men |
| 2 | lisi | 2 | women |
+------+----------+----+--------+
全外连接:FULL OUTER JOIN
全外连接就是求两个集合的并集。从笛卡尔积的角度讲就是从笛卡尔积中挑出子句条件“t1.id =t2.id”成立的记录,然后加上左表中剩余的记录,最后加上右表中剩余的记录。另外MySQL不支持OUTER JOIN,但是我们可以对左连接和右连接的结果做UNION操作来实现。
mysql> select * from t1;
+----+----------+
| id | name |
+----+----------+
| 1 | zhangsan |
| 2 | lisi |
| 4 | wangwu |
+----+----------+
mysql> select * from t2;
+----+--------+
| id | gender |
+----+--------+
| 1 | men |
| 2 | women |
| 3 | men |
+----+--------+
mysql> select * from t1 right outer join t2 on t1.id = t2.id union (select * from t1 left outer join t2 on t1.id = t2.id);
+------+----------+------+--------+
| id | name | id | gender |
+------+----------+------+--------+
| 1 | zhangsan | 1 | men |
| 2 | lisi | 2 | women |
| NULL | NULL | 3 | men |
| 4 | wangwu | NULL | NULL |
+------+----------+------+--------+