Combining Queries with Set & Join Operators【SQL】

SET Operator

 

 

 

PROC SQL can combine the results of two or more queries in various ways by using the following set operators:

UNION

produces all unique rows from both queries.

o     All unique rows from both tables are selected.
o     Resulting columns are determined by the first table.
o     Columns are overlaid in the order they appear, not by matching names.
o     Overlaid columns must have the same data type.
EXCEPT

produces rows that are part of the first query only.

o     Unique rows from the first table that are not found in the second table are selected.
o     Resulting columns are determined by the first table.
INTERSECT  

produces rows that are common to both query results.

o     Common unique rows from both tables are selected.
o     Resulting columns are determined by the first table.
o     Columns are overlaid in the order they appear, not by matching names.
o     Overlaid columns must have the same data type.
OUTER UNION

concatenates the query results.

o     All rows from both tables, unique as well as nonunique, are selected.
o     All columns from both tables are selected.
o     Columns are not overlaid.

 

 

The operator is used between the two queries, for example:

select columns from table 
set-operator
select columns from table;

 

Place a semicolon after the last SELECT statement only. Set operators combine columns from two queries based on their position in the referenced tables without regard to the individual column names. Columns in the same relative position in the two queries must have the same data types. The column names of the tables in the first query become the column names of the output table.

For information about using set operators with more than two query results, see the section about the SQL procedure in the Base SAS Procedures Guide.

The following optional keywords give you more control over set operations:

ALL

does not suppress duplicate rows. When the keyword ALL is specified, PROC SQL does not make a second pass through the data to eliminate duplicate rows. Thus, using ALL is more efficient than not using it. ALL is not necessary with the OUTER UNION operator.

CORRESPONDING (CORR)

overlays columns that have the same name in both tables. When used with EXCEPT, INTERSECT, and UNION, CORR suppresses columns that are not in both tables.

Each set operator is described and used in an example based on the following two tables.

 

Tables Used in Set Operation Examples

                                    Table A

                                      x  y
                               ------------------
                                      1  one     
                                      2  two     
                                      2  two     
                                      3  three   
                                    Table B

                                      x  z
                               ------------------
                                      1  one     
                                      2  two     
                                      4  four    

Whereas join operations combine tables horizontally, set operations combine tables vertically. Therefore, the set diagrams that are included in each section are displayed vertically.
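The positional overlay described above can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PROC SQL; the table names a and b are illustrative copies of Table A and Table B, and SQLite's behavior is not guaranteed to match SAS in every detail.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; a and b mirror Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (x INTEGER, z TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'one'), (2, 'two'), (4, 'four');
""")

# Column z of b is overlaid on column y of a purely by position.
cur = conn.execute("SELECT * FROM a UNION SELECT * FROM b")
column_names = [d[0] for d in cur.description]
rows = sorted(cur.fetchall())

print(column_names)  # names come from the first query
print(rows)
```

Although the second column is called z in table b, the result has only the columns x and y, because names are taken from the first query and values are matched by position.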

Producing Unique Rows from Both Queries (UNION)

 


The UNION operator combines two query results. It produces all the unique rows that result from both queries; that is, it returns a row if it occurs in the first table, the second, or both. UNION does not return duplicate rows. If a row occurs more than once, then only one occurrence is returned.

proc sql;
   title 'A UNION B';
   select * from sql.a
   union
   select * from sql.b;
 

 

Producing Unique Rows from Both Queries (UNION)

                                   A UNION B

                                      x  y
                               ------------------
                                      1  one     
                                      2  two     
                                      3  three   
                                      4  four    

You can use the ALL keyword to request that duplicate rows remain in the output.

proc sql;
   title 'A UNION ALL B';
   select * from sql.a
   union all
   select * from sql.b;

 

  

Producing Rows from Both Queries (UNION ALL)

                                 A UNION ALL B

                                      x  y
                               ------------------
                                      1  one     
                                      2  two     
                                      2  two     
                                      3  three   
                                      1  one     
                                      2  two     
                                      4  four    
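As a rough check of the two outputs above, the following sketch runs UNION and UNION ALL side by side. SQLite is assumed as a stand-in for PROC SQL, and the tables a and b are illustrative copies of Table A and Table B.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; a and b mirror Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (x INTEGER, z TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'one'), (2, 'two'), (4, 'four');
""")

# UNION removes duplicate rows; UNION ALL keeps every row from both queries.
union_rows = sorted(conn.execute("SELECT * FROM a UNION SELECT * FROM b"))
union_all_rows = sorted(conn.execute("SELECT * FROM a UNION ALL SELECT * FROM b"))

print(len(union_rows), union_rows)
print(len(union_all_rows), union_all_rows)
```

UNION returns four rows and UNION ALL returns seven, which matches the row counts of the two output listings.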

Producing Rows That Are in Only the First Query Result (EXCEPT)

 


The EXCEPT operator returns rows that result from the first query but not from the second query. In this example, the row that contains the values 3 and three exists in the first query (table A) only and is returned by EXCEPT.

proc sql;
   title 'A EXCEPT B';
   select * from sql.a
   except
   select * from sql.b;

 

 

 

Producing Rows That Are in Only the First Query Result (EXCEPT)

                                   A EXCEPT B

                                      x  y
                               ------------------
                                      3  three   

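Note that the duplicated row in Table A containing the values 2 and two does not appear in the output. EXCEPT removes every row that also occurs in the second query and returns only unique rows. Adding ALL matches rows one to one instead, so a duplicate row in the first query that has no counterpart in the second query is kept. Here Table A holds two copies of the row 2, two and Table B holds only one, so one copy remains.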

proc sql;
   title 'A EXCEPT ALL B';
   select * from sql.a
   except all
   select * from sql.b;

 

 

Producing Rows That Are in Only the First Query Result (EXCEPT ALL)

                                 A EXCEPT ALL B

                                      x  y
                               ------------------
                                      2  two     
                                      3  three   
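The difference between EXCEPT and EXCEPT ALL is easier to see when the matching rule is spelled out. The sketch below emulates both in plain Python rather than in PROC SQL; the function names except_distinct and except_all are hypothetical helpers made up for this illustration.

```python
from collections import Counter

# Rows of Table A and Table B from the examples above.
a = [(1, "one"), (2, "two"), (2, "two"), (3, "three")]
b = [(1, "one"), (2, "two"), (4, "four")]

def except_distinct(first, second):
    """EXCEPT: unique rows of first that do not occur in second at all."""
    second_rows = set(second)
    seen, result = set(), []
    for row in first:
        if row not in second_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result

def except_all(first, second):
    """EXCEPT ALL: each row of second cancels at most one row of first."""
    remaining = Counter(second)
    result = []
    for row in first:
        if remaining[row] > 0:
            remaining[row] -= 1  # matched one to one, so drop this copy
        else:
            result.append(row)
    return result

print(except_distinct(a, b))
print(except_all(a, b))
```

With EXCEPT ALL, the single row 2, two in Table B cancels only one of the two copies in Table A, which is why the second copy survives in the output.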

Producing Rows That Belong to Both Query Results (INTERSECT)

 


The INTERSECT operator returns rows from the first query that also occur in the second.

proc sql;
   title 'A INTERSECT B';
   select * from sql.a
   intersect
   select * from sql.b;

 

 

 

Producing Rows That Belong to Both Query Results (INTERSECT)

                                 A INTERSECT B

                                      x  y
                               ------------------
                                      1  one     
                                      2  two     

The output of an INTERSECT ALL operation contains the rows produced by the first query that are matched one-to-one with a row produced by the second query. In this example, the output of INTERSECT ALL is the same as INTERSECT.
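The one-to-one matching of INTERSECT ALL can be sketched in plain Python. This is an emulation, not PROC SQL itself; the helper name intersect_all and the extra table b_with_duplicate are hypothetical and exist only to show a case where the two operators differ.

```python
from collections import Counter

# Rows of Table A and Table B from the examples above.
a = [(1, "one"), (2, "two"), (2, "two"), (3, "three")]
b = [(1, "one"), (2, "two"), (4, "four")]

def intersect_all(first, second):
    """INTERSECT ALL: keep a row of first for each unused matching row of second."""
    remaining = Counter(second)
    result = []
    for row in first:
        if remaining[row] > 0:
            remaining[row] -= 1  # this row of second is now used up
            result.append(row)
    return result

same_as_intersect = intersect_all(a, b)

# Hypothetical variant of Table B that contains the row (2, 'two') twice.
b_with_duplicate = b + [(2, "two")]
with_duplicate = intersect_all(a, b_with_duplicate)

print(same_as_intersect)
print(with_duplicate)
```

Only when both inputs contain a row more than once does INTERSECT ALL return that row more than once, which is why the example tables give the same result as plain INTERSECT.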

 

Concatenating Query Results (OUTER UNION)

 


The OUTER UNION operator concatenates the results of the queries. This example concatenates tables A and B.

proc sql;
   title 'A OUTER UNION B';
   select * from sql.a
   outer union
   select * from sql.b;

 

 

 

Concatenating the Query Results (OUTER UNION)

                                A OUTER UNION B

                            x  y                x  z
                     --------------------------------------
                            1  one              .          
                            2  two              .          
                            2  two              .          
                            3  three            .          
                            .                   1  one     
                            .                   2  two     
                            .                   4  four    

Notice that OUTER UNION does not overlay columns from the two tables. To overlay columns that have the same name in both tables, use the CORRESPONDING keyword.

proc sql;
   title 'A OUTER UNION CORR B';
   select * from sql.a
   outer union corr
   select * from sql.b;

 

Concatenating the Query Results (OUTER UNION CORR)

                              A OUTER UNION CORR B

                                 x  y         z
                          ----------------------------
                                 1  one               
                                 2  two               
                                 2  two               
                                 3  three             
                                 1            one     
                                 2            two     
                                 4            four    
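OUTER UNION CORR is specific to PROC SQL. In a dialect without it, a similar result can be approximated by padding the missing columns with NULL, as in this sketch. SQLite is assumed here as a stand-in, and the explicit column lists are a substitute technique, not what PROC SQL does internally.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; a and b mirror Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (x INTEGER, z TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'one'), (2, 'two'), (4, 'four');
""")

# Overlay the shared column x by name and pad the columns
# that exist in only one table with NULL.
cur = conn.execute("""
    SELECT x, y, NULL AS z FROM a
    UNION ALL
    SELECT x, NULL, z FROM b
""")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

print(column_names)
for row in rows:
    print(row)
```

The result has the three columns x, y, and z and seven rows, with None in Python where the SAS listing shows a blank.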

Producing Rows from the First Query or the Second Query

 

There is no keyword in PROC SQL that returns unique rows from the first and second table, but not rows that occur in both. Here is one way you can simulate this operation:

 

(query1 except query2) 
union 
(query2 except query1)

 

This example shows how to use this operation.


proc sql;
   title 'A EXCLUSIVE UNION B';
   (select * from sql.a
         except
         select * from sql.b)
   union
   (select * from sql.b
         except
         select * from sql.a);

 

 

Producing Rows from the First Query or the Second Query

                              A EXCLUSIVE UNION B

                                      x  y
                               ------------------
                                      3  three   
                                      4  four    

The first EXCEPT returns one unique row from the first table (table A) only. The second EXCEPT returns one unique row from the second table (table B) only. The middle UNION combines the two results. Thus, this query returns the row from the first table that is not in the second table, as well as the row from the second table that is not in the first table.
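The same exclusive union can be sketched outside SAS. The version below assumes SQLite as a stand-in for PROC SQL. SQLite does not accept the parenthesized form shown above, so each EXCEPT is wrapped in a subquery instead.

```python
import sqlite3

# Assumption: SQLite stands in for PROC SQL; a and b mirror Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (x INTEGER, z TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'one'), (2, 'two'), (4, 'four');
""")

# (a EXCEPT b) UNION (b EXCEPT a), written with subqueries for SQLite.
exclusive_rows = sorted(conn.execute("""
    SELECT * FROM (SELECT * FROM a EXCEPT SELECT * FROM b)
    UNION
    SELECT * FROM (SELECT * FROM b EXCEPT SELECT * FROM a)
"""))

print(exclusive_rows)
```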

 

 

 

JOIN Operator

 

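SQL joins can be confusing to learn. SQL has many join syntaxes (inner, outer, left), and it is not always clear what the result set of a given SELECT will look like. An article on Coding Horror explains SQL joins with Venn diagrams. I found it clear and easy to understand, so I am reposting it here.

Suppose we have two tables. Table A is the table on the left and Table B is the table on the right. Each has four records, and two of those records have the same name in both tables, as shown below. Let's look at how the different joins differ.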

                                             

Table A
id    name
1     Pirate
2     Monkey
3     Ninja
4     Spaghetti

Table B
id    name
1     Rutabaga
2     Pirate
3     Darth Vader
4     Ninja

 

 

1. INNER JOIN

SELECT * FROM TableA INNER JOIN TableB ON TableA.name = TableB.name

 

                                  

Result set
(TableA)          (TableB)
id    name        id    name
1     Pirate      2     Pirate
3     Ninja       4     Ninja

 

 
 

An inner join produces the intersection of A and B: only the rows that have a match in both tables.
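The inner join above can be reproduced with a short sketch. It assumes SQLite through Python's sqlite3 module; the table definitions are illustrative copies of Table A and Table B.

```python
import sqlite3

# Assumption: SQLite is used for illustration; the data mirrors Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
    INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

# Only names that occur in both tables survive the inner join.
inner_rows = conn.execute("""
    SELECT TableA.id, TableA.name, TableB.id, TableB.name
    FROM TableA INNER JOIN TableB ON TableA.name = TableB.name
    ORDER BY TableA.id
""").fetchall()

print(inner_rows)
```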

 

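2. FULL [OUTER] JOIN
(1)
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name

Result set
(TableA)          (TableB)
id    name        id    name
1     Pirate      2     Pirate
2     Monkey      null  null
3     Ninja       4     Ninja
4     Spaghetti   null  null
null  null        1     Rutabaga
null  null        3     Darth Vader

A full outer join produces the union of A and B. Note that where a record has no match, the columns from the other table are filled with null.
You can handle these nulls with a function such as IFNULL, or COALESCE in dialects that do not provide IFNULL.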
 
(2)
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name
WHERE TableA.id IS null OR TableB.id IS null

Result set
(TableA)          (TableB)
id    name        id    name
2     Monkey      null  null
4     Spaghetti   null  null
null  null        1     Rutabaga
null  null        3     Darth Vader

This produces the rows of table A and table B that have no match in the other table.
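Not every database accepts FULL OUTER JOIN (older SQLite versions do not, for example). The sketch below therefore swaps in a different technique: it combines two LEFT JOINs with UNION ALL and then filters out the matched rows. SQLite is assumed, and the aliases a and b are chosen only for this example.

```python
import sqlite3

# Assumption: SQLite is used for illustration; the data mirrors Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
    INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

# Emulated full outer join: all rows of A with their matches,
# plus the rows of B that found no match in A.
full_rows = conn.execute("""
    SELECT a.id, a.name, b.id, b.name
    FROM TableA a LEFT JOIN TableB b ON a.name = b.name
    UNION ALL
    SELECT a.id, a.name, b.id, b.name
    FROM TableB b LEFT JOIN TableA a ON a.name = b.name
    WHERE a.id IS NULL
""").fetchall()

# Keep only the rows that have no partner on one side.
unmatched_rows = [row for row in full_rows if row[0] is None or row[2] is None]

print(len(full_rows))
for row in unmatched_rows:
    print(row)
```

The emulated full join has six rows, and four of them are unmatched, in line with the two result sets shown above.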
 
3. LEFT [OUTER] JOIN
(1)
SELECT * FROM TableA LEFT OUTER JOIN TableB ON TableA.name = TableB.name

Result set
(TableA)          (TableB)
id    name        id    name
1     Pirate      2     Pirate
2     Monkey      null  null
3     Ninja       4     Ninja
4     Spaghetti   null  null

A left outer join produces the complete set of table A. Where table B has a match, its values are filled in; where it has none, null is used instead.
 
(2)
SELECT * FROM TableA LEFT OUTER JOIN TableB ON TableA.name = TableB.name
WHERE TableB.id IS null

Result set
(TableA)          (TableB)
id    name        id    name
2     Monkey      null  null
4     Spaghetti   null  null

This produces the set of rows that are in table A but not in table B.
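This "in A but not in B" pattern is often called an anti-join. The sketch below assumes SQLite and also shows a NOT EXISTS form, which is a common alternative spelling and is added here only for comparison.

```python
import sqlite3

# Assumption: SQLite is used for illustration; the data mirrors Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
    INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

# LEFT JOIN, then keep only the rows of A where no row of B matched.
left_only = conn.execute("""
    SELECT TableA.id, TableA.name
    FROM TableA LEFT OUTER JOIN TableB ON TableA.name = TableB.name
    WHERE TableB.id IS NULL
    ORDER BY TableA.id
""").fetchall()

# The same question asked with NOT EXISTS.
not_exists = conn.execute("""
    SELECT id, name FROM TableA
    WHERE NOT EXISTS (SELECT 1 FROM TableB WHERE TableB.name = TableA.name)
    ORDER BY id
""").fetchall()

print(left_only)
print(not_exists)
```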

4. RIGHT [OUTER] JOIN
RIGHT OUTER JOIN takes the second (right-hand) table as its base. It is used in the same way as LEFT OUTER JOIN, so it is not covered here.
 
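5. UNION and UNION ALL
The UNION operator combines the result sets of two or more SELECT statements.
Note that the SELECT statements inside a UNION must have the same number of columns. The columns must also have similar data types, and the columns in each SELECT statement must be in the same order. UNION selects only distinct records, whereas UNION ALL lists all records.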
(1) SELECT name FROM TableA UNION SELECT name FROM TableB

New result set
name
Pirate
Monkey
Ninja
Spaghetti
Rutabaga
Darth Vader

Only distinct values are selected.
 
(2) SELECT name FROM TableA UNION ALL SELECT name FROM TableB

New result set
name
Pirate
Monkey
Ninja
Spaghetti
Rutabaga
Pirate
Darth Vader
Ninja

All values are listed.

 

(3) Note:

SELECT * FROM TableA UNION SELECT * FROM TableB

New result set
id    name
1     Pirate
2     Monkey
3     Ninja
4     Spaghetti
1     Rutabaga
2     Pirate
3     Darth Vader
4     Ninja

Because the rows (1, Pirate) and (2, Pirate) are not identical, they are not merged.
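The three UNION variants above differ only in how many rows come back, which a quick count makes visible. The sketch assumes SQLite; the tables are illustrative copies of Table A and Table B.

```python
import sqlite3

# Assumption: SQLite is used for illustration; the data mirrors Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
    INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

distinct_names = conn.execute(
    "SELECT name FROM TableA UNION SELECT name FROM TableB").fetchall()
all_names = conn.execute(
    "SELECT name FROM TableA UNION ALL SELECT name FROM TableB").fetchall()
# Whole rows are compared here, so (1, 'Pirate') and (2, 'Pirate') stay separate.
whole_rows = conn.execute(
    "SELECT * FROM TableA UNION SELECT * FROM TableB").fetchall()

print(len(distinct_names), len(all_names), len(whole_rows))
```

The counts are 6, 8, and 8. UNION removes a duplicate only when every selected column matches, so selecting id as well as name leaves nothing to merge.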
 
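Also note that there is one more join, the cross join. It cannot be drawn as a Venn diagram, because it combines every row of table A with every row of table B (N*M combinations), that is, the Cartesian product.
The expression is as follows:
                    SELECT * FROM TableA CROSS JOIN TableB
This Cartesian product yields 4 x 4 = 16 records. In general, we rarely use this syntax. But be careful: unless nested SELECT statements are used, a system may produce the Cartesian product first and only then filter it.
This is very dangerous for performance, especially when the tables are large.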
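The size of the Cartesian product, and how a filter on top of it relates to an inner join, can be checked with a small sketch. SQLite is assumed; whether a particular engine really materializes all 16 rows before filtering depends on its query planner and is not shown here.

```python
import sqlite3

# Assumption: SQLite is used for illustration; the data mirrors Table A and Table B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, name TEXT);
    CREATE TABLE TableB (id INTEGER, name TEXT);
    INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
    INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

# Every row of TableA is paired with every row of TableB.
cross_rows = conn.execute("SELECT * FROM TableA CROSS JOIN TableB").fetchall()

# Filtering the product on name gives the same rows as the inner join.
filtered_rows = conn.execute("""
    SELECT TableA.id, TableA.name, TableB.id, TableB.name
    FROM TableA CROSS JOIN TableB
    WHERE TableA.name = TableB.name
    ORDER BY TableA.id
""").fetchall()

print(len(cross_rows))
print(filtered_rows)
```

With 4 rows per table the product is only 16 rows, but the row count grows as N*M, so two tables of a million rows each would give a trillion combinations.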
Posted 2013-12-18 12:21 by 寒秋绝月