sql中的in(三值逻辑)与exists(二值逻辑)(by atkfc)

上网查了一下，关于sql中的in和exists,大家普遍关心的是效率问题：通常情况下采用exists要比in效率高，因为IN不走索引，但要看实际情况具体使用：IN适合于外表大而内表小的情况；EXISTS适合于外表小而内表大的情况。

今天我要说的是sql中的in(三值逻辑)与exists(二值逻辑)。所谓二值逻辑,即true或false，三值逻辑，即true或false或unknown。(环境为sqlserver 2008)

首先，请大家看一个现象：(ad:http://www.cnblogs.com/atkfc)

--现象：
declare @a1 int=22
if @a1 in (1,22,null) print 'y1' else print 'n1' --result：y1
if @a1 in (1,2,null) print 'y1' else print 'n1' --result:n1
if @a1 not in (1,2,null) print 'y2' else print 'n2' --result:n2
--分析：@a1 not in (1,2,null)<=>not @a1 in (1,2,null)<=>not(@a=1 or @a=2 or @a=null)<=>not(false or false or unknown)<=>not unknown<=>unknown

然后，举一个示例，以飨读者：

STEP1：建两个表

--table tbl1:
if OBJECT_ID('tbl1','U') is not null drop table tbl1
create table tbl1
(id int not null primary key,
age int
)
insert tbl1 values(1,1),(2,2),(3,3),(4,4),(5,null)
select * from tbl1
--table tbl2:
if OBJECT_ID('tbl2','U') is not null drop table tbl2
create table tbl2
(id int not null primary key,
age int
)
insert tbl2 values(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,null)
select * from tbl2

--两张表tbl1、tbl2查询结果如下图：

STEP2：执行下面几句

--e.g.1
select * from tbl2 t2 where age in(select t1.age from tbl1 t1)
--e.g.2
select * from tbl2 t2 where age not in(select t1.age from tbl1 t1)
--e.g.3
select * from tbl2 t2 where age not in(select t1.age from tbl1 t1 where t1.age is not null)
--e.g.4
select * from tbl2 t2 where exists(select t1.age from tbl1 t1 where t1.age =t2.age)
--e.g.5
select * from tbl2 t2 where not exists(select t1.age from tbl1 t1 where t1.age =t2.age)
--e.g.6
select * from tbl2 t2 where not exists(select t1.age from tbl1 t1 where t1.age =t2.age) and t2.age is not null

　　　　 --执行结果如下图：(ad:http://www.cnblogs.com/atkfc)

STEP3：分析(P is Problem, S is Solution)：

P1:当对至少返回一个null值的子查询使用not in时，外部查询总会返回一个空集。对于来自外部表的某个值，如果已经知道这个值会在子查询返回的集合中出现，那么外部查询将不会返回这个值，因为外部查询原本就是要返回不会在子查询返回集合中出现的值；如果外部表中的这个值没有在子查询返回的已知值集合中出现，那么外部查询仍然不会返回这个值，因为这时无法确切的判断这个值是否包含null的集合所涵盖的范围。
S1toP1:显式地排除null值，如e.g.3，加条件where t1.age is not null
S2toP1:隐式地排除null值，如e.g.5，使用not exists取代not in

小结：　1.当一个列不应该允许为null时，把它定义为not null很重要。加强数据的完整性定义，比很多人想象的要重要的多。
　　　　2.与in不同的是，exists使用的是二值逻辑，所以exists总是返回true或false，而绝不会返回unknown，当子查询遇到null值时，表达式的计算结果就为unknown，该行会被过滤掉。就exists来说，它可以自然的消除null值的情况，好像他们根本不存在。这样exists只处理已知的值。因此，使用not exists比使用not in更安全。(ad:http://www.cnblogs.com/atkfc)