测试In & Exists & Join 的性能
Step 1. 测试环境搭建
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Enterprise Edition (64-bit) on Windows NT 6.1
1: -- IN & EXISTS & INNER JOIN
2: CREATE TABLE BIG_TB
3: (
4: ID INT IDENTITY PRIMARY KEY,
5: Col CHAR(4) NOT NULL
6: )
7:
8: CREATE TABLE SMALL_TB
9: (
10: ID INT IDENTITY PRIMARY KEY,
11: Col CHAR(4) NOT NULL,
12: IDate DATE DEFAULT(GETDATE())
13: )
14:
15:
16: INSERT INTO BIG_TB (Col)
17: SELECT top 250000
18: char(65+FLOOR(RAND(a.column_id *1111 + b.object_id)*10)) + char(65+FLOOR(RAND(b.column_id *2222 + b.object_id)*12)) +
19: char(65+FLOOR(RAND(b.column_id *3333 + a.object_id)*12)) + char(65+FLOOR(RAND(a.column_id *4444 + b.object_id)*8))
20: from master.sys.columns a cross join master.sys.columns b;
21: GO
22: --250000
23:
24:
25: INSERT INTO SMALL_TB (col)
26: SELECT DISTINCT col
27: FROM BIG_TB TABLESAMPLE (20 PERCENT);
28: GO
29: --2807
Step 2.测试代码
1: --===================== No Index =====================
2: SELECT ID, col FROM BIG_TB
3: WHERE col IN (SELECT col FROM SMALL_TB)
4:
5: SELECT ID, col FROM BIG_TB
6: WHERE EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
7:
8: SELECT A.ID ,A.col FROM BIG_TB A
9: INNER JOIN SMALL_TB B ON A.col = B.col
10:
11:
12: --Not In & Not Exists
13: SELECT ID, col FROM BIG_TB
14: WHERE col NOT IN (SELECT col FROM SMALL_TB)
15:
16: SELECT ID, col FROM BIG_TB
17: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
18:
19: SELECT A.ID,A.col FROM BIG_TB A
20: LEFT OUTER JOIN SMALL_TB B ON A.col = B.col
21: WHERE B.col IS NULL
22:
23:
24: --===================== Index =====================
25: CREATE INDEX idx_BIG_TB_col
26: ON BIG_TB (col)
27:
28: CREATE INDEX idx_SMALL_TB_col
29: ON SMALL_TB (col)
30:
31:
32: SELECT ID, col FROM BIG_TB
33: WHERE col IN (SELECT col FROM SMALL_TB)
34:
35: SELECT ID, col FROM BIG_TB
36: WHERE EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
37:
38: SELECT A.ID ,A.col FROM BIG_TB A
39: INNER JOIN SMALL_TB B ON A.col = B.col
40:
41:
42: --Not In & Not Exists
43: SELECT ID, col FROM BIG_TB
44: WHERE col NOT IN (SELECT col FROM SMALL_TB)
45:
46: SELECT ID, col FROM BIG_TB
47: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
48:
49: SELECT A.ID,A.col FROM BIG_TB A
50: LEFT OUTER JOIN SMALL_TB B ON A.col = B.col
51: WHERE B.col IS NULL
Step 3.测试数据汇总
总结: 在有无Index的前提下,In/Exists及Not In/Not Exists的性能表现相差无几。 而Inner Join/Left Outer Join则相较而言,性能表现上稍微较慢。但注意到Col列 是Not Null属性的。当该列为Nullable状态时,Not Exists与Not In的处理方式将 有所不同,性能表现见下表:
1: ALTER TABLE BIG_TB
2: ALTER COLUMN col char(4) NULL
3:
4: ALTER TABLE SMALL_TB
5: ALTER COLUMN col char(4) NULL
6:
7:
8:
9: SELECT ID, col FROM BIG_TB
10: WHERE col NOT IN (SELECT col FROM SMALL_TB)
11:
12: /*
13: 表 'SMALL_TB'。扫描计数 13,逻辑读取 500058 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
14: 表 'BIG_TB'。扫描计数 5,逻辑读取 3992 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
15:
16: SQL Server 执行时间:
17: CPU 时间 = 1095 毫秒,占用时间 = 306 毫秒。
18: */
19:
20:
21: SELECT ID, col FROM BIG_TB
22: WHERE NOT EXISTS (SELECT col FROM SMALL_TB WHERE SMALL_TB.col = BIG_TB.col)
23: /*
24: 表 'BIG_TB'。扫描计数 1,逻辑读取 3639 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
25: 表 'SMALL_TB'。扫描计数 1,逻辑读取 9 次,物理读取 0 次,预读 0 次,lob 逻辑读取 0 次,lob 物理读取 0 次,lob 预读 0 次。
26:
27:
28: SQL Server 执行时间:
29: CPU 时间 = 265 毫秒,占用时间 = 263 毫秒。
30: */
又上述数据可以看出。在处理含有null的数据时,Not exists 的性能将比Not In的性能有很大提升。提升效率大约为5倍左右。
家家有老人
人人会变老
帮老就是帮未来的您