自关联去掉组内重复数据
数据库环境:SQL SERVER 2005
现有一个表的数据如下,id是主键,p1,p2是字符串类型,如果当前行的p1,p2字段的值分别等于其它行
的字段p2,p1的值,则视这2行记录为一组。比如,id=1和id=5就属于同一组数据。同一组数据只显示id最小
的那行记录,没有组的数据全部显示。
实现思路:
将表进行自关联左联,假设表的别名是a,b,根据id进行关联,对关联后的结果集进行过滤。如果b.id是空的,则保留,
如果b.id不为空,则只保留a.id比b.id小的记录。
实现的SQL脚本:
/*1.数据准备*/ WITH x0 AS ( SELECT 1 AS id , 'A' AS p1 , 'B' AS p2 /*UNION ALL SELECT 0 AS id , 'A' AS p1 , 'B' AS p2*/ UNION ALL SELECT 2 AS id , 'C' AS p1 , 'D' AS p2 UNION ALL SELECT 3 AS id , 'E' AS p1 , 'F' AS p2 UNION ALL SELECT 4 AS id , 'D' AS p1 , 'C' AS p2 UNION ALL SELECT 5 AS id , 'B' AS p1 , 'A' AS p2 UNION ALL SELECT 6 AS id , 'H' AS p1 , 'J' AS p2 UNION ALL SELECT 7 AS id , 'T' AS p1 , 'U' AS p2 UNION ALL SELECT 8 AS id , 'J' AS p1 , 'H' AS p2 /*UNION ALL SELECT 9 AS id , 'I' AS p1 , 'L' AS p2 UNION ALL SELECT 10 AS id , 'J' AS p1 , 'K' AS p2*/ ),/*2.去重*/ x1 AS ( SELECT id , p1 , p2 FROM ( SELECT id , p1 , p2 , ROW_NUMBER() OVER ( PARTITION BY p1, p2 ORDER BY id ) AS rn FROM x0 ) t WHERE rn = 1 ) /*3.求值*/ SELECT a.id , a.p1 , a.p2 FROM x1 a LEFT JOIN x1 b ON b.p1 = a.p2 AND b.p2 = a.p1 WHERE b.id IS NULL OR a.id < b.id
最终实现的效果如图:
也有网友提出通过ASCII来实现,他的实现SQL脚本如下:
WITH c1 AS ( SELECT 1 AS id , 'A' AS p1 , 'B' AS p2 /*UNION ALL SELECT 0 AS id , 'A' AS p1 , 'B' AS p2*/ UNION ALL SELECT 2 AS id , 'C' AS p1 , 'D' AS p2 UNION ALL SELECT 3 AS id , 'E' AS p1 , 'F' AS p2 UNION ALL SELECT 4 AS id , 'D' AS p1 , 'C' AS p2 UNION ALL SELECT 5 AS id , 'B' AS p1 , 'A' AS p2 UNION ALL SELECT 6 AS id , 'H' AS p1 , 'J' AS p2 UNION ALL SELECT 7 AS id , 'T' AS p1 , 'U' AS p2 UNION ALL SELECT 8 AS id , 'J' AS p1 , 'H' AS p2 /*UNION ALL SELECT 9 AS id , 'I' AS p1 , 'L' AS p2 UNION ALL SELECT 10 AS id , 'J' AS p1 , 'K' AS p2*/ ), c2 AS ( SELECT MIN(id) AS min_id FROM c1 GROUP BY ASCII(p1) + ASCII(p2) ) SELECT c1.* FROM c1 JOIN c2 ON id = min_id
咋一看,似乎也可以实现同样的需求。实际上,这种写法存在2个问题:
1.如果p1,p2是多个字符,ASCII的方式只会取第一个字符的ASCII
2.ASCII('A')+ASCII('D')=ASCII('B')+ASCII('C'),对于这样的数据,用ASCII的方式无法区分
(本文完)