Riordon

  博客园 :: 首页 :: 博问 :: 闪存 :: 新随笔 :: :: :: 管理 ::

经常要对数据库中的数据进行去重,有时还需要使用外部表填冲数据,本文档记录数据去重与外表填充数据。
date:2016/8/17
author:wangxl

1 需求

对user_info1表去重,并添加age项。

2 表数据

user_info1:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
|  1 | xiaolong | 1    | NULL |
|  2 | xiaoyun  | 1    | NULL |
|  3 | xiaoqin  | 2    | NULL |
|  4 | xiaolong | 1    | NULL |
|  5 | xiaodong | 1    | NULL |
|  6 | xiaokai  | 1    | NULL |
|  7 | xiaohong | 2    | NULL |
|  8 | xiaolong | 1    | NULL |
|  9 | xiaohong | 2    | NULL |
| 10 | xiaofen  | 2    | NULL |
+----+----------+------+------+

user_info2:
+----------+------+
| name     | age  |
+----------+------+
| xiaolong |   26 |
| xiaoyun  |   28 |
| xiaoqin  |   27 |
| xiaodong |   27 |
| xiaokai  |   27 |
| xiaohong |   24 |
| xiaofen  |   22 |
+----------+------+

3 实战

3.1 去重

(1) 找出有重复字段
	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1);
(2) 找出要删除的记录,重复记录是根据单个字段(name)来判断,只留有id最小的记录
	select * from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);
(3) 删除表中多余的重复记录
	delete from user_info1 where name in (select name from user_info1 group by name having count(name) > 1) and id not in (select min(id) from user_info1 group by name having count(name) > 1);

	报错:ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause

更换思路:找出每组中非最小id并删除,如下:

(4) 找出每组最小id
	select min(id) from user_info1 group by name
(5) 找出每组非最小id
	select * from user_info1 where id not in (select min(id) from user_info1 group by name);
(6) 删除每组中非最小id所在行
	delete from user_info1 where id not in (select id from select min(id) from user_info1 group by name);
	ERROR 1093 (HY000): You can't specify target table 'user_info1' for update in FROM clause
	更正:
	delete from user_info1 where id not in (select minid from (select min(id) as minid from user_info1 group by name) a);、

结果展示:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
| 1  | xiaolong | 1    | NULL |
| 2  | xiaoyun  | 1    | NULL |
| 3  | xiaoqin  | 2    | NULL |
| 5  | xiaodong | 1    | NULL |
| 6  | xiaokai  | 1    | NULL |
| 7  | xiaohong | 2    | NULL |
| 10 | xiaofen  | 2    | NULL |
+----+----------+------+------+

对于没有primary key的话,怎么去重呢?

(7) 创建表test
(8) insert into test select distinct(name),sex,age from user_info1 group by name; 

暂时没想出一句话解决方案.

3.2 外表插入

update user_info1 t set age=(select age from user_info2 where name=t.name);
结果如下:
+----+----------+------+------+
| id | name     | sex  | age  |
+----+----------+------+------+
| 1  | xiaolong | 1    |   26 |
| 2  | xiaoyun  | 1    |   28 |
| 3  | xiaoqin  | 2    |   27 |
| 5  | xiaodong | 1    |   27 |
| 6  | xiaokai  | 1    |   27 |
| 7  | xiaohong | 2    |   24 |
| 10 | xiaofen  | 2    |   22 |
+----+----------+------+------+
posted on 2016-08-17 23:03  Riordon  阅读(301)  评论(0编辑  收藏  举报