mysql - 删除重复数据
如下表:
单字段重复:
例如:删除仅stu_name一个字段重复的数据
全部删除
-- 查询全部重复数据 select * from tb_4 where stu_name in (select stu_name from tb_4 group by stu_name having count(1) > 1) -- 删除全部重复数据 delete from tb_4 where stu_name in (select a.stu_name from (select stu_name from tb_4 group by stu_name having count(1) > 1)a)
部分删除(保留一行)
-- 查询多余重复数据 select * from tb_4 where id not in (select min(id) from tb_4 group by stu_name) -- 删除多余重复数据(保留一行) delete from tb_4 where id not in (select a.min_id from (select min(id) as min_id from tb_4 group by stu_name)a)
多字段重复:
在单字段的基础上,将group by中的字段修改为你想要的即可
例如:删除stu_name、score两个字段重复的数据
全部删除
-- 查询全部重复数据 select * from tb_4 where stu_name in (select a.stu_name from (select stu_name from tb_4 group by stu_name, score having count(1) > 1)a) and score in (select a.score from (select score from tb_4 group by stu_name, score having count(1) > 1)a) -- 删除全部重复数据 delete from tb_4 where stu_name in (select a.stu_name from (select stu_name from tb_4 group by stu_name, score having count(1) > 1)a) and score in (select a.score from (select score from tb_4 group by stu_name, score having count(1) > 1)a)
部分删除(保留一行)
-- 查询多余重复数据 select * from tb_4 where id not in (select min(id) from tb_4 group by stu_name, score) -- 删除多余重复数据(保留一行) delete from tb_4 where id not in (select a.min_id from (select min(id) as min_id from tb_4 group by stu_name, score)a)
保留多行
如下表:删除相同医院、相同科室5个以上的医生数据(删除红框中的数据)
结合sql和python实现
1. 以下sql可以将相同医院相同科室符合条件的数据删除一行
delete from tb_test_doctor_1 where phone in ( select b.phone from (select phone from tb_test_doctor_1 group by hospital, department having count(1) > 5 ORDER BY count(1) desc) as b )
2. 使用python循环,直到删完
import MySQLdb conn = MySQLdb.connect( host='192.168.1.0', port=3306, user='root', passwd='123', db='test' ) cur = conn.cursor() # sql = 'select version()' # # cur.execute(sql) # a = cur.fetchall() # print(a) sql_sum = 'select count(1) from tb_test_doctor_1 group by hospital, department having count(1) > 5 ORDER BY count(1) DESC limit 1' cur.execute(sql_sum) sum_tuple = cur.fetchone() sum = sum_tuple[0] print(sum) sql2 = """ delete from tb_test_doctor_1 where phone in ( select b.phone from (select phone from tb_test_doctor_1 group by hospital, department having count(1) > 5 ORDER BY count(1) desc) as b ) """ for n in range(sum): cur.execute(sql2) conn.commit() cur.close() conn.close()