ORACLE 删除重复的数据
一:重复数据根据单个字段进行判断
1、首先,查询表中多余的数据,由关键字段(name)来查询。
select * from OA_ADDRESS_BOOK where name in (select name from OA_ADDRESS_BOOK group by name having count(name)>1)
2、删除表中重复数据,重复数据是根据单个字段(Name)来判断,只留有rowid最小的记录
delete from OA_ADDRESS_BOOK where (Name) in
(select Name from OA_ADDRESS_BOOK group by Name having count(Name) >1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name having count(Name)>1)
二:重复数据根据多个字段进行判断
1、首先,查询表中重复数据,由关键字段(Name,UNIT_ID)来查询。
select * from OA_ADDRESS_BOOK book1 where (book1.name,book1.unit_id) in
(select book2.name,book2.unit_id from OA_ADDRESS_BOOK book2 group by book2.name,book2.unit_id having count(*)>1)
2、删除表中重复数据,重复数据是根据多个字段(Name,UNIT_ID)来判断,只留有rowid最小的记录
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
3、查询表中重复数据,重复数据是根据多个字段(Name,UNIT_ID)来判断,不包含rowid最小的记录
(select Name,UNIT_ID from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*) > 1)
and rowid not in (select min(rowid) from OA_ADDRESS_BOOK group by Name,UNIT_ID having count(*)>1)
1. 问题描述
BBSCOMMENT表为BBSDETAIL的从表,记录商户评价信息。因为数据倒腾来倒腾去的,有很多重复数据。表结构如下:
COMMENT_ID NOT NULL NUMBER --主键
DETAIL_ID NOT NULL NUMBER --外键,引用BBSDETAIL表
COMMENT_BODY NOT NULL VARCHAR2(500) --评价内容
--其它字段忽略
其中主键是没有重复的,重复的是DETAIL_ID+COMMENT_BODY+......等信息,就是某些商家的评价信息有重复。
2. 解决步骤
2.1 查找表中多余的重复记录
--查询出所有有重复的数据 select DETAIL_ID,COMMENT_BODY,count(*) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1 order by DETAIL_ID, COMMENT_BODY; --1955条
2.2 显示了所有的非冗余的数据
--这一条命令显示了所有的非冗余的数据 select min(COMMENT_ID) as COMMENT_ID,DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY; --21453条,之所以此值不等于表总记录数-1955,是因为1955条记录中,有的重复了不止一次。
2.3 如果记录数量少(千级别),可以把上面的语句做成子查询然后直接删除
--如果表数据量不是很大(1千条以内),可以把上面的语句做成子查询然后直接删除 delete from BBSCOMMENT where COMMENT_ID not in( select min(COMMENT_ID) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY ); --782秒,在我这里,2万条记录,重复记录2千多(太慢了!!)
2.4 另一种删除方法
--这条语句也能够实现上述功能,但不好测试了,数据已经被我删除了 --删除条件一:有重复数据的记录;条件二:保留最小rowid的记录。 delete from BBSCOMMENT a where (a.DETAIL_ID,a.COMMENT_BODY) in(select DETAIL_ID,COMMENT_BODY from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*) > 1) and rowid not in (select min(rowid) from BBSCOMMENT group by DETAIL_ID,COMMENT_BODY having count(*)>1);
2.5 大数据量还是用PL/SQL方便快捷
declare --定义存储结构 type bbscomment_type is record ( comment_id BBSCOMMENT.COMMENT_ID%type, detail_id BBSCOMMENT.DETAIL_ID%type, comment_body BBSCOMMENT.COMMENT_BODY%type ); bbscomment_record bbscomment_type; --可供比较的变量 v_comment_id BBSCOMMENT.COMMENT_ID%type; v_detail_id BBSCOMMENT.DETAIL_ID%type; v_comment_body BBSCOMMENT.COMMENT_BODY%type; --其它变量 v_batch_size integer := 5000; v_counter integer := 0; cursor cur_dupl is --取出所有有重复的记录 select COMMENT_ID, DETAIL_ID, COMMENT_BODY from BBSCOMMENT where(DETAIL_ID, COMMENT_BODY) in ( --这些记录有重复 select DETAIL_ID, COMMENT_BODY from BBSCOMMENT group by DETAIL_ID, COMMENT_BODY having count(*) > 1) order by DETAIL_ID, COMMENT_BODY; begin for bbscomment_record in cur_dupl loop if v_detail_id is null or (bbscomment_record.detail_id != v_detail_id or nvl(bbscomment_record.comment_body, ' ') != nvl(v_comment_body, ' ')) then --首次进入、换记录了,都重新赋值 v_detail_id := bbscomment_record.detail_id; v_comment_body := bbscomment_record.comment_body; else --其它记录删除 delete from BBSCOMMENT where COMMENT_ID = bbscomment_record.comment_id; v_counter := v_counter + 1; if mod(v_counter, v_batch_size) = 0 then --每多少条提交一次 commit; end if; end if; end loop; if v_counter > 0 then --最后一次提交 commit; end if; dbms_output.put_line(to_char(v_counter)||'条记录被删除!'); exception when others then dbms_output.put_line('sqlerrm-->' ||sqlerrm); rollback; end;