
Deleting Duplicate Data in Oracle


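Background: there are two databases, a source and a target, and every day the data in the source is synchronized into the target. For various reasons, and for fear of losing data, each run re-synchronizes roughly the last 8 days of data. (The table had a primary key, so duplicates were not a concern; there are over a hundred thousand new rows a day, and the table already holds 60 million rows.) At some point, though, a colleague dropped the primary key by mistake, and nobody knows on which day.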

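As a result, the figures in the BI reports were absurdly inflated. After some struggling, the problem was solved. Here is a summary of the approaches: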

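1) Flashback: Oracle has flashback technology, and the recyclebin can be used to look up the dropped primary key, but the duplicate rows have to be deleted before that.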

2) Use rowid to find the duplicate rows and delete all of them except the one with the smallest rowid in each group. The statement:

delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1) and rowid not in (select min(rowid) from table1 group by Id,seq having count(*) > 1);
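To make the keep-the-smallest-rowid idea concrete, here is a minimal sketch that can be run without an Oracle instance. It assumes Python's built-in sqlite3 module as a stand-in for Oracle (SQLite also exposes a rowid for ordinary tables), and the table name `table1`, the `payload` column, and the sample rows are made up for illustration. The delete is a simplified form of the statement above, not the exact Oracle DML: it drops the first `in (...)` filter, because keeping `min(rowid)` per `(id, seq)` group already leaves non-duplicated rows untouched.

```python
import sqlite3


def make_demo_table(conn):
    """Create table1 without a primary key and load rows with duplicated (id, seq) keys."""
    conn.execute("CREATE TABLE table1 (id INTEGER, seq INTEGER, payload TEXT)")
    rows = [
        (1, 1, "a"), (1, 1, "a"), (1, 1, "a"),  # key (1, 1) stored three times
        (1, 2, "b"),                            # key (1, 2) stored once
        (2, 1, "c"), (2, 1, "c"),               # key (2, 1) stored twice
    ]
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)


def delete_duplicates_keep_min_rowid(conn):
    """Delete every row that is not the lowest-rowid row of its (id, seq) group."""
    cur = conn.execute(
        """
        DELETE FROM table1
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM table1 GROUP BY id, seq
        )
        """
    )
    return cur.rowcount  # number of rows removed


conn = sqlite3.connect(":memory:")
make_demo_table(conn)
deleted = delete_duplicates_keep_min_rowid(conn)
remaining = conn.execute(
    "SELECT id, seq, payload FROM table1 ORDER BY id, seq"
).fetchall()
print(deleted, remaining)  # prints: 3 [(1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c')]
```

On six rows this is instant. It says nothing about how the same pattern behaves on tens of millions of rows in Oracle, which is the concern raised in the text.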

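The DML statement in method 2 is a nightmare because of the "not in". If your data volume is large, use it with caution.

3) This is the method that was actually used in practice. Its performance was acceptable: the deletion took about 5 minutes. The steps are as follows:

1. Query the duplicate rows in the table ((Id, seq) is the primary key that now has duplicates):

select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);

2. Create a table from them (this way lsb has the same structure as table1):

create table lsb as select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);
commit;

3. Delete the duplicated rows from table1:

delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);
commit;

4. Query the rows in lsb that have the smallest rowid in each group:

select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1);

5. Insert the rows found in step 4 back into table1:

insert into table1 select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1);
commit;

6. Drop the scratch table:

drop table lsb;

4) The whole procedure:

create table lsb as select * from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1); -- a temporary table would also work and may be more efficient (less writing to disk)
commit;
delete from table1 a where (a.Id,a.seq) in (select Id,seq from table1 group by Id,seq having count(*) > 1);
commit;
insert into table1 select * from lsb a where a.rowid in (select min(rowid) from lsb group by Id,seq having count(*) > 1);
commit;
drop table lsb;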
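The copy, delete, re-insert, drop sequence can be tried end to end on a toy table. The sketch below again assumes Python's sqlite3 module standing in for Oracle, with a made-up `payload` column and sample rows. It needs an SQLite build that supports row values such as `(id, seq) IN (...)`, which means version 3.15 or later. The final unique index is only an assumed stand-in for re-adding the primary key, and the name `table1_pk` is hypothetical.

```python
import sqlite3

# Keys that occur more than once; reused by the copy and delete steps.
DUP_KEYS = "SELECT id, seq FROM table1 GROUP BY id, seq HAVING COUNT(*) > 1"


def dedupe_via_staging_table(conn):
    """Mirror the create / delete / insert / drop sequence using a scratch table lsb."""
    # Copy every row whose (id, seq) key is duplicated into lsb.
    conn.execute(
        f"CREATE TABLE lsb AS SELECT * FROM table1 WHERE (id, seq) IN ({DUP_KEYS})"
    )
    # Remove all of those rows from table1, including the copies we want to keep.
    conn.execute(f"DELETE FROM table1 WHERE (id, seq) IN ({DUP_KEYS})")
    # Put back exactly one row per key: the one with the smallest rowid in lsb.
    conn.execute(
        "INSERT INTO table1 SELECT * FROM lsb "
        "WHERE rowid IN (SELECT MIN(rowid) FROM lsb GROUP BY id, seq)"
    )
    conn.execute("DROP TABLE lsb")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, seq INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (1, 2, "b"), (2, 1, "c"), (2, 1, "c")],
)

dedupe_via_staging_table(conn)

rows_after = conn.execute(
    "SELECT id, seq, payload FROM table1 ORDER BY id, seq"
).fetchall()
leftover_dup_keys = conn.execute(DUP_KEYS).fetchall()

# With the duplicates gone, a unique index on (id, seq) can be created again.
conn.execute("CREATE UNIQUE INDEX table1_pk ON table1 (id, seq)")
print(rows_after, leftover_dup_keys)
```

The point of the design is that both heavy statements filter on the set of duplicated keys, and the `min(rowid)` lookup only runs against the much smaller scratch table rather than the full 60-million-row table.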
