Library code snippets
Deleting duplicate rows in a table
Have you ever came across a situation where someone has entered duplicate data into a database table and now the queries that you have written is bringing back bad data!
Well, I don't know about you but I have been there and it is no fun!
Here are a few pointers that can help you find the duplicate values in a table and delete those pesky pieces of information.
Lets say you have a table called Employees and its layout is as follows:
create table employee
(
EmpId number,
EmpName varchar2(10),
EmpSSN varchar2(11)
);
insert into employee values(1, 'Jack', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (7, 'Jack', '555-55-5555');
insert into employee values (8, 'Mike', '555-58-5555');
insert into employee values (9, 'Cathy', '555-59-5555');
insert into employee values (10, 'Lisa', '555-70-5555');
insert into employee values (11, 'Lisa', '555-70-5555');
You can see from the data inserted above that the EmpId is unique but the EmpName and EmpSSN have duplicate values, so now you ask yourself, how in the world can I find these duplicates and remove them?
Look no further as I will show you below one way of solving this problem.
First off you will need to get an idea of what records are duplicate and by running the query below it will help you figure that out:
SQL> select count(empssn), empssn from employee
2 group by empssn
3 having count(empssn) > 1;
COUNT(EMPSSN) EMPSSN
------------- -----------
2 555-55-5555
2 555-58-5555
2 555-59-5555
3 555-70-5555
From the above output you can now tell what employees have duplicate records and
now you can move onto the next step and remove them but still retain a unique
record for the duplicate employees.
Again here is a list all of the records in the table before we remove them.
SQL> select * from employee;
EMPID EMPNAME EMPSSN
---------- ---------- -----------
1 Jack 555-55-5555
2 Joe 555-56-5555
3 Fred 555-57-5555
4 Mike 555-58-5555
5 Cathy 555-59-5555
6 Lisa 555-70-5555
7 Jack 555-55-5555
8 Mike 555-58-5555
9 Cathy 555-59-5555
10 Lisa 555-70-5555
11 Lisa 555-70-5555
Now, let's get rid of those duplicate records by running this query:
delete from employee
where (empid, empssn)
not in
( select min(empid), empssn
from employee group by empssn);
After the above query has been executed lets select all the records from the table
to see if indeed it actually worked:
SQL> select * from employee;
EMPID EMPNAME EMPSSN
---------- ---------- -----------
1 Jack 555-55-5555
2 Joe 555-56-5555
3 Fred 555-57-5555
4 Mike 555-58-5555
5 Cathy 555-59-5555
6 Lisa 555-70-5555
6 rows selected.
BAM! Problem solved, no more duplicates, now finally you can go to bed and get
some sleep. WAIT! Just before you do, let me throw a little wrench into the situation,
lets say that you have duplicate records once again but this time the empids and
empSSNs are the same.
For example the data now looks like this:
Now, how can you figure out what data duplicate as now there is no unique identifier in the table. The Ids are the same and the SSN are the same. This is just great!insert into employee values(1, 'Jack', '555-55-5555'); insert into employee values (2, 'Joe', '555-56-5555'); insert into employee values (3, 'Fred', '555-57-5555'); insert into employee values (4, 'Mike', '555-58-5555'); insert into employee values (5, 'Cathy', '555-59-5555'); insert into employee values (6, 'Lisa', '555-70-5555'); insert into employee values (1, 'Jack', '555-55-5555'); insert into employee values (4, 'Mike', '555-58-5555'); insert into employee values (5, 'Cathy', '555-59-5555'); insert into employee values (6 ,'Lisa', '555-70-5555'); insert into employee values (6, 'Lisa', '555-70-5555');
No problem, relax, there is actually a distinct value in the table but you can't see it. This little helper is called the rowid.
Remember there is nothing to uniquely identify the data as you can see by the above insert statements so you will now have to rely on the rowid to get us out of this mess. In order to delete the duplicate values in the employee table you will now need to write a query like this:
delete from employee
where (rowid, empssn)
not in
(select min(rowid), empssn from employee group by empssn);
And hence, after the delete query above has been executed your data will appear
as:
SQL> select * from employee
2 /
EMPID EMPNAME EMPSSN
---------- ---------- -----------
1 Jack 555-55-5555
2 Joe 555-56-5555
3 Fred 555-57-5555
4 Mike 555-58-5555
5 Cathy 555-59-5555
6 Lisa 555-70-5555
6 rows selected.
As you can see all the duplicate values have been removed from the table and now
you finally have unique records in the table, so before you call it quits or before
someone else slips in and enters more duplicate data make sure you add a primary
key to that EMPID or EMPSSN column so this wont happen again!
Now finally, you will be able to get that long awaited sleep!
IMPORTANT NOTE: Remember rowid will always uniquely identify a row in a table but please do not use this as a primary key. I only showed you this little trick to help you get out of a bind when the database is poorly designed by allowing duplicate data to be entered.
I hope this has helped you out!
Related articles
Related discussion
-
regarding sql query,oracle,sql
by amitbond (2 replies)
-
Please Help Me: Recordset to array
by sidachaba@yahoo.co.uk (0 replies)
-
LINQ in Action
by naser1 (0 replies)
-
question: program for ora 02068 error
by vquanski (0 replies)
-
Help me how to fast export data from datagridview to Excel with many format cell ?
by anatha1 (9 replies)
Related podcasts
-
Java Posse #213 - Newscast for Oct 23rd 2008
Newscast for Oct 23rd 2008 Fully formatted shownotes can always be found at http://javaposse.com The Android project has been released as open source, beating the rumored launch date for the source code by several months http://source.android.com/ And, Gizmodo and ZDNet both offer in-depth ...
Events coming up
-
Jul
22
IT Job Seekers Greater Boston (Java, .NET, SQL, LAMP, etc.) July Meetup
Boston, United States
6:00pm to 6:30pm: Refreshments and networking6:30pm to 7:00pm: Individual introductions to the group7:00pm to 7:30pm: Job Hunting Tips- interviewing tips- what is hot, what is not- job hunting resources, places to look7:30 - 8:00pm: More networking!If you are a hiring manager, recruiter, or entrepreneur you are welcome to bring with you handouts about your company and opened positions.
Comments
Leave a comment
Sign in or Join us (it's free).