One of the biggest problem with analysis or business intelligence is clean data. In many real world scenarios you will find that data is corrupt, missing or
duplicate. In this article we are going to demonstrate multiple ways to delete duplicate data from a table.
Step 1: Lets create a dummy table.
create table demo (id int, name char(10))
Step 2: Insert duplicate values
insert into demo(id,name) values(1,'abc');
insert into demo(id,name) values(2,'abd');
insert into demo(id,name) values(3,'vijay');
insert into demo(id,name) values(4,'vijay');
insert into demo(id,name) values(5,'ajay');
insert into demo(id,name) values(6,'sanjay');
Step 3: Select * from demo;
select * from demo;
+------+--------+
| id | name |
+------+--------+
| 1 | abc |
| 2 | abd |
| 3 | vijay |
| 4 | vijay |
| 5 | ajay |
| 6 | sanjay |
+------+--------+
6 rows in set (0.00 sec)
So the table now contains duplicate values, id 3,4 with name vijay is duplicate. Now lets look at the options to delete this duplicate value
Step 4: Find duplicate value using select
select id, count(name) from demo group by name having count(name)>1;
+------+-------------+
| id | count(name) |
+------+-------------+
| 3 | 2 |
+------+-------------+
1 row in set (0.00 sec)
So the output shows that id 3 with name vijay has two entries. The simple way to find this was using group by and count on column where you think duplicate are there and should not be. Lets delete this
Step 5: Delete duplicate using inner join
delete d1 from demo d1 inner join demo d2 where d1.id>d2.id AND d1.name=d2.name;
Query OK, 1 row affected (0.00 sec)
Step 6: Check if the query ran successfully, repeat step 4.select id, count(name) from demo group by name having count(name)>1;
Empty set (0.00 sec)
So no output that means duplicate data has been deleted success. Reconfirm with select on table
select id,name from demo;
+------+--------+
| id | name |
+------+--------+
| 1 | abc |
| 2 | abd |
| 3 | vijay |
| 5 | ajay |
| 6 | sanjay |
+------+--------+
5 rows in set (0.00 sec)
In next article, I am going to show other queries to delete duplicate.