Monday, June 10, 2013

Write a query to delete duplicate rows in a table (SQL)

Delete duplicate records from the emp table


The Row_Number() built in function returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
So when for a partition by empID, ROW_NUMBER() function will return the numbered rows grouped by empID.
The result set of the flowing query shows the data returned by row_number() function :

SELECT empid, ROW_NUMBER() over (PARTITION BY empid ORDER BY empid) as rowNum
 FROM [MySampleDB].[dbo].[emp]

The result set looks like :


To delete only duplicate records from the table, we have to delete only those rows for which the rowNumber is more than 1.

The following query will remove duplicates from the table :

WITH
temp_table AS
(SELECT empid, ROW_NUMBER() over (PARTITION BY empid ORDER BY empid) as rowNum
 FROM [MySampleDB].[dbo].[emp]
 )
DELETE FROM temp_table
WHERE rowNum > 1

1 comment:

Unknown said...

Could you please explain how this query removes the duplicates?