Sunday, December 27, 2015

SQL Server Brain Basher of the Week #037 - TSQL - Deleting duplicates

Deleting records is a very common activity performed against databases and everyone is familiar with DELETE statement. Here is a question based on DELETE statement.

What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?

There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.

  1. IF OBJECT_ID('dbo.ProductCode'IS NOT NULL  
  2.  DROP TABLE dbo.ProductCode;  
  3. GO  
  4.   
  5. CREATE TABLE dbo.ProductCode  
  6. (  
  7.  ProductCode char(5)  
  8.  , ProductType char(5)  
  9. );  
  10. GO  
  11.   
  12. INSERT INTO dbo.ProductCode  
  13. VALUES  
  14.  ('PR001''T0001')  
  15.  , ('PR001''T0001')  
  16.  , ('PR002''T0002')  
  17.  , ('PR003''T0003')  
  18.  , ('PR001''T0001')  
  19.  , ('PR004''T0004')  
  20.  , ('PR003''T0003')  



Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.

  1. -- Here is the way of deleting duplicates  
  2. WITH cte AS  
  3. (  
  4.  SELECT *,   
  5.   ROW_NUMBER() OVER (PARTITION BY ProductCode ORDER BY ProductCode) AS RowNumber  
  6.  FROM dbo.ProductCode)  
  7. DELETE FROM cte  
  8. WHERE RowNumber > 1 ;  
  9.   
  10. -- Checking records  
  11. SELECT * FROM dbo.ProductCode  
  12. ORDER BY ProductCode;  


  1. -- Here is another way of deleting them  
  2. SET ROWCOUNT 1  
  3. WHILE 1=1  
  4. BEGIN  
  5.   
  6.  DELETE   
  7.  FROM dbo.ProductCode  
  8.  WHERE ProductCode IN (SELECT ProductCode  
  9.        FROM dbo.ProductCode  
  10.        GROUP BY ProductCode  
  11.        HAVING COUNT(*) > 1)  
  12.  IF @@ROWCOUNT = 0  
  13.   BREAK;  
  14. END  
  15. SET ROWCOUNT 0  
  16.   
  17. -- Checking records  
  18. SELECT * FROM dbo.ProductCode  
  19. ORDER BY ProductCode;  


No comments: