Deleting records is a very common activity performed against databases and everyone is familiar with DELETE statement. Here is a question based on DELETE statement.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
IF OBJECT_ID('dbo.ProductCode') IS NOT NULL
 DROP TABLE dbo.ProductCode;
GO
CREATE TABLE dbo.ProductCode
(
 ProductCode char(5)
 , ProductType char(5)
);
GO
INSERT INTO dbo.ProductCode
VALUES
 ('PR001', 'T0001')
 , ('PR001', 'T0001')
 , ('PR002', 'T0002')
 , ('PR003', 'T0003')
 , ('PR001', 'T0001')
 , ('PR004', 'T0004')
 , ('PR003', 'T0003')
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
-- Here is the way of deleting duplicates WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ProductCode ORDER BY ProductCode) AS RowNumber FROM dbo.ProductCode) DELETE FROM cte WHERE RowNumber > 1 ; -- Checking records SELECT * FROM dbo.ProductCode ORDER BY ProductCode;
-- Here is another way of deleting them
SET ROWCOUNT 1
WHILE 1=1
BEGIN
 DELETE 
 FROM dbo.ProductCode
 WHERE ProductCode IN (SELECT ProductCode
       FROM dbo.ProductCode
       GROUP BY ProductCode
       HAVING COUNT(*) > 1)
 IF @@ROWCOUNT = 0
  BREAK;
END
SET ROWCOUNT 0
-- Checking records
SELECT * FROM dbo.ProductCode
ORDER BY ProductCode;

 
 
No comments:
Post a Comment