Deleting records is a very common activity performed against databases and everyone is familiar with DELETE statement. Here is a question based on DELETE statement.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
IF OBJECT_ID('dbo.ProductCode') IS NOT NULL
DROP TABLE dbo.ProductCode;
GO
CREATE TABLE dbo.ProductCode
(
ProductCode char(5)
, ProductType char(5)
);
GO
INSERT INTO dbo.ProductCode
VALUES
('PR001', 'T0001')
, ('PR001', 'T0001')
, ('PR002', 'T0002')
, ('PR003', 'T0003')
, ('PR001', 'T0001')
, ('PR004', 'T0004')
, ('PR003', 'T0003')
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
-- Here is the way of deleting duplicates WITH cte AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ProductCode ORDER BY ProductCode) AS RowNumber FROM dbo.ProductCode) DELETE FROM cte WHERE RowNumber > 1 ; -- Checking records SELECT * FROM dbo.ProductCode ORDER BY ProductCode;
-- Here is another way of deleting them
SET ROWCOUNT 1
WHILE 1=1
BEGIN
DELETE
FROM dbo.ProductCode
WHERE ProductCode IN (SELECT ProductCode
FROM dbo.ProductCode
GROUP BY ProductCode
HAVING COUNT(*) > 1)
IF @@ROWCOUNT = 0
BREAK;
END
SET ROWCOUNT 0
-- Checking records
SELECT * FROM dbo.ProductCode
ORDER BY ProductCode;

No comments:
Post a Comment