Deleting records is a very common activity performed against databases and everyone is familiar with DELETE statement. Here is a question based on DELETE statement.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
What is the easiest way of deleting duplicated records from a table if you can identify the duplicated records by a column?
There can be multiple ways of doing it. Before seeing the solution, let's make a table with duplicates.
- IF OBJECT_ID('dbo.ProductCode') IS NOT NULL
- DROP TABLE dbo.ProductCode;
- GO
- CREATE TABLE dbo.ProductCode
- (
- ProductCode char(5)
- , ProductType char(5)
- );
- GO
- INSERT INTO dbo.ProductCode
- VALUES
- ('PR001', 'T0001')
- , ('PR001', 'T0001')
- , ('PR002', 'T0002')
- , ('PR003', 'T0003')
- , ('PR001', 'T0001')
- , ('PR004', 'T0004')
- , ('PR003', 'T0003')
Here are two ways of deleting records, both work fine, however first method might give better performance over a large dataset.
- -- Here is the way of deleting duplicates
- WITH cte AS
- (
- SELECT *,
- ROW_NUMBER() OVER (PARTITION BY ProductCode ORDER BY ProductCode) AS RowNumber
- FROM dbo.ProductCode)
- DELETE FROM cte
- WHERE RowNumber > 1 ;
- -- Checking records
- SELECT * FROM dbo.ProductCode
- ORDER BY ProductCode;
- -- Here is another way of deleting them
- SET ROWCOUNT 1
- WHILE 1=1
- BEGIN
- DELETE
- FROM dbo.ProductCode
- WHERE ProductCode IN (SELECT ProductCode
- FROM dbo.ProductCode
- GROUP BY ProductCode
- HAVING COUNT(*) > 1)
- IF @@ROWCOUNT = 0
- BREAK;
- END
- SET ROWCOUNT 0
- -- Checking records
- SELECT * FROM dbo.ProductCode
- ORDER BY ProductCode;
No comments:
Post a Comment