Not expecting duplicates but a significant improvement in performance, we tend to use UNION ALL when no duplicates are guaranteed in combining two sets. The reason is, UNION ALL does not performing an additional task for filtering (or removing) duplicates, hence gives a better performance than UNION. However, the purpose of UNION ALL is not for improving the performance but for producing the result with duplicates if exist. While this was being taught during my classes, a thought came into my mind: Why other set operators such as INTERSECT and EXCEPT do not offer the same built-in functionality like INTERSECT ALL and EXCEPT all?
A usual quick search did not give me the required answer but proved that it is not available as a built-in functionality. There were few possible solutions but the one given by T-SQL expert Itzik Ben-Gan was quite interesting. Here is a small note on INTERSECT and EXCEPT and implementation suggested by Itzik Ben-Gan for INTERSECT ALL and EXCEPT ALL.
INTERSECT and INTERSECT ALL
INTERSECT allows us to retrieve only rows that are available in both two sets. This removes duplicates if found. It basically behaves as DISTINCT INTERSECT. If duplicates to be included to the result, a workaround is required and it can be easily done with ROW_NUMBER ranking function. All required columns need to be added under PARTITION BY clause and SELECT < constant> needs to be added for ODER BY clause for instructing SQL Server to not consider the order of the window. This generates aN unique number of all records, making them available in the final resultset of INTERSECT. Here is a sample code for this;
- USE tempdb
- GO
- -- creating table 1
- CREATE TABLE dbo.InternetSales
- (
- SaleDate datetime NOT NULL
- , Product varchar(100) NOT NULL
- , Amount money NOT NULL
- )
- GO
- -- creating table 2
- CREATE TABLE dbo.ResellerSales
- (
- SaleDate datetime NOT NULL
- , Product varchar(100) NOT NULL
- , Amount money NOT NULL
- )
- GO
- -- inserting values for table 1
- INSERT INTO dbo.InternetSales
- (SaleDate, Product, Amount)
- VALUES
- ('01/01/2014', 'A', 100)
- , ('02/01/2014', 'B', 100)
- , ('03/01/2014', 'B', 100)
- , ('04/01/2014', 'B', 100)
- , ('05/01/2014', 'C', 100)
- , ('05/01/2014', 'C', 100)
- -- inserting values for table 2
- INSERT INTO dbo.ResellerSales
- (SaleDate, Product, Amount)
- VALUES
- ('01/01/2014', 'A', 100)
- , ('02/01/2014', 'B', 100)
- , ('03/01/2014', 'B', 100)
- , ('04/01/2014', 'D', 100)
- , ('05/01/2014', 'D', 100)
- -- checking INTERSECT, will return only A, B
- -- If duplicates were included, should return A, B, B
- SELECT Product FROM dbo.InternetSales
- INTERSECT
- SELECT Product FROM dbo.ResellerSales
- -- Adding ROW_NUMBER for creating unique numbers
- -- And using CTE for removing the number
- -- This returns A, B, B
- WITH cte
- AS
- (
- SELECT
- ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
- , Product
- FROM dbo.InternetSales
- INTERSECT
- SELECT
- ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
- , Product
- FROM dbo.ResellerSales
- )
- SELECT Product FROM cte;
EXCEPT and EXCEPT ALL
EXCEPT returns all distinct rows found in “left” (or first) set that are not found in “right” (or second) set. This discards duplicates too. If duplicates are required, same workaround can be applied. Here is the code for it;
- -- checking EXCEPT, will return only C
- -- If duplicates were included, should return B, C, C
- SELECT Product FROM dbo.InternetSales
- EXCEPT
- SELECT Product FROM dbo.ResellerSales
- -- Adding ROW_NUMBER for creating unique numbers
- -- And using CTE for removing the number
- -- This returns B, C, C (Duplicates)
- WITH cte
- AS
- (
- SELECT
- ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
- , Product
- FROM dbo.InternetSales
- EXCEPT
- SELECT
- ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
- , Product
- FROM dbo.ResellerSales
- )
- SELECT Product FROM cte;
- -- Cleaning
- DROP TABLE dbo.InternetSales
- DROP TABLE dbo.ResellerSales
No comments:
Post a Comment