Dinesh's Blog :::: Being Compiled ::::: UNION ALL is possible. How about INTERSECT ALL and EXCEPT ALL?

Monday, February 17, 2014

UNION ALL is possible. How about INTERSECT ALL and EXCEPT ALL?

Not expecting duplicates but a significant improvement in performance, we tend to use UNION ALL when no duplicates are guaranteed in combining two sets. The reason is, UNION ALL does not performing an additional task for filtering (or removing) duplicates, hence gives a better performance than UNION. However, the purpose of UNION ALL is not for improving the performance but for producing the result with duplicates if exist. While this was being taught during my classes, a thought came into my mind: Why other set operators such as INTERSECT and EXCEPT do not offer the same built-in functionality like INTERSECT ALL and EXCEPT all?

A usual quick search did not give me the required answer but proved that it is not available as a built-in functionality. There were few possible solutions but the one given by T-SQL expert Itzik Ben-Gan was quite interesting. Here is a small note on INTERSECT and EXCEPT and implementation suggested by Itzik Ben-Gan for INTERSECT ALL and EXCEPT ALL.

INTERSECT and INTERSECT ALL
INTERSECT allows us to retrieve only rows that are available in both two sets. This removes duplicates if found. It basically behaves as DISTINCT INTERSECT. If duplicates to be included to the result, a workaround is required and it can be easily done with ROW_NUMBER ranking function. All required columns need to be added under PARTITION BY clause and SELECT < constant> needs to be added for ODER BY clause for instructing SQL Server to not consider the order of the window. This generates aN unique number of all records, making them available in the final resultset of INTERSECT. Here is a sample code for this;

 USE tempdb
GO
 
-- creating table 1
CREATE TABLE dbo.InternetSales
(
    SaleDate datetime NOT NULL
    , Product varchar(100) NOT NULL
    , Amount money NOT NULL
)
GO
 
-- creating table 2
CREATE TABLE dbo.ResellerSales
(
    SaleDate datetime NOT NULL
    , Product varchar(100) NOT NULL
    , Amount money NOT NULL
)
GO
 
-- inserting values for table 1
INSERT INTO dbo.InternetSales
    (SaleDate, Product, Amount)
VALUES
    ('01/01/2014', 'A', 100)
    , ('02/01/2014', 'B', 100)
    , ('03/01/2014', 'B', 100)
    , ('04/01/2014', 'B', 100)
    , ('05/01/2014', 'C', 100)
    , ('05/01/2014', 'C', 100)
 
-- inserting values for table 2
INSERT INTO dbo.ResellerSales
    (SaleDate, Product, Amount)
VALUES
    ('01/01/2014', 'A', 100)
    , ('02/01/2014', 'B', 100)
    , ('03/01/2014', 'B', 100)
    , ('04/01/2014', 'D', 100)
    , ('05/01/2014', 'D', 100)
 
-- checking INTERSECT, will return only A, B
-- If duplicates were included, should return A, B, B
SELECT Product FROM dbo.InternetSales
INTERSECT
SELECT Product FROM dbo.ResellerSales
 
-- Adding ROW_NUMBER for creating unique numbers
-- And using CTE for removing the number
-- This returns A, B, B
WITH cte
AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number 
        , Product 
    FROM dbo.InternetSales
    INTERSECT
    SELECT 
        ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number 
        , Product 
    FROM dbo.ResellerSales
)
SELECT Product FROM cte;
 

EXCEPT and EXCEPT ALL
EXCEPT returns all distinct rows found in “left” (or first) set that are not found in “right” (or second) set. This discards duplicates too. If duplicates are required, same workaround can be applied. Here is the code for it;

 -- checking EXCEPT, will return only C
-- If duplicates were included, should return B, C, C
SELECT Product FROM dbo.InternetSales
EXCEPT
SELECT Product FROM dbo.ResellerSales
 
-- Adding ROW_NUMBER for creating unique numbers
-- And using CTE for removing the number
-- This returns B, C, C (Duplicates)
WITH cte
AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number 
        , Product 
    FROM dbo.InternetSales
    EXCEPT
    SELECT 
        ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number 
        , Product 
    FROM dbo.ResellerSales
)
SELECT Product FROM cte;
 
-- Cleaning
DROP TABLE dbo.InternetSales
DROP TABLE dbo.ResellerSales
 

Monday, February 17, 2014

UNION ALL is possible. How about INTERSECT ALL and EXCEPT ALL?

No comments: