Sunday, February 16, 2014

Best way to find records exist in one set that do not appear in other set: NOT IN | LEFT OUTER | EXCEPT

Finding records exist only in one set that do not exist in another is a common requirement in database developments. Finding customers who have not placed orders, finding products that have not been purchased by any customers are general examples for it. There are many difference ways of obtaining the required record set and the mentioned methods/operators in the title are the commonly used ones for achieving this. However which gives the better performance is questionable, hence let’s analyze them and see.

Let’s try to get all products that have not been purchased by customers from AdventureWorks database. Have a look on all three SELECTs. They return product ids from Production.Product table which are not exist in Sales.SalesOrderDetail table.

  1. USE AdventureWorks2012
  2. GO
  3.  
  4. -- Finding unsold products using NOT IN
  5. SELECT p.ProductID
  6. FROM Production.Product p
  7. WHERE p.ProductID NOT IN (SELECT ProductID from Sales.SalesOrderDetail)
  8. ORDER BY 1
  9.  
  10. -- Finding unsold products using LEFT OUTER JOIN
  11. SELECT p.ProductID
  12. FROM Production.Product p
  13.     LEFT OUTER JOIN Sales.SalesOrderDetail d
  14.         ON p.ProductID = d.ProductID
  15. WHERE d.ProductID IS NULL
  16. ORDER BY 1
  17.  
  18. -- Finding unsold products using EXCEPT
  19. SELECT p.ProductID
  20. FROM Production.Product p
  21. EXCEPT
  22. SELECT ProductID
  23. FROM Sales.SalesOrderDetail
  24. ORDER BY 1

All three produce the same result;

image

Let’s analyze the execution plans of all three. Note the “Query Cost – Relative to the batch” too.

image

As you see, plans for NOT IN and EXCEPT are same and the performance of them are good but LEFT OUTER is different. LEFT OUTER has used “Merge Join” whereas other two have used “Nested Loops” which is low cost join. In this scenario, LEFT OUTER does not offer much benefits but NOT IN and EXCEPT give better performance.

However, this behavior is not guaranteed with all scenario hence we cannot conclude that NOT IN and EXCEPT provide better performance rather than LEFT OUTER. This is totally depend on factors such as index availability and number of records. Therefore, best way is, trying with all ways and pick the best for the situation. In addition to the mentioned methods, there are few more popular ways such as NOT EXISTS and OUTER APPLY. All these can be used for retrieving the required result, however, as mentioned above, best way can be determined only by trying the same with all the ways.

No comments: