Thursday, August 4, 2016

Generate aggregates for different levels in the same queries without using sub queries

While analyzing a query written for a report, I noticed a query that has been written for getting aggregated values on a same column but using different levels. It has been written using sub queries and it was taking an unexpected time for loading data. I had to analyze it and optimize and this what I did. 

It was something similar to below query;

  1. USE WideWorldImporters;  
  2. GO  
  3.   
  4. SET STATISTICS IO ON;  
  5.   
  6. SELECT YEAR(OrderDate) SalesYear  
  7.  , DATEPART(q, OrderDate) SalesQuarter  
  8.  , MONTH(OrderDate) SalesMonth  
  9.  , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount  
  10.  , (SELECT SUM(ol2.UnitPrice * ol2.Quantity)   
  11.   FROM Sales.Orders o2  
  12.   INNER JOIN Sales.OrderLines ol2  
  13.    ON o2.OrderID = ol2.OrderID  
  14.   WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate)   
  15.    AND  DATEPART(q, o2.OrderDate) =  DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount  
  16.  , (SELECT SUM(ol2.UnitPrice * ol2.Quantity)   
  17.   FROM Sales.Orders o2  
  18.   INNER JOIN Sales.OrderLines ol2  
  19.    ON o2.OrderID = ol2.OrderID  
  20.   WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount  
  21. FROM Sales.Orders o1  
  22.  INNER JOIN Sales.OrderLines ol1  
  23.   ON o1.OrderID = ol1.OrderID  
  24. GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)  
  25. ORDER BY 1, 2, 3;  

You get a result like below image. As you see, it shows Yea, Quarter, Month and then sales for the Month, Quarter and Year.


I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.

  1. DBCC FREEPROCCACHE;  
  2.   
  3. DBCC DROPCLEANBUFFERS;  


The quickest solution came to my mind was CTE, hence applied it to the query as below.

  1. USE WideWorldImporters;  
  2. GO  
  3.   
  4. SET STATISTICS IO ON;  
  5.   
  6. WITH YearlySales (SalesYear, YearlySalesAmount)  
  7. AS  
  8. (  
  9.  SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity)  
  10.  FROM Sales.Orders o  
  11.  INNER JOIN Sales.OrderLines ol  
  12.    ON o.OrderID = ol.OrderID  
  13.  GROUP BY YEAR(OrderDate)  
  14. ),  
  15. QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount)  
  16. AS  
  17. (  
  18.  SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity)  
  19.  FROM Sales.Orders o  
  20.  INNER JOIN Sales.OrderLines ol  
  21.    ON o.OrderID = ol.OrderID  
  22.  GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate)  
  23. ),  
  24. MonthlySales (SalesYear, SalesQuarter, SalesMonth, MontlySalesAmount)  
  25. AS  
  26. (  
  27.  SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity)  
  28.  FROM Sales.Orders o  
  29.  INNER JOIN Sales.OrderLines ol  
  30.    ON o.OrderID = ol.OrderID  
  31.  GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)  
  32. )  
  33. SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MontlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount  
  34. FROM YearlySales y  
  35.  INNER JOIN QuarterlySales q  
  36.   ON y.SalesYear = q.SalesYear  
  37.  INNER JOIN MonthlySales m  
  38.   ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter  
  39. ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;  

And when I test the performance just like the way I did, as you see, it takes only 1 second and very small number of pages are read.


A simple lesson we can learn from this is, we have to avoid sub queries as much as possible, specifically co-related sub queries because the cost is very high. In addition to that, we should use modern ways for getting the results in efficient ways.

No comments: