Thursday, August 4, 2016

Generate aggregates for different levels in the same queries without using sub queries

While analyzing a query written for a report, I noticed a query that has been written for getting aggregated values on a same column but using different levels. It has been written using sub queries and it was taking an unexpected time for loading data. I had to analyze it and optimize and this what I did. 

It was something similar to below query;

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

SELECT YEAR(OrderDate) SalesYear
 , DATEPART(q, OrderDate) SalesQuarter
 , MONTH(OrderDate) SalesMonth
 , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) 
   AND  DATEPART(q, o2.OrderDate) =  DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount
FROM Sales.Orders o1
 INNER JOIN Sales.OrderLines ol1
  ON o1.OrderID = ol1.OrderID
GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
ORDER BY 1, 2, 3;

You get a result like below image. As you see, it shows Yea, Quarter, Month and then sales for the Month, Quarter and Year.


I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;


The quickest solution came to my mind was CTE, hence applied it to the query as below.

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

WITH YearlySales (SalesYear, YearlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate)
),
QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate)
),
MonthlySales (SalesYear, SalesQuarter, SalesMonth, MontlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
)
SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MontlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount
FROM YearlySales y
 INNER JOIN QuarterlySales q
  ON y.SalesYear = q.SalesYear
 INNER JOIN MonthlySales m
  ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter
ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;

And when I test the performance just like the way I did, as you see, it takes only 1 second and very small number of pages are read.


A simple lesson we can learn from this is, we have to avoid sub queries as much as possible, specifically co-related sub queries because the cost is very high. In addition to that, we should use modern ways for getting the results in efficient ways.

No comments: