While analyzing a query written for a report, I noticed a query that has been written for getting aggregated values on a same column but using different levels. It has been written using sub queries and it was taking an unexpected time for loading data. I had to analyze it and optimize and this what I did.
It was something similar to below query;
USE WideWorldImporters; GO SET STATISTICS IO ON; SELECT YEAR(OrderDate) SalesYear , DATEPART(q, OrderDate) SalesQuarter , MONTH(OrderDate) SalesMonth , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) FROM Sales.Orders o2 INNER JOIN Sales.OrderLines ol2 ON o2.OrderID = ol2.OrderID WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) AND DATEPART(q, o2.OrderDate) = DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) FROM Sales.Orders o2 INNER JOIN Sales.OrderLines ol2 ON o2.OrderID = ol2.OrderID WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount FROM Sales.Orders o1 INNER JOIN Sales.OrderLines ol1 ON o1.OrderID = ol1.OrderID GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate) ORDER BY 1, 2, 3;
You get a result like below image. As you see, it shows Yea, Quarter, Month and then sales for the Month, Quarter and Year.
I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.
I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.
DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
The quickest solution came to my mind was CTE, hence applied it to the query as below.
USE WideWorldImporters; GO SET STATISTICS IO ON; WITH YearlySales (SalesYear, YearlySalesAmount) AS ( SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity) FROM Sales.Orders o INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID GROUP BY YEAR(OrderDate) ), QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount) AS ( SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity) FROM Sales.Orders o INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate) ), MonthlySales (SalesYear, SalesQuarter, SalesMonth, MontlySalesAmount) AS ( SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity) FROM Sales.Orders o INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate) ) SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MontlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount FROM YearlySales y INNER JOIN QuarterlySales q ON y.SalesYear = q.SalesYear INNER JOIN MonthlySales m ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;
And when I test the performance just like the way I did, as you see, it takes only 1 second and very small number of pages are read.
A simple lesson we can learn from this is, we have to avoid sub queries as much as possible, specifically co-related sub queries because the cost is very high. In addition to that, we should use modern ways for getting the results in efficient ways.
No comments:
Post a Comment