While analyzing a query written for a report, I noticed a query that has been written for getting aggregated values on a same column but using different levels. It has been written using sub queries and it was taking an unexpected time for loading data. I had to analyze it and optimize and this what I did.
It was something similar to below query;
- USE WideWorldImporters;
- GO
- SET STATISTICS IO ON;
- SELECT YEAR(OrderDate) SalesYear
- , DATEPART(q, OrderDate) SalesQuarter
- , MONTH(OrderDate) SalesMonth
- , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount
- , (SELECT SUM(ol2.UnitPrice * ol2.Quantity)
- FROM Sales.Orders o2
- INNER JOIN Sales.OrderLines ol2
- ON o2.OrderID = ol2.OrderID
- WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate)
- AND DATEPART(q, o2.OrderDate) = DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount
- , (SELECT SUM(ol2.UnitPrice * ol2.Quantity)
- FROM Sales.Orders o2
- INNER JOIN Sales.OrderLines ol2
- ON o2.OrderID = ol2.OrderID
- WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount
- FROM Sales.Orders o1
- INNER JOIN Sales.OrderLines ol1
- ON o1.OrderID = ol1.OrderID
- GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
- ORDER BY 1, 2, 3;
You get a result like below image. As you see, it shows Yea, Quarter, Month and then sales for the Month, Quarter and Year.
I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.
I cleaned up the cache with following code and executed query to see some satistics. It takes 4 seconds and it has read more than 14,000 pages from the Orders and more than 91,000 pages from the OrderLines table.
- DBCC FREEPROCCACHE;
- DBCC DROPCLEANBUFFERS;
The quickest solution came to my mind was CTE, hence applied it to the query as below.
- USE WideWorldImporters;
- GO
- SET STATISTICS IO ON;
- WITH YearlySales (SalesYear, YearlySalesAmount)
- AS
- (
- SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity)
- FROM Sales.Orders o
- INNER JOIN Sales.OrderLines ol
- ON o.OrderID = ol.OrderID
- GROUP BY YEAR(OrderDate)
- ),
- QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount)
- AS
- (
- SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity)
- FROM Sales.Orders o
- INNER JOIN Sales.OrderLines ol
- ON o.OrderID = ol.OrderID
- GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate)
- ),
- MonthlySales (SalesYear, SalesQuarter, SalesMonth, MontlySalesAmount)
- AS
- (
- SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity)
- FROM Sales.Orders o
- INNER JOIN Sales.OrderLines ol
- ON o.OrderID = ol.OrderID
- GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
- )
- SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MontlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount
- FROM YearlySales y
- INNER JOIN QuarterlySales q
- ON y.SalesYear = q.SalesYear
- INNER JOIN MonthlySales m
- ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter
- ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;
And when I test the performance just like the way I did, as you see, it takes only 1 second and very small number of pages are read.
A simple lesson we can learn from this is, we have to avoid sub queries as much as possible, specifically co-related sub queries because the cost is very high. In addition to that, we should use modern ways for getting the results in efficient ways.
No comments:
Post a Comment