Friday, February 14, 2014

Types of SQL Server Sub Queries: Self-Contained, Correlated, Scalar, Multi-Valued, Table-Valued

A Sub query is a SELECT statement that is embedded to another query. Or in other words, a SELECT statement that is nested to another SELECT. Or in a simplest way, it is a query within a query. This posts speaks about types related and terms used with sub queries.

What are Inner queries and Outer queries?
Once a SELECT is written within another SELECT, the one written inside becomes the INNER QUERY. The query that holds the inner query is called as OUTER QUERY. See the below query. The query refers the Sales.Customer table is the Outer Query. The query written on Sales.SalesOrderHeader is the Inner Query.

  1. SELECT *
  2. FROM Sales.Customer
  3. WHERE CustomerID = (SELECT TOP (1)
  4.                 CustomerID
  5.                 FROM Sales.SalesOrderHeader
  6.                 ORDER BY SubTotal DESC)

What are Scalar, Multi-valued and Table-valued Sub Queries?
Sub queries can be categorized based on their return type. If the query returns a single value, it becomes a scalar sub query. Scalar sub query behaves as an expression for the outer query and it can be used with clauses like SELECT and WHERE. Sub query produces NULL value if the result of it is empty and equality operators (=, != , <, etc.) are used with predicates when they are used with WHERE clauses.

Multi-valued sub query still return a single column but it may produce multiple values (can be considered as multiple records) for the column. This is mainly used with IN predicate and based on matching values, predicate either returns TRUE or FALSE.

Table-valued sub query returns a whole table; multiple columns, multiple rows. This is mainly used with derived tables.

Here are some samples for these three types;

  1. -- Get all orders from last recorded customer
  2. -- This sub query is a scalar sub query
  3. SELECT *
  4. FROM Sales.SalesOrderHeader
  5. WHERE CustomerID = (SELECT MAX(CustomerID)
  6.                 FROM Sales.Customer)
  8. -- Get top Customers
  9. -- This sub query is a multi-valued sub query
  10. SELECT *
  11. FROM Sales.Customer
  12. WHERE CustomerID IN (SELECT CustomerID
  13.                     FROM Sales.SalesOrderHeader
  14.                     WHERE SubTotal > 100000)
  16. -- Get order amount for years and months
  17. -- This sub query is a table-valued sub query
  18. SELECT ROW_NUMBER() OVER (ORDER BY d.OrderYear, d.OrderMonth)
  19.     , d.OrderYear, d.OrderMonth
  20.     , d.OrderAmount
  21. FROM
  22. (SELECT YEAR(OrderDate) OrderYear
  23.     , MONTH(OrderDate) OrderMonth
  24.     , SUM(SubTotal) OrderAmount
  25. FROM Sales.SalesOrderHeader
  26. GROUP BY YEAR(OrderDate), MONTH(OrderDate)
  27. ) d

What are Self-Contained and Correlated Queries? 

Sub query always has an outer query which it is nested with. If the sub query is completely independent and do not require any input from outer query, it is called as a Self-Contained Sub Query. Self-Contained sub query is evaluated once for the outer query and result is used with all records produced by outer query.

Correlated sub query is a query that requires an input from its outer query. This sub query is fully dependent on the outer query and cannot be executed without required attributes from outer query. This behavior increases the cost of the execution of the sub query as it needs to be executed for each of row of outer query.

Here are some samples for them.

  1. -- This returns year total with full total and last year total
  2. -- First sub query is a Self-Contained Scalar sub query
  3. -- Second sub query is a Correlated Scalar sub query
  4. SELECT YEAR(h.OrderDate) OrderYear
  5.     , SUM(h.SubTotal) OrderAmount
  6.     , (SELECT SUM(SubTotal)
  7.         FROM Sales.SalesOrderHeader) TotalOrderAmount
  8.     , (SELECT SUM(SubTotal)
  9.         FROM Sales.SalesOrderHeader
  10.         WHERE YEAR(OrderDate) = YEAR(h.OrderDate) - 1) LastYearTotalAmount
  11. FROM Sales.SalesOrderHeader h
  12. GROUP BY YEAR(h.OrderDate)


Wille said...

Great post. One thing that is complicated for a beginner like me is to realize that correlated queries invole the subquery being processed for every line in the outer query. This also helps understand why the EXISTS function works the way it does on "false/true" values.

Dinesh Priyankara said...

Thanks Wille, thanks for the comment.