A Sub query is a SELECT statement that is embedded to another query. Or in other words, a SELECT statement that is nested to another SELECT. Or in a simplest way, it is a query within a query. This posts speaks about types related and terms used with sub queries.
What are Inner queries and Outer queries?
Once a SELECT is written within another SELECT, the one written inside becomes the INNER QUERY. The query that holds the inner query is called as OUTER QUERY. See the below query. The query refers the Sales.Customer table is the Outer Query. The query written on Sales.SalesOrderHeader is the Inner Query.
- SELECT *
- FROM Sales.Customer
- WHERE CustomerID = (SELECT TOP (1)
- CustomerID
- FROM Sales.SalesOrderHeader
- ORDER BY SubTotal DESC)
What are Scalar, Multi-valued and Table-valued Sub Queries?
Sub queries can be categorized based on their return type. If the query returns a single value, it becomes a scalar sub query. Scalar sub query behaves as an expression for the outer query and it can be used with clauses like SELECT and WHERE. Sub query produces NULL value if the result of it is empty and equality operators (=, != , <, etc.) are used with predicates when they are used with WHERE clauses.
Multi-valued sub query still return a single column but it may produce multiple values (can be considered as multiple records) for the column. This is mainly used with IN predicate and based on matching values, predicate either returns TRUE or FALSE.
Table-valued sub query returns a whole table; multiple columns, multiple rows. This is mainly used with derived tables.
Here are some samples for these three types;
- -- Get all orders from last recorded customer
- -- This sub query is a scalar sub query
- SELECT *
- FROM Sales.SalesOrderHeader
- WHERE CustomerID = (SELECT MAX(CustomerID)
- FROM Sales.Customer)
- -- Get top Customers
- -- This sub query is a multi-valued sub query
- SELECT *
- FROM Sales.Customer
- WHERE CustomerID IN (SELECT CustomerID
- FROM Sales.SalesOrderHeader
- WHERE SubTotal > 100000)
- -- Get order amount for years and months
- -- This sub query is a table-valued sub query
- SELECT ROW_NUMBER() OVER (ORDER BY d.OrderYear, d.OrderMonth)
- , d.OrderYear, d.OrderMonth
- , d.OrderAmount
- FROM
- (SELECT YEAR(OrderDate) OrderYear
- , MONTH(OrderDate) OrderMonth
- , SUM(SubTotal) OrderAmount
- FROM Sales.SalesOrderHeader
- GROUP BY YEAR(OrderDate), MONTH(OrderDate)
- ) d
What are Self-Contained and Correlated Queries?
Correlated sub query is a query that requires an input from its outer query. This sub query is fully dependent on the outer query and cannot be executed without required attributes from outer query. This behavior increases the cost of the execution of the sub query as it needs to be executed for each of row of outer query.
Here are some samples for them.
- -- This returns year total with full total and last year total
- -- First sub query is a Self-Contained Scalar sub query
- -- Second sub query is a Correlated Scalar sub query
- SELECT YEAR(h.OrderDate) OrderYear
- , SUM(h.SubTotal) OrderAmount
- , (SELECT SUM(SubTotal)
- FROM Sales.SalesOrderHeader) TotalOrderAmount
- , (SELECT SUM(SubTotal)
- FROM Sales.SalesOrderHeader
- WHERE YEAR(OrderDate) = YEAR(h.OrderDate) - 1) LastYearTotalAmount
- FROM Sales.SalesOrderHeader h
- GROUP BY YEAR(h.OrderDate)
Great post. One thing that is complicated for a beginner like me is to realize that correlated queries invole the subquery being processed for every line in the outer query. This also helps understand why the EXISTS function works the way it does on "false/true" values.
ReplyDeleteThanks Wille, thanks for the comment.
ReplyDelete