Dinesh's Blog :::: Being Compiled ::::: Query Processing

Saturday, September 17, 2016

How SQL Server finds the Execution Plan in the Plan Cache Store

You know that SQL Server creates a plan for your query and keeps it in the cache so that it can be reused when required. When you make the same request again, how does SQL Server know whether the plan exists in the cache? What techniques does it use for finding the plan? Let's try to understand it.

There are four main plan cache stores. Each store maintains a different type of plan. A plan is reused when users make the same request again, and plans are discarded when SQL Server needs space in the store (let's discuss the techniques it uses for discarding in another post). These four stores are:

Object plans store for storing plans for procedures, functions and triggers.
SQL plans store for storing plans for ad-hoc queries.
Bound Trees plan store for storing structures created by the Algebrizer for views, defaults and constraints.
Extended Stored Procedure plan store for storing plans for Extended Stored Procedures.
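To see these four stores on your own instance, a query along the following lines can be used. It is a sketch that assumes SQL Server 2012 or later (where the pages_kb column exists) and VIEW SERVER STATE permission; the CACHESTORE_* values are the internal type names of the four stores listed above.

```sql
-- Entries and memory used by each of the four plan cache stores
SELECT name, type, entries_count, pages_kb
FROM sys.dm_os_memory_cache_counters
WHERE type IN ('CACHESTORE_OBJCP',   -- Object Plans
               'CACHESTORE_SQLCP',   -- SQL Plans
               'CACHESTORE_PHDR',    -- Bound Trees
               'CACHESTORE_XPROC');  -- Extended Stored Procedures

-- Cached entries grouped by type: for example, Compiled Plan/Proc lives in
-- the Object Plans store, Compiled Plan/Adhoc and Prepared in SQL Plans
SELECT cacheobjtype, objtype, COUNT(*) AS plan_count
FROM sys.dm_exec_cached_plans
GROUP BY cacheobjtype, objtype
ORDER BY cacheobjtype, objtype;
```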

A store is a Hash Table that contains a series of Buckets. A Bucket contains zero or more plans, and the Hash Value is calculated:

Using databaseId and objectId for the Object plan, Bound Tree plan and Extended Stored Procedure plan stores.
Using databaseId and the statement text for the SQL plan store.

The Hash Value decides which bucket of the store an execution plan is assigned to. Within the bucket, the plan is identified by its Cache Key, which is made up of various attributes, and the Plan Handle is used for referencing the plan in the cache.
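Because each store is a hash table, you can also look at how plans are spread across its buckets. The following sketch (assuming VIEW SERVER STATE permission) reads the bucket chain columns of sys.dm_os_memory_cache_hash_tables; the longer the chains, the more entries SQL Server has to compare when it searches a bucket.

```sql
-- Bucket usage and chain lengths of the plan cache hash tables
SELECT cc.name,
       ht.buckets_count,
       ht.buckets_in_use_count,
       ht.buckets_avg_length,            -- average entries per bucket
       ht.buckets_max_length,            -- longest bucket chain
       ht.hits_count,                    -- lookups that found an entry
       ht.misses_count,                  -- lookups that found nothing
       ht.buckets_avg_scan_hit_length,   -- entries examined before a hit
       ht.buckets_avg_scan_miss_length   -- entries examined on a miss
FROM sys.dm_os_memory_cache_hash_tables AS ht
 JOIN sys.dm_os_memory_cache_counters AS cc
  ON ht.cache_address = cc.cache_address
WHERE cc.type IN ('CACHESTORE_OBJCP', 'CACHESTORE_SQLCP',
                  'CACHESTORE_PHDR', 'CACHESTORE_XPROC');
```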

When you make a request, SQL Server performs the following steps to find the plan in the cache:

Generate the Hash Value of the request for the appropriate store.
Search the execution plans in that bucket for one whose Cache Key matches the request.
If found, reuse it; otherwise, create a new plan.
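This lookup explains why ad-hoc plan reuse is sensitive to the exact statement text. In the sketch below (assuming AdventureWorks2014, a test server, a case-insensitive collation and the default "optimize for ad hoc workloads" setting of OFF), the first two batches are identical, so the second should reuse the plan; the third differs only in letter case, so it hashes differently and a new plan is created. The statements have no literals, so simple parameterization should not apply. Note that comments must stay out of the batches being compared, because they become part of the batch text.

```sql
USE AdventureWorks2014;
-- The next two batches are identical: the second one should reuse the plan.
-- The third differs only in letter case, so it gets a different hash value.
GO
SELECT COUNT(*) FROM Sales.SalesOrderHeader;
GO
SELECT COUNT(*) FROM Sales.SalesOrderHeader;
GO
select count(*) from Sales.SalesOrderHeader;
GO
-- You should see two Adhoc entries: the upper-case text with usecounts 2
-- and the lower-case text with usecounts 1
SELECT p.bucketid, p.usecounts, p.objtype, t.text
FROM sys.dm_exec_cached_plans AS p
 CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.text LIKE '%count(*) from Sales.SalesOrderHeader%'
 AND t.text NOT LIKE '%dm_exec_cached_plans%';
```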

To understand the process, let's check it with some simple code:

-- Clear the plan cache (use a test server: this removes all cached plans)
DBCC FREEPROCCACHE;

-- Checking the number of buckets for each store
-- and how many have been used.
SELECT cc.name, buckets_count, buckets_in_use_count
FROM sys.dm_os_memory_cache_hash_tables AS ht 
 JOIN sys.dm_os_memory_cache_counters AS cc 
  ON ht.cache_address = cc.cache_address 
WHERE cc.name IN ('Object Plans','SQL Plans','Bound Trees','Extended Stored Procedures');

SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate = '01/06/2014';

SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate = '01/07/2014';

-- Check the cached plans
-- You will see two plans for the two statements above
SELECT p.*, t.text
FROM sys.dm_exec_cached_plans p  
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t  
WHERE --p.usecounts > 0 AND  
 text LIKE '%SELECT * FROM Sales.SalesOrderHeader%'  
 AND text NOT LIKE '%SELECT p.*%'
ORDER BY p.usecounts DESC;

-- Check how the cache key is formed
-- Note the two bucket ids
-- And note that only the object id is different
-- For SQL plans, the object id is the internal hash of the batch text,
-- not an id in the sys.objects table
SELECT p.bucketid, t.text, ph.*
FROM sys.dm_exec_cached_plans p  
 CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t  
 CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS ph
WHERE --p.usecounts > 0 AND  
 text LIKE '%SELECT * FROM Sales.SalesOrderHeader%'  
 AND text NOT LIKE '%SELECT p.*%'
 AND is_cache_key = 1
ORDER BY p.usecounts DESC;
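One bucket can hold more than one plan for identical text, and this is where the Cache Key comes in. A sketch, assuming AdventureWorks2014 and a test server: the same statement is run under two ARITHABORT settings. Both executions should land in the same bucket (same database and text), but because set_options is one of the cache key attributes, a second plan should be created.

```sql
USE AdventureWorks2014;
GO
SET ARITHABORT OFF;
GO
SELECT COUNT(*) FROM Sales.SalesOrderDetail;
GO
SET ARITHABORT ON;
GO
SELECT COUNT(*) FROM Sales.SalesOrderDetail;
GO
-- Expect the same bucketid twice, with two different set_options values
SELECT p.bucketid, p.usecounts, pa.value AS set_options
FROM sys.dm_exec_cached_plans AS p
 CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
 CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS pa
WHERE pa.attribute = 'set_options'
 AND t.text LIKE '%COUNT(*) FROM Sales.SalesOrderDetail%'
 AND t.text NOT LIKE '%dm_exec_plan_attributes%';
```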

Friday, September 16, 2016

Warnings in SQL Server Query Execution Plans

Have you seen Warnings on operators in Query Execution Plans?

Warnings indicate that the Query Optimizer has found something that might affect the performance of the query and that you need to work on it. There are various reasons for warnings, but typically a warning is displayed when:

values used with WHERE or JOIN clauses are not of the same data type. This might prevent statistics from being used correctly.
the database has AUTO_CREATE_STATISTICS set to OFF and a column used in a predicate has no statistics.
there is no ON clause (or equivalent join predicate) when two or more tables are referenced in the query.
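Besides looking at plans one by one, you can search the plan cache for these warnings. This is a sketch that assumes VIEW SERVER STATE permission and uses the showplan XML element names PlanAffectingConvert (data type conversion), ColumnsWithNoStatistics (missing statistics) and the NoJoinPredicate attribute; keep in mind that cached plans are estimated plans, so warnings raised only at run time (such as spills) will not appear.

```sql
-- Cached plans whose showplan XML contains a Warnings element
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.usecounts,
       st.text,
       qp.query_plan.exist('//Warnings/PlanAffectingConvert') AS has_convert_warning,
       qp.query_plan.exist('//Warnings/ColumnsWithNoStatistics') AS has_missing_stats_warning,
       qp.query_plan.exist('//Warnings[@NoJoinPredicate]') AS has_no_join_predicate,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
 CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
 CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//Warnings') = 1;
```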

Here are some code samples for them.

The first code segment shows how a warning is shown when the data types used in a JOIN do not match.

USE AdventureWorks2014;
GO

-- create new header table
SELECT *
INTO OrderHeader
FROM Sales.SalesOrderHeader;

-- create new details table
SELECT *
INTO OrderDetails
FROM Sales.SalesOrderDetail;

-- change the data type of SalesOrderID in the OrderDetails table
ALTER TABLE OrderDetails
ALTER COLUMN SalesOrderID varchar(10);

-- Enable Execution Plan
-- Execute the query and see
SELECT *
FROM OrderHeader h
 INNER JOIN OrderDetails d
  ON h.SalesOrderID = d.SalesOrderID;
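The natural fix is to make both join columns the same data type. A minimal sketch, assuming the OrderHeader and OrderDetails tables created above: change SalesOrderID back to int (the type used in OrderHeader) and run the query again; the implicit conversion warning should no longer appear.

```sql
-- Align the data type with OrderHeader.SalesOrderID (int)
ALTER TABLE OrderDetails
ALTER COLUMN SalesOrderID int;

-- Run the query again and check the plan: no conversion warning expected
SELECT *
FROM OrderHeader h
 INNER JOIN OrderDetails d
  ON h.SalesOrderID = d.SalesOrderID;
```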

The second code sample shows how a warning is shown when no statistics are available.

-- create a new database
CREATE DATABASE TestDatabase
GO

-- turn off auto create statistics
ALTER DATABASE TestDatabase SET AUTO_CREATE_STATISTICS OFF;

-- connect with the database
USE TestDatabase;
GO

-- create a new table
SELECT *
INTO OrderHeader
FROM AdventureWorks2014.Sales.SalesOrderHeader;

-- create a clustered index on newly created table
CREATE CLUSTERED INDEX IX_Sales ON dbo.OrderHeader (SalesOrderID);

-- Execute the query and see
-- Since there are no statistics on the OrderDate column
-- and SQL Server does not create them automatically,
-- you should see a warning
SELECT *
FROM dbo.OrderHeader
WHERE OrderDate = '2014-01-01';
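With auto create turned off, statistics have to be created manually. A sketch for the TestDatabase above; the statistics name ST_OrderHeader_OrderDate is hypothetical, and OPTION (RECOMPILE) is added so that a fresh plan is compiled rather than the cached one being reused.

```sql
-- Create the missing statistics manually (the name is just an example)
CREATE STATISTICS ST_OrderHeader_OrderDate
ON dbo.OrderHeader (OrderDate);

-- Run the query again: the missing statistics warning should be gone
SELECT *
FROM dbo.OrderHeader
WHERE OrderDate = '2014-01-01'
OPTION (RECOMPILE);

-- List the statistics that now exist on the table
SELECT s.name, s.auto_created, s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.OrderHeader');
```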

The third shows how a warning is shown when there is no ON clause for the referenced tables.

USE AdventureWorks2014;
GO
-- Execute the query and see
-- Since there is no ON clause, a warning is displayed
SELECT *
FROM Production.Product, Production.ProductSubcategory;
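When a join was intended, the fix is to give the optimizer the join predicate. A sketch using the same two AdventureWorks2014 tables, assuming they are meant to be joined on ProductSubcategoryID; note that an inner join returns only products that have a subcategory.

```sql
USE AdventureWorks2014;
GO
-- Same tables with an explicit join predicate: no warning expected
SELECT *
FROM Production.Product AS p
 INNER JOIN Production.ProductSubcategory AS s
  ON p.ProductSubcategoryID = s.ProductSubcategoryID;
```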

Thursday, September 15, 2016

SQL Server: How Optimizer creates a plan for your query

If you have not read the previous two posts related to this, please have a look at them first.

When a query is submitted, it goes through various phases and finally reaches the Query Optimizer as an Algebrizer Tree, which includes details of the physical objects related to the query. The Query Optimizer uses details such as the Object Schema (indexes, constraints, data types) and Object Statistics for making plans for the query.

Optimization Phases

To reduce the time it takes to create plans and evaluate them to find the best one, the Query Optimizer goes through three phases. These phases make sure that it avoids expensive optimization as much as possible and moves to a later phase only if required. The three phases are:

Simplification
This simplifies the query tree to make the optimization quicker, for example by converting sub-queries to joins and removing redundant joins.
Trivial plan generation
For some queries, only a few possible plans exist. When the Query Optimizer detects this, it picks a single plan without evaluating other plans.
Full optimization
The optimizer comes to this phase if the query has many possible plans that have to be evaluated to select the best one. This is also called full cost-based optimization. However, in certain scenarios the Query Optimizer does not evaluate all plans, because the evaluation itself costs resources.

To generate plans for evaluation, it uses Transformation Rules. Applying the Transformation Rules to the Logical Query Tree (or Algebrizer Tree) results in an internal structure called the Memo, which is used for calculating the costs.

Even with Transformation Rules, a query can produce so many plans that evaluating them takes a long time. To handle this, SQL Server runs the optimization in three stages: Search 0 - Transaction Processing Phase, Search 1 - Quick Plan Phase, and Search 2 - Full Optimization Phase. If a stage does not return a good enough plan, the optimizer goes to the next one. If none of the stages returns a good enough plan, it picks the plan with the lowest cost found.
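You can see which of these stages your queries reach through the cumulative, server-wide counters in sys.dm_exec_query_optimizer_info. The sketch below (assuming AdventureWorks2014, VIEW SERVER STATE permission and a quiet test server) takes a snapshot of the counters, optimizes one query with OPTION (RECOMPILE), and shows which counters moved. Other sessions and the monitoring statements themselves also increase these counters, so treat the numbers as indicative only.

```sql
USE AdventureWorks2014;
GO
-- Snapshot of the cumulative optimizer counters
SELECT counter, occurrence
INTO #before
FROM sys.dm_exec_query_optimizer_info
WHERE counter IN ('optimizations', 'trivial plan', 'search 0',
                  'search 1', 'search 2', 'timeout');
GO
-- Query under test; RECOMPILE forces a fresh optimization
SELECT p.Name, s.Name AS SubcategoryName, c.Name AS CategoryName
FROM Production.Product AS p
 INNER JOIN Production.ProductSubcategory AS s
  ON p.ProductSubcategoryID = s.ProductSubcategoryID
 INNER JOIN Production.ProductCategory AS c
  ON s.ProductCategoryID = c.ProductCategoryID
OPTION (RECOMPILE);
GO
-- Counters that increased since the snapshot
SELECT a.counter, a.occurrence - b.occurrence AS delta
FROM sys.dm_exec_query_optimizer_info AS a
 JOIN #before AS b
  ON a.counter = b.counter
WHERE a.occurrence > b.occurrence;
GO
DROP TABLE #before;
```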

Here are some code samples that illustrate some of these phases:

When this code is executed, the optimizer ignores the Production.ProductCategory table because it is not required (no columns from it are used). The reason is that a Foreign Key Constraint exists between Production.ProductSubcategory and Production.ProductCategory, and, given the JOIN condition, it guarantees that no record can exist in Production.ProductSubcategory without a matching record in Production.ProductCategory. Therefore, the Optimizer ignores the table. This is an example of Simplification.
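Since the original query is not shown here, this is a sketch of a query along the lines described, assuming AdventureWorks2014, where ProductSubcategory.ProductCategoryID is NOT NULL and references ProductCategory. Only ProductSubcategory columns are returned, so the execution plan should access only ProductSubcategory. The simplification depends on the foreign key being trusted, which the second query checks.

```sql
USE AdventureWorks2014;
GO
-- Only ProductSubcategory columns are used: expect the join to be removed
SELECT s.ProductSubcategoryID, s.Name
FROM Production.ProductSubcategory AS s
 INNER JOIN Production.ProductCategory AS c
  ON s.ProductCategoryID = c.ProductCategoryID;

-- The foreign key must be enabled and trusted for this simplification
SELECT name, is_disabled, is_not_trusted
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('Production.ProductSubcategory');
```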

Here is another example of simplification. This query has a condition on the Status column, and the Status column has a check constraint saying that its value must be between 0 and 8. Because of the constraint, the Query Optimizer knows that no record with Status 10 can exist, hence it simplifies the query.
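A query of this kind can be sketched as follows, assuming AdventureWorks2014 and its check constraint on Sales.SalesOrderHeader.Status. The first statement shows the constraint definition; for the second, the plan should show a Constant Scan instead of reading the table, because no row can satisfy the predicate. If you see a table scan instead, check whether the constraint is trusted.

```sql
USE AdventureWorks2014;
GO
-- The check constraint on Status (values between 0 and 8)
SELECT name, definition, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('Sales.SalesOrderHeader')
 AND definition LIKE '%Status%';

-- Contradicts the constraint: expect a Constant Scan in the plan
SELECT *
FROM Sales.SalesOrderHeader
WHERE Status = 10;
```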

Here is one more query that shows the second phase, Trivial Plan Generation. There are not many alternative plans, hence one is selected immediately.
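As a sketch of such a query (assuming AdventureWorks2014), a single-table SELECT * has essentially one reasonable plan. The optimization level is recorded in the plan as the StatementOptmLevel attribute (shown as Optimization Level in the properties of the SELECT operator in SSMS), and it should read TRIVIAL here, whereas queries that go through full optimization show FULL.

```sql
USE AdventureWorks2014;
GO
-- A single-table query with no real alternatives
SELECT * FROM Production.ProductCategory;
GO
-- Read the optimization level recorded in the cached plan
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text,
       qp.query_plan.value('(//StmtSimple/@StatementOptmLevel)[1]', 'varchar(25)') AS optimization_level
FROM sys.dm_exec_cached_plans AS cp
 CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
 CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%FROM Production.ProductCategory%'
 AND st.text NOT LIKE '%dm_exec_query_plan%';
```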

Wednesday, September 14, 2016

SQL Server Physical Phases of Query Processing

I made a small video that shows the Logical Phases of Query Processing and published it with the post called SQL Server Logical Phases of Query Processing. In addition to the logical phases, knowing the Physical Phases will definitely help you with Query Optimization, because they show how SQL Server makes plans based on various rules and techniques. Let's try to understand the Physical Phases:

Parsing
This is the first phase of Query Processing. This phase validates the query syntax and parses the query into a Logical Query Tree. The sequence of items in the Logical Query Tree is based on the Logical Phases described in my post mentioned above. If SQL Server finds a syntax error, it throws the error and does not continue. If no error is found, it goes to the next phase with the Logical Query Tree it created. Note that this tree is also referred to as the Parse Tree, and the component that handles this phase is called the Command Parser.
Binding
This phase checks whether the objects in the Logical Query Tree exist and whether the user has sufficient permissions on them. If the objects do not exist or the user has no permission, it throws an error and stops the execution. If you see an invalid object name error, it means that your query has stopped at Binding.

If everything works fine, the output of this phase is the Algebrizer Tree, which is the Logical Query Tree bound to the actual objects. This process is handled by the Algebrizer.
Query Optimization
The Query Optimizer is responsible for this phase. It takes the Algebrizer Tree and uses it with supporting information such as the Object Schema (constraints, indexes and data types) and Object Statistics. It generates multiple plans through various steps and picks the best plan for the query.
Query Execution
Finally, the Query Executor executes the query based on the plan created by the Query Optimizer. The output of this phase is the result we expect.
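The difference between Parsing and Binding is easy to see with a few deliberately broken statements. In this sketch (assuming AdventureWorks2014), Sales.SalesOrderHeaderX is a hypothetical table name that does not exist. The first statement fails with a syntax error (Msg 102) before any object is looked up, the second passes Parsing but fails at Binding (Msg 208), and with SET PARSEONLY ON the same statement is only parsed, so no error is reported for the missing object.

```sql
USE AdventureWorks2014;
GO
-- Syntax error: rejected during Parsing
SELECT * FORM Sales.SalesOrderHeader;
GO
-- Valid syntax, unknown (hypothetical) object: rejected during Binding
SELECT * FROM Sales.SalesOrderHeaderX;
GO
-- PARSEONLY: statements are parsed but never bound or executed
SET PARSEONLY ON;
GO
SELECT * FROM Sales.SalesOrderHeaderX;
GO
SET PARSEONLY OFF;
GO
```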

Let's talk more about the Query Optimizer in the next post.

Tuesday, September 6, 2016

SQL Server Logical Phases of Query Processing

Here is how SQL Server processes the clauses in your SELECT statement: