Showing posts with label SQL Server 2008.

Thursday, April 21, 2011

Does Referential Integrity improve performance?

I believe everybody knows that referential integrity is for maintaining the relationship between two entities: parent and child. It makes sure that no child record exists without a parent. Referential integrity is implemented with foreign keys. The foreign key column can be set as nullable; in that case, a child is allowed to exist without a parent.

While I was going through one of my old presentations, I found a slide related to this (yes, it is good to keep old presentations; humans forget things). Let me explain it with some code.

[Screenshot: query joining SalesOrderHeader to SalesOrderDetail, and its execution plan with no header access]
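The screenshot has not survived, so here is a query of roughly the shape described; the exact query from the slide is an assumption, but any header/detail pair joined on a trusted foreign key shows the same behavior:

-- Only detail columns are selected, although the header is joined.
SELECT d.SalesOrderDetailID, d.OrderQty, d.LineTotal
FROM Sales.SalesOrderDetail d
INNER JOIN Sales.SalesOrderHeader h
    ON h.SalesOrderID = d.SalesOrderID
-- With the trusted FK in place, the plan contains no operator
-- touching Sales.SalesOrderHeader.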

The above code is executed against the AdventureWorks2008 database. Note that it selects columns only from the detail table even though the header table is joined. You may say this join is useless because it does not get data from the header. True, but notice that although we have linked the header to the detail, the plan shows that SQL Server does not access the header at all. It knows that referential integrity is in place and the existence of parent records is guaranteed, so it does not need to touch the header. Now let's remove the relationship between header and detail (see the code below) and execute the same query.
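Dropping the foreign key can be done as below; the constraint name is AdventureWorks2008's (check sys.foreign_keys if your copy differs):

ALTER TABLE Sales.SalesOrderDetail
    DROP CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID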

[Screenshot: execution plan after dropping the foreign key, now scanning SalesOrderHeader]

The query is the same, but in this case SQL Server scans the header table. The reason is that it no longer knows whether a header record exists for every child, so it must access the header even though no header values were requested. As you can see, referential integrity is not only for keeping the database consistent; it can improve performance too.

Friday, January 14, 2011

MCTS: SQL Server Database Administrator – Maintaining a MS SQL Server Database 2008 – 6231A Course

Course 6231A: Maintaining a SQL Server 2008 Database – This is my first class for 2011. The class starts on Sunday, 23rd January 2011. If you are a DBA or are interested in SQL Server administration, join us. Some of the topics covered in this course are installing and configuring, security, backup strategies, disaster recovery, high availability, replication, and data transfer (see more at http://www.microsoft.com/learning/en/us/course.aspx?id=6231A).

Good news! Microsoft gives students a 40% discount on Microsoft exams. All you have to do is go to an authorized centre and prove that you are a student of a university (private or government), and you get a 40% discount on any exam. Good luck!

[Image: SQL class announcement]

Monday, October 25, 2010

Plan Caching in SQL Server 2008

I got many questions on the post I published, "Do we need to recompile stored procedures when an index is added or a parameter value varies greatly?"; it seems many are interested in it. While looking for related information, I found an interesting technical article on plan caching written by Greg Low. Read it; it explains everything well:

http://msdn.microsoft.com/en-us/library/ee343986.aspx

Thursday, October 21, 2010

Do we need to recompile stored procedures when an index is added or a parameter value varies greatly?

While Preethi was presenting on query plans at the SQL Server Sri Lanka User Group meeting, he showed a situation where a query plan gets compiled again when the value used in the WHERE clause varies. Although an ad-hoc query behaves that way, I doubted it would be the same if the query were parameterized inside a stored procedure. There were a couple of arguments favoring both sides, so I thought to test it and see.

Let me take two simple queries. The queries below are identical except for the value used in the WHERE clause.

SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= 1
SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= 1000
    

SQL Server compiles a plan for the first query but does not reuse it for the second, even though the query text differs only in the literal value.
[Screenshot: two cached ad-hoc plans, one per literal value]
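If you prefer not to read the graphical plan, the plan cache tells the same story; a quick sketch using the plan-cache DMVs:

SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%SalesOrderDetailID <=%'
    AND st.text NOT LIKE '%dm_exec_cached_plans%'
-- Expect two "Adhoc" rows, one per literal value, each with its own plan.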

This shows clearly that the statement has been compiled again because the value was different. But this is not going to happen if the logic is implemented in a stored procedure:

CREATE PROC GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
GO

The plan for the procedure is generated at the first execution. Once generated, it is reused for subsequent executions even when the parameter values vary greatly. See the code below:

EXEC GetOrderDetails 1
EXEC GetOrderDetails 1000

[Screenshot: both executions reuse the plan compiled for the value 1]
You can see that the plan generated for the first execution has been reused for the second execution even though it is not the best plan for the value 1000. The thing to remember is that stored procedures are not recompiled just because parameter values vary. If you expect a situation like this, create the stored procedure WITH RECOMPILE; this instructs SQL Server not to cache the generated plan, so a fresh plan is created at every execution. If the situation is rare, use WITH RECOMPILE when you call the stored procedure instead; this recompiles the plan for that execution only.

ALTER PROC GetOrderDetails @Number int
WITH RECOMPILE
AS
...
 
-- OR
 
EXEC GetOrderDetails 1000
WITH RECOMPILE
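A related option is sp_recompile, which marks the procedure so that a fresh plan is compiled at its next execution, without altering the procedure definition:

EXEC sp_recompile 'GetOrderDetails'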

The next question is whether a stored procedure gets recompiled when an index is added to a table used in it. See what BOL says:
[Screenshot: BOL excerpt on stored procedure recompilation]

It clearly says that the stored procedure will not get recompiled when an index is added to a table used by it. But what I found is the opposite: sometimes one of the statements gets recompiled, sometimes the entire stored procedure does. This is how I tested it:

This code creates a new table and then creates a stored procedure. Executing the stored procedure creates a plan, which is reused for subsequent executions.

SELECT *
INTO TempSalesOrderDetail
FROM Sales.SalesOrderDetail
 
CREATE PROC GetTempOrderDetails
AS
    SELECT SalesOrderDetailID FROM TempSalesOrderDetail 
    ORDER BY SalesOrderDetailID
GO
 
EXEC GetTempOrderDetails 

Here is the generated plan for it.
[Screenshot: the generated execution plan, before the index exists]
As per BOL, the stored procedure should not get recompiled when an index is added…

CREATE INDEX IX_1 ON TempSalesOrderDetail (SalesOrderDetailID)
GO
 
EXEC GetTempOrderDetails 

Here is the plan now.
[Screenshot: the new execution plan after the index was added]

Now the plan is different, which means the stored procedure has been recompiled. There might be situations where the behavior is exactly as BOL says; I will see whether such a situation can be found.
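If you want a number to look at rather than comparing plan shapes, plan_generation_num in sys.dm_exec_query_stats increases each time a cached plan is recompiled; a sketch:

SELECT qs.plan_generation_num, qs.execution_count, st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%TempSalesOrderDetail%'
    AND st.text NOT LIKE '%dm_exec_query_stats%'
-- Run this before and after creating the index; a jump in
-- plan_generation_num confirms the recompile.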

Wednesday, September 29, 2010

SQL Server 2008 Service Pack 2 is available for download

SQL Server 2008 SP2 is available for download now. Enhancements include:

  • 15K partitioning Improvement
  • Reporting Services in SharePoint Integrated Mode
  • SQL Server 2008 R2 Application and Multi-Server Management Compatibility with SQL Server 2008
  • Data-tier Application (DAC) Support
Read more about this and download it from here.

Table variable slower than Temporary table???

I tend to use table variables when the resultset is small and needs to be held temporarily. The main reason for this choice is that data manipulation with table variables is faster than with temporary tables when the container holds a small amount of data. There were some instances, especially with JOINs, where table variables showed slowness; I had been meaning to investigate but had no time. I thought to start investigating again when I saw this post by Jack Li.

Simple Test
Let me start with an example similar to an issue I had. The following code creates a table variable and loads sales data into it. Later, the table variable is joined with a permanent table to get a resultset. Note that there is no business logic in the query; this is just for testing. Turn on Include Actual Execution Plan and run the query below.

DECLARE @TestTable TABLE 
(
    SalesOrderId int 
    , SalesOrderDetailID int
    , LineTotal money 
    , PRIMARY KEY (SalesOrderId, SalesOrderDetailID)
)
 
INSERT INTO @TestTable
SELECT SalesOrderID
    , SalesOrderDetailID
    , LineTotal 
FROM Sales.SalesOrderDetail
 
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

[Screenshot: execution plan using Nested Loops, with Estimated Number of Rows = 1 on the table variable]
Now investigate the execution plan for the SELECT query. See Estimated Number of Rows: it shows 1, which is incorrect. This causes the optimizer to use Nested Loops instead of a Merge Join and reduces the performance of data retrieval.

Is Merge Join better than Nested Loops?
According to BOL, the query optimizer determines the cost of a query plan based on two factors:

  • The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
  • The cost model of the algorithm dictated by the operators used in the query.

The second factor depends on the first. Therefore, correct cardinality helps ensure that a faster execution plan is generated. Since the plan generated for our code does not have correct cardinality, the algorithm chosen for the join is not the optimal way to join the two resultsets.

The optimizer selects Nested Loops when one of the tables contains a small number of records, judging by Estimated Number of Rows; that is why the Nested Loops algorithm was chosen here. But in reality, the number of records in the table variable is not small (in this case).

Merge Join works differently. The optimizer usually selects this algorithm when both resultsets are sorted on the join keys, which is the case here. Its matching operation works differently and is usually faster than Nested Loops for inputs like these. For more info, read this.
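As a diagnostic (not a fix), you can force the algorithm with a join hint and compare the costs of the two plans; run it in the same batch that declares @TestTable:

SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER MERGE JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID
-- MERGE here is only a hint for testing; leaving join hints in
-- production code is rarely a good idea.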

Using #Temporary Table
The code below does the same; the only difference is that a temporary table is used instead of a table variable.

USE AdventureWorks2008
GO
 
CREATE TABLE #TestTable 
(
    SalesOrderId int 
    , SalesOrderDetailID int
    , LineTotal money
    ,CONSTRAINT pk_TestTable PRIMARY KEY
        (SalesOrderId, SalesOrderDetailID)
)
 
INSERT INTO #TestTable
SELECT SalesOrderID
    , SalesOrderDetailID
    , LineTotal 
FROM Sales.SalesOrderDetail
 
SELECT t.SalesOrderId, s.UnitPrice
FROM #TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

[Screenshot: execution plan using a Merge Join, with correct row estimates]

Now you can see that the cardinality of the plan is correct and a Merge Join is used for joining the two resultsets.

A reason for the wrong "Estimated Number of Rows" with a table variable
As Jack Li explains, when the code is compiled as a batch, the table variable contains no data at compile time, so the optimizer assumes it holds a very low number of rows, resulting in a plan with Nested Loops. Another reason is the unavailability of statistics: SQL Server does not maintain statistics for table variables.

Solving the issue
Again, the solution comes from Jack Li: all we have to do is have the SELECT recompiled at execution time. By then the table variable contains data, so Estimated Number of Rows is determined correctly. See the code below:

SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID
OPTION (RECOMPILE)    

[Screenshot: with OPTION (RECOMPILE), the plan switches to a Merge Join]

There is a cost to recompiling, so always weigh that cost against the improvement and then select the best approach.

Friday, September 10, 2010

SQL Server data compression: PAGE

Although storage costs are going down, we still think about storage optimization, because optimization is not only about cost; it improves many things. One of the features available with SQL Server 2008 Enterprise Edition (and Developer Edition) is data compression. SQL Server supports compression at both ROW and PAGE level; this post discusses data compression on pages.

PAGE compression minimizes the storage required for redundant data and is applied page by page. Once enabled, SQL Server uses two techniques: prefix compression and dictionary compression.

Prefix Compression: For each column, this identifies a value that can be used for compression. A row containing these values is created just below the page header, and values in the columns are then replaced with references to the values in that row. Note that a column can only reference the value created for it, not the values for other columns. See the image taken from BOL. It shows a table with three columns; you can see how the column values are fully or partially replaced with references.
[Image from BOL: prefix compression, showing column values replaced by references]

Dictionary Compression: This compresses the page further by finding repeated values anywhere in the page and storing them in the same way. Unlike prefix compression, any column can reference any stored value, not just the one created for that column. See the image taken from BOL.
[Image from BOL: dictionary compression, with repeated values shared across columns]

Now let’s see how we enable this. Create the following table and insert records as given.

CREATE TABLE TestTable
(
    Id int IDENTITY(1,1) PRIMARY KEY
    , Value varchar(50) NOT NULL DEFAULT ('This is the default value')
)
GO
 
INSERT INTO TestTable
    (Value) VALUES (DEFAULT)
GO 100000

Retrieve all the data with STATISTICS IO on. You will notice that the data is held in 523 pages.
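For example:

SET STATISTICS IO ON
SELECT * FROM TestTable
-- The Messages tab reports the page count as "logical reads".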

sp_estimate_data_compression_savings: This system stored procedure lets us estimate the space that can be saved when compression is applied. Execute the code given and analyze the results. The first stored procedure shows the space currently used; the second shows the estimated data compression savings.

-- check the space used
sp_spaceused 'TestTable'
 
-- determine data compression savings
EXEC sp_estimate_data_compression_savings 'dbo', 'TestTable', NULL, NULL, 'PAGE'

sp_spaceused says that 4168KB is required for storing the data. Note the column "size_with_requested_compression_settings" returned by the second stored procedure: it says the size after compression would be 1096KB, encouraging us to go ahead with compression.

Enabling PAGE compression: This can be done with either CREATE TABLE or ALTER TABLE. Use the code below to compress the table. You can pass NONE, ROW, or PAGE for DATA_COMPRESSION.

ALTER TABLE TestTable
REBUILD WITH (DATA_COMPRESSION = PAGE)

Now the data is compressed. Retrieve the records again with STATISTICS IO on; you will notice that only 139 pages are used now. Check the space used too: has it gone down to 1096KB?

Disadvantages: Are there any? Yes: compression increases the CPU cost of both reading and modifying data. If you are going to enable it, consider this factor too.
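If you want to see that CPU cost for yourself, compare timings before and after compression; exact numbers will vary by machine:

SET STATISTICS TIME ON
SELECT * FROM TestTable
-- Compare the "CPU time" values reported with and without compression.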

Tuesday, July 27, 2010

SQL Server Sleeping Status and Connection Pooling

Are you seeing many "sleeping" connections in your SQL Server? If so, have you ever tried to find out the reason for them? Is it normal? Does it indicate some sort of connection-related problem?

This topic came up yesterday when I was at one of my clients' sites. They had seen this with their SQL Server, and it had already been sorted out. Understanding the standard behavior and making sure everything works properly helps keep a solution efficient, hence this post.

Sleeping means that the connection is established and not closed; it is not active and is waiting for a command. This is normal, because most applications keep the connection open even after the work is done, in order to reduce the cost of opening and closing connections. The main purpose of keeping these connections is reusability. For example, if an application uses connections for retrieving data, the cost of connection establishment can be minimized if an existing connection can be used instead of creating a new one. Is maintaining a lot of sleeping connections an overhead? In a way, yes, though a comparatively small one. If everything is properly coded, you should not see many sleeping connections. Can there be sleeping connections that cannot be reused? Yes, there can, and we should avoid such situations.

Testing with a Console Application
K, Let’s see how it happens. The code given below is written as a console application. It opens a connection and executes a command. Finally it closes the connection.

   1: SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;");
   2: sqlConnection.Open();
   3:  
   4: SqlCommand sqlCommand = new SqlCommand("select id from dbo.Employee", sqlConnection);
   5: int id = (int)sqlCommand.ExecuteScalar();
   6: sqlConnection.Close();
   7: Console.WriteLine("Press enter to close.");
   8: Console.ReadLine();
   9:  

Start debugging the code. Execute up to the 3rd line; now you have opened the connection. Let's see how SQL Server maintains this connection. Open a query window and run sp_who for the user Test.
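A one-liner is enough here (sp_who accepts a login name):

EXEC sp_who 'Test'
-- Lists only the sessions that belong to the Test login.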
[Screenshot: sp_who output showing the new connection]

Note that the connection is established and shows the master database, even though we connected to TestDatabase. Now execute the .NET code up to the 6th line, but do not execute the 6th line itself. Run sp_who again and see.
[Screenshot: sp_who output after the command has executed]

Now the connection is set to the correct database and the command has been executed, yet SQL Server still maintains the connection and waits for a command, because it has received no instruction to close it. Now execute the 6th line to close the connection, run sp_who again, and see. Has it been removed? No, the connection is still there with the status sleeping. The reason is that ADO.NET has not instructed SQL Server to remove the connection, so SQL Server still maintains it. Complete the execution and check again; now the connection is gone.

Am I supposed to call the "Dispose" method?
Now we have a question: we closed the connection with the sqlConnection.Close() method, but the instruction has not gone to SQL Server. Should we call Dispose()? Add sqlConnection.Dispose(); between the 6th and 7th lines and run the code again. Monitor the connection with sp_who as in the previous exercise, checking once the newly added line has executed (before completing the full run). Are you still seeing the connection, even after calling the Dispose method? Yes, you should.

What we can understand from this testing is that the connection is maintained until we completely close the application. In other words, the connection is removed when the objects are garbage collected (my assumption here is that the Dispose method does not immediately cause the object to be collected). Why is that? This is how reusability is achieved: ADO.NET can reuse the connection, without recreating it, if it is needed again. See the code below.

   1: SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;");
   2: sqlConnection.Open();
   3:  
   4: SqlCommand sqlCommand = new SqlCommand("select id from dbo.Employee", sqlConnection);
   5: int id = (int)sqlCommand.ExecuteScalar();
   6: sqlConnection.Close();
   7:  
   8: sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;");
   9: sqlConnection.Open();
  10:  
  11: sqlCommand = new SqlCommand("select 1", sqlConnection);
  12: id = (int)sqlCommand.ExecuteScalar();
  13: sqlConnection.Close();
  14:  
  15: Console.WriteLine("Press enter to close.");
  16: Console.ReadLine();

In this code, we make two connections. At the 9th line we open another connection. Once the 9th line has completed, run sp_who again and see. Are there two connections? No; ADO.NET reuses the connection that was used for the first command. It uses connections available in the pool. Yes, we call it the connection pool, and the process is referred to as connection pooling. One factor determining whether a connection can be reused is the connection string: the connection string used for the new connection must match that of an existing connection in the pool. If ADO.NET cannot find a matching one, a new connection is added to the pool and used to execute the command. That is how new connections get added to the pool, and that is only one factor!

The connection string is the same, but a new connection is added
There is another factor that causes a new connection to be added to the pool instead of reusing an existing one, and it can happen even with the same connection string. Remove the 6th line from the above code (the closing code) and debug. Once the 9th line has executed, check the connections. This is what I see:
[Screenshot: sp_who output showing two connections for the Test login]

A new connection has been created, which means the one we created before could not be reused. The reason is that we have not closed the connection; it has become an "unusable" connection. If a connection is not closed, it cannot be reused even though the connection string is the same. So this could be one of the reasons for seeing many sleeping connections, and in this case it is NOT normal: it wastes resources in the system. If you continue with this type of code, you will end up with a large number of sleeping connections. So make sure you close your connection once the command has executed.
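The easiest way to guarantee that the connection is closed (and returned to the pool) on every code path is the using pattern; a minimal sketch of the same test:

using (SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;"))
{
    sqlConnection.Open();
    SqlCommand sqlCommand = new SqlCommand("select id from dbo.Employee", sqlConnection);
    int id = (int)sqlCommand.ExecuteScalar();
}   // Dispose runs here; it closes the connection and returns it
    // to the pool even if an exception was thrown.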

Making sure that connections are reused
Is there a way to see whether sleeping connections are being reused? Yes, there is: it can be monitored with SQL Server Profiler. Open SQL Server Profiler and connect to your SQL Server instance. Use the standard template, which includes RPC:Completed for stored procedures. Select all columns (you may have to uncheck the selected events and check them again) and filter for your testing database. Now start the trace and run the above code again. You should see "exec sp_reset_connection", which indicates that the connection has been reset; in other words, it is being used for the new connection.
[Screenshot: Profiler trace showing exec sp_reset_connection]

I do not want to reuse connections
If you ever need connections NOT to be reused and want to get rid of sleeping connections, there is a way: all you have to do is add Pooling=False; to the connection string. This forces the connection to be removed as soon as the Close method is called. Though this is possible, it is NOT recommended.
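For example (the rest of the string as used above):

SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;Pooling=False;");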

How about ASP.NET Application?
As you have noticed, connections are removed from SQL Server once the console application is closed. Do not expect the same from ASP.NET applications: even though you close the browser, connections are maintained in sleeping status until the connection lifetime expires (correct me if I am wrong).

Connection removal
Connections in the pool are removed based on two factors. One is Connection Lifetime, which can be set in the connection string; the default value is 0, meaning no specific lifetime. The other factor is the validity of the connection: if a pooled connection can no longer communicate with the server, it is marked as invalid. The connection pooler scans the pool periodically, looking for invalid connections, and releases any it finds. An example connection string is shown below.
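Both settings live in the connection string; the values below are only illustrations:

// Connection Lifetime=60: a connection older than 60 seconds is
// destroyed when it is returned to the pool.
// Max Pool Size caps the pool; the default is 100.
SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;Connection Lifetime=60;Max Pool Size=100;");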

What if I create 101 connections without closing?
You might have seen or experienced this error:
Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached

The default maximum number of connections that can be added to the pool is 100, and this can be adjusted. Does that mean we can create only 100 connections without closing them (or with different connection strings)? Actually, no. Put this code in an ASP.NET page and run it, then check with sp_who for the number of sleeping connections.

   1: SqlConnection sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;");
   2:  
   3: for (int x = 0; x < 110; x++)
   4: {
   5:     sqlConnection = new SqlConnection("Server=(local);Database=TestDatabase;UId=Test;Pwd=password;");
   6:     sqlConnection.Open();
   7:  
   8:     SqlCommand sqlCommand = new SqlCommand("select id from dbo.Employee", sqlConnection);
   9:     int id = (int)sqlCommand.ExecuteScalar();
  10: }

This code creates 110 connections and, guess what? It works fine. When I check SQL Server for the Test login, I can see only 87 connections. If you change 110 to something like 150, you definitely get the error, and you will see 100 connections in SQL Server. Just to make sure the code was working, I started the trace and ran it again. I could clearly see the new connections, and "exec sp_reset_connection" too. This means that even though a connection is unusable (because we have not closed it), it is being reset for a new connection. The exact criteria for this behavior are still unknown to me; I will update the post if I find the reason. I would appreciate your thoughts if you know the reason.