Dinesh's Blog :::: Being Compiled ::::: August 2013

Tuesday, August 27, 2013

Understanding Index INCLUDED COLUMNS - SS SLUG Aug 2013 – Demo III

Indexes are added mostly based on WHERE conditions used with queries execute frequently. Once the index is added, it speeds up the search on conditions added but does not improve the speed of loading values related to non-indexed columns. Adding non-indexed columns to the index and making all queries as part of the index is considered as Covering Index Pattern. However this post is not about Covering Index Pattern but something similar to it, called as Included Columns Pattern.

This was discussed in August user group meeting. This code is related to one of demos.

For testing purposes, I have used one of Microsoft sample databases which has millions of records. It is called ContosoRetailDW and it can be downloaded at: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=18279

Let’s create a new table using one of existing tables and add a clustered index on it.

USE ContosoRetailDW

GO

SELECT *

INTO dbo.FactSalesNew

FROM  dbo.FactSales

CREATE UNIQUE CLUSTERED INDEX IX_FactSalesNew ON dbo.FactSalesNew (SalesKey)

GO

Enable “Actual Execution Plan” and run the query below.

SET STATISTICS IO ON

SELECT SalesKey, SalesAmount, ReturnAmount, DiscountAmount

FROM dbo.FactSalesNew f

    INNER JOIN dbo.DimDate d

        ON d.Datekey = f.DateKey

            AND CalendarYear = 2007

WHERE ReturnAmount BETWEEN 1000 AND 1500

SET STATISTICS IO OFF

Have a look on pages read for loading the resultset into memory and have a look on execution plan.

This table has 3 million records and they are held using 52491 pages. As you see with the first result, SQL Server reads all pages for searching records (Clustered Index Scan) and returns only 2061 records. Let’s try to decrease the number of pages to be read by adding an index on ReturnAmount column.

CREATE INDEX IX_FactSalesNew_ReturnAmount ON dbo.FactSalesNew (ReturnAmount)

Once added, run the same query and see the result.

SQL Server uses the index added and number of pages read is 19930 now. It has improved the overall operation. However, can we improve little bit more? Yes, it is possible with included columns. You notice with the plan that even though there is an index, SQL Server still reads the clustered index because all other columns except RetuenAmount are not exist with the index. What if we include other columns into same index? See the below query.

DROP INDEX IX_FactSalesNew_ReturnAmount ON dbo.FactSalesNew

GO

CREATE INDEX IX_FactSalesNew_ReturnAmount ON dbo.FactSalesNew (ReturnAmount)

    INCLUDE (DateKey, SalesAmount, DiscountAmount)

As you see, other columns used in the query are added to index now using INCLUDE clause. Remember, they have not been added as key columns, hence they cannot be used for searching. If a query is executed with “WHERE SalesAmount BETWEEN 1000 AND 1500”, SQL Server cannot use the index for searching. This is the key difference between Covering Index Pattern and Included Columns Patter. See the result of the SELECT query now.

Less number of pages to be read and no need to go through the clustered index. In this case, improvement is very high, however everything has pros and cons. Since additional columns are maintained in the index, remember that cost is involved with this when updating records.

Sunday, August 25, 2013

Using GUID column as Clustered Index Key– SS SLUG Aug 2013 – Demo II

GUIDs are commonly used in distributed applications which require “uniqueness” across the entire world. Unfortunately I have seen the usage of GUIDs with clustered keys in non-distributed applications, where global uniqueness is not required. This was discussed in my presentation and showed how useful the GUIDs as clustered key as well as how it makes the index fragmented. This post is for the demo code related to the discussion.

Before going through the code, we must understand that GUIDs are not as bad as we think if it is managed well. You can make an uniqueidentifier column as PRIMARY KEY, as clustered key. Although it does not as efficient as int data type, it gives moderate efficiency. Let’s look at how this makes the clustered index fragmented and how it can be avoided.

USE tempdb

GO

-- create a table with uniqueidentifier

-- and make it as the clustered key

IF OBJECT_ID('dbo.GUID_Table') IS NOT NULL

    DROP TABLE dbo.GUID_Table

GO

CREATE TABLE dbo.GUID_Table

    Id uniqueidentifier PRIMARY KEY

    , name char(2000)

GO

-- insert 100 records with default values

INSERT INTO dbo.GUID_Table

VALUES

    (NEWID(), 'a')

GO 100

SELECT * FROM sys.dm_db_index_physical_stats (

    DB_ID(), OBJECT_ID('dbo.GUID_Table'), NULL, NULL, 'DETAILED')

Once the last statement is run, you will see how fragmented your table is. For more info on fragmentation, please refer: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html.

Both external fragmentation (97%) and internal fragmentation (54%) are very high. The reason is, page splits and records movement during insertions. Since GUIDs are not sequentially generated, record placement in pages is always an issue for SQL Server. What happen is, when a GUID is to be inserted as the key (entire record in this case), it looks for the page which record needs to be placed, and if no space in the page, it splits the page, moving 50% of records in the page to a new page, breaking the order of the pages which are ordered based on keys. Run the code below for seeing the linkage between pages.

DBCC IND (tempdb, [GUID_Table], -1)

As you see, SQL Server has to do many “read-back” for reading data sequentially, making all queries slowing down. The only way to avoid this with GUIDs is, use NEWSEQUENTIALID instead of NEWID. It generates GUIDs that are sequential to the last generated GUID. If Insertion is made using NEWSEQUENTIALID, external fragmentation will be lesser because of its sequential order on generation. Re-create the table and run the INSERT statement as below;

-- create table again

DROP TABLE dbo.GUID_Table

GO

CREATE TABLE dbo.GUID_Table

    Id uniqueidentifier PRIMARY KEY DEFAULT (NEWSEQUENTIALID())

    , name char(2000)

INSERT INTO dbo.GUID_Table

VALUES

    (DEFAULT, 'a')

GO 100

SELECT * FROM sys.dm_db_index_physical_stats (

    DB_ID('tempdb'), OBJECT_ID('GUID_Table'), NULL, NULL, 'DETAILED')

DBCC IND (tempdb, [GUID_Table], -1)

If you analyze PagePID and NextPagePID in DBCC IND result-set now, you will see how pages are ordered and no “read-backward” is needed. And the SELECT statement proves that no fragmentation has happened too. This clearly shows that with NEWSEQUENTIALID, split is not required as the value generated is always greater than the value exist. There are two key things not remember on it;

NEWSEQUENTIALID always generates a higher value greater than the one generated before by same server.
The uniqueness is limited to the server used only. Duplication can happen if values are generated with two servers.

Saturday, August 24, 2013

Understanding Index Fragmentation – SS SLUG Aug 2013 – Demo I

Index fragmentation is something we need to monitor frequently specifically on databases that are heavily indexed. If indexes added are fragmented, you will not get the expected performance from indexes. This is one of the areas discussed with last user group meeting, here is the sample codes that shows how indexes are getting fragmented.

Run below code to create a sample table with values.

USE tempdb

GO

-- creating test table

IF OBJECT_ID(N'dbo.TestTable', 'U') IS NOT NULL

BEGIN

DROP TABLE dbo.TestTable

END

GO

CREATE TABLE TestTable (Id int IDENTITY(1,1) PRIMARY KEY, [Value] varchar(900))

GO

-- Inserting sample values

DECLARE @a int, @b int

SET @a = 65

WHILE (@a < 91)

BEGIN

    SET @b = 0

    WHILE (@b < 20)

    BEGIN

        INSERT INTO TestTable ([Value])

        SELECT REPLICATE(CHAR(@a), 445) + CONVERT(VARCHAR(10), @b)

        SET @b = @b + 1

END

    SET @a = @a + 2

END

GO

-- See the values inserted, [Value] column contains

-- values like AAAA..., CCCC..., EEEE..., etc

SELECT * FROM dbo.TestTable ORDER BY Id

-- making an index on [Value] column

CREATE INDEX IX_TestTable ON dbo.TestTable([Value])

GO

-- checking for internal fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'), 2, NULL, 'DETAILED')

The last statement which is DMF used to check the fragementation of indexes. Two columns used for checking the fragmentation, avg_fragmentation_in_percent for External fragmentation and avg_page_space_used_in_percent for Internal fragmentation.

Here are few points discussed regarding fragmentation;

Internal fragmentation
Inefficient use of pages within an index because the amount of data stored within each page is less than the data page can contain.
- <= 30% reorganize index
- > 30% rebuild index
External fragmentation (Logical and Extent)
Inefficient use of pages within an index because the logical order of the page is wrong.
- <= 75% and >= 60% reorganize index
- < 60% rebuild index

As per result, we do not need to worry much on fragmentation. SQL Server uses 16 pages for holding 260 records (averagely 16 records per page). For testing purposes, let’s fragment the index . Here is the code;

-- reducing the zize of the index key for some records

UPDATE dbo.TestTable

SET [Value] = LEFT([Value], 1)

WHERE Id % 2 = 0

-- checking for internal fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'), 2, NULL, 'DETAILED')

And the result is;

As you see, since we removed some characters from Value column, pages have space now, only 47% of space has been used. You really do not need 16 pages for holding 260 records now but SQL Server still holds records in 260 pages which is not efficient, making more IO and using more memory for your queries, decreasing the performance of them. Rebuilding the index sorts this out;

-- removing the fragmentation

ALTER INDEX IX_TestTable ON dbo.TestTable REBUILD

What you have to remember is, update like above makes your indexes fragmented, hence try to avoid such updates, if performed, make sure you check for fragmentation and defragment if required.

Let’s start from the beginning for understanding External fragmentation. It is all about breaking the order of the records. Run the below code for understanding the current order. Make sure you run the first code segment above (table creation and inserting)again before running this.

-- checking for external fragmentation

SELECT * FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.TestTable', N'U'),2, NULL, 'DETAILED')

-- run this undocumented command to see the NextPageID

DBCC IND (tempdb, 'TestTable', 2)

Here are the results;

Look at the first result, it says that External fragmentation of the clustered index is 18.5% which is okay. Next result is from DBCC IND command (For more info on DBCC IND, refer: http://dinesql.blogspot.com/2013/08/dbcc-ind.html) shows how pages are linked. Focus on PagePID and NextPagePID columns, and PageType column. Page type 2 represents index pages. Index level 0 represents leaf pages. This says that next page of 312is 314 and next page of 314 is 316, which is a perfect order.

Let’s make the table externally fragmented. Run the code below;

-- inserting new values such as

-- Bs, Ds, Fs, etc....

-- Making page-splits

DECLARE @a INT

SET @a = 66

WHILE (@a < 86)

BEGIN

    INSERT INTO TestTable ([Value])

    SELECT REPLICATE(CHAR(@a), 445) + CONVERT(VARCHAR(10), @a)

    SET @a = @a + 2

END

GO

When records are inserted in between existing values (B has to be inserted between A and C), SQL Server needs to check and see the space availability of the correct page. If no space available, page-split occurs and 50% of the records are moved to a new page. This breaks the order of pages. Run the sys.dm_db_index_physical_stats and DBCC IND again and see;

We can clearly see what has happened to the order of pages, it is broken, it is fragmented. When you query data from this table (for example: data from pages 312, 341, 50699, 316), it has to read forward (which is fine) and read back (which is very costly) making more IOs and slowing down the generation of the resultset. If you rebuild the index, this can be fixed, however what you have to remember is, not to make your indexes externally fragmented from your actions, making query performance very poor. Here are some points you need to consider when dealing with index keys. These reduce external fragmentation.

Select a non-volatile columns for the keys.
Make sure column values are increasing values.

Presentation related to this demo is available at: SQL Server Universe.

DBCC IND

DBCC IND command is one of the important commands used when indexes are analyzed. Specifically this can be used for seeing the linkage between pages associated with a given table. It requires three parameters;

DBCC IND ( {‘db_name’ | ‘db_id’}, {‘table_name’ | ‘table_id’}, {‘index_name’ | ‘index_id’ | –1}

db_name | ‘db_id – requires database name or id. If 0 or ‘’ passed, current database will be used.
table_name | table_id – requires table name or object id of the table.
index_name | index_id | –1 – requires index id of the table. If –1 is used, result is generated for all the indexes.

Here is brief on output of IND command.

PageFID	File number where the page is located
PagePID	Page number for the page
IAMFID	File ID where the IAM page is located
IAMPID	Page ID for the page in the data file
ObjectID	Object ID for the associated table
IndexID	Index ID associated with the heap or index
PartitionNumber	Partition number for the heap or index
PartitionID	Partition ID for the heap or index
iam_chain_type	he type of IAM chain the extent is used for. Values can be in-row data, LOB data, and overflow data.
PageType	Number identifying the page type; 1 - Data page 2 - Index page 3 - Large object page 4 - Large object page 8 - Global Allocation Map page 9 - Share Global Allocation Map page 10 - Index Allocation Map page 11 - Page Free Space page 13 - Boot page 15 - File header page 16 - Differential Changed Map page 17 - Bulk Changed Map page
IndexLevel	Level at which the page exists in the page organizational structure. The levels are organized from 0 to N, where 0 is the lowest level of the index and N is the index root
NextPageFID	File number where the next page at the index level is located
NextPagePID	Page number for the next page at the index level
PrevPageFID	File number where the previous page at the index level is located
PrevPagePID	Page number for the previous page at the index level

Here is an example for running the command;

DBCC IND (tempdb, 'TestTable', 2)