Sunday, August 25, 2013

Using GUID column as Clustered Index Key– SS SLUG Aug 2013 – Demo II

GUIDs are commonly used in distributed applications which require “uniqueness” across the entire world. Unfortunately I have seen the usage of GUIDs with clustered keys in non-distributed applications, where global uniqueness is not required. This was discussed in my presentation and showed how useful the GUIDs as clustered key as well as how it makes the index fragmented. This post is for the demo code related to the discussion.

Before going through the code, we must understand that GUIDs are not as bad as we think if it is managed well. You can make an uniqueidentifier column as PRIMARY KEY, as clustered key. Although it does not as efficient as int data type, it gives moderate efficiency. Let’s look at how this makes the clustered index fragmented and how it can be avoided.

USE tempdb
GO
 
-- create a table with uniqueidentifier
-- and make it as the clustered key
IF OBJECT_ID('dbo.GUID_Table') IS NOT NULL
    DROP TABLE dbo.GUID_Table
GO
CREATE TABLE dbo.GUID_Table 
(
    Id uniqueidentifier PRIMARY KEY
    , name char(2000)
)
GO
 
-- insert 100 records with default values
INSERT INTO dbo.GUID_Table
VALUES
    (NEWID(), 'a')
GO 100
 
 
SELECT * FROM sys.dm_db_index_physical_stats (
    DB_ID(), OBJECT_ID('dbo.GUID_Table'), NULL, NULL, 'DETAILED')

Once the last statement is run, you will see how fragmented your table is. For more info on fragmentation, please refer: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html.

image

Both external fragmentation (97%) and internal fragmentation (54%) are very high. The reason is, page splits  and records movement during insertions. Since GUIDs are not sequentially generated, record placement in pages is always an issue for SQL Server. What happen is, when a GUID is to be inserted as the key (entire record in this case), it looks for the page which record needs to be placed, and if no space in the page, it splits the page, moving 50% of records in the page to a new page, breaking the order of the pages which are ordered based on keys. Run the code below for seeing the linkage between pages.

DBCC IND (tempdb, [GUID_Table], -1)

image

As you see, SQL Server has to do many “read-back” for reading data sequentially, making all queries slowing down. The only way to avoid this with GUIDs is, use NEWSEQUENTIALID instead of NEWID. It generates GUIDs that are sequential to the last generated GUID. If Insertion is made using NEWSEQUENTIALID, external fragmentation will be lesser because of its sequential order on generation. Re-create the table and run the INSERT statement as below;

-- create table again
DROP TABLE dbo.GUID_Table 
GO
CREATE TABLE dbo.GUID_Table 
(
    Id uniqueidentifier PRIMARY KEY DEFAULT (NEWSEQUENTIALID())
    , name char(2000)
)
 
INSERT INTO dbo.GUID_Table
VALUES
    (DEFAULT, 'a')
GO 100
 
SELECT * FROM sys.dm_db_index_physical_stats (
    DB_ID('tempdb'), OBJECT_ID('GUID_Table'), NULL, NULL, 'DETAILED')
 
DBCC IND (tempdb, [GUID_Table], -1)

If you analyze PagePID and NextPagePID in DBCC IND result-set now, you will see how pages are ordered and no “read-backward” is needed. And the SELECT statement proves that no fragmentation has happened too. This clearly shows that with NEWSEQUENTIALID, split is not required as the value generated is always greater than the value exist. There are two key things not remember on it;

  • NEWSEQUENTIALID always generates a higher value greater than the one generated before by same server.
  • The uniqueness is limited to the server used only. Duplication can happen if values are generated with two servers.

No comments: