Saturday, December 26, 2015

GUID or Uniqueidentifier as the key - I - Issues with it

While analyzing a database related to one of the companies, I noticed that the key of most tables is based on GUID (uniqueidentifier) that drastically hinders the performance unless it is managed properly. Unfortunately, as per my experience in consultancy, 90% of databases that use GUID as the key for tables are not properly managed and many are unaware of the damages it does though they experience poor performance on data manipulation and retrieval.

Most reason out that they need GUID as the key because of a need of an uniqueness among records specifically when records are joined from multiple companies or regions. But with many cases, it can be handled without using Unqiueidentifier but using a composite key that is formed using an integer value and a code that represents the company or region. Understanding the cost of it and understanding how it slows down your queries will surely guide you to take the right decision, hence here is a post on it.

Let's try to understand the most common issue related to GUID as the key. Following code creates a table called Sales_WithGUID and loads data from AdventureWorks. The key of the table, which is SalesKey is a GUID. Note that I do not load records as a batch, but load one by one simulating the real operation.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithGUID;
GO

-- Creating the table
CREATE TABLE dbo.Sales_WithGUID
(
 SalesKey uniqueidentifier PRIMARY KEY DEFAULT(NewID()),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

-- Loading records one by one
DECLARE cur CURSOR 
FOR
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  FROM AdventureWorksDW2014.dbo.FactInternetSales;

DECLARE @ProductKey int 
  , @OrderDateKey int
  , @DueDateKey int
  , @ShipDateKey int 
  , @CustomerKey int 
  , @PromotionKey int 
  , @CurrencyKey int 
  , @SalesTerritoryKey int 
  , @SalesOrderNumber nvarchar(20) 
  , @SalesOrderLineNumber tinyint 
  , @RevisionNumber tinyint 
  , @OrderQuantity smallint 
  , @UnitPrice money 
  , @ExtendedAmount money 
  , @UnitPriceDiscountPct float 
  , @DiscountAmount float 
  , @ProductStandardCost money 
  , @TotalProductCost money 
  , @SalesAmount money 
  , @TaxAmt money 
  , @Freight money 
  , @CarrierTrackingNumber nvarchar(25) 
  , @CustomerPONumber nvarchar(25)  
  , @OrderDate datetime  
  , @DueDate datetime  
  , @ShipDate datetime

OPEN cur

FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_WithGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
 VALUES
  (@ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
    , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
    , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
    , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
    , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate)

 FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate
END

-- Cleaning
CLOSE cur
DEALLOCATE cur

Let's create another table called Sales_WithoutGUID and loads the same. Note that its primary key is a combination of int and char column. This is what I explained above and suggested to the company: using two columns as a composite primary key rather than using uniqueidentifier. Using this composite key, company can be identified and records can be uniquely identified.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithoutGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithoutGUID;
GO

-- Create the second table
CREATE TABLE dbo.Sales_WithoutGUID
(
 SalesKey int identity(1,1),
 CompanyCode char(2) Default ('SL'),
 constraint pk_Sales_WithoutGUID primary key (SalesKey, CompanyCode),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

-- Loading
DECLARE cur CURSOR 
FOR
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  FROM AdventureWorksDW2014.dbo.FactInternetSales;

DECLARE @ProductKey int 
  , @OrderDateKey int
  , @DueDateKey int
  , @ShipDateKey int 
  , @CustomerKey int 
  , @PromotionKey int 
  , @CurrencyKey int 
  , @SalesTerritoryKey int 
  , @SalesOrderNumber nvarchar(20) 
  , @SalesOrderLineNumber tinyint 
  , @RevisionNumber tinyint 
  , @OrderQuantity smallint 
  , @UnitPrice money 
  , @ExtendedAmount money 
  , @UnitPriceDiscountPct float 
  , @DiscountAmount float 
  , @ProductStandardCost money 
  , @TotalProductCost money 
  , @SalesAmount money 
  , @TaxAmt money 
  , @Freight money 
  , @CarrierTrackingNumber nvarchar(25) 
  , @CustomerPONumber nvarchar(25)  
  , @OrderDate datetime  
  , @DueDate datetime  
  , @ShipDate datetime

OPEN cur

FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_WithoutGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
 VALUES
  (@ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
    , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
    , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
    , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
    , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate)

 FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate
END

-- Cleaning
CLOSE cur
DEALLOCATE cur

Now let's analyze and see. The below code has two SELECT statements that check the fragmentation of both tables. As you see, first table is Externally Fragmented and second table is not fragmented. This is the biggest and most common issue with GUIDs. Since there is no sequential values when generating, when inserting records, in order to maintain the clustered key order, SQL Server splits data pages for moving records to new page, causing External Fragmentation (Read more on fragmentation at: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html). Because of this, reading causes accessing larger number of data pages, slowing down the query and consuming resource.

Other issue is the space required. As you see, second table requires only 10MB for maintaining records while first table requires 15MB.

There is another considerable impact on non-clustered indexes if you, will make another post on it.

As you see the disadvantages with GUID now, try to minimize the usage of it if possible. There is a workaround if the order of the key is only the issue. The function NEWSEQUENTIALID can be used instead of NEWID for generating GUIDs but still the second issue discussed is remained unsorted.

SELECT * FROM sys.dm_db_index_physical_stats 
 (DB_ID(), OBJECT_ID(N'dbo.Sales_WithGUID', N'U'), 1, NULL, 'DETAILED')
SELECT * FROM sys.dm_db_index_physical_stats 
 (DB_ID(), OBJECT_ID(N'dbo.Sales_WithoutGUID', N'U'), 1, NULL, 'DETAILED')

EXEC sp_spaceused N'Sales_WithGUID';
EXEC sp_spaceused N'Sales_WithoutGUID';


No comments: