Monday, October 24, 2016

Disable PRIMARY KEY before data loading

In order to improve the loading performance, specifically when a data warehouse or a large table is loaded, it is recommended to disable constraints such as PRIMARY KEY, UNIQUE KEY and CHECK and indexes if the data is already cleansed and validated. Although we heavily practice this in data warehousing, we do have some limitations on OLTP databases because the way we structure tables is different from the way we do with data warehouses.

Can we disable primary key of the table and continue with data loading? The answer is, Yes and No.

Remember, when you mark one of your table columns as the primary key, it creates a Clustered Index for the column you picked. If the Clustered Index is already created (before creating the primary key), then it creates a Non-Clustered Index. If the primary key is created with a Clustered Index and if you disable it, table will not be able to accessed because the main structure is disabled. However, if the primary key is created with a non-clustered index, you can disable it and still work with it.

  1. USE tempdb;  
  2. GO  
  3.   
  4. -- create a table with a primary key which is clustered  
  5. CREATE TABLE dbo.Customer  
  6. (  
  7.  CustomerID int identity(1,1) not null  
  8.  , FirstName varchar(100) not null  
  9.  , Constraint PK_Customer PRIMARY KEY CLUSTERED (CustomerID)  
  10. );  
  11.   
  12. -- create a table with a primary key which is nonclustered  
  13. CREATE TABLE dbo.Employee  
  14. (  
  15.  EmployeeID int identity(1,1) not null  
  16.  , FirstName varchar(100) not null  
  17.  , Constraint PK_Employee PRIMARY KEY NONCLUSTERED (EmployeeID)  
  18. );  
  19.   
  20. -- Insert some records  
  21. INSERT INTO dbo.Customer  
  22.  (FirstName) VALUES ('Dinesh'), ('Yeshan');  
  23. INSERT INTO dbo.Employee  
  24.  (FirstName) VALUES ('Jane'), ('Jack');  
  25.   
  26. -- checking records  
  27. -- This should return all recorded we inserted  
  28. SELECT * FROM dbo.Customer;  
  29. SELECT * FROM dbo.Employee;  
  30.   
  31. -- Disable the primary key of the Customer  
  32. -- by disabling associated index  
  33. ALTER INDEX PK_Customer ON dbo.Customer DISABLE;  
  34.   
  35. -- Now following statements are not possible  
  36. INSERT INTO dbo.Customer  
  37.  (FirstName) VALUES ('Kate'), ('Mihen');  
  38. SELECT * FROM dbo.Customer;  
  39.   
  40. -- Disable the primary key of the Employee  
  41. -- by disabling the associated index  
  42. ALTER INDEX PK_Employee ON dbo.Employee DISABLE;  
  43.   
  44. -- These statements work without any issue  
  45. INSERT INTO dbo.Employee  
  46.  (FirstName) VALUES ('Nazir'), ('Daniel');  
  47. SELECT * FROM dbo.Employee;  
  48.   
  49. -- Enabling both  
  50. ALTER INDEX PK_Customer ON dbo.Customer REBUILD;  
  51. ALTER INDEX PK_Employee ON dbo.Employee REBUILD;  

Same goes to Unique Constraint as well.

Note that this is possible with data warehouse because, as a best practice, we do not (always, there are exceptions) make the column marked as primary key as the clustered key. For an example, we may create a primary key using all foreign keys columns but we create the clustered index using DataKey column.

No comments: