Wednesday, March 8, 2017

Adding a Hash column using HASHBYTES based on all columns to all tables

We use either Checksum or Hashbytes for generating a value for finding changes of records when need to transfer records from one source to another and changes cannot be identified at the source end. This is specially used in data warehousing. I have written two posts on it, you can read them for getting an idea on it;


I had to implement similar with another database but it was not at the design stage. The database is already developed and being used, and it was over 1TB. The requirement was, adding a column that has Hash based on all existing columns. Since there were more than 300 tables, it was not practical to open the Design of the table and add the column. The only option I had was, form a dynamic query that alters the table by adding the column. I had to consider few things with the implementation;
  • All tables should be altered.
  • All columns should be used for generating the Hash.
  • Tables that have records must set Hash immediately.
  • Null must be replaced with blank because Hashbytes does not accept nulls.
Considering all, I wrote the following code for altering tables. You may use the same if you have the same requirement;

  1. -- Getting table names into a table  
  2. -- A temporary table or table variable can be used for this  
  3. SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords   
  4. INTO dbo.TableNames  
  5. FROM  
  6.  (  
  7.  SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords   
  8.  FROM sys.partitions p  
  9.   INNER JOIN sys.tables t  
  10.    ON p.object_id = t.object_id  
  11.  WHERE p.index_id < 2 AND t.type = 'U'  
  12.  GROUP BY t.name, t.object_id) AS t;  
  13.   
  14. -- Adding a clustered index  
  15. -- This is not required if the nof tables is low  
  16. CREATE CLUSTERED INDEX IX_TableNames ON dbo.TableNames (Id);  
  17. GO  
  18.   
  19. DECLARE @Id int = 1;  
  20. DECLARE @LastId int = 0;  
  21. DECLARE @TableName varchar(500)  
  22. DECLARE @TableId int  
  23. DECLARE @Sql varchar(max)  
  24. DECLARE @Columns varchar(8000)  
  25.   
  26. SELECT @LastId = COUNT(*) FROM dbo.TableNames;  
  27.   
  28. -- Iterate through all tables  
  29. WHILE (@Id <= @LastId)  
  30. BEGIN  
  31.   
  32.  SELECT @TableName = TableName, @TableId = TableId FROM dbo.TableNames WHERE Id = @Id;  
  33.    
  34.  SET @Sql = 'ALTER TABLE dbo.' + @TableName;  
  35.  SET @Sql += ' ADD ';  
  36.  SET @Sql += ' MIG_HashKey AS HASHBYTES(''MD5'', ';  
  37.    
  38.  -- get all columns, convert them to varchar  
  39.  -- and replace null with blank value  
  40.  SELECT @Columns = STUFF((SELECT '+ IsNull(Convert(varchar(4000), ' + name + '),'''')' FROM sys.columns WHERE object_id = @TableId FOR XML PATH ('')), 1, 2, '');  
  41.   
  42.  SET @Sql += @Columns;  
  43.  SET @Sql += ') ';  
  44.   
  45.  -- Execute the code  
  46.  BEGIN TRY  
  47.   EXEC ( @sql);  
  48.  END TRY  
  49.  BEGIN CATCH  
  50.   
  51.   PRINT ERROR_MESSAGE()  
  52.   PRINT @Sql;  
  53.  END CATCH  
  54.   
  55.  SET @Sql = '';  
  56.  SET @Columns = ''  
  57.  SET @Id += 1;   
  58. END  
  59. GO  

No comments: