Wednesday, March 8, 2017

Adding a Hash column using HASHBYTES based on all columns to all tables

We use either Checksum or Hashbytes for generating a value for finding changes of records when need to transfer records from one source to another and changes cannot be identified at the source end. This is specially used in data warehousing. I have written two posts on it, you can read them for getting an idea on it;

I had to implement similar with another database but it was not at the design stage. The database is already developed and being used, and it was over 1TB. The requirement was, adding a column that has Hash based on all existing columns. Since there were more than 300 tables, it was not practical to open the Design of the table and add the column. The only option I had was, form a dynamic query that alters the table by adding the column. I had to consider few things with the implementation;
  • All tables should be altered.
  • All columns should be used for generating the Hash.
  • Tables that have records must set Hash immediately.
  • Null must be replaced with blank because Hashbytes does not accept nulls.
Considering all, I wrote the following code for altering tables. You may use the same if you have the same requirement;

-- Getting table names into a table
-- A temporary table or table variable can be used for this
SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
INTO dbo.TableNames
 SELECT TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY, t.object_id) AS t;

-- Adding a clustered index
-- This is not required if the nof tables is low
CREATE CLUSTERED INDEX IX_TableNames ON dbo.TableNames (Id);

DECLARE @Id int = 1;
DECLARE @LastId int = 0;
DECLARE @TableName varchar(500)
DECLARE @TableId int
DECLARE @Sql varchar(max)
DECLARE @Columns varchar(8000)

SELECT @LastId = COUNT(*) FROM dbo.TableNames;

-- Iterate through all tables
WHILE (@Id <= @LastId)

 SELECT @TableName = TableName, @TableId = TableId FROM dbo.TableNames WHERE Id = @Id;
 SET @Sql = 'ALTER TABLE dbo.' + @TableName;
 SET @Sql += ' ADD ';
 SET @Sql += ' MIG_HashKey AS HASHBYTES(''MD5'', ';
 -- get all columns, convert them to varchar
 -- and replace null with blank value
 SELECT @Columns = STUFF((SELECT '+ IsNull(Convert(varchar(4000), ' + name + '),'''')' FROM sys.columns WHERE object_id = @TableId FOR XML PATH ('')), 1, 2, '');

 SET @Sql += @Columns;
 SET @Sql += ') ';

 -- Execute the code
  EXEC ( @sql);

  PRINT @Sql;

 SET @Sql = '';
 SET @Columns = ''
 SET @Id += 1; 

No comments: