Monday, August 14, 2017

Azure SQL Data Warehouse - Part II - Analyzing Created Databases

I started writing on this long time back but had no way of completing it. My first post on this was Azure SQL Data Warehouse - Part I - Creating and Connecting and here is the second part of it.

The first post discussed about Azure SQL Data Warehouse and how to create a server and database using the portal. As the next step, let's talk about the architecture of it bit and then see how Azure data warehouse maintains and processes data.

Control Node

Azure SQL Data Warehouse is a distributed database. It means that data is distributed in multiple locations. However, once the data warehouse is created, we will be connecting with one component called Control Node. It is not exactly a SQL Server database but when connecting to it, it looks and feels like connecting to a SQL Server Database. Control node handles all communication and computation. When we make a request to the data warehouse, Control node accepts it, determines the way it should be distributed based on divide and conquer approach, get it processed and finally send the result to us.

Compute Node

Control node get the data processed in parallel using Compute Nodes. They are SQL Server databases and store all our records. Based on the number of DWU configured, data warehouse is set with one or more Compute Nodes.

Distribution

Data related to the data warehouse is stored in Azure Blob Storage and distributed in multiple locations. It is independent from Compute Nodes, hence they can be operated/adjusted independently. These locations are called as Distributions. The number of distributions for an Azure SQL data warehouse is a fixed number that is sixty (60). These distributions are assigned dynamically to Compute Nodes and when a query is executed, each distribution processes data related to them. This is how the parallel execution happens.

If you need more compute power, you can increase the number of Compute Nodes by increasing DWUs. When the number of Compute Nodes are getting changed, the number of distributions per Compute Node is getting changed as well.

Architecture

This image shows the architecture when you create an Azure SQL Data Warehouse with 100 DWU.


This image shows the architecture when you create an Azure SQL Data Warehouse with 400 DWU.


Let's create two databases and clarify this configuration.

I have discussed all steps related to server creation and database creation in my first post that is  Azure SQL Data Warehouse - Part I - Creating and Connecting , hence I am not going to discuss the same here. Note the image below. That is how I have created two databases for checking the configurations.


As you see, the first data warehouse is created using 100 DWUs and second one with 400 DWUs. Let's see how nodes have been created for these two databases. We can simply use sys.dm_pdw_nodes DMV for getting this information.

  1. SELECT * FROM sys.dm_pdw_nodes;  
  2.   
  3. SELECT type, Count(*) NofNodes  
  4. FROM sys.dm_pdw_nodes  
  5. GROUP BY type;  

Result with the first data warehouse that is created using 100 DWUs.


Result with the second data warehouse that is created using 400 DWUs.


Note the second data warehouse. Since we used more DWUs, it has been created with four Compute Nodes that gives better performance than the first one. Since this is a sample database and it has tables, we can check one of the tables and see how data is distributed with distributions.

The following code shows the distributions created for one data warehouse. As mentioned above, it is always 60.


Here is the code for seeing how rows of a table are distributed in distributions with the second data warehouse created. Note how each distributions are assigned to Compute Nodes.


Records are distributed based on the design of the table. Azure SQL Data Warehouse uses two types of distributions: Round Robin and Hash distributions. Let's talk about it with the next post.

Friday, August 4, 2017

bcp error - Invalid column type from bcp client for colid 1 - Solution

You might be experiencing the following error with bcp when loading text data to SQL Server table;


It is possible to see this error when there is a data type mismatch between a column related to the destination table and a field coming from the file. However, there can be a situation where you have used a view for the bcp and underline table has been changed. If so, you may see the same error.

In my case, my destination table has an identity column, hence I created a view for getting data via bcp. I started seeing this error after making some modifications to the table. Once the view is dropped and recreated, it worked without any issue.

Thursday, August 3, 2017

Script for creating DimTime Dimension Table and populating data

The posts I made on date dimension table (Script for populating Date Dimension with Financial Year and Loading Data Script for Snowflake type Date Dimension) had a good response and few asked about the same for Time related table. This is one of the old scripts I used for populating Time Dimension, it maintains up to the second but if you need more, the script can be easily change.

Here is the code for creating the table and populating data.

  1. CREATE TABLE dbo.Time  
  2. (  
  3.  TimeKey int identity(1,1)   
  4.  , Constraint pk_Time primary key (TimeKey)  
  5.  , [Hoursmallint not null  
  6.  , [24Hour] smallint not null  
  7.  , [Minutesmallint not null  
  8.  , [Secondsmallint not null  
  9.  , [Timechar(10) not null  
  10.  , [24Time] char(8) not null  
  11.  , HourBucket char(9)  
  12. );  
  13. GO  
  14.   
  15. DECLARE @Hour smallint = 0  
  16. DECLARE @Minute smallint = 0  
  17. DECLARE @Second smallint = 0  
  18.   
  19. WHILE (@Hour < 24)  
  20. BEGIN  
  21.   
  22.  WHILE (@Minute < 60)  
  23.  BEGIN  
  24.   
  25.   WHILE (@Second < 60)  
  26.   BEGIN  
  27.   
  28.    INSERT INTO dbo.[Time]  
  29.     ([Hour], [24Hour], [Minute], [Second], [Time], [24Time], HourBucket)  
  30.    VALUES  
  31.     (CASE WHEN @Hour > 12 THEN @Hour - 12 ELSE @Hour END  
  32.     , @Hour, @Minute, @Second  
  33.     , REPLICATE('0', 2 - LEN(CASE WHEN @Hour > 12 THEN @Hour - 1 ELSE @Hour END)) + CONVERT(varchar(2), CASE WHEN @Hour > 12 THEN @Hour - 1 ELSE @Hour END)   
  34.      + ':' + REPLICATE('0', 2 - LEN(@Minute)) + CONVERT(varchar(2), @Minute)   
  35.      + ':' + REPLICATE('0', 2 - LEN(@Second)) + CONVERT(varchar(2), @Second)   
  36.      + CASE WHEN @Hour > 12 THEN 'PM' ELSE 'AM' END  
  37.     , REPLICATE('0', 2 - LEN(@Hour)) + CONVERT(varchar(2), @Hour)   
  38.      + ':' + REPLICATE('0', 2 - LEN(@Minute)) + CONVERT(varchar(2), @Minute)   
  39.      + ':' + REPLICATE('0', 2 - LEN(@Second)) + CONVERT(varchar(2), @Second)   
  40.     , '[' + REPLICATE('0', 2 - LEN(@Hour)) + CONVERT(varchar(2), @Hour)   
  41.      + ' - '   
  42.      + REPLICATE('0', 2 - LEN(@Hour+1)) + CONVERT(varchar(2), @Hour+1) + ']')  
  43.    SET @Second += 1;  
  44.   END  
  45.   SET @Second = 0;  
  46.   SET @Minute += 1;  
  47.  END  
  48.  SET @Minute = 0;  
  49.  SET @Hour += 1;  
  50. END  
  51. GO  
  52.   
  53. SELECT * FROM dbo.[Time];  


Wednesday, August 2, 2017

How to reset IDENTITY in Memory-Optimized Tables

Even though SEQUENCE object is available with much more flexibility, we still use IDENTITY property for adding sequence values to tables, specifically when we need to introduce a surrogate key. However, if you need the same with Memory-Optimized tables, you need to know certain things.

Can I add the IDENTITY property to Memory-Optimized tables?
Yes, it is possible but it should be always IDENTITY(1,1). You cannot use a different values for seed and increment, they should be always set as 1.

  1. CREATE TABLE dbo.MemoryOptimizedTable  
  2. (  
  3.  Id int IDENTITY(1,1) NOT NULL primary key nonclustered,  
  4.  CurrentDate datetime NULL default (getdate())  
  5. )WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA )  
  6. GO  
  7.   
  8. INSERT INTO dbo.MemoryOptimizedTable (CurrentDate) VALUES (DEFAULT);  
  9. GO 5  

Can I reset the IDENTITY seed using DBCC CHECKIDENT?
No, this is not supported. If you need to reset the IDENTITY seed, only way is inserting a new value explicitly by turning SET IDENTITY_INSERT on. As shown in the example, once the record with value 100 is inserted, the next value of the seed is set to 100+1.

  1. SET IDENTITY_INSERT dbo.MemoryOptimizedTable ON  
  2. GO  
  3. INSERT INTO dbo.MemoryOptimizedTable (Id, CurrentDate) VALUES (100, DEFAULT);  
  4. GO  
  5. SET IDENTITY_INSERT dbo.MemoryOptimizedTable OFF  
  6. GO  
  7.   
  8. INSERT INTO dbo.MemoryOptimizedTable (CurrentDate) VALUES (DEFAULT);  
  9. GO  
  10.   
  11. SELECT * FROM dbo.MemoryOptimizedTable;  


What if insert a lower value explicitly?
It is possible as long as it does not violate any rules. Look at the code below. It inserts a record with value 50 explicitly. But it does not mean that the seed is getting reset to 50+1. This does not happen because the last generated value is greater than 50+1. Therefore, the value of next record is 102, not 51.

  1. SET IDENTITY_INSERT dbo.MemoryOptimizedTable ON  
  2. GO  
  3. INSERT INTO dbo.MemoryOptimizedTable (Id, CurrentDate) VALUES (50, DEFAULT);  
  4. GO  
  5. SET IDENTITY_INSERT dbo.MemoryOptimizedTable OFF  
  6. GO  
  7.   
  8. INSERT INTO dbo.MemoryOptimizedTable (CurrentDate) VALUES (DEFAULT);  
  9. GO  
  10.   
  11. SELECT * FROM dbo.MemoryOptimizedTable;  


Tuesday, August 1, 2017

Script for populating Date Dimension with Financial Year

Once I published the code I used for creating date dimension tables and populating data. But it had not handled finance year related date elements. Generally we handle financial year elements with Date Table with Star Schema not with Snowflake Schema. And in most cases (or old days), we usually use OLAP Data Warehouse  as the source for reporting and Analysis Services can easily handle financial dates with Start Schema implemented in the Relational Data Warehouse. However, in modern world, we try to make the solution just using the Relation Data Warehouse without using Multi-dimensional models (or using Tabular Models), thought make a another script for handling financial dates with Snowflake Schema.

With this script, four tables are created: Year, Quarter, Month and Date. If need to see elements related to the financial year, the link between Year and Quarter should be set with Year.YearKey and Quarter.FinancialYearKey. For calendar date elements, the link should be between Year.YearKey and Quarter.YearKey.

This is how you make a search for financial dates;


This is how you make a search for Calendar dates;


Here is the script for creating tables;

  1. CREATE TABLE dbo.DimYear  
  2. (  
  3.  YearKey smallint   
  4.  , Year smallint not null  
  5.  , Constraint pk_DimYear Primary Key (YearKey)  
  6. );  
  7. GO  
  8.   
  9. CREATE TABLE dbo.DimQuarter  
  10. (  
  11.  QuarterKey smallint   
  12.  , YearKey smallint not null  
  13.  , FinancialYearKey smallint not null  
  14.  , Quarter smallint not null  
  15.  , YearQuater varchar(20) not null  
  16.  , QuarterName varchar(20) not null  
  17.  , FinancialQuarter smallint not null  
  18.  , FinancialYearQuarter varchar(20) not null  
  19.  , Constraint pk_DimQuarter Primary Key (QuarterKey)  
  20.  , Constraint fk_DimQuarter_DimYear Foreign Key (YearKey)  
  21.   References dbo.DimYear (YearKey)  
  22.  , Constraint fk_DimQuarter_DimYear_Financial Foreign Key (FinancialYearKey)  
  23.   References dbo.DimYear (YearKey)  
  24. );  
  25. GO  
  26.   
  27. CREATE TABLE dbo.DimMonth  
  28. (  
  29.  MonthKey int   
  30.  , QuarterKey smallint not null  
  31.  , MonthNumber smallint not null  
  32.  , MonthName varchar(20) not null  
  33.  , YearMonth varchar(20) not null  
  34.  , MonthShortName char(3) not null  
  35.  , FinancialMonthNumber smallint not null  
  36.  , Constraint pk_DimMonth Primary Key (MonthKey)  
  37.  , Constraint fk_DimMonth_DimQuarter Foreign Key (QuarterKey)  
  38.   References dbo.DimQuarter (QuarterKey)  
  39. );  
  40. GO  
  41.   
  42. CREATE TABLE dbo.DimDate  
  43. (  
  44.  DateKey int   
  45.  , MonthKey int not null  
  46.  , Date date not null  
  47.  , WeekDayNumber smallint not null  
  48.  , WeekDayName varchar(20) not null  
  49.  , DayOfMonth smallint not null  
  50.  , DayOfYear smallint not null  
  51.  , IsWeekend bit not null  
  52.  , IsHoliday bit not null  
  53.  , WeekNumberOfYear smallint not null  
  54.  , Constraint pk_DimDate Primary Key (DateKey)  
  55.  , Constraint fk_DimDate_DimMonth Foreign Key (MonthKey)  
  56.   References dbo.DimMonth (MonthKey)  
  57. );  
  58. GO  

Here is the script for populating dates;

  1. SET DATEFIRST 1;  
  2.   
  3. DECLARE @StartDate date = '1990-01-01';  
  4. DECLARE @EndDate date = '2019-12-31';  
  5. DECLARE @FinancialYearStartingQuarter smallint = 3; -- Starting from July, If it starts from April then 2  
  6.   
  7. DECLARE @YearKey smallint;  
  8. DECLARE @Quarter smallint, @QuarterKey smallint;  
  9. DECLARE @Month int, @MonthKey int;  
  10. DECLARE @Date int, @DateKey int;  
  11.   
  12. DECLARE @FinancialStartingMonth smallint;  
  13. SET @FinancialStartingMonth = CASE @FinancialYearStartingQuarter WHEN 1 THEN 1 WHEN 2 THEN 4 WHEN 3 THEN 7 WHEN 4 THEN 10 END  
  14.   
  15. INSERT INTO dbo.DimYear   
  16.  (YearKey, [Year])   
  17. VALUES   
  18.  (YEAR(@StartDate) - 1, YEAR(@StartDate) - 1);  
  19.   
  20.   
  21. WHILE (@StartDate <= @EndDate)  
  22. BEGIN  
  23.   
  24.  -- Inserting years  
  25.  SET @YearKey = YEAR(@StartDate);  
  26.  IF NOT EXISTS (SELECT * FROM dbo.DimYear WHERE YearKey = @YearKey)  
  27.   INSERT INTO dbo.DimYear (YearKey, [Year])   
  28.   VALUES   
  29.   (@YearKey, @YearKey);  
  30.   
  31.   
  32.  -- Inserting quarters  
  33.  SET @QuarterKey = Convert(smallintConvert(varchar(4), YEAR(@StartDate)) + Convert(varchar(1), DATEPART(q, @StartDate)))  
  34.  SET @Quarter = DATEPART(q, @StartDate);  
  35.   
  36.  IF NOT EXISTS (SELECT * FROM dbo.DimQuarter WHERE QuarterKey = @QuarterKey)  
  37.  INSERT INTO dbo.DimQuarter   
  38.   (QuarterKey, YearKey  
  39.   , FinancialYearKey  
  40.   , [Quarter], YearQuater, QuarterName  
  41.   , FinancialQuarter, FinancialYearQuarter)   
  42.  VALUES (@QuarterKey, @YearKey  
  43.   , CASE WHEN @Quarter < @FinancialYearStartingQuarter THEN @YearKey -1 ELSE @YearKey END  
  44.   , @Quarter, Convert(varchar(4), YEAR(@StartDate)) + 'Q' + Convert(varchar(4), @Quarter)  
  45.   , CASE @Quarter WHEN 1 THEN 'First Quarter' WHEN 2 THEN 'Second Quarter' WHEN 3 THEN 'Third Quarter' ELSE 'Forth Quarter' END  
  46.   , CASE @Quarter WHEN 1 THEN 3 WHEN 2 THEN 4 WHEN 3 THEN 1 WHEN 4 THEN 2 END   
  47.   , CASE @Quarter WHEN 1 THEN Convert(varchar(4), YEAR(@StartDate) - 1) + 'Q3'  
  48.       WHEN 2 THEN Convert(varchar(4), YEAR(@StartDate) - 1) + 'Q4'  
  49.       WHEN 3 THEN Convert(varchar(4), YEAR(@StartDate)) + 'Q1'  
  50.       WHEN 4 THEN Convert(varchar(4), YEAR(@StartDate)) + 'Q2'END  
  51.   );  
  52.     
  53.   
  54.  ---- Inserting months  
  55.  SET @MonthKey = Convert(intConvert(varchar(4), YEAR(@StartDate)) + (CASE LEN(MONTH(@StartDate)) WHEN 1 THEN '0' ELSE '' END) + Convert(varchar(2), MONTH(@StartDate)));  
  56.  SET @Month = MONTH(@StartDate)  
  57.  IF NOT EXISTS (SELECT * FROM dbo.DimMonth WHERE MonthKey = @MonthKey)  
  58.   INSERT INTO dbo.DimMonth   
  59.    (MonthKey, QuarterKey, MonthNumber, MonthName, YearMonth, MonthShortName  
  60.    , FinancialMonthNumber)   
  61.   VALUES   
  62.    (@MonthKey, @QuarterKey, @Month, DATENAME(MONTH, @StartDate)  
  63.    , Convert(varchar(4), YEAR(@StartDate)) + ' ' + LEFT(DATENAME(MONTH, @StartDate), 3)  
  64.    , LEFT(DATENAME(MONTH, @StartDate), 3)  
  65.    , CASE   
  66.      WHEN @FinancialStartingMonth = 1 THEN @Month  
  67.      WHEN @FinancialStartingMonth = 4 AND @Month  < @FinancialStartingMonth THEN @Month + 9  
  68.      WHEN @FinancialStartingMonth = 4 AND @Month  >= @FinancialStartingMonth THEN @Month - 3  
  69.      WHEN @FinancialStartingMonth = 7 AND @Month  < @FinancialStartingMonth THEN @Month + 6  
  70.      WHEN @FinancialStartingMonth = 7 AND @Month  >= @FinancialStartingMonth THEN @Month - 6  
  71.      WHEN @FinancialStartingMonth = 10 AND @Month  < @FinancialStartingMonth THEN @Month + 6  
  72.      WHEN @FinancialStartingMonth = 10 AND @Month  >= @FinancialStartingMonth THEN @Month + 3  
  73.     END  
  74.    );  
  75.     
  76.   ---- Inserting dates  
  77.   SET @DateKey = Convert(intConvert(varchar(8), @StartDate, 112))  
  78.   IF NOT EXISTS (SELECT * FROM dbo.DimDate WHERE DateKey = @DateKey)  
  79.    INSERT INTO dbo.DimDate (DateKey, MonthKey, Date, WeekDayNumber, WeekDayName, DayOfMonth  
  80.     , DayOfYear, IsWeekend, IsHoliday, WeekNumberOfYear)   
  81.    VALUES (@DateKey, @MonthKey, @StartDate, DatePart(WEEKDAY, @StartDate), DateName(WEEKDAY, @StartDate), DatePart(DAY, @StartDate)  
  82.     , DatePart(DAYOFYEAR, @StartDate), CASE WHEN DatePart(WEEKDAY, @StartDate) IN (6, 7) THEN 1 ELSE 0 END, 0  
  83.     , DatePart(WEEK, @StartDate));  
  84.   
  85.   SET @StartDate = DATEADD(dd, 1, @StartDate);  
  86. END;  
  87. GO