Dinesh's Blog :::: Being Compiled ::::: March 2017

Saturday, March 25, 2017

Loading Data Script for Snowflake type Date Dimension

Update: If you need the script for the same with financial year elements, read the new post.

Date dimension is the most common dimension in data warehousing and it is generally designed as a single table. However, if it needs to be used with multiple Fact Tables with different levels, Date dimension table should be broken into multiple tables that creates a Snowflake Schema. If the table is broken into multiple tables such as Year, Quarter, Month and Date, loading them is bit difficult. A script for loading a single table is available in many sites but it is bit difficult to find a script for loading multiple tables. I had a same requirement and could not find a proper script, hence wrote one.

Here is the script if you need the same;

Here is the script for creating tables;

CREATE TABLE dbo.DimYear
(
 YearKey smallint 
 , Year smallint not null
 , Constraint pk_DimYear Primary Key (YearKey)
);
GO

CREATE TABLE dbo.DimQuarter
(
 QuarterKey smallint 
 , YearKey smallint not null
 , Quarter smallint not null
 , YearQuater varchar(20) not null
 , QuarterName varchar(20) not null
 , Constraint pk_DimQuarter Primary Key (QuarterKey)
 , Constraint fk_DimQuarter_DimYear Foreign Key (YearKey)
  References dbo.DimYear (YearKey)
);
GO

CREATE TABLE dbo.DimMonth
(
 MonthKey int 
 , QuarterKey smallint not null
 , MonthNumber smallint not null
 , MonthName varchar(20) not null
 , YearMonth varchar(20) not null
 , MonthShortName char(3) not null
 , Constraint pk_DimMonth Primary Key (MonthKey)
 , Constraint fk_DimMonth_DimQuarter Foreign Key (QuarterKey)
  References dbo.DimQuarter (QuarterKey)
);
GO

CREATE TABLE dbo.DimDate
(
 DateKey int 
 , MonthKey int not null
 , Date date not null
 , WeekDayNumber smallint not null
 , WeekDayName varchar(20) not null
 , DayOfMonth smallint not null
 , DayOfYear smallint not null
 , IsWeekend bit not null
 , IsHoliday bit not null
 , WeekNumberOfYear smallint not null
 , Constraint pk_DimDate Primary Key (DateKey)
 , Constraint fk_DimDate_DimMonth Foreign Key (MonthKey)
  References dbo.DimMonth (MonthKey)
);
GO

And the script for loading all tables.

SET DATEFIRST 1;

DECLARE @StartDate date = '2010-01-01';
DECLARE @EndDate date = '2019-12-31';
DECLARE @YearKey smallint;
DECLARE @Quarter smallint, @QuarterKey smallint;
DECLARE @Month int, @MonthKey int;
DECLARE @Date int, @DateKey int;

WHILE (@StartDate <= @EndDate)
BEGIN

 -- Inserting years
 SET @YearKey = YEAR(@StartDate);
 IF NOT EXISTS (SELECT * FROM dbo.DimYear WHERE YearKey = @YearKey)
  INSERT INTO dbo.DimYear (YearKey, [Year]) VALUES (@YearKey, @YearKey);


 -- Inserting quarters
 SET @QuarterKey = Convert(smallint, Convert(varchar(4), YEAR(@StartDate)) + Convert(varchar(1), DATEPART(q, @StartDate)))
 SET @Quarter = DATEPART(q, @StartDate);

 IF NOT EXISTS (SELECT * FROM dbo.DimQuarter WHERE QuarterKey = @QuarterKey)
  INSERT INTO dbo.DimQuarter (QuarterKey, YearKey, [Quarter], YearQuater, QuarterName) 
  VALUES (@QuarterKey, @YearKey, @Quarter, Convert(varchar(4), YEAR(@StartDate)) + 'Q' + Convert(varchar(4), @Quarter)
   , CASE @Quarter WHEN 1 THEN 'First Quarter' WHEN 2 THEN 'Second Quarter' WHEN 3 THEN 'Third Quarter' ELSE 'Forth Quarter' END);
  

 ---- Inserting months
 SET @MonthKey = Convert(int, Convert(varchar(4), YEAR(@StartDate)) + (CASE LEN(MONTH(@StartDate)) WHEN 1 THEN '0' ELSE '' END) + Convert(varchar(2), MONTH(@StartDate)));
 SET @Month = MONTH(@StartDate)
 IF NOT EXISTS (SELECT * FROM dbo.DimMonth WHERE MonthKey = @MonthKey)
  INSERT INTO dbo.DimMonth (MonthKey, QuarterKey, MonthNumber, MonthName, YearMonth, MonthShortName) 
  VALUES (@MonthKey, @QuarterKey, @Month, DATENAME(MONTH, @StartDate)
  , Convert(varchar(4), YEAR(@StartDate)) + ' ' + LEFT(DATENAME(MONTH, @StartDate), 3)
  , LEFT(DATENAME(MONTH, @StartDate), 3));
  
 ---- Inserting dates
 SET @DateKey = Convert(int, Convert(varchar(8), @StartDate, 112))
 IF NOT EXISTS (SELECT * FROM dbo.DimDate WHERE DateKey = @DateKey)
  INSERT INTO dbo.DimDate (DateKey, MonthKey, Date, WeekDayNumber, WeekDayName, DayOfMonth
   , DayOfYear, IsWeekend, IsHoliday, WeekNumberOfYear) 
  VALUES (@DateKey, @MonthKey, @StartDate, DatePart(WEEKDAY, @StartDate), DateName(WEEKDAY, @StartDate), DatePart(DAY, @StartDate)
   , DatePart(DAYOFYEAR, @StartDate), CASE WHEN DatePart(WEEKDAY, @StartDate) IN (6, 7) THEN 1 ELSE 0 END, 0
   , DatePart(WEEK, @StartDate));

 SET @StartDate = DATEADD(dd, 1, @StartDate);
END;
GO

Saturday, March 18, 2017

SQL Server Backup Expiry Date Setting

I would say, this is the most confusing property in the Microsoft SQL Server. Most misunderstand it and most do not know how it works. Although there are many posts on this, only few explains it with an example, and some articles/posts make it more confused, hence making this post the way I understand;

If you refer MSDN, it explains it well but in simple term, what it says it, if you set an expiry date for your backup set (if you need to understand what is backup media set, backup media family, backup device or backup set, read this) or if you set Retention days, the backup set can be overwritten only when the backup set is expired or after number of days mentioned with Retention days (this is shown as After x days in GUI).

Simply, if I take a backup today (19-Mar-2017) and set the expiry date as 25-Mar-2017, I will not be able to overwrite the backup set until 25-Mar-2017. Remember, backup can be restored any day regardless of the expiry date set.

Here is an example;

Let's take a backup of AdventureWorks2014 database. See both ways; using GUI and TSQL. Note that I have mentioned a name for Media too. Without this, expiry setting will not as we expect.

BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH  EXPIREDATE = N'03/25/2017 00:00:00', FORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

Now if you try to take another backup to the same media by overwriting the existing backup sets;

BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH NOFORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, NOSKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

SQL Server will throw this error;

If you still need to overwrite it, with TSQL, you can use SKIP instead of NOSKIP, or with GUI, do not mention the Media name.

Wednesday, March 15, 2017

Cannot change the SQL Server Tempdb log file size during the installation

Every SQL Server database has two type of files added; Data file and Log file and same goes for system databases as well. You can change the size of User-Defined Databases when they are created but not when System Databases are getting created by the Installation. However, with SQL Server 2016, we have a new page in the installation wizard that allows you to make some changes to the Tempdb.

As you see, you can change the number of files required for the tempdb, initial size of them and the increment for Auto Growth. Although file size can be changed as you want, it does not allow to change the Log Initial Size beyond 1,024 MB. If you set a larger value, it automatically sets to 1,024MB.

This does not mean that you can set a larger value to tempdb log file. Once the installation is done, you can change it later with the size you need. I assume that the reason for not allowing to set the size larger than 1,024MB during the installation is for saving the overall time for the installation.

Tuesday, March 14, 2017

Which protocol has been used for my SQL Server connection?

SQL Server uses 3 protocols to make the communication between client and the server. Initially there were 4 protocols but now it supports only 3: Shared Memory, Named Pipes and TCP/IP. We can enable/disable these protocols from server-end and change the priority order from client-end. Now, how do I know which protocol has been used for my connection?

We can easily see this by using sys.dm_exec_connection dynamic management view. It shows all current connection along with the used protocol. The net_transport is the one that shows it.

Here is a sample code. The first connection 54, was made without specifying anything additional when connecting, hence it has used Shared Memory. This protocol is used when it is enabled and connection made using the same machine that hosts the SQL Server. The second connection 56 has been established using Named Pipes because I forced to use Named Pipes for my connection.

How can I force the protocol when connecting via SSMS? It is simple. When connecting, if you use lpc: as the prefix for the server name, it uses Shared Memory. If you use np:, then it uses Named Piped.

Wednesday, March 8, 2017

Adding a Hash column using HASHBYTES based on all columns to all tables

We use either Checksum or Hashbytes for generating a value for finding changes of records when need to transfer records from one source to another and changes cannot be identified at the source end. This is specially used in data warehousing. I have written two posts on it, you can read them for getting an idea on it;

Incremental ETL: How to identify changes for fact data when no support from the source

Ways of incrementally loading data to a data warehouse

I had to implement similar with another database but it was not at the design stage. The database is already developed and being used, and it was over 1TB. The requirement was, adding a column that has Hash based on all existing columns. Since there were more than 300 tables, it was not practical to open the Design of the table and add the column. The only option I had was, form a dynamic query that alters the table by adding the column. I had to consider few things with the implementation;

All tables should be altered.
All columns should be used for generating the Hash.
Tables that have records must set Hash immediately.
Null must be replaced with blank because Hashbytes does not accept nulls.

Considering all, I wrote the following code for altering tables. You may use the same if you have the same requirement;

-- Getting table names into a table
-- A temporary table or table variable can be used for this
SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;

-- Adding a clustered index
-- This is not required if the nof tables is low
CREATE CLUSTERED INDEX IX_TableNames ON dbo.TableNames (Id);
GO

DECLARE @Id int = 1;
DECLARE @LastId int = 0;
DECLARE @TableName varchar(500)
DECLARE @TableId int
DECLARE @Sql varchar(max)
DECLARE @Columns varchar(8000)

SELECT @LastId = COUNT(*) FROM dbo.TableNames;

-- Iterate through all tables
WHILE (@Id <= @LastId)
BEGIN

 SELECT @TableName = TableName, @TableId = TableId FROM dbo.TableNames WHERE Id = @Id;
 
 SET @Sql = 'ALTER TABLE dbo.' + @TableName;
 SET @Sql += ' ADD ';
 SET @Sql += ' MIG_HashKey AS HASHBYTES(''MD5'', ';
 
 -- get all columns, convert them to varchar
 -- and replace null with blank value
 SELECT @Columns = STUFF((SELECT '+ IsNull(Convert(varchar(4000), ' + name + '),'''')' FROM sys.columns WHERE object_id = @TableId FOR XML PATH ('')), 1, 2, '');

 SET @Sql += @Columns;
 SET @Sql += ') ';

 -- Execute the code
 BEGIN TRY
  EXEC ( @sql);
 END TRY
 BEGIN CATCH

  PRINT ERROR_MESSAGE()
  PRINT @Sql;
 END CATCH

 SET @Sql = '';
 SET @Columns = ''
 SET @Id += 1; 
END
GO

Tuesday, March 7, 2017

How to get the total row count of all SQL Server tables

I had a requirement for getting the record count of all tables in one of client databases that had many tables with over 10 millions records. There are many ways of getting this, hence explored some to find the most efficient way. I analyzed many techniques using various approaches. Here are some of the ways I used and time it took for producing the result;

Using sys.partitions Catalog View - 1 second
Using SELECT COUNT(*) with sp_MSforeachtable - 10 minutes
Using sys.indexes and dm_db_partition_stats - 1 seconds

One thing we need to remember is, the database we have to work with can have tables with different structures. One can have a heap and another can have clustered structure. Not only that, if we use Dynamic Management Objects or objects that depend on Statistics, we may not get the accurate output. However, 1st option worked well for me, here is the code I wrote for getting result;

SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
--INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;

Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on Azure Feature Pack and they speak about how to creating a HDInsight Cluster and processing semi-structured data using Hive. You can read them at;

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part I

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part II

With the latest release of Azure Feature Pack, some of the settings have been changed and the creating is easier with new task than the previous one. This posts discusses the way of configuring Azure HDInsight Create Cluster Task.

In order to create a HDInsight Hadoop Cluster using SSIS, you need to make sure that following steps have been done;

Azure Feature Pack is installed.
Azure Storage is created and you have the name and key of it.
Azure Active Directory Application is created and linked with the subscription used.

Since the installation of Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage is also a very common task with almost all Azure related implementation, hence it is not going to be discussed too. Let's start with Active Directory Application creation.

Creating an Active Directory Application and link to the subscription

Here are the steps for creating it.

1. Login to the portal and open the Azure Active Directory blade.

2. Click on App Registration and click Add.

3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want. It can be changed later as your need, hence enter something and continue even though the domain is not registered with the Azure.

4. Once the application is created, you need to take the Application ID copied.

5. You need to create an Application Key and take a copy of too. Go to Keys under Settings and create one. Note the message appeared. It is better to get a copy of the key before closing the blade.

6. You need the Tenant Id for creating the cluster. Go to Properties of Active Directory Blade and get the Directory Id. It is the Tenant Id.

7. You need two more values to be copied and kept. Go to Subscription Blade and take a copy of your Subscription Id. Other one is resource group used. Take copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the application created with your subscription. Open your Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on Add button.

9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.

10. With the Add Users Blade, search for the Application Name and select it.

At the end of this steps, you have;

Application Id
Application Key
Tenant Id
Subscription Id
Resource Group Name

Let's start with SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task

Here are the steps doing it.

1. Create a SSIS project and a package. Drag and Drop Azure HDInsight Create Cluster Task. Open the properties of the task.

2. Open the properties of the task and select < New Connection...> of AzureResourceManagerConnection. Create a connection using copied items: Application Id, Application Key, and Tenant Id.

3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.

3. Next, fill the following;

Subscription Id - enter the copied one
ResourceGroup - enter the resource group used for the storage.
Location - Select the location used for creating the storage. Everything should be in the same locaiton.
ClusterName - enter the name of the cluster you need.
ClusterSize - enter the number of Worker Nodes you need in your cluster.
BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
Username and Password - enter the user name and password for the cluster.
SshUserName and Password - enter the user name and the password for accessing the cluster remotely.

Everything required is done. If you execute the package or task now, it will create the cluster for you.

Genrally it takes about 10-15 minutes. You can open the portal and verify once it is created.

Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the operation required is completed with it. You can use Azure HDInsight Delete Cluster Task for that. Here is the settings of it.

Sunday, March 5, 2017

Table Value Function or Stored Procedure - SQL Server Brain Basher of the Week #067

Let's talk about a very common scenario that you do with all solutions. If you need to implement a business logic using a database object such as View, Stored Procedure, or Function, how do you decide the best for your requirements.

Here is the Interview Question of the week. Should you use a Stored Procedure or Table Value Function (TVF) for the given scenario?

You have a table called Customer that has a column called CreditLimit. You must write module for accepting the CustomerId and Credit Limit, and updating the CreditLimit column with submitted Credit Limit. You need to make sure that exceptions are handled as well.

In order to answer this question, you should certain set of factors that can be used for deciding whether the logic must be implemented using a Stored Procedure or Function. Let's see each an every factor.

First thing you should know is, the output of the Stored Procedure cannot be easily used with another output generated by SELECT though INSERT EXEC is possible. However this can be easily done with TVFs. In addition to that, Stored Procedures allow us to return multiple results but function can return a single result.

Other main thing is modification. You cannot modify records inside Functions but it is possible within Stored Procedures. And you can form a statement dynamically and execute inside the Stored Procedure but it is not possible with Functions.

Exception handling is not possible with functions but details exception handling is possible with Stored Procedures.

Considering these factors, the answer is Stored Procedure because data modification and exception handling are not possible with Functions.

Saturday, March 4, 2017

Get all SQL Server tables that have IDENTITY enabled

Here is a useful script. If you need to find out tables that have Identity property enabled, you can simply query the sys.tables Catalog View combining with OBJECTPROPERTY function.

USE AdventureWorks2014;
GO

SELECT SCHEMA_NAME(schema_id) + '.' + name TableName 
FROM sys.tables
WHERE OBJECTPROPERTY(object_id, 'TableHasIdentity') = 1;

Remember, this OBJECTPROPERTY function can be used to check many properties related to SQL Server objects. See this MSDN page for more details on it;

https://msdn.microsoft.com/en-us/library/ms176105.aspx

Friday, March 3, 2017

SQL Server View does not show newly added columns

Few months back, this was experienced by one of my clients and I was asked the reason for this. Although I wanted to make a post on it, I could not and forgot but this popped up again while I was going through notes related to my class.

How can this be happened? Assume that you have created a view that references a table called Customer using SELECT * statement. When you access the view with SELECT statement, it returns all columns in Customer table.

USE tempdb;
GO

CREATE TABLE dbo.Customer
(
 CustomerId int identity(1,1) Primary key
 , Name varchar(200) not null
);
GO

CREATE OR ALTER VIEW dbo.vwCustomer
AS
SELECT * FROM dbo.Customer;
GO

SELECT * FROM dbo.vwCustomer;

Next, you add a new column to the table and you expect to see all columns including the newly added one when retrieve records using the view.

ALTER TABLE dbo.Customer
ADD CreditLimit decimal(16, 4) not null;
GO

SELECT * FROM dbo.vwCustomer;

As you see, result does not have the newly added column. What is the reason?

This is because of;

We have used SELECT * statement inside the view.
Metadata of the view has not be updated/refreshed.

If you avoid SELECT * statement when you create views, obviously you do not see this issue because view it is with a defined column set. If you have used SELECT *, then only option is either ALTER the view or refresh the view.

EXEC sp_refreshview 'dbo.vwCustomer';

Once the view is refreshed, new column will be appeared in the result.

Thursday, March 2, 2017

Speeding up INSERTs to Columnstore index by bypassing the Deltastore

Columnstore index is designed for retrieving large number of records much faster with aggregations and grouping. With SQL Server 2016, we can make the table as Clustered Columnstore table and have additional Rowstore indexes for handing SEEK types of requests. Generally we make our data warehousing tables as Clustered Columnstore table but it is not limited to data warehouses.

When inserting records to Clustered Columnstore table, there are few best practices for speeding up the process;

Use Bulk Operations for inserting instead of inserting row by row.
Load data parallel

There are multiple ways of inserting records as a bulk insert. Generally, when we have data to be inserted, in data warehousing, we load the data into staging and then load to main table. When loading from staging to main table, if we use INSERT INTO SELECT... then it becomes a bulk operation. What is the benefit? Benefit is, getting data directly into Row Group without sending them into Deltastore. However, this happens only when the batch size is 102,400 or above.

See below code. First code load data using a bulk operation but number of records is less than 102,400. See the result after that.

INSERT INTO dbo.Sales_2
 ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
SELECT TOP (5000) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009);

As you see, Deltastore is used for holding records. The next code inserts more than 102,400 records. See the result.

SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');

INSERT INTO dbo.Sales_2
 ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
SELECT TOP (102401) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009)
ORDER BY DateKey;

-- check all rowgroups
SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');

As you see, records go directly to Compressed Row Group which is the best. See what happen when you insert same number of records without using a bulk operation.

SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');

DECLARE cur CURSOR
FOR
SELECT TOP (102401) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009)
ORDER BY DateKey;

DECLARE @DateKey datetime ,@channelKey int ,@StoreKey int ,@ProductKey int ,@PromotionKey int ,@CurrencyKey int ,
  @UnitCost money ,@UnitPrice money ,@SalesQuantity int ,@ReturnQuantity int ,@ReturnAmount money ,@DiscountQuantity int ,
  @DiscountAmount money ,@TotalCost money ,@SalesAmount money ,@ETLLoadID int ,@LoadDate datetime ,@UpdateDate datetime 

OPEN cur
FETCH NEXT FROM cur INTO @DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_2
  ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
    ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
    ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
 values
  (@DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate)

 FETCH NEXT FROM cur INTO @DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate
END 

CLOSE cur
DEALLOCATE cur

-- check all rowgroups
SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');
GO

See the result. Row group is not compressed. This will be automatically compressed when the number of records exceed 1,048,576 but it is better to use a bulk operation for getting it compressed with initial loading.

Wednesday, March 1, 2017

SQL Server Licensing for High Availability

Licensing is always complex and sometime we cannot figure out the number of licenses we have to buy for the production. It is always better to contact someone from Microsoft for licensing because it is not always as we think, it gets changed with many options.

This post is not about how to decide the number of licenses you need for Microsoft SQL Server, it is about the licenses you have to buy when you configure High Availability.

There are two models in SQL Server Licensing; Server + CAL and Core based. If you the devices and users connect to SQL Server and it is not much, Server + CAL model works well with it. If the number of connections are unknown and expect many connections, then Core-based model is the best. However, you need to calculate the cost and compare before making the decision.

We use many techniques for implementing High Availability, we can use Log Shipping, Mirroring (both are deprecated), Clustering and AlwaysOn Availability Group. I was recently working on a similar implementation, had a question whether we need to purchase a license for the secondary server as it is not always active. Do we really need to purchase licenses for the secondary/stand-by server?

This was with Enterprise Edition, hence only model we can use with SQL Server 2016 is Core-based. While searching on this, I found a good resource that explains everything need, here are some I have taken from it.

The resource can be found at: http://download.microsoft.com/download/9/c/6/9c6eb70a-8d52-48f4-9f04-08970411b7a3/sql_server_2016_licensing_guide_en_us.pdf

First of all, let's see what is the core.

Your CPU can have multiple Processors and a processor can have multiple cores. The above image shows a CPU with 2 processors and processor is a Octo Core Processor that has 8 cores. If you have a CPU like above, you need to buy 16 Core licenses.

When you implement High Availability with Windows Clustering (whether it is Cloud or On-Premises, it is same), one node becomes Active and other node becomes Passive. Since the Passive Node will not be used unless there is a fail-over, it does not requires licenses for its cores.

However, if you use AlwaysOn Availability Group with Windows Clustering and second node will be used as a read-only instance for queries and reporting, then it needs licenses based on the number of cores.