Tuesday, December 6, 2016

Azure SQL Data Warehouse - Part I - Creating and Connecting

We have been implementing on-premises data warehouses for years, and everyone is well aware of the difficulty faced at the first phase of such a project: requesting hardware. In most cases, the project is stopped or postponed by management or the client for lack of budget. But now we have a solution that addresses this initial issue, and it brings many performance benefits as well: Azure SQL Data Warehouse.

Azure SQL Data Warehouse is a cloud-based data warehouse that can be easily scaled with cloud-based resources and paid for as it grows. It is a fully managed service that covers maintenance, software patches and backups. It can store and process both relational and non-relational data, and many Azure features such as Azure Blob Storage and Azure Data Factory can be used with it.

Let's talk about some of the benefits in detail.

Scalability and Cost
This is one of the major benefits. You can scale Azure SQL Data Warehouse up or down, increasing resources when required, and cost is calculated based on what you use. Since there is no upfront cost involved, you can simply start with a small amount of resources and adjust whenever you want. You are charged for two types of resource usage: Storage and Compute.

The storage configured for Azure SQL Data Warehouse automatically re-scales as you add and remove data, hence you pay only for what you use.

The value set for Compute is the measure of execution performance in Azure SQL Data Warehouse. It is measured in Data Warehouse Units (DWU), and you can start anywhere from 100 DWU (about USD 2 per hour) up to 6,000 DWU (about USD 123 per hour) as per your requirements. This value can be changed at any time, and the cost of usage is calculated based on the value set.
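
If you prefer T-SQL to the portal for scaling, the service objective can be changed with a statement like the one below; a sketch, assuming the data warehouse is the AdventureWorksDW database we create later in this post.

-- Run against the master database of the logical server
ALTER DATABASE AdventureWorksDW
MODIFY (SERVICE_OBJECTIVE = 'DW200');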


Performance
Since this can be scaled out as you need, data and queries can be distributed, increasing performance. It uses a Massively Parallel Processing (MPP) architecture, so a workload can be spread across many nodes using the statistics the engine maintains, increasing performance significantly.

It also uses columnstore indexing, which can be up to 10 times faster than rowstore indexing for analytical queries. Since columnstore supports roughly 5 times more compression than standard rowstore indexes, reducing I/O, it improves performance further.
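
As a quick illustration, here is a minimal table sketch (the table and its columns are made up for the example). In Azure SQL Data Warehouse you choose how rows are distributed across the compute nodes, and the rows can be stored in a clustered columnstore index.

-- Distribute rows across compute nodes by hashing CustomerKey,
-- storing them in a clustered columnstore index
CREATE TABLE dbo.FactSales
(
    SalesKey INT NOT NULL,
    CustomerKey INT NOT NULL,
    OrderDateKey INT NOT NULL,
    SalesAmount MONEY NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);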

Hybrid Cloud
It is a hybrid cloud environment. It supports seamless integration with on-premises data, cloud data and, of course, unstructured data. With PolyBase, different types of sources can be queried using T-SQL as if the data were stored locally, making sure that everything required is visible through a single database.
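
For example, a PolyBase setup against Azure Blob Storage looks roughly like the sketch below. This is an illustration only; the storage account, container, credential and table definition are all made-up placeholders.

-- One-time setup: a credential holding the storage account access key
CREATE MASTER KEY;
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- Point at the blob container and describe the file layout
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (TYPE = HADOOP,
    LOCATION = 'wasbs://datacontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential);

CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- The external table can then be queried with plain T-SQL
CREATE EXTERNAL TABLE dbo.ExternalSales
(SaleId INT, SaleAmount MONEY)
WITH (LOCATION = '/sales/',
    DATA_SOURCE = AzureBlobStore,
    FILE_FORMAT = CsvFileFormat);

SELECT TOP 10 SaleId, SaleAmount FROM dbo.ExternalSales;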

Not only that, copying data from other sources can be easily done via Azure Data Factory.

Okay, now let's see how we can create one and start using it. First of all, you need an Azure account. Once the account is created, you need to create a SQL Server, which is a logical server for holding your data warehouse. If you have already created a SQL Server, it can be used; otherwise follow the steps below for creating a logical server.

1. Log in to the Azure Portal and open the SQL Servers blade.
2. Set the Server name, Server admin login, Password, Resource Group and Location. Note that you can add database-specific logins later.


3. Click the Create button to get the server created. If you want to see the server on the Azure Dashboard, make sure Pin to dashboard is checked.

Now the server is ready. Let's create a data warehouse.

1. Open the SQL Data Warehouses blade.
2. Enter the Database Name. For this test we will be using the AdventureWorks sample (we will create different databases later), hence set the name as AdventureWorksDW.
3. Select the appropriate Resource Group. Make sure you select the same one selected for the server.
4. Select Sample for Select source.
5. Select AdventureWorksDW as the sample database.
6. Select the server created with the above steps and enter the admin account and its password.
7. Select the DWU you need. Let's initially set this to 200.


8. Select Pin to dashboard and click Create to get the data warehouse created.
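
As a side note, the same data warehouse can also be created with T-SQL instead of the portal; a sketch, assuming you are connected to the master database of the logical server (the sample data itself still comes from the portal's Select source option).

-- Creates a new SQL Data Warehouse database at 200 DWU
CREATE DATABASE AdventureWorksDW
(EDITION = 'DataWarehouse', SERVICE_OBJECTIVE = 'DW200');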

The database is created. Remember, you cannot connect to the server until your IP address is permitted to connect. Initially you can add your IP address to the server's firewall using the portal. Later, you can connect to the server using Management Studio and configure IP addresses for others using T-SQL, either at server level or database level. Here are the steps.

1. Open the portal and open the SQL Servers blade.
2. Click on the server created (or the one used for adding the data warehouse) to get its properties.
3. You can add your IP address by clicking the Add client IP button and then the Save button. Alternatively, you can set it manually using the given grid.


4. Add it and save.

Now the firewall rule is configured for you. Let's connect and see.

1. Open Management Studio and connect with the server admin account as below. The server name is in the form <server name>.database.windows.net.


2. Once connected, open a new Query Window and use the code below to see the firewall rules that have been set.
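
The query is along the lines below: sys.firewall_rules lists server-level rules (query it in the master database) and sys.database_firewall_rules lists database-level rules (query it in the user database).

-- Server level rules (run in the master database)
SELECT * FROM sys.firewall_rules;

-- Database level rules (run in the user database)
SELECT * FROM sys.database_firewall_rules;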


3. If you need to add more rules for others, use sp_set_firewall_rule for server-level rules and sp_set_database_firewall_rule for database-level rules.

-- Adding server level rules (run in the master database)
EXECUTE sp_set_firewall_rule @name = N'dinesqlserverAdminRule',
     @start_ip_address = '192.168.10.100', @end_ip_address = '192.168.10.100'

-- Use this for removing it
-- EXECUTE sp_delete_firewall_rule @name = N'dinesqlserverAdminRule'



-- Adding database level rules
-- *** Connect to the user database and execute
EXEC sp_set_database_firewall_rule @name = N'dinesqlserverDevelopersRule', 
     @start_ip_address = '192.168.10.220', @end_ip_address = '192.168.10.225'

EXEC sp_delete_database_firewall_rule @name = N'dinesqlserverDevelopersRule'

4. Now query your database. You should be able to query your tables without any issue.
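
For instance, a simple query like the one below should now return rows; DimCustomer is one of the dimension tables in the AdventureWorksDW sample.

-- A quick smoke test against the sample data warehouse
SELECT TOP 10 CustomerKey, FirstName, LastName
FROM dbo.DimCustomer;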


In the next post, let's see how we can use other tools like Visual Studio and Power BI to connect to the data warehouse we created.



3 comments:

  1. Very informative. In this approach, where do the ETLs reside?

  2. Thanks Nisal. We can use either SSIS or Azure Data Factory; I will be covering those in the next parts of this series.
