
Saturday, April 2, 2022

Azure Synapse Analytics - Part III - Serverless SQL Pool



Azure Synapse Serverless SQL Pool facilitates data exploration, transformation and logical data warehousing, all using SQL. This video discusses how it works and what we can do with it, and includes demos on use cases. This is the content:

  • Introduction to Serverless SQL Pool
  • How useful it is
  • Is it the same as Dedicated SQL Pool?
  • Demo
    • Data exploration
    • Data transformation
    • Making a logical data warehouse and accessing it using Power BI

Read more at:
  • Serverless SQL Pool Architecture
  • Serverless SQL Pool - Microsoft Learn
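As a quick taste of the data-exploration part of the demo, a serverless SQL pool query like the following reads files straight from the data lake with OPENROWSET. This is only a sketch; the storage account URL and file path are illustrative, not the ones used in the video.

-- Exploring a Parquet folder in the data lake without loading it anywhere
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;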

Azure Synapse Analytics - Part II - Hyperspace

 *** This is the session I did for the Sri Lanka Data Community – Feb 2022 episode. This does not contain the entire event.



With modern data solutions, where we see data warehousing and big data together, analytics is not limited to columnar-based queries; highly selective queries are part of the trend now. Here is a way of optimizing such queries with Hyperspace, an indexing subsystem for Apache Spark. Read more at:
https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-performance-hyperspace?pivots=programming-language-csharp/?WT.mc_id=DP-MVP-33296

This talks about:
  • What Hyperspace is and how it helps us add indexes
  • How to see indexes and what changes it makes to the data lake
  • How to check whether indexes are used for queries
  • How to compare an indexed query and a non-indexed query with the given APIs
If you have any questions, please add them to the comment section.

Wednesday, April 8, 2020

Automating Resume and Pause of Azure Analysis Services - YouTube Video

This video discusses how to automate starting and stopping Azure Analysis Services using an Azure Automation Account. The following are demonstrated;
  • Create Azure Automation Account
  • Import libraries required
  • Create a PowerShell Runbook
  • Schedule the Runbook

Saturday, March 21, 2020

Power BI App for seeing Coronavirus - COVID-19 updates in Sri Lanka

Here is an Azure site built for showing updates related to Coronavirus - COVID-19, specifically for Sri Lanka.

This has been designed using Azure Data Factory, Azure SQL Database, Power BI and Azure App Services.

 Coronavirus COVID-19 - Sri Lanka - Update

Tuesday, April 2, 2019

Azure Databricks - What, Why & How - YouTube Video

Here is the first video on Azure Databricks. This talks about;

  • Why we need Azure Databricks for processing data
  • What is Azure Databricks
  • Demo: How to provision an Azure Databricks workspace, a cluster, a database and a table.



Sunday, June 24, 2018

Azure Data Factory V2 - Lookup and ForEach Activities

Here is my third video on Azure Data Factory. This video discusses and demonstrates;

  • Why we need iteration in ETLs
  • Lookup Activity
  • ForEach Activity
  • Demo: Iterating over a data collection using ForEach Activity, with data loaded from Lookup Activity

Thursday, February 1, 2018

Azure Data Factory V2 - Creating a Pipeline using Author-Tab

Here is another video on Azure Data Factory V2. It explains how to create a Pipeline manually, using the Author-Tab. This video discusses and demonstrates;
  • How to create Linked Services manually
  • How to create Datasets manually
  • How to create a Pipeline and add Activities
  • How to schedule the Pipeline






Thursday, January 25, 2018

Azure Data Factory V2 - Copying On-Premise SQL Server data to Azure Data Lake Store

Azure Data Factory has been enhanced significantly with V2, and its support for cloud ETL and ELT is excellent now. The first version was not very user-friendly, but with V2 it is not that difficult to implement ETL solutions.

This video discusses;
  • Why we need Azure Data Factory
  • What is Azure Data Factory
  • How it works and how it should be used
  • Integration Runtime - installing for connecting with sources
  • Copying On-Premise SQL Server Data to Azure Data Lake using Data Factory

Here is the video;


Here are some sample screens;





Monday, January 15, 2018

Introduction to Azure Data Lake Analytics - and basics of U-SQL

I made another video on Azure Data Lake, specifically on Azure Data Lake Analytics. This is a 40-minute video and it discusses the following items along with demonstrations;

  • What is Azure Data Lake Analytics
  • Data Lake Architecture and how it works
  • Comparison between Azure Data Lake Analytics, HDInsight and Hadoop for processing Big Data.
  • What U-SQL is and the basics of it.
  • Demo on How to create an Azure Data Lake Analytics account
  • Demo on How to execute a simple U-SQL using Azure Portal
  • Demo on how to extract multiple files, transform them using C# methods and a referenced assembly, and produce multiple results with somewhat complex transformations, using Visual Studio.
Here is the video.
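For those who want a feel for the language before watching, here is a minimal U-SQL script of the kind demonstrated. The file path, columns and filter are illustrative only; U-SQL expressions use C# syntax, which is why the comparison is written with ==.

// Read a TSV file from the Data Lake Store, filter it and write the result back out
@searchlog =
    EXTRACT UserId int,
            Region string,
            Query string
    FROM "/data/searchlog.tsv"
    USING Extractors.Tsv();

@result =
    SELECT Region, Query
    FROM @searchlog
    WHERE Region == "en-gb";

OUTPUT @result
TO "/output/searchlog-en-gb.tsv"
USING Outputters.Tsv();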



Thursday, January 4, 2018

Introduction to Azure Data Lake Store - Does it replace the Data Warehouse?

I have made a video on Azure Data Lake and Azure Data Lake Store and published it on YouTube. This video explains what Azure Data Lake is, its characteristics, its components, how it works, its usage, a comparison between Data Lake Store and Blob Storage, modern data warehousing, and how we can use Azure Data Lake with modern data warehousing.



Have a look at it and let me know your thoughts or any questions you have.

Monday, October 2, 2017

Introduction to Azure HDInsight

I have written a few posts on HDInsight and thought of making a series of videos on it as well. Here is the first one. In this video, I discuss what Hadoop is, what HDInsight is, the configuration of HDInsight, important settings related to HDInsight, and how to access the cluster using the Azure Portal and PuTTY.

Watch and see!


Tuesday, September 12, 2017

Run Hive Queries using Visual Studio

Once the HDInsight cluster is configured, we generally use either the portal dashboard (powered by Ambari) or a tool like PuTTY for executing queries against the loaded data. Although they are not exactly developer tools, or in other words, not an IDE, we had to use them because we did not have many options. However, now we can use the IDE we have been using for years for connecting with HDInsight and executing various types of queries such as Hive, Pig and U-SQL: Visual Studio.

Let's see how we can use Visual Studio for accessing HDInsight.

Making Visual Studio ready for HDInsight

In order to work with HDInsight using Visual Studio, you need to install a few tools on Visual Studio. Here are the supported versions;
  • Visual Studio 2013 Community/Professional/Premium/Ultimate with Update 4
  • Visual Studio 2015 any edition
  • Visual Studio 2017 any edition
You need to make sure that you have installed the Azure SDK on your Visual Studio. Click here to download the Web Platform Installer and make sure the following are installed;


This installs Microsoft Azure Data Lake Tools for Visual Studio as well; make sure it is installed.


Now your Visual Studio is ready for accessing HDInsight. 

Connecting with HDInsight

The good thing is, you can connect with your cluster even without creating a project. However, once the SDK is installed, you can see new templates called Azure Data Lake - HIVE (HDInsight), Pig (HDInsight), Storm (HDInsight) and USQL (ADLA), and the HIVE template can be used for creating a project.


The project creates one .hql file for you and you can use it for executing your Hive queries. In addition to that, you can open Server Explorer (View Menu -> Server Explorer) and expand Azure (or connect to your Azure account and then expand it) to see all components related to Azure.


As you can see, it shows all databases, internal and external tables, views and columns. Not only that, by right-clicking the cluster, you can open a window for writing a query or viewing jobs. Here is the screen when I use the first option, Write a Hive Query.


Did you notice IntelliSense? Yes, it is supported with almost all metadata, hence it is really easy to write a query.

Executing Queries

If you need to see records in tables without limiting data with predicates or constructing the query with additional functions, you can simply right-click on the table in Server Explorer and select View top 100 Rows.
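Behind the scenes, that menu option is equivalent to running a very simple Hive query such as the one below (the table name is illustrative).

SELECT * FROM cleanlog LIMIT 100;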


If you need to construct a query, then use the above method for opening a window and write the query. There are two ways of executing the code: Batch and Interactive. Batch mode does not give you the result immediately, but you will be able to see or download the result once the submitted job is completed. If you use Interactive mode, then it works similarly to a result in SSMS.


If you use the Batch mode, you can see the way the job is executed. Once the job is completed, you can click on Job Output to see or download the output.


As you can see, there is no graphical interface showing the job execution. Visual Studio shows the job execution using a graphical interface only when the job is executed by the Tez engine. Remember, HDInsight uses the Tez engine to execute Hive queries by default, but very simple queries may be executed without launching a Tez job, hence no graphical view for them.

See this query that has some computation;
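For reference, a comparable HiveQL query with a computation, the kind that does launch a Tez job and therefore gets the graphical view, might look like this; the table and column names are illustrative.

-- Aggregation over a Hive table; a query like this runs as a Tez job
SELECT severity, COUNT(*) AS events
FROM cleanlog
GROUP BY severity;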


Can we create table with this IDE?
Yes, it is possible. You can right-click on your database in Server Explorer under Azure and select the Create Table menu item.
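The dialog ultimately issues a HiveQL CREATE TABLE statement along these lines; the table name, columns and delimiter are illustrative.

CREATE TABLE logdetails
(
    id INT,
    severity STRING,
    message STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';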


Let's talk more about this in later posts.

Sunday, September 10, 2017

How to refer files in HDInsight - Azure Storage using different ways

If you have started working with Big Data, you surely need to check Microsoft's support for it on the Azure platform: the HDInsight service. HDInsight allows you to create a Hadoop environment within a few minutes, and it can be scaled out or in at any time based on your requirements. I have written a few posts on this; you can have a look at them using the following links;


In order to work with data loaded to HDInsight, or Hadoop, data files have to be referred to using a supported syntax. There are multiple ways of referring to files in the storage with HDFS. Here are the ways;

Fully qualified path with wasb(s) protocol

This is the most accurate and correct way of referring to files in the storage. Here is the pattern;

wasb://container_name@storage_account_name.blob.core.windows.net/file_path

Here is an example using PuTTY, connecting with HDInsight and reading an existing file (processed with Hive). My container name is dinesqlashdinsight and my storage account name is dinesqlasstorage. The file path is data/cleanlog/000000_0 (this is in fact a Hive table).


Connecting with the default container

If your files are in the default container, you can skip the container name and storage account name as follows;

wasb:///file_path

Note the three slashes. They are required when you do not mention the container name.


Connecting using Hadoop/Linux/Unix native ways

Generally, when you work with Hadoop using Linux/Unix, you refer to files without the protocol. Azure HDInsight supports the same, and we can refer to files using that syntax.

file_path
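These same path forms appear anywhere an HDFS path is accepted. For example, a Hive external table can point at the fully qualified location; the example below reuses the container and storage account names mentioned above, and the column list is illustrative.

-- Hive external table referencing data through the wasb protocol
CREATE EXTERNAL TABLE cleanlog_ext
(
    logdate STRING,
    severity STRING,
    message STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'wasb://dinesqlashdinsight@dinesqlasstorage.blob.core.windows.net/data/cleanlog';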


Do I need double quotes for my paths?

They are required when you have odd characters, like the equals (=) sign, in your path. See the example below. I try to read a data file that exists in the cluster; the path has equals signs, hence the path is enclosed in double quotes.



Wednesday, September 6, 2017

How to Connect HDInsight using Putty

I have written many articles on the HDInsight service, which is an Azure service that allows us to provision an Apache Hadoop cluster and use it for various types of big data related implementations. If you need to see all my articles written on HDInsight, please use the link: http://dinesql.blogspot.com/search?q=hdinsight.

Although I have mainly used the Azure portal (Ambari dashboard) to demonstrate HDInsight, many use a tool called PuTTY, which is a free implementation of SSH and Telnet for Windows and Unix platforms. It is a good tool to work with, specifically if you need to connect to the cluster remotely and work with it for navigating nodes or doing data work against the cluster with sub-projects like Hive. Remember, if you have configured your HDInsight cluster as a Linux-based cluster, you need an SSH client to connect with it remotely. Let's see the way of using it.

First of all, you need to download PuTTY, as Windows machines do not have it as a built-in tool (Linux and Mac OS computers have an SSH client built in). Here is the link for downloading PuTTY.


There is no installation with it; you can just use putty.exe. However, you need a host name to connect. If you go to the portal and open your cluster blade, you can get the host name for connecting with an SSH client by clicking the Secure Shell (SSH) button.


Note that there are two: one for Windows and another for Linux, Unix and OS X users; make sure that you copy the Windows one.


When you open PuTTY, you should see an input box called Host Name for entering the copied host name. Enter the copied value and make sure that the Connection Type is selected as SSH.

All you have to do now is click Open. This opens the standard window with the standard font and font size. If you need a larger font, you can go to Window - Appearance and change the settings as you want.

When you open this for the first time, you will get a security warning as below;


This says that we do not have the key related to the host cached locally, hence PuTTY cannot be sure whether we are connecting to the right one. Since we know that we are connecting to the right host, we can simply click Yes and continue.

Once connected, you will be prompted for a user id and password. Note that you need to use the SSH user and the password you configured when the cluster was created. It does not accept the HTTP user.


Enter the SSH user and the password, and you will be connected to the default node.


Everything is ready now. As you can see, the first part of the last line is the connected user, and hn0 is one of the head nodes. The last part of it shows the cluster name, but we see it only partially as it is lengthy. Now I can navigate, make folders, move files or connect with Hive without any issue.





Monday, August 14, 2017

Azure SQL Data Warehouse - Part II - Analyzing Created Databases

I started writing on this a long time back but had no way of completing it. My first post on this was Azure SQL Data Warehouse - Part I - Creating and Connecting, and here is the second part of it.

The first post discussed Azure SQL Data Warehouse and how to create a server and database using the portal. As the next step, let's talk a bit about its architecture and then see how Azure SQL Data Warehouse maintains and processes data.

Control Node

Azure SQL Data Warehouse is a distributed database, meaning that data is distributed across multiple locations. However, once the data warehouse is created, we connect to one component called the Control Node. It is not exactly a SQL Server database, but when connecting to it, it looks and feels like connecting to a SQL Server database. The Control Node handles all communication and coordinates computation. When we make a request to the data warehouse, the Control Node accepts it, determines how the work should be distributed based on a divide-and-conquer approach, gets it processed and finally sends the result to us.

Compute Node

The Control Node gets the data processed in parallel using Compute Nodes. They are SQL Server databases and store all our records. Based on the number of DWUs configured, the data warehouse is set up with one or more Compute Nodes.

Distribution

Data related to the data warehouse is stored in Azure Blob Storage and distributed across multiple locations. Storage is independent from the Compute Nodes, hence they can be operated and adjusted independently. These locations are called Distributions. The number of distributions for an Azure SQL Data Warehouse is fixed at sixty (60). These distributions are assigned dynamically to Compute Nodes, and when a query is executed, each distribution processes the data related to it. This is how parallel execution happens.

If you need more compute power, you can increase the number of Compute Nodes by increasing DWUs. When the number of Compute Nodes changes, the number of distributions per Compute Node changes as well.

Architecture

This image shows the architecture when you create an Azure SQL Data Warehouse with 100 DWU.


This image shows the architecture when you create an Azure SQL Data Warehouse with 400 DWU.


Let's create two databases and clarify this configuration.

I have discussed all steps related to server creation and database creation in my first post, Azure SQL Data Warehouse - Part I - Creating and Connecting, hence I am not going to repeat them here. Note the image below; that is how I have created two databases for checking the configurations.


As you can see, the first data warehouse is created using 100 DWUs and the second one with 400 DWUs. Let's see how nodes have been created for these two databases. We can simply use the sys.dm_pdw_nodes DMV for getting this information.

SELECT * FROM sys.dm_pdw_nodes;

SELECT type, Count(*) NofNodes
FROM sys.dm_pdw_nodes
GROUP BY type;

Result with the first data warehouse that is created using 100 DWUs.


Result with the second data warehouse that is created using 400 DWUs.


Note the second data warehouse. Since we used more DWUs, it has been created with four Compute Nodes, which gives better performance than the first one. Since this is a sample database and it has tables, we can check one of the tables and see how data is spread across distributions.

The following code shows the distributions created for one data warehouse. As mentioned above, it is always 60.
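A query along these lines, using the sys.pdw_distributions catalog view, lists them and confirms the count of sixty; this is a sketch of the kind of query used, not a copy of it.

-- Listing distributions; the count is always 60 regardless of DWUs
SELECT COUNT(*) AS NofDistributions FROM sys.pdw_distributions;

SELECT * FROM sys.pdw_distributions;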


Here is the code for seeing how the rows of a table are distributed across distributions in the second data warehouse. Note how the distributions are assigned to Compute Nodes.
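One way of getting this information is DBCC PDW_SHOWSPACEUSED, which reports the row count of a table per distribution together with the node each distribution is assigned to; the table name below is from the sample database and is illustrative.

-- Row count and space usage per distribution, including the PDW node id
DBCC PDW_SHOWSPACEUSED("dbo.FactInternetSales");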


Records are distributed based on the design of the table. Azure SQL Data Warehouse uses two types of distribution: Round Robin and Hash. Let's talk about that in the next post.

Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on the Azure Feature Pack, and they explain how to create an HDInsight cluster and process semi-structured data using Hive. You can read them at;

With the latest release of the Azure Feature Pack, some of the settings have changed, and creating a cluster is easier with the new task than with the previous one. This post discusses the way of configuring the Azure HDInsight Create Cluster Task.

In order to create a HDInsight Hadoop Cluster using SSIS, you need to make sure that following steps have been done;
  1. Azure Feature Pack is installed.
  2. Azure Storage is created and you have the name and key of it.
  3. Azure Active Directory Application is created and linked with the subscription used.
Since the installation of the Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a storage account is also a very common task in almost all Azure related implementations, hence it is not going to be discussed either. Let's start with the Active Directory Application creation.

Creating an Active Directory Application and link to the subscription
Here are the steps for creating it.
1. Login to the portal and open the Azure Active Directory blade.
2. Click on App Registration and click Add.


3. In the Create blade, enter the Name of the app and select the Application Type as Web app / API. Enter any Sign-on URL you want; it can be changed later as needed, so enter something and continue even if the domain is not registered with Azure.


4. Once the application is created, you need to take a copy of the Application ID.


5. You need to create an Application Key and take a copy of it too. Go to Keys under Settings and create one. Note the message that appears; it is better to get a copy of the key before closing the blade.


6. You need the Tenant Id for creating the cluster. Go to Properties of the Active Directory blade and get the Directory Id; it is the Tenant Id.


7. You need two more values to be copied and kept. Go to the Subscriptions blade and take a copy of your Subscription Id. The other one is the resource group used; take a copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning, or linking, the created application to your subscription. Open the Subscriptions blade and select the subscription you need to use. Then select Access Control (IAM) and click on the Add button.


9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.


10. With the Add Users Blade, search for the Application Name and select it.



At the end of these steps, you have;
  1. Application Id
  2. Application Key
  3. Tenant Id
  4. Subscription Id
  5. Resource Group Name
Let's start with the SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task
Here are the steps for doing it.

1. Create an SSIS project and a package, and drag and drop the Azure HDInsight Create Cluster Task.


2. Open the properties of the task and select <New Connection...> for AzureResourceManagerConnection. Create a connection using the copied items: Application Id, Application Key, and Tenant Id.


3. Select <New Connection...> for AzureStorageConnection. Enter the storage account name and account key that will be used by the cluster.


4. Next, fill in the following;
  • Subscription Id - enter the copied one
  • ResourceGroup - enter the resource group used for the storage.
  • Location - select the location used for creating the storage. Everything should be in the same location.
  • ClusterName - enter the name of the cluster you need.
  • ClusterSize - enter the number of Worker Nodes you need in your cluster.
  • BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
  • Username and Password - enter the user name and password for the cluster.
  • SshUserName and Password - enter the user name and the password for accessing the cluster remotely.
Everything required is done. If you execute the package or task now, it will create the cluster for you.


Generally it takes about 10-15 minutes. You can open the portal and verify once it is created.


Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the required operation is completed. You can use the Azure HDInsight Delete Cluster Task for that. Here are its settings.



Sunday, February 26, 2017

Azure SQL Database or SQL Server on Azure Virtual Machine - SQL Server Brain Basher of the Week #066

Here is a modern interview question on SQL Server. In addition to general questions on administration and maintenance, Azure-based questions have become common now because many maintain databases either in a VM hosted in the cloud or as a cloud-managed database. Therefore, let's talk about an interview question related to Azure.

Here is the question: What are the factors you can use for determining whether the database should be set up on a Virtual Machine hosted on Azure or set up as an Azure SQL Database?



There is a very common answer to this question. If we want to get the entire database managed by Azure, covering areas such as disaster recovery, backup and performance tuning, then Azure SQL Database is the best choice. If we need to manage the instance ourselves, then we need to go ahead with SQL Server on an Azure Virtual Machine.

There is nothing wrong with that answer, but it is always better to know a few more factors that can be used for making the decision. Let's talk about some important ones.

Azure SQL Database is a relational Database-as-a-Service (DBaaS) (falling under the industry categories of Software-as-a-Service - SaaS and Platform-as-a-Service - PaaS), while SQL Server on Azure Virtual Machine is Infrastructure-as-a-Service - IaaS. Azure SQL Database is completely managed by Microsoft; you do not need to worry about the operating system, configuring hardware, service packs or patches. But if it is SQL Server on Azure Virtual Machine, then we need to manage everything ourselves. That option is good if you have a team or a dedicated DBA for managing the database and you need to be its administrator, not Microsoft.

An Azure SQL Database server is a logical server, though it can hold multiple databases. It is not considered an instance like the one you configure with SQL Server on Azure Virtual Machine. You know that you can have multiple unrestricted administrators in an instance; this is possible with SQL Server on Azure Virtual Machine but not with Azure SQL Database.

Another key thing you need to remember is that Microsoft makes most new features available initially with Azure SQL Database before making them available with other versions. That is why you see features such as Threat Detection and Temporal Tables with retention policies only available with Azure SQL Database. In addition to that, it has an intelligent database service that recommends possible optimizations for performance tuning.

Do not forget, Azure SQL Database is limited to 1 TB, whereas an instance of SQL Server on Azure Virtual Machine can consume 64 TB with multiple databases.

We do not need to worry about high availability with DBaaS as it is provided with the service. IaaS needs a manual setup with multiple VMs using clustering plus Availability Groups, as the built-in high availability covers only the VM, not the SQL Server instance.

Azure SQL Database has no hardware or administrative cost, whereas SQL Server on Azure Virtual Machine has an administrative cost. When it comes to license cost, Azure SQL Database is sold as a service based on multiple tiers, and SQL Server on Azure Virtual Machine comes with a license, though you can use your own license if you need to.

If you need to use other components such as Integration Services, Analysis Services or Reporting Services, then SQL Server on Azure Virtual Machine is the only option, though some, such as Analysis Services, are available as managed services.

There are a few more minor reasons for picking one over the other, but as an interviewee, you should know the ones mentioned above.


Thursday, February 16, 2017

Creating multiple administrative accounts in Azure SQL Database Server

You know that we can create multiple administrator accounts in SQL Server if it is an on-premises instance or an instance configured in a VM (hosted in the cloud or on-premises). What we generally do is create a login and add the login to the sysadmin fixed server role. Can we do the same in an Azure SQL Database server?

There is only one unrestricted administrative account that can be created in an Azure SQL Database server, and it is created when creating the server. In addition to that, you can add either one Azure Active Directory account or a security group account (which has multiple accounts as members). If you open the Properties of your server, you can see your administrative accounts;


However, if you need to add multiple administrative accounts (not unrestricted administrative accounts), there is a way of adding them. There are two server-level administrative roles available in the master database to which user accounts can be added for granting permissions to create and manage databases and logins.

The first role is dbmanager. This role has permission to create new databases, hence members of this role can create databases. This role exists in the master database; therefore, only users in the master database can become members of this role.

-- In master database
-- creating a user
CREATE USER Dinesh2 WITH PASSWORD = 'Pa$$w0rd';

-- adding the user to dbmanager role
ALTER ROLE dbmanager ADD MEMBER Dinesh2;

The other role is loginmanager. This role has permission to create logins in the master database. Again, this role exists in the master database, hence only users in the master database can become members of it.

-- In master database
-- creating a user
CREATE USER Dinesh3 WITH PASSWORD = 'Pa$$w0rd';

-- adding the user to loginmanager role
ALTER ROLE loginmanager ADD MEMBER Dinesh3;

Wednesday, February 15, 2017

Should I create Azure SQL Logins or Azure SQL Users?

Unless it is a contained database, we must create a login for connecting with SQL Server in order to access a database. Databases are accessed using users, and users are created using logins. Is it the same with Azure SQL Databases? Should I create a login before creating a user?


With Azure SQL Database, it is not a must, as an Azure SQL database is a contained database (portable). We can create a user account with a password for accessing the database without creating a login. However, it also allows us to create a login in the master database and then create a user in our database using that login.
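As a rough illustration of the two options (the names and passwords are samples only):

-- Option 1: contained user, created directly in the user database, no login needed
CREATE USER ReportUser WITH PASSWORD = 'Pa$$w0rd';

-- Option 2: create a login in the master database first...
CREATE LOGIN ReportLogin WITH PASSWORD = 'Pa$$w0rd';

-- ...and then, in the user database, create a user mapped to that login
CREATE USER ReportUser2 FOR LOGIN ReportLogin;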

What if I created a user in my database using a login?
The main advantage is maintenance. If the login needs to be dropped or disabled, it can simply be done in the master database. Not only that, since user accounts can be created in multiple databases using the same login, one login can be used for connecting to multiple databases.
The biggest disadvantage is that the database becomes non-portable. If the database needs to be copied, moved or replicated, the login must be created on the new servers.

What if I created a user in my database without creating a login?
The main benefit we get is a portable database. Since all users with passwords are maintained inside the database, it can be copied, moved or replicated without any issue.
However, if the environment has multiple databases and users need to access more than one database, then multiple user accounts have to be created.

Monday, February 13, 2017

The Integration between Azure Data Lake Store and Azure SQL Data Warehouse

As a part of my presentation, I had to show the integration between Azure Data Lake Store (ADL) and Azure SQL Data Warehouse (Azure DW), hence I created the image below;


Generally, ADL is used for storing unlimited data in any format (structured and unstructured) and processing it using various tools, with Massively Parallel Processing (MPP). We use Azure DW for storing a large volume of structured data, again using MPP. In most cases, Azure SQL DW is the source for Business Intelligence. Earlier, Azure DW could not ingest data from ADL, but now it is possible with PolyBase; hence, the best practice is to load data into ADL, process it, and then transfer it to Azure SQL DW for reporting.
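As a rough sketch of that last step, loading from ADL into Azure SQL DW with PolyBase typically means creating a credential, an external data source, a file format and an external table, and then running a CTAS. Every name, the service principal details and the folder path below are placeholders, and a database master key is assumed to exist already.

-- Credential for reaching the Data Lake Store (service principal details are placeholders)
CREATE DATABASE SCOPED CREDENTIAL ADLCredential
WITH IDENTITY = '<application-id>@https://login.microsoftonline.com/<tenant-id>/oauth2/token',
     SECRET = '<application-key>';

-- External data source pointing at the Data Lake Store account
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (TYPE = HADOOP,
      LOCATION = 'adl://mydatalakestore.azuredatalakestore.net',
      CREDENTIAL = ADLCredential);

-- File format of the processed files
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- External table over the processed data sitting in ADL
CREATE EXTERNAL TABLE dbo.SalesCleanedExternal
(
    SaleId INT,
    SaleDate DATE,
    Amount DECIMAL(18, 2)
)
WITH (LOCATION = '/cleaned/sales/',
      DATA_SOURCE = AzureDataLakeStore,
      FILE_FORMAT = TextFileFormat);

-- Load the data into an internal table for reporting
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.SalesCleanedExternal;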