Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on Azure Feature Pack and they speak about how to creating a HDInsight Cluster and processing semi-structured data using Hive. You can read them at;

With the latest release of Azure Feature Pack, some of the settings have been changed and the creating is easier with new task than the previous one. This posts discusses the way of configuring Azure HDInsight Create Cluster Task.

In order to create a HDInsight Hadoop Cluster using SSIS, you need to make sure that following steps have been done;
  1. Azure Feature Pack is installed.
  2. Azure Storage is created and you have the name and key of it.
  3. Azure Active Directory Application is created and linked with the subscription used.
Since the installation of Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage is also a very common task with almost all Azure related implementation, hence it is not going to be discussed too. Let's start with Active Directory Application creation.

Creating an Active Directory Application and link to the subscription
Here are the steps for creating it.
1. Login to the portal and open the Azure Active Directory blade.
2. Click on App Registration and click Add.


3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want. It can be changed later as your need, hence enter something and continue even though the domain is not registered with the Azure.


4. Once the application is created, you need to take the Application ID copied. 


5. You need to create an Application Key and take a copy of too. Go to Keys under Settings and create one. Note the message appeared. It is better to get a copy of the key before closing the blade.


6. You need the Tenant Id for creating the cluster. Go to Properties of Active Directory Blade and get the Directory Id. It is the Tenant Id.


7. You need two more values to be copied and kept. Go to Subscription Blade and take a copy of your Subscription Id. Other one is resource group used. Take copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the application created with your subscription. Open your Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on Add button.


9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.


10. With the Add Users Blade, search for the Application Name and select it.



At the end of this steps, you have;
  1. Application Id
  2. Application Key
  3. Tenant Id
  4. Subscription Id
  5. Resource Group Name
Let's start with SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task
Here are the steps doing it.

1. Create a SSIS project and a package. Drag and Drop Azure HDInsight Create Cluster Task. Open the properties of the task.


2. Open the properties of the task and select < New Connection...> of AzureResourceManagerConnection. Create a connection using copied items: Application Id, Application Key, and Tenant Id.


3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.


3. Next, fill the following;
  • Subscription Id - enter the copied one
  • ResourceGroup - enter the resource group used for the storage.
  • Location - Select the location used for creating the storage. Everything should be in the same locaiton.
  • ClusterName - enter the name of the cluster you need.
  • ClusterSize - enter the number of Worker Nodes you need in your cluster.
  • BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
  • Username and Password - enter the user name and password for the cluster.
  • SshUserName and Password - enter the user name and the password for accessing the cluster remotely.
Everything required is done. If you execute the package or task now, it will create the cluster for you.


Genrally it takes about 10-15 minutes. You can open the portal and verify once it is created.


Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the operation required is completed with it. You can use Azure HDInsight Delete Cluster Task for that. Here is the settings of it.



No comments: