Dinesh's Blog :::: Being Compiled ::::: SSIS

Showing posts with label SSIS. Show all posts

Saturday, October 21, 2017

Microsoft Access Database Engine 2016 Redistributable Setup - You cannot install the 32-bit version, You cannot install the 64-bit version

In order to access Excel workbook using Integration Services, one component we need to install is Microsoft Access Database Engine 2016 Redistributable. If 32-bit version of Office is installed, then 32-bit version of Microsoft Access Database Engine 2016 Redistributable should be installed. If 64-bit version of Office is installed, then 64-bit version of it has to be installed.

If it is not installed, this is what we see when try to access an Excel workbook

The requested OLE DB provider Microsoft.ACE.OLEDB.12.0 is not registered. If the 32-bit drivers is no.....

Generally, once it is installed, we can connect to Excel and work with it without any issue. However, I started experiencing below error when I try to install 32-bit version of Microsoft Access Database Engine 2016 Redistributable.

You cannot install the 32-bit version of Microsoft Access Database Engine 2016 because you currently have 64-bit Office products installed. If you want to.....

I was confused with this message because I have installed 32-bit version of Office. However, just for checking, I tried to install 64-bit version of Microsoft Access Database Engine 2016 Redistributable and this is what I got when I tried........

You cannot install the 64-bit version of Microsoft Access Database Engine 2016 because you currently have 32-bit Office products installed. If you want to....

Not sure what is the reason but it stopped me continuing, leaving me only option as uninstalling Office and installing again. Since it is not the best way, just thought to try out older version of Microsoft Access Database Engine Redistributable, hence downloaded 32-bit version of Microsoft Access Database Engine 2010 Redistributable from https://www.microsoft.com/en-us/download/details.aspx?id=13255. Good news is, I was able to install it and the best news is, SSIS could connect to Excel without any issue. This may not be the best solution for above error but for an urgent situation, you can try this and see.

Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on Azure Feature Pack and they speak about how to creating a HDInsight Cluster and processing semi-structured data using Hive. You can read them at;

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part I

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part II

With the latest release of Azure Feature Pack, some of the settings have been changed and the creating is easier with new task than the previous one. This posts discusses the way of configuring Azure HDInsight Create Cluster Task.

In order to create a HDInsight Hadoop Cluster using SSIS, you need to make sure that following steps have been done;

Azure Feature Pack is installed.
Azure Storage is created and you have the name and key of it.
Azure Active Directory Application is created and linked with the subscription used.

Since the installation of Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage is also a very common task with almost all Azure related implementation, hence it is not going to be discussed too. Let's start with Active Directory Application creation.

Creating an Active Directory Application and link to the subscription

Here are the steps for creating it.

1. Login to the portal and open the Azure Active Directory blade.

2. Click on App Registration and click Add.

3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want. It can be changed later as your need, hence enter something and continue even though the domain is not registered with the Azure.

4. Once the application is created, you need to take the Application ID copied.

5. You need to create an Application Key and take a copy of too. Go to Keys under Settings and create one. Note the message appeared. It is better to get a copy of the key before closing the blade.

6. You need the Tenant Id for creating the cluster. Go to Properties of Active Directory Blade and get the Directory Id. It is the Tenant Id.

7. You need two more values to be copied and kept. Go to Subscription Blade and take a copy of your Subscription Id. Other one is resource group used. Take copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the application created with your subscription. Open your Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on Add button.

9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.

10. With the Add Users Blade, search for the Application Name and select it.

At the end of this steps, you have;

Application Id
Application Key
Tenant Id
Subscription Id
Resource Group Name

Let's start with SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task

Here are the steps doing it.

1. Create a SSIS project and a package. Drag and Drop Azure HDInsight Create Cluster Task. Open the properties of the task.

2. Open the properties of the task and select < New Connection...> of AzureResourceManagerConnection. Create a connection using copied items: Application Id, Application Key, and Tenant Id.

3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.

3. Next, fill the following;

Subscription Id - enter the copied one
ResourceGroup - enter the resource group used for the storage.
Location - Select the location used for creating the storage. Everything should be in the same locaiton.
ClusterName - enter the name of the cluster you need.
ClusterSize - enter the number of Worker Nodes you need in your cluster.
BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
Username and Password - enter the user name and password for the cluster.
SshUserName and Password - enter the user name and the password for accessing the cluster remotely.

Everything required is done. If you execute the package or task now, it will create the cluster for you.

Genrally it takes about 10-15 minutes. You can open the portal and verify once it is created.

Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the operation required is completed with it. You can use Azure HDInsight Delete Cluster Task for that. Here is the settings of it.

Saturday, February 11, 2017

Azure Feature Pack for Integration Services (SSIS) - New Version Available

Microsoft has made the latest version of Azure Feature Pack for Integration Services (SSIS) and it is available for downloading. This was released on 28-Dec-2016 and there are few addition components added when comparing with the previous version.

Why we need this? Simply to work with Big Data as part of our ETL solution with the support of Azure Big Data platform which is HDInsight and supportive projects.

I wrote few posts on previous version of this, you can read more on them at;

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part I

Creating HDInsight Hadoop Cluster using SSIS and processing unstructured data using Hive Task - Azure Feature Pack - Part II

The above posts explain how to download and install the feature pack, configure tasks for accessing HDInsight and process some unstructured data using tasks given with SSIS.

You can get the latest version of Azure Feature Pack from: https://msdn.microsoft.com/en-us/library/mt146770.aspx

I noticed one Control Flow new Task; Azure SQL DW Upload Task and two Data Flow items; Azure Data Lake Store Source and Azure Data Lake Store Destination that were not available with the previous version.

Previous Version:

Latest Version:

Need to test these new items, must be really interesting, will surely share them via blog posts.

Friday, November 25, 2016

SSIS OLE DB Destination sends all records to Error Output

SSIS OLE DB Destination is one of the destinations we heavily used for loading data to databases. Generally we used Data access mode - fast load for loading data because it is the most efficient and fastest way of loading data. However with default setting, you may see some issues if you try to capture all error records while loading the destination unless you make some changes to some settings.

Let's make a simple data set and try to load to a table. I have following table created in tempdb.

USE tempdb

CREATE TABLE dbo.Product
(
 ProductId int IDENTITY(1,1) NOT NULL,
 ProductName varchar(200) NOT NULL,
 Class char(1) NOT NULL,
 Cost money NOT NULL,
 PRIMARY KEY CLUSTERED (ProductId ASC)
)
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Class 
CHECK  ((Class='A' OR Class='a' OR Class='B' OR Class='b'))
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Cost CHECK  ((Cost>(0)))
GO

I have a text file named Products.txt that contains 15 records as below;

ProductName,Class,Cost

P01,A,100

P02,B,150

P03,A,344

P04,A,555

P05,A,100

P06,B,150

P07,X,344

P08,A,555

P09,A,555

P10,A,100

P11,B,150

P12,A,344

P13,A,555

P14,A,344

P15,A,555

As you see, P07 record is not a valid record as it has X as the Class. Let's set up a package with a Data Flow Task and configure it to read data from this file and load into Product table as below.

I have used OLE DB Destination with its default settings.

And have configured the Error Output to Redirect Row on Error.

Then I have set a Record Destination just for capturing the error output. If you do, you can configure DataReader Destination as well. This is Data Flow looks like;

Now if I run the package, packages retrieves all records but it send all records to Recordset Destination even though there is only one record that violates a constraint.

What is the reason for this. The reason is the value we have set (or the default value) for Maximum insert commit size property. This property indicates the batch size that the OLE DB Destination tries to commit data during the operation. Since we have only 15 records and the batch size set is 2147483647, it tries to commit all at once. Since there is an invalid record, entire batch is getting failed and none of the record is getting inserted (or committed).

This is fine with a small dataset like this but this is not acceptable for a large data set. Hence we need to make sure that it does not send all records to error flow but an acceptable range.

For this, let's make it as 5 and see.

Here is the result of execution after changes;

This makes sure that at least certain batches are getting inserted without sending all records to error flow.

However, you need to decide the batch size without degrading the performance. If you set it for a small value, you may not get the expected performance form fast-load option.

Thursday, November 24, 2016

SSIS Exception deserializing the package The process cannot access the file .ispac because it is being used by another process

For some reasons, my SQL Server Data Tools were crashing on a specific package and I could not get it up and running after that until I restarted the machine. However, I started experiencing the same again (not sure about the reason) and this is what I got when I tried to open the package and execute after the crash.

Exception deserializing the package "The process cannot access the file 'C:\Users\DineshPriyankara\Documents\Visual Studio 2015\Projects\TestDataFlowErrors\TestDataFlowErrors\bin\Development\TestDataFlowErrors.ispac' because it is being used by another process.".

When I checked my Task Manager, I noticed that few SSIS Debug Host are running;

Since I was not running any SSIS project, decided to kill all SSIS related task and of course, problem solved with it. I was able to open and run the package without any issue.

Now, not sure whether it is the best way of solving the issue, have asked from some experts, waiting for some explanations....

Wednesday, November 23, 2016

SSIS Pivot Transformation

I have written a post on this in 2011 based on old configuration which is PIVOT and UNPIVOT: Integration Services. Setting Pivot Transformation is not that easy with versions prior to 2012 but GUI given after can be easily used and implemented. If you need to understand the old way of pivoting, read the previous blog post, this post is based on the GUI given with SQL Server 2012 onward.

Pivot allows us to convert a normalized dataset into a less normalized format, transferring (pivoting) row values into columns. In simple term, it identifies distinct values in a designated column and forms columns based on identified values. Note that it is not fully dynamic and automated, it does not recognize the column names to be generated as Pivot Columns, you need to manually specify new column names but the new GUI can be used for getting the column names.

Let's see an example. The following query generates an output with Year+Customer, Month and OrderAmount;

USE AdventureWorks2014;
GO

SELECT 
    Convert(varchar(4), YEAR(OrderDate)) + '_' +  p.LastName OrderYear_Customer
    , DATENAME(mm, OrderDate) OrderMonth
    , SUM(SubTotal) OrderAmount
  FROM Sales.SalesOrderHeader h
    INNER JOIN Sales.Customer c
      ON c.CustomerID = h.CustomerID
    INNER JOIN Person.Person p    
      ON p.BusinessEntityID = c.PersonID
  GROUP BY 
 Convert(varchar(4), YEAR(OrderDate)) + '_' + p.LastName
    ,DATENAME(mm, OrderDate)
  ORDER BY 1, 2

Let's pivot this result using Integration Services as below output;

As you see, what we have to do is, transfer values in OrderMonth column into multiple columns and use OrderAmount column for calculating the relevant total for each month. Use this script for creating a table for holding above resultset. You can create it in a new database, in my case, it has been created in a database called DataWarehouse.

CREATE TABLE dbo.SalesForMonths
(
 Id int identity(1,1) primary key
 , OrderYear_Customer nvarchar(200) NOT NULL
 , January money NULL default(0)
 , February money NULL default(0)
 , March money NULL default(0)
 , April money NULL default(0)
 , May money NULL default(0)
 , June money NULL default(0)
 , July money NULL default(0)
 , August money NULL default(0)
 , September money NULL default(0)
 , October money NULL default(0)
 , November money NULL default(0)
 , December money NULL default(0)
)

Next step is creating a SSIS project and adding a package. Here are the steps for loading and pivoting data;

1. Add an OLE DB Data Source and configure it with above SELECT statement.
2. Add a Pivot Transformation and connect with OLE DB Source. Open the Editor of it (You can either double-click on it or select Edit menu item from the context menu).
3. We need to set following items;

Pivot Key: This needs a column that contains values for setting new columns (column names). With this example, it needs to be set with OrderMonth.
Set Key: This needs a sorted column that contains values for grouping. Select OrderYear_Customer for this property.
Pivot Value: This needs a column that contains values for new pivot output columns. Set OrderAmount to this.

4. Next is setting up output column names. Either we have to hard code this as January,February, March,..... or get the list generated from the Pivot Key Column set. Let's get the list generated. Select ignore un_matched Pivot Key values and report them after DataFlow execution checkbox and click OK.

Although we have not completed the package, let's execute;

Once the execution is done, go to Progress tab and get the values generated copied.

As mentioned above, either you can type these values for generating pivot column names or copy this and paste without typing.

5. Let's open the Editor of Pivot Transformation again and paste the copied value. Once pasted, click on Generate Column Now button.

6. Click OK to complete the Pivot Transformation.

7. Add an OLE DB Destination and configure for the table created above.

8. Match the columns flowing from the Pivot with it and save the OLE DB Destination.

9. Execute the package now.

10. If you query for dbo.SalesForMonths now, you should see the loaded pivot result.

Tuesday, November 22, 2016

What is SSIS dtsx.designer file?

Have you noticed that there are additional files with extension dtsx.designer in your Integration Services 2016 projects?

You are 100% sure that you did not create a file like that and also, it can be opened; it is an XML file. If you analyze the file, you should notice that it is similar to your dtsx file but has additional nodes.

dtsx file:

dtsx.designer file:

As you see, dtsx.designer has more nodes though the content is similar. This file is automatically created when you use one or more Package Parts in your package. This is one of the new features introduced with SQL Server Integration Services 2016.

Note that this file is used only at the design-time. At the run-time, it package parts are getting added to your original package (in my case LoadProducts).

For more info on this, read: https://msdn.microsoft.com/en-us/library/mt620034.aspx

Monday, November 21, 2016

New way of opening the SSIS toolbox

Did you notice that SSIS toolbox appears at the right-corner of the package designer in SQL Server 2016 Data Tools? I can remember and I have seen as well, whenever the toolbox is missing, how we struggle to find it even though a menu item is given.

Anyway, now it really easy, this is icon you need click;

Sunday, July 31, 2016

Dimension Loading: Source sends same record with different Source ID

An interesting question was raised during my lecture today and it was related to dimension loading. The question was; how can we identify and update the dimension record that is already exist if the key of the entity has been changed in the source.

Let me explain the question with more details. Assume that we have a dimension named DimCustomer and it is loaded with following two records;

And with the next loading, what if we get some new records (last three records - 150, 151, 152) and Id 151 is actually the old record which had Id of 100. This can happen with various business scenario, and, since the source id is new, we always consider it as a new record without checking.

This has to be checked, we need to make sure that SourceKey of the dimension record is changed with new SourceId (CustomerId) without inserting a new record, duplicating this customer: Jane Knight. How do we do it with our SSIS package?

Generally, if we cannot find changes in the source, we get all records from the source to ETL package and then check for new records using a Lookup configured for the data warehouse. With a scenario like this, what we have to do is, once new records are identified, use FuzzyLookup for checking and seeing whether records that are similar are exist with records found as "New". We can use exact matching (equi-join) without using Fuzzy Lookup but there can be slight changes which cannot be found using exact matches. If we find some matching rows, we need to update them without inserting.

Have a look data flow in this ETL package;

This package extracts Customers from the source using Get Customers from the source. I have used the query shown above for that. Then the Lookup: Lookup - Find new Customers is used for matching with existing records in DimCustomer and finding new customers. The Lokkup No Match Output gives us new records.

Next is the Fuzzy Lookup for finding whether records identified as "New" are already exist with old records. For this, we have to connect again with the data warehouse and look for similarities on Customer Name and Town. If exist, need to get the existing CustomerKey as CurrentCustomerKey. Not that the Similarity Threshold is set to 95%, not to 99%.

If we get a value for CurrentCustomerKey, and similarity is closer to 100%, we can consider the records as an old records though we get it as a new record from the source. The added Conditional Split checks it and allows us to create two flows for "real new" records and "old but new" records.

After that, we can have two destinations, one for inserting new records, and one for updating old records with new CustomerId. For update, as you see, Script Component is used as below;

xxx

When run the package, as you see, 4 records are coming from the source, three records are identified as new records, Fuzzly Look along with Conditional Split gives 2 records as new and 1 record as a new record but related to an old record. Therefore package inserts two new records and update one records. Here is the result now;

Saturday, July 30, 2016

SQL Server Brain Basher of the Week #049 - SSIS Package Store

Let's talk about Integration Services in this week for Brain Basher. SQL Server supports two deployment models named Package Deployment Model and Project Deployment Model. The Package Deployment Model was the initial model available with SQL Server and this allows us to deploy packages and manage individually. The Project Deployment Model was introduced with SQL Server 2012 and it allows us to deploy all packages as a single unit. This week question is based on old model; Package Deployment Model.

What is SSIS Package Store?

When deploying with Package Deployment Model, individual files can be deployed either to msdb database or to file system. There are multiple ways of deploying individual packages. One way is, importing packages using Import Package in SSMS Integration Services to import into SQL Server. When importing to SQL Server using Import Package, if you select the destination as File System instead of MSDB database, the package is saved in the SSIS Package Store.

The path of the SSIS Package Store is C:\Program Files\Microsoft SQL Server\{version}\DTS\Packages. Let's deploy a package to SQL Server / File System and see how it goes to SSIS Package Store.

Let's open SSMS and connect with Integration Services;

Once connected, right click on File System and select Import Package, you need to select the package for importing (or deploying).

Once imported, you can have a look on relevant installation folder, you should see the package deployed.

Should we use this or not? It depends, I prefer File System not SQL Server / File System or SQL Server / MSDB. Biggest issue with this is, no specific folders in this folder for instances if you have multiple instanced from the same version. You may encouter some issues with security as well. Therefore, for older way, File System is better and if possible, go for the latest way which Project Deployment Model.