Showing posts with label Integration Services. Show all posts
Showing posts with label Integration Services. Show all posts

Saturday, October 21, 2017

Microsoft Access Database Engine 2016 Redistributable Setup - You cannot install the 32-bit version, You cannot install the 64-bit version

In order to access Excel workbook using Integration Services, one component we need to install is Microsoft Access Database Engine 2016 Redistributable. If 32-bit version of Office is installed, then 32-bit version of Microsoft Access Database Engine 2016 Redistributable should be installed. If 64-bit version of Office is installed, then 64-bit version of it has to be installed.

If it is not installed, this is what we see when try to access an Excel workbook

The requested OLE DB provider Microsoft.ACE.OLEDB.12.0 is not registered. If the 32-bit drivers is no.....


Generally, once it is installed, we can connect to Excel and work with it without any issue. However, I started experiencing below error when I try to install 32-bit version of Microsoft Access Database Engine 2016 Redistributable.

You cannot install the 32-bit version of Microsoft Access Database Engine 2016 because you currently have 64-bit Office products installed. If you want to.....


I was confused with this message because I have installed 32-bit version of Office. However, just for checking, I tried to install 64-bit version of Microsoft Access Database Engine 2016 Redistributable and this is what I got when I tried........

You cannot install the 64-bit version of Microsoft Access Database Engine 2016 because you currently have 32-bit Office products installed. If you want to....

Not sure what is the reason but it stopped me continuing, leaving me only option as uninstalling Office and installing again. Since it is not the best way, just thought to try out older version of Microsoft Access Database Engine Redistributable, hence downloaded 32-bit version of Microsoft Access Database Engine 2010 Redistributable from https://www.microsoft.com/en-us/download/details.aspx?id=13255. Good news is, I was able to install it and the best news is, SSIS could connect to Excel without any issue. This may not be the best solution for above error but for an urgent situation, you can try this and see.

Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on Azure Feature Pack and they speak about how to creating a HDInsight Cluster and processing semi-structured data using Hive. You can read them at;

With the latest release of Azure Feature Pack, some of the settings have been changed and the creating is easier with new task than the previous one. This posts discusses the way of configuring Azure HDInsight Create Cluster Task.

In order to create a HDInsight Hadoop Cluster using SSIS, you need to make sure that following steps have been done;
  1. Azure Feature Pack is installed.
  2. Azure Storage is created and you have the name and key of it.
  3. Azure Active Directory Application is created and linked with the subscription used.
Since the installation of Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage is also a very common task with almost all Azure related implementation, hence it is not going to be discussed too. Let's start with Active Directory Application creation.

Creating an Active Directory Application and link to the subscription
Here are the steps for creating it.
1. Login to the portal and open the Azure Active Directory blade.
2. Click on App Registration and click Add.


3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want. It can be changed later as your need, hence enter something and continue even though the domain is not registered with the Azure.


4. Once the application is created, you need to take the Application ID copied. 


5. You need to create an Application Key and take a copy of too. Go to Keys under Settings and create one. Note the message appeared. It is better to get a copy of the key before closing the blade.


6. You need the Tenant Id for creating the cluster. Go to Properties of Active Directory Blade and get the Directory Id. It is the Tenant Id.


7. You need two more values to be copied and kept. Go to Subscription Blade and take a copy of your Subscription Id. Other one is resource group used. Take copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the application created with your subscription. Open your Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on Add button.


9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.


10. With the Add Users Blade, search for the Application Name and select it.



At the end of this steps, you have;
  1. Application Id
  2. Application Key
  3. Tenant Id
  4. Subscription Id
  5. Resource Group Name
Let's start with SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task
Here are the steps doing it.

1. Create a SSIS project and a package. Drag and Drop Azure HDInsight Create Cluster Task. Open the properties of the task.


2. Open the properties of the task and select < New Connection...> of AzureResourceManagerConnection. Create a connection using copied items: Application Id, Application Key, and Tenant Id.


3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.


3. Next, fill the following;
  • Subscription Id - enter the copied one
  • ResourceGroup - enter the resource group used for the storage.
  • Location - Select the location used for creating the storage. Everything should be in the same locaiton.
  • ClusterName - enter the name of the cluster you need.
  • ClusterSize - enter the number of Worker Nodes you need in your cluster.
  • BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
  • Username and Password - enter the user name and password for the cluster.
  • SshUserName and Password - enter the user name and the password for accessing the cluster remotely.
Everything required is done. If you execute the package or task now, it will create the cluster for you.


Genrally it takes about 10-15 minutes. You can open the portal and verify once it is created.


Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the operation required is completed with it. You can use Azure HDInsight Delete Cluster Task for that. Here is the settings of it.



Friday, November 25, 2016

SSIS OLE DB Destination sends all records to Error Output

SSIS OLE DB Destination is one of the destinations we heavily used for loading data to databases. Generally we used Data access mode - fast load for loading data because it is the most efficient and fastest way of loading data. However with default setting, you may see some issues if you try to capture all error records while loading the destination unless you make some changes to some settings.

Let's make a simple data set and try to load to a table. I have following table created in tempdb.

USE tempdb

CREATE TABLE dbo.Product
(
 ProductId int IDENTITY(1,1) NOT NULL,
 ProductName varchar(200) NOT NULL,
 Class char(1) NOT NULL,
 Cost money NOT NULL,
 PRIMARY KEY CLUSTERED (ProductId ASC)
)
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Class 
CHECK  ((Class='A' OR Class='a' OR Class='B' OR Class='b'))
GO

ALTER TABLE dbo.Product  
ADD  CONSTRAINT CH_Product_Cost CHECK  ((Cost>(0)))
GO

I have a text file named Products.txt that contains 15 records as below;

ProductName,Class,Cost
P01,A,100
P02,B,150
P03,A,344
P04,A,555
P05,A,100
P06,B,150
P07,X,344
P08,A,555
P09,A,555
P10,A,100
P11,B,150
P12,A,344
P13,A,555
P14,A,344
P15,A,555

As you see, P07 record is not a valid record as it has X as the Class. Let's set up a package with a Data Flow Task and configure it to read data from this file and load into Product table as below.

I have used OLE DB Destination with its default settings.


And have configured the Error Output to Redirect Row on Error.


Then I have set a Record Destination just for capturing the error output. If you do, you can configure DataReader Destination as well. This is Data Flow looks like;


Now if I run the package, packages retrieves all records but it send all records to Recordset Destination even though there is only one record that violates a constraint.


What is the reason for this. The reason is the value we have set (or the default value) for Maximum insert commit size property. This property indicates the batch size that the OLE DB Destination tries to commit data during the operation. Since we have only 15 records and the batch size set is 2147483647, it tries to commit all at once. Since there is an invalid record, entire batch is getting failed and none of the record is getting inserted (or committed).

This is fine with a small dataset like this but this is not acceptable for a large data set. Hence we need to make sure that it does not send all records to error flow but an acceptable range.

For this, let's make it as 5 and see.


Here is the result of execution after changes;


This makes sure that at least certain batches are getting inserted without sending all records to error flow.


However, you need to decide the batch size without degrading the performance. If you set it for a small value, you may not get the expected performance form fast-load option.


Thursday, November 24, 2016

SSIS Exception deserializing the package The process cannot access the file .ispac because it is being used by another process

For some reasons, my  SQL Server Data Tools were crashing on a specific package and I could not get it up and running after that until I restarted the machine. However, I started experiencing the same again (not sure about the reason) and this is what I got when I tried to open the package and execute after the crash.

Exception deserializing the package "The process cannot access the file 'C:\Users\DineshPriyankara\Documents\Visual Studio 2015\Projects\TestDataFlowErrors\TestDataFlowErrors\bin\Development\TestDataFlowErrors.ispac' because it is being used by another process.".


When I checked my Task Manager, I noticed that few SSIS Debug Host are running;


Since I was not running any SSIS project, decided to kill all SSIS related task and of course, problem solved with it. I was able to open and run the package without any issue.

Now, not sure whether it is the best way of solving the issue, have asked from some experts, waiting for some explanations....

Wednesday, November 23, 2016

SSIS Pivot Transformation

I have written a post on this in 2011 based on old configuration which is PIVOT and UNPIVOT: Integration Services. Setting Pivot Transformation is not that easy with versions prior to 2012 but GUI given after can be easily used and implemented. If you need to understand the old way of pivoting, read the previous blog post, this post is based on the GUI given with SQL Server 2012 onward.

Pivot allows us to convert a normalized dataset into a less normalized format, transferring (pivoting) row values into columns. In simple term, it identifies distinct values in a designated column and forms columns based on identified values. Note that it is not fully dynamic and automated, it does not recognize the column names to be generated as Pivot Columns, you need to manually specify new column names but the new GUI can be used for getting the column names.

Let's see an example. The following query generates an output with Year+Customer, Month and OrderAmount;

USE AdventureWorks2014;
GO

SELECT 
    Convert(varchar(4), YEAR(OrderDate)) + '_' +  p.LastName OrderYear_Customer
    , DATENAME(mm, OrderDate) OrderMonth
    , SUM(SubTotal) OrderAmount
  FROM Sales.SalesOrderHeader h
    INNER JOIN Sales.Customer c
      ON c.CustomerID = h.CustomerID
    INNER JOIN Person.Person p    
      ON p.BusinessEntityID = c.PersonID
  GROUP BY 
 Convert(varchar(4), YEAR(OrderDate)) + '_' + p.LastName
    ,DATENAME(mm, OrderDate)
  ORDER BY 1, 2


Let's pivot this result using Integration Services as below output;


As you see, what we have to do is, transfer values in OrderMonth column into multiple columns and use OrderAmount column for calculating the relevant total for each month. Use this script for creating a table for holding above resultset. You can create it in a new database, in my case, it has been created in a database called DataWarehouse.

CREATE TABLE dbo.SalesForMonths
(
 Id int identity(1,1) primary key
 , OrderYear_Customer nvarchar(200) NOT NULL
 , January money NULL default(0)
 , February money NULL default(0)
 , March money NULL default(0)
 , April money NULL default(0)
 , May money NULL default(0)
 , June money NULL default(0)
 , July money NULL default(0)
 , August money NULL default(0)
 , September money NULL default(0)
 , October money NULL default(0)
 , November money NULL default(0)
 , December money NULL default(0)
)

Next step is creating a SSIS project and adding a package. Here are the steps for loading and pivoting data;

1. Add an OLE DB Data Source and configure it with above SELECT statement.
2. Add a Pivot Transformation and connect with OLE DB Source. Open the Editor of it (You can either double-click on it or select Edit menu item from the context menu).
3. We need to set following items;
  • Pivot Key: This needs a column that contains values for setting new columns (column names). With this example, it needs to be set with OrderMonth.
  • Set Key: This needs a sorted column that contains values for grouping. Select OrderYear_Customer for this property.
  • Pivot Value: This needs a column that contains values for new pivot output columns. Set OrderAmount to this.
4. Next is setting up output column names. Either we have to hard code this as January,February, March,..... or get the list generated from the Pivot Key Column set. Let's get the list generated. Select ignore un_matched Pivot Key values and report them after DataFlow execution checkbox and click OK.


Although we have not completed the package, let's execute;


Once the execution is done, go to Progress tab and get the values generated copied.


As mentioned above, either you can type these values for generating pivot column names or copy this and paste without typing. 

5. Let's open the Editor of Pivot Transformation again and paste the copied value. Once pasted, click on Generate Column Now button.


6. Click OK to complete the Pivot Transformation.
7. Add an OLE DB Destination and configure for the table created above. 
8. Match the columns flowing from the Pivot with it and save the OLE DB Destination.


9. Execute the package now. 


10. If you query for dbo.SalesForMonths now, you should see the loaded pivot result.



Tuesday, November 22, 2016

What is SSIS dtsx.designer file?

Have you noticed that there are additional files with extension dtsx.designer in your Integration Services 2016 projects?


You are 100% sure that you did not create a file like that and also, it can be opened; it is an XML file. If you analyze the file, you should notice that it is similar to your dtsx file but has additional nodes.

dtsx file:


dtsx.designer file:

As you see, dtsx.designer has more nodes though the content is similar. This file is automatically created when you use one or more Package Parts in your package. This is one of the new features introduced with SQL Server Integration Services 2016.

Note that this file is used only at the design-time. At the run-time, it package parts are getting added to your original package (in my case LoadProducts).


Monday, November 21, 2016

New way of opening the SSIS toolbox

Did you notice that SSIS toolbox appears at the right-corner of the package designer in SQL Server 2016 Data Tools? I can remember and I have seen as well, whenever the toolbox is missing, how we struggle to find it even though a menu item is given.

Anyway, now it really easy, this is icon you need click;


Sunday, September 11, 2016

Could not retrieve the table information for the connection manager Excel Connection Manager

You may experience the below error if you try to connect with latest version of Microsoft Excel using Integration Services;

Could not retrieve the table information for the connection manager 'Excel Connection Manager'. Failed to connect to the source using the connection manager 'Excel Connection Manager'


This error comes when you try to retrieve Excel Sheets from the connected Excel Workbook. In order to solve this, you need to install correct OLE DB Providers and Drivers for Microsoft Office files. Follow the link below for downloading the correct version;


Note that if you have 32-bit version of Office, then you need 32-bit drivers else you need 64-bit drivers.

Tuesday, July 19, 2016

How to add a Watch Windows in SSIS

My post SSIS Progress / Execution Results Tab talked about few things related to debugging and thought to add another post on it, specifically on two windows provided by SQL Server Data Tools for observing values during debugging.

There are two windows that can be used for seeing values of variables and parameters: Locals and Watch Windows. The Locals windows allows us to see systems setting, values of variables and parameters that are currently in scope. The Watch window helps us to specify a specific item such as variable or parameter and track its value through out debugging. If you have multiple variable and parameters and you need to see how they are getting changed during the execution (or debugging), Watch window is the best. You can add multiple Watch windows for tracing multiple items.

Let'see how we can add a Watch and use it when debugging. Follow given steps for creating a package and testing;

1. Create a SSIS project and add a package.
2. Create a variable called TestVariable. Set the type as int.


3. Add a Script Task. Make sure that TestVariable is added under ReadWriteVariables of the Script Task and following code is set with the task.



public void Main()
{
    // TODO: Add your code here

 
    int x = Convert.ToInt32(Dts.Variables["User::TestVariable"].Value);
    x = x + 1;
    Dts.Variables["User::TestVariable"].Value = x;
    Dts.TaskResult = (int)ScriptResults.Success;
}

This code increases the value of the variable by one.

4. Copy the Script Task added and duplicate it two times. Now you have three Script Tasks. Connect them with Success Precedence Constraints.


5. Select the first Script Task and press F9 or use Debug -> Toggle Breakpoint menu for adding a breakpoint. Do the same for other two tasks.


6. Now start debugging. It stops at the first Script Task as we have set a breakpoint. For opening Locals window, use Debug -> Windows -> Locals menu item.

7. Once the Locals window is opened, navigate to find TestVariable. Right-click on it and add a Watch on it.


8. You should see a Watch window now. You will see that Watch window shows how value is getting changed when you continue with debugging.


Monday, July 18, 2016

SSIS Progress / Execution Results Tab

SQL Server Integration Services is The Enterprise ETL platform that supports on all ETLing scenario. Covering from small data transfer to large complex ETL solutions, it has become the number one on ETL solutions for many projects and organizations.

Just like other implementations with other tools, the packages we create with Integrations Services need to troubleshoot when it does not work as we expected. There are many way of troubleshooting, however, the most common technique is debugging. Debugging is the process of finding problems that occur during the execution, either at design stage or after deployment. We can debug SSIS package too.

There are various ways of debugging SSIS package at the development, will discuss it with different post, but let's focus on one common technique. When we run the package using SQL Server Data Tools, most of the events are recorded and can be seen with one of the Tabs in design window. This tab cannot be seen when you open the package for designing but it becomes visible as Progress when you execute and visible as Execution Results when you complete the execution. See the images below;


This tab is very useful as it explains how tasks and containers involve with the execution in a hierarchical manner. You can navigate via the tree and find out the problematic one if there is an issue.


There is a slight performance impact on this collection. Since it has to collect all information, it adds an overhead to the package execution. Even though it is a slight one, if you want, it can be disabled by toggling Debug Process Reporting menu item in the SSIS menu;


Saturday, July 2, 2016

Connecting to the Integration Services - "Access is denied."

If required permissions are not configured for Integration Services for your account, you will experience following error when trying to connect with Integration Services;

Connecting to the Integration Services service on the computer "" failed with the following error: "Access is denied."


This clearly indicates that you need permission on it. How can I add required permissions for any account?

Here are the steps for giving required permission;
1. Search for Component Services and open it.
2. Expand Component Services -> Computers -> My Computer -> DCOM Config.


3. Find Microsoft SQL Server Integration Services, right click on it and get Properties. Go to Security tab and add accounts you need with relevant permissions for Launch and Activation Permissions, Access Permissions, and Configuration Permissions by clicking Edit button.



Click OK to save all. You need to restart the Integration Services service for getting added permission applied. Once restarted, you will be able to connect with it without any issue.