Thursday, April 13, 2017

SQL Server Splits the page when I change an Integer Column Value

I have written a few posts on Index Fragmentation and how it can be checked. External Fragmentation occurs as a result of a Page-Split, which happens when there is no space in the page to accommodate a change (a new record or an update). When there is no space, SQL Server moves approximately 50% of the records to a new page so that the change can be accommodated in the relevant page.

Read the post Understanding Index Fragmentation – SS SLUG Aug 2013 – Demo I for understanding fragmentation.

Read SQL Server - Checking Index Fragmentation - Understanding LIMITED, SAMPLED and DETAILED for understanding the various modes that can be used for checking fragmentation.

To understand a page-split, let's take an example. Have a look at the table below.

-- creating test table
CREATE TABLE dbo.TestTable
(
    Id int PRIMARY KEY
    , Value char(4000) NOT NULL
)
GO
 
-- inserting 2 records
INSERT INTO dbo.TestTable
    VALUES 
    (2, replicate('a', 4000)), (5, replicate('b', 4000));

It has two records. Since the approximate size of a record is 4,004 bytes (4 bytes for the int column and 4,000 bytes for the char column) and the size of a data page is 8KB, SQL Server can use one page for holding both records. The query below proves that both records are in the same page.

SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph;


If I make a change to the Value column, SQL Server does not need to split the page because the current value already uses 4,000 bytes. For example, if I change the value of the first record to REPLICATE('x', 4000), SQL Server can simply replace the old value in place, hence no page split. See the code below and the result.

UPDATE dbo.TestTable
    SET Value = replicate('x', 4000)
WHERE id = 2
 
SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph;



In the same way, if we make a change to the integer value, we should not expect to see any change in the pages used. Let's make a small change and see.

UPDATE dbo.TestTable
    SET Id = 3
WHERE id = 2;
 
SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph;



Did you notice the change? Why has the second record been moved to a new page? Does this mean a page-split has occurred because of this change?

Is it because of the data type? It is not. Note that this is not just an integer column; it is the PRIMARY KEY, the clustered key. The earlier update on the Value column was performed as an in-place update, which does not require additional space. However, SQL Server does not perform in-place updates for key columns; it performs a delete-plus-insert operation for key column updates, and the insert requires space. That is the reason for the page split. Understand that this behavior is not limited to integer columns; it applies to all key columns.

This is why we discourage you from selecting a column as the key column if it can be changed by a business operation. It is always recommended to use a non-volatile column, one that does not expect changes to its values, as the key column.
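
To confirm that this key update has actually fragmented the index, the index can be checked with sys.dm_db_index_physical_stats. Here is a minimal sketch against the test table used above:

-- Checking the clustered index of the test table for fragmentation
SELECT index_type_desc, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.TestTable'), NULL, NULL, 'DETAILED');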

SQL Server - Checking Index Fragmentation - Understanding LIMITED, SAMPLED and DETAILED

Indexes get fragmented based on the operations we perform against records and the modifications we make against index keys. We use a Dynamic Management Function called sys.dm_db_index_physical_stats for checking both Internal Fragmentation and External Fragmentation. Today, I had to check one of my client's data tables for fragmentation, which is a very large table. Since this was a quick look, I decided to use the SAMPLED mode for checking the fragmentation instead of my usual mode, DETAILED. Once the check was done, we had a short conversation on the mode selected: the differences between the modes and when to use each of them. That conversation resulted in this post.

If you need to understand Internal and External Fragmentation with sample code, please see my post: Understanding Index Fragmentation – SS SLUG Aug 2013 – Demo I.

Here are the differences between the three modes;

LIMITED
  • Fastest way of scanning the index for fragmentation.
  • For B-Tree indexes, only the parent-level pages are scanned.
  • For Heaps, the associated PFS and IAM pages are checked and the data pages are scanned.
  • This cannot be used for checking Internal Fragmentation.
  • This still shows External Fragmentation because it uses the pointers to the leaf level held in the parent pages for calculating External Fragmentation.
SAMPLED
  • Slower than LIMITED as it scans leaf pages too.
  • It uses a 1% sample of all pages for scanning, hence the fragmentation values are approximate.
  • If the number of pages is fewer than 10,000, then DETAILED mode is used instead of SAMPLED mode.
DETAILED
  • Slowest mode as it scans all pages of the index.
  • It returns both Internal and External Fragmentation, and the values are accurate.


The reason for using SAMPLED mode in my case was the number of records the table holds. Since scanning all pages takes a long time, I decided to use SAMPLED mode because it was enough to help me determine whether the index is fragmented or not.
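
Here is a minimal sketch of how the mode is passed to the function; the object name is only a placeholder:

-- Checking a single table with the SAMPLED mode; replace 'SAMPLED' with 'LIMITED' or 'DETAILED' to compare
-- Note that with LIMITED, avg_page_space_used_in_percent is returned as NULL
SELECT index_id, index_depth, avg_fragmentation_in_percent, avg_page_space_used_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'SAMPLED');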

This is how the fragmentation is reported by all three modes;



Wednesday, April 12, 2017

How to find the related file and the page numbers of records in SQL Server data table

Sometimes, for certain administration work, we need to know which file has been used for holding our records and the related page numbers. How can we easily find this information?

Generally, we use DBCC IND and DBCC PAGE, but there are two more great functions that can be used for finding the same information.
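
For comparison, the traditional approach with the two undocumented DBCC commands looks roughly like this; the file and page numbers are only placeholders taken from the DBCC IND output:

-- Listing the pages allocated to a table (undocumented command)
DBCC IND ('AdventureWorks2014', 'Person.Person', 1);

-- Examining a single page; trace flag 3604 sends the output to the client
DBCC TRACEON (3604);
DBCC PAGE ('AdventureWorks2014', 1, 1000, 3);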

The first function is %%PHYSLOC%%. It returns the RID (Record Identifier) as a hexadecimal value. The RID consists of the file number, page number and record number. The second function is a table-valued function called fn_PhysLocCracker. It accepts the RID returned by %%PHYSLOC%% and returns a three-column table holding the file number, page number and record number. Combining these two, we can get the information we want. Here is an example;

SELECT p.BusinessEntityID, p.Title
    , p.FirstName, p.LastName
    , ph.*
FROM [Person].[Person] p
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph



Monday, April 10, 2017

Adding Power BI Reports to Reporting Services

The integration between Reporting Services and Power BI started with Reporting Services 2016, allowing us to pin SSRS reports to Power BI dashboards. In addition, Power BI files can be hosted in Reporting Services, but that version does not support opening them inside the portal.


This integration has been extended and will be available with the next version of SQL Server. With this, we can create reports using Power BI and add them directly to Reporting Services. Not only that, the Reporting Services portal supports opening Power BI reports inside the portal. Can we see it now? Yes, a Technical Preview is available for testing this.

In order to test this, you need to download two files; Power BI Desktop For SSRS (PBIDesktopRS_x64) and SQLServerReportingServices.

Power BI Desktop for SSRS is a separate Power BI instance and it can be installed side by side without removing an existing Power BI installation. SQLServerReportingServices is the latest Reporting Services installation, which is a standalone installation. It installs Reporting Services and allows us to configure it just like the way we do with Reporting Services 2016.


Installing Reporting Services
Let's start with the Reporting Services installation. Just like the way you install any other software, double-click on SQLServerReportingServices.exe and start the installation. You will get the usual agreement window; accept it and continue.


And you should see the final output within less than one minute;


This works without any issue even if you have a SQL Server 2016 installation on your machine. The configuration of the newly installed Reporting Services can be started by either clicking the button in the last window or searching for and opening the right Configuration Manager. If you search for it, you will see two with the exact same name (if you have already installed 2016), so you need to pick the right one.


When you open the Configuration Manager for the one you installed, you should get a similar screen, note the instance name: RSServer.


If you remember the 2016 installation, you know that you get an option for configuring Reporting Services during the installation: Install and Configure or Install Only. Since we did not get anything like that with this installation, we need to configure everything manually. First thing is creating the database.

Go to Database page and click on Change Database.


It starts the wizard; make sure that you select Create a new report server database. This requires a SQL Server database engine. If you have not installed SQL Server, then you need to install it before configuring the report server database. If you have already installed Reporting Services 2016, then you already have a Report Server database that belongs to the 2016 instance. If so, DO NOT OVERWRITE THE EXISTING ReportServer DATABASE; make sure you give a different name for the new one. As you see below, I have named my Report Server database ReportServer_2017.


You can accept the default values for the other pages in the wizard unless you need to change them.


Once the database is configured, you need to configure the Web Portal URL and Web Service URL. Again, make sure they do not conflict with your existing Reporting Services URLs. As you see, I have set the Virtual Directory for the Web Portal as Reports_2017.


You need to do the same for Web Service URL as well. Once done, you should be able to browse the Reporting Server with the configured URL.

Installing Power BI
There is no difference between this installation and a standard Power BI Desktop installation.


Once the installation is completed, open it (note that if you had a previous Power BI Desktop installation, you now have two Power BI instances). You should see the SQL Server Reporting Services tag when opening Power BI. That confirms that you have opened the right instance.



Creating Reports and Publishing to Reporting Services
Note that this can be used for creating standard Power BI reports as well. However, if you create a report for Reporting Services, you need to make sure of the following;
  • Source is Analysis Services
  • Connection type is Live Connection

At the moment, it supports only Analysis Services but we will surely see all other types with future releases.

Connect to your Analysis Services database and create a report as you need. This is what I created.


In order to publish this to Reporting Services, you can either save it directly to Reporting Services or save it as a local file and upload it using the Reporting Services Upload File menu. Let's save it directly to Reporting Services.

You should see a new saving option: SQL Server Reporting Services.


Select it and enter the URL configured for the portal.


Name it and click OK.


You should see a success message if it can connect to your Reporting Services. Once saved, go to the portal and check it; you should see the added Power BI report. With the previous version of Reporting Services, clicking an added Power BI file downloads the file. But with this version, clicking on it opens the report, and all functionality works just like the way it works with the Power BI online service.


A good thing is that it allows you to edit the report using Power BI;


Once you have modified it, you do not need to upload the file again because Power BI can save it directly to Reporting Services. The changes will immediately appear in Reporting Services.


See how easy it is. Try it out; you will see a lot more options with future releases.



Saturday, March 25, 2017

Loading Data Script for Snowflake type Date Dimension

The Date dimension is the most common dimension in data warehousing and it is generally designed as a single table. However, if it needs to be used with multiple Fact Tables at different levels, the Date dimension table should be broken into multiple tables, which creates a Snowflake Schema. If the table is broken into multiple tables such as Year, Quarter, Month and Date, loading them is a bit difficult. A script for loading a single table is available on many sites, but it is a bit difficult to find a script for loading multiple tables. I had the same requirement and could not find a proper script, hence I wrote one.

Here are the scripts if you need the same.

Here is the script for creating tables;
CREATE TABLE dbo.DimYear
(
 YearKey smallint 
 , Year smallint not null
 , Constraint pk_DimYear Primary Key (YearKey)
);
GO

CREATE TABLE dbo.DimQuarter
(
 QuarterKey smallint 
 , YearKey smallint not null
 , Quarter smallint not null
 , YearQuater varchar(20) not null
 , QuarterName varchar(20) not null
 , Constraint pk_DimQuarter Primary Key (QuarterKey)
 , Constraint fk_DimQuarter_DimYear Foreign Key (YearKey)
  References dbo.DimYear (YearKey)
);
GO

CREATE TABLE dbo.DimMonth
(
 MonthKey int 
 , QuarterKey smallint not null
 , MonthNumber smallint not null
 , MonthName varchar(20) not null
 , YearMonth varchar(20) not null
 , MonthShortName char(3) not null
 , Constraint pk_DimMonth Primary Key (MonthKey)
 , Constraint fk_DimMonth_DimQuarter Foreign Key (QuarterKey)
  References dbo.DimQuarter (QuarterKey)
);
GO

CREATE TABLE dbo.DimDate
(
 DateKey int 
 , MonthKey int not null
 , Date date not null
 , WeekDayNumber smallint not null
 , WeekDayName varchar(20) not null
 , DayOfMonth smallint not null
 , DayOfYear smallint not null
 , IsWeekend bit not null
 , IsHoliday bit not null
 , WeekNumberOfYear smallint not null
 , Constraint pk_DimDate Primary Key (DateKey)
 , Constraint fk_DimDate_DimMonth Foreign Key (MonthKey)
  References dbo.DimMonth (MonthKey)
);
GO


And the script for loading all tables.
SET DATEFIRST 1;

DECLARE @StartDate date = '2010-01-01';
DECLARE @EndDate date = '2019-12-31';
DECLARE @YearKey smallint;
DECLARE @Quarter smallint, @QuarterKey smallint;
DECLARE @Month int, @MonthKey int;
DECLARE @Date int, @DateKey int;

WHILE (@StartDate <= @EndDate)
BEGIN

 -- Inserting years
 SET @YearKey = YEAR(@StartDate);
 IF NOT EXISTS (SELECT * FROM dbo.DimYear WHERE YearKey = @YearKey)
  INSERT INTO dbo.DimYear (YearKey, [Year]) VALUES (@YearKey, @YearKey);


 -- Inserting quarters
 SET @QuarterKey = Convert(smallint, Convert(varchar(4), YEAR(@StartDate)) + Convert(varchar(1), DATEPART(q, @StartDate)))
 SET @Quarter = DATEPART(q, @StartDate);

 IF NOT EXISTS (SELECT * FROM dbo.DimQuarter WHERE QuarterKey = @QuarterKey)
  INSERT INTO dbo.DimQuarter (QuarterKey, YearKey, [Quarter], YearQuater, QuarterName) 
  VALUES (@QuarterKey, @YearKey, @Quarter, Convert(varchar(4), YEAR(@StartDate)) + 'Q' + Convert(varchar(4), @Quarter)
   , CASE @Quarter WHEN 1 THEN 'First Quarter' WHEN 2 THEN 'Second Quarter' WHEN 3 THEN 'Third Quarter' ELSE 'Fourth Quarter' END);
  

 ---- Inserting months
 SET @MonthKey = Convert(int, Convert(varchar(4), YEAR(@StartDate)) + (CASE LEN(MONTH(@StartDate)) WHEN 1 THEN '0' ELSE '' END) + Convert(varchar(2), MONTH(@StartDate)));
 SET @Month = MONTH(@StartDate)
 IF NOT EXISTS (SELECT * FROM dbo.DimMonth WHERE MonthKey = @MonthKey)
  INSERT INTO dbo.DimMonth (MonthKey, QuarterKey, MonthNumber, MonthName, YearMonth, MonthShortName) 
  VALUES (@MonthKey, @QuarterKey, @Month, DATENAME(MONTH, @StartDate)
  , Convert(varchar(4), YEAR(@StartDate)) + ' ' + LEFT(DATENAME(MONTH, @StartDate), 3)
  , LEFT(DATENAME(MONTH, @StartDate), 3));
  
 ---- Inserting dates
 SET @DateKey = Convert(int, Convert(varchar(8), @StartDate, 112))
 IF NOT EXISTS (SELECT * FROM dbo.DimDate WHERE DateKey = @DateKey)
  INSERT INTO dbo.DimDate (DateKey, MonthKey, Date, WeekDayNumber, WeekDayName, DayOfMonth
   , DayOfYear, IsWeekend, IsHoliday, WeekNumberOfYear) 
  VALUES (@DateKey, @MonthKey, @StartDate, DatePart(WEEKDAY, @StartDate), DateName(WEEKDAY, @StartDate), DatePart(DAY, @StartDate)
   , DatePart(DAYOFYEAR, @StartDate), CASE WHEN DatePart(WEEKDAY, @StartDate) IN (6, 7) THEN 1 ELSE 0 END, 0
   , DatePart(WEEK, @StartDate));

 SET @StartDate = DATEADD(dd, 1, @StartDate);
END;
GO
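
Once the load completes, a simple query like this can be used as a quick sanity check that the four tables line up (this is not part of the original script, just a verification sketch):

SELECT y.[Year]
 , COUNT(DISTINCT q.QuarterKey) Quarters
 , COUNT(DISTINCT m.MonthKey) Months
 , COUNT(d.DateKey) Dates
FROM dbo.DimYear y
 INNER JOIN dbo.DimQuarter q ON y.YearKey = q.YearKey
 INNER JOIN dbo.DimMonth m ON q.QuarterKey = m.QuarterKey
 INNER JOIN dbo.DimDate d ON m.MonthKey = d.MonthKey
GROUP BY y.[Year]
ORDER BY y.[Year];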


Saturday, March 18, 2017

SQL Server Backup Expiry Date Setting

I would say this is the most confusing property in Microsoft SQL Server. Most people misunderstand it and do not know how it works. Although there are many posts on this, only a few explain it with an example, and some articles/posts make it even more confusing, hence I am making this post the way I understand it;

If you refer to MSDN, it explains it well, but in simple terms, what it says is: if you set an expiry date for your backup set (if you need to understand what a backup media set, backup media family, backup device or backup set is, read this), or if you set Retention days, the backup set can be overwritten only when the backup set has expired or after the number of days mentioned with Retention days (this is shown as After x days in the GUI).

Simply, if I take a backup today (19-Mar-2017) and set the expiry date as 25-Mar-2017, I will not be able to overwrite the backup set until 25-Mar-2017. Remember, the backup can be restored on any day regardless of the expiry date set.

Here is an example;

Let's take a backup of the AdventureWorks2014 database. See both ways: using the GUI and TSQL. Note that I have mentioned a name for the Media too. Without this, the expiry setting will not work as we expect.




BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH  EXPIREDATE = N'03/25/2017 00:00:00', FORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

Now if you try to take another backup to the same media by overwriting the existing backup sets;


BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH NOFORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, NOSKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

SQL Server will throw this error;


If you still need to overwrite it, with TSQL you can use SKIP instead of NOSKIP, or with the GUI, do not mention the Media name.
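
If you want to see the expiry date recorded for the backup sets in the media, RESTORE HEADERONLY can be used; the ExpirationDate column in its output shows it. Here is a minimal sketch using the same file:

RESTORE HEADERONLY 
FROM DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak';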

Wednesday, March 15, 2017

Cannot change the SQL Server Tempdb log file size during the installation

Every SQL Server database has two types of files: a Data file and a Log file, and the same goes for system databases as well. You can change the file sizes of User-Defined Databases when they are created, but not when System Databases are being created by the installation. However, with SQL Server 2016, we have a new page in the installation wizard that allows you to make some changes to the Tempdb.


As you see, you can change the number of files required for tempdb, their initial sizes and the increment for Auto Growth. Although the file size can be changed as you want, it does not allow you to set the Log Initial Size beyond 1,024 MB. If you set a larger value, it is automatically set back to 1,024 MB.

This does not mean that you cannot set a larger value for the tempdb log file at all. Once the installation is done, you can change it later to the size you need. I assume that the reason for not allowing a size larger than 1,024MB during the installation is to save the overall installation time.
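
Here is a minimal sketch of changing the size after the installation; templog is the default logical name of the tempdb log file, but verify it against sys.database_files before running this:

-- Check the current logical names and sizes (size is stored in 8KB pages)
SELECT name, type_desc, size * 8 / 1024 AS CurrentSizeMB
FROM tempdb.sys.database_files;

-- Set the log file to a size larger than the 1,024MB allowed by the wizard
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 4096MB);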

Tuesday, March 14, 2017

Which protocol has been used for my SQL Server connection?

SQL Server uses three protocols for the communication between the client and the server. Initially there were four protocols, but now it supports only three: Shared Memory, Named Pipes and TCP/IP. We can enable/disable these protocols from the server end and change the priority order from the client end. Now, how do I know which protocol has been used for my connection?

We can easily see this by using the sys.dm_exec_connections dynamic management view. It shows all current connections along with the protocol used; the net_transport column is the one that shows it.

Here is a sample. The first connection, 54, was made without specifying anything additional when connecting, hence it has used Shared Memory. This protocol is used when it is enabled and the connection is made from the same machine that hosts SQL Server. The second connection, 56, has been established using Named Pipes because I forced my connection to use Named Pipes.
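
A minimal version of such a query would look like this; the net_transport column shows the protocol for each connection:

-- add WHERE session_id = @@SPID to see only the current connection
SELECT session_id, net_transport, auth_scheme, client_net_address
FROM sys.dm_exec_connections;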


How can I force the protocol when connecting via SSMS? It is simple. When connecting, if you use lpc: as the prefix for the server name, it uses Shared Memory. If you use np:, then it uses Named Pipes.


Wednesday, March 8, 2017

Adding a Hash column using HASHBYTES based on all columns to all tables

We use either Checksum or Hashbytes for generating a value that helps find changed records when we need to transfer records from one source to another and changes cannot be identified at the source end. This is especially used in data warehousing. I have written two posts on it; you can read them to get an idea of it;


I had to implement something similar with another database, but it was not at the design stage. The database was already developed and being used, and it was over 1TB. The requirement was adding a column that holds a Hash based on all existing columns. Since there were more than 300 tables, it was not practical to open the design of each table and add the column. The only option I had was to form a dynamic query that alters the table by adding the column. I had to consider a few things with the implementation;
  • All tables should be altered.
  • All columns should be used for generating the Hash.
  • Tables that have records must set Hash immediately.
  • Null must be replaced with blank because Hashbytes does not accept nulls.
Considering all of this, I wrote the following code for altering the tables. You may use the same if you have the same requirement;

-- Getting table names into a table
-- A temporary table or table variable can be used for this
SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;

-- Adding a clustered index
-- This is not required if the nof tables is low
CREATE CLUSTERED INDEX IX_TableNames ON dbo.TableNames (Id);
GO

DECLARE @Id int = 1;
DECLARE @LastId int = 0;
DECLARE @TableName varchar(500)
DECLARE @TableId int
DECLARE @Sql varchar(max)
DECLARE @Columns varchar(8000)

SELECT @LastId = COUNT(*) FROM dbo.TableNames;

-- Iterate through all tables
WHILE (@Id <= @LastId)
BEGIN

 SELECT @TableName = TableName, @TableId = TableId FROM dbo.TableNames WHERE Id = @Id;
 
 SET @Sql = 'ALTER TABLE dbo.' + @TableName;
 SET @Sql += ' ADD ';
 SET @Sql += ' MIG_HashKey AS HASHBYTES(''MD5'', ';
 
 -- get all columns, convert them to varchar
 -- and replace null with blank value
 SELECT @Columns = STUFF((SELECT '+ IsNull(Convert(varchar(4000), ' + name + '),'''')' FROM sys.columns WHERE object_id = @TableId FOR XML PATH ('')), 1, 2, '');

 SET @Sql += @Columns;
 SET @Sql += ') ';

 -- Execute the code
 BEGIN TRY
  EXEC ( @sql);
 END TRY
 BEGIN CATCH

  PRINT ERROR_MESSAGE()
  PRINT @Sql;
 END CATCH

 SET @Sql = '';
 SET @Columns = ''
 SET @Id += 1; 
END
GO
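
Once the column is added, it can be used for detecting changed records when transferring data. Here is a hypothetical sketch; dbo.Customer and stg.Customer are only illustrative names, not tables from the database described above:

-- Finding source rows whose hash differs from the staged copy
SELECT s.CustomerId
FROM dbo.Customer s
 INNER JOIN stg.Customer t
  ON s.CustomerId = t.CustomerId
WHERE s.MIG_HashKey <> t.MIG_HashKey;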

Tuesday, March 7, 2017

How to get the total row count of all SQL Server tables

I had a requirement for getting the record count of all tables in one of my client's databases, which had many tables with over 10 million records. There are many ways of getting this, hence I explored some to find the most efficient way. I analyzed many techniques using various approaches. Here are some of the ways I used and the time each took to produce the result;

  1. Using sys.partitions Catalog View - 1 second
  2. Using SELECT COUNT(*) with sp_MSforeachtable - 10 minutes
  3. Using sys.indexes and dm_db_partition_stats - 1 second

One thing we need to remember is that the database we have to work with can have tables with different structures: one can be a heap and another can have a clustered structure. Not only that, if we use Dynamic Management Objects or objects that depend on statistics, we may not get accurate output. However, the first option worked well for me; here is the code I wrote for getting the result;

SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
--INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;
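
For reference, the third option mentioned above can be written roughly like this; note that values coming from Dynamic Management Objects may not be transactionally exact:

SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) TableName
 , SUM(s.row_count) NofRecords
FROM sys.indexes i
 INNER JOIN sys.dm_db_partition_stats s
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE i.index_id < 2 AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id
ORDER BY NofRecords DESC;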


Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on the Azure Feature Pack, and they speak about how to create an HDInsight cluster and process semi-structured data using Hive. You can read them at;

With the latest release of the Azure Feature Pack, some of the settings have been changed and creating a cluster is easier with the new task than the previous one. This post discusses the way of configuring the Azure HDInsight Create Cluster Task.

In order to create an HDInsight Hadoop Cluster using SSIS, you need to make sure that the following steps have been done;
  1. Azure Feature Pack is installed.
  2. Azure Storage is created and you have the name and key of it.
  3. Azure Active Directory Application is created and linked with the subscription used.
Since the installation of the Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage account is also a very common task with almost all Azure-related implementations, hence it is not going to be discussed either. Let's start with the Active Directory Application creation.

Creating an Active Directory Application and link to the subscription
Here are the steps for creating it.
1. Login to the portal and open the Azure Active Directory blade.
2. Click on App Registration and click Add.


3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want; it can be changed later as needed, hence enter something and continue even if the domain is not registered with Azure.


4. Once the application is created, you need to take a copy of the Application ID.


5. You need to create an Application Key and take a copy of it too. Go to Keys under Settings and create one. Note the message that appears; it is better to get a copy of the key before closing the blade.


6. You need the Tenant Id for creating the cluster. Go to Properties of the Active Directory Blade and get the Directory Id; it is the Tenant Id.


7. You need two more values to be copied and kept. Go to the Subscription Blade and take a copy of your Subscription Id. The other one is the resource group used; take a copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the created application to your subscription. Open your Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on the Add button.


9. With the Add Access Blade, select the Role you need to set with the Application. You need to make sure that you select the right role for this.


10. With the Add Users Blade, search for the Application Name and select it.



At the end of these steps, you have;
  1. Application Id
  2. Application Key
  3. Tenant Id
  4. Subscription Id
  5. Resource Group Name
Let's start with the SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task
Here are the steps for doing it.

1. Create an SSIS project and a package. Drag and drop the Azure HDInsight Create Cluster Task.


2. Open the properties of the task and select <New Connection...> for AzureResourceManagerConnection. Create a connection using the copied items: Application Id, Application Key, and Tenant Id.


3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.


4. Next, fill in the following;
  • Subscription Id - enter the copied one
  • ResourceGroup - enter the resource group used for the storage.
  • Location - Select the location used for creating the storage. Everything should be in the same location.
  • ClusterName - enter the name of the cluster you need.
  • ClusterSize - enter the number of Worker Nodes you need in your cluster.
  • BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
  • Username and Password - enter the user name and password for the cluster.
  • SshUserName and Password - enter the user name and the password for accessing the cluster remotely.
Everything required is done. If you execute the package or task now, it will create the cluster for you.


Generally it takes about 10-15 minutes. You can open the portal and verify it once it is created.


Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the required operation has completed. You can use the Azure HDInsight Delete Cluster Task for that. Here are the settings for it.



Sunday, March 5, 2017

Table Value Function or Stored Procedure - SQL Server Brain Basher of the Week #067

Let's talk about a very common scenario that comes up with all solutions. If you need to implement business logic using a database object such as a View, Stored Procedure, or Function, how do you decide which is best for your requirement?

Here is the Interview Question of the week. Should you use a Stored Procedure or Table Value Function (TVF) for the given scenario?

You have a table called Customer that has a column called CreditLimit. You must write a module that accepts a CustomerId and a Credit Limit, and updates the CreditLimit column with the submitted Credit Limit. You need to make sure that exceptions are handled as well.

In order to answer this question, you should know a certain set of factors that can be used for deciding whether the logic must be implemented using a Stored Procedure or a Function. Let's see each and every factor.

The first thing you should know is that the output of a Stored Procedure cannot easily be combined with another result generated by SELECT, though INSERT...EXEC is possible. However, this can easily be done with TVFs. In addition to that, Stored Procedures allow us to return multiple result sets, but a function can return only a single result.

The other main thing is modification. You cannot modify records inside Functions, but it is possible within Stored Procedures. Also, you can form a statement dynamically and execute it inside a Stored Procedure, but this is not possible with Functions.

Exception handling is not possible with Functions, but detailed exception handling is possible with Stored Procedures.

Considering these factors, the answer is Stored Procedure because data modification and exception handling are not possible with Functions.
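
Here is a minimal sketch of such a stored procedure, assuming the table and columns described in the scenario; the procedure name and the error handling details are only illustrative:

CREATE OR ALTER PROCEDURE dbo.UpdateCustomerCreditLimit
 @CustomerId int
 , @CreditLimit decimal(16, 4)
AS
BEGIN
 BEGIN TRY
  UPDATE dbo.Customer
   SET CreditLimit = @CreditLimit
  WHERE CustomerId = @CustomerId;

  IF @@ROWCOUNT = 0
   RAISERROR('Customer not found.', 16, 1);
 END TRY
 BEGIN CATCH
  -- log or transform the error as required, then re-throw it to the caller
  THROW;
 END CATCH
END;
GO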

Saturday, March 4, 2017

Get all SQL Server tables that have IDENTITY enabled

Here is a useful script. If you need to find the tables that have the Identity property enabled, you can simply query the sys.tables Catalog View combined with the OBJECTPROPERTY function.

USE AdventureWorks2014;
GO

SELECT SCHEMA_NAME(schema_id) + '.' + name TableName 
FROM sys.tables
WHERE OBJECTPROPERTY(object_id, 'TableHasIdentity') = 1;

Remember, this OBJECTPROPERTY function can be used to check many properties related to SQL Server objects. See this MSDN page for more details on it;
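
As a quick example, other documented properties can be checked the same way:

SELECT SCHEMA_NAME(schema_id) + '.' + name TableName 
 , OBJECTPROPERTY(object_id, 'TableHasPrimaryKey') HasPrimaryKey
 , OBJECTPROPERTY(object_id, 'TableHasClustIndex') HasClusteredIndex
FROM sys.tables;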


Friday, March 3, 2017

SQL Server View does not show newly added columns

A few months back, this was experienced by one of my clients and I was asked the reason for it. Although I wanted to make a post on it, I could not and forgot, but it popped up again while I was going through notes related to my class.

How can this happen? Assume that you have created a view that references a table called Customer using a SELECT * statement. When you access the view with a SELECT statement, it returns all columns in the Customer table.

USE tempdb;
GO

CREATE TABLE dbo.Customer
(
 CustomerId int identity(1,1) Primary key
 , Name varchar(200) not null
);
GO

CREATE OR ALTER VIEW dbo.vwCustomer
AS
SELECT * FROM dbo.Customer;
GO

SELECT * FROM dbo.vwCustomer;


Next, you add a new column to the table and you expect to see all columns, including the newly added one, when retrieving records using the view.

ALTER TABLE dbo.Customer
ADD CreditLimit decimal(16, 4) not null;
GO

SELECT * FROM dbo.vwCustomer;


As you see, the result does not have the newly added column. What is the reason?

This is because of the following;
  1. We have used a SELECT * statement inside the view.
  2. The metadata of the view has not been updated/refreshed.
If you avoid SELECT * statements when you create views, you obviously will not see this issue because the view is created with a defined column set. If you have used SELECT *, then the only option is to either ALTER the view or refresh the view.

EXEC sp_refreshview 'dbo.vwCustomer';

Once the view is refreshed, the new column will appear in the result.