Monday, August 22, 2016

SQL Server Heap Structure shows more IO than expected

It is very rare to see a table with a Heap Structure as it is not a common practice, but we still find such tables in some implementations. When performing performance tuning on problematic tables or queries, one of the first checks on the checklist is whether the query produces more IO than it should, and the same check applies to a query written against a heap structure. What if it shows a significantly higher number of IOs?

This can happen because of Forwarding Pointers. When the value of a column is updated to something that requires more space and the data page has no room for it, SQL Server moves the row to a new page (or to the last page with free space) and keeps a forwarding pointer in the original location. SQL Server does this so that it does not need to update the RID (8 bytes: file number, page number, row number) stored in the non-clustered indexes created on the table. But if the table has many records and receives many such updates, it creates many forwarding pointers, and that increases the IO required for satisfying the request.

Let's check this with some simple code.

The following code creates a table and inserts a set of records;

USE tempdb;
GO

CREATE TABLE dbo.TestTable
(
 Id int identity(1,1)
 , Name varchar(200) not null
);
GO

INSERT INTO dbo.TestTable
 (Name) VALUES ('Name');
GO 1000000

This code checks the space used. As you see, the table consumes 21,800 KB, and if we divide that by 8, we get the approximate number of pages it needs to hold the records, which is 2,725. The last statement shows the pages read.

-- This shows the space used for data is 21800 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we divide by 8, it should show the approximate number of pages
-- the table should use for maintaining records.
-- 21800 / 8 = 2725
-- And this proves that our calculation is correct

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


Now let's update the table by increasing the size of the second column. Since all pages are full after the initial inserts, SQL Server moves the updated records to new pages, adding forwarding pointers to the original locations. If we do the same check after the update, you will see that our calculation says it needs only 16,630 pages, but there are 489,382 page reads.

UPDATE dbo.TestTable
 SET Name = REPLICATE('N', 200)
WHERE Id % 2 = 0;

-- This shows the space used for data is 133040 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we divide by 8, it should show the approximate number of pages
-- the table should use for maintaining records.
-- 133040 / 8 = 16630
-- But we do not see the same now

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


This will surely slow down queries. How has this happened? It has happened because of the forwarding pointers that have been set. The following code shows the number of forwarding pointers created by our update statement.
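The post shows this query only as an image; as a minimal sketch, sys.dm_db_index_physical_stats run in DETAILED mode exposes both page_count and forwarded_record_count for the heap:

SELECT index_type_desc, page_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.TestTable'), NULL, NULL, 'DETAILED');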


Have a look at the output; the 489,382 IOs are formed as Page Count + Forwarded Record Count, which is 16,630 + 472,752.

If you see a similar issue with your table, it is better to upgrade the table from a Heap to a Clustered structure. However, as a temporary measure, you can run the following code to fix it.

ALTER TABLE dbo.TestTable REBUILD;
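For the permanent fix mentioned above, creating a clustered index converts the heap into a clustered structure; here is a minimal sketch on this test table, with an example index name:

CREATE CLUSTERED INDEX IX_TestTable_Id ON dbo.TestTable (Id);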

Sunday, August 21, 2016

SQL Server Brain Basher of the Week #051 - SQL Server 2016

SQL Server 2016 has been released and many have started using it. Here is a simple question on SQL Server 2016;

True or False: SQL Server 2016 is available for both x32 and x64 architectures.

False. Although earlier versions of SQL Server supported both x32 and x64 architectures, SQL Server 2016 is supported on x64 processors only. It no longer supports x86 processors (x32 architecture).

Friday, August 19, 2016

How to Recover Deleted Tables in SQL Database [Guest Post]

This post is written by: Priyanka Chouhan
Priyanka Chouhan is a technical writer at "Stellar Data Recovery" with 5 years of experience and has written several articles on SQL. She has fine knowledge of SharePoint and SQL Server. In her spare time she loves reading and gardening.

A deleted SQL Server table can be recovered using the transaction log and the Log Sequence Number (LSN). The transaction log is responsible for logging database activities. Using the undocumented function “fn_dblog”, you can find out about unauthorized deletion of data and can recover the deleted data.

If the approximate time of the deletion is known, it is not a difficult task to recover your data. If you are not aware of the date and time of the deletion, then the real cause of the deletion has to be detected first. The initial task is to find out the exact log sequence number (LSN) under which the DELETE statement was executed, and then to recover the data up to that LSN.
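A minimal sketch of such a check, assuming the database is still online; fn_dblog is undocumented, so the operation and transaction names used here may vary between versions:

SELECT [Current LSN], Operation, [Transaction ID], [Transaction Name], [Begin Time]
FROM fn_dblog(NULL, NULL)
WHERE Operation = N'LOP_BEGIN_XACT'
 AND [Transaction Name] IN (N'DELETE', N'DROPOBJ');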

What is an LSN?
Each record in the SQL Server transaction log is identified by a unique LSN, which is associated with the point at which a substantial event occurred. This can be helpful for building an accurate restore sequence. LSNs are used internally to track the point in time to which data is restored during a RESTORE sequence. While restoring a backup, the data is restored to the LSN associated with the point in time at which the record was written.
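As a sketch of how an LSN drives a restore, a log restore can be stopped just before a given LSN with STOPBEFOREMARK; the database and backup file names below are examples, and the LSN must be supplied in the decimal 'lsn:' format that the option expects:

RESTORE DATABASE Sales FROM DISK = N'D:\Backup\Sales_Full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG Sales FROM DISK = N'D:\Backup\Sales_Log.trn'
 WITH STOPBEFOREMARK = 'lsn:112000000015600001', RECOVERY;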

Restore Deleted SQL Database Tables with Log Sequence Number
Deleted SQL Server data can be restored easily by overwriting the original database using backup files. If the DROP TABLE statement was executed accidentally and a full database backup is available, this works as long as no modifications were made after the table was dropped.

If this is not the case, deleted data can still be recovered even if:
  • the database objects have been modified after executing the DROP TABLE statement,
    or
  • a valid database backup is not available
Every DROP operation is logged to the transaction log in SQL Server. This implies that the transaction log saves the information required to roll back the transaction, provided that the database is in the full recovery model and supports point-in-time recovery up to the respective DROP TABLE statement.

Since the DROP operation is logged to the SQL Server transaction log, it is possible to roll back the operation if it is wrapped within an active transaction.

DROP TABLE Customer1
SELECT * FROM Customer1

Since the table ‘Customer1’ has been dropped, the above SQL query will result in a “Msg 208, Level 16, State 1” error message.

If the DROP operation is rolled back, as in the following SQL query, all rows existing in the Customer1 table will be returned.

BEGIN TRAN 
DROP TABLE Customer1
ROLLBACK 
SELECT * FROM Customer1

Can Dropped Tables be restored? 

If the transaction with the DROP operation is no longer active, the table Customer1 cannot be rolled back since the transaction has been committed. Therefore, the data it stored is deleted. The question that now arises is whether the data deleted by the DROP operation can be recovered even if a full backup is not available. The answer is yes. SQL tables can be recovered using third-party software even in cases where a backup is not available.

Follow the steps below to recover SQL Server deleted table data and gain access to integrated attributes with the help of Stellar Phoenix SQL Database Repair:
  1. Download, install and launch the software.
  2. The home screen will appear with a message box stating that the SQL Server must be stopped before initializing the recovery process.
  3. In the next window, select the database (MDF file) in which table has to be restored.
  4. Under Find Database column, provide the location of the folder to search the database.
  5. Alternatively, click the Find button if the database location is unknown.
  6. You can select the option to recover deleted records as well by checking the box which reads “Include Deleted Records”.
  7. Select the Repair button to initiate the procedure to recover the SQL Server deleted tables.
  8. The tool will display all the items of the database including tables, indexes, Views, Triggers, Rules, Functions, etc. on the left panel.
  9. You can check the boxes available against each item to recover the specific item. In this case, you need to select Tables.
  10. Next, a preview of all the details in the selected table, including deleted records, will be displayed on the right panel of the tool.
  11. The Electronic Registration wizard will open. Enter the Serial Number delivered to your Email ID.
  12. The tool will provide the option to save the recovered database into four file formats, namely, MSSQL, CSV, HTML, and XLS.
  13. Next, click on the Browse button to provide the destination path for storing the recovered MDF file.
  14. Next, click on OK to proceed further with the process to restore the deleted SQL Server table.
  15. You can optionally select to open the recovered database by checking the box “Automatically open the destination folder after saving”.
  16. Then, click on OK.
  17. Select the Saving option to restore the recovered tables from the MDF file either from: 
    1. New Database: Selecting this option restores the recovered data to a new database, or
    2. Live Database: Selecting this option restores the recovered data to an existing database

      Note: If you select to store recovered data to the existing database, you need to provide the SQL Server Database name and other credentials.
  18. Next, a message box will appear stating that the recovered data has been restored to the selected format.
You can now view the newly stored database or MDF file to access the recovered tables.





Thursday, August 18, 2016

NoSQL, Not only SQL, NewSQL: Are they same?

Last year, I wrote a post on NoSQL Document Databases: What is NoSQL Document Database? What is Azure DocumentDB?, which speaks about Document Databases and how they store data. Although the post explains NoSQL as well, it does not talk about other terms like Not Only SQL or NewSQL, hence I thought to make a post on it.

NoSQL in simple terms
We talk about NoSQL when we talk about different types of databases (or storage). In most cases, our projects are done with relational databases, but certain scenarios need a different structure to hold data, specifically when the data to be held is semi-structured or unstructured. As I mentioned in the previous post, it is not a new database type; it is as old as relational databases. In simple terms, a NoSQL database is a non-relational database that allows us to store mainly large amounts of semi-structured data.

NoSQL has become prevalent in recent years because;
  • Previously-ignored Big Data has become important.
    Generally, applications ignore some of the data, not because it is unimportant, but because it is difficult to process with traditional DBMSs. Large streams of data and semi-structured or unstructured data coming from various equipment are some of the examples. With the modern technologies and techniques developed, it has become easy to process them, hence previously-ignored data needs to be captured and stored. This is one of the reasons NoSQL databases have become popular.
  • Less-expensive scaled-out solutions
    Processing large datasets, specifically data without a proper structure, needs a good infrastructure solution. Initially, the required infrastructure was very expensive, but now, with powerful commodity hardware, virtualization technologies and cloud services, the cost has relatively gone down. Not only that, since some services can be taken on demand and paid for as you go, using them has become easy and affordable. This has definitely become another factor in the higher usage of NoSQL.
Not Only SQL
Although NoSQL databases are non-relational, some of these databases now support SQL as well, hence they are sometimes referred to as Not Only SQL.

NewSQL
NewSQL represents relational databases that offer NoSQL capabilities. They offer similar scalability and allow us to work with semi-structured and unstructured data as we do with NoSQL databases.

You can see the types of NoSQL databases available at this link: http://nosql-database.org/



Sunday, August 7, 2016

SQL Server Brain Basher of the Week #050 - SSRS Client Printing

Here is the weekly question on SQL Server for improving your knowledge for facing interviews. This time, let's talk about an area of Reporting Services.

SQL Server Reporting Services is a great platform for all types of reporting solutions, and its Web Portal (formerly Report Manager) is one of the tools for managing deployed reports. The Web Portal allows you to view and manage published reports, and it allows you to print viewed reports like below;


Now the question is: How can you disable the printing facility from the portal?

If you have a requirement like that, it can be done with the Reporting Services properties using Management Studio. Simply connect to Reporting Services using Management Studio and open its properties by right-clicking the instance. There is a property called EnableClientPrinting that controls this functionality; you can set it to False to disable client printing.

Once disabled, the Print icon will not appear.



Saturday, August 6, 2016

The shared dataset definition is not valid. Details: The required attribute 'Name' is missing

Are you experiencing this error when trying to deploy a shared dataset? 


This has been mentioned in several forums and is considered a bug. I searched for a permanent solution but could not find one, therefore this is what you have to do if you experience the same;

    1. Open the project folder and locate the Shared DataSet.


    2. Open the shared dataset using a Text Editor and add the Name Attribute as below.


    3. Now deploy and see. You should be able to do it.

Thursday, August 4, 2016

Generate aggregates at different levels in the same query without using subqueries

While analyzing a query written for a report, I noticed that it had been written for getting aggregated values on the same column but at different levels. It had been written using subqueries and it was taking an unexpectedly long time to load data. I had to analyze and optimize it, and this is what I did.

It was something similar to the query below;

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

SELECT YEAR(OrderDate) SalesYear
 , DATEPART(q, OrderDate) SalesQuarter
 , MONTH(OrderDate) SalesMonth
 , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) 
   AND  DATEPART(q, o2.OrderDate) =  DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount
FROM Sales.Orders o1
 INNER JOIN Sales.OrderLines ol1
  ON o1.OrderID = ol1.OrderID
GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
ORDER BY 1, 2, 3;

You get a result like the image below. As you see, it shows Year, Quarter, Month and then sales for the Month, Quarter and Year.


I cleaned up the cache with the following code and executed the query to see some statistics. It takes 4 seconds and it reads more than 14,000 pages from the Orders table and more than 91,000 pages from the OrderLines table.

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;


The quickest solution that came to my mind was a CTE, hence I applied it to the query as below.

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

WITH YearlySales (SalesYear, YearlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate)
),
QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate)
),
MonthlySales (SalesYear, SalesQuarter, SalesMonth, MonthlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
)
SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MonthlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount
FROM YearlySales y
 INNER JOIN QuarterlySales q
  ON y.SalesYear = q.SalesYear
 INNER JOIN MonthlySales m
  ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter
ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;

And when I tested the performance the same way, as you see, it takes only 1 second and a very small number of pages are read.


A simple lesson we can learn from this is that we have to avoid subqueries as much as possible, specifically correlated subqueries, because the cost is very high. In addition to that, we should use modern ways of getting the results efficiently.
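One of those modern ways, not used in the original post, is a window aggregate over a single grouped result; here is a minimal sketch against the same WideWorldImporters tables:

SELECT SalesYear, SalesQuarter, SalesMonth
 , MonthlySalesAmount
 , SUM(MonthlySalesAmount) OVER (PARTITION BY SalesYear, SalesQuarter) AS QuarterlySalesAmount
 , SUM(MonthlySalesAmount) OVER (PARTITION BY SalesYear) AS YearlySalesAmount
FROM
 (
  SELECT YEAR(OrderDate) SalesYear, DATEPART(q, OrderDate) SalesQuarter, MONTH(OrderDate) SalesMonth
   , SUM(ol.UnitPrice * ol.Quantity) MonthlySalesAmount
  FROM Sales.Orders o
   INNER JOIN Sales.OrderLines ol
    ON o.OrderID = ol.OrderID
  GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
 ) AS m
ORDER BY SalesYear, SalesQuarter, SalesMonth;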

Wednesday, August 3, 2016

Arithmetic overflow error converting expression to data type int - It works for larger number but not for a smaller number

One of my colleagues sent me some simple code that has two lines. The first line works without any issue but the second throws an error. This is the code;

SELECT 9*10000000000000000
SELECT 9*1000000000

This is the output;


What is the issue? The first calculation is much larger than the second one, but it works. The reason is how SQL Server evaluates the expression and decides the data types. If we consider the second line, the largest value in the expression is 1,000,000,000, which is lower than the highest value that int can hold. The maximum number int can hold is 2,147,483,647. Since both 9 and 1,000,000,000 fall into the int range, it sets the data type of the return value as int as well. However, when 1,000,000,000 gets multiplied by 9, the result exceeds that range. That is the reason for the error.

With the first line, 10,000,000,000,000,000 is considered a bigint value and the result is typed as bigint as well. Therefore it works without any issue.

If we write the query as below, it will work without any issue.

DECLARE @a bigint
SET @A = 10000000000000000
SELECT  9*@A

SET @A = 1000000000
SELECT  9*@A
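Alternatively, as a minimal sketch, casting one operand to bigint inside the expression gives the whole expression a bigint result and avoids the overflow:

SELECT 9 * CAST(1000000000 AS bigint);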

Tuesday, August 2, 2016

SQL Server Variable Assignment - SET or SELECT?

Declaring variables and assigning values in different ways is something we see as very common code in modules like stored procedures and functions. It is possible to assign a value at the declaration, or a value can be assigned after the declaration using either SET or SELECT. A question was raised on this: which is best, and which gives better performance, assigning the value using SET or SELECT? This is what I explained;

Let's see the differences one by one;
1. Only one variable can be assigned a value with a single SET statement, but multiple variables can be assigned values with a single SELECT statement.

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Need two SET statements for assigning values
SET @Variable01 = 100; SET @Variable02 = 200; 
SELECT @Variable01, @Variable02;

-- Assigning values using a single SELECT
SELECT @Variable01 = 300, @Variable02 = 300;
SELECT @Variable01, @Variable02;
GO


2. Both SET and SELECT support assigning values using a query. Just like the number (1), SELECT can be used for setting multiple variables.

USE WideWorldImporters;
GO

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Assigning a value using a query, this works as expected
SET @Variable01 = (SELECT COUNT(*) FROM Sales.Customers); 
SELECT @Variable01;

-- Assigning values using a query, this works as expected
SELECT @Variable01 = COUNT(*), @Variable02 = AVG(CustomerID) FROM Sales.Customers

SELECT @Variable01, @Variable02;
GO


3. When assigning values using a query, if the query returns more than one record, SET returns an error, whereas SELECT assigns the variable the value of the last row returned.

USE WideWorldImporters;
GO

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Assigning a value using a query, this query returns more than one value
-- , hence SET throws an error
SET @Variable01 = (SELECT CustomerID FROM Sales.Customers); 
SELECT @Variable01;

-- Assigning a value using a query, this query returns more than one value
-- , but SELECT ends up with the value of the last row returned, without throwing an error
SELECT @Variable01 = CustomerID FROM Sales.Customers

SELECT @Variable01, @Variable02;
GO


4. When assigning values using a query, if the query returns no records, the variable is set to NULL with the SET statement, but the SELECT statement keeps the old value unchanged.

USE WideWorldImporters;
GO

-- Declaring variables, assigning values
DECLARE @Variable01 int = 0;
DECLARE @Variable02 int = 0;

-- Assigning a value using a query
-- This query does not return any record, hence variable becomes NULL
SET @Variable01 = (SELECT CustomerID FROM Sales.Customers WHERE BuyingGroupID = 100 ); 

-- Assigning a value using a query
-- This query does not return any record, but initial value will not be replaced with a NULL
SELECT @Variable02 = CustomerID FROM Sales.Customers WHERE BuyingGroupID = 100;

SELECT @Variable01, @Variable02;

5. One more point: remember that SET is the ANSI standard for assigning variables, but SELECT is not.

Sunday, July 31, 2016

Dimension Loading: Source sends same record with different Source ID

An interesting question was raised during my lecture today, and it was related to dimension loading. The question was: how can we identify and update a dimension record that already exists if the key of the entity has been changed in the source?

Let me explain the question in more detail. Assume that we have a dimension named DimCustomer and it is loaded with the following two records;


And with the next load, what if we get some new records (the last three records: 150, 151, 152) and Id 151 is actually the old record which had an Id of 100? This can happen with various business scenarios, and, since the source id is new, we would normally consider it a new record without checking.


This has to be checked; we need to make sure that the SourceKey of the existing dimension record is updated with the new SourceId (CustomerId) instead of inserting a new record and duplicating this customer: Jane Knight. How do we do it with our SSIS package?

Generally, if we cannot detect changes in the source, we get all records from the source into the ETL package and then check for new records using a Lookup configured against the data warehouse. With a scenario like this, what we have to do is, once new records are identified, use a Fuzzy Lookup to check whether similar records already exist for the records found as "New". We could use exact matching (an equi-join) without using a Fuzzy Lookup, but there can be slight changes which cannot be found using exact matches. If we find some matching rows, we need to update them instead of inserting.

Have a look at the data flow in this ETL package;


This package extracts Customers from the source using the Get Customers from the source component. I have used the query shown above for that. Then the Lookup: Lookup - Find new Customers is used for matching against existing records in DimCustomer and finding new customers. The Lookup No Match Output gives us the new records.


Next is the Fuzzy Lookup for finding whether records identified as "New" already exist as old records. For this, we have to connect again to the data warehouse and look for similarities on Customer Name and Town. If a match exists, we need to get the existing CustomerKey as CurrentCustomerKey. Note that the Similarity Threshold is set to 95%, not to 99%.


If we get a value for CurrentCustomerKey and the similarity is close to 100%, we can consider the record an old record even though we get it as a new record from the source. The added Conditional Split checks this and allows us to create two flows for "real new" records and "old but new" records.


After that, we can have two destinations: one for inserting new records, and one for updating old records with the new CustomerId. For the update, as you see, a Script Component is used;

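The script itself is shown only as an image in the original post. As a sketch of the same logic, and not the original script, the update could also be done with an OLE DB Command running a parameterized statement like this, where the two parameters map to the source CustomerId and the CurrentCustomerKey returned by the Fuzzy Lookup:

UPDATE dbo.DimCustomer
 SET SourceKey = ?   -- new CustomerId coming from the source
WHERE CustomerKey = ?; -- CurrentCustomerKey returned by the Fuzzy Lookup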

When running the package, as you see, 4 records come from the source and three records are identified as new records. The Fuzzy Lookup along with the Conditional Split gives 2 records as new and 1 record as a new record that is related to an old record. Therefore the package inserts two new records and updates one record. Here is the result now;






Saturday, July 30, 2016

SQL Server Brain Basher of the Week #049 - SSIS Package Store

Let's talk about Integration Services this week for the Brain Basher. SQL Server supports two deployment models, named the Package Deployment Model and the Project Deployment Model. The Package Deployment Model was the initial model available with SQL Server, and it allows us to deploy and manage packages individually. The Project Deployment Model was introduced with SQL Server 2012, and it allows us to deploy all packages as a single unit. This week's question is based on the old model: the Package Deployment Model.

What is SSIS Package Store?

When deploying with the Package Deployment Model, individual packages can be deployed either to the msdb database or to the file system. There are multiple ways of deploying individual packages. One way is importing packages using Import Package after connecting to Integration Services in SSMS. When importing using Import Package, if you select File System as the destination instead of the MSDB database, the package is saved in the SSIS Package Store.
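As a side note, packages that are imported into the MSDB database rather than the Package Store can be listed with a simple query; here is a sketch against the msdb.dbo.sysssispackages table:

SELECT name, description, createdate
FROM msdb.dbo.sysssispackages;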

The path of the SSIS Package Store is C:\Program Files\Microsoft SQL Server\{version}\DTS\Packages. Let's deploy a package to SQL Server / File System and see how it goes to SSIS Package Store.

Let's open SSMS and connect with Integration Services;


Once connected, right-click on File System and select Import Package; you need to select the package for importing (or deploying).


Once imported, you can have a look at the relevant installation folder; you should see the package deployed.


Should we use this or not? It depends; I prefer the File System, not SQL Server / File System or SQL Server / MSDB. The biggest issue with this is that there are no instance-specific folders in this location if you have multiple instances of the same version. You may encounter some issues with security as well. Therefore, for the older way, the File System is better, and if possible, go for the latest way, which is the Project Deployment Model.



Friday, July 29, 2016

Fact Loading: Switching a loaded temporary table into an empty partition of the fact table

Fact tables in data warehouses are almost always partitioned, and in most solutions they are partitioned on a date key. Generally, we use SSIS for loading dimensions and fact tables, and the ETL written for loading the Fact Table can be optimized by loading data into another table and switching it into an empty partition of the fact table. This is one of the best practices in data warehouse data loading, and the technique for doing it is not complex.

Assume you have a Fact Table that is partitioned by months, for example, one partition for 201601, another for 201602, and so on, and data is loaded up to July 2016 (201607). Then, when we load August data, we can load the data into a new table that has the same structure and switch that table into the 201608 partition of the fact table, optimizing the data load and minimizing the impact on the fact table. The diagram below shows it;


Here is the way of doing it. I will take the same Fact Table created in this post: SSIS - Fact Loading with Dimension Type II and Dimension Type I. You can use the same code for creating the dimension tables and the fact table for trying this out. In that post, I loaded data related to June and July 2016. Let's see how we can load August data using the technique discussed in this post.

1. Let's take this data set for loading August data.

SELECT CONVERT(date, '2016-08-01') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-12') As TransactionDate, 23 AS ProductId, 101 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-14') As TransactionDate, 23 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-22') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity

2. We need a different table with the same structure for loading processed data. The code below creates another table for holding data temporarily. Note that it has the same columns and is partitioned the same as the FactSales table.

CREATE TABLE FactSalesTemp
(
 DateKey int NOT NULL INDEX IX_FactSalesTemp CLUSTERED,
 ProductKey int NOT NULL,
 CustomerKey int NOT NULL,
 EmployeeKey int NOT NULL,
 SalesAmount money NOT NULL,
 SalesQuantity smallint NOT NULL
)
ON ps_SalesDate (DateKey)
GO

3. I will be using the same SSIS package used in my previous post. As you see, now the destination is set to the newly created table, which is FactSalesTemp.


4. Before loading data into the new table, let's add another partition to both tables for August data. All we need to do is add a filegroup to the scheme and a boundary value to the function;

-- Add another file group to the scheme 
ALTER PARTITION SCHEME ps_SalesDate
NEXT USED [PRIMARY]
GO

-- splitting the last partition by adding another boundary value
ALTER PARTITION FUNCTION pf_SalesDate ()
SPLIT RANGE (20160801)
GO

5. Now we can execute the SSIS package and load data into the newly created table. Once the data set is loaded into FactSalesTemp, we can check both tables and see how the partitions are filled.

How partitions are filled;
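The post shows this with a screenshot; as a sketch, a query like the following lists the partition numbers and row counts of both tables:

SELECT OBJECT_NAME(p.object_id) AS TableName, p.partition_number, p.rows
FROM sys.partitions p
WHERE p.object_id IN (OBJECT_ID('dbo.FactSales'), OBJECT_ID('dbo.FactSalesTemp'))
 AND p.index_id IN (0, 1)
ORDER BY TableName, p.partition_number;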


How tables are filled;


6. As you see, data is loaded into the newly created table and the rows are in partition 9. Now we need to switch the new table into the 9th partition of FactSales. Here is the way of doing it.

ALTER TABLE dbo.FactSalesTemp 
SWITCH PARTITION 9 TO dbo.FactSales PARTITION 9;

7. Now if you check the records in the tables, you will see that FactSales is loaded with the new data, and it is loaded into the correct partition.


You can have the scheme and function alterations in the Control Flow as an Execute SQL Task before the Data Flow Task, and you can have the partition switch in another Execute SQL Task just after the Data Flow Task to complete the SSIS package.

Thursday, July 28, 2016

SSIS - Fact Loading with Dimension Type II and Dimension Type I

Dimensional Modeling in data warehousing, using either a Star or Snowflake schema, is still the most popular and widely used structure, though summarized tables using in-memory and columnstore concepts are slowly taking over the traditional method. It is very common to see SSIS (SQL Server Integration Services) used for data loading, applying various patterns. Searches show different types of articles and posts, but it is rare to find a complete post that talks about loading a Fact table using SSIS while looking up Type I and Type II dimensions. That is the reason for this post; it talks about how to load a Fact Table, getting keys from both Type I and Type II dimensions.

Let's make a small data warehouse with a Star Schema and populate some sample records. The following code creates a database called SalesDataWarehouse and then creates dimension tables named DimDate, DimCustomer, DimProduct, and DimEmployee. As you see, DimEmployee is a Slowly Changing Dimension Type II dimension.

CREATE DATABASE SalesDataWarehouse;
GO

-- Connect and create tables
USE SalesDataWarehouse;
GO

CREATE TABLE dbo.DimDate
(
 DateKey int  primary key
 , [Date] date not null
 , [Month] varchar(20) not null
 , MonthNumber smallint not null
 , QuarterName varchar(20) not null
 , QuarterNumber smallint not null
 , [Year] smallint not null
);

CREATE TABLE dbo.DimCustomer
(
 CustomerKey int identity(1,1) primary key
 , FullName varchar(100) not null
 , Town varchar(100) not null
 , SourceKey int not null
);

CREATE TABLE dbo.DimProduct
(
 ProductKey int identity(1,1) primary key
 , Product varchar(100) not null
 , Brand varchar(100) not null
 , SourceKey int not null
);

CREATE TABLE dbo.DimEmployee
(
 EmployeeKey int identity(1,1) primary key
 , Name varchar(100) not null
 , SalesTown varchar(100) not null
 , StartDate date not null
 , EndDate date  null
 , IsCurrent bit not null default (1)
 , SourceKey int not null
);

The next code is for loading some sample records. Note that DimDate is loaded from the WideWorldImportersDW database, as its date dimension table contains records for the current year. However, this can be loaded using a simple script as well.

Have a look at the records related to DimEmployee. Since it is an SCD Type II dimension, it holds history records. As you see, there are two records for James Phil, with validity periods.

-- Inserting date records from the WideWorldImportersDW date dimension
INSERT INTO dbo.DimDate
 (DateKey, [Date], [Month], MonthNumber, QuarterName, QuarterNumber, [Year])
SELECT CONVERT(int, CONVERT(varchar(8), [Date], 112)), [Date], [Month], [Calendar Month Number]
 , 'Quarter ' + CONVERT(varchar(2), DATEPART(Q, [Date])), DATEPART(Q, [Date]), [Calendar Year]
FROM WideWorldImportersDW.Dimension.Date;

-- Inserting records to Customer
INSERT INTO dbo.DimCustomer 
 (FullName, Town, SourceKey)
VALUES
 ('Jane Knight', ' Town 1', 100)
 , ('Jack walter', 'Town 2', 101);

-- Inserting records to Product
INSERT INTO dbo.DimProduct 
 (Product, Brand, SourceKey)
VALUES
 ('Product 1', ' Brand 1', 22)
 , ('Product 2', 'Brand 3', 23);

-- Inserting records to Employee
INSERT INTO dbo.DimEmployee
 (Name, SalesTown, StartDate, EndDate, IsCurrent, SourceKey)
VALUES
 ('Peter Kevin', 'Town 1', '2016-06-01', null, 1, 200)
 , ('James Phil', 'Town 2', '2016-06-01', '2016-06-30', 0, 201)
 , ('James Phil', 'Town 3', '2016-06-30', null, 1, 201);


The next code is for creating the Fact table. Generally, a Fact Table is a partitioned table, so this code creates a partition function and scheme for it. Note that the best practice is to have different filegroups for partitions, but let's just use PRIMARY (I will be discussing partition switching on fact loading in another post, extending the same code).

-- creating the partition
CREATE PARTITION FUNCTION pf_SalesDate (int)
AS RANGE RIGHT
FOR VALUES (20160101, 20160201, 20160301, 20160401, 20160501, 20160601, 20160701)
GO

-- creating the scheme
CREATE PARTITION SCHEME ps_SalesDate
AS PARTITION pf_SalesDate
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY])
GO

CREATE TABLE FactSales
(
 DateKey int NOT NULL INDEX IX_FactSales CLUSTERED,
 ProductKey int NOT NULL,
 CustomerKey int NOT NULL,
 EmployeeKey int NOT NULL,
 SalesAmount money NOT NULL,
 SalesQuantity smallint NOT NULL
)
ON ps_SalesDate (DateKey)
GO

Let's insert some sample records into the Fact Table too.

INSERT INTO FactSales (DateKey, ProductKey, CustomerKey, EmployeeKey, SalesAmount, SalesQuantity)
VALUES
 (20160605, 1, 1, 1, 500, 2)
 , (20160615, 2, 2, 1, 500, 2)
 , (20160622, 2, 2, 2, 500, 2)
 , (20160624, 2, 1, 1, 500, 2)

Done. Now we have to create an SSIS package for loading new records. As you noted, we have records related to June 2016, and we will be loading records related to July 2016. Let's use the following recordset for loading the Fact Table.

SELECT CONVERT(date, '2016-07-01') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-07-12') As TransactionDate, 23 AS ProductId, 101 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-07-14') As TransactionDate, 23 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-07-22') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity


Let's create an SSIS package and add a connection to the data warehouse. I have a database named Sales and will be using it as the source, even though we do not actually extract records from it. As you see, I have a Data Flow Task in the Control Flow.


Since we do not get surrogate key values from the source, we need Lookups against the dimension tables for getting them. As you see in the image below, I have added an OLE DB Source to connect to Sales and four Lookups for getting surrogate key values related to Date, Customer, Employee and Product.


Let's have a look at the source first. It is configured with the Sales database, and the query above is used for populating the required records.


There are three Type I dimensions. Let's have a look at one and see how it needs to be configured. In order to get DateKey from the DimDate dimension, we need to look up DimDate using the TransactionDate that comes from the source. We need to configure the lookup with the DataWarehouse connection and select DimDate as the lookup table. Once the connection is set, we need to link TransactionDate coming from the source with Date in the dimension. We need to select DateKey as it is the one we need when we insert records into the Fact Table. The selected DateKey is added to the flow automatically.


A similar configuration is required for DimCustomer and DimProduct for getting surrogate key values. We need to link the CustomerID of the source to SourceKey in the DimCustomer table and get the CustomerKey, and link the ProductID of the source to SourceKey in the DimProduct table and get the ProductKey. However, getting EmployeeKey is different as it is an SCD Type II dimension. Let's have a look at the configuration of the Employee Lookup.

I will take you through the steps required for configuring this. First we need to make sure that Cache mode is set to either Partial Cache or No Cache because we are going to parameterize the lookup query.


Next is setting up the connection. We need to connect it to the DataWarehouse and use a query for loading dimension records. Note the way it has been written: I have used question marks for parameterizing the query, as we need to pick the right record related to the TransactionDate.
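The query itself appears only as a screenshot in the post; as a sketch of what such a parameterized query could look like for this DimEmployee table, with both question marks mapped to TransactionDate and the SourceKey match handled by the column mapping:

SELECT EmployeeKey, SourceKey, StartDate
FROM dbo.DimEmployee
WHERE ? >= StartDate
 AND (? < EndDate OR EndDate IS NULL);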


Just like the other dimensions, we need to match columns from the source and the dimension to find the record. However, an additional link is required for getting values for the parameters, hence we need to link TransactionDate with StartDate even though that is not the exact link we need.


You will see the reason for joining TransactionDate and StartDate in the next step. We need to adjust the lookup query now; this is done in the Advanced section. It is required to adjust the query like below.


Once adjusted, we need to set the parameters like below. It allows us to select TransactionDate as a parameter value because of the join above. Note that parameters are 0-based indexed.


All done; finally it needs the destination. Set an OLE DB Destination for loading FactSales as the last step of the Data Flow task. Once all configurations are done, you can run the package and see how FactSales has been loaded.




Wednesday, July 27, 2016

What is Table Value Constructor?

I am sure that many have used this without knowing the term. However, it is always better to know the term before applying it, because that makes sure the implementation is done in the right way.

The Table Value Constructor allows us to specify a set of row value expressions that is constructed into a table. This makes it possible to have a set of row value expressions in a single DML statement and use it in the form of a table. Yes, this is what we use with the INSERT statement when we need to insert multiple records into a table using a single INSERT statement.

The Table Value Constructor can be used with the INSERT statement, the MERGE statement (in the USING clause) and in the definition of a derived table. Here is an example;

CREATE TABLE dbo.Customer
(
 CustomerID int identity(1,1) primary key
 , CustomerCode char(5) not null
 , Name varchar(200) not null
 , DateOfBirth date not null
);
GO

-- Inserting multiple rows with
-- Table Value Constructor
INSERT INTO dbo.Customer
 (CustomerCode, Name, DateOfBirth)
VALUES
 ('CUS01', 'Jane', '1998-01-04')
 , ('CUS02', 'Knight', '2000-05-23')
 , ('CUS03', 'Sam', '1997-2-11')

-- Using Table Value Constructor
-- with USING clause
MERGE INTO dbo.Customer t
USING (VALUES
  ('CUS01', 'Jane', '1998-02-04')
  , ('CUS04', 'Knight', '2000-05-23')) 
  AS s (CustomerCode, Name, DateOfBirth)
 ON t.CustomerCode = s.CustomerCode
WHEN MATCHED THEN
 UPDATE SET Name = s.Name, DateOfBirth = s.DateOfBirth
WHEN NOT MATCHED BY TARGET THEN
 INSERT (CustomerCode, Name, DateOfBirth)
  VALUES (s.CustomerCode, s.Name, s.DateOfBirth);

-- Using Table Value Constructor as a derived table.
SELECT Customer.CustomerCode, Customer.Name, CustomerCountry.CountryCode
FROM dbo.Customer 
 INNER JOIN (VALUES ('CUS01', 'USA'), ('CUS04', 'LK')) AS CustomerCountry (CustomerCode, CountryCode)
  ON Customer.CustomerCode = CustomerCountry.CustomerCode

Tuesday, July 26, 2016

Result shows asterisk when converting int to varchar

I was sent some code by an engineer saying that he gets an odd error with one of the calculations written. He gets the following error;

Conversion failed when converting the varchar value '* ' to data type int.

By looking at the error, I realized the issue: it is related to truncation. He converts a numeric value to a varchar variable and then uses the converted varchar value in another calculation, converting it back to int. This calculation causes the error because the previously converted value contains an asterisk, not the numeric value that was used for the initial conversion.

How can this happen? This happens when you convert an expression to a character or binary data type that cannot hold the result. If the result cannot be accommodated by the target data type, SQL Server either truncates the value, partially displays it, or throws an error.

See this code;

DECLARE @x int = 1000;
DECLARE @y char(2);

SET @y = @x;
SELECT @y;

When the value 1000 is converted to char(2), the result becomes an asterisk because the target data type cannot hold 1000. The asterisk indicates that the value has been truncated.
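A minimal sketch of avoiding the truncation is simply sizing the target so it can hold the largest possible value; for int, varchar(11) covers the full range including the sign:

DECLARE @x int = 1000;
DECLARE @y varchar(11);

SET @y = @x;
SELECT @y;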


Here are some known results when converting to an expression of a different data type.


Sunday, July 24, 2016

SQL Server Brain Basher of the Week #048 - Dropping Clustered Index

SQL Server mainly maintains two structures for tables: Heap and Clustered. If there is no clustered index created on the table, the table has a Heap Structure; otherwise it has a Clustered structure.

Once a Clustered Index is created, SQL Server organizes data in the form of a B-Tree, and the data pages related to the table are held at the last level, which is called the Leaf Level. Since the Clustered Index uses the data pages, it is not possible to have another Clustered Index on the same table. Now the question is;

What will happen when the Clustered Index is dropped?

Many think that it is not possible because it holds the data pages related to the table. But it is possible. If you drop the index, the table goes back to a Heap and will be maintained as a Heap. Here is code that shows it;

-- creating a test table
CREATE TABLE dbo.TestTable
(
 TestTableId int index ix_TestTable_Id clustered
 , TestTableValue1 varchar(100) not null
 , TestTableValue2 char(5) not null index ix_TestTable_Value2 nonclustered
);
GO

-- Inserting test records
INSERT INTO dbo.TestTable 
VALUES 
(1, 'Hello world', 'ABC01')
, (2, 'Hi', 'ABC02')

-- Check indexes available
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.TestTable'), NULL, NULL, 'DETAILED');


-- Dropping the clustered index
DROP INDEX ix_TestTable_Id ON dbo.TestTable;

-- Checking the indexes and data after dropping 
SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.TestTable'), NULL, NULL, 'DETAILED');
SELECT * FROM dbo.TestTable;


Saturday, July 23, 2016

Install SQL Server with just three clicks - The SQL Server Basic Installer

We have been installing SQL Server for a long time, across different versions and different editions. Everyone is familiar with the installation wizard and knows that it needs to go through various steps, windows and multiple clicks. If you need an installation that addresses many database requirements, then of course you need to go through almost all the installation wizard pages and customize them as per the requirements. But if it is for a simple database and we do not need many features installed, then it can feel like too many clicks and too many pages.

What if we can install SQL Server with just three clicks?

Yes, now it is possible. This is called the SQL Server Basic Installer, and it allows you to get the database engine and client connectivity SDK installed without going through many screens. This is currently available for the following editions;

Let's test this with one of the editions. Click on the link given above for the Express Edition and get the SQLServer2016-SSEI-Expr.exe file downloaded. Start the installation; this is the first screen you see with it;


You have three options for continuing. Since we need a simple installation, let's select the first option: Basic. This is the first click.

You cannot skip accepting the license terms; you get the usual screen and click on Accept. That is your second click.


The next screen is for specifying the installation location. You do not need to do anything because the default location is already selected. Here is your third click: click on Install to continue.


This starts the installation, different screens appear;
Downloading....


Installing...


And finally the screen showing that the installation is completed. During the download, you will see various information on samples, forums, etc., that will be helpful for learning more about SQL Server.


This screen educates you on the connection string used for connecting to the instance, the Instance ID, the default administrator, installed features, etc. It allows you to launch SQLCMD for connecting and testing the installation. In addition to that, if you want to get SQL Server Management Studio, you can click on Install Tools to get it.

As you noticed, it is just three clicks; try it and see.