Wednesday, October 28, 2015

Loading data from Microsoft SQL Server to DocumentDB in Azure

With my previous post (How to create a DocumentDB in Azure), I mentioned that there are multiple ways of loading documents into DocumentDB in Azure. Continuing from it, let's see how we can load data from SQL Server into DocumentDB using the Data Migration Tool. The Data Migration Tool is a free, open-source tool that can be used for loading data into DocumentDB from many different sources. It converts loaded data into hierarchical JSON documents and supports loading data from Microsoft SQL Server, MongoDB, JSON, CSV, Azure Table Storage and many more.

The Data Migration Tool can be downloaded from the Microsoft Download Center. However, there is an easier way; a link to the download page is available on the DocumentDB account blade.

Go to the page and download the compressed file. Once the files are extracted, you will see two executable files: dt.exe and dtui.exe.

Click dtui.exe to open the tool. As usual, a welcome screen appears, followed by a screen for selecting the source. Since we are trying to load data from SQL Server, select SQL under Import from. You need to add the connection string manually (you can get it generated by creating a UDL file); make sure it is added without the provider.
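As a rough illustration, a SQL Server connection string without the provider prefix looks like the one below; the server name and database are placeholders for this example:

```
Data Source=MyServer;Initial Catalog=AdventureWorksDW2014;Integrated Security=True;
```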

Let's try to load data from the AdventureWorksDW2014 database. This is the query for loading customers with their locations.

USE AdventureWorksDW2014;

SELECT
 c.CustomerKey AS [key]
 , c.FirstName + ' ' + c.LastName AS [Name]
 , c.AddressLine1 AS [Location.Street]
 , g.City AS [Location.City]
 , g.StateProvinceName AS [Location.Province]
FROM dbo.DimCustomer c
 INNER JOIN dbo.DimGeography g
  ON c.GeographyKey = g.GeographyKey;

Examine the query. Note the way the aliases have been set, specifically for columns related to the address. If you need a hierarchy inside the document, this is how you instruct the tool to create it. As per this query, a document will contain properties such as key, Name and Location, and Location will have three more properties: Street, City and Province.

Once the query is in place, enter a dot (".") as the Nesting Separator to instruct the tool that the dot is used for separating nested properties.
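To illustrate, a row returned by the query above would become a document shaped like the following (the values are taken as a representative example, so treat them as illustrative):

```json
{
  "key": 11000,
  "Name": "Jon Yang",
  "Location": {
    "Street": "3761 N. 14th St",
    "City": "Rockhampton",
    "Province": "Queensland"
  }
}
```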

Export to
The next screen is for setting the target. There are multiple options under Export to. Since this loads only 18,484 documents, we can use DocumentDB - Bulk Import rather than the other options such as DocumentDB - Sequential record import and JSON file. Bulk Import creates a stored procedure in the collection and uses it for loading documents. The second option, DocumentDB - Sequential record import, is recommended for large sets of documents because it is faster in that case. The third option, JSON file, does not load documents to DocumentDB but creates JSON files instead.
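For reference, the same import can also be scripted with the command-line version of the tool, dt.exe. The sketch below is based on the tool's documented switches; all values are placeholders and the query is truncated, so check dt.exe /? for the exact syntax of your version:

```
dt.exe /s:SQL ^
  /s.ConnectionString:"Data Source=MyServer;Initial Catalog=AdventureWorksDW2014;Integrated Security=True;" ^
  /s.Query:"SELECT c.CustomerKey AS [key], ... FROM dbo.DimCustomer c ..." ^
  /s.NestingSeparator:. ^
  /t:DocumentDBBulk ^
  /t.ConnectionString:"AccountEndpoint=https://myaccount.documents.azure.com:443/;AccountKey=<key>;Database=db01;" ^
  /t.Collection:Customer
```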

Connection string
Setting the connection string is easy. All we have to do is copy the primary connection string from Keys in the DocumentDB account blade and add the database name to it. Note that the database name has to be added manually.
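After appending the database name, the target connection string should look roughly like this; the endpoint and key below are placeholders:

```
AccountEndpoint=https://myaccount.documents.azure.com:443/;AccountKey=<your-primary-key>;Database=db01;
```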

We can load documents into an existing collection or create a new one. If the collection we specify on the screen does not exist, the tool will create one for us with the selected performance tier.

Id field
If you leave the Id input box blank, the tool will create an Id for each document; otherwise we can instruct it to use one of our columns for that. As you see, I have instructed it to use the key column for the Id of the documents.

Indexing policy
We can add a policy that describes how we search documents and what type of index we prefer. We do not need to write it by hand; we can select the required policy by right-clicking on the input box to get the menu and selecting one. There are three options;
  1. Default - Efficient indexing only for numeric values, supporting equality and range checks and sorting. String searches cause a scan, and sorting on strings is not supported.
  2. Hash - Supports only equality checks for both numeric and string values, with no support for sorting.
  3. Range - Efficient indexing for both numeric and string values, including sorting, but may use extra space and resources.
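For illustration, a Range policy over all paths would be expressed as JSON similar to the sketch below. The exact schema depends on the DocumentDB API version, so treat this as an approximation rather than the exact text the tool generates:

```json
{
  "indexingMode": "consistent",
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Range", "dataType": "String", "precision": -1 }
      ]
    }
  ]
}
```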

Now it is ready for loading; go through the summary and complete it.

Once the documents are loaded, we can check in the collection whether they have been loaded as we instructed.

Tuesday, October 27, 2015

How to create a DocumentDB in Azure

DocumentDB, Microsoft's implementation of a NoSQL database, is a cloud-based, fully managed, massively scalable service for managing semi-structured data in Azure. Just like other NoSQL products, it supports schema-free JSON documents; unlike many other NoSQL platforms, it automatically indexes all properties in a document as soon as it is added. Properties in added documents are easily queryable using SQL (not exactly the SQL used for relational databases), and any property in the hierarchy is searchable.
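For example, assuming a collection of customer documents with a nested Location property (the names here are hypothetical), a DocumentDB SQL query can reach into the hierarchy like this:

```sql
-- c is an alias for the documents in the collection
SELECT c.Name, c.Location.City
FROM c
WHERE c.Location.Province = 'Queensland'
```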

This post explains creating a DocumentDB in Azure with a simple set of steps. If you are not familiar with NoSQL, read this post: What is NoSQL Document Database? What is Azure DocumentDB?

Let's create a DocumentDB. Log in to Azure with the new portal using your Azure account. Then browse for DocumentDB Accounts and click on it to open the blade related to DocumentDB creation.

Once it is opened, the first thing you need to do is create an account. Click on Add and enter a globally unique name for the DocumentDB Account.

The DocumentDB Account is created. It still does not have a database (see image); to create a database under the account, click on the Add Database button and give a name for your database. As you see, I have named it db01.

Once the database is created, you should see it in the Databases grid in the DocumentDB Account blade. Every component in DocumentDB has a ResourceID, and every DocumentDB database needs at least one Collection.

What is a Collection?
A Collection is 10GB of storage that can be created inside the database. It is not only a unit of storage; it also determines how much you pay to Microsoft. DocumentDB pricing is based on Collections, and the price changes based on the Pricing Tier assigned to the collection. Billing is on an hourly basis, and scaling up and down is done by adding or removing collections.

You can open the database blade by clicking the name of the database created and add a Collection by clicking the Add Collection button. The Add Collection blade allows you to enter the name of the collection, select the Pricing Tier, and select the Indexing Policy.

What is Request Units per Second?
As you see, a Collection can be assigned one Pricing Tier out of three: S1, S2 or S3. The storage offered with each tier is the same, but the RU allocation is different. RU, Request Units per Second, simply means how many requests can be performed against the Collection per second. They can be read requests or write requests, but the counting is not exactly the number of requests made; it is the throughput of the request, considering the usage of resources such as CPU, memory and IO operations. As I read, generally a read request for a document smaller than 1KB requires one RU, but a write request for a document of the same size requires more than one RU, as the cost of writing is higher than the cost of reading. A complex request that needs to update documents with many indexes may need more RUs. How do you determine the required RU? It is possible by looking at the dashboards given with the blade, which show all requests made and the RUs used for them.

Here is an image that explains the model:

Image was taken from DocumentDB hierarchical resource model and concepts. Read it for more info.

You can start with whichever tier you want for the Collection. Once the tier is selected, the Indexing Policy has to be selected; there are two policies, Default and Range. Default is good for properties with numbers, and Range is best for both strings and numbers, but its storage cost is higher. For this, let's create a Collection with S1 and the Range policy.

Now we have a database with a collection. All Collections appear in the database blade with their ResourceIDs and can be opened by clicking on them. There are many ways of connecting to the database, adding documents and querying documents;

  1. Using the portal itself - manually add documents one by one or upload up to 100 documents at a time.
  2. Using the DocumentDB migration tool.
  3. Programmatically, using the REST API (difficult and complex) or an SDK (like .NET; flexible and can do almost everything).
  4. Using Stored Procedures and Triggers added to the collection.
Let's use the given interface for adding and querying, for testing. Click on the Create Document button in the Collection blade (in my case the collection is Customer). This opens another blade that can be used for writing a document. As you see, an Id is added automatically, which is required; if you do not add an Id to your document, one will be created for it. Here is my document;
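A minimal document entered through the blade might look like the following; the id can be supplied explicitly or left for the service to generate, and the values here are illustrative:

```json
{
  "id": "1",
  "Name": "Jon Yang",
  "Location": { "City": "Rockhampton", "Province": "Queensland" }
}
```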

Once it is added, it can be viewed using the Document Explorer in the database blade as below.

Let's see how this can be done in other ways in the next post related to DocumentDB.

Thursday, October 15, 2015

What is NoSQL Document Database? What is Azure DocumentDB?

NoSQL databases are not something recently introduced. NoSQL was another type of model discussed alongside the relational database management system (introduced in the 1970s) whenever the disadvantages of relational database management concepts weighed heavily against traditional database management systems. It is as old as other database concepts like file-based models and relational models, and has existed since the late 1960s.

While the relational database model is considered the second-generation database model, the NoSQL database model is considered the third-generation database model. Initially, the NoSQL model was introduced for addressing object-oriented concepts within databases, calling it an Object-Oriented Database. This was not required by many solutions at the beginning and was used by few applications, but it started to be seen as another type of required backend product with modern projects related to big data, real-time applications, and databases that require heavy scaling-out implementations.

What does it exactly do?
There are different types of NoSQL databases: column databases (Cassandra), document databases (MongoDB, DocumentDB), key-value databases (Oracle NoSQL Database) and graph databases (InfiniteGraph). NoSQL does not mean No Structured Query Language; it means no relational structure, hence No-Relational would be the best name for it. The type we talk about in this post is document databases, and that does not mean it stores documents; it stores objects, hence object database would be the best name for it.

Document databases store objects with no predefined, static schema. They are flexible and designed for semi-structured documents, typically schema-free JSON. One key advantage of NoSQL is scaling out. Traditionally, relational database management systems support scaling up by adding more resources to the box, but this is limited by various factors like the OS, the motherboard used, etc. Scaling out is also possible with relational databases but is heavily limited. This is completely addressed by NoSQL databases; they are designed for scaling out, and the only boundary is the availability of resources, nothing else.

How does it store data?
If you understand how it maintains data, then you understand NoSQL document databases. We are all familiar with the relational structure; we maintain relations (as tables), entities (as rows), and attributes (as columns) in our relational databases. A NoSQL database maintains them differently: a row becomes a document and a column becomes a property.

Here is an example. The following image shows how a table is managed for Customers in a relational database.

And this is how they would be managed in a NoSQL database.

As you see, though all three documents have common properties, the addresses of customers are maintained in different ways; this is because NoSQL is schema-free.

How does Microsoft support it?
Microsoft's implementation of a document database is called Azure DocumentDB. If you have an Azure account, you can create a DocumentDB and start using it for your NoSQL requirements. Since it is completely cloud-based, you do not need to worry about the infrastructure required for it, or about setting it up and maintaining it. Let's see the way of creating and using it in the next NoSQL post.

Tuesday, October 13, 2015

Reporting Services report does not show all labels in X-Axis

Assume that you have a dataset and you have configured a clustered bar chart with it using Reporting Services. Based on the number of elements configured on the X-Axis, it might not show a label for each bar, just like below;

If you are experiencing this, the remedy is simple: open Horizontal Axis Properties and change Interval to 1.

The issue is fixed.

Monday, October 12, 2015

SQL Server Brain Basher of the Week #033 - Cartesian Product

SQL statements produce various types of results, and forming the result is based on the way the query has been written. This week's Brain Basher is based on the types of results a SQL statement produces;

What is a Cartesian Product and how can it be produced?

There are different types of JOINs that can be used for combining tables, such as INNER JOIN, LEFT OUTER JOIN and FULL OUTER JOIN. In addition to these well-known JOINs, SQL Server facilitates CROSS JOIN, which has only one logical query processing phase. It requires two tables as inputs and produces the Cartesian Product of the two tables.

A Cartesian Product (also referred to as Cross Product or Cross Join) is an operator that works on two sets. It combines records: each record (tuple) from one table (relation) is combined with all the records (tuples) from the other table (relation).

There are two ways of writing a CROSS JOIN with SQL Server: ANSI SQL-92 and ANSI SQL-89. There is no performance difference between the two; both work fine, but ANSI SQL-92 is the more common one.

USE AdventureWorks2014;

/* Producing a resultset with
product categories for each year */

-- ANSI SQL-92
SELECT
 DISTINCT YEAR(OrderDate) Year
 , Name Category
FROM Sales.SalesOrderHeader
CROSS JOIN Production.ProductCategory
ORDER BY 1, 2;

-- ANSI SQL-89
SELECT
 DISTINCT YEAR(OrderDate) Year
 , Name Category
FROM Sales.SalesOrderHeader, Production.ProductCategory
ORDER BY 1, 2;

Sunday, October 11, 2015

SQL Server 2016 CTP 2.4 is available for downloading

Microsoft has released SQL Server 2016 Community Technology Preview 2.4 and we can try it in our test environments. Many great new features have been added with this release, such as;

  • Operational analytics and in-memory data warehouse improvements
  • Tempdb scalability improvements
  • Reporting Services improvements
And the benefits we get (quoted from the site);

  • Enhanced in-memory performance provide up to 30x faster transactions, more than 100x faster queries than disk based relational databases and real-time operational analytics
  • New Always Encrypted technology helps protect your data at rest and in motion, on-premises and in the cloud, with master keys sitting with the application, without application changes
  • Built-in advanced analytics– provide the scalability and performance benefits of building and running your advanced analytics algorithms directly in the core SQL Server transactional database
  • Business insights through rich visualizations on mobile devices with native apps for Windows, iOS and Android
  • Simplify management of relational and non-relational data with ability to query both through standard T-SQL using PolyBase technology
  • Stretch Database technology keeps more of your customer’s historical data at your fingertips by transparently stretching your warm and cold OLTP data to Microsoft Azure in a secure manner  without application changes
  • Faster hybrid backups, high availability and disaster recovery scenarios to backup and restore your on-premises databases to Microsoft Azure and place your SQL Server AlwaysOn secondaries in Azure

Saturday, October 10, 2015

History of Database Management Systems

No doubt, we are all familiar with relational database management systems and their structure and components. But do you know how this all started? What is the history of Database Management Systems? Here is a brief on it; a post that shows some of my slides;

Friday, October 9, 2015

Types of Database End Users

There are distinct types of people who engage with database management systems, and End Users are well known among them. However, when referring to End Users, does it represent all people who work with the DBMS or just business users?

Theoretically, there are four types of people involved with databases and database management systems: Database Administrators, Database Designers, Application Developers and End Users. This means we are not part of the End Users group, and End Users work with databases differently.

image taken from:

There are mainly two types of End Users;
  • Naïve users
    These users have no clue about database management systems and do not know how to access the database or how to make requests from the database directly. Naïve users typically access the database through given GUIs that have readable menu items for opening Windows Forms or web pages to interact with data. A relevant example: a bank executive opens the interface given for entering customer information.
  • Sophisticated users
    This group represents people who know the structure defined in the database to some extent. They have the skills for accessing the database directly (or through another interface that requires database and SQL knowledge to use) and they make direct requests to the database for getting the required data down. Most smart business users and data stewards fall into this group. In modern database implementations, sophisticated users do not access the core database unless there are specific objects (like Views) created specifically for them; instead they access it via configured databases such as reporting databases, relational and OLAP data warehouses, or data models such as Microsoft multi-dimensional models, tabular models and models designed with Excel + Power Pivot.

Monday, October 5, 2015

SQL Server Brain Basher of the Week #032 - TSQL - Converting from one value to another

In T-SQL queries, converting data between data types is very common. Conversion can happen automatically (implicitly), or we can take control and convert as we want (explicitly). This week's Brain Basher is based on it; here it is;

What are the functions available with SQL Server 2014 for converting data between data types?

SQL Server offers multiple functions for converting data between types. Some are based on the ANSI standard and some are SQL Server specific with additional functionality. Here are four functions available for converting data.
  1. CAST
    CAST is an ANSI-standard function, hence recommended over SQL Server specific functions. It is a scalar function and returns an error if the data types are incompatible.
  2. CONVERT
    CONVERT is proprietary to SQL Server and provides additional functionality over CAST. This function accepts an optional style number for formatting the returned value. It is also a scalar function.
  3. PARSE
    This is a special function that converts a string to a date, time, or number type. The optional parameter that accepts a culture converts the value based on it; if it is not supplied, the culture setting of the current user is used.
  4. TRY_PARSE and TRY_CONVERT
    These two functions work just like PARSE and CONVERT. However, instead of raising a run-time error for incompatible types, a failed conversion returns NULL.
Here are some sample codes;

USE AdventureWorks2014;

-- CAST function, converting datetime to date
SELECT SalesOrderId, CAST(OrderDate AS date) OrderDate
FROM Sales.SalesOrderHeader;

-- CONVERT function
SELECT SalesOrderId, CONVERT(char(10), OrderDate, 101) OrderDate
FROM Sales.SalesOrderHeader;

-- PARSE function
SELECT PARSE('01/01/2015' AS datetime2 USING 'en-US') ;

-- TRY_PARSE and TRY_CONVERT functions
SELECT PARSE('13/01/2015' AS datetime2 USING 'en-US') ; -- this throws an error
SELECT TRY_PARSE('13/01/2015' AS datetime2 USING 'en-US') ; -- this returns NULL
SELECT TRY_CONVERT(datetime2, '13/01/2015') ; -- this returns NULL too

Sunday, October 4, 2015

SQL Server Error Severity Levels

Have you ever read the severity level indicated with an error that occurred? Or have you just tried to find the problematic code in order to fix it? It is always better to understand what the error message says and its severity level, because some errors cannot be fixed by us even though they look fixable.

The severity level indicates the seriousness of the error: whether it can be ignored or not, and whether it can be fixed at the user level. Therefore we need to understand what it describes.

Range of the level  Description
0 to 9  Informational messages, not errors. The warning for NULL usage in an average is a good example.
10  Informational message; it can indicate a non-severe error too.
11 to 16  Errors that can be fixed by us, such as syntax issues, deadlocks, and permission related issues.
17 to 19  Serious software related errors that the user cannot correct. Out of memory and out of space are some of them.
20 to 24  Very serious errors related to either the hardware or SQL Server itself. Generally, errors with level 19 and above get logged in the Application log.
25  Same as above, but this terminates the SQL Server service.

Here are some examples;

USE AdventureWorks2014;

-- Warning messages 0 - 10
SELECT COUNT(Color) FROM Production.Product;

-- Errors from 11 - 16 (note the intentional typo FRM, which raises a syntax error)
SELECT * FRM Production.Product;

-- Throwing an error with higher severity level
-- Note that 60005 is a user-defined message
RAISERROR (60005, 25, 1) WITH LOG;

Saturday, October 3, 2015

Getting an email notification when a deadlock occurs

If you are administering a heavily transactional database and often facing an issue with incomplete transactions, deadlocks might be the cause, and they need to be detected. There are a few ways of checking whether deadlocks are occurring or not; the most common one is the Profiler. However, it is a bit difficult to keep the Profiler (or even traces) running against a production database and monitor it, hence tracking a deadlock when it occurs and getting notified is best.

The easiest method to track it is using an Alert. In addition to that, it needs Database Mail configured, an Operator, and a change to a record in sys.messages.
  • Database Mail - an implementation of SMTP that enables database components to send email.
  • Operator - a person or group who can receive notifications based on a job or a raised alert.
  • Alert - a predefined response to an event.
Let's configure all of this and test it. Database Mail can be set up by creating a profile and adding an SMTP account. It can be simply configured using Database Mail under Management in Object Explorer. Note that, for this example, I am using Gmail as my SMTP server.

After configuring Database Mail, configure an operator by accessing Operators under SQL Server Agent.
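If you prefer scripting over the GUI, the same operator can be created with msdb.dbo.sp_add_operator; the name and email address below are placeholders for this example:

```sql
USE msdb;
GO

-- Create an operator that can receive alert notifications by email
EXEC dbo.sp_add_operator
    @name = N'DBA Team',
    @enabled = 1,
    @email_address = N'dba.team@example.com';
```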

Before configuring an Alert, we need to modify a record in the sys.messages table. This table maintains all messages related to errors, and it has a column called is_event_logged that describes whether the related error gets written to the Application Log or not. SQL Server raises Alerts only when it is notified by the Application Log of a message matched with a configured Alert. Therefore we need to make sure that the error to be tracked is logged. By default, the deadlock error message, which is 1205, is not logged, hence it needs to be changed.

USE master;

-- checking the record
SELECT * FROM sys.messages WHERE message_id = 1205 and language_id = 1033;

-- setting is_event_logged to 1
EXEC dbo.sp_altermessage 1205, 'WITH_LOG', 'true';

-- checking the updated record
SELECT * FROM sys.messages WHERE message_id = 1205 and language_id = 1033;

Now let's create an Alert and set its Response to notify the Operator created.
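The Alert and its response can also be scripted. The sketch below creates an alert on error 1205 and notifies an operator by email; the alert and operator names are placeholders (the operator is assumed to exist already):

```sql
USE msdb;
GO

-- Alert fired when error 1205 (deadlock victim) is written to the Application Log
EXEC dbo.sp_add_alert
    @name = N'Deadlock Alert',
    @message_id = 1205,
    @severity = 0;

-- Send an email to the operator when the alert fires
EXEC dbo.sp_add_notification
    @alert_name = N'Deadlock Alert',
    @operator_name = N'DBA Team',
    @notification_method = 1;  -- 1 = email
```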

Everything required is done now. Whenever a deadlock occurs, it will be written to the Application Log, which notifies the Agent, and then the Agent will fire the Alert.