Sunday, October 12, 2014

Automated notifications on SQL Server errors

Can SQL Server send me a notification when a deadlock occurs? Can SQL Server alert me when a database starts suffering from resource issues? I am sure you have already come across these types of requirements and have already implemented a mechanism for getting alerts. But how has it been implemented? Using a set of custom code? Using Agent jobs? You may have used a different approach, but that is what I have seen with many, and there is nothing wrong with it if you need more detail, some custom logic, or greater flexibility. However, if all you want is just to get an "Alert", why not use "SQL Server Alerts"?

SQL Server Alerts
A SQL Server Alert is a predefined response to an event. An alert can be configured as a "SQL Server event alert", a "SQL Server performance condition alert", or a "WMI event alert". In order to create an alert for errors, "SQL Server event alert" should be selected. It allows us to create a response to an individual error or to all errors of a specific severity level. For example, if you want to get an alert on file-growth issues, you can create the alert for error number 1105, which is raised under severity level 17. If you want this alert not only for 1105 but for all errors under severity 17, create the alert for severity 17 instead. Here is the way of creating an alert for 1105 (in Object Explorer: SQL Server Agent -> Alerts -> New Alert);
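
The screenshots of the New Alert dialog are not reproduced here. As a rough equivalent, the same alert can be created with T-SQL; a minimal sketch (the alert name is illustrative):

USE msdb;
GO
EXEC dbo.sp_add_alert
 @name = N'Data file full - 1105' -- illustrative name
 , @message_id = 1105 -- respond to this specific error
 , @severity = 0 -- set @message_id = 0 and @severity = 17 to cover all severity 17 errors instead
 , @enabled = 1
 , @delay_between_responses = 60 -- seconds to wait before responding again
 , @include_event_description_in = 1; -- include the error text in the email
GO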

Once the alert is created, an action has to be added for responding to the event. There are two types of actions that can be configured: executing a job and notifying operators. Since this post is about notifications, let's see how to configure the action for notifying a person.

SQL Server Operators
The easiest way of notifying a person is to set it up via an operator. A SQL Server Agent Operator is a person or group that can receive notifications from an alert or a job. Let's create an operator (in Object Explorer: SQL Server Agent -> Operators -> New Operator).
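
The screenshots of the New Operator dialog are not reproduced here. The same operator can also be created with T-SQL; a minimal sketch (the email address is illustrative):

USE msdb;
GO
EXEC dbo.sp_add_operator
 @name = N'Database Administrator'
 , @enabled = 1
 , @email_address = N'dba@company.com'; -- illustrative address
GO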

As you can see, the "Database Administrator" operator has been configured with the "E-mail name" notification option. Note that Database Mail has to be configured in order for SQL Server to send emails. Let's go back to the alert and configure the "Response" page, notifying this operator by email.
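
The Response page screenshots are not reproduced here. The equivalent T-SQL, attaching the operator to the alert, would be along these lines (a minimal sketch using the names above):

USE msdb;
GO
EXEC dbo.sp_add_notification
 @alert_name = N'Data file full - 1105' -- the alert created above
 , @operator_name = N'Database Administrator'
 , @notification_method = 1; -- 1 = email
GO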

We need one more configuration in order to receive the alerts: enable "Enable mail profile" on the "Alert System" page of the SQL Server Agent properties and select the mail profile to be used.

All okay now. Let's test and see whether the administrator gets the error via an email. The code below creates a database with a 4MB data file and disables its "File Growth". It then inserts records until the data file is completely full. Once the data file is full, SQL Server throws error 1105.

-- Create a new DB called 'AlertTestDB'
USE master;
GO
CREATE DATABASE [AlertTestDB]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'AlertTestDB'
 , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AlertTestDB.mdf' 
 , SIZE = 4096KB 
 , MAXSIZE = UNLIMITED
 , FILEGROWTH = 0)
 LOG ON 
( NAME = N'AlertTestDB_log'
 , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\AlertTestDB_log.ldf' 
 , SIZE = 1024KB 
 , MAXSIZE = 2048GB 
 , FILEGROWTH = 10%)
GO

-- Connect and create a table
USE AlertTestDB;
GO

CREATE TABLE TestTable
(
 Id int identity(1,1) PRIMARY KEY
 , Column1 char(2000)
);
GO

-- Insert records until the server throws an error
SET NOCOUNT ON;

WHILE 1=1
BEGIN

 INSERT INTO TestTable (Column1) VALUES ('test data');
END;
GO

Once the code is executed, error 1105 is thrown, indicating that the PRIMARY filegroup of AlertTestDB is full, and the administrator automatically receives an email containing the error details.

Monday, October 6, 2014

MDX: Get last 24 hours periods

Date and Time Dimensions
Most analysis and reporting requirements include a need to aggregate values over time periods, and this leads us to have date and time dimension tables in our data warehouses. Some designers combine date-related and time-related attributes into one dimension table, but the recommended implementation is to have two tables: one for date and another for time. Most importantly, no foreign key relationship exists between these two tables.

image

Surrogate Key and Granularity
A surrogate key is very common with data warehouse dimension tables (see why we need surrogate keys). However, specifically for the Date dimension table, rather than a general integer value with no semantic meaning, it is better to have a numeric key that is a concatenation of the date parts. For example, the key for 7th of July 2006 is 20060707. The Time dimension can use general integer values for its key just like other dimensions, with one exception: the starting value. Make sure that the starting value of the key is 0 for time 00:00:00. Here are sample data for both tables;

image
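
The sample data are shown in the image; as a rough T-SQL sketch (assuming the key formats described above), the two keys for a given point in time could be derived like this:

DECLARE @Now datetime = '2006-07-07 09:34:45'; -- sample date and time used in this post

SELECT CONVERT(int, CONVERT(char(8), @Now, 112)) AS DateKey -- 20060707
 , DATEDIFF(SECOND, CONVERT(date, @Now), @Now) AS TimeKey; -- 34485 = 9*3600 + 34*60 + 45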

Last 24-hours periods
One of the common reporting requirements related to date and time is getting values for the last 24-hour period. Although there are many ways of getting the required result, having the above structure for both Date and Time dimensions definitely offers some of the easiest. Here is one way;
Assume that today is 7th of July 2006 and the time is 09:34:45. If I need the “Sales Amount” for the last 24 hours, all I have to do is add a slicer that represents the last 24 hours.

image

As you can see, the slicer is formed with two tuples, each with two coordinates. The first tuple represents today’s date with the time periods from 00:00:00 to 09:34:45. The second tuple represents yesterday with the time periods from 09:34:46 to 23:59:59. See the result;

image

Now all we have to do is build the slicer dynamically. Here is the way. Note that I have used the Now function for getting the current date and time, the CStr function for converting the computed numeric values to strings, and StrToMember for converting a string to an MDX member.

SELECT  {[Measures].[Sales Amount] } ON 0
 , NON EMPTY {([Order Date].[Year - Quarter - Month - Date].[Date], [Time].[Time].[Time]) }  ON 1
FROM 
 (SELECT {(StrToMember( "[Order Date].[Date].&[" + Format(Now(), 'yyyyMMdd')   + "]") 
   , [Time].[Time].&[0]:StrToMember( "[Time].[Time].&[" 
     + Cstr((Format(Now(), 'HH') * 60 * 60 ) + (Format(Now(), 'mm') * 60 ) + (Format(Now(), 'ss') ))
     + "]"))
  , (StrToMember( "[Order Date].[Date].&[" + Format(Now() -1, 'yyyyMMdd')   + "]") 
   ,StrToMember( "[Time].[Time].&["
    + Cstr((Format(Now(), 'HH') * 60 * 60 ) + (Format(Now(), 'mm') * 60 ) + (Format(Now(), 'ss') ) + 1)
    + "]"):[Time].[Time].&[86399])} ON 0
  FROM [Sales]);

Monday, September 1, 2014

Cumulative Update 1 for SQL Server 2012 SP2

Microsoft has released Cumulative Update 1 for SQL Server 2012 Service Pack 2 and it is available for downloading.

Here is the link for downloading: http://support.microsoft.com/hotfix/KBHotfix.aspx?kbnum=2976982&kbln=en-us

Here is the link for the list of bugs fixed: http://support.microsoft.com/kb/2976982

Read this for understanding versions and service packs released for all SQL Server versions: http://dinesql.blogspot.com/2014/01/versions-and-service-packs-of-sql.html

SQL Server 2014 Sample Databases and applications

Are you looking for SQL Server 2014 AdventureWorks databases and applications? It is now available and here is the link:

https://msftdbprodsamples.codeplex.com/releases/view/125550

Tuesday, June 24, 2014

What if SQL Server Service is paused?

If you have gone through Windows services, specifically SQL Server services, either through SQL Server Configuration Manager or Windows Services, you might have noticed that some services can be paused in addition to being stopped or restarted. What happens if the service is paused?

If you pause the SQL Server service, there will not be any issue for existing connections, but new connections will not be allowed. Users who have been working with databases can continue until they disconnect. This is generally useful when you need to shut down the service without disturbing existing users’ activities. Since it does not allow new connections, no user can establish a new connection, and once all existing connections are disconnected, the server can be safely shut down.

Here is an example;

User makes a connection and runs a query:
image

Administrator pauses the service:
image

User still runs queries:
image

New user cannot establish the connection:
image
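
The screenshots show the behavior. If you want to know how many user sessions are still active while the service is paused (so you know when it is safe to shut it down), a simple check like this can help (a sketch, not from the original post):

SELECT COUNT(*) AS ActiveUserSessions
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;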

Monday, June 23, 2014

Service Pack 2 is available for SQL Server 2012

Microsoft has released SQL Server 2012 Service Pack 2 and it is available for downloading.

Here is the link for downloading: http://www.microsoft.com/en-us/download/details.aspx?id=43340

Here is the link for the list of bugs fixed: http://support.microsoft.com/KB/2958429

Read this for understanding versions and service packs released for all SQL Server versions: http://dinesql.blogspot.com/2014/01/versions-and-service-packs-of-sql.html

Tuesday, June 17, 2014

What are the advantages of a SQL Server database with multiple files located on one physical drive?

This is a common question I get in my classes/workshops. Do we really get any benefits out of a database with multiple data files on one physical drive? Obviously no performance benefits, but in certain situations it gives you some benefits;

  1. Taking partial backups, file backups, filegroup backups
    Although the files are not distributed over multiple physical drives, partial backups, file backups and filegroup backups are still possible. Assume that your database is very large and some tables are infrequently updated; you can then separate the tables into two filegroups, using multiple data files, one group for tables that are frequently updated and the other for tables that are infrequently updated (or read-only tables). This allows you to have a time-saving and efficient backup strategy, setting different backup plans on the two filegroups (see the sketch after this list).
  2. Restoration into smaller multiple physical drives
    Assume that you maintain a database with a single 100GB data file. What if the existing drive crashes and you get two 50GB physical drives for restoring the database from one of the backups? As you know, it is not possible; but if the database is maintained with multiple data files, even on a single physical drive, a scenario like this can be easily addressed.
  3. Moving files into another drive
    For various reasons, either the entire database or part of it may need to be moved to different physical drives. Some of the reasons could be moving from a test environment to a production environment, moving some filegroups to optimized physical drives, or moving the entire database to a different location. If the database is constructed with multiple files, moving files in the above scenarios can be done with less effort.
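
Here is a minimal sketch of the first point: two filegroups on the same drive, with the frequently updated one backed up separately (database, file and folder names are illustrative):

CREATE DATABASE SalesDB
 ON PRIMARY
 ( NAME = N'SalesDB_Primary', FILENAME = N'D:\Data\SalesDB_Primary.mdf' ),
 FILEGROUP FG_Frequent
 ( NAME = N'SalesDB_Frequent', FILENAME = N'D:\Data\SalesDB_Frequent.ndf' ),
 FILEGROUP FG_Static
 ( NAME = N'SalesDB_Static', FILENAME = N'D:\Data\SalesDB_Static.ndf' )
 LOG ON
 ( NAME = N'SalesDB_log', FILENAME = N'D:\Data\SalesDB_log.ldf' );
GO

-- back up only the frequently updated filegroup, on its own schedule
BACKUP DATABASE SalesDB
 FILEGROUP = N'FG_Frequent'
 TO DISK = N'D:\Backup\SalesDB_FG_Frequent.bak';
GO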

MDX: Getting name of the user connected

Are you looking for a way of adding the name of the account used for the session to the resultset? Here is the way of adding it;

image

Thursday, June 12, 2014

What is “sysname” data type used for object names?

You might have noticed that the data type of object names, such as database names, is “sysname”, which is not listed in the data type list when creating a table via the table designer. This data type can be clearly seen when creating objects using templates (the CREATE DATABASE template, for example). What is this sysname data type? Can we use it? Should we use it?

Sysname is a built-in system data type defined as nvarchar(128). Initially, with older versions (up to 6.5 if I am not mistaken, to be verified) it was equivalent to varchar(30). It is a non-nullable data type and is mainly used to store object names. This data type is not listed in the data-type list that appears when creating tables via the table designer. However, it can be used for defining columns when tables are created using T-SQL. Have a look at the following code;

-- using sysname for variables
DECLARE @TestVariable sysname = 'Test Value'
SELECT @TestVariable
GO
 
-- using sysname for creating tables
CREATE TABLE dbo.TestTable
(
    Id int PRIMARY KEY
    , Name sysname
)
GO

If your system is configured with case-sensitive collation, make sure you type the word in lowercase.

As you can see, it can be used in our code. However, many recommend not using it because: 1. nvarchar(128) can simply be used instead; 2. its definition could change in a future version. Therefore, even though it is possible to use the sysname data type with T-SQL, it is better not to.

Connecting with Windows Azure SQL Server database via Power Pivot

While trying to get some records from a database configured on a Windows Azure virtual machine (NOT SQL Azure), I noticed that the connection can be made using Power Query but not with Power Pivot. Power Query has an option for connecting with a Windows Azure database using “From Windows Azure SQL Database”, as shown in the images below;

image

image

However, there is no such option in Power Pivot. If you try to connect using “From SQL Server”, you will get an error as below;

image

image

Unable to retrieve list of databases. Reason: Failed to connect to the server. Reason: Login timeout expired
A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.
Named Pipes Provider: Could not open a connection to SQL Server [53].

The error indicates that Excel tries to establish the connection using the Named Pipes protocol, which is not valid in this context. Though this can be considered the reason for the error, there is no option for changing the protocol to TCP/IP. This makes establishing the connection impossible. However, I found a workaround for connecting with the Windows Azure SQL database: not using the default provider, which is “SQL Server Native Client 11.0”, but using “Microsoft OLEDB Provider for SQL Server”. All we have to do is open the Advanced window by clicking the “Advanced” button and change the Provider to Microsoft OLEDB Provider for SQL Server.

image

I still cannot figure out the exact reason for this and how to connect using SQL Server Native Client. If you have done it, please share.

Related post: http://dinesql.blogspot.com/2014/06/connecting-with-sql-server-virtual.html

Tuesday, June 10, 2014

SQL Server Installation: Are Shared components getting upgraded with Side-by-side Upgrade?

As you know, some components such as Management Studio (SSMS) and Configuration Manager are shared components of SQL Server instances, and they are upgraded when an in-place upgrade is performed. However, they are not supposed to get upgraded with a side-by-side upgrade unless a certain criterion is met.

In-place and Side-by-side upgrades
An in-place upgrade refers to an installation of a newer version of SQL Server that completely replaces an existing instance of SQL Server, including shared components. The key advantage of this is the smaller requirement for additional hardware resources, and the key disadvantage is the inability to roll back if the installation fails. A side-by-side upgrade refers to an installation of a newer version of SQL Server as a new instance, without replacing the older version of SQL Server installed. This is the safest way of upgrading a system because it reduces the risk associated with upgrades, leaving the original system in place and intact. However, there are a few disadvantages with this, such as having to copy user-defined databases manually, configure security, and configure the jobs used manually.

A side-by-side upgrade is not supposed to replace shared components of the older version installed. However, a replacement can happen if the major build number of the old instance and the major build number of the new instance are identical. This condition rarely occurs, but it has been seen with SQL Server 2008 and 2008 R2. If you perform an installation of SQL Server 2008 R2 (version 10.50) as a side-by-side upgrade for SQL Server 2008 (version 10.00), though it is not an in-place upgrade, the new installation will replace all shared components of the SQL Server 2008 instance.

Monday, June 9, 2014

What is the fastest/quickest way of getting the total number of records (COUNT) in a SQL Server table?

Are you trying to get the total number of records in one of your tables? If yes, what is the best, or what is the quickest way of getting the number?

Generally we use the COUNT aggregate function for retrieving the record count from a table. This does not take much time if the table contains a low number of records. However, when COUNT is run against a table that contains millions of records, it can take a noticeable time. In a situation like that, if the table is a clustered table, the required result can be retrieved in less time using the sys.sysindexes compatibility view.

The following query runs against a table called dbo.FactSales in the ContosoRetailDW database (one of the Microsoft sample databases). This table is a clustered table and it contains 3.4 million records. Have a look at the query and the result;

USE [ContosoRetailDW]
GO
 
DECLARE @CurrentTime AS datetime = GetDate()
 
SELECT Count(*) NumberOfRecords 
FROM dbo.FactSales;
 
SELECT Cast(DateDiff(SECOND, @CurrentTime, GetDate()) AS varchar)
        + ','
        + Cast(DateDiff(MILLISECOND, @CurrentTime, GetDate()) AS varchar)
        + ','
        + Cast(DateDiff(MICROSECOND, @CurrentTime, GetDate()) AS varchar)
    AS ExecutionTime
GO
 
DECLARE @CurrentTime AS datetime = GetDate()
 
SELECT [rows] NumberOfRecords 
FROM sys.sysindexes 
WHERE id = OBJECT_ID('dbo.FactSales') and indid =1 
 
SELECT Cast(DateDiff(SECOND, @CurrentTime, GetDate()) AS varchar)
        + ','
        + Cast(DateDiff(MILLISECOND, @CurrentTime, GetDate()) AS varchar)
        + ','
        + Cast(DateDiff(MICROSECOND, @CurrentTime, GetDate()) AS varchar)
    AS ExecutionTime
 

image

As you can see, the second method produces the result in less than one microsecond while the first method takes about 76 milliseconds. This shows that querying sys.sysindexes is more efficient than COUNT(*) when querying a clustered table. However, there is a chance of seeing an inaccurate number with this method if statistics are not updated. In addition to that, if the table is a heap, the same performance may not be seen with the second method.
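
Another metadata-based option not covered in the comparison above is sys.dm_db_partition_stats (SQL Server 2005 onwards), which works for both heaps and clustered tables; like sys.sysindexes, the count comes from maintained metadata rather than a scan:

SELECT SUM(row_count) AS NumberOfRecords
FROM sys.dm_db_partition_stats
WHERE [object_id] = OBJECT_ID('dbo.FactSales')
    AND index_id IN (0, 1); -- 0 = heap, 1 = clustered index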

Friday, June 6, 2014

Connecting with SQL Server virtual machine on Azure using Management Studio

If you have a requirement to connect to a SQL Server virtual machine configured on Windows Azure, there are a few things to be done. The following link explains, step by step, the tasks necessary for connecting via local Management Studio.

http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-provision-sql-server/


Tuesday, June 3, 2014

What is the LRU (Least Recently Used) algorithm?

The LRU, or Least Recently Used, algorithm determines the data pages to be dropped from the cache when the Data Cache section, which is the largest part of the Buffer Pool, needs space.

The Data Cache section is used for caching database data pages. In order to satisfy users’ requests, SQL Server loads data pages into this section and then fulfills requirements such as SELECT, INSERT, UPDATE and DELETE. No modifications are performed on the data files directly. Once pages are cached, requests other than the initial one are satisfied using the cached pages too.

In addition to the Data Cache section, the Buffer Pool comprises two more sections: Free Pages and Stolen Pages. The first section contains pages that are not used yet and are kept to handle new requests. The second section is for pages that are used by other SQL Server components.

The LRU policy determines how long data pages stay in the Data Cache after they have been flushed to disk by the Checkpoint process. The process that drops pages from the cache is called the Lazy Writer. The decision to drop pages is made based on a counter maintained in the page header, which is decreased if the page has not been scanned for a while. Pages with the lowest counter are always dropped by the Lazy Writer.
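
As a side note, you can see what currently sits in the Data Cache using the sys.dm_os_buffer_descriptors DMV; a minimal sketch:

SELECT DB_NAME(database_id) AS DatabaseName
    , COUNT(*) AS CachedPages
    , COUNT(*) * 8 / 1024 AS CachedMB -- each data page is 8KB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY CachedPages DESC;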

Maximum amount of memory to SQL Server is 8TB, is it really consuming 8TB?

“As per MSDN, the maximum memory that can be consumed by SQL Server is 8TB. Will SQL Server surely use 8TB if the box is configured with 8TB?” This was a question raised by an enthusiast.

“It will, but currently it is impossible” was my answer. Do you agree with me? Let me explain.

Virtual Address Space, which is called VAS, refers to the total amount of memory that can be used by a Windows process or an application such as SQL Server. The size of the VAS varies according to the processor architecture (32-bit or 64-bit) and the edition of the operating system. If the processor architecture is 32-bit, the VAS is 4GB. Under a standard installation, 2GB is reserved for the system (kernel mode address space) and 2GB is reserved for the application (user mode address space). This proportion can be changed by adding the /3GB switch to boot.ini, setting 3GB for the user mode address space.

This is a bit different when a 32-bit version of SQL Server is installed on a 64-bit version of Windows. This installation provides the full 4GB to SQL Server using the WOW64 emulation technology.

The amount “8TB” comes with the 64-bit processor architecture. This architecture offers a full 8TB for applications, hence a 64-bit version of SQL Server running on 64-bit Windows should be able to use 8TB without any issue. However, this is not possible yet because no Windows operating system supports 8TB of RAM. The maximum amount of RAM supported by Windows Server 2012 is 4TB. Therefore, we need to wait to see SQL Server using the full 8TB of user address space :).
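
To see how much virtual address space your own instance has actually reserved and committed, sys.dm_os_process_memory (SQL Server 2008 onwards) can be queried; a minimal sketch:

SELECT physical_memory_in_use_kb / 1024 AS PhysicalMemoryInUseMB
    , virtual_address_space_reserved_kb / 1024 AS VASReservedMB
    , virtual_address_space_committed_kb / 1024 AS VASCommittedMB
FROM sys.dm_os_process_memory;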

Monday, May 19, 2014

Power BI – Power Query Update

A couple of new features have been added to Power Query and they are available with the latest version: 2.11.3625.

Here is the download link for Power Query: http://www.microsoft.com/en-us/download/details.aspx?id=39379

New features added;

  • New Data Sources
  • Improvements to Query Load Settings
  • Improvements to Query Refresh behaviors in Excel
  • New Transformations available in the Query Editor
  • Other Usability Improvements

This blog post describes updates available with new version: http://blogs.msdn.com/b/powerbi/archive/2014/05/06/new-updates-for-power-query.aspx

Thursday, May 15, 2014

SQL Server Data Tools (SSDT) / Business Intelligence Studio (BIDS) for 2014

Are you searching for SQL Server Data Tools or Business Intelligence Development Studio after installing SQL Server 2014? If so, you are reading the right post.

SQL Server Data Tools, or Business Intelligence Development Studio, is used for creating projects for Integration Services, Analysis Services, Reporting Services, etc. Usually this gets installed automatically when SQL Server is installed, but not with SQL Server 2014. It needs to be downloaded and installed separately.

This is basically a set of Visual Studio project templates that allows us to work with the Microsoft SQL Server business intelligence product suite. With SQL Server 2008 and 2008 R2, it was named Business Intelligence Development Studio (BIDS), and with SQL Server 2012, Microsoft renamed it SQL Server Data Tools (SSDT). However, confusingly, there was another tool available with the same name which was used for creating database projects with Visual Studio. In order to clear the confusion, with SQL Server 2014 it has been named Microsoft SQL Server Data Tools – Business Intelligence for Visual Studio (SSDTBI).

You can download Microsoft SQL Server Data Tools – Business Intelligence for Visual Studio 2012 from here:
http://www.microsoft.com/en-us/download/details.aspx?id=36843

This is still not ready for Visual Studio 2013. Microsoft initially made it available, but the release was called off due to some side-by-side issues with SSDT.
(Updated 02-Jun-2014) Microsoft SQL Server Data Tools – Business Intelligence for Visual Studio 2013 is now available. Read more on it here:
http://www.microsoft.com/en-us/download/details.aspx?id=42313

Sunday, May 4, 2014

Cumulative Update packages for SQL Server 2008 R2, 2012, and 2014

Cumulative Update #12 is available for SQL Server 2008 R2 SP2. Refer to the following link for downloading it and understanding the fixes included.

http://support.microsoft.com/kb/2938478

Cumulative Update #9 is available for SQL Server 2012 SP1. Refer to the following link for downloading it and understanding the fixes included.

http://support.microsoft.com/kb/2931078

Cumulative Update #1 is available for SQL Server 2014. Refer to the following link for downloading it and understanding the fixes included.

http://support.microsoft.com/kb/2931693

For more info on SQL Server versions and service packs, refer to: http://dinesql.blogspot.com/2014/01/versions-and-service-packs-of-sql.html

Sunday, April 6, 2014

Sri Lanka crowned T20 champs - 2014

We did it again! Thanks Sanga, Mahela and the team, proud to be a Sri Lankan.

Reasons for adding a surrogate key for dimension tables

As a best practice, we always add a new key, known as a surrogate key, to dimension tables for identifying records uniquely. Generally, the unique business key becomes the primary key of the relational tables related to business entities, but once the surrogate key is added to a dimension table, we make it the primary key without reusing the business key. Here is an example from the AdventureWorksDW data warehouse.

image

A surrogate key is an integer that is assigned sequentially when records are added. There are many other names for this, such as artificial key, synthetic key, meaningless key, integer key or non-natural key. Do we really need this? Is it inappropriate to use the business key as the primary key? Reusing it may seem sensible, but here are some valid reasons for using surrogate keys instead of business keys;

  1. Dimension is populated from multiple sources
    In many cases, dimensions are populated from multiple sources. This always introduces incompatibility between the data types of business keys originating from different sources. Not only that, the uniqueness of the business key is not always guaranteed either.
  2. Business key is reassigned when an object becomes obsolete
    OLTP systems always focus on current operations, giving less importance to history. If an object becomes inactive or obsolete (example: the organization decides to discontinue a product), there is a high probability that its business key gets assigned to a new object. If the business key is used as the key of the dimension, this becomes an issue because the dimension already holds an object with the same business key.
  3. Business key is a complex string or GUID
    Business keys used in source systems could be formed by combining numbers and different characters, or could be GUIDs. In such a scenario, there is no harm in using the business key as the primary key of the dimension table as long as the uniqueness is guaranteed. However, in terms of performance, it is more effective to have an integer surrogate key than a complex string.
  4. Integer improves the performance
    As mentioned in reason 3, it is always better to have an integer key as the key of the dimension even if the business key is a simple string. This improves the efficiency of the table and speeds up data retrieval when joining with facts.
  5. Objects without business keys
    When populating dimension records, there could be a situation where an object needs to be populated without a business key. A good example is holding a newly introduced product in the Product dimension, before extracting it from the source, to support a transaction that exists in the fact table. Another example is holding an anonymous customer in the Customer dimension to support promotion events recorded in the fact table. It becomes possible to hold these objects in dimension tables as “Unknown”, “Not applicable”, or “Anonymous” when a surrogate key is used.
  6. Objects are versioned in dimension tables
    In order to maintain history in dimension tables, the same objects are versioned by adding multiple records (these dimensions are called Slowly Changing Dimensions Type 2). Example: a customer is recorded with his country, Sri Lanka. Later the customer becomes a citizen of Australia. In an OLTP system, it is just a change to the country column. But in data warehousing, to maintain the history of events recorded (facts), a new version (new record) of the customer is added with Australia as the country. This is not possible if the business key of the customer is used as the key of the dimension.

Of course, the maintenance cost increases, and a mechanism has to be introduced for generating surrogate keys and linking them to foreign keys in fact tables. But it is not as difficult as many think because there are many built-in facilities available in DBMSs and ETL tools; a minimal sketch of the idea is given below.
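
As a rough illustration (a hypothetical DimCustomer; column names are illustrative, not taken from AdventureWorksDW), the surrogate key is generated by the table itself while the business key is kept as an ordinary attribute:

CREATE TABLE dbo.DimCustomer
(
 CustomerKey int IDENTITY(1,1) NOT NULL PRIMARY KEY -- surrogate key referenced by fact tables
 , CustomerAlternateKey nvarchar(20) NOT NULL -- business (natural) key from the source system
 , CustomerName nvarchar(100) NOT NULL
 , Country nvarchar(50) NOT NULL
 , StartDate date NOT NULL -- validity period for SCD Type 2 versions
 , EndDate date NULL -- NULL = current version
);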

Saturday, April 5, 2014

Free eBook: Introducing Microsoft SQL Server 2014

Want to know the major features added to SQL Server 2014? Here is an easy way. This book explains how SQL Server 2014 leverages in-memory technology for improving the performance of OLTP and data warehousing solutions, how easily on-premises solutions can be transferred to the cloud with the added support for hybrid environments, etc. If you are a DBA, DBE or an engineer who wants to know the new features and capabilities, this is the book to read.

Download links:

Chapters in this book;

  • Chapter 1: SQL Server 2014 editions and engine enhancements
  • Chapter 2: In-Memory OLTP investments
  • Chapter 3: High-availability, hybrid-cloud, and backup enhancements
  • Chapter 4: Exploring self-service BI in Microsoft Excel 2013
  • Chapter 5: Introducing Power BI for Office 365
  • Chapter 6: Big data solutions

Wednesday, April 2, 2014

Easy way of adding your Microsoft Certification to LinkedIn Profile

If you are maintaining your profile on the world’s largest professional network (I’m sure you are) and you have completed a Microsoft Certification, there is an easier way of adding the certification to your LinkedIn profile than entering it manually. The new way has been introduced by teaming up with LinkedIn, and it is offered to you via an email you receive with details once an exam is done. All you have to do is hit the button, which is populated with all the details required (since I have not done a certification recently, I have no idea how this will appear in the mail; I believe it is similar to the image below).

In order to receive the mail, make sure you have enabled “Promotional email” in your MCP profile settings and that you are using the same email address for both your MCP profile and your LinkedIn profile.

For more details, read this post published by Carrie Francey (MS): http://borntolearn.mslearn.net/btl/b/weblog/archive/2014/03/25/announcing-a-new-way-to-add-microsoft-certifications-to-your-linkedin-profile.aspx

SQL Server exams update: SQL Server 2012 and 2014

Are you planning to take Microsoft SQL Server certification exams in the next few weeks? If so, here is an announcement on exam updates. All MCSE-related SQL Server exams (464 to 467) will be updated on April 24, 2014, adding new features introduced with SQL Server 2014. The exam numbers will not be changed, but the titles will be adjusted for 2014. Official courseware for these exams has not been updated yet; most probably, updated courseware will be published during May 2014.

Here are the exams that will be updated;

Use following links to see the changes done in skills measured;

MCSA-related SQL Server exams (461 to 463) will remain unchanged.

Tuesday, April 1, 2014

Data Warehouse does not need a large number of non-clustered indexes?

It is a fact that non-clustered indexes improve the performance of data retrieval, and we tend to add them mostly based on the predicates used. As data warehouses are mainly designed for data retrieval, many think that a data warehouse must contain as many non-clustered indexes as possible. In a way that is true and required; however, we need to thoroughly analyze the indexing requirements, because generally non-clustered indexes do not help much with dimension and fact tables in data warehouses.

Non-clustered indexes generally give better performance on highly selective queries. An example of a highly selective query would be a query written for getting all customers with a given last name. Most attributes in a dimension table are used for pivoting in reports and typically hold few distinct values. Because of that, filters used in such reports are not very selective and will not get any benefit from indexes. However, a parameterized report looking for a particular value of an attribute would benefit, as it is very selective. Therefore, when considering an index on an attribute, check whether the reporting requirement is selective, like the second example, before adding it.

It is a common practice to add non-clustered indexes on foreign keys. A typical data warehouse created based on a star schema has fact tables, and the fact tables have foreign keys linked to dimension tables. Do we need to create non-clustered indexes on these keys? The answer is “No”. The reason for that is the “star join query optimization” feature in SQL Server (Enterprise edition only). This feature recognizes star join patterns and uses bitmap-filtered hash joins for joining tables in an efficient way. Therefore, indexes on foreign keys are not required unless a different edition is used. However, if the required dataset is smaller, indexes could be created on foreign keys, as merge and nested loops joins give better performance than hash joins for small datasets. Read more on that here: http://technet.microsoft.com/en-us/library/bb522541(v=sql.105).aspx

Considering the above facts, we rarely add non-clustered indexes to either fact or dimension tables. In addition to that, the volume of records involved with typical queries (IO), the high cost of maintaining indexes, and rebuilding them every time data is loaded are other factors for not using non-clustered indexes in DWs.

Sunday, March 30, 2014

Magic Quadrant for Business Intelligence and Analytics Platforms – 2014 February

Gartner has published its Magic Quadrant for Business Intelligence and Analytics Platforms, covering 17 capabilities grouped into 3 categories. Business intelligence and analytics as a software platform has been considered for this.

I have summarized capabilities used for this in a simpler format. You can read the entire document at: http://www.gartner.com/technology/reprints.do?id=1-1QYUTPJ&ct=140220&st=sb.

  • Information delivery
    • Reporting: ability to create interactive reports with or without parameters
    • Dashboards: ability to display operational or strategic information using graphical interfaces such as graphs, gauges and maps.
    • Ad hoc report/query: ability to get required information without IT support connected with predefined sources, metrics, hierarchies, etc. 
    • Microsoft Office integration: ability to use Microsoft Office (Excel, …) as a reporting and analytics client, including advanced features like cell-locking and write-back. 
    • Mobile BI: ability to develop and deliver content to mobile devices and use native functionalities such as touchscreens, camera, etc.
  • Analysis
    • Interactive visualization: ability to interact with information with advanced visual representations that go beyond traditional visualization options such as charts, heat-maps, tree-maps, etc.
    • Search-based data discovery: ability to search both structured and unstructured data and map it easily into a classification structure of dimensions and measures.
    • Geospatial and location intelligence: ability to combine geographic and location-related data from multiple sources such as aerial maps, GISs with enterprise data, and display them by overlaying on maps. Advanced features such as distance and route calculations, geofencing, 3D visualizations are considered too.
    • Embedded advanced analytics: ability to leverage statistical function library to consume analytics methods such as Predictive model markup language, R-based models, forecasting, “what-if” analysis with visualizations.
    • Online analytical processing (OLAP): ability to provide OLAP features such as drilling-down, slicing and dicing, write-back with “what-if”, etc. using data architectures like relational, multidimensional or hybrid and storage architectures like disk-based or in-memory. 
  • Integration
    • BI infrastructure and administration: ability to use all components such as security, metadata, etc. by all tools in the platform. The platform should support multitenancy.
    • Metadata management: ability to manage systems-of-records semantic and metadata centrally, including user-defined data mashup and metadata.
    • Business user data mashup and modeling: ability to create user-defined analytical models connecting with multiple multi-structured sources using code-free, drag-and-drop functionalities. Advanced features such as semantic auto-discovery, intelligence join etc. are considered too.
    • Development tools: ability to provide set of programmatic and visual tools for developing objects such as reports, dashboards, models etc.
    • Embeddable analytics: ability to create and modify analytic content, visualizations, and application embedding them into business processes.
    • Collaboration: ability to start discussion thread on information, analysis, analytic content, etc.
    • Support for big data sources: ability to connect with hybrid, columnar and array-based sources such as MapReduce.

Thursday, March 20, 2014

Door-Quiz – SQL Server Sri Lanka User Group Meet-up

We had another door-quiz at SS SLUG yesterday and, as usual, the question asked was tricky. However, many had circled the correct answer and one lucky person grabbed the goodie pack. Here is the question asked;

image
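
The exact quiz slide is in the image above; based on the explanation below, the question was along these lines (a hedged reconstruction, not the original quiz code):

IF EXISTS (SELECT 1/0)
    PRINT 'yes'
ELSE
    PRINT 'no'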

Whenever we see a value being divided by 0, all we expect is an error because, as is commonly known, division by zero has no value. If we execute the SELECT statement separately, SQL Server will definitely throw an error saying so. But the tricky part of this question is combining that fact with the EXISTS predicate. EXISTS in this scenario only considers the existence of the result set produced by the SELECT, rather than executing it as a general SELECT. Since the predicate evaluates to TRUE, SQL Server goes on to PRINT ‘yes’ without producing any error.

The answer to the question is “yes”. :)

Thursday, March 13, 2014

Who invented dimensional modeling?

Who introduced or invented dimensional modeling? Was it invented by Ralph Kimball? Since he aligns completely with dimensional modeling in his business intelligence strategies, many believe that it came from him. Unfortunately, that is one of the misperceptions that persist in the industry related to dimensional modeling.

What is Dimensional Modeling?
Dimensional modeling is a technique that has been used for decades for structuring databases in a simple manner, supporting business users’ requirements such as understanding data, navigating through it easily, and running both simple and complex data analysis queries. A dimensional model designed with a relational database management system is called a star schema, and a dimensional model designed with a multi-dimensional database is called an OLAP cube.

Who invented this?
History is limited on this; however, as per the records that exist;

  • The terms “facts” and “dimensions” were introduced in a joint project developed by General Mills and Dartmouth University in the 1960s.
  • During the 1970s, the Nielsen Marketing Research team used these techniques for grocery and drug store data.
  • In the 1980s, Nielsen Marketing Research and IRI used grocery and drug store scanner data to link with customers’ internal shipment data.

As per the history, no single person holds the credit for introducing the concept; it is something that has evolved with the contributions of many.

Read more about Dimensional Modeling: http://en.wikipedia.org/wiki/Dimensional_modeling

Why SQL Server does not drop the procedure and recreate the object when the code is written properly?

This is a bit funny; it took a few minutes to figure out the reason for getting the “There is already an object named '' in the database.” error when the code for dropping and recreating the object was properly written. It is one of the key things to remember when coding, hence sharing the experience.

Have a look at the code and the error thrown. Note that it is not the actual code; the code below is written to show the scenario. Please assume that there are many other statements before and after the code given.

  1.  
  2. ...
  3. ...
  4.  
  5. GO
  6.  
  7. -- dropping GetSalesFor2007 if exist
  8. -- and creating
  9. IF OBJECT_ID('dbo.GetSalesFor2007') IS NOT NULL
  10.     DROP PROC dbo.GetSalesFor2007
  11. GO
  12. CREATE PROCEDURE dbo.GetSalesFor2007
  13. AS
  14. BEGIN
  15.     
  16.     SELECT SalesOrderID, SalesOrderNumber
  17.     FROM Sales.SalesOrderHeader
  18.     WHERE OrderDate BETWEEN '01/01/2007'
  19.         AND '12/31/2007 23:59:59.000'
  20. END
  21.  
  22. -- dropping GetSalesFor2008 if exist
  23. -- and creating
  24. IF OBJECT_ID('dbo.GetSalesFor2008') IS NOT NULL
  25.     DROP PROC dbo.GetSalesFor2008
  26. GO
  27. CREATE PROCEDURE dbo.GetSalesFor2008
  28. AS
  29. BEGIN
  30.     
  31.     SELECT SalesOrderID, SalesOrderNumber
  32.     FROM Sales.SalesOrderHeader
  33.     WHERE OrderDate BETWEEN '01/01/2008'
  34.         AND '12/31/2008 23:59:59.000'
  35. END
  36.  
  37. ...
  38. ...

image

As you can see, SQL Server tries to create the procedure (Line no: 27) without dropping it via the DROP PROC statement given (Line no: 25). Simply put, the reason for this is the missing batch separator between the end of the first procedure and the DROP statement of the second procedure.

What is a batch?
A T-SQL batch is a collection of SQL statements that are parsed, normalized and executed as a single unit. The end of a batch is indicated using the GO keyword (read more on GO: http://dinesql.blogspot.com/2009/01/separating-batches-using-new-word.html). There are two important points we need to remember when working with T-SQL batches;

  1. The boundaries for the scopes of variables are determined by batch separators, hence variables declared in one batch cannot be used beyond the GO statement. In other words, variables cannot be used in a batch other than the one in which they are declared.
  2. Most DDL statements require separate batches and may not be combined with other statements.

If the second point is clearly understood, then the reason for the above error can be figured out. Statements like CREATE PROC and CREATE VIEW cannot be combined with other statements in a batch, and these statements must start a new batch. If you look at the DROP statement for the second procedure (Line no: 25), you will see that there is no GO statement between the end of the first procedure and the beginning of the DROP statement for the second procedure. Therefore the DROP statement becomes part of the first batch, which contains the first procedure, making the DROP statement part of the first procedure’s body. Have a look at the output below;

image

As you can see, the DROP statement is in the body of the first procedure. If we place a GO statement at the end of the first procedure, it will not become part of the first procedure.
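
For completeness, here is a sketch of the corrected portion of the script (only the relevant part; the drop for the first procedure is omitted for brevity):

CREATE PROCEDURE dbo.GetSalesFor2007
AS
BEGIN
    SELECT SalesOrderID, SalesOrderNumber
    FROM Sales.SalesOrderHeader
    WHERE OrderDate BETWEEN '01/01/2007' AND '12/31/2007 23:59:59.000'
END
GO -- this batch separator was missing; the first procedure's body now ends here

IF OBJECT_ID('dbo.GetSalesFor2008') IS NOT NULL
    DROP PROC dbo.GetSalesFor2008
GO
CREATE PROCEDURE dbo.GetSalesFor2008
AS
BEGIN
    SELECT SalesOrderID, SalesOrderNumber
    FROM Sales.SalesOrderHeader
    WHERE OrderDate BETWEEN '01/01/2008' AND '12/31/2008 23:59:59.000'
END
GO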

Here are all the statements that require a separate batch: CREATE DEFAULT, CREATE FUNCTION, CREATE PROCEDURE, CREATE RULE, CREATE SCHEMA, CREATE TRIGGER, and CREATE VIEW.

Sunday, March 9, 2014

Thursday, February 27, 2014

SQL Server: Calculating running totals using T-SQL

This is not a new topic. If you search, you will surely find many posts on it, mostly using traditional techniques rather than newer capabilities such as SQL windowing. Since I wrote a post on Analysis Services for the same calculation, I thought I would write the same for T-SQL too. Here is the way of calculating running totals using window components and functions, which provides an efficient calculation and a simple code structure.

The following queries show the way of calculating. The first query creates a window based on SalesOrderID (which is unique) and gets the running total over SubTotal. The second query creates a window on OrderDate (which is not unique). This would show the total for the whole date instead of a running total within the date unless the range is specified using frame boundaries. That is the reason for adding upper and lower boundaries using ROWS, UNBOUNDED PRECEDING and CURRENT ROW inside the window, restricting the rows that participate in the calculation.

  1. -- Window based on OrderId which is unique
  2. SELECT OrderDate, SalesOrderID, SalesOrderNumber, CustomerID
  3.     , SubTotal Total
  4.     , SUM(SubTotal) OVER(ORDER BY SalesOrderID) RunningTotal
  5. FROM Sales.SalesOrderHeader
  6. ORDER BY OrderDate, SalesOrderID
  7.  
  8. -- Window based on OrderDate which is not unique
  9. SELECT OrderDate, SalesOrderID, SalesOrderNumber, CustomerID
  10.     , SubTotal Total
  11.     , SUM(SubTotal) OVER(ORDER BY OrderDate
  12.         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) RunningTotal
  13. FROM Sales.SalesOrderHeader
  14. ORDER BY OrderDate, SalesOrderID

Both queries produce the same result;

image

Here is a comparison of the same calculation using traditional techniques. Though it shows that the query using the window is faster than the other queries, always test and pick the best.

  1. -- Using Window components
  2. SELECT OrderDate, SalesOrderID, SalesOrderNumber, CustomerID
  3.     , SubTotal Total
  4.     , SUM(SubTotal) OVER(ORDER BY SalesOrderID) RunningTotal
  5. FROM Sales.SalesOrderHeader
  6. WHERE YEAR(OrderDate) = 2006
  7. ORDER BY OrderDate, SalesOrderID
  8.  
  9. -- Using self-join
  10. SELECT h1.OrderDate, h1.SalesOrderID, h1.SalesOrderNumber, h1.CustomerID
  11.     , h1.SubTotal Total
  12.     , SUM(h2.SubTotal) RunningTotal
  13. FROM Sales.SalesOrderHeader h1
  14.     INNER JOIN Sales.SalesOrderHeader h2
  15.         ON h1.SalesOrderID >= h2.SalesOrderID
  16.             AND YEAR(h2.OrderDate) = 2006
  17. WHERE YEAR(h1.OrderDate) = 2006
  18. GROUP BY h1.OrderDate, h1.SalesOrderID, h1.SalesOrderNumber
  19.     , h1.CustomerID, h1.SubTotal
  20. ORDER BY h1.OrderDate, h1.SalesOrderID
  21.  
  22. -- Using sub query
  23. SELECT h1.OrderDate, h1.SalesOrderID, h1.SalesOrderNumber, h1.CustomerID
  24.     , h1.SubTotal Total
  25.     , (SELECT SUM(h2.SubTotal) RunningTotal
  26.         FROM Sales.SalesOrderHeader h2
  27.         WHERE h1.SalesOrderID >= h2.SalesOrderID
  28.             AND YEAR(h2.OrderDate) = 2006)
  29. FROM Sales.SalesOrderHeader h1
  30. WHERE YEAR(h1.OrderDate) = 2006
  31. ORDER BY h1.OrderDate, h1.SalesOrderID

image