Sunday, April 6, 2014

Sri Lanka crowned T20 champs - 2014

We did it again! Thanks Sanga, Mahela and the team, proud to be a Sri Lankan.

Reasons for adding a surrogate key for dimension tables

As a best practice, we always add a new key, known as a surrogate key to dimension tables for identifying the records uniquely. However, generally, the unique business key becomes the primary key of relational tables related to business entities but once the surrogate key is added to dimension tables, we make it as the primary key without reusing the business key. Here is an example from AdventureWorksDW data warehouse.

image

Surrogate key is an integer that is assigned sequentially when records are added. There are many other names for this such as artificial key, synthetic key, meaningless key, integer key or non-natural key. Do we really need this? Does it seem inappropriate using the business keys for primary keys? It may seem sensible for reusing, but here are some valid reasons for not using business keys but using surrogate keys;

  1. Dimension is populated from multiple sources
    In many cases, dimensions are populated from multiple sources. This always introduces incompatibility between business keys’ data types originating from sources. Not only that, the uniqueness of the business key is not always guaranteed too.
  2. Business key is reassigned when an object becomes obsolete
    OLTP systems always focus on current operations giving less importance to the history. If an object becomes inactive or obsolete (Example: Organization decides to discontinue a product), there is a high probability to get the business key used assigned to a new object. If the business key is used as the key of the dimension, this becomes an issue because dimension already holds an object with the same business key.
  3. Business key is a complex string or GUID
    Business keys used in source systems could be formed combining numbers and different characters or with GUIDs. In such scenario, there is no harm of using the business key as the primary key of the dimension table as long as the uniqueness is guaranteed. However, in terms of performance, it makes more effective to have an integer surrogate key than a complex string.
  4. Integer improves the performance
    As mentioned with reason 3, it is always better to have an integer key as the key of the dimension even though the business key is a simple string. This improves the efficiency of the table and speeds up the data retrieval when joining with facts.
  5. Objects without business keys
    When populating dimension records, there could be a situation that object needs to be populated without a business key. Good example on this is, holding a newly introduced product in the Product dimension, before extracting it from the source, for supporting a transaction exist in the fact table. Another example is, holding an anonymous customer in the Customer dimension for supporting promotion events recorded in the fact table. It is become possible for holding these objects in dimension tables as “Unknown”, “Not applicable”, or “Anonymous” when surrogate key is used.
  6. Objects are versioned in dimension tables
    In order to maintain the history in dimension tables, same objects are versioned by adding multiple records (These dimensions are called as Slowly Changing Dimension Type 2). Example: Customer is recorded with his country Sri Lanka. Later customer becomes a citizen of Australia. In OLTP systems, it is just a change for country column. But in data warehousing, for maintaining the history for events recorded (facts), a new version (new record) of the customer is added with Australia for the country. This is not possible if the business key of the customer is used as the key of the dimension.

Of course, the maintenance cost gets increased and a mechanism has to be introduced for generating surrogate keys and linking them with foreign keys in fact tables. But it is not as difficult as many think because there are many built-in facilities available in DBMS and ETL systems.

Saturday, April 5, 2014

Free eBook: Introducing Microsoft SQL Server 2014

Want to know some major features added to SQL Server 2014? Here is an easy way. This book explains how SQL Server 2014 leverages in-memory technology for improving the performance of OLTP and data warehousing solutions, how easily on-premises solutions can be transferred to cloud with added support for hybrid environment, etc. If you are a DBA, DBE or an engineer who wants to know the new features and capabilities, this is the book to be read.

Download links:

Chapters in this book;

  • Chapter 1: SQL Server 2014 editions and engine enhancements
  • Chapter 2: In-Memory OLTP investments
  • Chapter 3: High-availability, hybrid-cloud, and backup enhancements
  • Chapter 4: Exploring self-service BI in Microsoft Excel 2013
  • Chapter 5: Introducing Power BI for Office 365
  • Chapter 6: Big data solutions

Wednesday, April 2, 2014

Easy way of adding your Microsoft Certification to LinkedIn Profile

If you are maintaining your profile with world largest professional network (Im sure that you are) and you have done a Microsoft Certification, there is an easy way of adding the certification to your LinkedIn profile rather than manually adding it. The new way has been introduced by teaming up with LinkedIn and it is opened to you via an email which you receive with details once an exam is done. All you have to do is, hit the button populated with all details required (since I have not done a certification recently, I have no idea how this will be appeared in the mail, I believe it is similar to below image).

In order to receive the mail, make sure you have enabled “Promotional email” in MCP profile setting and you are using the same email address for both MCP profile and LinkedIn profile.

For more details, read this post published by Carrie Francey (MS): http://borntolearn.mslearn.net/btl/b/weblog/archive/2014/03/25/announcing-a-new-way-to-add-microsoft-certifications-to-your-linkedin-profile.aspx

SQL Server exams update: SQL Server 2012 and 2014

Are you planning to do Microsoft SQL Server certifications in next few weeks? If so, here is an announcement on exams updates. All MCSE related SQL Server exams (464 to 467) will be updated on April 24, 2014 adding new features introduced with SQL Server 2014. Numbers of the exams will not be changed but the title will be adjusted with 2014. Official courseware for these exams have not been updated yet, most probably, updated courseware will be published during May 2014.

Here are the exams that will be updated;

Use following links to see the changes done in skills measured;

MCSA related SQL Server exams (461 to 463) will be remain unchanged.

Tuesday, April 1, 2014

Data Warehouse does not need a large number of non-clustered indexes?

It is a fact that non-clustered indexes improve the performance of data retrieval and we tend to add them mostly based on predicates used. As Data warehouses mainly design for data retrieval, many think that a data warehouse must contain non-clustered indexes as much as possible. In a way, it is true and required, however, we need to thoroughly analyze the requirements on indexing in data warehousing because generally non-clustered indexes do not help much with dimension and fact tables in data warehouses.

Non-clustered indexes generally give better performance on high selective queries. An example for a high selective query would be a query written for getting all customers for a given last name. Mostly attributes in a dimension table are used for pivoting in reports and typically hold few distinct values. Because of that, filters used in such reports are not much selective and will not get any benefits from indexes. However, parameterized reports looking for a particular value from an attribute would benefit as it is very selective. Therefore, when determining an index on an attribute, consider the reporting requirement similar to section example and then add.

It is common practice that adding non-clustered indexes on foreign keys. A typical data warehouse created based on Star-Schema has fact tables and fact tables have foreign keys linked to dimension tables. Do we need to create non-clustered indexes on these keys? Answer is “No”. The reason for that is, “star join query optimizations” feature in SQL Server (Enterprise only). This feature recognizes star join patterns and it uses bitmap filtered hash joins for joining tables in efficient way. Therefore indexes on foreign keys are not required unless a different edition is used. However, if the required dataset is smaller, indexes would be created on foreign keys as merge and nested loops joins give better performance than hash joins. Read more on that here: http://technet.microsoft.com/en-us/library/bb522541(v=sql.105).aspx

Considering above facts, we rarely add non-clustered indexes on both fact and dimension tables. In addition to that, the volume of records involved with the query (IO), the high cost for maintaining indexes, rebuilding every time data is loaded are other factors for not using non-clustered indexes in DWs.