Friday, May 29, 2015

SSRS: The parameter is missing a value

Are you experiencing the following error with one of your Reporting Services parameters, specifically with a hidden parameter?


To overcome this issue, make sure you have a default value set for the parameter. Here are the steps:

1. Open the General page of the Parameter Properties. Note that visibility is set to Hidden.


2. Then set the available values (the source) for the parameter as you want.

3. And set the default value as you want.


Once this is done, it will work without any issue.

Thursday, May 28, 2015

SSRS: How to format datetime as Oracle YYYYMMDDHH24

If an SSRS report has to be designed against an Oracle database and datetime values need to be displayed with the date and hour, there are many different ways of handling it. Oracle supports many different datetime formats, and sometimes values in the report need to be displayed in an Oracle format such as YYYYMMDDHH24. That was one of the specific requirements I had, so I thought to share the way I did it, assuming it will be useful for you too.

With Oracle, the following code returns 2015052813 if the date is 2015-05-28 and the time is 1:30pm.

SELECT to_char(sysdate, 'YYYYMMDDHH24') FROM dual;

If the same format has to be displayed with SSRS, here is the way to write the expression.
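
For instance, if the dataset returns a datetime field, an expression based on a .NET format string can produce the same output. This is only an illustrative sketch; the field name OrderDate is an assumption, so replace it with the relevant field in your dataset. Note that HH in a .NET format string uses the 24-hour clock, matching Oracle's HH24.

=Format(Fields!OrderDate.Value, "yyyyMMddHH")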


Tuesday, May 26, 2015

SQLSaturday #417 - Speakers and Sessions


Speaker: Pinal Dave

Pinal Dave is a Technology Evangelist for Pluralsight. He has written over 3,300 articles on his blog at http://blog.sqlauthority.com. He has also authored 17 video courses and 11 books. He specializes in SQL Server performance tuning, has over 11 years of hands-on experience, and holds a Master of Science degree and a number of certifications, including MCTS, MCDBA, and MCAD (.NET). He has presented at many technology conferences and is consistently a top-rated speaker.

Session I: Performance in 60 Seconds - SQL Tricks Everybody MUST Know
Data and Database are very important aspects of application development for businesses. Developers often come across situations where they face a slow server response, even though their hardware specifications are above par. This session is for all the Developers who want their server to perform at blazing fast speed, but want to invest very little time to make it happen. We will go over various database tricks which require absolutely no time to master and require practically no SQL coding at all. After attending this session, Developers will only need 60 seconds to improve performance of their database server in their implementation. We will have a quiz during the session to keep the conversation alive. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. 

Session II: Indexes – The Unsung Hero
Slow-running queries are the most common problem that developers face while working with SQL Server. While it is easy to blame SQL Server for unsatisfactory performance, the issue often lies with the way queries have been written and how indexes have been set up. The session will focus on ways of identifying problems that slow down SQL Server, and indexing tricks to fix them. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session.


Speaker: Nilabja Ball

Nilabja is a Premier Field Engineer for Microsoft Global Business Support (GBS), India. He has more than 8 years of SQL Server and BI experience. He is engaged with customers either for proactive activities, including health checks, assessments and support, or for reactive tasks to troubleshoot and fix complex and critical issues. He has an engineering degree from Visvesvaraya Technological University and is keen to learn new technologies.

Session I: SQL Server 2014 In-Memory Overview
An overview of the in-memory technologies included in SQL Server 2014. We will cover In-Memory OLTP (AKA "Hekaton") and In-Memory Data Warehousing (AKA "Apollo") as well as SSD Buffer Pool Extensions. This session will give an understanding of the broad in-memory strategy as well as a basic understanding of the technologies involved.

Session II: NoSQL on Azure: An Introduction to DocumentDB
NoSQL data stores have gone mainstream. In this session, we will survey the expanding NoSQL landscape on Azure, providing a big-picture view of these technologies. We will also spend a bit more time on DocumentDB, Microsoft’s document database-as-a-service, and Azure Search. This session is aimed at anybody who’s new to NoSQL or is curious about the NoSQL options that Azure provides today.


Speaker: Sammani Palansuriya 

Sammani Palansuriya is a Visual C#.net MVP in Sri Lanka with more than 5 years' experience in various aspects of .NET technologies. Currently she is stationed as a Senior Software Engineer at Exilesoft, working on projects based on ASP.NET and Azure technologies.

Session I: My Data in the Cloud: Microsoft Azure SQL Database
Microsoft Azure SQL Database (formerly SQL Azure) is a Microsoft Azure service that offers relational database-as-a-service with proven performance, scalability, data security and near-zero administration.
This discussion will give you an understanding of Azure SQL Database: what this service is and what it offers. It will also help you understand what sort of requirements will benefit from Azure SQL Database and what factors you need to know in order to choose this service, along with the pros and cons of choosing Azure SQL Database.


Speaker: Prabhakaran Bhaskaran 

Prabhakaran is a SQL Server consultant at Pythian, a company based in Ottawa, Canada, specializing in remote database services. He is a former SQL Server Support Engineer at Microsoft. He always loves to learn and share his knowledge.

Session I: Inside Transactional Replication
You can probably do a good job of configuring and maintaining transactional replication in your environment. You can get these things done with reasonable quality without knowing how SQL Server does things under the covers. However, if you want to take your skills to the next level and troubleshoot complex replication issues, it's crucial to know how transactional replication works internally.
This session will answer questions like: How does the log reader know where to start the scan? How does it add the commands to the distribution database? How many threads are involved in the replication process? How does the distribution agent look for pending transactions to distribute to its subscribers? Knowing all of this will help you manage and troubleshoot replication very effectively. Come, let's learn together!


Speaker: Thurupathan Vijayakumar 

A versatile technical speaker with 5 years of industry experience in SQL Server technologies, software development and cloud computing.

Session I: Setting your first foot in the world of Business Intelligence
Have you ever wondered why we need a data warehouse when we have the good old, stringent relational database schema? Do you want to get to know the concepts of data warehousing and business intelligence? Then this is the session you have been looking for: a tour from OLTP to OLAP with Microsoft technologies. Gear up for the excitement!

Speaker: Indika Ariyasinghe 

Indika Ariyasinghe is an Associate Architect - Technology at Virtusa (Pvt) Ltd. He holds a Software Engineering degree from the Faculty of Engineering, University of Peradeniya, and has 11 years of total experience in software development, including business intelligence.

Session I: Enterprise MS BI for everyone, The Journey So Far
Virtusa is a rapidly evolving company with diverse business intelligence needs spanning all levels of the organization: from CxOs to engineers (numbering around 8,000 employees), and from mobile devices to public-area dashboards. In this session, I will share with you the challenges faced, how we have overcome them so far, and our plans for the future.

Speaker: Gogula Aryalingam 

Gogula is a database and business intelligence professional with 9+ years on the Microsoft data platform. He is an MCSE and MCT, a former MVP, and holds an MSc in IT. His passion is building BI and data-based solutions, writing about the things he works with, and speaking about them. He loves teaching and working with the technical community. He has spoken at local and foreign events, along with regular stints at the local SQL Server user group. He was also an SME for Microsoft Certifications.

Session I: Telling Stories with Data
Data is all around us; it's in bits and pieces; it's in shapeless chunks. This data can tell us stories. Organizations have loads of data lying around, but seldom make use of it. Modern data analytics, however, gives us a plethora of ways to access this data, mash it up and tell stories with it. Stories that can help make effective decisions.
This session looks at how you could collect data from the various sources in your organization, combine it, process it and tell stories using the wealth of tools given to us by Microsoft. Words you will hear are self-service data analytics, big data and machine learning.


Speaker: Dinesh Priyankara

Dinesh Priyankara (MSc IT) is a SQL Server MVP in Sri Lanka with 14 years’ experience in various aspects of database technologies, including business intelligence. He has done many local and foreign business intelligence implementations and has worked as a subject matter expert on various database and business intelligence related projects. Currently he is stationed as a Senior Architect - Specialist at Virtusa (Pvt) Ltd. He teaches, consults and runs training programs for various institutes.

Session I: Big Data Analytics: The Microsoft Way
Have you already incorporated “Big Data” into your analytical solution, but still feel you are going nowhere with it? Or are you planning to bring “Big Data” into your analytical solution, but are not sure which way to go? Do you wonder how to get it done, and what tools and methods you need to use? Let us talk about it; let us understand the “Big Data” requirement and the best way of bringing it into the analytical stack, using the Microsoft way, of course.


Monday, May 25, 2015

Guest Post: How much data will be lost when DBCC CHECKDB reports Consistency Errors?

This post is written by: Priyanka Chouhan

Every company or individual who uses a database system wants to ensure the integrity of the stored data. Therefore, you should regularly run consistency checks against your databases to verify their accuracy. As part of a regular maintenance routine, run DBCC CHECKDB daily, weekly or monthly. You might be wondering how to know what data could potentially be lost. Your next question could be about performance after running DBCC CHECKDB, and about how to run DBCC CHECKDB without significant performance implications.

At times, the database engine specifically warns that using a repair option leaves the database in an unsupported configuration, and if you still proceed there is a chance of database inconsistency. This write-up explains the process of identifying the potential data loss when DBCC CHECKDB reports corruption in your SQL Server database.

Carefully read the output from DBCC CHECKDB: After running DBCC CHECKDB, it will show a large amount of information to get you started. Read the output thoroughly, and keep the points below in mind.
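
As a minimal sketch, assuming the database in question is AdventureWorks2014, a check that keeps only the error messages in the output could look like this:

-- Full consistency check; suppress informational messages and
-- list every error so nothing is truncated in the output.
DBCC CHECKDB ('AdventureWorks2014') WITH NO_INFOMSGS, ALL_ERRORMSGS;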

Identify the affected objects: Use the output of DBCC CHECKDB to identify the affected objects. DBCC CHECKDB does not recommend the REPAIR_ALLOW_DATA_LOSS option for every corruption case. Each error message gives you the object ID of the affected object, which can be used to find the affected table.
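
As a minimal sketch, the object ID reported in the error message can be resolved to a name; 526624919 below is a hypothetical value taken from a CHECKDB error message.

-- Resolve the object ID reported by DBCC CHECKDB to a schema and table name.
-- Replace 526624919 with the ID from your own output.
SELECT OBJECT_SCHEMA_NAME(526624919) AS SchemaName
 , OBJECT_NAME(526624919) AS TableName;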

Identify the affected page: Once you know the affected object, you can decide whether the corrupted data is critical or not. For an organization, all data can usually be considered critical and needs treatment. For example, let’s assume that the identified data is critical and you need to minimize the data loss. The page number reported in the DBCC CHECKDB output tells you which page is corrupted, and the output also describes the pages that reference the corrupt page. Ultimately, the DBCC CHECKDB output provides all the information required to understand the corruption.
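
If you want to look at the page itself, the undocumented DBCC PAGE command can dump it. This is only a sketch; the file and page numbers (1:1234) are hypothetical values taken from the CHECKDB output.

-- Send DBCC output to the client and dump the page with full details (print option 3).
DBCC TRACEON (3604);
DBCC PAGE ('AdventureWorks2014', 1, 1234, 3);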

Resolve the corruption: After successfully finding the corrupted page along with its index, you may not need to restore from backup or run REPAIR_ALLOW_DATA_LOSS; if the corruption is in a non-clustered index, you can simply disable and rebuild that index. The time taken depends on the size of the index and its impact on the system, so it is up to you to decide whether to do it during or after working hours.
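
A minimal sketch of that disable-and-rebuild step; the index and table names are hypothetical.

-- Disable the corrupted non-clustered index, then rebuild it to recreate its pages.
ALTER INDEX IX_SalesOrder_CustomerId ON dbo.SalesOrder DISABLE;
GO
ALTER INDEX IX_SalesOrder_CustomerId ON dbo.SalesOrder REBUILD;
GO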

If you have the latest backup and are planning to restore the whole database from it, reconsider: a full restore can consume a lot of time if you own a big database. It is often better to repair only the known corrupted pages. Disabling and rebuilding the index is a better option than taking your complete database offline to resolve a corrupted non-clustered index. In other words, even when DBCC CHECKDB reports a bunch of errors, it is not always necessary to take your database offline to fix them.

Sunday, May 24, 2015

SQL Server Brain Basher of the Week #013 - Global Temporary Tables

Though we frequently create temporary tables for manipulating temporary data, global temporary tables are created infrequently. As you know, a local temporary table is limited to the current session, while a global temporary table is available to all sessions. A local temporary table is automatically dropped when your session is disconnected, and a global temporary table remains until the last connection that has referenced the table is disconnected. Now the question for the week is:

Is a global temporary table global to the database or to the instance?

When this question is asked during my classes and workshops, the most common answer is that it is global to the database. However, that is not the correct answer; it is global to the entire instance. Once a global temporary table is created, it can be accessed, modified and dropped by any user from any database hosted in the same instance.

Here is sample code for that:

-- Connect with AdventureWorks database
USE AdventureWorks2014;
GO

-- Create global temporary table
CREATE TABLE ##TestTable
(
 Id int PRIMARY KEY
 , Name varchar(200)
);
GO

-- Query the table in the same session
SELECT * FROM ##TestTable;

Here is the result of SELECT.


Here is the result when accessing the table in a different session.


And here we are accessing the table from a different database. It works without any issue.
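
A minimal sketch of that cross-database check; any other database on the same instance (master is used here only as an example) can read the table:

-- Switch to a different database on the same instance
USE master;
GO

-- The global temporary table is still accessible
SELECT * FROM ##TestTable;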


Saturday, May 23, 2015

Should I consider Collation when creating a temporary table?

We often create temporary tables for manipulating data in temporary processes and reports. However, unawareness of collation-related issues with tempdb might introduce unexpected results in those processes and reports. Should we really consider it?

What is collation? Collation refers to a set of rules that describe how to store and compare data. These rules include the order of characters, case-sensitivity, accent-sensitivity, kana character types and character width. SQL Server supports setting collation at the instance level, database level or even at the column level.

Now, why do we need to consider the tempdb collation? Have a look at the following example. It creates a database whose collation is SQL_Latin1_General_CP1_CS_AS (case-sensitive). Then a table is created and two sample records are added. Assume that the business requirement is that the last SELECT should be case-sensitive and return only one record.

USE master;
GO

-- creating database with case-sensitive collation
CREATE DATABASE TestDatabase100 COLLATE SQL_Latin1_General_CP1_CS_AS;
GO

USE TestDatabase100;
GO

-- create a table
CREATE TABLE dbo.TestTable
(
 Id int PRIMARY KEY
 , Name varchar(100)
);
GO

-- insert two records
INSERT INTO dbo.TestTable 
VALUES
(1, 'abc'), (2, 'ABC');

-- Looking only for the second record
-- and this will return ONLY second one
-- as database is case sensitive (collation
-- for the columns are derived from top level)
SELECT * FROM dbo.TestTable WHERE Name = 'ABC';

Assume that the same logic has to be implemented in another piece of code using a temporary table, as part of the newly created database.

USE TestDatabase100;
GO

-- create a temporary table
CREATE TABLE #TestTable
(
 Id int PRIMARY KEY
 , Name varchar(100)
);
GO

-- Load data from original table
INSERT INTO #TestTable 
SELECT Id, Name FROM TestTable;

-- Check for result
SELECT * FROM #TestTable WHERE Name = 'ABC';

This is the result:



It returns both records instead of only the last record. The reason for this behavior is the tempdb collation. Since the temporary table has been created inside tempdb, the collation of tempdb has been used instead of the collation of the user-defined database. This is why we need to consider the collation of tempdb.

The code below shows how to check the collation of tempdb and how to create the temporary table properly.

-- Checking collation of tempdb
SELECT name, collation_name
FROM sys.databases
WHERE name = 'tempdb'
GO

-- create the temporary table properly
CREATE TABLE #TestTable
(
 Id int PRIMARY KEY
 , Name varchar(100) COLLATE SQL_Latin1_General_CP1_CS_AS
);
GO
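
If you prefer not to hard-code a collation name, COLLATE DATABASE_DEFAULT makes the column take the collation of the current (user) database instead of tempdb's. A minimal sketch; the table is named #TestTable2 here only to avoid clashing with the table created above.

-- Temporary table that follows the collation of the current database
CREATE TABLE #TestTable2
(
 Id int PRIMARY KEY
 , Name varchar(100) COLLATE DATABASE_DEFAULT
);
GO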


How to decide the number of data files required for tempdb?

If your tempdb is used heavily and you experience slow performance with it, adding multiple data files might help to improve the overall performance of tempdb. Having multiple data files addresses issues related to IO restrictions and avoids latch contention during Page Free Space (PFS) scans when temporary objects are created and dropped. But how do we decide the number of files that should be added to tempdb?

There is no specific formula to calculate it, only a recommendation. Too many files are not recommended at all, as that might degrade performance rather than improve it. The general rule is:

One file per core, or 0.25 files per core as the number of cores increases.

The optimal configuration can only be identified by doing some tests during peak hours, but it is better to follow the guidelines below when configuring your tempdb (see the sketch after this list):

If the system has <= 8 cores, then the number of data files = the number of cores.
If the system has > 8 cores, then the number of data files = 8.
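
A minimal sketch for applying these guidelines: check the number of cores and the current tempdb files, and add a file if needed. The file name, path and sizes below are assumptions; adjust them to your environment.

-- Number of logical cores visible to SQL Server
SELECT cpu_count FROM sys.dm_os_sys_info;

-- Current tempdb data files
SELECT name, physical_name, size
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- Add another data file (name, path and sizes are hypothetical)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'D:\TempDB\tempdev2.ndf', SIZE = 1024MB, FILEGROWTH = 256MB);
GO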


Fact Table: Is a clustered index on the date key enough, or should the table be partitioned on it as well?

A fact table in a data warehouse (or a large table in an OLTP database) usually holds millions of records, and we keep adding records in large volumes. Generally, it is discouraged to add indexes and constraints to fact tables, but with the modern way of implementing data warehouse / BI solutions, if the table is open to business users for analysis, having indexes on fact tables is beneficial. However, whether it is a traditional or modern implementation, it is still recommended to have a clustered index on the key most frequently used for analysis and loading, which is the date key.

While discussing design considerations for fact tables, a question came up: is a clustered index on the date key enough, or should we partition the table on the date key too? Do we get additional performance benefits by partitioning on the same key?

In terms of performance, if the table is not significantly large, I doubt that a clustered index on the date key combined with partitioning on the same key will give extra performance benefits. But if it is a very large table, the performance of data retrieval will surely be improved. Does this mean that partitioning is not required for fact tables?

Let's analyze this with a reasonably large table. The code below shows the structure of an InternetSales table that contains 7 million records. It has no indexes, not even a clustered index.

-- Table structure
CREATE TABLE dbo.InternetSales(
 OrderDateKey int NOT NULL,
 ProductKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
) ON [PRIMARY];
GO

If a business user runs a query like the one below against the table, you can see the number of pages (IO) that have to be read to satisfy the query.

SET STATISTICS IO ON
GO

SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;


If we add a clustered index on the date key and then run the same test:

CREATE CLUSTERED INDEX ix_InternetSales
ON InternetSales (OrderDateKey);
GO

SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;

We will see a significant difference between the previous number of pages read and the number of pages read now.



Now the question is whether partitioning this table will give more benefits or not. Let's partition it and see.

-- Create a partition function
CREATE PARTITION FUNCTION pf_InternetSalesDate (int)
AS RANGE RIGHT
FOR VALUES (20100101, 20110101, 20120101, 20130101, 20140101)
GO

-- Create the scheme
CREATE PARTITION SCHEME ps_InternetSalesDate
AS PARTITION pf_InternetSalesDate
TO (HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, CurrentFileGroup)
GO

-- Dropping clustered index for partitioning
DROP INDEX ix_InternetSales ON InternetSales;
GO

-- Recreating with Partitioning scheme
CREATE CLUSTERED INDEX ix_InternetSales
ON InternetSales (OrderDateKey)
ON ps_InternetSalesDate (OrderDateKey);
go

-- And executing the same Business user's query
SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;

Here is the result of it;


This result does not show much improvement when compared with the previous result, hence one could conclude that partitioning is not required when a clustered index is created on the date key. But even with this table we see a slight improvement, which means that a very large table will definitely benefit from partitioning.

Okay, now what about this table? Should we keep partitioning or not? I prefer to have both implementations, because partitioning gives a few more benefits in addition to the performance benefit it offers. It offers more granular manageability, letting us perform operations such as indexing, compression and backing up at the partition level instead of on the whole table. It also improves data loading, since loading can be managed at the partition level. Considering all of this, should we not create the clustered index on the date key as well as partition the table using the date key?
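
As a small sketch of that granular manageability, using the partitioned table above (the partition numbers here are assumptions based on the boundaries defined earlier):

-- Rebuild only one partition of the clustered index instead of the whole table
ALTER INDEX ix_InternetSales ON dbo.InternetSales
REBUILD PARTITION = 6;
GO

-- Compress only an older partition, leaving the current one untouched
ALTER TABLE dbo.InternetSales
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);
GO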


Monday, May 18, 2015

Analysis Services, Integration Services and Reporting Services menu items in the Connect menu are disabled: what is the reason?

If you need to connect to Business Intelligence services via Management Studio, you use the Connect menu in Object Explorer. Generally, this is what you see:



However, what if you see them disabled and cannot use them to connect to services like SSAS, SSIS and SSRS? There can be a few reasons, but the most obvious one is not installing the required components with the SQL Server installation. How do we know whether we have installed them or not?

For that, you need to read the Configuration.ini file in the log folder. Navigate to the C:\Program Files\Microsoft SQL Server\120 (your instance version)\Setup Bootstrap\Log\ folder and open the log folder related to the installation. You should see the Configuration.ini file in the folder.



If you open the file and look for FEATURES, you should see all the installed client components. If you do not see AS, RS, and IS, it means you did not install them during the client components installation.
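
For illustration only, the FEATURES line in Configuration.ini looks similar to the following; the exact value depends on what was selected during setup, and this particular list is an assumption.

FEATURES=SQLENGINE,CONN,SSMS,ADV_SSMS,AS,RS,IS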



If you need them, the installation has to be run again for the client components; make sure you select Management Tools - Basic and Management Tools - Complete.



Sunday, May 17, 2015

SQL Server Brain Basher of the Week #012 - Partitions

While I was discussing partitioning in one of my discussions this week, one enthusiast asked how many partitions can be created per table. Before answering, I asked the same from the audience, and thought to make it the SQL Server Brain Basher of the Week.

What is the maximum number of partitions that can be added to a SQL Server table?

There are two answers because it depends on the installation: whether it is 32-bit or 64-bit. You can create 15,000 partitions on the 64-bit version of SQL Server. On the 32-bit version, only 1,000 partitions can be created.

Note that Partitioning is available only with Enterprise Edition and Developer Edition.

For more info related to Maximum Capacity Specification, refer: https://technet.microsoft.com/en-us/library/ms143432(v=sql.120).aspx.

Saturday, May 16, 2015

What is a Junk Dimension?

A dimension represents an entity that will be used for analyzing facts in the data warehouse. There are different types of dimensions, such as Regular, Slowly Changing, Self-Referencing and Junk. Many know about all of these except Junk.

A junk dimension is a dimension that combines multiple dimensions whose attributes have low cardinality and only a few discrete values. This keeps the model clean and reduces the complexity of the dimensional model.

Here is an example. Assume that three dimensions have been identified during the requirement-gathering process and they hold a small number of values:


Rather than maintaining three tables for these three dimensions, one dimension table can be introduced that combines all the values:


It represents all possible combinations of the three dimensions and can be used with fact tables without any issue. This dimension is called a junk dimension.
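
A minimal sketch of building such a table in T-SQL; the attribute names and values below (payment type, order mode, shipment type) are hypothetical examples of low-cardinality attributes.

-- Junk dimension combining three low-cardinality attributes
CREATE TABLE dbo.DimOrderJunk
(
 OrderJunkKey int IDENTITY(1,1) PRIMARY KEY
 , PaymentType varchar(20) NOT NULL
 , OrderMode varchar(20) NOT NULL
 , ShipmentType varchar(20) NOT NULL
);
GO

-- Populate it with all possible combinations using CROSS JOIN
INSERT INTO dbo.DimOrderJunk (PaymentType, OrderMode, ShipmentType)
SELECT p.Value, o.Value, s.Value
FROM (VALUES ('Cash'), ('Credit Card')) p (Value)
 CROSS JOIN (VALUES ('Online'), ('In-Store')) o (Value)
 CROSS JOIN (VALUES ('Standard'), ('Express')) s (Value);
GO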

Thursday, May 14, 2015

What are the reasons for introducing a surrogate key to a dimension table?

When designing a dimension table such as Product or Employee in a data warehouse, a surrogate key is usually introduced as the column for identifying each row uniquely in the entity. Even though we see this in almost all data warehouses, it is not a must to implement a surrogate key. If the key of the source table related to the entity can be used as the key column of the dimension table, there is no need to introduce another column. How do we decide? What are the factors for deciding whether we need a new surrogate key for the dimension table, or whether we can use the source key as the key column of the dimension table?


Here are some factors that can be used to decide whether you need a new surrogate key (or artificial key) for your dimension table:

  1. The key of the source is a wide string such as CUS01X5679. It is not recommended to use a string data type as the dimension key unless it is fixed and small.
  2. The key of the source is a GUID. This is a 16-byte value and it is too wide for the dimension key. In addition, if GUIDs in the source are not generated sequentially, they are not good for the key at all.
  3. The dimension table has to be loaded from multiple data sources. If the table has to be populated from multiple sources, there is a chance of getting the same key value from two different sources for different records. This makes it impossible to use the source key as the dimension key, because two records cannot be added with the same key.
  4. The dimension table is an SCD Type 2 dimension. If the dimension table is a Slowly Changing Dimension Type 2, it needs to maintain the same record with multiple versions, addressing changes and maintaining history. If the source key is used as the dimension key, it is not possible to add another version of the same record with the same key.
  5. The source reuses keys for new items. Some systems allow deleting records from entities and inserting new records with the same keys that were used by the deleted records. Since the old record has already been taken into the dimension table, it is not possible to insert the new record with the same key if the source key is the dimension key.
If you do not see any of the reasons mentioned above, there is no harm in using the source key as the dimension key.
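
A minimal sketch of a dimension table that follows this pattern; the column names are illustrative, with the source key kept as an alternate (business) key next to the surrogate key.

-- Product dimension with a surrogate key and the source key as an alternate key
CREATE TABLE dbo.DimProduct
(
 ProductKey int IDENTITY(1,1) PRIMARY KEY -- surrogate key
 , ProductAlternateKey nvarchar(25) NOT NULL -- key from the source system
 , EnglishProductName nvarchar(50) NOT NULL
 , StartDate date NOT NULL -- supports SCD Type 2 versioning
 , EndDate date NULL
);
GO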

Tuesday, May 12, 2015

How to take a copy of database files while they are being used by SQL Server

I am sure that you have tried this at least once and have faced the issue below.


Generally, we cannot copy database files or log files while they are being used by SQL Server, that is, while the SQL Server service is running. If we need to get a copy of the MDFs and LDFs, what we normally do is stop the service and copy the files. However, this is not possible with production databases, specifically when databases are used 24x7. What is the solution for this?

There are many third-party applications and tools available for getting this done; some are paid and some are free. I have used one with SQL Server 2005 and 2008, but unfortunately I cannot remember it now. While reading blogs, I came across this post: Backup/Copy Files that are "In Use" or "Locked" in Windows (Command Line), which is written on copying files. It explains how to use a tool called HoboCopy, which is specifically for backing up / copying files, and it can be used for copying our SQL Server files while they are being used.

In order to use HoboCopy, download the relevant file from: https://github.com/candera/hobocopy/downloads.

Once downloaded, open a Command Prompt as an Administrator and type the command as follows:

HoboCopy "FileFolderPath" "DestinationFolderPath" FileName.

For example, if you want to copy the AdventureWorks2014 database file, you would type:

HoboCopy "C:\Program Files\Microsoft SQL Server\MSSQL12,SQL2014_86\MSSQL\DATA" "C;\Test" AdventureWorks2014_Data.mdf




Note that, in order to run this command, you need to make sure the Volume Shadow Copy and Microsoft Software Shadow Copy Provider services are running.

Monday, May 11, 2015

I want an execution plan optimized for my values, not for the values passed to parameters with the first execution.

Since I got a few questions on the posts written on stored procedures in the last two weeks, I thought to make a few more posts on the same subject based on the questions asked. One question asked was: can we make sure that the stored procedure is optimized for one range of values, not for exceptions? If we need to get the stored procedure optimized for one set of values, then an instruction has to be given at creation. We should only do this if we know that there are different best plans for different values.

I have written a post called How to determine whether RECOMPILE is required? If so, at the creation or execution? and it shows the way of seeing whether different best plans exist for different values. Have a look at it if you want to know the way.

Let's see how we can optimize a procedure for our values, not for the values passed with the first execution. The code below creates a stored procedure and an index on OrderDate.

-- Clearing the cache  
DBCC FREEPROCCACHE  
GO  
  
USE AdventureWorks2014;  
GO  
  
-- Drop if it is exist  
IF OBJECT_ID('dbo.GetOrderDetails', 'P') IS NOT NULL  
 DROP PROC dbo.GetOrderDetails;  
GO  
  
-- Create the procedure  
CREATE PROC dbo.GetOrderDetails @FromDate datetime, @ToDate datetime  
AS  
    SELECT h.* 
 FROM Sales.SalesOrderHeader h
 WHERE h.OrderDate BETWEEN @FromDate AND @ToDate
 ORDER BY h.OrderDate
GO  

-- Create an index on OrderDate
CREATE INDEX IX_SalesOrderHeader_OrderDate ON Sales.SalesOrderHeader (OrderDate DESC)
GO

The next code executes the stored procedure with three different values. As you see, WITH RECOMPILE has been added to see whether the same plan or different plans are used for the executions.

EXEC dbo.GetOrderDetails '2014-06-29T00:00:00', '2014-06-30T00:00:00' WITH RECOMPILE;  
EXEC dbo.GetOrderDetails '2014-06-26T00:00:00', '2014-06-30T00:00:00' WITH RECOMPILE;  
EXEC dbo.GetOrderDetails '2014-06-01T00:00:00', '2014-06-30T00:00:00' WITH RECOMPILE;  
GO

Here are the plans generated.



As you can see, the plan is the same for the first two executions, and a different plan is used for the third execution. The reason is that the first and second executions retrieve orders within a small number of days while the third one covers many days, hence the newly added index is used with a seek operation for the first two, and the clustered index is scanned for the third.

Assume that most requests made to this procedure are based on a small number of days and we need to optimize it for that case. If so, we need to alter the procedure, instructing it to create a plan optimized for a sample range that contains a small number of days. If we do not do this and the first execution is based on a large number of days, the plan will be created for that execution, and consecutive executions will use that plan even for a small number of days, which is not the optimal plan for them.

Okay, how can we give that instruction? Here is the way.

-- Alter the procedure  
ALTER PROC dbo.GetOrderDetails @FromDate datetime, @ToDate datetime  
AS  
    SELECT h.* 
 FROM Sales.SalesOrderHeader h
 WHERE h.OrderDate BETWEEN @FromDate AND @ToDate
 ORDER BY h.OrderDate
 OPTION (OPTIMIZE FOR (@FromDate = '2014-06-29T00:00:00', @ToDate = '2014-06-30T00:00:00'))
GO 

-- Execute with a large range and see
EXEC dbo.GetOrderDetails '2014-01-01T00:00:00', '2014-06-30T00:00:00';  
GO

Here is the plan;


I have used the OPTIMIZE FOR hint, passing values for the two parameters. Note that the values are set for a small range, and this instructs the engine to create the plan based on those values without considering the values submitted at execution time. As you can see, even when the procedure is executed with a wide range, the engine still uses the plan optimized for the small range.
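
A related option, if you simply want the optimizer to ignore the sniffed values and build the plan from average statistics rather than for a specific range, is OPTIMIZE FOR UNKNOWN; a minimal sketch:

-- Build the plan from statistical averages instead of the sniffed parameter values
ALTER PROC dbo.GetOrderDetails @FromDate datetime, @ToDate datetime
AS
    SELECT h.* 
 FROM Sales.SalesOrderHeader h
 WHERE h.OrderDate BETWEEN @FromDate AND @ToDate
 ORDER BY h.OrderDate
 OPTION (OPTIMIZE FOR UNKNOWN);
GO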

Read more on this hint at: https://msdn.microsoft.com/en-us/library/ms181714.aspx.

Sunday, May 10, 2015

SQL Server Brain Basher of the Week #011 - DBCC

An interview question.

What does DBCC stand for?

When conducting classes related to administration, I always ask this question and, as expected, I always get the same answer, which is wrong. Most think that it stands for Database Consistency Checker, but that is the oldest way of describing it.

The DBCC command family originally supported consistency-related commands. Version upgrades added more and more commands to this family, and they were not only for checking the consistency of our databases. Therefore, since SQL Server 2000, it has been referred to as Database Console Commands.
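
If you want to see the commands in the family for yourself, DBCC HELP lists the DBCC statements for which help is available; a quick sketch:

-- List all DBCC statements for which help is available
DBCC HELP ('?');

-- Show the syntax of a specific command
DBCC HELP ('CHECKDB');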


Friday, May 8, 2015

How to determine whether RECOMPILE is required? If so, at the creation or execution?

When we know that the parameter values we pass to a stored procedure vary widely, we instruct the engine to recompile the procedure. But that does not mean we have to recompile just because the values vary; we need to make sure that there are different best plans for these values, because there can be a case where a single plan is best for all values. Therefore, we need a way of seeing whether different best plans exist for different values.

What is the simplest way? The easiest way is to execute the stored procedure with multiple different values using the RECOMPILE option, with Include Actual Execution Plan enabled. This tells us whether the engine makes different plans for different executions, indicating whether we need the RECOMPILE option.

Here is the way;

-- Clearing the cache
DBCC FREEPROCCACHE
GO

USE AdventureWorks2014;
GO

-- Drop if it is exist
IF OBJECT_ID('dbo.GetOrderDetails', 'P') IS NOT NULL
 DROP PROC dbo.GetOrderDetails;
GO

-- Create the procedure
CREATE PROC dbo.GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
GO

-- Enable "Include Actual Execution Plan"
-- and execute all together
EXEC dbo.GetOrderDetails 1 WITH RECOMPILE;
EXEC dbo.GetOrderDetails 10000 WITH RECOMPILE;
EXEC dbo.GetOrderDetails 10 WITH RECOMPILE;
EXEC dbo.GetOrderDetails 1000 WITH RECOMPILE;
EXEC dbo.GetOrderDetails 100 WITH RECOMPILE;
GO

Let's have a look at the plans generated:


Since this is a simple stored procedure, we do not see many different plans for the values passed. But if you see many different plans and the procedure is always executed with values that vary, it indicates that RECOMPILE should be added at creation.

If you see a common plan for 80-90% of the sample executions and one or two different plans for only a few executions, it indicates that RECOMPILE is required at execution time only when passing values similar to those few executions.
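
If only one statement in the procedure suffers from this, a statement-level hint is a lighter alternative to recompiling the whole procedure; a minimal sketch against the same procedure:

-- Recompile only this statement on every execution
ALTER PROC dbo.GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
    OPTION (RECOMPILE);
GO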


Thursday, May 7, 2015

EXECUTE....... WITH RECOMPILE: Does it discard the previously cached plan?

This is not advanced theory; everyone knows that the first execution of a stored procedure creates the best plan for that execution and keeps it in memory for subsequent executions. The plan is created based on the values passed to the parameters with the first execution. This behavior is called Parameter Sniffing, and it is considered a good thing as long as the cached plan is still good for the parameter values passed with consecutive executions. It becomes a bad thing when a completely odd value is passed for a parameter, or when the value varies greatly from the initial value. In a situation like this, we consider that the cached plan is not the best for the execution, hence we force the SQL Server engine to recompile the plan with the new values.

Forcing is done with the RECOMPILE option. This option can be added to the stored procedure at creation, resulting in recompilation on every execution, or it can be added to the EXECUTE statement, resulting in recompilation only for that execution. Now the question is: when we force the stored procedure to recompile and generate a new plan by adding the RECOMPILE option, does it discard the previously cached plan? Does it keep the new plan for the next execution?

Let's test this. The following code adds a stored procedure and executes it three times. Then it checks the cache.

-- Clearing the cache
DBCC FREEPROCCACHE
GO

USE AdventureWorks2014;
GO

-- Drop if it is exist
IF OBJECT_ID('dbo.GetOrderDetails', 'P') IS NOT NULL
 DROP PROC dbo.GetOrderDetails;
GO

-- Create the procedure
CREATE PROC dbo.GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
GO

-- Execute three times
EXEC dbo.GetOrderDetails 1;
GO 3

-- Seeing the cache
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
WHERE p.usecounts > 0 
 AND TEXT LIKE '%Sales.SalesOrderDetail%'
 AND TEXT NOT LIKE '%dm_exec_cached_plans%'
ORDER BY p.usecounts DESC;



As you can see, the plan has been created with the first execution and has been used three times. Now let's try to use a different plan for executing the procedure. Let's force the engine to recompile and check the cache again.

-- Execute with RECOMPILE
EXEC dbo.GetOrderDetails 1000 WITH RECOMPILE;
GO 

-- Checking the cache
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
WHERE p.usecounts > 0 
 AND TEXT LIKE '%Sales.SalesOrderDetail%'
 AND TEXT NOT LIKE '%dm_exec_cached_plans%'
ORDER BY p.usecounts DESC;


This shows the same plan and no new plan. What does it mean? If a new plan was created, why can't we see it? If no new plan was created, has the old plan been used? If so, why do we not see an increase in usecounts for the plan? The answer is simple: RECOMPILE creates a new plan based on the passed values, but it does not keep it and it does not remove the old one from the cache. The newly created plan is used only for this execution.
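
If you actually want the old plan discarded so that the next execution compiles and caches a fresh one, sp_recompile can be used; a minimal sketch:

-- Mark the procedure so its cached plan is invalidated;
-- the next execution compiles and caches a new plan.
EXEC sp_recompile N'dbo.GetOrderDetails';
GO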

Wednesday, May 6, 2015

How to check the parameter values used for compiling the stored procedure?

It is a known fact that the initial plan of a stored procedure is created with its first execution. The SQL Server engine creates the plan based on the values passed to parameters with the stored procedure's first execution and places the plan in the cache. This plan is reused for consecutive executions regardless of the values passed to the parameters. This behavior is called Parameter Sniffing, and it can improve the overall performance of the stored procedure's execution as well as degrade it. For more info on this, read one of my old posts: Do we need to recompile Stored Procedures when an index is added or parameter value is vary greatly?

With consecutive executions, if you want to see the values that were used for compiling the procedure, in other words, the parameter values used for the plan in the cache, what is the easiest way of seeing them? The simplest way is to open the Properties of the root operator (SELECT) in the plan and look at the Parameter List property. Here is the way of doing it:

First, let's create a procedure and execute it once.

USE AdventureWorks2014;
GO

-- Drop if it is exist
IF OBJECT_ID('dbo.GetOrderDetails', 'P') IS NOT NULL
 DROP PROC dbo.GetOrderDetails;
GO

-- Create the procedure
CREATE PROC dbo.GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
GO

-- Execute once. This adds the plan which is optimized for 
-- parameter value 1.
EXEC dbo.GetOrderDetails 1;

Now let's execute it with a different value, with Include Actual Execution Plan enabled.


Open the Execution Plan tab and click on SELECT.



The Properties window can be opened by either pressing F4 or right-clicking and selecting Properties. Once it is opened, you will see a property called Parameter List, which shows the values used for compiling the stored procedure and the values used for the current execution.


Is this useful? Yes; if you are planning to recompile or want to know whether the plan is optimized for new executions, this holds some of the information you need.
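
If you prefer T-SQL over the graphical plan, the same information is available in the cached plan XML, whose ParameterList element records the compiled value of each parameter; a minimal sketch:

-- Pull the cached plan; look for <ParameterList> and ParameterCompiledValue in the XML
SELECT t.text, qp.query_plan
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t
CROSS APPLY sys.dm_exec_query_plan(p.plan_handle) qp
WHERE p.objtype = 'Proc'
 AND t.text LIKE '%GetOrderDetails%';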


Tuesday, May 5, 2015

SQL Server exams are getting retired within next 12 months

New exams are being added, and older exams are either being revised or retired. Microsoft Learning has decided to retire some exams, including a few SQL Server exams, and has planned dates for the retirement. Here are the SQL Server exams under this retirement plan:




Note that if one of the above exams is part of your certification, this does not mean that your certification is invalidated; it is still valid. If you have taken one of these exams, do not worry, it will still be displayed in your transcript even after the retirement. If you are planning to take one of these, do it before the retirement date.

For more info (to see other exams under this plan), visit: https://www.microsoft.com/learning/en-us/retired-certification-exams.aspx.

Sunday, May 3, 2015

SQL Server Brain Basher of the Week #010 - Best way to send datetime value

Exchanging data between systems is not an uncommon scenario, and it has become a part of most businesses. When exchanging data, the structure and format of the output is sometimes determined before the exchange, and sometimes the destination just accepts it as it comes. Date/time values are always involved in these exchanges, and they are among the attributes we most frequently troubleshoot, as they introduce issues during the transfer; hence I thought to add something on this as this week's Brain Basher.

What is the best way of sending a datetime value to another system?

There are many ways of transferring datetime values: as a plain string value, as a numeric value based on a predetermined format, or as a string value based on a predetermined format. However, though we try our best to manage it, there is still a chance of the datetime value being interpreted incorrectly when transferring it from one system to another. What is the solution for this? What if we could use a format that can be recognized by any database management system?

There is a way of passing datetime values formatted with a standard that can be recognized by almost all DBMSs: the ISO 8601 standard. Almost all systems support functionality for retrieving stored date/time values under this standard. Once the value is received, the destination system can recognize the pattern without misinterpreting the value. The format is YYYY-MM-DDTHH:mm:ss(.mmm), for example 2015-05-03T10:50:00.000.



Here is simple code that shows how to convert a datetime value to ISO 8601 and how to accept an ISO 8601-formatted value.

-- Converting to ISO 8601
SELECT CONVERT(char(23), GETDATE(), 126);

-- Retrieving ISO 8601 formatted value
DECLARE @StringValue varchar(30) = '2015-05-03T10:50:00.000'

-- This returns "May" recognizing middle part as the month
SELECT DateName(m, Convert(datetime, @StringValue));

I once discussed this in one of the sessions hosted at SS SLUG. Here is the post I made on it: http://dinesql.blogspot.com/2013/12/best-way-to-pass-datetime-values-to-sql.html.