Thursday, December 31, 2015

Query Shortcuts in Management Studio

Management Studio, arguably the most user-friendly database tool, was released with SQL Server 2005 and has become the right hand of both database administrators and developers. SQL Server Management Studio is an integrated environment for performing all types of operations and management related to SQL Server, and it is a rich tool equipped with many functionalities and a capable editor. Although we have been using it for a long time, Query Shortcuts are still unknown to many. Here is a post on them; you will surely find them useful.

Query Shortcuts allow you to configure key combinations for executing your common commands. By default, three shortcuts are already configured and can be used immediately. Viewing the configured shortcuts and adding new ones can be done through the Options dialog, which can be opened from the Tools menu.


As you can see, three shortcuts are already configured. If you press Ctrl+1 in the Query Editor, the sp_who procedure is executed automatically and you will see its result.

In addition to those, we can configure our own shortcuts. Here are some examples;



"SELECT * FROM " is configured with Ctrl+5 and "EXEC" is configured with Ctrl+6. This allows us to highlight either a table name or a stored procedure name and have it queried or executed using the assigned shortcut. For example, if Person.Person is selected and Ctrl+5 is pressed, the statement SELECT * FROM Person.Person will be executed.


Make sure you do not assign shortcuts for all operations. Destructive operations such as DELETE should not be configured, because once a shortcut exists, records can be deleted accidentally.
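On the safe side, read-only commands make good shortcut candidates. A few possibilities, sketched below; the key assignments are only suggestions, not defaults:

```sql
-- Possible safe shortcut assignments (configured via Tools -> Options):
-- Ctrl+7 : sp_helptext          -- highlight a procedure name to see its definition
-- Ctrl+8 : sp_spaceused         -- highlight a table name to see its size
-- Ctrl+9 : SELECT COUNT(*) FROM -- highlight a table name to count its rows
```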

SQL Server CPU usage: Recording and monitoring it along with many more server statistics

There are many ways of collecting SQL Server statistics such as CPU usage, IO performed, and reads and writes. One facility SQL Server provides is an extended stored procedure, sp_monitor. It can be used for seeing statistics related to resource usage, as SQL Server keeps this information through system statistical functions. The procedure shows values accumulated since SQL Server was last restarted, as well as values since sp_monitor was last run.

For more details: https://msdn.microsoft.com/en-us/library/ms188912.aspx
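Since sp_monitor is built on the system statistical functions, the same counters can also be queried directly, which is handy when only one or two values are needed:

```sql
-- Reading the raw counters directly (values accumulate since the
-- service started; @@CPU_BUSY and @@IO_BUSY are reported in ticks).
SELECT @@CPU_BUSY AS CPUBusy
     , @@IO_BUSY AS IOBusy
     , @@PACK_RECEIVED AS PacketsReceived
     , @@PACK_SENT AS PacketsSent
     , @@PACKET_ERRORS AS PacketErrors
     , @@TOTAL_READ AS TotalReads
     , @@TOTAL_WRITE AS TotalWrites
     , @@TOTAL_ERRORS AS TotalErrors
     , @@CONNECTIONS AS Connections;
```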

Here is the result of the procedure;


This is mainly intended to give DBAs a quick look, which makes collecting the values regularly for future analysis a bit difficult. The format is not user-friendly and the procedure returns multiple result sets, hence calling it from T-SQL and saving the output into a table is not straightforward. However, it can be called and its result saved using simple C#.NET code. Let's create a table for collecting the values first.

-- create a database for holding data
CREATE DATABASE ServerMain;
GO

-- create a table for holding data
USE ServerMain;
GO

CREATE TABLE dbo.ServerStatistics
(
 DateRun datetime primary key
 , CPU_Busy int not null
 , IO_Busy int not null
 , Packet_Received int not null
 , Packet_Sent int not null
 , Packet_Errors int not null
 , Total_Reads int not null
 , Total_Writes int not null
 , Total_Errors int not null
 , Connections int
);
GO
-- Procedure for inserting data
CREATE PROCEDURE dbo.AddServerStatistics @DateRun datetime, @CPU_Busy int, @IO_Busy int
     , @Packet_Received int, @Packet_Sent int, @Packet_Errors int, @Total_Reads int
     , @Total_Writes int, @Total_Errors int, @Connections int
AS
BEGIN

 INSERT INTO dbo.ServerStatistics
  (DateRun, CPU_Busy, IO_Busy, Packet_Received, Packet_Sent
  , Packet_Errors, Total_Reads, Total_Writes, Total_Errors, Connections)
 VALUES
  (@DateRun, @CPU_Busy, @IO_Busy, @Packet_Received, @Packet_Sent
  , @Packet_Errors, @Total_Reads, @Total_Writes, @Total_Errors, @Connections)
END
GO
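Before wiring the procedure to any client code, it can be smoke-tested with made-up values; the numbers below are fabricated for illustration only:

```sql
-- Quick test with fabricated sample values
EXEC dbo.AddServerStatistics
   @DateRun = '2015-12-31T10:00:00', @CPU_Busy = 120, @IO_Busy = 45
 , @Packet_Received = 1000, @Packet_Sent = 900, @Packet_Errors = 0
 , @Total_Reads = 5000, @Total_Writes = 2000, @Total_Errors = 0
 , @Connections = 12;

SELECT * FROM dbo.ServerStatistics;
```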

And here is the C#.NET code for calling this procedure and saving the result.

// Requires: using System; using System.Data; using System.Data.SqlClient;
DateTime DateRun = DateTime.Now;
int CPU_Busy = 0;
int IO_Busy = 0;
int Packet_Received = 0;
int Packet_Sent = 0;
int Packet_Errors = 0;
int Total_Reads = 0;
int Total_Writes = 0;
int Total_Errors = 0;
int Connections = 0;

// sp_monitor formats most columns as "total(sinceLastRun)";
// this extracts the number inside the parentheses.
Func<object, int> ParseValue = column =>
{
    string value = column.ToString();
    int start = value.IndexOf('(') + 1;
    return Convert.ToInt32(value.Substring(start, value.IndexOf(')') - start));
};

using (SqlConnection connection = new SqlConnection(@"Data Source=(local)\SQL2014;Database=ServerMain;Integrated Security=true;"))
{
    connection.Open();

    // Call sp_monitor and walk through its four result sets.
    using (SqlCommand command = new SqlCommand("sp_monitor", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        using (SqlDataReader reader = command.ExecuteReader())
        {
            // 1st result set: last_run, current_run, seconds
            while (reader.Read())
                DateRun = Convert.ToDateTime(reader[1]);

            // 2nd result set: cpu_busy, io_busy, idle
            reader.NextResult();
            while (reader.Read())
            {
                CPU_Busy = ParseValue(reader[0]);
                IO_Busy = ParseValue(reader[1]);
            }

            // 3rd result set: packets_received, packets_sent, packet_errors
            reader.NextResult();
            while (reader.Read())
            {
                Packet_Received = ParseValue(reader[0]);
                Packet_Sent = ParseValue(reader[1]);
                Packet_Errors = ParseValue(reader[2]);
            }

            // 4th result set: total_read, total_write, total_errors, connections
            reader.NextResult();
            while (reader.Read())
            {
                Total_Reads = ParseValue(reader[0]);
                Total_Writes = ParseValue(reader[1]);
                Total_Errors = ParseValue(reader[2]);
                Connections = ParseValue(reader[3]);
            }
        }
    }

    // Save the collected values using the procedure created above.
    using (SqlCommand command = new SqlCommand("AddServerStatistics", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        command.Parameters.Add("DateRun", SqlDbType.DateTime).Value = DateRun;
        command.Parameters.Add("CPU_Busy", SqlDbType.Int).Value = CPU_Busy;
        command.Parameters.Add("IO_Busy", SqlDbType.Int).Value = IO_Busy;
        command.Parameters.Add("Packet_Received", SqlDbType.Int).Value = Packet_Received;
        command.Parameters.Add("Packet_Sent", SqlDbType.Int).Value = Packet_Sent;
        command.Parameters.Add("Packet_Errors", SqlDbType.Int).Value = Packet_Errors;
        command.Parameters.Add("Total_Reads", SqlDbType.Int).Value = Total_Reads;
        command.Parameters.Add("Total_Writes", SqlDbType.Int).Value = Total_Writes;
        command.Parameters.Add("Total_Errors", SqlDbType.Int).Value = Total_Errors;
        command.Parameters.Add("Connections", SqlDbType.Int).Value = Connections;

        command.ExecuteNonQuery();
    }
}

If you run this code continuously, or schedule it to run every hour, you will have enough data to see how busy your SQL Server is and whether it needs more resources. Here is a simple chart created with SSRS, based on the saved data.
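The chart itself cannot be reproduced here, but a query along these lines (column names as defined in dbo.ServerStatistics above) could feed such an SSRS dataset; treat it as a sketch:

```sql
-- Daily averages for charting CPU and IO busy values
SELECT CONVERT(date, DateRun) AS RunDate
     , AVG(CPU_Busy) AS AvgCPUBusy
     , AVG(IO_Busy) AS AvgIOBusy
FROM dbo.ServerStatistics
GROUP BY CONVERT(date, DateRun)
ORDER BY RunDate;
```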


Let's categorize users involved in Reporting Scenarios

We either author or consume reports, which makes us a part of some reporting scenario. Regardless of the activities they perform, users fall into common roles based on their involvement. Here are some of the roles commonly noticed and identified; this will be useful especially for documentation and proposals.


  • Database/Application Developer
    A professional software engineer who works mainly on databases and building applications. This person creates reports as a part of application development and works mostly on formal/tabular reports.

  • BI Developer
    A professional BI developer creates specialized reports such as analytical reports, interactive reports and dashboards, connecting to data marts and data warehouses. A BI developer works more on data warehousing and BI than on general OLTP database solutions.

  • IT Professional
    This is a technical role that mainly involves infrastructure such as servers and networks. It includes job types such as database administrator and IT administrator, who rarely work with reports. However, they might get involved in creating reports for their own work or for requirements received from other users, specifically business users.

  • Power User
    A power user is not considered a technical professional but a business user with thorough knowledge of the domain and the business. This role is played by professionals such as business analysts or accountants who do not have much development knowledge but use specialized tools to create their own reports for the analysis they want to perform.
  • Information Worker
    Just like the Power User, the Information Worker is a business user, but one who is involved only with consumption. This person uses the data available in reports for various analyses, mainly for decision-making, and holds no knowledge of their technical implementation.

There can be many other roles and classifications in terms of report development and consumption. Any missing common ones?

Wednesday, December 30, 2015

How to temporarily limit concurrent connections to the instance

This is based on a question I was asked today about concurrent connections: when running some procedures, how can SQL Server be instructed to allow only n connections, making sure that no one else can access the server while they are running?

Simply put, if you want to limit your instance to a certain number of connections, it can be configured with a property in the Connections section;


By default, it is 0, which means unlimited (bounded only by the number of connections allowed by the version and edition). If you want to make sure that only 3 concurrent connections can be made, set this to 3 and restart the service. The same can be achieved using the sp_configure stored procedure too.

EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE ;
GO
EXEC sp_configure 'user connections', 3 ;
GO
RECONFIGURE;
GO


Note that even though we execute RECONFIGURE, a restart is needed for the configured value to become the running value.
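The configured and running values can be compared through sys.configurations, which makes it easy to see whether the restart has taken effect yet:

```sql
-- value = configured, value_in_use = currently running value
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'user connections';
```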

Once this is set, when you try to make the 4th connection, you will get this message;


Tuesday, December 29, 2015

Split values in a string into variables using the xp_sscanf extended stored procedure

While writing some Hive queries to extract words from a lengthy string, I came across a useful extended stored procedure that allows us to split the values in a string, using space as the separator. It has some limitations, but it can be useful in some scenarios.

Assume that you have a string value related to routed calls, like;

'947773333666 91333555444 1400 WE345TT094 CUS5629 CUS2792'

And you want to read it as;

CallerId, ReceiverId, Duration, Code, DirectInCustomer, DirectOutCustomer.

Then this extended stored procedure can be used for splitting it and storing the parts in variables. Have a look at the code;

DECLARE @CallerId varchar(20)
DECLARE @ReceiverId varchar(20)
DECLARE @Duration varchar(20)
DECLARE @Code varchar(20)
DECLARE @DirectInCustomer varchar(20)
DECLARE @DirectOutCustomer varchar(20)

DECLARE @Record varchar(1000) = '947773333666 91333555444 1400 WE345TT094 CUS5629 CUS2792'

exec master..xp_sscanf 
   @Record
   , '%s %s %s %s %s %s'
   , @CallerId OUTPUT
   , @ReceiverId OUTPUT
   , @Duration OUTPUT
   , @Code OUTPUT
   , @DirectInCustomer OUTPUT
   , @DirectOutCustomer OUTPUT

SELECT @CallerId As CallerId, @ReceiverId As ReceiverId, @Duration As Duration
 , @Code As Code, @DirectInCustomer As DirectInCustomer, @DirectOutCustomer As DirectOutCustomer


This can be really useful when you have a large recordset with values like the above that need to be split.
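Since xp_sscanf is a procedure, it cannot be applied in a set-based way; one way to process a whole table is a simple loop. This is only a sketch, and the tables dbo.CallRecords and dbo.CallRecordParts are hypothetical:

```sql
-- Hypothetical: split every Record value in dbo.CallRecords
DECLARE @Record varchar(1000), @CallerId varchar(20), @ReceiverId varchar(20)
 , @Duration varchar(20), @Code varchar(20)
 , @DirectInCustomer varchar(20), @DirectOutCustomer varchar(20);

DECLARE cur CURSOR FOR SELECT Record FROM dbo.CallRecords;
OPEN cur;
FETCH NEXT FROM cur INTO @Record;
WHILE @@FETCH_STATUS = 0
BEGIN
 EXEC master..xp_sscanf @Record, '%s %s %s %s %s %s'
  , @CallerId OUTPUT, @ReceiverId OUTPUT, @Duration OUTPUT
  , @Code OUTPUT, @DirectInCustomer OUTPUT, @DirectOutCustomer OUTPUT;

 INSERT INTO dbo.CallRecordParts
  (CallerId, ReceiverId, Duration, Code, DirectInCustomer, DirectOutCustomer)
 VALUES
  (@CallerId, @ReceiverId, @Duration, @Code, @DirectInCustomer, @DirectOutCustomer);

 FETCH NEXT FROM cur INTO @Record;
END
CLOSE cur;
DEALLOCATE cur;
```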


Monday, December 28, 2015

GUID or Uniqueidentifier as the key - II - Issues with it

My first post on GUIDs discussed the main issue we face when using a GUID as the key, specifically as the clustered key. Here is another concern we should keep in mind when maintaining a table with a GUID as the key. This does not need to be treated as a critical issue if the table contains a low number of records, but if it contains millions of records, it is something we definitely need to consider.

Let's create two tables just as in the first post. The code below creates two tables named dbo.Sales_WithGUID and dbo.Sales_WithoutGUID and loads the same data set into both.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithGUID;
GO

-- Creating the table
CREATE TABLE dbo.Sales_WithGUID
(
 SalesKey uniqueidentifier PRIMARY KEY DEFAULT(NewID()),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

INSERT INTO dbo.Sales_WithGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM AdventureWorksDW2014.dbo.FactInternetSales;


IF OBJECT_ID('dbo.Sales_WithoutGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithoutGUID;
GO

-- Create the second table
CREATE TABLE dbo.Sales_WithoutGUID
(
 SalesKey int identity(1,1),
 CompanyCode char(2) Default ('SL'),
 constraint pk_Sales_WithoutGUID primary key (SalesKey, CompanyCode),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

INSERT INTO dbo.Sales_WithoutGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM AdventureWorksDW2014.dbo.FactInternetSales;

Let's check the space used by both tables.
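The screenshot is not reproduced here; the figures quoted below were presumably obtained via sp_spaceused, which can be run against both tables:

```sql
EXEC sp_spaceused 'dbo.Sales_WithGUID';
EXEC sp_spaceused 'dbo.Sales_WithoutGUID';
```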


As you can see, the first table, which is keyed on a GUID, consumes 10.7MB for data and 80KB for indexes. The second table, which uses int+char(2) as the key, consumes 10.0MB for data and 56KB for indexes. Now, let's add some non-clustered indexes to both tables.

CREATE INDEX IX_Sales_WithGUID_ShipDateKey ON dbo.Sales_WithGUID (ShipDateKey);
CREATE INDEX IX_Sales_WithGUID_CurrencyKey ON dbo.Sales_WithGUID (CurrencyKey);
CREATE INDEX IX_Sales_WithGUID_CustomerKey ON dbo.Sales_WithGUID (CustomerKey);
CREATE INDEX IX_Sales_WithGUID_DueDateKey ON dbo.Sales_WithGUID (DueDateKey);
CREATE INDEX IX_Sales_WithGUID_OrderDateKey ON dbo.Sales_WithGUID (OrderDateKey);
CREATE INDEX IX_Sales_WithGUID_ProductKey ON dbo.Sales_WithGUID (ProductKey);
CREATE INDEX IX_Sales_WithGUID_PromotionKey ON dbo.Sales_WithGUID (PromotionKey);

CREATE INDEX IX_Sales_WithoutGUID_ShipDateKey ON dbo.Sales_WithoutGUID (ShipDateKey);
CREATE INDEX IX_Sales_WithoutGUID_CurrencyKey ON dbo.Sales_WithoutGUID (CurrencyKey);
CREATE INDEX IX_Sales_WithoutGUID_CustomerKey ON dbo.Sales_WithoutGUID (CustomerKey);
CREATE INDEX IX_Sales_WithoutGUID_DueDateKey ON dbo.Sales_WithoutGUID (DueDateKey);
CREATE INDEX IX_Sales_WithoutGUID_OrderDateKey ON dbo.Sales_WithoutGUID (OrderDateKey);
CREATE INDEX IX_Sales_WithoutGUID_ProductKey ON dbo.Sales_WithoutGUID (ProductKey);
CREATE INDEX IX_Sales_WithoutGUID_PromotionKey ON dbo.Sales_WithoutGUID (PromotionKey);

Here is the result now.


Focus on the space used for indexes: the first table uses 11MB while the second uses only 7MB. Note that this is for only about 60,000 records; if the table had more records, the gap would be significantly larger.

How can this happen when we have the same number of records and the same types of indexes? It is because every non-clustered key value is stored together with the clustered key. In the first table, since the clustered key is 16 bytes, each non-clustered index entry is 16+4 bytes; in the second table it is 6+4 bytes. This is the reason for the higher space usage of the non-clustered indexes.
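A back-of-the-envelope estimate (ignoring per-row overhead) lines up reasonably with the observed sizes:

```sql
-- Per-row cost of one non-clustered index entry (index key + clustered key):
--   Sales_WithGUID    : 4 (int key column) + 16 (uniqueidentifier) = 20 bytes
--   Sales_WithoutGUID : 4 (int key column) + 6 (int + char(2))     = 10 bytes
-- Across ~60,000 rows and 7 indexes, the 10-byte difference alone is
-- roughly 10 * 60000 * 7 = ~4MB, in line with the 11MB vs 7MB observed.
```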

Here is code that shows the number of data pages used by each index. A larger number of pages increases IO operations and resource usage.

SELECT OBJECT_NAME (i.object_id) AS TableName 
 , i.name AS IndexName
 , ps.index_id AS IndexID 
 , ps.alloc_unit_type_desc AS DataStructure
 , ps.page_count AS Pages
 , ps.record_count AS Rows
FROM sys.indexes AS i
 CROSS APPLY sys.dm_db_index_physical_stats 
  (DB_ID ()
  , i.object_id
  , NULL
  , NULL
  , N'DETAILED') AS ps
WHERE i.object_id = ps.object_id
  AND i.index_id = ps.index_id
  AND i.object_id 
   IN (OBJECT_ID (N'Sales_WithGUID')
    , OBJECT_ID (N'Sales_WithoutGUID'))
   AND ps.index_level = 0 -- leaf level only
ORDER BY TableName, IndexID;
GO


Now you see the issues we may face when using GUIDs; use them knowing these issues and manage them properly if they are really required.

Sunday, December 27, 2015

SQL Server Brain Basher of the Week #037 - TSQL - Deleting duplicates

Deleting records is a very common activity performed against databases, and everyone is familiar with the DELETE statement. Here is a question based on it.

What is the easiest way of deleting duplicate records from a table, if the duplicates can be identified by a column?

There can be multiple ways of doing it. Before looking at the solutions, let's create a table with duplicates.

IF OBJECT_ID('dbo.ProductCode') IS NOT NULL
 DROP TABLE dbo.ProductCode;
GO

CREATE TABLE dbo.ProductCode
(
 ProductCode char(5)
 , ProductType char(5)
);
GO

INSERT INTO dbo.ProductCode
VALUES
 ('PR001', 'T0001')
 , ('PR001', 'T0001')
 , ('PR002', 'T0002')
 , ('PR003', 'T0003')
 , ('PR001', 'T0001')
 , ('PR004', 'T0004')
 , ('PR003', 'T0003')



Here are two ways of deleting the duplicates. Both work fine; however, the first method might give better performance over a large dataset.

-- Here is the way of deleting duplicates
WITH cte AS
(
 SELECT *, 
  ROW_NUMBER() OVER (PARTITION BY ProductCode ORDER BY ProductCode) AS RowNumber
 FROM dbo.ProductCode)
DELETE FROM cte
WHERE RowNumber > 1 ;

-- Checking records
SELECT * FROM dbo.ProductCode
ORDER BY ProductCode;


-- Here is another way of deleting them
SET ROWCOUNT 1
WHILE 1=1
BEGIN

 DELETE 
 FROM dbo.ProductCode
 WHERE ProductCode IN (SELECT ProductCode
       FROM dbo.ProductCode
       GROUP BY ProductCode
       HAVING COUNT(*) > 1)
 IF @@ROWCOUNT = 0
  BREAK;
END
SET ROWCOUNT 0

-- Checking records
SELECT * FROM dbo.ProductCode
ORDER BY ProductCode;
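A third option, sketched here, is rebuilding the table from its distinct rows. This is simple but rewrites the whole table, so it suits small tables or one-off cleanups:

```sql
-- Rebuild the table from its distinct rows
SELECT DISTINCT ProductCode, ProductType
INTO #DistinctProducts
FROM dbo.ProductCode;

TRUNCATE TABLE dbo.ProductCode;

INSERT INTO dbo.ProductCode (ProductCode, ProductType)
SELECT ProductCode, ProductType
FROM #DistinctProducts;

DROP TABLE #DistinctProducts;
```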


Saturday, December 26, 2015

GUID or Uniqueidentifier as the key - I - Issues with it

While analyzing a database belonging to one of the companies, I noticed that the keys of most tables are based on GUIDs (uniqueidentifier), which drastically hinders performance unless managed properly. Unfortunately, in my consultancy experience, 90% of databases that use GUIDs as table keys are not properly managed, and many are unaware of the damage done, though they experience poor performance on data manipulation and retrieval.

The most common reasoning is that a GUID is needed as the key because uniqueness is required among records, specifically when records are combined from multiple companies or regions. But in many cases this can be handled without a uniqueidentifier, using a composite key formed from an integer value and a code that represents the company or region. Understanding its cost and how it slows down your queries will surely guide you to the right decision, hence this post.

Let's try to understand the most common issue with a GUID as the key. The following code creates a table called Sales_WithGUID and loads data from AdventureWorks. The key of the table, SalesKey, is a GUID. Note that I do not load the records as a batch but one by one, simulating a real operation.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithGUID;
GO

-- Creating the table
CREATE TABLE dbo.Sales_WithGUID
(
 SalesKey uniqueidentifier PRIMARY KEY DEFAULT(NewID()),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

-- Loading records one by one
DECLARE cur CURSOR 
FOR
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  FROM AdventureWorksDW2014.dbo.FactInternetSales;

DECLARE @ProductKey int 
  , @OrderDateKey int
  , @DueDateKey int
  , @ShipDateKey int 
  , @CustomerKey int 
  , @PromotionKey int 
  , @CurrencyKey int 
  , @SalesTerritoryKey int 
  , @SalesOrderNumber nvarchar(20) 
  , @SalesOrderLineNumber tinyint 
  , @RevisionNumber tinyint 
  , @OrderQuantity smallint 
  , @UnitPrice money 
  , @ExtendedAmount money 
  , @UnitPriceDiscountPct float 
  , @DiscountAmount float 
  , @ProductStandardCost money 
  , @TotalProductCost money 
  , @SalesAmount money 
  , @TaxAmt money 
  , @Freight money 
  , @CarrierTrackingNumber nvarchar(25) 
  , @CustomerPONumber nvarchar(25)  
  , @OrderDate datetime  
  , @DueDate datetime  
  , @ShipDate datetime

OPEN cur

FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_WithGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
 VALUES
  (@ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
    , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
    , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
    , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
    , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate)

 FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate
END

-- Cleaning
CLOSE cur
DEALLOCATE cur
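Once the load completes, the fragmentation caused by random GUIDs can be inspected; with NEWID values, avg_fragmentation_in_percent is typically very high due to page splits:

```sql
-- Check fragmentation of the clustered index after the one-by-one load
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats
 (DB_ID(), OBJECT_ID('dbo.Sales_WithGUID'), NULL, NULL, 'LIMITED');
```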

Let's create another table called Sales_WithoutGUID and load the same data. Note that its primary key is a combination of an int and a char column. This is what I explained above and suggested to the company: using a two-column composite primary key rather than a uniqueidentifier. With this composite key, the company can be identified and records can still be uniquely identified.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithoutGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithoutGUID;
GO

-- Create the second table
CREATE TABLE dbo.Sales_WithoutGUID
(
 SalesKey int identity(1,1),
 CompanyCode char(2) Default ('SL'),
 constraint pk_Sales_WithoutGUID primary key (SalesKey, CompanyCode),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

-- Loading
DECLARE cur CURSOR 
FOR
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  FROM AdventureWorksDW2014.dbo.FactInternetSales;

DECLARE @ProductKey int 
  , @OrderDateKey int
  , @DueDateKey int
  , @ShipDateKey int 
  , @CustomerKey int 
  , @PromotionKey int 
  , @CurrencyKey int 
  , @SalesTerritoryKey int 
  , @SalesOrderNumber nvarchar(20) 
  , @SalesOrderLineNumber tinyint 
  , @RevisionNumber tinyint 
  , @OrderQuantity smallint 
  , @UnitPrice money 
  , @ExtendedAmount money 
  , @UnitPriceDiscountPct float 
  , @DiscountAmount float 
  , @ProductStandardCost money 
  , @TotalProductCost money 
  , @SalesAmount money 
  , @TaxAmt money 
  , @Freight money 
  , @CarrierTrackingNumber nvarchar(25) 
  , @CustomerPONumber nvarchar(25)  
  , @OrderDate datetime  
  , @DueDate datetime  
  , @ShipDate datetime

OPEN cur

FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_WithoutGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
 VALUES
  (@ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
    , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
    , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
    , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
    , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate)

 FETCH NEXT FROM cur INTO @ProductKey, @OrderDateKey, @DueDateKey, @ShipDateKey, @CustomerKey, @PromotionKey
      , @CurrencyKey, @SalesTerritoryKey, @SalesOrderNumber, @SalesOrderLineNumber, @RevisionNumber
      , @OrderQuantity, @UnitPrice, @ExtendedAmount, @UnitPriceDiscountPct, @DiscountAmount
      , @ProductStandardCost, @TotalProductCost, @SalesAmount, @TaxAmt, @Freight
      , @CarrierTrackingNumber, @CustomerPONumber, @OrderDate, @DueDate, @ShipDate
END

-- Cleaning
CLOSE cur
DEALLOCATE cur
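The cursor loads rows one by one to mimic OLTP-style single-row inserts. If that behavior is not important for your test, a set-based load (a sketch, assuming the same source table) is simpler and much faster:

```sql
-- Set-based alternative to the cursor load
INSERT INTO dbo.Sales_WithoutGUID
  (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    , CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    , OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    , ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    , CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    , CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    , OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    , ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    , CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM AdventureWorksDW2014.dbo.FactInternetSales;
```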

Now let's analyze and see. The below code has two SELECT statements that check the fragmentation of both tables. As you see, the first table is externally fragmented while the second table is not. This is the biggest and most common issue with GUIDs. Since GUIDs are not generated sequentially, SQL Server has to split data pages on insert in order to maintain the clustered key order, moving records to new pages and causing external fragmentation (read more on fragmentation at: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html). Because of this, reads have to access a larger number of data pages, slowing down queries and consuming more resources.

The other issue is the space required. As you see, the second table requires only 10MB for maintaining the records while the first table requires 15MB.

There is another considerable impact on non-clustered indexes too; I will make another post on it.

As you now see the disadvantages of GUIDs, try to minimize their usage if possible. There is a workaround if the order of the key is the only issue: the NEWSEQUENTIALID function can be used instead of NEWID for generating GUIDs, but the second issue discussed, the extra space required, still remains.

-- Check fragmentation of the clustered indexes of both tables
SELECT * FROM sys.dm_db_index_physical_stats 
 (DB_ID(), OBJECT_ID(N'dbo.Sales_WithGUID', N'U'), 1, NULL, 'DETAILED');
SELECT * FROM sys.dm_db_index_physical_stats 
 (DB_ID(), OBJECT_ID(N'dbo.Sales_WithoutGUID', N'U'), 1, NULL, 'DETAILED');

-- Compare the space used by the two tables
EXEC sp_spaceused N'Sales_WithGUID';
EXEC sp_spaceused N'Sales_WithoutGUID';
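Note that NEWSEQUENTIALID can only be used in a DEFAULT constraint on a uniqueidentifier column, not directly in an INSERT or SELECT. Here is a minimal sketch (the table and column names are for illustration only):

```sql
CREATE TABLE dbo.Sales_WithSequentialGUID
(
 SalesKey uniqueidentifier NOT NULL
  CONSTRAINT df_Sales_WithSequentialGUID DEFAULT (NEWSEQUENTIALID()),
 CONSTRAINT pk_Sales_WithSequentialGUID PRIMARY KEY (SalesKey),
 SalesOrderNumber nvarchar(20) NOT NULL
);
GO

-- Each insert gets a GUID greater than any previously generated on this machine,
-- avoiding page splits; the 16-byte storage cost still remains.
INSERT INTO dbo.Sales_WithSequentialGUID (SalesOrderNumber) VALUES (N'SO43697');
```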


Friday, December 25, 2015

What is Predicate Logic and how is it applied to SQL Server?

We have been working with databases and SQL for a long time, but some important theories and concepts are often ignored, assuming they are not crucial and that the implementation can be completed without them. Understanding and applying them makes our solutions more robust and less error-prone, because operations flow the correct way, producing the right result. Did you know that terms like Set Theory and Predicate Logic are not well known even among engineers, including database engineers?

Let's try to understand Predicate Logic in a simple form. It is a mathematical basis for the relational database model and, as per the theory, a predicate is an expression or property that is either true or false. It is sometimes considered a Boolean expression too.

Although the theory describes two possible outcomes (true or false), unknown values are not uncommon in the database world, hence they need to be considered. This extends the outcomes of Predicate Logic to three: true, false or unknown.

How can we apply this to a database? Definitely for comparison: simply as an expression that evaluates to true, false or unknown. Remember, unknown maps to NULL in the database model.
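The unknown outcome is easy to demonstrate (a small sketch using AdventureWorks): a comparison involving NULL evaluates to unknown, and WHERE silently filters such rows out.

```sql
USE AdventureWorks2014;
GO

-- NULL = NULL evaluates to unknown, not true
SELECT CASE WHEN NULL = NULL THEN 'true' ELSE 'not true' END AS Result;

-- Rows where the predicate evaluates to unknown are filtered out:
-- this does NOT return all rows, because rows with a NULL
-- PurchaseOrderNumber evaluate to unknown
SELECT COUNT(*) AS CntCompared
FROM Sales.SalesOrderHeader
WHERE PurchaseOrderNumber = PurchaseOrderNumber;
```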

Predicates do not stand alone when implementing with SQL Server databases. A predicate is always part of a query, and the following are the possible roles it can play;
  • Filtering data using WHERE or HAVING clauses
  • Handling the flow with conditional logic using IF or CASE expressions
  • Joining tables using ON filter
  • Enforcing entity, domain and referential integrity (Example: CHECK, FOREIGN KEY constraints)
Here is an example;

USE AdventureWorks2014;
GO

SELECT *
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2013
 AND (SubTotal > CASE MONTH(OrderDate)
      WHEN 2 THEN 2500
      ELSE 3000
      END);

Thursday, December 24, 2015

Firewall Settings for Azure SQL Server is available with Management Studio now

If you are working with Azure SQL Databases, you are well aware that Azure SQL Server needs to know your IP address before allowing you to access an Azure SQL database. Although it is considered a one-time setting, it is not if the IP address of your machine changes frequently or is dynamically assigned by a service like DHCP. In a situation like that, we always have to open the portal and add the IP address via the Firewall Settings of the Azure SQL Server, which is time-consuming and inconvenient.

Good news! Now this setting is available within Management Studio: if it detects that the current IP address is no longer valid for accessing the server, it allows you to log in to your Azure subscription and add the current IP address to the Firewall Settings. This is really a good facility and saves time. Remember, it is available only with the Management Studio December 2015 Preview.




Wednesday, December 23, 2015

"Failed to retrieve data for this request" "unknown property". (Microsoft.SqlServer.Management.Sdk.Sfc) : Microsoft SQL Server Management Studio

Are you getting this error when trying to retrieve stored procedures in an Azure V12 SQL database through the Management Studio November 2015 Preview?

Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)

------------------------------
ADDITIONAL INFORMATION:

unknown property IsNativelyCompiled (Microsoft.SqlServer.Management.Sdk.Sfc)

I think this is only related to the SSMS November 2015 Preview because there is no issue with SQL Server 2012 Management Studio. However, if you want to use the latest, install the December 2015 Preview; it sorts out the issue.

With SSMS November Preview



With SQL Server 2012


With SSMS December 2015 Preview


Tuesday, December 22, 2015

Are you aware that WHERE ""="" is always true?

No surprise, we still see many dynamic SQL codes that are wrongly implemented, allowing smart hackers to get into unauthorized databases and manipulate data as they want. There are many SQL injections that are famous and widely discussed, but some are tiny and unnoticed, hence ignored. Here is one like that.

WHERE ""="" is always true. This is something you have to remember. If you have not parameterized your code or used a relevant technique for executing your dynamically constructed T-SQL code, you open your database to unauthorized people. Have a look at the stored procedure below.

IF OBJECT_ID('dbo.GetMyOrder', 'P') IS NOT NULL
 DROP PROC dbo.GetMyOrder;
GO

CREATE PROCEDURE dbo.GetMyOrder @PurchaseOrderNumber nvarchar(50)
AS
BEGIN

 DECLARE @Sql nvarchar(1000)
 
 SET @Sql = N'SELECT * FROM Sales.SalesOrderHeader'
 SET @Sql += N'  WHERE PurchaseOrderNumber = ''' + @PurchaseOrderNumber + ''''

 PRINT @Sql
 EXEC (@Sql);
END
GO

This stored procedure returns details of the order based on the supplied purchase order number. If a user executes it (directly using SSMS or via an interface), he gets the desired result;


Now what if someone has no purchase order number but still wants to see orders? This is where the WHERE ""="" injection comes in. See the code below;
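The malicious call looks like the following (the input value is the injection payload; note the doubled single quotes needed to express it as a T-SQL string literal):

```sql
-- Passing the value  ' OR ''='' --  as the "purchase order number"
EXEC dbo.GetMyOrder N''' OR ''''='''' --';
```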


See what has happened. This forms a query like below;

SELECT * FROM Sales.SalesOrderHeader  WHERE PurchaseOrderNumber = '' OR ''='' --'

If someone passes a value like this via an interface, he might be able to see records he is not supposed to see. This can be controlled by parameterizing the query and executing the dynamic code using sp_executesql instead of EXEC.

IF OBJECT_ID('dbo.GetMyOrder', 'P') IS NOT NULL
 DROP PROC dbo.GetMyOrder;
GO

CREATE PROCEDURE dbo.GetMyOrder @PurchaseOrderNumber nvarchar(50)
AS
BEGIN

 DECLARE @Sql nvarchar(1000)
 
 SET @Sql = N'SELECT * FROM Sales.SalesOrderHeader'
 SET @Sql += N'  WHERE PurchaseOrderNumber = @PurchaseOrderNumber'

 PRINT @Sql
 EXEC sp_executesql @Sql, N'@PurchaseOrderNumber nvarchar(50)', @PurchaseOrderNumber;
END
GO
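To verify, replay the same malicious input against the parameterized version; the payload is now treated as a literal value, so no rows are returned (assuming no order actually has that purchase order number):

```sql
EXEC dbo.GetMyOrder N''' OR ''''='''' --';
-- The printed statement no longer contains the injected text:
-- SELECT * FROM Sales.SalesOrderHeader  WHERE PurchaseOrderNumber = @PurchaseOrderNumber
```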

Done, now it cannot be hacked easily.



Monday, December 21, 2015

SQL Server Management Studio - December 2015 Release

The SQL Server Management Studio - December 2015 release is available for download with a few enhancements over the November release and some bug fixes. Once installed, you should see the version upgraded from 12.0.2269.0 to 13.0.900.73. Here is the link for downloading;


Here are the changes (quoted from the download page);
  • Improvements to Execution plan (Showplan) comparison - to enable comparison of the current query execution plan with one saved in a file.
  • Improved IntelliSense support - for inline columnstore indexes in SSMS.
  • Bug fix in Extended Events session wizard - to enable selection of templates when connected to an Azure V12 server.
  • Improvements and bug fixes - to enable functionality to switch to the results tab after query execution, and to display un-truncated column headers when SSMS is set to display results in grid format.

Saturday, December 19, 2015

SQL Server Brain Basher of the Week #036 - Analysis Services Dimensions

Here is this week's Brain Basher. It is based on the Analysis Services Multidimensional model, which allows you to create models based on dimension and fact tables. The biggest advantage of the multidimensional model is that it creates aggregates (once enabled) based on the hierarchies defined and stores them, improving the performance of data retrieval from the model. The creation of aggregates is done during processing: it extracts data from sources (preferably from the relational data warehouse), creates aggregates if enabled, and then transfers and stores dimension and fact data (row level - based on the grain) including aggregates in the model. Now the question is;

Which dimension type does not calculate aggregates based on the hierarchies defined when the cube is processed?

Calculating aggregates is not applicable to all dimension types. If the dimension is a Parent-Child dimension, which is an unbalanced and changing dimension, the hierarchy structure is calculated at run time, not during processing. This also makes the dimension write-enabled, meaning that records in the dimension table can be updated and the updates can be seen in the model without processing the dimension.


Remember, even though this reduces processing time, it increases the time it takes to retrieve data. If you see this as a disadvantage, you need to model the dimension as a regular dimension.

Friday, December 18, 2015

SET STATISTICS IO ON: Can you trust its result when using scalar functions?

One key thing we look at when performance tuning T-SQL is the number of data pages read for retrieving data, either from memory or disk. We always trust SET STATISTICS IO ON and take decisions based on its result. But it does NOT always give an accurate value, and it might mislead you into taking a wrong action.

Let's check this and see. The following code creates a scalar function that returns the number of products related to an order, accepting the SalesOrderID. It reads the SalesOrderDetail table for getting the count.

USE AdventureWorks2014_New;
GO

CREATE FUNCTION dbo.GetNofProductsForOrder (@SalesOrderID int)
RETURNS smallint
AS
BEGIN

 DECLARE @NofProducts smallint
 SELECT @NofProducts = COUNT(*) 
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID = @SalesOrderID

 RETURN @NofProducts;
END;

Before using the function, let's see how many pages need to be read for getting data from both the SalesOrderHeader and SalesOrderDetail tables. As you see, I use SET STATISTICS IO ON for this.

SET STATISTICS IO ON

SELECT SalesOrderID, SalesOrderNumber, SubTotal
FROM Sales.SalesOrderHeader;
SELECT SalesOrderID, SalesOrderDetailID, LineTotal
FROM Sales.SalesOrderDetail;



Now let's use the function. This code shows some order details along with the number of products related to each order. Note the result of SET STATISTICS IO ON.

SET STATISTICS IO ON

SELECT SalesOrderID, SalesOrderNumber, SubTotal
 , dbo.GetNofProductsForOrder(SalesOrderID) As NofProducts
FROM Sales.SalesOrderHeader;


Since we access both tables, the number of pages read should definitely be more than 689, but it looks like this shows only the pages related to SalesOrderHeader. How can this happen? I do not have an answer now but will surely find the reason. Now the question is, how can we see the accurate value? Is it possible?

Yes, it is possible. It can be clearly seen with Profiler. If you analyze the SQL:BatchCompleted and SQL:StmtCompleted events, you can get the correct value. Here is the accurate value related to the query, captured with Profiler.


Just be aware of this; it can happen with statements like TOP too. In a way, it is always better to use Profiler for seeing the pages read, but SET STATISTICS IO can still be used for simple queries with no functions.
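One way to get accurate IO figures, and usually better performance, is to replace the scalar function with an equivalent correlated subquery (or an inline table-valued function); this is a sketch of the subquery form:

```sql
SET STATISTICS IO ON;

-- Same result as dbo.GetNofProductsForOrder, but the SalesOrderDetail reads
-- now appear in the STATISTICS IO output
SELECT h.SalesOrderID, h.SalesOrderNumber, h.SubTotal
 , (SELECT COUNT(*)
    FROM Sales.SalesOrderDetail d
    WHERE d.SalesOrderID = h.SalesOrderID) AS NofProducts
FROM Sales.SalesOrderHeader h;
```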

Thursday, December 17, 2015

What is Data Wrangling? Is it same as ETLing?

While I was reading a newsletter received from Hortonworks, I noticed an article related to data titled Data Wrangling. With my experience of dozens of business intelligence implementations, though I have worked with heterogeneous data sets, I have never used this term, even when discussing ETLing. I asked a few people; it is unknown to many, hence I thought to make this post, just discussing how I see Data Wrangling and how it differs from the well-known ETLing.

ETL, Extract, Transform and Load, is a common technique we use with Data Integration (DI), Data Warehousing and Business Intelligence. It works mostly on structured data with well-known data sources, and mostly with automated tools. It extracts data from various scattered systems, prepares the data to be readily consumable, and loads it to a destination, specifically a data warehouse. Data Wrangling does the same, with a few differences.

[image taken from: http://www.datawatch.com/what-is-data-wrangling/]

Data Wrangling works more on unorganized, unstructured, large data sets rather than a set of structured data. It refers to a manual process that converts data from one raw form to another format which is more readable and organized for analysis. As per the articles I read, the term was introduced by Trifacta, which offers a tool to help with this process. The person who performs this process is called a Data Wrangler.


Wednesday, December 16, 2015

Analysis Services Tabular Model Workspace Database Retention: Why it is important?

When you create a Tabular data model project, it creates a workspace database in an Analysis Services Tabular instance for maintaining all the data and metadata while the model is being developed. This database is an in-memory database and, with the default setting, it gets dropped from the instance when the project is closed. However, this default behavior can be changed by setting the workspace database retention property, adjusting the behavior of the workspace database. That is why it is important.

By default, workspace retention is set to Unload from memory. When creating a project, it creates a database on disk (in the default location of the SSAS instance) as well as an in-memory database in the SSAS Tabular instance. The name of the database is formed as ProjectName_UserName_GUID.

Here is the default setting.



Database created in the disk;



Database created in the SSAS Tabular instance;


With this setting, when the project is closed, the in-memory database in the instance is removed. But the database on disk remains, and it is used for re-creating the in-memory database when the project is opened again. The below image shows the instance once the project is closed. Note that the in-memory database has been dropped.



This setting frees up memory, allowing the system to use it for other applications, but it might slow down re-opening the project as the database has to be built again.

If needed, you can keep the database in memory without getting it dropped. This is useful if you are continuously working with the project. However, be cautious: it holds the memory consumed without releasing it.



If you rarely open the project (usually after completing development), it is better to set workspace retention to Delete workspace, as it deletes the database completely.

Local or remote workspace database?
If possible, developers should use a local instance for the workspace database because it provides the best performance. However, a remote server can also be used, with the following limitations;
  • Project cannot be created by importing from PowerPivot template.
  • No Backup to disk option with Data Backup property.
  • Slower performance because of the latency.





Monday, December 14, 2015

Analysis Services Tabular Mode: Counting non-numeric columns using COUNT function

When creating calculated columns or measures in Analysis Services Tabular mode, we use the DAX expression language. DAX offers a number of functions, and almost all general functions such as SUM, COUNT and AVERAGE exist too. However, some functions behave differently when compared with their counterparts in other tools. Here is an example;

Assume that you need to get the count of two columns in a loaded data table. Let's say we have loaded the Internet Sales table from AdventureWorksDW2014 and need to create Count of Promotion Key and Count of Sales Order Number (of course, there is no business logic behind these two counts; it is only for demo purposes), so we add two measures like below;

CountSalesOrderNumber:=COUNT([SalesOrderNumber])
CountPromotionKey:=COUNT([PromotionKey])


Once added, you will see that CountPromotionKey works fine but CountSalesOrderNumber does not. The reason is that the COUNT function in DAX works only with numeric and date values. For counting non-numeric values, COUNTA has to be used instead of COUNT.

CountSalesOrderNumber:=COUNTA([SalesOrderNumber])


Note that neither function counts blank values. If you want the count including blank values, use the COUNTROWS function. In addition to these, COUNTX and COUNTAX are available too; they are for getting a count based on an expression evaluated over a filtered table.

=COUNTAX(FILTER('Reseller',[Status]="Active"),[Phone])

For getting the count of only blank values, use COUNTBLANK.
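For completeness, here are small sketches of the remaining counting functions (the table and column names are illustrative):

```dax
// Counts all rows of the table, including rows with blanks in any column
CountAllRows:=COUNTROWS('Internet Sales')

// Counts only the blank values in a column
CountMissingPhones:=COUNTBLANK('Reseller'[Phone])
```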

Sunday, December 13, 2015

SELECT * and SELECT {columns}: Do they always read same number of data pages?

Everyone knows that SELECT * is not a best practice and we always try to avoid it. Some considerable issues related to it are;

  • Unnecessary columns will be transferred from server to client consuming resources.
  • Application written may face issues when dropping or introducing columns.
However, when the table is read by the SQL Server engine, in most cases there is no difference in terms of performance. The reason is that it has to read all data pages for getting the records regardless of the number of columns referred to in the statement. Have a look at the following query and its output.



As you see, both statements require the same number of data pages to be read for satisfying the requests. However, this is not always true; have a look at this;



Now it is different. The reason is the types of pages used with this table. This table has an xml column, and that column is maintained not in data pages but in text/image pages. Because of this, when reading columns excluding columns like xml, max types, text and image, the engine needs to read only data pages, reducing the number of IO operations. This introduces a third reason for not using SELECT *: do not use SELECT * when the table has columns that require page types other than data pages.
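You can reproduce this with any table holding an xml or max-type column; for example, Person.Person in AdventureWorks2014 has xml columns, so the two statements below should show different page counts (a sketch; your counts will vary):

```sql
SET STATISTICS IO ON;

-- Reads data pages plus the off-row pages holding the xml columns
SELECT * FROM Person.Person;

-- Reads only the data pages
SELECT BusinessEntityID, FirstName, LastName FROM Person.Person;
```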

Saturday, December 12, 2015

SQL Server Brain Basher of the Week #036 - SQL Server

SQL Server 2014 introduced In-Memory OLTP tables and many of you know the benefits you get with them. This feature allows you to create tables in memory, either durable or non-durable, and it is only for tables. Do we create in-memory databases? Can we create in-memory databases? Here is this week's question based on it;

Which SQL Server component creates in-memory databases that are visible to our implementation?
  • SQL Server Engine
  • SQL Server Analysis Services
  • SQL Server Integration Services
  • SQL Server Reporting Services

This question can be answered quickly if you have worked on Business Intelligence solutions. Yes, it is related to Analysis Services, but for a specific type only. Analysis Services allows us to create two types of models: Multidimensional and Tabular. When working with a Tabular data model project, as you may have experienced, it needs a workspace database which is created in an Analysis Services Tabular instance. This database is an in-memory database that keeps all data and metadata for the project while it is being developed. When the project is closed, it is removed from memory by default; however, this can be controlled by setting the Workspace Retention property.


Sunday, November 29, 2015

Analysis Services: How to stop showing records when unrelated dimensions are used with measures?

It is very common to see an Analysis Services database (or a cube) that holds multiple measure groups. If your cube holds multiple measure groups, then you have many dimensions, and they might not be linked to all measure groups. In that case, what happens if an unlinked (or unrelated) dimension is used when analyzing a measure in a particular measure group?
By default, if an unrelated dimension is used with a measure, the dimension is forced to use its top level for showing the value. For example, if Sales Amount from the Sales Summary measure group is analyzed with the unrelated Reseller dimension, the result will be as below;


This is controllable. We can stop showing records when a measure is being analyzed by an unrelated dimension. All we have to do is set a measure group property called IgnoreUnrelatedDimensions to false.


Once the cube is processed, you can test the same, and the result will be something like below;


Saturday, November 28, 2015

Analysis Services: I see a new measure group when a measure is created with Distinct Count

Have you noticed that a new measure group is automatically created when you create a new measure with the Distinct Count usage? Analysis Services creates this for one reason: improving performance for specific analyses, such as how many sales reps have sold a particular product.

You can create one with Cube Designer -> Cube Structure.


And once such a measure is created, a separate measure group is created;


And I am able to see how many employees have sold a particular product in each year;


Friday, November 27, 2015

Microsoft SQL Server Code Names for versions and features

The Microsoft SQL Server team used/uses various code names for SQL Server versions and some of its features; here are some of them found on the web. I tried to match the best image I could find on the web; I am not exactly sure whether each is the right picture, hence please let me know if I have used wrong pictures.


Version/Feature - Code Name
SQL Server 6.0 Enterprise Manager - Starfighter
SQL Server 6.0 Agent - Starfighter II
SQL Server 6.5
SQL Server 7.0 - Sphinx
SQL Server 7.0 OLAP Services - Plato
SQL Server 2000 (32-bit) - Shiloh
SQL Server 2000 (64-bit) - Liberty
SQL Server Reporting Services 2008 - Rosetta
SQL Server 2005 - Yukon
SQL Server Analysis Services 2005 - Picasso
Database Design and Query Tools - DaVinci
SQL Server 2005 Mobile Edition - Laguna
SQL Server 2008 - Katmai
SQL Report Designer 2.0 - Blue
SQL Server 2008 R2 - Kilimanjaro
SQL Server 2012 - Denali
SQL Server Developer (Data) Tools - Juneau
SQL Server Columnstore Index - Apollo
SQL Server 2014 (In-Memory OLTP) - Hekaton