Monday, January 27, 2014

SQL String concatenation with CONCAT() function

We have been using plus sign (+) operator for concatenating string values for years with its limitations (or more precisely, its standard behaviors). The biggest disadvantage with this operator is, resulting NULL when concatenating with NULLs. This can be overcome by different techniques but it needs to be handled. Have a look on below code;

-- FullName will be NULL for
-- all records that have NULL
-- for MiddleName
SELECT 
    BusinessEntityID
    , FirstName + ' ' + MiddleName + ' ' + LastName AS FullName
FROM Person.Person
 
-- One way of handling it
SELECT 
    BusinessEntityID
    , FirstName + ' ' + ISNULL(MiddleName, '') + ' ' + LastName AS FullName
FROM Person.Person
 
-- Another way of handling it
SELECT 
    BusinessEntityID
    , FirstName + ' ' + COALESCE(MiddleName, '') + ' ' + LastName AS FullName
FROM Person.Person

SQL Server 2012 introduced a new function called CONCAT that accepts multiple string values including NULLs. The difference between CONCAT and (+) is, CONCAT substitutes NULLs with empty string, eliminating the need of additional task for handling NULLs. Here is the code.

SELECT 
    BusinessEntityID
    , CONCAT(FirstName, ' ', MiddleName, ' ', LastName) AS FullName
FROM Person.Person

If you were unaware, make sure you use CONCAT with next string concatenation for better result. However, remember that CONCAT substitutes NULLs with empty string which is varchar(1), not as varchar(0).

Friday, January 24, 2014

Business Intelligence: Dimensional Modeling - Do we need Operational Codes as Dimension Attributes?

Business User: I need the product code as product name, whenever I use product dimension, system should default to the code, not the name, we all are familiar with codes, not names.

Business Analyst: Will it be an issue for other users? Specifically the top management? I am sure that they do not want to see the revenue by code but revenue by product name.

Business User: I do not think that it is an issue, besides, we will be using the system not them, they will check main KPIs occasionally.

Business Analyst: BI solution is for all, not for one level, we will have the code for product dimension but let’s default to the name not the code, users who are familiar with codes can still go through codes, system facilitates …………

Business User: I think you don’t get what I say, isn't your duty managing users’ requirements……….

Okay, this could be a sort of conversation you will be having during the requirement gathering process or you might already have had something like this. How do you manage it? How did you manage it? What should be the best way of implementing this?

Dimension attributes are the keys of business intelligence solutions. The usability and understandability of attributes (same goes to dimensions too) measures the success of business intelligence solutions. If business users cannot understand or they cannot use the existing attributes for their analysis, the rejection rate goes up, making the solution less gravitative.

Here is an example: General Date dimension contains Year, Quarter, Month and Date. Some solution requires Week too. Once implemented, if bi-weekly analysis appears as another requirement, how it should be addressed? One solution is, have the week as the filter and filter out weeks for making its appearance as bi-weekly attribute. Though we provide a solution, business user sees it as technical solution, not as a business solution. The demand for bi-weekly attribute comes as no surprise. In the absence of business-user-aligned attributes, in the absence of business user’s most friendly attributes, solution becomes less useful.

Let’s go back to the conversation. Business Users insists the codes as the default attribute for product dimension. In most cases, the reason is, she/he comes from operational world. However, one user’s or one department’s requirement does not represent the entire organization’s requirement. Business Intelligence solution is not for a single person, not for one department. Arguing on this is not the way but explaining the advantages of usage of user-friendly textual, descriptive words for attributes will sort the matter. Explaining how these codes are inevitable for unnecessary inconsistencies will sort the matter. This does not mean that attributes such as product codes should completely ignored. This could be useful when preparing specific reports to operational business processes, this could be useful when communicating back to operational sources. Therefore make sure all attributes related to dimension are implemented and make most meaningful attribute as the default attribute. Not only that, make sure you spend more time on understanding attributes, naming them with accurate business terminology, and make them more robust.

Do not forget that neither measure or KPI can be analyzed without dimensions attributes. That is the starting point of all analysis, means that the success of a Business Intelligence solution is determined by dimensions’ attributes.

Wednesday, January 22, 2014

Monday, January 20, 2014

Can a query derive benefit from a multicolumn index for single-column filtering?

Assume that an index has been created using three columns. Will SQL Server use the index if the 2nd column is used for filtering?

I asked this question recently at a session, as expected, answer of majority was “No”. Many think that multicolumn index is not beneficial unless all columns are used for filtering or columns from left-to-right, in order, are used. Order is important but SQL Server do leverage the index even for 2nd and 3rd columns. If your answer for the question above is “No”, here is a simple example for understanding it.

USE AdventureWorksDW2012
GO
 
-- creating an index using 3 columns
CREATE INDEX IX_DimProduct_1 ON dbo.DimProduct
    (EnglishProductName, WeightUnitMeasureCode, SizeUnitMeasureCode)
 
-- filter using the first column
SELECT ProductKey, EnglishProductName
FROM dbo.DimProduct
WHERE EnglishProductName = 'Road-750 Black, 44'
 
-- filter using the second column
SELECT ProductKey, EnglishProductName
FROM dbo.DimProduct
WHERE WeightUnitMeasureCode = 'G'
 
-- filter using the third
SELECT ProductKey, EnglishProductName
FROM dbo.DimProduct
WHERE SizeUnitMeasureCode = 'CM'
 
-- cleaning the code
DROP INDEX IX_DimProduct_1 ON dbo.DimProduct

This code creates an index using three columns. As you see, the SELECT statements use the columns used for the index; first SELECT uses EnglishProductName, second uses WeightUnitMeasureCode, and third uses SizeUnitMeasureCode. Have a look on query plans;

image

Plans clearly show that SQL Server leverages the index for all three SELECT statements. However, note the way it has been used. For the first SELECT, Index Seek has been used but Index Scan for the second and third. This means, filtering with left-most columns gets more benefits but filtering with right-most columns do not get it optimally.

See below query, filtering starts on the left-most column and two columns are used. As you see, it is “Index Seek”; Index is used optimally.

-- filter using the first and second columns
SELECT ProductKey, EnglishProductName
FROM dbo.DimProduct
WHERE EnglishProductName = 'Road-750 Black, 44' 
    AND WeightUnitMeasureCode = 'G'

image

Tuesday, January 14, 2014

Why column aliases are not recognized by all clauses?

Column aliases are commonly used for relabeling columns for increasing the readability of SQL statements. However aliases cannot be referred with all clauses used in SQL statements. Have a look on below code and its output.

1 SELECT 2 GroupName AS DepartmentGroup 3 , COUNT(Name) AS NumberOfDepartment 4 FROM HumanResources.Department 5 GROUP BY DepartmentGroup

Output:
Msg 207, Level 16, State 1, Line 5
Invalid column name 'DepartmentGroup'.

As you see, GROUP BY clause does not recognize the alias “DepartmentGroup” used for GroupName column. Not only GROUP BY, WHERE and HAVING clauses do not recognize aliases too. You can simply refer the same column name with these clauses without thinking much, however knowing the reason for this will definitely add something to your knowledgebase. Not only that, it will help you to properly construct your SQL statements too.

Possible ways of aliasing columns
There are three ways of aliasing columns with identical output results. One method is to use AS keyword, next is to use equal sign (=), and the other way is positioning the alias immediately following the column name.

1 SELECT 2 GroupName AS DepartmentGroup 3 , COUNT(Name) AS NumberOfDepartment 4 FROM HumanResources.Department 5 GROUP BY GroupName 6 7 SELECT 8 DepartmentGroup = GroupName 9 , COUNT(Name) AS NumberOfDepartment 10 FROM HumanResources.Department 11 GROUP BY GroupName 12 13 SELECT 14 GroupName DepartmentGroup 15 , COUNT(Name) AS NumberOfDepartment 16 FROM HumanResources.Department 17 GROUP BY GroupName

In terms of performance, there is no difference between these three methods but readability. Most prefer AS keyword and it is the recommended way by industrial experts.

Why all clauses cannot refer aliases added?
This is because of the logical order query processing. Clauses like GROUP BY, WHERE, and HAVING are processed prior to SELECT. Therefore the aliases used are unknown to them. Below image shows a SELECT statement. Elements in it are numbered based on the processing order.

Order

Note that you can refer aliases with ORDER BY. The reason for it is, ORDER BY (7) is processed after the SELECT (6). Read more on this at: http://dinesql.blogspot.com/2011/04/logical-query-processing-phases-in.html.

Aliases for expressions: Should I write it again with GROUP BY?
We assign aliases for expressions used in SELECT. However, as the alias cannot be referred with a clause like GROUP BY, same expression has to be duplicated which increases the cost of maintainability. It can be overcome by implementing a table expression without hindering the performance.

1 -- expression is set with 2 -- both column and group by 3 SELECT 4 YEAR(OrderDate) OrderYear 5 , SUM(SubTotal) TotalAmount 6 FROM Sales.SalesOrderHeader 7 GROUP BY YEAR(OrderDate) 8 9 -- expression is set only with 10 -- the column 11 SELECT 12 OrderYear 13 , SUM(SubTotal) TotalAmount 14 FROM (SELECT 15 YEAR(OrderDate) OrderYear 16 , SubTotal 17 FROM Sales.SalesOrderHeader) Sales 18 GROUP BY OrderYear
Aliases for tables
It is possible to have aliases for tables too. It does not support all three ways but AS keyword and adding right after the table name. Here is an example;

1 SELECT Orders.OrderDate 2 , Orders.SubTotal 3 FROM Sales.SalesOrderHeader AS Orders 4 5 SELECT h.PurchaseOrderNumber OrderNumber 6 , p.Name Product 7 , d.LineTotal 8 FROM Sales.SalesOrderHeader h 9 INNER JOIN Sales.SalesOrderDetail d 10 ON h.SalesOrderID = d.SalesOrderID 11 INNER JOIN Production.Product p 12 ON d.ProductID = p.ProductID

Sunday, January 12, 2014

SQL Server Management Studio – Five ways of executing a code

How many ways you have used for executing a T-SQL code in Management Studio? I am sure you have used two well-known options but other three options are unknown to many. Here are all five ways;

  1. Clicking Execute button in SQL Editor toolbar.
  2. Pressing F5 key in keuboard.
  3. Pressing CTRL+E shortcut key.
  4. Pressing CTRL+SHIFT+E shortcut key.
  5. Pressing ALT+X shortcut key.

Analysis Services: Exception of type 'System.OutOfMemoryException' was thrown.

There could be many reasons for this but in most cases, this misleads us, specifically if the MDX is running with Management Studio. You see that there is enough memory, and you are confident of the MDX, but why the Management Studio throws this?

Reason is simple. This Knowledge base article explains it;

SSMS is a 32-bit process. Therefore, it is limited to 2 GB of memory. SSMS imposes an artificial limit on how much text that can be displayed per database field in the results window. This limit is 64 KB in "Grid" mode and 8 KB in "Text" mode. If the result set is too large, the memory that is required to display the query results may surpass the 2 GB limit of the SSMS process. Therefore, a large result set can cause the error that is mentioned in the "Symptoms" section.”

How do we make sure that MS throws the exception because of its limitations, not because of anything else. Simplest way is, get the code executed using another technique. It can be a .NET code using a provider that supports executing MDX against SSAS or could be a SSIS package. I used a simple SSIS package that has a connection to source and data reader destination. Here is the way;

  1. Create a new SSIS project and add an OLE DB Connection to SSAS.
    image
  2. Make sure you add “Format=Tabular” for Extended Properties in Connection Manager.
    image
  3. Add a data flow task and configure an OLE DB Source.
  4. Set the SSAS connection made to OLDE DB Source and add the MDX statement as SQL Command.
  5. Add a Data Reader destination and connect with the source.
  6. Execute and see.
    image

If the code gets executed without any issue, it means you are hit by SSMS limitation not because of anything else.

Versions and Service Packs of SQL Server (2012, 2008 R2, 2008, 2005)

Have you installed all service packs and necessary patches for all your SQL Server instances? Check the version with this table, if not, get them downloaded and installed.

SQL Server Code Name Initial Version Service Packs
2014

Hekaton

12.00.2000.8

CU1: 12.0.2342.0
CU3: 12.0.2402
2012

Denali

11.00.2100

SP1: 11.00.3000
SP1: CU#7: 11.0.3393
SP1: CU#8: 11.0.3401
SP1: CU#9: 11.0.3412
SP2: 11.0.5058
SP2: HotFix – 11.0.5522
SP2: CU#1: 11.0.5532

2008 R2

Kilimanjaro

10.50.1600.1

SP1: 10.50.2500
SP2: 10.50.4000
SP2: CU#11: 10.50.4302
SP2: CU#12: 10.50.4305
2008

Katmai

10.00.1600.22

SP1: 10.00.2531
SP2: 10.00.4000
SP3: 10.00.5500
SP3: CU#14: 10.00.5848
2005

Yukon

9.00.1399.06

SP1: 9.00.2047
SP2: 9.00.3042
SP3: 9.00.4035
SP4: 9.00.5000
SP4: CU#3: 9.00.5266

Here are couple of ways of determining the version of your SQL Server instance;

1. SQL Server Management Studio: Object Explorer

image

2. @@VERSION global variable

image

3. SERVERPROPERTY function

image

Why SQL Server splits the page? I am changing an integer value.

If someone asks “Why SQL Server splits pages”, I know that all of you say “If an insert or update requires more space, SQL Server splits the page and moves half of the records to the new page”. It is true. This makes the table externally fragmented too. However, can there be a scenario which an change of integer value splits the page? There can be, and here it is.

Before that, if you need to understand page-split and fragmentation, read this: http://dinesql.blogspot.com/2013/08/understanding-index-fragmentation-ss.html.

We will start with a simple code. Have a look on below code;

USE tempdb
GO
 
-- creating test table
CREATE TABLE dbo.TestTable
(
    Id int PRIMARY KEY
    , Value char(4000) NOT NULL
)
GO
 
-- inserting 2 records
INSERT INTO dbo.TestTable
    VALUES 
    (2, replicate('a', 4000)), (5, replicate('b', 4000))
 
-- check records with page numbers
SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph

I have used two functions for getting page numbers related to inserted records. For more info on these two functions, read this: http://dinesql.blogspot.com/2013/12/finding-pages-or-physical-location-of.html

Here is the result of the SELECT;

image

As you with the result, both records are in same page, and becuase of the size of records, page is almost full. Let’s do a modification to Value column first and see whether it causes a page split.

UPDATE dbo.TestTable
    SET Value = replicate('e', 4000)
WHERE id = 2
 
SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph

image

Result is same. Getting 4000 bytes as a changed value has not required additional space and page split has not occurred. Now let’s make a change on Id column.

UPDATE dbo.TestTable
    SET Id = 3
WHERE id = 2
 
SELECT Id, Value, ph.*
FROM dbo.TestTable
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph

image

Page split has occurred. Why SQL Server needs space and splits the page ONLY IF IT IS AN INTEGER COLUMN? This is what you have understand. Note that this is not just an integer column. It is the PRIMARY KEY; The clustered key. What happened was, SQL Server updated the Value column as an in-place update. It does not require additional space. However SQL Server does not perform in-place update for key columns. It performs delete-plus-insert operation for key column updates. It requires space for the INSERT. That is the reason for the page split. Understand that this behavior is not only for integer, it is for all key columns.

I will make another post on in-place update and delete-plus-insert operation.

Wednesday, January 1, 2014

Ordering the result of Views – SS SLUG Dec 2013 – Brain Bashers - Demo VIII

First of all, Happy New Year 2014! May your all dreams come true in 2014!

This post is related to the presentation “SQL Brain Bashers” I did at SQL Sever Sri Lanka User Group Meeting. This topic is very old but still it needs to be discussed for beginners; it is all about having ORDER BY with VIEWs.

Here is the question related to this demo;

Does SQL Server sort the resultset returning from a view based on ORDER BY clause added?

Answer is “NO”. Have a look on below code and then see the explanation.

USE AdventureWorks2012
GO
 
-- this throws an error
-- read the error properly
CREATE VIEW v_SalesOrdersDetails
AS
SELECT SalesOrderID, SalesOrderDetailID
    , CarrierTrackingNumber, OrderQty
    , ProductID, LineTotal
FROM Sales.SalesOrderDetail
ORDER BY ProductID DESC
GO
 
-- this creates the view
CREATE VIEW v_SalesOrdersDetails
AS
SELECT TOP 100 PERCENT 
    SalesOrderID, SalesOrderDetailID
    , CarrierTrackingNumber, OrderQty
    , ProductID, LineTotal
FROM Sales.SalesOrderDetail
ORDER BY ProductID DESC
GO
 
-- this query does not sort the
-- result by product id
SELECT * FROM v_SalesOrdersDetails 
 
-- this sorts as we have added
-- to the query
SELECT * FROM v_SalesOrdersDetails 
ORDER BY ProductID DESC 

Here is the result;

View

As you see, the ORDER BY clause we have added to the view has no effects.

Let’s try to understand this behavior. Remember the error thrown from first try. It clearly says that ORDER BY cannot be used with VIEWs. Looks weird :)? There is nothing wrong with it. View is supposed to represent a table. A table is a logical entity that does not hold ordered rows, hence view cannot hold ordered rows too. However SQL Server allows to use ORDER BY clause with views when TOP or FOR XML is used. Again, remember, though you have used ORDER BY with TOP, it does not sort the result, it uses ORDER BY for completing TOP operation only (Note: Generally it uses ORDER BY for two purposes; Which rows to be picked for the TOP, and order the result).

There is a misconception that adding TOP 99.99 PERCENT sorts the result set based on ORDER BY added. The fact is, it is wrong too, SQL Server does not guarantee a sorted resultset. Not only that, the result set might not contain all records returned.