Wednesday, September 28, 2016

How to get the most accessed tables from SQL Server database

While discussing with one of my clients, this question raised but I knew that there is no direct way unless we use something related to indexes. So the query is simple, all we have to do is, get the total count of all scans, seeks and lookups for particular indexes;

SELECT 
 t.name AS TableName,
 SUM(u.user_seeks + u.user_scans + u.user_lookups) AS NumberOfTimeAccessed
FROM sys.dm_db_index_usage_stats u
 INNER JOIN sys.tables t 
  ON t.object_id = u.object_id
WHERE database_id = DB_ID()
GROUP BY t.name
ORDER BY SUM(u.user_seeks + u.user_scans + u.user_lookups) DESC

Is this 100% accurate? As per my understanding, this is not 100% accurate. You might get larger numbers when you have multiple indexes but if you all want is to know which tables get more hit, this does a reasonable job.

Does this capture the tables that are based on heap structure? Yes of course, still you see the count on heap tables too.

Tuesday, September 27, 2016

How many times my SQL Server files have grown?

If you have not changed the default Growth Rate settings in SQL Server files, then the default will be applied. SQL Server 2016 is smart, it configures the Growth Rate as 64MB but old versions set this to very low value, resulting too many smaller growth. This slows down the system because the cost of growing a file is very expensive.

How do you check whether the your SQL Server experience too many growths? It is simple with standard reports given. The report Disk Usage shows all occurring of file growth if SQL Server has grown the files since last service restart.


This is what you see.


If you see many growth within a shorter period, you need to change the Growth Rate immediately.

Monday, September 26, 2016

Data Science - Type of Analytics

While discussing with few data enthusiasts, the topic Type of Data Analytics came up and various types were discussed. There were few arguments on categorization but it is clear that the types of Data Analytics are vary and can be defined with many classifications.

However, if someone needs to understand main types of it, what would be the way of categorizing it. As per the discussion we had and the articles I have read, I believe that it can be divided into three;


If we are to summarize this;

Descriptive Analytics is something we always do, we always implement. We try to see the insight based on historical data. You might be analyzing last 6 months supermarket bills and try understand how you have spent, whether you have spent on unnecessary items, whether your spent is worth or not, etc. The owner of the supermarket would do the same. He will check and see how his customers have bought items, any relationships among purchased items, which gives more profits, which do not move, etc. Generally this is the first part of any Business Intelligence implementation and this is what widely used by many users.

Predictive Analytics helps us to see the future. This is also based on historical data but it actually forecasts values for future. This analytics uses historical data along with statistics related to all entities. This is not only for seeing values, it can be used for other processes related to the business. Example, we can use this analytics for finding associations between products and keep those products closer to increase the sale. That is why we see some different names on this such as Market Basket Analytics, Association.

Prescriptive Analytics combination of both above items but it shows few things additionally. It tells you what might happen and what should be done. This analytics is bit rare to see with standard BI applications but many try to implement it. This is referred as Describe - Predict - Prescribe

Sunday, September 25, 2016

SQL Server Brain Basher of the Week #054 - Guest Account

Here is the question of the week. I am sure that you have seen guest user account in your databases. This week question is based on it;

What is Guest User Account and usage of it?

The Guest User Account is a system user account that is available with all user defined databases and it cannot be dropped. By default, it is disabled, it can be enabled using following code.

GRANT CONNECT TO guest;

This user account has no relevant Login Account. SQL Server uses this account when;
  • User connects with SQL Server using the Login and tries to access a database that he has no User account created.
  • Guest User Account is enabled.
In a way, this is good because this allows users to connect with SQL Server using Logins and accesses databases without creating Users but it might introduce some security threats as well. Therefore, it is better to keep it disabled. If you need to enable it, make sure it has least privileges on resources in the database.

Saturday, September 24, 2016

Are you experiencing "Double-Hop" problem with SQL Server?

Without noticing, you may experience double-hop problem with your SQL Server and you may be clueless on it. With this, all you see is, some requests made to SQL Server is not getting satisfied and getting permission related to errors. Let's try to understand what is double-hop and how it can be sorted out with SQL Server.

See the below diagram;


The diagram shows that how user connects with the database using the application. He sends a request via the application and application server satisfies his request using the database hosted in the same machine. In this case, application needs to use user's credentials for connecting with SQL Server hosted is Server 01 and it impersonates the user's account for that. It is the first hop and it works fine.

Now the second hop. SQL Server 01 needs some info in SQL Server 02, hence it forwards to the request with user's credentials but, by default, environment does not allow it. SQL Server 01 has no way of forwarding user's credentials to Server 02 and you will get an error.

How to sort this out?
This requires Delegation to be enabled. Delegation allows Server 01 to forward user's credentials to Server 02 and then Server 02 can execute the code as a distributed query.

You need following to be enabled/configured for Delegation.
  • User must have permission on both Server 01 and Server 02.
  • Either TCP/IP or Named Pipes must be the protocol for accessing SQL Server.
  • The Active Directory property called Account is sensitive and cannot be delegated must not be checked for the user's account.
  • Both Server 01 and Server 02 require registered Service Principal Number (SPN).
  • SQL Server services accounts must be trusted for Delegation.
  • SQL Server 02 must be configured as a Linked Server in Server 01.


Friday, September 23, 2016

SQL Server 2016 - New Releases

There are few new releases related to Microsoft SQL Server in last few days. Here is the list, get them downloaded as you need and install.

Cumulative Update #2 for SQL Server 2016 RTM
Link: https://blogs.msdn.microsoft.com/sqlreleaseservices/cumulative-update-2-for-sql-server-2016-rtm/

SQL Server Management Studio -16.4.1 Release
Link: https://blogs.msdn.microsoft.com/sqlreleaseservices/announcing-sql-server-management-studio-16-4-release/

SQL Server Data Tools 16.4 Release
Link: https://blogs.msdn.microsoft.com/ssdt/2016/09/20/sql-server-data-tools-16-4-release/

Thursday, September 22, 2016

Cannot execute as the database principal because the principal does not exist - Mismatched Security IDs

Are you experiencing this error with newly restored database?

Cannot execute as the database principal because the principal "abc" does not exist, this type of principal cannot be impersonated, or you do not have permission.

SQL Server throws this error when a user tries to execute a code but SQL Server cannot find the  login account. This can happen when a database from an instance (let's say Server-01) is restored to a different instance (Server-02) of SQL Server because it breaks the link between user accounts and login accounts. Example, assume that Server-01 has a database called Sales, and this has a login called Jane and user called Jane in Sales database, linked with Jane Login. If the database is restored to Server-02, you will see above-mentioned error because;
  1. Jane Login is exist in the Server-02 but it has a different SID that does not match with SID of user Jane in Sales database.
  2. Jane Login does not exist.
If the reason is the #1, then you can re-map it or if the reason is #2, then it needs to be created and re-mapped.

Here is an example. This example shows one way of seeing the error and the way of fixing it. The following code;
  • Creates a database named Sales.
  • Creates a login called Jane and add Jane as a user to Sales database.
  • Configure appropriate permissions to Jane for reading data.
  • Take a backup of the Sales database.
  • Delete the Sales database.
  • Delete the Jane Login.
  • Restore the Sales database.
  • Check and see whether Jane can access the database.
  • Create Jane's login again and try.
  • Find out orphaned users.
  • Map Jane's user with the newly created login.
USE master;
GO

-- create the database
CREATE DATABASE Sales
GO

-- create the login
CREATE LOGIN Jane WITH PASSWORD = 'Pa$$w0rd';

-- connect with Sales and add Jane as a user
USE Sales;
GO

-- create user
CREATE USER Jane FOR LOGIN Jane;
-- give permission for reading.
EXEC sp_addrolemember 'db_datareader', 'Jane';

-- change the security context and execute a code
-- Jane can read records
EXECUTE AS USER = 'Jane';
SELECT USER_NAME()
SELECT * FROM sys.objects;
REVERT;

-- Connect to master database
USE master;
GO

-- take a backup and delete the database and login
BACKUP DATABASE Sales TO  DISK = N'D:\MSSQL13.MSSQLSERVER\MSSQL\Backup\Sales.bak' 
WITH NOFORMAT, NOINIT

DROP DATABASE Sales;

DROP LOGIN Jane;

-- restored the database again. Note that
-- this has the user but instance does not have the login
RESTORE DATABASE Sales FROM DISK = N'D:\MSSQL13.MSSQLSERVER\MSSQL\Backup\Sales.bak';

-- Connect with Sales
USE Sales;
GO

-- This throws the error
EXECUTE AS USER = 'Jane';
SELECT USER_NAME()
SELECT * FROM sys.objects;
REVERT;

-- Connect with master and re-create the login
USE master;
GO
CREATE LOGIN Jane WITH PASSWORD = 'Pa$$w0rd';

-- Connect again and check
USE Sales;
GO

-- This throws the error again
EXECUTE AS USER = 'Jane';

-- Checking orphaned users
-- This will show that Jane's user account is orphaned.
SELECT d.* 
FROM sys.database_principals d
 LEFT OUTER JOIN sys.server_principals s
  ON d.sid = s.sid
WHERE s.sid IS NULL and d.type = 'S';

-- Map the Jane's user account with her login
ALTER USER Jane WITH LOGIN = Jane;

Wednesday, September 21, 2016

Declarative Referential Integrity and Procedural Referential Integrity

In order to make the database (or table) consistence, one of the integrity types we use is Referential Integrity. It is implemented using a Foreign Key constraint and it makes sure that update or delete of a referenced value is prevented when it is exist in the foreign key column. This makes sure that we do not see inconsistency with values or records such as An Order without a customer, Customer address with an unknown country because Foreign Key constraint does not allow to insert an Order without setting a valid value to the foreign key column referencing customer table, or it does not allow to set a wrong country in the customer table when it is set as a foreign key to the country table. However, there are some instances that it requires some additional validation based business logic which lead us to implement this in a different way.

Declarative Referential Integrity
This is what we discussed above. The validation required is implemented using Foreign Key constraint to maintain the integrity. This makes sure that referenced value cannot be updated or deleted if it is referenced by a foreign key and it does not allow to have a value in referencing column with a value that is not exist in the referenced column..

Procedural Referential Integrity
This does the same but we handle it as we need, rather passing the responsibility to the DBMS. This is implemented with Triggers. Example, assume that the Customer table has a column called CountryId which is a foreign key, referencing Country table, and when a customer is inserted, if the customer type is web, we need to set the CountryId to -1. Then the logical has to be handled by us and we can implement it with a AFTER TRIGGER. Another example is, assume that CountryId should be set only when the customer has placed certain number of orders for a specific country during insert and update, then we can use INSTEAD OF TRIGGER for checking before the operation as this needs to check with some other tables as well, and if it does not reach the threshold, we can rollback the operation. This is Procedural Referential Integrity.

I have SQL Server Standard Edition. Should I buy CALs for concurrent users or all users connect with it?

Interesting question was asked during the user group meetup today and it was about CALs used with SQL Server 2016 Standard Edition. When we purchase CALs for Standard Edition, should we consider only about concurrent connections or all users we expect to connect with it?

As per the reading I did, it is NOT for concurrent users. It is for all users or devices connect with SQL Server. Assume that you have a web application that consumes data from SQL Server using a single user account. And web application is used by 100 users for doing various business, of course, accessing the database via the application. Then, if the SQL Server is based on CALs, you need to purchase 100 CALs, not 1 CAL.

Remember, CAL can be connected to any SQL Server instance. You do not need two CALs for connecting to two SQL Servers.

Microsoft licensing model was bit complex but now it clearer than it was. This PDF gives you enough information to understand the licensing models.

If I am not mistaken, there is no Microsoft web page to get pricing on SQL Server, you need to talk to your local vendor on it.

Sunday, September 18, 2016

SQL Server Brain Basher of the Week #053 - Data Archiving

We archive data when we see that database has many records, they never get updated and they are rarely read. With traditional archiving way, we take out data and store them in a different database or store them as text files. In most cases, instead of storing exact structure of records, we summarize and store them. However, the traditional archiving process is not sufficient enough for modern requirements.

There are instances you need to read old archived data along with new data. There are instances you need analyze old data for getting some information. Unfortunately, with traditional method, it takes long time for setting up and reading data, and in some cases, not all required information is not available.

Here is the question for the week.

What is the new SQL Server 2016 feature that supports data archiving, addressing new requirements?

The new feature is Stretch Database. This feature allows you to migrate old data into Azure and still be able to query data. It does not require any changes to applications and codes written, end user sees as if they query a on-premise data table.

When configure a table as a Stretch Table, it can be set to "stretch" the entire table or partially, using an inline function that filters data. Once enabled, this feature set some restrictions to the table. You cannot UPDATE/DELETE archived data and uniqueness does not force for archived data via UNIQUE or PRIMARY KEY constraints. For more info on limitations, read: https://msdn.microsoft.com/en-us/library/mt605114.aspx.

You can see whether your table can be stretched or not using Stretch Database Advisor in Microsoft SQL Server 2016 Upgrade Advisor. If the Advisor says that the table can be stretched, then you can go ahead and stretch it.

The steps for making the table stretched is simple.
  1. Check and see whether table can be stretched, using the Advisor.
  2. Right-click on the table, click on the Stretch menu item and follow the pages in the wizard.
  3. You can either select the entire table or click on Entire Table for filtering. If you click on it, you get a screen like below. You can set the function as you want (see below image).
  4. Next screen is for connecting with Azure. Enter your credentials for log in.
  5. There is an option for SQL Server selection. You can either create a new one or use an existing one. Fill necessary details and submit.

Now your table is stretched.






Saturday, September 17, 2016

How SQL Server finds the Execution Plan in the Plan Cache Store

You know that SQL Server creates a plan for your query and keeps them in the cache for reusing when it is required. When you make a request again, how SQL Server knows that the plan is exist in the cache or not? What sort of techniques it uses for finding the plan? Let's try to understand it.

There are four main plan cache stores. These plan cache stores maintain different types of plans and plans are reused when the same request is made by users or plans are discarded when SQL Server needs space in the store (Let's discuss the techniques it uses for discarding with another post). These for stores are;
  • Object plans store for storing plans for procedures, functions and triggers.
  • SQL plans store for storing plans for ad-hoc queries.
  • Bound Trees plan store for storing structures created by Algebrizer for views, defaults and constraints.
  • Extended Stored Procedure plan store for storing plans for Extended Stored Procedures.
A store is a Hash Table and it contains series of Buckets. A Bucket contains zero or more plans and the Hash value is calculated;
  • Using databaseId and objectId for Object plan, Bound Tree plan and Extended Stored Procedure plan stores.
  • Using databaseId and the statement for SQL plan store.
The assignment of execution plan to store bucket is done using the Hash Value. The Plan Handle (or Cache Key) which is made up with various attributes, is used for referencing the plan in the cache.

When you make a request to SQL Server, it does following steps for find the plan in the cache;
  1. Generate the Hash Value for the request for the appropriate store.
  2. Search all execution plans in the bucket along with the cache key.
  3. If found, it uses it, else create a new plan.
For understanding the process, let's check with a simple code;

-- Clear the cache
DBCC FREEPROCCACHE

-- Checking the number of buckets for each store
-- and how many have been used.
SELECT cc.name, buckets_count, buckets_in_use_count
FROM sys.dm_os_memory_cache_hash_tables AS ht 
 JOIN sys.dm_os_memory_cache_counters AS cc 
  ON ht.cache_address = cc.cache_address 
WHERE cc.name IN ('Object Plans','SQL Plans','Bound Trees','Extended Stored Procedures');


SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate = '01/06/2014';

SELECT * FROM Sales.SalesOrderHeader
WHERE OrderDate = '01/07/2014';

-- Check the cached plans
-- You will see two plans for above two statements
SELECT p.*, t.text
FROM sys.dm_exec_cached_plans p  
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t  
WHERE --p.usecounts > 0 AND  
 text LIKE '%SELECT * FROM Sales.SalesOrderHeader%'  
 AND text NOT LIKE '%SELECT p.*%'
ORDER BY p.usecounts DESC; 


-- Check how the cache key is formed
-- Note the two bucket ids
-- And note that only object id is different
-- For SQLs, it is the internal hash of the batch text
-- not an id in sys.objects table
SELECT p.bucketid, t.text  , ph.*
FROM sys.dm_exec_cached_plans p  
 CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) t  
 CROSS APPLY sys.dm_exec_plan_attributes(p.plan_handle) AS ph
WHERE --p.usecounts > 0 AND  
 text LIKE '%SELECT * FROM Sales.SalesOrderHeader%'  
 AND text NOT LIKE '%SELECT p.*%'
 AND is_cache_key = 1
ORDER BY p.usecounts DESC; 



Friday, September 16, 2016

Warnings in SQL Server Query Execution Plans

Have you seen Warnings on operators in Query Execution Plans?


Warnings indicate that Query Optimizer has found something that might affect the performance of the query and you need to work on it. There are various reasons for warnings but typically warning is displayed when;
  • values used with WHERE or JOIN clauses are not same data type. This might prevent statistics from being used correctly and might display 
  • database is with AUTO CREATE STATISTICS OFF and column used with the predicate has no statistics.
  • no ON clause (or equivalent) when two or more tables are referenced in the query.
Here are some code samples on them.

The first code segment shows how warning is shown when data types are not match with JOIN.

USE AdventureWorks2014;
GO

-- create new header table
SELECT *
INTO OrderHeader
FROM Sales.SalesOrderHeader;

-- create new details table
SELECT *
INTO OrderDetails
FROM Sales.SalesOrderDetail;

-- change the data type of SalesOrderID in OrdeDetails table
ALTER TABLE OrderDetails
ALTER COLUMN SalesOrderID varchar(10);

-- Enable Execution Plan
-- Execute the query and see
SELECT *
FROM OrderHeader h
 INNER JOIN OrderDetails d
  ON h.SalesOrderID = d.SalesOrderID


The second code shows how warning is shown when no statistics available.

-- create a new database
CREATE DATABASE TestDatabase
GO

-- turn off auto create statistics
ALTER DATABASE TestDatabase SET AUTO_CREATE_STATISTICS OFF;

-- connect with the database
USE TestDatabase;
GO

-- create a new table
SELECT *
INTO OrderHeader
FROM AdventureWorks2014.Sales.SalesOrderHeader;

-- create a clustered index on newly created table
CREATE CLUSTERED INDEX IX_Sales ON dbo.OrderHeader (SalesOrderID);

-- Execute the query and see
-- Since no statistics on OrderDate column
-- and it does not create
-- you should see a warning
SELECT *
FROM dbo.OrderHeader
WHERE OrderDate = '2014-01-01';


The third shows how warning is shown when no ON clause for tables referenced.

USE AdventureWorks2014;
GO
-- Execute the query and see
-- Since no ON clause, warning is displayed
SELECT *
FROM Production.Product, Production.ProductSubcategory;


Thursday, September 15, 2016

SQL Server: How Optimizer creates a plan for your query

If you have not read previous two posts related this, please have a look on them first;

When a query is submitted, it goes through various phases and finally it comes to Query Optimizer as an Algebrizer Tree including details of physical objects related to the query. The Query Optimizer uses details such as Object Schema (indexes, constraints, data types) and Object Statistics for making plans for the query.

Optimization Phases
In order to reduce the time it takes for creating plans and evaluating them for finding the best, Query Optimizer goes through three phases. There three phases makes sure that it avoids complex plans as much as possible and goes to phases only if it is required. Three phases are:
  1. Simplification
    This tries to make the plan simplify for making the optimization quicker. This does simplifications such as Converting sub-queries to joins, Removing redundant joins.
  2. Trivial plan generation
    For some queries, there will only few possible plans exist. When Query Optimizer experiences it, it picks a single plan without evaluating other plans.
  3. Full optimization
    It comes to this phase if the query has many number of plans and they have to be evaluated for selecting the best. This is called as Full cost-based optimization as well. However, in certain scenario, Query Optimizer does not evaluate all plans because of the cost on resource usage related to evaluation.

    In order to select plans for evaluation, it uses Transformation Rules. The combination of Logical Query Tree (or Algebrizer Tree) and Transformation Rules results an internal structure called Memo. It will be used for calculating the cost.

    There can be an instance that many number of plans are created and even with Transformation Rules, it takes long time for evaluating. Then, SQL Server tries to handle the optimization with three stages; Search 0 - Transaction Processing Phase, Search 1 - Quick Plan Phase, and Search 2 - Full Optimization Phase. When it goes through these stages, if the stage does not return a plan, then it goes to the next one. If none of the stages return a plan, then it picks the plan with lowest cost.
Here are some code samples that explain some phases;

When this code is executed, it ignores Production.ProductCateory table as it is not required. The reason is the Foreign Key Constraint exit between Production.ProductSubCategory and Production.ProductCategory and as per the JOIN condition, it says that no records can exist in the Production.ProductSubCategory without a record in Production.ProductCategory. Therefore Optimizer ignores it. This is an example for Simplification.


Here is another example for simplification. This query has a condition on the Status column and Status column has a check constraint set saying that it should be between 0 and 8. Query Optimizer knows that there cannot be record exist with Status 10 because of the constraint, hence it simplify the query.


Here is one more query that shows the second phase which is Trivial Plan Generation. There are not much options in terms of multiple plans, hence one will be immediately selected.



Wednesday, September 14, 2016

SQL Server Physical Phases of Query Processing

I made a small video that shows Logical Phases of Query Processing and published with the post called SQL Server Logical Phases of Query Processing. Not only logical phases, knowing Physical Phases would definitely help you on Query Optimization because it tells how SQL Server makes plans based on various rules and techniques. Let's try to understand the Physical Phases;
  1. Parsing
    This is the first phase of Query Processing. This phase validates the query syntax and parses it into a Logical Query Tree. The sequence of items in the Logical Query Tree is based on Logical Phases describes with my above mentioned post. If SQL Server finds some syntax errors, then error is thrown without continuing. If no error found, then it goes to the next phase with the Logical Query Tree created. Note that this is also referred as Parse Tree and the component handles this is called as Command Parser.
  2. Binding
    This phase checks and see whether objects in the Logical Query Tree are exist and user has sufficient permissions on them. If objects are not exist or user has no permission, then it throws an error and stops the execution. If you see invalid object name error, it means that your query has stopped at Binding.

    If everything works fine, then the output of this phase which is the Algebrizer Tree that is the Logical Query Tree bound to objects is created. This process is handled by Algebrizer.
  3. Query Optimization
    The Query Optimizer is responsible for this phase. It takes the Algebrizer Tree and uses it with supportive components like Object Schema that are constraints, indexes and data types, and Object Statistics. This makes multiple plans with various steps and picks the best plan for the query.
  4. Query Execution
    Finally Query Executor executes the query based on the plan created by Query Optimizer. Output of this phase is the result we expect.
Let's talk about more on Query Optimizer with the next post.


Tuesday, September 13, 2016

Is TSQL a Declarative language or an Imperative language?

We have been writing TSQL codes for a long time but do you know what type of language TSQL is? It is unknown to many or many know the type of it but do not know what does it mean.

TSQL is a Declarative Language because the TSQL statement that describes the logic does not describe the sequence of execution  of individual elements in the statement for processing the logic. If you have written a TSQL statement, it is not necessary to process the individual clauses in the statement as they appear in the statement. That is how it works and that is why TSQL becomes a Declarative Language. However there are exceptions as well. TSQL has some Imperative Language flavors because it supports statements like WHILE, IF THEN.

The other type, Imperative Language processes the logic as they appear, as they are written. The languages like .NET, C++ are examples for Imperative Languages and commands written are executed in the sequence they appear.

Read more on Imperative Language at: https://en.wikipedia.org/wiki/Imperative_programming
Read more on Declarative Language at: https://en.wikipedia.org/wiki/Declarative_programming


Monday, September 12, 2016

SQL Server Full-Text Search with rank values

Once I wrote a post titled Enabling Fulltext search in Azure SQL Database discussing Full-Text search in Azure. While using it with one of my databases, wanted to show the result of the search ordered by how well they match to the search criteria. In order to sort the result as I need, the best is, get a rank generated for each row and use it for ordering the result. I had used Freetext function for getting the result but if I realized that this cannot be achieved using the Freetext function.

The CONTAINSTABLE and FREETEXTTABLE functions return a column named Rank for showing the rank related to the record based on matching. This can be used get the result sorted based on it, showing most relevant records at the top. Remember, the higher value of the Rank generated indicates the best matching.

Here is a code for showing the way of getting the rank;

-- Creating a table
CREATE TABLE dbo.EmployeeDetails
(
 EmployeeDetailsId int identity(1,1) not null
 , constraint pk_EmployeeDetails primary key (EmployeeDetailsId)
 , WorkingExperience nvarchar(4000) not null
 , ProjectsWorked nvarchar(4000) not null
 , Resume nvarchar(max) 
)
GO

CREATE FULLTEXT CATALOG EmployeeCatelog;
GO

CREATE FULLTEXT INDEX ON dbo.EmployeeDetails 
 (WorkingExperience, ProjectsWorked, Resume) KEY INDEX pk_EmployeeDetails
 ON EmployeeCatelog;
 -- By default CHANGE_TRACKING = AUTO


-- Once enabled, search can be performed;
SELECT *
FROM dbo.EmployeeDetails
WHERE freetext ((WorkingExperience, ProjectsWorked, Resume), 'SQL');

SELECT *
FROM dbo.EmployeeDetails
WHERE freetext ((Resume), 'SQL');

-- Get the rank and sort the result using it
SELECT t.Rank, e.*
FROM dbo.EmployeeDetails e
 INNER JOIN CONTAINSTABLE (dbo.EmployeeDetails, (WorkingExperience, ProjectsWorked, Resume), 'SQL') AS t
  ON e.EmployeeDetailsId = t.[Key]
ORDER BY t.Rank DESC

Sunday, September 11, 2016

Could not retrieve the table information for the connection manager Excel Connection Manager

You may experience the below error if you try to connect with latest version of Microsoft Excel using Integration Services;

Could not retrieve the table information for the connection manager 'Excel Connection Manager'. Failed to connect to the source using the connection manager 'Excel Connection Manager'


This error comes when you try to retrieve Excel Sheets from the connected Excel Workbook. In order to solve this, you need to install correct OLE DB Providers and Drivers for Microsoft Office files. Follow the link below for downloading the correct version;


Note that if you have 32-bit version of Office, then you need 32-bit drivers else you need 64-bit drivers.

Saturday, September 10, 2016

Microsoft Azure Storage Explorer latest version is available

I have been using various third party tools for working with cloud storages and one of the best I have seen is CloudXplorer. It is good as it allows us to work with many platforms without limiting one.

However, if you need a tool only for Azure, then Microsoft Azure Storage Explorer is the best. The new version is available with many new functionalities and worth to try.

The latest version can be download from here.


Sunday, September 4, 2016

SQL Server Brain Basher of the Week #052 - Identity property

The Identity property is not something new and it has been widely used with most of table when a surrogate key is required. It can be simply used with a numeric data type and it inserts a value automatically when an insert operation is done, based on the seed and increment set. The Identity property is used with CREATE TABLE statement like blow;

USE tempdb;
GO

CREATE TABLE dbo.Customer
(
 CustomerId int identity(1,1) primary key
 , CustomerName varchar(200) not null
);

Now the question is, where else you see the keyword Identity with similar functionality?

Many assume that the functionality we get from Identity is only available with CREATE TABLE statement. But there is a function, named as Identity that offers the same functionality.

See the following code. It is SELECT INTO statement that allows us to get some records from a table and create a new table with the recordset. You can use Identity function with SELECT INTO statement for adding a column and populating values as per seed and increment. The Identity function needs three parameters: data type, seed and increment. And the good thing is, it adds the Identity property to the column as well.

USE tempdb;
GO

-- This creates a table with PersonId
-- which based on Identity function
SELECT Identity(int, 1, 1) As PersonId, FirstName, LastName
INTO dbo.Person
FROM AdventureWorks2014.Person.Person;


Saturday, September 3, 2016

Reading records that are not locked - READPAST hint

When a set of records are being updated, with the default Isolation Level which is Read Committed, SQL Server makes sure that no one can read them until the transaction is completed. This avoids Dirty Read concurrency issue but it decreases concurrency.

What if you need to read rest of the records that are no locked? Now the question is, how can we instruct to SQL Server to skip locked records and read other records. It makes possible with READPAST hint.

The READPAST hint allows us to read records by ignoring locked record without waiting till incompatible locks to be released. This is NOT similar to READUNCOMMITTED hint because it reads all records including records that are locked.

Here is an example:

Execute the following query in a new window.
USE AdventureWorks2014;
GO

SELECT * FROM Production.Product;

-- This starts a transaction and locks set of records
BEGIN TRAN
 UPDATE Production.Product
  SET Color = 'Black'
 WHERE Color IS NULL;

-- ROLLBACK TRAN 

Now try following codes with another window
-- This wait until locks are released.
SELECT * FROM Production.Product

-- This returns all records including 
-- records that are being changed.
SELECT * FROM Production.Product WITH (READUNCOMMITTED);

-- This returns all records excluding
-- records that are being changed.
SELECT * FROM Production.Product WITH (READPAST);

See the difference in results. You may find READPAST hint useful with certain scenario.

Friday, September 2, 2016

SQL Server Statistics - When they are automatically getting updated?

You know that SQL Server uses an Execution Plan for every query you execute and the plan is created using multiple factors. Among the factors like Indexes and Statistics, Statistics play a key role because the cost for the plan is decided based on information provided by Statistics.

Statistics provides information on the data distribution in columns, tables and indexes. The Query optimizer uses statistics for estimating number of rows that query could be returned, and then decides the cost for CPU and other resources. Since the optimizer is a Cost-Based-Optimizer, the accuracy of the statistics is matter a lot because the plan selected for the query is based on the cost involved with it.

How often these statistics are getting updated? Since it updates automatically (by default, it is on), do we really need to care about it? Let's do a test and get answers for all the questions. Let's create a table and insert some records.

USE tempdb;
GO

-- DROP TABLE dbo.Person
-- Create the table
CREATE TABLE dbo.Person
(
 PersonId int identity(1,1) primary key
 , FirstName varchar(50) not null
 , LastName varchar(50) not null index IX_Person_LastName nonclustered
)

-- Insert 1000 records from Person.Person table
INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (1000) FirstName, LastName 
FROM AdventureWorks2014.Person.Person

-- Query for one person named Abel and see
-- Enable Execution Plan too.
SET STATISTICS IO ON
SELECT * 
FROM dbo.Person
WHERE LastName = 'Abel'


As you see, it returns only one record and it uses the non-clustered index created for seeing the record. It is the best plan because SQL Server knows that there is only one record for Abel and Seek is the best operation for finding the record. How does it know? Statistics, see below code and the output. It says that there is only one record for Abel.

DBCC SHOW_STATISTICS ('dbo.Person', 'IX_Person_LastName')


Let's add some more records. But this time, I will be adding 699 records (will tell you the reason later) with the same Last Name : Abel.

INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (699) FirstName, 'Abel'  
FROM AdventureWorks2014.Person.Person

Since there are many records for Abel now, for a query like above will be performed well with Scan operation instead of Seek but you will see that SQL Server still uses Seek instead of Scan.


Now you know the reason why it has not used Scan operation. As you see, still the statistics shows Estimated Number of Rows as 1.699 though there are 700 records. We can manually update statistics using UPDATE STATISTICS statement but let's try to understand why it has not been updated.

Generally, when the Auto Update Statistics is enabled, it will update statistics when it detects 500+20% of row changes. This algorithm gets changed with larger number of records but generally this is how it works. In our case, we have 1000 records and 20% of it is 200. Therefore it needs 200+500, which is 700 changes for updating statistics. Since we have inserted only 699, it does not fire statistics update.

Now you can either manually update or insert another record. Let's insert one more records and see how it works.

-- Can manually update
-- UPDATE STATISTICS dbo.Person IX_Person_LastName
-- or
-- Insert new record

INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (1) FirstName, 'Abel'  
FROM AdventureWorks2014.Person.Person

-- Check the query again
SET STATISTICS IO ON
SELECT * 
FROM dbo.Person
WHERE LastName = 'Abel'


Statistics return Estimated Number of Rows accurately, hence optimizer decides to use Scan instead of Seek. Note that, logical reads has gone down to 12 now since not much lookups.

This is the reason why experts recommend update statistics manually if a large number of record are changed by an operation.

You can read more info on statistics at: https://blogs.msdn.microsoft.com/srgolla/2012/09/04/sql-server-statistics-explained/

Thursday, September 1, 2016

SQL Server Lock Escalation and handling it at partition level

SQL Server locking mechanism makes sure that database is consistence during a transaction and it makes the database available for as much as concurrent users while a transaction is being run. Locks are acquired at both reading and writing. Locks acquired for reading, with default Isolation Level, are released as soon as the SELECT is completed but locks acquired for writing are held until the transaction is completed.

Lock Escalation
Locks are acquired at an appropriate level. It can be at key level, page level or entire table. The cost for maintaining locks is high when many locks are acquired for a transaction, hence SQL Server tries its best for minimizing the number of locks need to hold. It is done by converting fine-grained locks held by a transaction on a single resource to a single coarser-grained lock on the same resource. This is called Lock Escalation.

The Lock Escalation happens from either row or page locks to a table lock. It never happens from row locks to a page lock. With previous releases of SQL Server, the condition for deciding to escalate locks was hard-coded based on certain number of records that are locked at either row or page level. However, SQL Server 2016 uses multiple factors and one of them is memory consumption. In most cases, when lock manager consumes more than 40% of allocated memory, Lock Escalation can happen.

ALTER TABLE SET ESCALATION LEVEL
Whether the table is partition or not, the default behavior is to escalate to table level. The ALTER TABLE SET ESCALATION LEVEL that is used for controlling this, accepts three values;
  1. TABLE: The default value. This escalates to table level whether the table is partioned or not.
  2. AUTO: This is same as above but it escalates to partition level if the table is partitioned.
  3. DISABLE: This prevents lock escalation but it does not guarantee.
Disabling at session level
If you need to stop escalation at session level, use can enable Trace Flag 1224 for the session. This stop Lock Escalation but it still can occur due to the memory pressure.
However, if you need to completely stop it, you can use 1211 instead of 1224. You need to be cautious on this as it might degrade the performance.

The following code shows how SQL Server escalates locks with different settings. The table InternetSales_Partition is partitioned by its OrderDateKey and default settings have not been changed.

BEGIN TRAN
 UPDATE dbo.InternetSales_NonPartition
  SET TaxAmt = TaxAmt * 10
 WHERE OrderDateKey BETWEEN 20130501 AND 20130831

--ROLLBACK TRAN

If we check the locks acquired using sp_lock;


As you see, entire table is locked with its default settings. Let's set it with AUTO option and see how it works.

ALTER TABLE dbo.InternetSales_Partition
SET (LOCK_ESCALATION = AUTO)

BEGIN TRAN
 UPDATE dbo.InternetSales_Partition
  SET TaxAmt = TaxAmt * 10
 WHERE OrderDateKey BETWEEN 20130501 AND 20130831

--ROLLBACK TRAN


As you see, now it is at the partition level, not at the table level. This increases the concurrency even though the Lock Escalation has occurred.

Monday, August 22, 2016

SQL Server Heap Structure shows many IO than expected

It is very rare to see a table with Heap Structure as it is not a common practice but we still find such tables with some implementations. When performing performance tuning on problematic tables or queries, the first check we have in the checklist is whether the query produces an odd number of IOs than it should show, hence, same is applied to a query written against a heap structure as well. What if it shows a significant higher number for IOs?

This can happen because of Forwarding Pointers. When a value of a column is updated that requires more space and if the Data Page has no space, the page is set with a new page or last page it finds and keeps a pointer in the original page. SQL Server does this because by doing like this, it does not need to update the RID (8byte – file number: page number: row number) of non-clustered indexes created. But if the table has many records and many similar updates, it creates many forwarding pointers and increases the IO for satisfying the request.

Let's check this with a simple code;

The following code creates a table and inserts set of records;

USE tempdb;
GO

CREATE TABLE dbo.TestTable
(
 Id int identity(1,1)
 , Name varchar(200) not null
);
GO

INSERT INTO dbo.TestTable
 (Name) VALUES ('Name');
GO 1000000

This code checks the space. As you see, the table consumes 21800KB and if we divide it by 8, we can get the approximate number of pages it needs to hold records, which is 2725. The last statement shows the pages read.

-- This shows the space used for data is 21800 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we devide by 8, it should show the approximate number of pages
-- table should used for maintaining records.
-- 21800 / 8 = 2725
-- And this proves that our calculation is correct

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


Now let's update the table by increasing the size of the second column. Since all pages are full with initial inserts, SQL Server moves updated records to new pages, adding pointers to the original location. If we check the same after the update, you will see that our calculation shows that it needs only 16630 pages but there are 489,382 page reads.

UPDATE dbo.TestTable
 SET Name = REPLICATE('N', 200)
WHERE Id % 2 = 0;

-- This shows the space used for data is 133040 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we devide by 8, it should show the approximate number of pages
-- table should used for maintaining records.
-- 133040 / 8 = 16630
-- But we do not see the same now

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


This will surely slow down the queries. How this has happened? It has happened because of forwarding pointers set. The following code shows you the number of forwarding pointers created by our update statements.


Have a look on the output, the 489,382 IOs is formed as Page Count + Forwarded Record Count which is 16630 + 472,752.

If you see a similar issue with your table, it is better to upgrade the table from Heap to Clustered structure. However, as a temporary measure, you can run the following code for fixing it.

ALTER TABLE dbo.TestTable REBUILD;

Sunday, August 21, 2016

SQL Server Brain Basher of the Week #051 - SQL Server 2016

SQL Server 2016 is released and many have started using it. Here is a simple question on SQL Server 2016;

True or False: SQL Server 2016 is available for both x32 and x64 architectures.

Although earlier versions of SQL Server supported both x32 and x64 architectures, SQL Server 2016 is supported on x64 processor only. It is no longer supported x86 processors (x32 architecture).