Monday, August 22, 2016

SQL Server Heap Structure Shows More IOs Than Expected

It is quite rare to see a table with a heap structure as it is not a common practice, but we still find such tables in some implementations. When performance tuning problematic tables or queries, the first check on the checklist is whether the query produces a higher number of IOs than it should; the same applies to a query written against a heap structure. What if it shows a significantly higher number of IOs?

This can happen because of Forwarding Pointers. When an update increases the size of a column value and the data page has no space for it, SQL Server moves the row to a page with free space (or a new page) and keeps a pointer in the original page. SQL Server does this because it then does not need to update the RID (8 bytes – file number : page number : row number) held by the non-clustered indexes created on the table. But if the table has many records and many similar updates, it creates many forwarding pointers, increasing the IO required for satisfying requests.

Let's check this with some simple code.

The following code creates a table and inserts a set of records;

USE tempdb;
GO

CREATE TABLE dbo.TestTable
(
 Id int identity(1,1)
 , Name varchar(200) not null
);
GO

INSERT INTO dbo.TestTable
 (Name) VALUES ('Name');
GO 1000000

This code checks the space used. As you see, the table consumes 21,800 KB, and if we divide that by 8 (KB per page), we get the approximate number of pages it needs to hold the records, which is 2,725. The last statement shows the pages read.

-- This shows the space used for data is 21800 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we divide by 8, it should show the approximate number of pages
-- the table should use for maintaining records.
-- 21800 / 8 = 2725
-- And this proves that our calculation is correct

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


Now let's update the table, increasing the size of the second column. Since all pages are full after the initial inserts, SQL Server moves the updated records to other pages, leaving pointers at the original locations. If we check again after the update, you will see that our calculation says the table needs only 16,630 pages, yet there are 489,382 page reads.

UPDATE dbo.TestTable
 SET Name = REPLICATE('N', 200)
WHERE Id % 2 = 0;

-- This shows the space used for data is 133040 KB
EXEC sp_spaceused N'dbo.TestTable', N'true';

-- If we divide by 8, it should show the approximate number of pages
-- the table should use for maintaining records.
-- 133040 / 8 = 16630
-- But we do not see the same now

SET STATISTICS IO ON
SELECT COUNT(*) FROM dbo.TestTable;
SET STATISTICS IO OFF


This will surely slow down queries. How has this happened? It has happened because of the forwarding pointers that were set. The following code shows the number of forwarding pointers created by our update statements.
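A sketch of such a check using sys.dm_db_index_physical_stats, which reports forwarded_record_count when run in DETAILED mode:

```sql
-- Check forwarding pointers on the heap; DETAILED mode is required
-- for forwarded_record_count to be populated.
SELECT index_type_desc, page_count, forwarded_record_count
FROM sys.dm_db_index_physical_stats
 (DB_ID(N'tempdb'), OBJECT_ID(N'dbo.TestTable'), NULL, NULL, 'DETAILED');
```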


Have a look at the output: the 489,382 IOs are formed as Page Count + Forwarded Record Count, which is 16,630 + 472,752.

If you see a similar issue with your table, it is better to convert the table from a heap to a clustered structure by creating a clustered index. However, as a temporary measure, you can run the following code to fix it.

ALTER TABLE dbo.TestTable REBUILD;
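For the permanent fix, a clustered index removes forwarding pointers entirely, since clustered tables do not use them (a sketch; the index name is illustrative):

```sql
-- Converting the heap to a clustered structure; the clustered index
-- defines the physical order, so no forwarding pointers are needed.
CREATE CLUSTERED INDEX IX_TestTable_Id ON dbo.TestTable (Id);
```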

Sunday, August 21, 2016

SQL Server Brain Basher of the Week #051 - SQL Server 2016

SQL Server 2016 has been released and many have started using it. Here is a simple question on SQL Server 2016;

True or False: SQL Server 2016 is available for both 32-bit (x86) and 64-bit (x64) architectures.

Although earlier versions of SQL Server supported both 32-bit and 64-bit architectures, SQL Server 2016 is supported on x64 processors only. It no longer supports x86 processors (32-bit architecture).

Friday, August 19, 2016

How to Recover Deleted Tables in SQL Database [Guest Post]

This post is written by: Priyanka Chouhan
Priyanka Chouhan is a technical writer at "Stellar Data Recovery" with 5 years of experience and has written several articles on SQL. She has fine knowledge of SharePoint and SQL Server. In her spare time she loves reading and gardening.

A deleted SQL Server table can be recovered using the transaction log and Log Sequence Numbers (LSNs). Transaction logs are responsible for logging database activities. Using the undocumented function "fn_dblog", you can find unauthorized deletions of data and recover the deleted contents.

If the estimated time and the affected records are known, it is not a difficult task to recover your data. If you are not aware of the date and time of the deletion, then you first need to detect the real cause of the deletion. The initial task is to find the exact log sequence number (LSN) under which the DELETE statement was executed; the data can then be recovered up to that LSN.

What is an LSN?
Each record in a SQL Server database transaction log is identified by a unique LSN, associated with the record at which a substantial event occurred. This can be helpful for building an accurate restore sequence. LSNs are used internally for tracking the point in time to which data is restored during a RESTORE sequence. While restoring a backup, the data is restored to the LSN associated with the point in time at which the data was recorded.
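The LSN of an offending DELETE can be located with the undocumented fn_dblog function mentioned earlier (a sketch; being undocumented, its output columns may vary between versions):

```sql
-- List log records for row deletions, with their LSNs and the owning
-- transaction; filter further by AllocUnitName to target one table.
SELECT [Current LSN], [Transaction ID], Operation, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE Operation = 'LOP_DELETE_ROWS';
```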

Restore Deleted SQL Database Tables with Log Sequence Number
Deleted SQL Server data can be restored easily by overwriting the original database using backup files. If the DROP TABLE statement was executed accidentally and a full database backup is available, this works provided no modifications were made after the table was dropped.

If this is not the case, deleted data can still be recovered even if:
  • the database objects have been modified after executing the DROP TABLE statement,
    or
  • a valid database backup is not available
Every DROP operation is logged to the transaction log in SQL Server. This implies that the transaction log saves the information required to roll back the transaction, provided the database is in the FULL recovery model, which supports point-in-time recovery up to the respective DROP TABLE statement.
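Such point-in-time recovery up to a known LSN can be sketched as below (paths, database name, and the LSN value are illustrative; note that STOPBEFOREMARK expects the LSN in decimal form, not the hex form shown by fn_dblog):

```sql
-- Restore the full backup without recovering, then roll the log
-- forward and stop just before the offending transaction's LSN.
RESTORE DATABASE SampleDB
 FROM DISK = N'C:\Backups\SampleDB.bak'
 WITH NORECOVERY, REPLACE;

RESTORE LOG SampleDB
 FROM DISK = N'C:\Backups\SampleDB.trn'
 WITH STOPBEFOREMARK = 'lsn:112000000150400001', RECOVERY;
```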

Since the DROP operation is logged to the SQL Server transaction log, it is possible to roll it back as long as it is wrapped within an active transaction.

DROP TABLE Customer1
SELECT * FROM Customer1

Since the table ‘Customer1’ has been dropped, the above SQL Query will result in “Msg 208, Level 16, State 1” error message.

Since the DROP operation is rolled back, the following SQL Query will return entire rows existing in the Customer1 table.

BEGIN TRAN 
DROP TABLE Customer1
ROLLBACK 
SELECT * FROM Customer1

Can Dropped Tables be restored? 

If the transaction containing the DROP operation is no longer active, the DROP of table Customer1 cannot be rolled back since it has been committed, and the data the table stored is deleted. The question that now arises is whether data deleted by a DROP operation can be recovered even when a full backup is not available. The answer is yes: SQL tables can be recovered using third-party software even in cases where a backup is not available.

Follow the steps below to recover SQL Server deleted table data and gain access to integrated attributes with the help of Stellar Phoenix SQL Database Repair:
  1. Download, install and launch the software.
  2. The home screen will appear with a message box stating that the SQL Server must be stopped before initializing the recovery process.
  3. In the next window, select the database (MDF file) in which the table has to be restored.
  4. Under Find Database column, provide the location of the folder to search the database.
  5. Alternatively, click the Find button if the database location is unknown.
  6. You can select the option to recover deleted records as well by checking the box which reads “Include Deleted Records”.
  7. Select the Repair button to initiate the procedure to recover the SQL Server deleted tables.
  8. The tool will display all the items of the database including tables, indexes, Views, Triggers, Rules, Functions, etc. on the left panel.
  9. You can check the boxes available against each item to recover the specific item. In this case, you need to select Tables.
  10. Next, a preview of all the details incorporated in the selected table, including deleted records, will be displayed on the right panel of the tool.
  11. The Electronic Registration wizard will open. Enter the Serial Number delivered to your Email ID.
  12. The tool will provide the option to save the recovered database into four file formats, namely, MSSQL, CSV, HTML, and XLS.
  13. Next, click on the Browse button to provide the destination path for storing the recovered MDF file.
  14. Next, click on OK to proceed further with the process to restore the deleted SQL Server table.
  15. You can optionally select to open the recovered database by checking the box “Automatically open the destination folder after saving”.
  16. Then, click on OK.
  17. Select the Saving option to restore the recovered tables from the MDF file either from: 
    1. New Database: Selecting this option restores recovered data to new database, or
    2. Live Database: Selecting this option restores recovered data to existing database

      Note: If you select to store recovered data to the existing database, you need to provide the SQL Server Database name and other credentials.
  18. Next, a message box will appear stating that the recovered data has been restored to the selected format.
You can now view the newly stored database or MDF file to access the recovered tables.





Thursday, August 18, 2016

NoSQL, Not only SQL, NewSQL: Are they same?

Last year, I wrote a post on NoSQL Document Databases: What is NoSQL Document Database? What is Azure DocumentDB?, which speaks about Document Databases and how they store data. Although the post explains NoSQL as well, it does not talk about other terms like Not Only SQL or NewSQL, hence I thought to make a post on them.

NoSQL in simple term
We talk about NoSQL when we talk about a different type of database (or storage). In most cases, our projects are done with relational databases, but certain scenarios need a different structure to hold data, specifically when the data to be held is semi-structured or unstructured. As I mentioned in the previous post, it is not a new database type; it is as old as relational databases. In simple terms, a NoSQL database is a non-relational database that allows us to store mainly large amounts of semi-structured data.

NoSQL has become prevalent in recent years because;
  • Previously-ignored Big Data has become most important.
    Generally, applications ignore some data not because it is unimportant but because it is difficult to process with traditional DBMSs. Large streams of data and semi-structured or unstructured data coming from various equipment are some examples. With modern technologies and techniques, it has become easy to process such data, hence previously-ignored data needs to be captured and stored. This is one of the reasons NoSQL databases have become popular.
  • Less-expensive scaled-out solutions
    Processing large datasets, specifically data without a proper structure, needs a good infrastructure solution. Initially, the required infrastructure was very expensive, but now, with powerful commodity hardware, virtualization technologies, and cloud services, the cost has gone down considerably. Not only that, since some services can be taken on demand on a pay-as-you-go basis, using them has become easy and affordable. This has definitely become another factor in the higher usage of NoSQL.
Not Only SQL
Although NoSQL databases are non-relational, some of these databases now support SQL as well, hence they are sometimes referred to as Not Only SQL.

NewSQL
NewSQL represents relational databases that offer NoSQL capabilities: similar scalability, along with the ability to work with semi-structured and unstructured data as we do with NoSQL databases.

You can see the types of NoSQL databases available at this link: http://nosql-database.org/



Sunday, August 7, 2016

SQL Server Brain Basher of the Week #050 - SSRS Client Printing

Here is the weekly question on SQL Server for improving your knowledge for facing interviews. This time, let's talk about an area of Reporting Services.

SQL Server Reporting Services is a great platform for all types of reporting solutions, and its Web Portal (formerly Report Manager) is one of the tools for managing deployed reports. The Web Portal allows you to view and manage published reports, and it allows you to print viewed reports like below;


Now the question is: How can you disable the printing facility from the portal?

If you have a requirement like that, it can be done with the Reporting Services properties using Management Studio. Simply connect to Reporting Services using Management Studio and get its properties by right-clicking the instance. There is a property called EnableClientPrinting that controls this functionality; you can set it to False to disable client printing.

Once disabled, the Print icon will no longer appear.



Saturday, August 6, 2016

The shared dataset definition is not valid. Details: The required attribute 'Name' is missing

Are you experiencing this error when trying to deploy a shared dataset? 


This has been mentioned in several forums and is considered a bug. I searched for a permanent solution but could not find one, therefore this is what you have to do if you experience the same;

    1. Open the project folder and locate the Shared DataSet.


    2. Open the shared dataset using a Text Editor and add the Name Attribute as below.


    3. Now deploy and see. You should be able to do it.
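For reference, the Name attribute described in step 2 goes on the root element of the .rsd file; a minimal sketch is below (the namespace version and dataset name are illustrative and may differ in your project):

```xml
<?xml version="1.0" encoding="utf-8"?>
<SharedDataSet xmlns="http://schemas.microsoft.com/sqlserver/reporting/2010/01/shareddatasetdefinition"
               Name="MySharedDataSet">
  <DataSet Name="">
    <!-- Query and fields as generated by the designer -->
  </DataSet>
</SharedDataSet>
```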

Thursday, August 4, 2016

Generate aggregates for different levels in the same query without using subqueries

While analyzing a query written for a report, I noticed it had been written to get aggregated values on the same column but at different levels. It had been written using subqueries and was taking an unexpectedly long time to load data. I had to analyze and optimize it, and this is what I did.

It was something similar to the query below;

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

SELECT YEAR(OrderDate) SalesYear
 , DATEPART(q, OrderDate) SalesQuarter
 , MONTH(OrderDate) SalesMonth
 , SUM(ol1.UnitPrice * ol1.Quantity) MonthlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) 
   AND  DATEPART(q, o2.OrderDate) =  DATEPART(q, o1.OrderDate)) AS QuarterlySalesAmount
 , (SELECT SUM(ol2.UnitPrice * ol2.Quantity) 
  FROM Sales.Orders o2
  INNER JOIN Sales.OrderLines ol2
   ON o2.OrderID = ol2.OrderID
  WHERE YEAR(o2.OrderDate) = YEAR(o1.OrderDate) ) AS YearlySalesAmount
FROM Sales.Orders o1
 INNER JOIN Sales.OrderLines ol1
  ON o1.OrderID = ol1.OrderID
GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
ORDER BY 1, 2, 3;

You get a result like the below image. As you see, it shows Year, Quarter, Month and then the sales for the Month, Quarter and Year.


I cleaned up the cache with the following code and executed the query to see some statistics. It takes 4 seconds, and it has read more than 14,000 pages from the Orders table and more than 91,000 pages from the OrderLines table.

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;


The quickest solution that came to my mind was a CTE, hence I applied it to the query as below.

USE WideWorldImporters;
GO

SET STATISTICS IO ON;

WITH YearlySales (SalesYear, YearlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate)
),
QuarterlySales (SalesYear, SalesQuarter, QuarterlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate)
),
MonthlySales (SalesYear, SalesQuarter, SalesMonth, MonthlySalesAmount)
AS
(
 SELECT YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate), SUM(UnitPrice * Quantity)
 FROM Sales.Orders o
 INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(OrderDate), DATEPART(q, OrderDate), MONTH(OrderDate)
)
SELECT y.SalesYear, q.SalesQuarter, m.SalesMonth, m.MonthlySalesAmount, q.QuarterlySalesAmount, y.YearlySalesAmount
FROM YearlySales y
 INNER JOIN QuarterlySales q
  ON y.SalesYear = q.SalesYear
 INNER JOIN MonthlySales m
  ON q.SalesYear = m.SalesYear AND q.SalesQuarter = m.SalesQuarter
ORDER BY y.SalesYear, q.SalesQuarter, m.SalesMonth;

And when I tested the performance in the same way, as you see, it takes only 1 second and a very small number of pages are read.


A simple lesson we can learn from this: avoid subqueries as much as possible, specifically correlated subqueries, because their cost is very high. In addition, we should use modern constructs that get the results efficiently.
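One such modern construct is window aggregates: compute the monthly totals once, then derive the quarter and year totals with SUM() OVER instead of re-scanning the tables per level (a sketch against the same WideWorldImporters tables):

```sql
-- Aggregate once per month, then roll up with window functions
-- rather than with additional scans of Orders and OrderLines.
WITH MonthlySales AS
(
 SELECT YEAR(o.OrderDate) SalesYear
  , DATEPART(q, o.OrderDate) SalesQuarter
  , MONTH(o.OrderDate) SalesMonth
  , SUM(ol.UnitPrice * ol.Quantity) MonthlySalesAmount
 FROM Sales.Orders o
  INNER JOIN Sales.OrderLines ol
   ON o.OrderID = ol.OrderID
 GROUP BY YEAR(o.OrderDate), DATEPART(q, o.OrderDate), MONTH(o.OrderDate)
)
SELECT SalesYear, SalesQuarter, SalesMonth, MonthlySalesAmount
 , SUM(MonthlySalesAmount) OVER (PARTITION BY SalesYear, SalesQuarter) QuarterlySalesAmount
 , SUM(MonthlySalesAmount) OVER (PARTITION BY SalesYear) YearlySalesAmount
FROM MonthlySales
ORDER BY SalesYear, SalesQuarter, SalesMonth;
```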

Wednesday, August 3, 2016

Arithmetic overflow error converting expression to data type int - It works for larger number but not for a smaller number

One of my colleagues sent me a simple piece of code with two lines. The first line works without any issue but the second throws an error. This is the code;

SELECT 9*10000000000000000
SELECT 9*1000000000

This is the output;


What is the issue? The first calculation is much larger than the second, yet it works. The reason is how SQL Server evaluates the expression and decides the data types. Consider the second line: the largest value in the expression is 1,000,000,000, which is lower than the highest value int can hold, 2,147,483,647. Since both 9 and 1,000,000,000 fall into the int range, the data type of the return value is set to int as well. However, when 1,000,000,000 is multiplied by 9, the result exceeds that range. That is the reason for the error.

With the first line, 10,000,000,000,000,000 is treated as a bigint literal, and the result takes the same type. Therefore it works without any issue.

If we write the query as below, it will work without any issue.

DECLARE @a bigint
SET @A = 10000000000000000
SELECT  9*@A

SET @A = 1000000000
SELECT  9*@A
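Alternatively, an explicit conversion on either operand promotes the whole expression to bigint without needing a variable (a sketch):

```sql
-- Forcing bigint arithmetic; once one operand is bigint, the int
-- literal is implicitly converted before the multiplication.
SELECT 9 * CAST(1000000000 AS bigint);
```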

Tuesday, August 2, 2016

SQL Server Variable Assignment - SET or SELECT?

Declaring variables and assigning values in different ways is something we see very commonly in modules like stored procedures and functions. A value can be assigned at declaration, or after declaration using either SET or SELECT. A question was raised about which is best and which gives better performance when assigning a value: SET or SELECT. This is what I explained;

Let's see the differences one by one;
1. Only one variable can be assigned a value with a single SET statement, whereas multiple variables can be assigned values with a single SELECT statement.

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Two SET statements for assigning values
SET @Variable01 = 100; SET @Variable02 = 200; 
SELECT @Variable01, @Variable02;

-- Assigning values using a single SELECT
SELECT @Variable01 = 300, @Variable02 = 300;
SELECT @Variable01, @Variable02;
GO


2. Both SET and SELECT support assigning values using a query. Just like in number (1), SELECT can be used for setting multiple variables.

USE WideWorldImporters;
GO

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Assigning a value using a query, this works as expected
SET @Variable01 = (SELECT COUNT(*) FROM Sales.Customers); 
SELECT @Variable01;

-- Assigning values using a query, this works as expected
SELECT @Variable01 = COUNT(*), @Variable02 = AVG(CustomerID) FROM Sales.Customers

SELECT @Variable01, @Variable02;
GO


3. When assigning values using a query, if the query returns more than one record, SET raises an error whereas SELECT does not; the variable ends up with the value from the last row returned.

USE WideWorldImporters;
GO

-- Declaring variables without assigning values
DECLARE @Variable01 int;
DECLARE @Variable02 int;

-- Assigning a value using a query, this query returns more than one value
-- , hence SET throws an error
SET @Variable01 = (SELECT CustomerID FROM Sales.Customers); 
SELECT @Variable01;

-- Assigning a value using a query, this query returns more than one value
-- , but SELECT takes the value from the last row without throwing an error
SELECT @Variable01 = CustomerID FROM Sales.Customers

SELECT @Variable01, @Variable02;
GO


4. When assigning values using a query, if the query returns no record, the variable is set to NULL with the SET statement, but the SELECT statement keeps the old value unchanged.

USE WideWorldImporters;
GO

-- Declaring variables, assigning values
DECLARE @Variable01 int = 0;
DECLARE @Variable02 int = 0;

-- Assigning a value using a query
-- This query does not return any record, hence variable becomes NULL
SET @Variable01 = (SELECT CustomerID FROM Sales.Customers WHERE BuyingGroupID = 100 ); 

-- Assigning a value using a query
-- This query does not return any record, but initial value will not be replaced with a NULL
SELECT @Variable02 = CustomerID FROM Sales.Customers WHERE BuyingGroupID = 100;

SELECT @Variable01, @Variable02;

5. One more point: remember that SET is the ANSI standard for assigning variables, but SELECT is not.