Showing posts with label data pages. Show all posts
Showing posts with label data pages. Show all posts

Wednesday, April 12, 2017

How to find the related file and the page numbers of records in SQL Server data table

Sometime, for some administration works, we need to know which file has been used for holding our records and the related page numbers. How can we easily find these information?

Generally we use DBCC IND and DBCC PAGE but there are two more great functions that can be used for finding the same.

The first function is %%PHYSLOC%%. This returns the RID (Record Identifier) as a hexadecimal value. The RID consists file number, page number and, record number. The second function is a table-valued function which is fn_PhysLocCracker. It accepts the RID returning from %%PHYSLOC%% and returns three-columned table for file number, page number and, record number. Combing these two, we can get the information as we want. Here is an example;

SELECT p.BusinessEntityID, p.Title
    , p.FirstName, p.LastName
    , ph.*
FROM [Person].[Person] p
CROSS APPLY fn_PhysLocCracker(%%PHYSLOC%%) ph



Saturday, April 23, 2016

How SQL Server handles our requests for modifications

This is not something new with latest Microsoft SQL Server but this is still being discussed and it is unknown or unclear to many of database developers. While I was discussing transaction log of SQL Server database with few, as a part of it, how SQL Server accepts our requests and modifies records was discussed. Thought it is useful to everyone, hence making a post on it;

What really happens when we send a request to SQL Server? It can be an update or a delete. Request might be related to one record or many records. Have a look on below image;


This starts with the request. Either using an application or connecting directly to SQL Server using something like SSMS, we send the request. Once SQL Server received the request, it checks data pages related to the records. If data pages required are not in the memory (or buffer cache), it loads relevant data pages from the disk to memory. Then, remember, it modifies records in pages that are in the memory, not pages in the disk. That is what 1st and 2nd steps in the image explains.

Third step is, updating the transaction log in the disk. Once the page (or pages) in the memory are modified, they become dirty pages. Then SQL Server writes redo and undo information to the log file. During this update, pages related are locked until the transaction log is completely updated.

Once the log is updated, the acknowledgement is sent to the application. Note that, even though the data files are not updated, we receive a message saying records are successfully updated. But we do not want worry, even something happen after we received the message, SQL Server can recover committed records, making sure durability which is one of the properties of the transaction, is satisfied with SQL Server.

Later, after one ore more transactions, a process called Checkpoint writes all dirty pages back to the disk, updating data files. This is how SQL Server handles our update requests.

Friday, December 18, 2015

SET STATISTICS IO ON : Can you trust the result of this when using scalar functions?

One key thing we look at with the performance tuning on TSQL is, number of data pages read for retrieving data either from memory or disk. We always trust the SET STATISTICS IO ON and take decisions based on the result of it. But this does NOT always give the accurate value and it might mislead you for taking a wrong action on it.

Let's check this and see. The following code creates a scalar function that returns number of products related to the order accepting the SalesOrderId. This reads SalesOrderDetail table for getting the count.

USE AdventureWorks2014_New;
GO

CREATE FUNCTION dbo.GetNofProductsForOrder (@SalesOrderID int)
RETURNS smallint
AS
BEGIN

 DECLARE @NofProducts smallint
 SELECT @NofProducts = COUNT(*) 
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID = @SalesOrderID

 RETURN @NofProducts;
END;

Before using the function, let's see how many pages need to be read for getting data from both SalesOrderHeader and SalesOrderDetail tables. As you see, I use SET STATISTICS IO ON for this.

SET STATISTICS IO ON

SELECT SalesOrderID, SalesOrderNumber, SubTotal
FROM Sales.SalesOrderHeader;
SELECT SalesOrderID, SalesOrderDetailID, LineTotal
FROM Sales.SalesOrderDetail;



Now let's use the function. This code shows some order details along with number of products related to each order. Note the result related to SET STATISTICS IO ON.

SET STATISTICS IO ON

SELECT SalesOrderID, SalesOrderNumber, SubTotal
 , dbo.GetNofProductsForOrder(SalesOrderID) As NofProducts
FROM Sales.SalesOrderHeader;


Since we access both tables, number of pages to be read should definitely be more than 689 but it looks like this shows only pages related to SalesOrderHeader. How this can be happened? I do not have an answer now but will surely find the reason. Now the question is, how can we see the accurate value for this? Is it possible to see it?

Yea it is possible. It can be clearly seen with Profiler. If you analyze SQL:BatchCompleted and SQL:StmtCompleted, you can get the correct value. Here is the accurate value related to the query using Profiler.


Just be aware on this, this can happen statement like TOP too. In a way, it is always better to use Profiler for seeing pages read but still this can be used for small queries with no functions.

Sunday, December 13, 2015

SELECT * and SELECT {columns}: Do they always read same number of data pages?

Everyone knows that SELECT * is not a best practice and we always try to avoid it. Some considerable issues related to it are;

  • Unnecessary columns will be transferred from server to client consuming resources.
  • Application written may face issues when dropping or introducing columns.
However, when reading the table by SQL Server engine, in most cases, no difference in terms of performance. The reason is, it has to read all data pages for getting records regardless of the number of columns referred in the statement. Have a look on following query and output of it.



As you see, both statements require same number of data pages to be read for satisfying the requests. However this is not always true, have a look on this;



Now it is different. The reason for this is, types of data pages used with this table. This table has an xml column and it is maintained not with data pages, with text/image pages. Because of this, when reading columns excluding columns like xml, max type columns, text, images, engine needs to read only data pages, reducing number of IO operations. This introduces the third reason for not using SELECT *, hence, do not use SELECT * when the table has columns that requires different data page types other than data pages.

Wednesday, November 25, 2015

How can I see the pages used by my table?

Not always, but there are instances that you need to see the data pages used by SQL Server for holding data in our tables. There are few ways of seeing them, the easiest way is, using two undocumented functions called sys.fn_PhysLocFormatter and sys.fn_PhysLocCracker. These two were initially released with SQL Server 2008 and they are still available. Not sure why they are still marked as undocumented but information throw from them is really useful.

Here is a test code on it;

USE AdventureWorksDW2014;
GO

-- Checking with fn_PhysLocFormatter function.
-- This shows file id, page id and slot id together.
SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS RID, * FROM dbo.FactInternetSales;


-- If you need file, page and slot separately, 
-- fn_PhysLocCracker function can be used.
SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS RID, RID_c.file_id, RID_c.page_id, RID_c.slot_id, * 
FROM dbo.FactInternetSales
 CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS RID_c
ORDER BY RID_c.file_id, RID_c.page_id, RID_c.slot_id;