Monday, September 11, 2017

SQL Server - Merging Partitions - Does it physically move data from a data file to another?

Partitioning is a common practice when maintaining large number of records in tables regardless of the database type: whether the database is an OLTP database or a data warehouse based on dimensional modeling. Partitioning requires a Partition Function that describes boundary values and number of partitions required, and Partition Scheme that assigns File Groups to partitions.

Here are few posts I have made on partitioning;

Once a table is partitioned, based on the workloads, we add new boundary values for introducing new partitions to the table and we remove boundary values for combining partitions (or removing partitions). These two operations are done with two partition related functions: SPLIT and MERGE. Now the question is, when we remove a partition using the MERGE function, if partitions are distributed with different file groups (means different data files), does SQL Server moves data from one file to another? Do we really have to consider it?

Let's make a database and do a simple test. Let's create a database called AdventureWorksTest and add multiple file groups and data files.

-- Creating the test database
USE master;
GO

CREATE DATABASE AdventureWorksTest
GO

-- Add multiple file group and files
-- files are distributed
-- Add filegroup 1
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup1;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_01,
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\AdventureWorksTest_Data_01.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup1;

-- Add filegroup 2
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup2;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_02,
    FILENAME = 'C:\Databases\AdventureWorksTest_Data_02.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup2;

-- Add filegroup 3
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup3;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_03,
    FILENAME = 'D:\Databases\AdventureWorksTest_Data_03.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup3;

-- Add filegroup 4
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup4;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_04,
    FILENAME = 'E:\Databases\AdventureWorksTest_Data_04.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup4;
GO

As you see, we have five file groups now.


Let's create a Partitioned Table and load it using AdventureWorks2014 Sales.SalesOrderHeader table. Note that, below code creates a Partition Function with 2014, 2015, 2016 and 2017 as boundary values. And it creates the Partition Scheme setting with multiple file groups.

-- Creating parition function and scheme
USE AdventureWorksTest;
GO

CREATE PARTITION FUNCTION pf_Year (datetime) 
AS RANGE right
FOR VALUES ('2014-01-01 00:00:00.000', '2015-01-01 00:00:00.000',  '2016-01-01 00:00:00.000',  '2017-01-01 00:00:00.000');
GO

CREATE PARTITION SCHEME ps_Year 
AS PARTITION pf_Year 
TO ([PRIMARY], FileGroup1, FileGroup2, FileGroup3, FileGroup4);
GO



-- Creating a partitioned table and loading from AdventureWorks2014 database
-- Note that sales dates of AdventureWorks2014 database have been updated for year 2014 to 2017
CREATE TABLE Sales
(
 SalesOrderID int NOT NULL,
 RevisionNumber tinyint NOT NULL DEFAULT (0),
 OrderDate datetime NOT NULL DEFAULT GETDATE(),
 DueDate datetime NOT NULL,
 ShipDate datetime NULL,
 [Status] tinyint NOT NULL DEFAULT(1),
 SubTotal money NOT NULL DEFAULT(0.00),
 TaxAmt money NOT NULL DEFAULT (0.00),
 Freight money NOT NULL DEFAULT (0.00),
 TotalDue money,
 Comment nvarchar(128) NULL,
 rowguid uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
 ModifiedDate datetime NOT NULL DEFAULT GETDATE()
) 
ON ps_Year(OrderDate);
GO

INSERT INTO Sales
  (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]
  , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate)
SELECT SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]
  , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate
FROM AdventureWorks2014.Sales.SalesOrderHeader;

-- Checking records;
SELECT  $PARTITION.pf_Year(OrderDate) PartitionNo, *
FROM Sales;

Here is the result of the query.


Here is another query to see how data is distributed with each file group.

-- Get the record count for each partition/file group
SELECT t.name AS TableName,
  p.partition_number PartitionNumber,
  COALESCE(f.name, d.name) FileGroupName, 
  SUM(p.rows) NumberOfRecords,
  SUM(a.total_pages) NumberOfDataPages
FROM sys.tables AS t
 INNER JOIN sys.indexes i 
  ON i.object_id = t.object_id
 INNER JOIN sys.partitions p 
  ON p.object_id = t.object_id 
   AND p.index_id = i.index_id
 INNER JOIN sys.allocation_units a 
  ON a.container_id = p.partition_id
 LEFT OUTER JOIN sys.filegroups f 
  ON f.data_space_id = i.data_space_id
 LEFT OUTER JOIN sys.destination_data_spaces AS dds 
  ON dds.partition_scheme_id = i.data_space_id 
   AND dds.destination_id = p.partition_number
 LEFT OUTER JOIN sys.filegroups d 
  ON d.data_space_id = dds.data_space_id
WHERE t.[type] = 'U' AND i.index_id IN (0, 1) AND t.name LIKE 'Sales'
GROUP BY t.name, p.partition_number, COALESCE(f.name, d.name) 
ORDER BY t.name, p.partition_number;


If I illustrate the same in different manner;


This is how the data is distributed with my system now. If I try to delete a data file, example, AdventureWorksTest_Data_03 that is belong to FileGroup3, SQL Server will not allow me to do as it holds data.


If I decide to combine 2015 and 2016, in other words, if I want to remove the 4th partition that holds 2016 data, what will happen. Let's do and see. The below codes merge the partition;

ALTER PARTITION FUNCTION pf_Year()
MERGE RANGE ('2016-01-01 00:00:00.000');

Once delete, let me run the same query to see how data is partitioned now.


As you see, FileGroup3 is no longer used by the table and number of records in the FileGroup2 has been increased. The partition which was 5 is now 4. This means, data that was maintained with FileGroup3 (or AdventureWorksTest_Data_03) has been moved to FileGroup2 (or AdventureWorksTest_Data_02). This is what has happened;


This clearly shows that data is moved when partitions are merged. Okay, do we have to really consider about this? Yes, we have to, when this is a small table, we do not see any performance issue but if the table is large, moving data from one file to another will surely take time, hence, think twice before merging, specifically when they are distributed in multiple files.

** Now, if you need, you can delete the AdventureWorksTest_Data_03 file and SQL Server will allow you to delete as it is empty.

No comments: