Monday, September 11, 2017

SQL Server - Merging Partitions - Does it physically move data from a data file to another?

Partitioning is a common practice when maintaining large number of records in tables regardless of the database type: whether the database is an OLTP database or a data warehouse based on dimensional modeling. Partitioning requires a Partition Function that describes boundary values and number of partitions required, and Partition Scheme that assigns File Groups to partitions.

Here are few posts I have made on partitioning;

Once a table is partitioned, based on the workloads, we add new boundary values for introducing new partitions to the table and we remove boundary values for combining partitions (or removing partitions). These two operations are done with two partition related functions: SPLIT and MERGE. Now the question is, when we remove a partition using the MERGE function, if partitions are distributed with different file groups (means different data files), does SQL Server moves data from one file to another? Do we really have to consider it?

Let's make a database and do a simple test. Let's create a database called AdventureWorksTest and add multiple file groups and data files.

  1. -- Creating the test database  
  2. USE master;  
  3. GO  
  4.   
  5. CREATE DATABASE AdventureWorksTest  
  6. GO  
  7.   
  8. -- Add multiple file group and files  
  9. -- files are distributed  
  10. -- Add filegroup 1  
  11. ALTER DATABASE AdventureWorksTest  
  12. ADD FILEGROUP FileGroup1;  
  13. GO  
  14. ALTER DATABASE AdventureWorksTest  
  15. ADD FILE   
  16. (  
  17.     NAME = AdventureWorksTest_Data_01,  
  18.     FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\AdventureWorksTest_Data_01.ndf',  
  19.     SIZE = 5MB,  
  20.     MAXSIZE = 100MB,  
  21.     FILEGROWTH = 5MB  
  22. )  
  23. TO FILEGROUP FileGroup1;  
  24.   
  25. -- Add filegroup 2  
  26. ALTER DATABASE AdventureWorksTest  
  27. ADD FILEGROUP FileGroup2;  
  28. GO  
  29. ALTER DATABASE AdventureWorksTest  
  30. ADD FILE   
  31. (  
  32.     NAME = AdventureWorksTest_Data_02,  
  33.     FILENAME = 'C:\Databases\AdventureWorksTest_Data_02.ndf',  
  34.     SIZE = 5MB,  
  35.     MAXSIZE = 100MB,  
  36.     FILEGROWTH = 5MB  
  37. )  
  38. TO FILEGROUP FileGroup2;  
  39.   
  40. -- Add filegroup 3  
  41. ALTER DATABASE AdventureWorksTest  
  42. ADD FILEGROUP FileGroup3;  
  43. GO  
  44. ALTER DATABASE AdventureWorksTest  
  45. ADD FILE   
  46. (  
  47.     NAME = AdventureWorksTest_Data_03,  
  48.     FILENAME = 'D:\Databases\AdventureWorksTest_Data_03.ndf',  
  49.     SIZE = 5MB,  
  50.     MAXSIZE = 100MB,  
  51.     FILEGROWTH = 5MB  
  52. )  
  53. TO FILEGROUP FileGroup3;  
  54.   
  55. -- Add filegroup 4  
  56. ALTER DATABASE AdventureWorksTest  
  57. ADD FILEGROUP FileGroup4;  
  58. GO  
  59. ALTER DATABASE AdventureWorksTest  
  60. ADD FILE   
  61. (  
  62.     NAME = AdventureWorksTest_Data_04,  
  63.     FILENAME = 'E:\Databases\AdventureWorksTest_Data_04.ndf',  
  64.     SIZE = 5MB,  
  65.     MAXSIZE = 100MB,  
  66.     FILEGROWTH = 5MB  
  67. )  
  68. TO FILEGROUP FileGroup4;  
  69. GO  

As you see, we have five file groups now.


Let's create a Partitioned Table and load it using AdventureWorks2014 Sales.SalesOrderHeader table. Note that, below code creates a Partition Function with 2014, 2015, 2016 and 2017 as boundary values. And it creates the Partition Scheme setting with multiple file groups.

  1. -- Creating parition function and scheme  
  2. USE AdventureWorksTest;  
  3. GO  
  4.   
  5. CREATE PARTITION FUNCTION pf_Year (datetime)   
  6. AS RANGE right  
  7. FOR VALUES ('2014-01-01 00:00:00.000''2015-01-01 00:00:00.000',  '2016-01-01 00:00:00.000',  '2017-01-01 00:00:00.000');  
  8. GO  
  9.   
  10. CREATE PARTITION SCHEME ps_Year   
  11. AS PARTITION pf_Year   
  12. TO ([PRIMARY], FileGroup1, FileGroup2, FileGroup3, FileGroup4);  
  13. GO  
  14.   
  15.   
  16.   
  17. -- Creating a partitioned table and loading from AdventureWorks2014 database  
  18. -- Note that sales dates of AdventureWorks2014 database have been updated for year 2014 to 2017  
  19. CREATE TABLE Sales  
  20. (  
  21.  SalesOrderID int NOT NULL,  
  22.  RevisionNumber tinyint NOT NULL DEFAULT (0),  
  23.  OrderDate datetime NOT NULL DEFAULT GETDATE(),  
  24.  DueDate datetime NOT NULL,  
  25.  ShipDate datetime NULL,  
  26.  [Status] tinyint NOT NULL DEFAULT(1),  
  27.  SubTotal money NOT NULL DEFAULT(0.00),  
  28.  TaxAmt money NOT NULL DEFAULT (0.00),  
  29.  Freight money NOT NULL DEFAULT (0.00),  
  30.  TotalDue money,  
  31.  Comment nvarchar(128) NULL,  
  32.  rowguid uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),  
  33.  ModifiedDate datetime NOT NULL DEFAULT GETDATE()  
  34. )   
  35. ON ps_Year(OrderDate);  
  36. GO  
  37.   
  38. INSERT INTO Sales  
  39.   (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]  
  40.   , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate)  
  41. SELECT SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]  
  42.   , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate  
  43. FROM AdventureWorks2014.Sales.SalesOrderHeader;  
  44.   
  45. -- Checking records;  
  46. SELECT  $PARTITION.pf_Year(OrderDate) PartitionNo, *  
  47. FROM Sales;  

Here is the result of the query.


Here is another query to see how data is distributed with each file group.

  1. -- Get the record count for each partition/file group  
  2. SELECT t.name AS TableName,  
  3.   p.partition_number PartitionNumber,  
  4.   COALESCE(f.name, d.name) FileGroupName,   
  5.   SUM(p.rows) NumberOfRecords,  
  6.   SUM(a.total_pages) NumberOfDataPages  
  7. FROM sys.tables AS t  
  8.  INNER JOIN sys.indexes i   
  9.   ON i.object_id = t.object_id  
  10.  INNER JOIN sys.partitions p   
  11.   ON p.object_id = t.object_id   
  12.    AND p.index_id = i.index_id  
  13.  INNER JOIN sys.allocation_units a   
  14.   ON a.container_id = p.partition_id  
  15.  LEFT OUTER JOIN sys.filegroups f   
  16.   ON f.data_space_id = i.data_space_id  
  17.  LEFT OUTER JOIN sys.destination_data_spaces AS dds   
  18.   ON dds.partition_scheme_id = i.data_space_id   
  19.    AND dds.destination_id = p.partition_number  
  20.  LEFT OUTER JOIN sys.filegroups d   
  21.   ON d.data_space_id = dds.data_space_id  
  22. WHERE t.[type] = 'U' AND i.index_id IN (0, 1) AND t.name LIKE 'Sales'  
  23. GROUP BY t.name, p.partition_number, COALESCE(f.name, d.name)   
  24. ORDER BY t.name, p.partition_number;  


If I illustrate the same in different manner;


This is how the data is distributed with my system now. If I try to delete a data file, example, AdventureWorksTest_Data_03 that is belong to FileGroup3, SQL Server will not allow me to do as it holds data.


If I decide to combine 2015 and 2016, in other words, if I want to remove the 4th partition that holds 2016 data, what will happen. Let's do and see. The below codes merge the partition;

  1. ALTER PARTITION FUNCTION pf_Year()  
  2. MERGE RANGE ('2016-01-01 00:00:00.000');  

Once delete, let me run the same query to see how data is partitioned now.


As you see, FileGroup3 is no longer used by the table and number of records in the FileGroup2 has been increased. The partition which was 5 is now 4. This means, data that was maintained with FileGroup3 (or AdventureWorksTest_Data_03) has been moved to FileGroup2 (or AdventureWorksTest_Data_02). This is what has happened;


This clearly shows that data is moved when partitions are merged. Okay, do we have to really consider about this? Yes, we have to, when this is a small table, we do not see any performance issue but if the table is large, moving data from one file to another will surely take time, hence, think twice before merging, specifically when they are distributed in multiple files.

** Now, if you need, you can delete the AdventureWorksTest_Data_03 file and SQL Server will allow you to delete as it is empty.

No comments: