Showing posts with label Partitioning. Show all posts

Monday, September 11, 2017

SQL Server - Merging Partitions - Does it physically move data from a data file to another?

Partitioning is a common practice for maintaining a large number of records in tables, regardless of the database type: whether the database is an OLTP database or a data warehouse based on dimensional modeling. Partitioning requires a Partition Function that describes the boundary values and the number of partitions required, and a Partition Scheme that assigns file groups to partitions.

Here are a few posts I have made on partitioning;

Once a table is partitioned, based on the workload, we add new boundary values to introduce new partitions to the table, and we remove boundary values to combine (or remove) partitions. These two operations are done with two partition-related functions: SPLIT and MERGE. Now the question is: when we remove a partition using the MERGE function, and the partitions are distributed across different file groups (that is, different data files), does SQL Server move data from one file to another? Do we really have to consider this?
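As a quick reminder, the two operations look roughly like this (the function and scheme names here are placeholders, not from the test below):

```sql
-- Introduce a new partition: first tell the scheme which file group
-- the next partition should use, then split the function
ALTER PARTITION SCHEME ps_Example NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_Example() SPLIT RANGE ('2018-01-01');

-- Remove a partition by removing one of the boundary values
ALTER PARTITION FUNCTION pf_Example() MERGE RANGE ('2016-01-01');
```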

Let's create a database and do a simple test. We will create a database called AdventureWorksTest and add multiple file groups and data files.

-- Creating the test database
USE master;
GO

CREATE DATABASE AdventureWorksTest
GO

-- Add multiple file group and files
-- files are distributed
-- Add filegroup 1
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup1;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_01,
    FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\AdventureWorksTest_Data_01.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup1;

-- Add filegroup 2
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup2;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_02,
    FILENAME = 'C:\Databases\AdventureWorksTest_Data_02.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup2;

-- Add filegroup 3
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup3;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_03,
    FILENAME = 'D:\Databases\AdventureWorksTest_Data_03.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup3;

-- Add filegroup 4
ALTER DATABASE AdventureWorksTest
ADD FILEGROUP FileGroup4;
GO
ALTER DATABASE AdventureWorksTest
ADD FILE 
(
    NAME = AdventureWorksTest_Data_04,
    FILENAME = 'E:\Databases\AdventureWorksTest_Data_04.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB
)
TO FILEGROUP FileGroup4;
GO

As you see, we have five file groups now: PRIMARY plus the four just added.
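If you want to confirm the file groups and the files that belong to them, a quick catalog query like this (run inside the new database) shows them:

```sql
USE AdventureWorksTest;
GO

-- List file groups and the data files assigned to each
SELECT fg.name AS FileGroupName,
  df.name AS LogicalFileName,
  df.physical_name AS PhysicalFileName
FROM sys.filegroups fg
 LEFT OUTER JOIN sys.database_files df
  ON df.data_space_id = fg.data_space_id
ORDER BY fg.name;
```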


Let's create a partitioned table and load it using the AdventureWorks2014 Sales.SalesOrderHeader table. Note that the code below creates a Partition Function with 2014, 2015, 2016 and 2017 as boundary values, and it creates the Partition Scheme mapping the partitions to multiple file groups.

-- Creating partition function and scheme
USE AdventureWorksTest;
GO

CREATE PARTITION FUNCTION pf_Year (datetime) 
AS RANGE RIGHT
FOR VALUES ('2014-01-01 00:00:00.000', '2015-01-01 00:00:00.000',  '2016-01-01 00:00:00.000',  '2017-01-01 00:00:00.000');
GO

CREATE PARTITION SCHEME ps_Year 
AS PARTITION pf_Year 
TO ([PRIMARY], FileGroup1, FileGroup2, FileGroup3, FileGroup4);
GO



-- Creating a partitioned table and loading from AdventureWorks2014 database
-- Note that sales dates of AdventureWorks2014 database have been updated for year 2014 to 2017
CREATE TABLE Sales
(
 SalesOrderID int NOT NULL,
 RevisionNumber tinyint NOT NULL DEFAULT (0),
 OrderDate datetime NOT NULL DEFAULT GETDATE(),
 DueDate datetime NOT NULL,
 ShipDate datetime NULL,
 [Status] tinyint NOT NULL DEFAULT(1),
 SubTotal money NOT NULL DEFAULT(0.00),
 TaxAmt money NOT NULL DEFAULT (0.00),
 Freight money NOT NULL DEFAULT (0.00),
 TotalDue money,
 Comment nvarchar(128) NULL,
 rowguid uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
 ModifiedDate datetime NOT NULL DEFAULT GETDATE()
) 
ON ps_Year(OrderDate);
GO

INSERT INTO Sales
  (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]
  , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate)
SELECT SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status]
  , SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate
FROM AdventureWorks2014.Sales.SalesOrderHeader;

-- Checking records;
SELECT  $PARTITION.pf_Year(OrderDate) PartitionNo, *
FROM Sales;

Here is the result of the query.


Here is another query to see how data is distributed with each file group.

-- Get the record count for each partition/file group
SELECT t.name AS TableName,
  p.partition_number PartitionNumber,
  COALESCE(f.name, d.name) FileGroupName, 
  SUM(p.rows) NumberOfRecords,
  SUM(a.total_pages) NumberOfDataPages
FROM sys.tables AS t
 INNER JOIN sys.indexes i 
  ON i.object_id = t.object_id
 INNER JOIN sys.partitions p 
  ON p.object_id = t.object_id 
   AND p.index_id = i.index_id
 INNER JOIN sys.allocation_units a 
  ON a.container_id = p.partition_id
 LEFT OUTER JOIN sys.filegroups f 
  ON f.data_space_id = i.data_space_id
 LEFT OUTER JOIN sys.destination_data_spaces AS dds 
  ON dds.partition_scheme_id = i.data_space_id 
   AND dds.destination_id = p.partition_number
 LEFT OUTER JOIN sys.filegroups d 
  ON d.data_space_id = dds.data_space_id
WHERE t.[type] = 'U' AND i.index_id IN (0, 1) AND t.name LIKE 'Sales'
GROUP BY t.name, p.partition_number, COALESCE(f.name, d.name) 
ORDER BY t.name, p.partition_number;


If I illustrate the same in a different manner;


This is how the data is distributed in my system now. If I try to delete a data file, for example AdventureWorksTest_Data_03, which belongs to FileGroup3, SQL Server will not allow it because the file holds data.


If I decide to combine 2015 and 2016, in other words, if I want to remove the 4th partition that holds 2016 data, what will happen? Let's try it and see. The code below merges the partitions;

ALTER PARTITION FUNCTION pf_Year()
MERGE RANGE ('2016-01-01 00:00:00.000');

Once merged, let's run the same query to see how the data is partitioned now.


As you see, FileGroup3 is no longer used by the table and the number of records in FileGroup2 has increased. The table that had 5 partitions now has 4. This means the data that was maintained in FileGroup3 (or AdventureWorksTest_Data_03) has been moved to FileGroup2 (or AdventureWorksTest_Data_02). This is what has happened;


This clearly shows that data is moved when partitions are merged. Okay, do we really have to consider this? Yes, we do. With a small table we do not see any performance issue, but if the table is large, moving data from one file to another will surely take time. Hence, think twice before merging, specifically when partitions are distributed across multiple files.

** Now, if you need to, you can delete the AdventureWorksTest_Data_03 file, and SQL Server will allow it as the file is empty.
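The removal itself would look like this; a sketch, to be run only after confirming the file no longer holds data:

```sql
USE master;
GO

-- Remove the now-empty data file
ALTER DATABASE AdventureWorksTest
REMOVE FILE AdventureWorksTest_Data_03;
GO

-- The file group can be removed too, once it contains no files
ALTER DATABASE AdventureWorksTest
REMOVE FILEGROUP FileGroup3;
GO
```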

Sunday, December 18, 2016

Moving a table with millions of records to another within a second - SQL Server Brain Basher of the Week #061

This is a very common interview question and many come up with different solutions, but most forget some easy ways of doing it. Sometimes we need to move data from one table to another (note: moving, not copying), but depending on the method we use and the number of records, it can take anywhere from a few seconds to many hours.

However, if the structure of the second table is the same as the first and all records need to be moved, what is the best way?

Here is the solution. I have a table called dbo.InternetSales and it has 6 million records. Let's see how long it takes to move the entire record set from this table to a table called dbo.InternetSales2 that has the same structure.

See the below code;

SELECT COUNT(*) InternetSalesCount FROM dbo.InternetSales;
SELECT COUNT(*) InternetSales2Count FROM dbo.InternetSales2;

ALTER TABLE dbo.InternetSales SWITCH TO dbo.InternetSales2;

SELECT COUNT(*) InternetSalesCount FROM dbo.InternetSales;
SELECT COUNT(*) InternetSales2Count FROM dbo.InternetSales2;


As you see, all records have been transferred from the first table to the second table within a second :).

How does it work? This is actually called Partition Switching. Read more on it in this post: Fact Loading: Switching a loaded temporary table into an empty partition of the fact table

Thursday, September 1, 2016

SQL Server Lock Escalation and handling it at partition level

SQL Server's locking mechanism makes sure that the database stays consistent during a transaction while keeping it available to as many concurrent users as possible. Locks are acquired for both reading and writing. Locks acquired for reading, with the default Isolation Level, are released as soon as the SELECT completes, but locks acquired for writing are held until the transaction completes.

Lock Escalation
Locks are acquired at an appropriate level: the key level, the page level, or the entire table. The cost of maintaining locks is high when many locks are acquired for a transaction, hence SQL Server tries its best to minimize the number of locks it needs to hold. It does this by converting many fine-grained locks held by a transaction on a single resource into a single coarser-grained lock on the same resource. This is called Lock Escalation.

Lock Escalation happens from either row or page locks to a table lock; it never happens from row locks to a page lock. With previous releases of SQL Server, the condition for deciding to escalate locks was hard-coded, based on a certain number of records locked at either row or page level. However, SQL Server 2016 uses multiple factors, and one of them is memory consumption: in most cases, when the lock manager consumes more than 40% of its allocated memory, Lock Escalation can happen.

ALTER TABLE SET ESCALATION LEVEL
Whether the table is partitioned or not, the default behavior is to escalate to the table level. The LOCK_ESCALATION option of ALTER TABLE, which controls this, accepts three values;
  1. TABLE: The default value. This escalates to the table level whether the table is partitioned or not.
  2. AUTO: Same as above, except that it escalates to the partition level if the table is partitioned.
  3. DISABLE: This prevents lock escalation in most cases, but it is not an absolute guarantee.
Disabling at session level
If you need to stop escalation at the session level, you can enable Trace Flag 1224 for the session. This stops lock escalation based on the number of locks, but escalation can still occur due to memory pressure.
However, if you need to stop it completely, you can use 1211 instead of 1224. Be cautious with this, as it might degrade performance.
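As a sketch, enabling a trace flag for the session or globally (the -1 argument makes it global) and checking its status looks like this; use 1211 only with the caution noted above:

```sql
-- Stop lock escalation triggered by lock counts
-- (escalation due to memory pressure can still occur)
DBCC TRACEON (1224);

-- Completely disable lock escalation (use with caution)
-- DBCC TRACEON (1211, -1);

-- Check which trace flags are currently enabled
DBCC TRACESTATUS (-1);
```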

The following code shows how SQL Server escalates locks with different settings. The table InternetSales_Partition is partitioned by its OrderDateKey, and the default settings have not been changed.

BEGIN TRAN
 UPDATE dbo.InternetSales_Partition
  SET TaxAmt = TaxAmt * 10
 WHERE OrderDateKey BETWEEN 20130501 AND 20130831

--ROLLBACK TRAN

If we check the locks acquired using sp_lock;
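Instead of the legacy sp_lock procedure, the same information can be read from sys.dm_tran_locks; a minimal summary for the current database might look like this:

```sql
-- Summarize locks held in the current database, by resource type and mode;
-- an OBJECT-level X lock indicates escalation to the table,
-- HOBT-level locks indicate partition-level locking
SELECT resource_type, request_mode, COUNT(*) AS LockCount
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode
ORDER BY resource_type;
```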


As you see, the entire table is locked with the default setting. Let's set the option to AUTO and see how it works.

ALTER TABLE dbo.InternetSales_Partition
SET (LOCK_ESCALATION = AUTO)

BEGIN TRAN
 UPDATE dbo.InternetSales_Partition
  SET TaxAmt = TaxAmt * 10
 WHERE OrderDateKey BETWEEN 20130501 AND 20130831

--ROLLBACK TRAN


As you see, the lock is now at the partition level, not the table level. This increases concurrency even though Lock Escalation has occurred.

Friday, July 29, 2016

Fact Loading: Switching a loaded temporary table into an empty partition of the fact table

Fact tables in data warehouses are usually partitioned and, in most solutions, partitioned on a date key. Generally, we use SSIS for loading dimension and fact tables, and ETLs written for loading the fact table can be optimized by loading data into another table and switching it into an empty partition of the fact table. This is one of the best practices in data warehouse loading, and the technique for doing it is not complex.

Assume you have a fact table that is partitioned by month, for example one partition for 201601, another for 201602, and so on, and data is loaded up to July 2016 (201607). Then, when we load August data, we can load it into a new table that has the same structure and switch that table into the 201608 partition of the fact table, optimizing the data loading and minimizing the impact on the fact table. The diagram below shows it;


Here is the way of doing it. I will take the same fact table created in this post: SSIS - Fact Loading with Dimension Type II and Dimension Type I. You can use the same code to create the dimension and fact tables to try this out. In that post, I loaded data related to June and July 2016. Let's see how we can load August data using the technique discussed in this post.

1. Let's take this data set for loading August data.

SELECT CONVERT(date, '2016-08-01') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-12') As TransactionDate, 23 AS ProductId, 101 AS CustomerId, 200 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-14') As TransactionDate, 23 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity
UNION ALL
SELECT CONVERT(date, '2016-08-22') As TransactionDate, 22 AS ProductId, 100 AS CustomerId, 201 AS EmployeeId, 500 AS SalesAmount, 2 AS SalesQuantity

2. We need a separate table with the same structure for loading processed data. The code below creates another table for holding data temporarily. Note that it has the same columns and is partitioned the same way as the FactSales table.

CREATE TABLE FactSalesTemp
(
 DateKey int NOT NULL INDEX IX_FactSalesTemp CLUSTERED,
 ProductKey int NOT NULL,
 CustomerKey int NOT NULL,
 EmployeeKey int NOT NULL,
 SalesAmount money NOT NULL,
 SalesQuantity smallint NOT NULL
)
ON ps_SalesDate (DateKey)
GO

3. I will be using the same SSIS package used in my previous post. As you see, the destination is now set to the newly created table, FactSalesTemp.


4. Before loading data into the new table, let's add another partition to both tables for August data. All we need to do is nominate the next file group for the scheme and add a boundary value to the function;

-- Add another file group to the scheme 
ALTER PARTITION SCHEME ps_SalesDate
NEXT USED [PRIMARY]
GO

-- splitting the last partition by adding another boundary value
ALTER PARTITION FUNCTION pf_SalesDate ()
SPLIT RANGE (20160801)
GO

5. Now we can execute the SSIS package and load data into the newly created table. Once the data set is loaded into FactSalesTemp, we can check both tables and see how the partitions are filled.

How partitions are filled;


How tables are filled;


6. As you see, the data has been loaded into the newly created table, into partition 9. Now we need to switch the new table into the 9th partition of FactSales. Here is how;

ALTER TABLE dbo.FactSalesTemp 
SWITCH PARTITION 9 TO dbo.FactSales PARTITION 9;

7. Now if you check the records in the tables, you will see that FactSales is loaded with the new data, in the correct partition.


You can place the scheme and function alterations in the Control Flow as an Execute SQL Task before the Data Flow Task, and the partition switch in another Execute SQL Task just after the Data Flow Task, completing the SSIS package.
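If you want to verify the switch from T-SQL rather than from the report, a quick check with the $PARTITION function (using the pf_SalesDate function from this post) could look like this:

```sql
-- Count FactSales rows per partition; the August rows
-- should now appear under partition 9
SELECT $PARTITION.pf_SalesDate(DateKey) AS PartitionNo,
  COUNT(*) AS NumberOfRows
FROM dbo.FactSales
GROUP BY $PARTITION.pf_SalesDate(DateKey)
ORDER BY PartitionNo;
```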

Thursday, May 19, 2016

Truncating Partitions - SQL Server 2016 TSQL Enhancements

For a speedy delete, or for cleaning a table and resetting some of its properties, we use the TRUNCATE TABLE statement. It is really useful, specifically with large tables, because it saves both resources and time. However, it was limited to the entire table; truncating a partition was not possible.

SQL Server 2016 has extended this functionality by allowing us to truncate an individual partition or a set of partitions. This is really useful when a partition needs to be cleaned up for refreshing data. Here is a sample;

This code creates a database and a table in it, and inserts a set of records;

-- creating database
CREATE DATABASE [Sales] ON  PRIMARY 
( NAME = N'Sales', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016_M\MSSQL\DATA\Sales.mdf' , SIZE = 3048KB , FILEGROWTH = 1024KB ), -- default FG
 FILEGROUP [CurrentFileGroup] 
( NAME = N'Sales Current', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016_M\MSSQL\DATA\Sales Current.ndf' , SIZE = 2048KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [HistoryFileGroup] 
( NAME = N'Sales History', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016_M\MSSQL\DATA\Sales History.ndf' , SIZE = 2048KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'Sales_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016_M\MSSQL\DATA\Sales_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
GO

USE Sales
GO

-- creating the partition
CREATE PARTITION FUNCTION pf_SalesDate (datetime)
AS RANGE RIGHT
FOR VALUES ('1/1/2003', '1/1/2004', '1/1/2005', '1/1/2006')
GO


-- creating the scheme
CREATE PARTITION SCHEME ps_SalesDate
AS PARTITION pf_SalesDate
TO (HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, CurrentFileGroup)
GO

-- creating the partitioned table
CREATE TABLE SalesOrders
(
 ID int IDENTITY(1,1) NOT NULL,
 [Date] datetime NOT NULL,
 CustomerID int NOT NULL,
 CONSTRAINT PK_SalesOrders PRIMARY KEY (ID, [Date])
)
ON ps_SalesDate ([Date])
GO

INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('01/01/2003', 1)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('02/01/2003', 2)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('01/01/2004', 5)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('02/01/2004', 7)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('05/01/2004', 15)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('01/01/2005', 20)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('03/01/2005', 34)
INSERT INTO SalesOrders ([Date], CustomerID) VALUES ('01/01/2006', 56)
GO 1000000 -- You can change this as you want

-- checking partitions and number of rows
SELECT p.partition_number, rows
FROM sys.partitions p
 INNER JOIN sys.objects o
  ON o.object_id = p.object_id
WHERE o.type = 'U' AND o.name = 'SalesOrders';


Let's try to clean a partition using the DELETE statement.
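The delete would look something like this; a sketch, since the exact partition used in the original test is not shown (this range covers the 2004 partition):

```sql
-- Delete all rows that fall into the 2004 partition
DELETE FROM dbo.SalesOrders
WHERE [Date] >= '2004-01-01' AND [Date] < '2005-01-01';
```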


As you see, it takes 6 seconds to delete. Since truncating partitions is now possible, it can be used instead, saving time and resources.



Truncating is possible with one partition, multiple partitions, and ranges;

-- One partition
TRUNCATE TABLE dbo.SalesOrders WITH (PARTITIONS (2));

-- Multiple partitions
TRUNCATE TABLE dbo.SalesOrders WITH (PARTITIONS (2, 3, 4));

-- Range
TRUNCATE TABLE dbo.SalesOrders WITH (PARTITIONS (2 TO 4));

-- Combining range and multiple partitions
TRUNCATE TABLE dbo.SalesOrders WITH (PARTITIONS (1, 2, 4 TO 6));

Saturday, May 23, 2015

Fact Table: Clustered index on date key is enough or partitioning should be done with it?

A fact table in a data warehouse (or a large table in an OLTP database) always holds millions of records, and we keep adding records in volume. Generally, it is discouraged to add indexes and constraints to fact tables, but with the modern way of implementing data warehouse / BI solutions, where the table is open to business users for analysis, having indexes on fact tables is beneficial. However, whether it is a traditional or a modern implementation, it is still recommended to have a clustered index on the most frequently used key for analysis and loading, which is the date key.

While discussing design considerations for fact tables, a question came up: is a clustered index on the date key enough, or should we partition the table on the date key too? Do we get additional performance benefits by partitioning on the same key?

In terms of performance, if the table is not significantly large, I doubt that a clustered index on the date key combined with partitioning on the same key will give extra performance benefits. But if it is a very large table, the performance of data retrieval will surely improve. Does this mean that partitioning is not required for fact tables?

Let's analyze with a moderately large table. The code below shows the structure of an InternetSales table that contains 7 million records. It has no indexes, not even a clustered index.

-- Table structure
CREATE TABLE dbo.InternetSales(
 OrderDateKey int NOT NULL,
 ProductKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
) ON PRIMARY;
GO

If a business user runs a query like the one below against the table, you can see the number of pages (IO) that have to be read to satisfy the query.

SET STATISTICS IO ON
GO

SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;


If we add a clustered index on the date key and run the same test;

CREATE CLUSTERED INDEX ix_InternetSales
ON InternetSales (OrderDateKey);
GO

SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;

We will see a significant difference between the number of pages read previously and the number read now.



Now the question is whether partitioning this table will give more benefits. Let's partition it and see.

-- Create a partition function
CREATE PARTITION FUNCTION pf_InternetSalesDate (int)
AS RANGE RIGHT
FOR VALUES (20100101, 20110101, 20120101, 20130101, 20140101)
GO

-- Create the scheme
CREATE PARTITION SCHEME ps_InternetSalesDate
AS PARTITION pf_InternetSalesDate
TO (HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, HistoryFileGroup, CurrentFileGroup)
GO

-- Dropping clustered index for partitioning
DROP INDEX ix_InternetSales ON InternetSales;
GO

-- Recreating with Partitioning scheme
CREATE CLUSTERED INDEX ix_InternetSales
ON InternetSales (OrderDateKey)
ON ps_InternetSalesDate (OrderDateKey);
go

-- And executing the same Business user's query
SELECT 
 DATENAME(m,[OrderDate])
 , p.EnglishProductName Product
  , SUM(SalesAmount) AS TotalSales
  , SUM(TaxAmt) AS TotalTaxAmount
FROM dbo.InternetSales f
 INNER JOIN dbo.DimProduct p
  ON f.ProductKey = p.ProductKey
 INNER JOIN dbo.DimDate d
  ON f.OrderDateKey = d.DateKey
WHERE d.CalendarYear = 2013 
 AND d.EnglishDayNameOfWeek = 'Monday'
GROUP BY DATENAME(m,[OrderDate])
 , p.EnglishProductName;

Here is the result of it;


This result does not show much improvement compared with the previous result, hence one could conclude that partitioning is not required when a clustered index is created on the date key. But even with this table we see a slight improvement, which means a large table will definitely benefit from partitioning.

Okay, what about this table? Should we keep partitioning or not? I prefer to have both implementations, because partitioning gives a few more benefits in addition to the performance benefit it offers. It offers more granular manageability, letting us perform operations such as indexing, compression and backup at the partition level instead of on the whole table. It improves data loading, since loading can be managed at the partition level. Considering all of this, should we not create the clustered index on the date key as well as partition the table on the date key?
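The partition-level operations mentioned above can be sketched like this, using the index created earlier in this post (the partition number is illustrative):

```sql
-- Rebuild only one partition of the clustered index
-- instead of the whole table
ALTER INDEX ix_InternetSales ON dbo.InternetSales
REBUILD PARTITION = 4;
GO

-- Compress a single partition of the table
ALTER TABLE dbo.InternetSales
REBUILD PARTITION = 4 WITH (DATA_COMPRESSION = PAGE);
GO
```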


Sunday, May 17, 2015

SQL Server Brain Basher of the Week #012 - Partitions

While discussing partitioning this week, one enthusiast asked how many partitions can be created per table. Before answering, I asked the same of the audience, and thought to make it the SQL Server Brain Basher of the Week.

What is the maximum number of partitions that can be added to a SQL Server table?

There are two answers, because it depends on the installation: whether it is 32-bit or 64-bit. You can create 15,000 partitions with a 64-bit version of SQL Server. With a 32-bit version, only 1,000 partitions can be created.

Note that Partitioning is available only with Enterprise Edition and Developer Edition.
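If you are unsure which edition and version you are running, SERVERPROPERTY reports both:

```sql
-- Check the edition and version of the current instance
SELECT SERVERPROPERTY('Edition') AS Edition,
  SERVERPROPERTY('ProductVersion') AS ProductVersion;
```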

For more info related to Maximum Capacity Specification, refer: https://technet.microsoft.com/en-us/library/ms143432(v=sql.120).aspx.