Saturday, October 5, 2013

What is Columnstore Index - SS SLUG Aug 2013 – Demo V

If you have worked with large databases (or data warehouses), you have already seen enough of scenarios where you do not get much benefits out of indexes, particularly with large data tables. SQL Server 2012 ColumnStore index structure was specifically introduced for addressing this issue.

This new structure is based on Microsoft Vertipaq technology and it goes as a non-clustered index. The different between this and traditional indexing is the storage. The indexes we have worked so far store data in a row-wise structure where as this stores data in a column-wise structure. Another different which make this unusable with OLTP databases is, structure becomes Read-Only.

 ColumnStore1

How this exactly stores data was discussed in the user group meeting. Here are set of images that explains how SQL Server organizes data for ColumnStore indexes.

ColumnStore2

The first image shows a table that contains millions of records with hundreds of columns. If you execute a command for creating a ColumnStore non-clustered index, the first step of SQL Server is grouping records (image 2). Then it creates segments for each column in each group. Third image shows how 16 segments are created. SQL Server uses Dictionary Compression for compressing these segments. These segments will be stored in LOBs and become unit of transfer between the disk and the memory.

Let’s see how it can be used and the performance benefit. This sample code uses ContosoRetailDW and it can be downloaded at: http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=18279.

Have a look on following query. The query accesses three tables (one is very large) and performs set of aggregations.

USE [ContosoRetailDW]
GO
 
SET STATISTICS IO ON
SET STATISTICS TIME ON
 
SELECT 
    p.ProductName
    , d.CalendarYear
    , d. CalendarMonth
    , COUNT(*) TotalOrders
    , SUM(f.SalesQuantity) Quantity
    , SUM(SalesAmount) Amount
    , SUM(CASE WHEN f.DateKey = '2009-12-31 00:00:00.000' THEN SalesAmount ELSE 0 END) AmountToday
FROM dbo.FactSales f
    INNER JOIN dbo.DimDate d
        ON d.Datekey = f.DateKey
    INNER JOIN dbo.DimProduct p
        ON p.ProductKey = f.ProductKey
GROUP BY
    p.ProductName
    , d.CalendarYear
    , d. CalendarMonth
ORDER BY 1, 2, 3
 
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

It has read around 19,000 pages and has taken 40 seconds.

Statistics

Now, let’s create a ColumnStore index on the table. The recommended way is, include all columns when creating the index.

CREATE COLUMNSTORE INDEX IX_FactSales_CStore ON dbo.FactSales (
        SalesKey, DateKey, channelKey, StoreKey, ProductKey, PromotionKey, CurrencyKey
        , UnitCost, UnitPrice, SalesQuantity, ReturnQuantity, ReturnAmount, DiscountQuantity
        , DiscountAmount, TotalCost, SalesAmount,ETLLoadID, LoadDate, UpdateDate)

This might takes minutes based on the number of records in the table. Once it is created, run the SELECT again and see the result.

Statistics2

As you see, number of pages that have been read is around 8,000 and time taken is 2 seconds. It is a significant improvement. However the biggest issue with this is, now table has become a read-only table. However this can be applied in partition level and partitions that contain history data can be benefited from this.

This link: http://technet.microsoft.com/en-us/library/gg492088.aspx gives all info regarding ColumnStore indexes. Have a look on it for understanding restrictions, limitations and scenarios you can use it.

No comments: