Dinesh's Blog :::: Being Compiled ::::: September 2010

Thursday, September 30, 2010

PivotViewer Extension for SQL Server Reporting Services

You might have already seen Microsoft Silverlight PivotViewer for Excel (see my post on it). Just like that you are going to see PivotViewer for SSRS. Good news? Yes, but there is a bad news too, it is not yet available for downloading, it a kind of prototype, here how it goes;

Anyway, have a look on this video, this shows it.

Wednesday, September 29, 2010

SQL Server 2008 Service Pack 2 is available for downloading

SQL Server 2008 SP 2 is available for downloading now. Enhancements includes;

15K partitioning Improvement
Reporting Services in SharePoint Integrated Mode
SQL Server 2008 R2 Application and Multi-Server Management Compatibility with SQL Server 2008
Data-tier Application (DAC) Support

Read more info on this and download from here.

Table variable slower than Temporary table???

I used to use table variables when the resultset is smaller and want to hold it temporarily. The main reason for this selection is, data manipulation with table variables is faster than manipulation with temporary tables when the container contains a smaller amount of data. There were some instances, specially on JOINs, table variables show slowness, have been thinking to investigate but had no time. Thought start investigating on it again when I saw this post of Jack Li.

Simple Test
Let me start with an example, similar to an issue I had. The following code creates a table variable and loads sales data into it. Later, the table variable joins with a permanent table, to get a resultset. Note that there is no logic on the query, this is just for testing. Turn Include Actual Execution Plan on and run the query below.

     DECLARE @TestTable TABLE 

    (

        SalesOrderId int 

        , SalesOrderDetailID int

        , LineTotal money 

        , PRIMARY KEY (SalesOrderId, SalesOrderDetailID)

    )

    INSERT INTO @TestTable

    SELECT SalesOrderID

        , SalesOrderDetailID

        , LineTotal 

    FROM Sales.SalesOrderDetail

    SELECT t.SalesOrderId, s.UnitPrice

    FROM @TestTable t

    INNER JOIN Sales.SalesOrderDetail s

        ON t.SalesOrderId = s.SalesOrderID

            AND t.SalesOrderDetailID = s.SalesOrderDetailID

Now investigate the execution plan for SELECT query. See Estimated Number of Rows, it shows as 1 which is incorrect. This causes Optimizer to use Nested Loops instead of Merge Join and reduces the performance of data retrieval.

Is Merge Join better than Nested Loops?
According to the BOL, Query Optimizer determines the cost of query plan and selects based on two factors;

The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
The cost model of the algorithm dictated by the operators used in the query.

The second factor depends on first one. Therefore having a correct cardinality will make sure that faster execution plan is generated. As the plan generated for our code does not provide a correct cardinality, the algorithm used for joining is not a optimized way to join two results.

Optimizer selects Nested Loop if one of tables contain smaller number of records, based on Estimated Number of Rows, that is the reason for selecting Nested Loop algorithm. But the reality is, number of records in variable table is not small (in this case).

Merge Join works differently. Usually Optimizer selects this algorithm when both resultsets are sorted, in our case, yes, it is. The matching operation for this join works differently and it is usually faster than Nested Loop. For more info, read this.

Using #Temporary Table
The below code does the same, only different is a temporary table has been used instead of table variable.

    USE AdventureWorks2008

    GO

    CREATE TABLE #TestTable 

    (

        SalesOrderId int 

        , SalesOrderDetailID int

        , LineTotal money

        ,CONSTRAINT pk_TestTable PRIMARY KEY

            (SalesOrderId, SalesOrderDetailID)

    )

    INSERT INTO #TestTable

    SELECT SalesOrderID

        , SalesOrderDetailID

        , LineTotal 

    FROM Sales.SalesOrderDetail

    SELECT t.SalesOrderId, s.UnitPrice

    FROM #TestTable t

    INNER JOIN Sales.SalesOrderDetail s

        ON t.SalesOrderId = s.SalesOrderID

            AND t.SalesOrderDetailID = s.SalesOrderDetailID

Now you can see that cardinality of the plan is correct and Merge Join is used as the algorithm for joining two results.

A reason for wrong “Estimated Number of Rows” with table variable
As Jack Li explains, when the code is compiled as a batch, since the table variable does not contain data at the initial stage, optimizer makes an assumption, saying that table variable contains very low number of data, resulting a plan with Nested Loop. One other reason could be unavailability of statistics. SQL Server does not maintain statistics for table variables.

Solving the issue
Again, solution from Jack Li, all we have to do is, recompiling the SELECT before executing the SELECT code. At this time table variable contains data, hence Estimated Number of Rows is correctly determined. See the code below;

    SELECT t.SalesOrderId, s.UnitPrice

    FROM @TestTable t

    INNER JOIN Sales.SalesOrderDetail s

        ON t.SalesOrderId = s.SalesOrderID

            AND t.SalesOrderDetailID = s.SalesOrderDetailID

    OPTION (RECOMPILE)

There will be a cost for recompiling, hence always compare the cost and the improvement, then select the best way.

Monday, September 27, 2010

Tech Insights 2010 – Penang, Malaysia

Tech Insights 2010 was an event organized by Malaysia Microsoft Enthusiasts to explorer new technologies related to Microsoft. It was great to me too; It was a great opportunity to participate as a speaker. I must thank to all organizers, they had made this event as a successful event, everything was organized very well.

See more details on the event: www.techinsights.my

I delivered two SQL sessions, both related to Business Intelligence. The audience was great, many questions were thrown, always lover to see an audience like this, was able to made both sessions as interactive sessions.

My two sessions were named as Simplifying Master Data Management with SQL Server 2008 R2 and Strengthening DW/BI System Loading. I am waiting for feedbacks and comments on these two, will publish once I have them.

- me, delivering the first presentation

- During the first session, audience participation

- More questions from audence

- Me, delivering my second presentation

Friday, September 10, 2010

SQL Server data compression: PAGE

Although the storage cost is going down, we still think about storage optimization because once optimized, it is not only about the cost, it improves many things. One of the features available with SQL Server 2008 Enterprise Edition (and Developer Edition) is data compression. SQL Server supports compression on both ROW and PAGE; this discusses data compression on pages.

The PAGE compression minimizes the storage requirement for redundant data. This compression is applied for each page. Once enabled, SQL Server goes through two techniques; Prefix Compression and Dictionary Compression.

Prefix Compression: This identifies a value for each column that can be used for compression. Then a row that contains values for each column is created below the page header. Next, values in columns will be replaced with the reference to the value in newly created row. Note that column can only reference a value created for it, not for other columns. See the image taken from BOL. It shows a table with three columns. You can see how the column values are fully and partially linked with the references.

Dictionary Compression: This compresses the page further, finding the repeated values anywhere in the page and storing them in same way. Unlike prefix compression, any column value can reference any value added, not limiting to the value created for it. See the image taken from BOL.

Now let’s see how we enable this. Create the following table and insert records as given.

     CREATE TABLE TestTable

    (

        Id int IDENTITY(1,1) PRIMARY KEY

        , Value varchar(50) NOT NULL DEFAULT ('This is the default value')

    )

    GO

    INSERT INTO TestTable

        (Value) VALUES (DEFAULT)

    GO 100000

Retrieve all data, enabling STATISTICS IO on. You will notice that data is held with 523 pages.

sp_estimate_data_compression_savings: This system stored procedure allows us to determine the space can be saved when compression is applied. Execute the code given and analyze the result. First stored procedure can be used to find out the space it currently uses. Second shows the estimate data compression savings.

    -- check the space used

    sp_spaceused 'TestTable'

    -- determine data compression savings

    EXEC sp_estimate_data_compression_savings 'dbo', 'TestTable', NULL, NULL, 'PAGE'

The sp_spaceused says that 4168KB is required for storing data. Note the column “size_with_requested_compression_settings” coming from second stored procedure. It says that the size after compression would be 1096KB, encouraging us to go ahead with compression.

Enabling PAGE compression: This can be done with either CREATE TABLE or ALTER TABLE. Use below code for compressing the table. You can pass NONE, ROW, or PAGE for DATA_COMPRESSION.

    ALTER TABLE TestTable

    REBUILD WITH (DATA_COMPRESSION = PAGE)

Now the data is compressed. Check retrieving records with STATISTICS IO on, you will notice that only 139 pages are used now. Check the space used too. Has it gone down to 1096KB?

Disadvantages: Are there? Yes, there is. This increases the CPU cost for both reading and modifying. If you are to enable, consider this factor too.