Dinesh's Blog :::: Being Compiled ::::: 2010

Wednesday, November 24, 2010

Analysis Services Security: Dimension Data – Select all members or Deselect all members?

While configuring Analysis Services security on Dimension Data, I did not pay much attention on two options available with the screen; Select all members and Deselect all members. However the importance of them was raised up when my colleague Asanka Padmakumara showed a point on them. Which one should we select? See the screen below;

I have selected the first option which is Select all members and granted permission on Accessories and Bikes. I can do the same, select the other option too. Let’s move into Advanced section and see how it has been set;

Note that other two categories (Clothing and Components) have been added under Denied member set. What we have set? Yes, we granted permission on Accessories and Bikes but we have done indirectly is, denying permission on Clothing and Components. Main thing we need to consider is, permission on future categories. Since the items we did not select have gone to Denied member list, all future items will be available for this role. If you need to let this role to access future categories without explicitly setting them, use Select all members for setting permission.

Now let’s use the other option; Deselect all members;

I have given same permission set but the option selected is Deselect all members. Have a look on Advanced section now;

Now you can see the items we selected have been added to Allowed member set. This means all other items we did not select are denied to this role. What will happen to future categories? This is the difference. Since we have explicitly added Accessories and Bikes to Allowed member set, future items will NOT be available under Allowed member set. If you need to disallow this role to access future categories without explicitly setting them, use Deselect all members for setting permission.

Monday, November 22, 2010

Analysis Services Security: DENY does not override GRANT

Is the famous rule of thumb broken? I am sure that you all are aware the rule and you know that DENY always wins over GRANT. If you want to see an exception (or a situation) read one of my previous posts “Column GRANT overrides DENY TABLE, Avoiding with Common Criteria Compliance”.

This post discusses another situation, it is not with SQL Server Relational Engine, it is will SQL Server Analysis Services.

Analysis Services behaves bit differently. As you know, if an object of SQL Server database is denied explicitly, it is always denied even with an explicit grant. For example, if you are in a Role that grants SELECT on Table1 and in another role that denies SELECT on Table1, you will not be able to execute SELECT on Table1. Analysis Services authorizes on items differently. It unions all grants and authorizes. If you are in a Role that grants on Item1 and Item2 (Note that in Analysis Services, once items are granted, all other items are denied, no need to explicitly deny them as SQL Server database engine objects) and in another Role that grants Item2 and Item3, you are granted for Item1, Item2, and Item3.

Let me show you a sample on this. The below image shows a Role in Analysis Services Adventure Works DW 2008R2 database. This is how it is created;

Name: Business Users
Membership: Dinesh
Cubes: Adventure Works – Read
Dimension Data:
- Product Dimension, Category – granted for Accessories and Bikes
- Other Dimensions – granted for all

Second image shows another role in same database. Its properties are as follows;

Name: Coordinators
Membership: Dinesh
Cubes: Adventure Works – Read
Dimension Data:

Product Dimension, Category – granted for Bikes and Clothing
Other Dimensions – granted for all

Business Users role denies for Clothing and Components. Coordinators role denies for Accessories and Components. When Dinesh sees data, he sees union of both; Accessories, Clothing, and Bikes. The reason for this is, Analysis Services uses all GRANTs from all Roles for connected user and then authorizes. That is why he sees Accessories and Clothing even though they are denied from each role. This is how he sees when Categories are loaded with Excel.

Note: This type of security cab be tested with Management Studio too. Start Cube Browser and click on Change User icon. The opened window allows you to set the context with Current User, Other User, or Roles.

Tuesday, November 16, 2010

Way to Visualize: Project Crescent; What is it?

The Era that Business User is used to carry an armload of tools for analyzing and visualizing data, has come to an end!!!!! Yes, Microsoft never stops making business user’s life easier. Microsoft never stop supporting business user to manage and visualize data in a rich way. Microsoft has come out with an exciting facility; The Project Crescent.

What is Project Crescent?

Project Crescent is the recent Microsoft Business Intelligence team innovation that helps to manage and visualize large amount of data quickly in very rich format. It has been built entirely on Silverlight and will be available as a BI client. It allows to show the insights of the business with reports, with highly interactive visualization and animations.

The releasing date of this is unknown. Most probably it will be released with Denali, at the end of 2011 or beginning of 2012. Long time to wait…..

Here is a sneak peak of project Crescent, in Amir Netz’ demo at PASS Summit 2010:

Here is a post from SQL Server Reporting Services team blog: http://www.silverlightshow.net/news/Project-Crescent-built-entirely-on-Silverlight.aspx

Friday, November 12, 2010

Reporting Services: Difference between RDL and RDLC files

Microsoft Reporting Services is an enterprise-capable reporting solution that can be used to create all types of reports such as Production, Analytical, and Dashboard. Building reports is done through Business Intelligence Development Studio which is an extension of Visual Studio. Reports are created as RDL (Report Definition Language) files. Once the reports are created with BIDS, they (RDLs) are hosted in Reporting Services and users can be accessed them in many ways.

What are RDLC files?
RDLC: Report Definition Language, Client Side files are files that are generated with either ReportViewer Control that comes with Visual Studio 2008 Windows or ASP.NET project templates, or Reporing –> Report Application project template. It is a XML file, just like the RDL. It uses the same XML schema that is used by RDL files but RDLC does not require to have values for all elements such as query text. If RDLC file is needed to be hosted in Reporting Services, it needs to be renamed as RDL and all empty elements must be filled manually.

Reporting with ReportViewer Control
The ReportViewer Control allows us to embed a report to ASP.NET web page or Windows form. It supports two modes: Remote Processing mode and Local Processing mode. The Remote Processing mode allows to connect with reports that have been already deployed to SSRS instance. In this case, all the processing take place at the SSRS server. The Local Processing mode allows to create reports within Visual Studio itself. The extension of them are RDLC. This does not require an instance of Reporting Services, hence processing is handled by the client application.

Here are some of important points to remember about these two modes:

Local Processing mode does not require an instance of Reporting Services, hence a license for Reporting Services is not required.
Local Processing mode is not recommended for large reports. It is used for small, OLTP types of reports. They may run infrequently.
Remote Processing can be scaled out but Local Processing.
Local Processing mode supports following extensions only:

Visual Studio 2008: PDF and Excel
Visual Studio 2010: PDF, Excel and Word

Local Processing mode with Visual Studio 2008 does not support Tablix data region. Visual Studio 2010 supports.
Mapping parameters with query parameters has to be done manually with Local Processing mode.

Thursday, November 11, 2010

Future version of SQL Server: “Denali”, CTP available for downloading

Microsoft announced that Community Technology Preview of SQL Server code-named “Denali” available for downloading. This CTP contains enhancements mainly on SQL Server high availability and engine not with BI components such as Analysis Services and Reporting Services.

To see details: http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx

To download: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=6a04f16f-f6be-4f92-9c92-f7e5677d91f9&displaylang=en

To see the enhancements: http://msdn.microsoft.com/en-us/library/bb500435(SQL.110).aspx

Thursday, November 4, 2010

Ready-made Virtual Machine for SharePoint 2010

Are you looking for pre-built virtual machine that includes SharePoint 2010? If yes, here it is… it is available for downloading.

Not only SharePoint 2010, it contains:

Microsoft Office Communication Server 2007 R2
Microsoft Visual Studio 2010
Microsoft SharePoint Server 2010 Enterprise Edition
Microsoft Office Web Applications
Microsoft FAST Search for SharePoint 2010
Microsoft Project Server 2010
Microsoft Office Professional Plus 2010
Microsoft Visio 2010
Microsoft Project 2010
Microsoft Office Communicator 2007 R2

For more details and downloading: 2010 Information Worker Demonstration and Evaluation Virtual Machine.

Tuesday, November 2, 2010

SharePoint Videos: SharePoint 2010 Developer Training Course

Looking for resources to learn SharePoint 2010 development? So many books, articles, and blog posts? Here is an easy way to learn it;

Site: SharePoint 2010 Developer Training Course
Download link: SharePoint 2010 Developer Training Kit

This training kit provides videos and hands on labs which are very useful to understand how SharePoint 2010 works and how it can be used.

We must thank Chris Mayo, Donovan Follette, Girish Raja, Paul Stubbs, and Steve Fox for preparing something like this for developers.

Monday, November 1, 2010

SSIS: Replacing Slowly Changing Dimension (SCD) wizard with the MERGE statement

Slowly Changing Dimensions are business entities in DW/BI systems that have attributes that do not change very often. In order to handle loading of SCDs, we use SQL Server Integration Services Data Flow component: Slowly Changing Dimension Component. It is a wizard that helps us to handle changing attributes.

If you are a SSIS developer and have used SCD wizard, you have already seen the bad performance on it. It works fine with a dataset that contains records less than something around 5000 (this is what I have seen, please do not consider it is as a benchmark) but gives very slow performance on beyond. Recently, I noticed that one of the projects done had used SCD wizard for almost all data loading and the time it takes for completing the load has gone from 5 minutes for 5-6 hours. It significantly increases the time when the data load is getting higher and higher.

How to improve the performance of loading of SCDs?
I had to give a solution for this…. two things came into my mind, one is Kimball Method SSIS Slowly Changing Dimension Component (will make a separate post on this) and the other is MERGE T-SQL statement. Although the MERGE implementation is bit complex, I thought to recommend the MERGE because many have accepted that the performance it gives is superb. The below table shows a comparison made on three different approaches on SCDs loading with some criteria (which is quoted from Matt Masson’s presentation named Performance Design Pattern).

You can easily notice that MERGE gives the best performance though there are drawbacks.

Type of Slowly Changing Dimensions
Slowly Changing Dimensions are categorized into three types named: Type 1, Type 2, and Type3. The Type 1 SCD does not maintain the history of changing attributes, it overwrites values of the attributes. Type 2 maintains historical values for changing attributes. Type 3 that we do not use much (I have not used it at all) maintains separate columns for changed attributes. SSIS SCD wizard supports both Type 1 and Type 2.

Using MERGE instead of SCD wizard
Replacement of SCD wizard with MERGE is not a straightforward technique. If the SCD has both Type 1 and Type 2 types attributes, they need to be handled separately. Let’s create a simple table with following set of data, in order to see the way of handling them with MERGE.
Following attributes in this table are Type 1:

FirstName
LastName

Following attributes are Type 2:

MaritalStatus
Country

Following script creates the Dimension table and inserts data;

     CREATE TABLE dbo.DimCustomer

    (

        CustomerKey int IDENTITY(1,1) PRIMARY KEY

        , SourceCode char(5) NOT NULL

        , FirstName varchar(50) NOT NULL -- Type 1

        , LastName varchar(50) NOT NULL -- Type 1

        , MaritalStatus char(1) NOT NULL -- Type 2

        , Country  varchar(50) NOT NULL -- Type 2

        , StartDate date NOT NULL

        , EndDate date NOT NULL

        , CONSTRAINT IX_DimCustomer_SourceCode UNIQUE (SourceCode)

    )

    GO

    INSERT INTO dbo.DimCustomer

        (SourceCode, FirstName, LastName, MaritalStatus, Country

        , StartDate, EndDate)

    VALUES

        ('AAAA1', 'Dinesh', 'Priyankara', 'M', 'Sri Lanka'

            , '01/01/2009', '12/31/9999')

        , ('AAAA2', 'Yeshan', 'Senthush', 'S', 'Sri Lanka'

            , '01/01/2009', '12/31/9999')

        , ('AAAA3', 'Jane', 'Alvarez', 'S', 'France'

            , '01/01/2009', '12/31/9999')

        , ('AAAA4', 'Shannon', 'Carlson', 'M', 'India'

            , '01/01/2009', '12/31/9999')

    GO

When data loading of SCDs are handled with SSIS,what we usually do is, once the data set is captured, we send it through SCD component. The SCD component inserts new data, updates Type 1 data, and finally inserts Type 2 data. If we are to use MERGE, we need to send the captured data to a temporary table and then use Execute SQL Task for executing the MERGE statement. Assume that following table is the temporary table and it is loaded with captured data.

This code creates the table and populates data;

    CREATE TABLE dbo.TempDimCustomer

    (

        SourceCode char(5) NOT NULL

        , FirstName varchar(50) NOT NULL

        , LastName varchar(50) NOT NULL

        , MaritalStatus char(1) NOT NULL

        , Country  varchar(50) NOT NULL

    )    

    GO

    INSERT INTO dbo.TempDimCustomer

        (SourceCode, FirstName, LastName, MaritalStatus, Country)

    VALUES

        ('AAAA1', 'Dinesh', 'Priyankara', 'M', 'Sri Lanka')

        , ('AAAA2', 'Yeshan', 'Santhush', 'S', 'Sri Lanka')

        , ('AAAA3', 'Jane', 'Alvarez', 'M', 'France')

        , ('AAAA4', 'Shannon', 'Carlson', 'M', 'German')

        , ('AAAA5', 'Jon', 'Yang', 'M', 'Japan')

    GO

Handling Type 1 attributes
The below codes shows the MERGE statement that updates Type 1 attributes. Note that it goes through all records and updates.

    MERGE INTO dbo.DimCustomer c

    USING dbo.TempDimCustomer tc

    ON (c.SourceCode = tc.SourceCode)

    WHEN MATCHED 

        AND 

            (c.FirstName != tc.FirstName

            OR c.LastName != tc.LastName)

        THEN UPDATE

            SET c.FirstName = tc.FirstName

                , c.LastName = tc.LastName;

Handling Type 2 attributes
Handling Type 2 is tricky. If a record is changed, we need to update the old record, setting the EndDate as current date, and then insert a new record for it, with StartDate as current date. See the code below;

    INSERT INTO DimCustomer

        (SourceCode, FirstName, LastName, MaritalStatus, Country

        , StartDate, EndDate)

    SELECT Modified.SourceCode, Modified.FirstName

        , Modified.LastName, Modified.MaritalStatus

        , Modified.Country, Modified.StartDate

        , Modified.EndDate

    FROM

        (MERGE INTO dbo.DimCustomer c

        USING dbo.TempDImCustomer tc

        ON (c.SourceCode = tc.SourceCode)

        WHEN NOT MATCHED 

            THEN INSERT

                (SourceCode, FirstName, LastName, MaritalStatus, Country

                , StartDate, EndDate)

                VALUES

                (tc.SourceCode, tc.FirstName, tc.LastName, tc.MaritalStatus, tc.Country

                , GetDate(), '12/31/9999')

        WHEN MATCHED 

            AND c.EndDate = '12/31/9999'

            AND (c.MaritalStatus != tc.MaritalStatus

                OR c.Country != tc.Country)

            THEN UPDATE

                SET c.EndDate = getDate()

            OUTPUT 

                $Action Action, tc.SourceCode, tc.FirstName, tc.LastName

                , tc.MaritalStatus, tc.Country

                , '10/28/2010' StartDate, '12/31/9999' EndDate) AS Modified

    WHERE Modified.Action = 'UPDATE';

See the inner MERGE first. With first NOT MATCHED section, it inserts new records. Next is for finding MACTHED and Type 2 changes. Note that the check is done only on latest records (EndDate = ‘12/31/9999’). If found, EndDate is updated and records are returned as a OUTPUT of the MERGE. This output is captured by outer INSERT statement and inserts them as new records. Done!

Please do test both approaches before deciding the technique. If the data set is large, it would be better to use MERGE instead of SSIS SCD component.

Thursday, October 28, 2010

Dimension has no attributes…Can I make it as Degenerated Dimension?

One of my friends recently came up with an interesting topic where he has faced an issue with his Data Warehouse design related to Insurance industry. He has come across a situation where he needs to build a dimension for Insurance Claim numbers that holds ONLY claim numbers. He has already build other related dimensions such as Customer and Product, and has built the Fact table too.

This particular Insurance Claim number is one of the attributes set at grain level of the fact. In addition to that, no common claim numbers, all are unique, and number of claim numbers are grown at the same rate that fact table grows. He wanted my opinion on this, thought degenerate the claim number without maintaining a separate dimension. The below image explains what I suggested;

As you see, first diagram has a dimension for maintaining claim numbers. The second has no dimension for claim numbers and claim number has been added to the Fact table. The best option I see is, making it as a degenerated dimension because;

Number of records in the ClaimNumber dimension grows at the same rate that Fact table grows.
Claim number has no other descriptive attributes.
Claim number does not exist outside the transaction (the claim).

I am facing a similar issue on Banking related DW, the Fact focuses on Deposits and the transaction related deposits contain Comments. Comments are not available for all transactions and they are not common among transactions too. I do not like to see NULLs in my Fact table but thought to degenerate into Fact table. I wish I could argue on this with someone……… :)

If you need to read more on Degenerated Dimension, read Kimballl Design Tip #46 at http://www.rkimball.com/html/designtipsPDF/DesignTips2003/KimballDT46AnotherLook.pdf.

Tuesday, October 26, 2010

Is Microsoft's SQL Server really cheaper than Oracle?

Sean McCown has written an article on “http://www.networkworld.com/community/node/63390”. Good article, worth to read.

Do not forget to read comments too.

Monday, October 25, 2010

SSIS Error: Named Pipes Provider: The specified network name is no longer available.

A perfectly working package suddenly throws an error;

This package is a kind of complex one, it contains three Data Flow tasks and couple of other tasks. It basically loads a Fact Table in one of Data Warehouses, goes through set of LookUps which have set for Dimension tables.

The error raises with one of the LookUps. The lookUp is supposed to go through around 40,000 records but stops at around 6,000, throwing the error. Once this started giving the issue, the only difference we can thought about current execution and early executions is, the amount of data, it has been increased.

Solving the issue by changing the protocol TCPIP
After searching for solutions, going through many posts, based on some recommendations, we decided to force SSIS to use TCPIP instead of Named Pipes. We disabled the Named Pipe protocol in SQL Server through SQL Server Configuration Manager. We did the same with where we run the package. Finally we ran the package and ended up with same point;

“[Lookup for - Pending Duration [7825]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Login timeout expired".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "TCP Provider: Only one usage of each socket address (protocol/network address/port) is normally permitted.
".”

We realized that the issue is not exactly related to protocols used by the package hence, went through some of the implementations we have done where we have slightly derailed the track of standardization. One implementation was related to transactions, where we had used explicit BEGIN TRAN and COMMIT TRAN to control the transactions without using the built-in facility. This type of implementation has been explained here if you need to know how to do it. The reasons for using this is arguable but some of them were, MS DTC issues, Firewalls, and company policies.

In order to set the transaction like above, we need to set the RetailSameConnection as True in OLE DB connection. That was the culprit. Since this forces SSIS to maintain one connection through out the package execution, it seems that using one connection for all lookups is not efficient.

Solving the issue
One easiest way to solve the issue is, adding Timeout=0 to the connection string. But this slows down the package execution vastly, though it succeeds. The next way is, handling the transactions in different way. We thought to use the second way because the time takes for the first one is not acceptable.

The above mentioned errors may occur for some other reasons too but this could be the one if you have already face :).

Plan Caching in SQL Server 2008

I got many questions on the post I published “Do we need to recompile Stored Procedures when an index is added or parameter value is vary greatly?”, seems many are interesting on it. While looking for some info related to it, I found an interesting technical article that speaks about Plan Caching written by Greg Low. Read it, it explains everything well;

http://msdn.microsoft.com/en-us/library/ee343986.aspx

Thursday, October 21, 2010

Do we need to recompile Stored Procedures when an index is added or parameter value is vary greatly?

While Preethi was presenting his presentation on Query Plans at the SQL Server Sri Lanka User Group meeting, he showed a situation where query plan gets recompiled when a value set for WHERE clause is vary. Although it recompiles, I doubt that it will not be the same if the query is a parameterized query with a stored procedure. There were couple of arguments on this favoring both sides; thought to test it and see.

Let me take simple two queries. The queries below are same except the value passed for the WHERE clause.

     SELECT * FROM Sales.SalesOrderDetail 

        WHERE SalesOrderDetailID <= 1

    SELECT * FROM Sales.SalesOrderDetail 

        WHERE SalesOrderDetailID <= 1000

SQL Server selects one plan for the first one but it does not use the same for the next though it is same.

This shows clearly that the statement has been recompiled since the value was different. But this is not going to happen if the logic is implemented with a stored procedure;

    CREATE PROC GetOrderDetails @Number int

    AS

        SELECT * FROM Sales.SalesOrderDetail 

        WHERE SalesOrderDetailID <= @Number

    GO

The plan for the procedure will be generated at the first execution. Once generated, the generated plan will be used for next executions even though the value of parameters are vary greatly. See the code below;

    EXEC GetOrderDetails 1

    EXEC GetOrderDetails 1000

You can see that the same plan which was generated for the first execution has been used for second execution even though it is not the best plan for the value 1000. All we have to remember is, stored procedures are not getting re-compiled when values of parameters are vary. If you expect a situation like this, use WITH RECOMPILE with the stored procedure. This instructs SQL Server to not to stored the plan generated, hence it creates a plan at every execution. If it is rare, use WITH RECOMPILE when you call the stored procedure. This creates the plan again when it is executed.

    ALTER PROC GetOrderDetails @Number int

    WITH RECOMPILE

    AS

    ...

    -- OR

    EXEC GetOrderDetails 1000

    WITH RECOMPILE

The next is whether the stored procedure gets recompiled when an index is added to a table used in it. See what BOL says;

It clearly says that the stored procedure will not get recompiled when an index is added to a table used with it. But what I found is the opposite of it. Sometime one of the statements gets compiled, sometime the entire stored procedure gets compiled. This is how I tested;

This code creates a new table and then creates a stored procedure. The execution of the stored procedure creates a plan and it is used for subsequent execution.

    SELECT *

    INTO TempSalesOrderDetail

    FROM Sales.SalesOrderDetail

    CREATE PROC GetTempOrderDetails

    AS

        SELECT SalesOrderDetailID FROM TempSalesOrderDetail 

        ORDER BY SalesOrderDetailID

    GO

    EXEC GetTempOrderDetails

Here is the generated plan for it.

As per the BOL, stored procedure does not get recompiled when an index is added…

    CREATE INDEX IX_1 ON TempSalesOrderDetail (SalesOrderDetailID)

    GO

    EXEC GetTempOrderDetails

Here is the plan now.

Now the plan is different, means the stored procedure has been recompiled. There might be a situation where the behavior as same as BOL says, will see whether we can find a such situation.