Thursday, October 28, 2010

Dimension has no attributes…Can I make it as Degenerated Dimension?

One of my friends recently came up with an interesting topic where he has faced an issue with his Data Warehouse design related to Insurance industry. He has come across a situation where he needs to build a dimension for Insurance Claim numbers that holds ONLY claim numbers. He has already build other related dimensions such as Customer and Product, and has built the Fact table too.

This particular Insurance Claim number is one of the attributes set at grain level of the fact. In addition to that, no common claim numbers, all are unique, and number of claim numbers are grown at the same rate that fact table grows. He wanted my opinion on this, thought degenerate the claim number without maintaining a separate dimension. The below image explains what I suggested;
diagram1

As you see, first diagram has a dimension for maintaining claim numbers. The second has no dimension for claim numbers and claim number has been added to the Fact table. The best option I see is, making it as a degenerated dimension because;

  • Number of records in the ClaimNumber dimension grows at the same rate that Fact table grows.
  • Claim number has no other descriptive attributes.
  • Claim number does not exist outside the transaction (the claim).

I am facing a similar issue on Banking related DW, the Fact focuses on Deposits and the transaction related deposits contain Comments. Comments are not available for all transactions and they are not common among transactions too. I do not like to see NULLs in my Fact table but thought to degenerate into Fact table. I wish I could argue on this with someone……… :)

If you need to read more on Degenerated Dimension, read Kimballl Design Tip #46 at http://www.rkimball.com/html/designtipsPDF/DesignTips2003/KimballDT46AnotherLook.pdf.

Tuesday, October 26, 2010

Monday, October 25, 2010

SSIS Error: Named Pipes Provider: The specified network name is no longer available.

A perfectly working package suddenly throws an error;

[Lookup for - Pending Duration [7825]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 10.0"  Hresult: 0x80004005  Description: "Communication link failure".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 10.0"  Hresult: 0x80004005  Description: "Named Pipes Provider: The specified network name is no longer available.
".

This package is a kind of complex one, it contains three Data Flow tasks and couple of other tasks. It basically loads a Fact Table in one of Data Warehouses, goes through set of LookUps which have set for Dimension tables.

The error raises with one of the LookUps. The lookUp is supposed to go through around 40,000 records but stops at around 6,000, throwing the error. Once this started giving the issue, the only difference we can thought about current execution and early executions is, the amount of data, it has been increased.

Solving the issue by changing the protocol TCPIP
After searching for solutions, going through many posts, based on some recommendations, we decided to force SSIS to use TCPIP instead of Named Pipes. We disabled the Named Pipe protocol in SQL Server through SQL Server Configuration Manager. We did the same with where we run the package. Finally we ran the package and ended up with same point;

[Lookup for - Pending Duration [7825]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005.
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 10.0"  Hresult: 0x80004005  Description: "Login timeout expired".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 10.0"  Hresult: 0x80004005  Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
An OLE DB record is available.  Source: "Microsoft SQL Server Native Client 10.0"  Hresult: 0x80004005  Description: "TCP Provider: Only one usage of each socket address (protocol/network address/port) is normally permitted.
".

We realized that the issue is not exactly related to protocols used by the package hence, went through some of the implementations we have done where we have slightly derailed the track of standardization. One implementation was related to transactions, where we had used explicit BEGIN TRAN and COMMIT TRAN to control the transactions without using the built-in facility. This type of implementation has been explained here if you need to know how to do it. The reasons for using this is arguable but some of them were, MS DTC issues, Firewalls, and company policies.

In order to set the transaction like above, we need to set the RetailSameConnection as True in OLE DB connection. That was the culprit. Since this forces SSIS to maintain one connection through out the package execution, it seems that using one connection for all lookups is not efficient.

Solving the issue
One easiest way to solve the issue is, adding Timeout=0 to the connection string. But this slows down the package execution vastly, though it succeeds. The next way is, handling the transactions in different way. We thought to use the second way because the time takes for the first one is not acceptable.

The above mentioned errors may occur for some other reasons too but this could be the one if you have already face :).

Plan Caching in SQL Server 2008

I got many questions on the post I published “Do we need to recompile Stored Procedures when an index is added or parameter value is vary greatly?”, seems many are interesting on it. While looking for some info related to it, I found an interesting technical article that speaks about Plan Caching written by Greg Low. Read it, it explains everything well;

http://msdn.microsoft.com/en-us/library/ee343986.aspx

Thursday, October 21, 2010

Do we need to recompile Stored Procedures when an index is added or parameter value is vary greatly?

While Preethi was presenting his presentation on Query Plans at the SQL Server Sri Lanka User Group meeting, he showed a situation where query plan gets recompiled when a value set for WHERE clause is vary. Although it recompiles, I doubt that it will not be the same if the query is a parameterized query with a stored procedure. There were couple of arguments on this favoring both sides; thought to test it and see.

Let me take simple two queries. The queries below are same except the value passed for the WHERE clause.

SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= 1
SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= 1000
    

SQL Server selects one plan for the first one but it does not use the same for the next though it is same.
plan1

This shows clearly that the statement has been recompiled since the value was different. But this is not going to happen if the logic is implemented with a stored procedure;

CREATE PROC GetOrderDetails @Number int
AS
    SELECT * FROM Sales.SalesOrderDetail 
    WHERE SalesOrderDetailID <= @Number
GO

The plan for the procedure will be generated at the first execution. Once generated, the generated plan will be used for next executions even though the value of parameters are vary greatly. See the code below;

EXEC GetOrderDetails 1
EXEC GetOrderDetails 1000

plan2 
You can see that the same plan which was generated for the first execution has been used for second execution even though it is not the best plan for the value 1000. All we have to remember is, stored procedures are not getting re-compiled when values of parameters are vary. If you expect a situation like this, use WITH RECOMPILE with the stored procedure. This instructs SQL Server to not to stored the plan generated, hence it creates a plan at every execution. If it is rare, use WITH RECOMPILE when you call the stored procedure. This creates the plan again when it is executed.

ALTER PROC GetOrderDetails @Number int
WITH RECOMPILE
AS
...
 
-- OR
 
EXEC GetOrderDetails 1000
WITH RECOMPILE

The next is whether the stored procedure gets recompiled when an index is added to a table used in it. See what BOL says;
BOL

It clearly says that the stored procedure will not get recompiled when an index is added to a table used with it. But what I found is the opposite of it. Sometime one of the statements gets compiled, sometime the entire stored procedure gets compiled. This is how I tested;

This code creates a new table and then creates a stored procedure. The execution of the stored procedure creates a plan and it is used for subsequent execution.

SELECT *
INTO TempSalesOrderDetail
FROM Sales.SalesOrderDetail
 
CREATE PROC GetTempOrderDetails
AS
    SELECT SalesOrderDetailID FROM TempSalesOrderDetail 
    ORDER BY SalesOrderDetailID
GO
 
EXEC GetTempOrderDetails 

Here is the generated plan for it.
plan3 
As per the BOL, stored procedure does not get recompiled when an index is added…

CREATE INDEX IX_1 ON TempSalesOrderDetail (SalesOrderDetailID)
GO
 
EXEC GetTempOrderDetails 

Here is the plan now.
plan4

Now the plan is different, means the stored procedure has been recompiled. There might be a situation where the behavior as same as BOL says, will see whether we can find a such situation.

TechInsights – Malaysia – Session Ratings

I just received ratings for sessions I delivered in Malaysia, glad to see that the sessions have been useful and valuable for participants. Ratings received are as follows;

Session: Strengthening DW/BI system loading
Overall Rating: 7.0

Session: Simplifying Master Data Management with SQL Server 2008 R2
Overall Rating: 6.8

Thanks for Ervin and Lilian for inviting me for TechInsights, thanks for the support given by all event organizer, and thanks for all feedbacks from participants.

Wednesday, October 20, 2010

Understanding Ownership Chaining and Impersonation

Being a DBA or DBE, understanding how Ownership Chains work and how impersonate another principal for changing the execution context are really important. This topic was discussed while I was conducting the course 6232A on last Saturday, thought to blog on it as it would be useful to you all.

What is Ownership Chain?
When we perform an action with set of statements, objects used are accessed sequentially and this sequence is called as a Chain. Once the chain is established,  permissions on objects in the chain are evaluated differently, not the way it checks permissions when they are separately accessed. If the objects accessed in the chain are owned by a same owner, Ownership Chain is established. Within the ownership chain, SQL Server checks the permission for the first object and skips checking on others. This is the advantage of Ownership Chain because it offers slight performance benefit skipping permission checks on other objects. If SQL Server finds a broken ownership chain, it starts evaluating permission on them.

What is Impersonation?
SQL Server Impersonation means, allowing to change the execution context with another principal. Simply, perform some actions (execute set of queries) using someone else privileges. This can be done explicitly by using the stand-alone EXECUTE AS statement or implicitly by using the EXECUTE AS clause on modules.

Let’s first check with Ownership Chaining and then go for Impersonation. I did this in a TestDatabase, you can do the same. The first code snippet creates two logins and two users. Note that SalesPerson is added to db_owner role.

-- Create two logins
CREATE LOGIN SalesPerson WITH PASSWORD = '123'
, CHECK_POLICY = OFF
GO
CREATE LOGIN MarketingPerson WITH PASSWORD = '123'
, CHECK_POLICY = OFF
GO
 
-- Create users for logins.
USE TestDatabase
GO
CREATE USER SalesPerson FROM LOGIN SalesPerson
GO
sp_addrolemember 'db_owner', 'SalesPerson'
GO
 
CREATE USER MarketingPerson FROM LOGIN MarketingPerson
GO

With next code you can see that Impersonation has been done explicitly using EXECUTE AS. What I do here is, impersonate SalesPerson and then create a schema and a table. Note that SalesPerson becomes the owner of these objects.

EXECUTE AS USER = 'SalesPerson'
GO
CREATE SCHEMA Sales
GO
 
CREATE TABLE Sales.Sales
(
    Date date
    , Amount money
)
GO
 
REVERT
GO

Next code impersonates MarketingPerson and sees whether he can access the table Sales.Sales. Since he has no permission on this table, error is thrown for SELECT query.

The SELECT permission was denied on the object 'Sales', database 'TestDatabase', schema 'Sales'.

EXECUTE AS USER = 'MarketingPerson'
GO
SELECT * FROM Sales.Sales
 
REVERT
GO

Let’s check Ownership Chaining. With next code, SalesPerson creates a stored procedure that returns data from Sales.Sales, and he grants EXECUTE permission to MarketingPerson.

EXECUTE AS USER = 'SalesPerson'
GO
 
CREATE PROC Sales.GetSales
AS
    SELECT * FROM Sales.Sales
GO
 
GRANT EXECUTE ON Sales.GetSales TO MarketingPerson
 
REVERT
GO

We know that MarketingPerson has no read permission on Sales.Sales table but with next code, you can see that he can read the table through Sales.GetSales procedure.

EXECUTE AS USER = 'MarketingPerson'
GO
-- MarketingPerson can access the table
-- through the SP
EXEC Sales.GetSales
 
-- This throws the same error
SELECT * FROM Sales.Sales
 
REVERT
GO

The reason for this is, the Ownership Chain. As MarketingPerson has EXECUTE permission, he can execute the SP. Although he has no permission on Sales.Sales, SQL Server does not evaluate permission on it for MarketingPerson because it is owned by the same owner who owns Sales.GetSales. Note that we have not explicitly set owners for both Sales.GetSales and Sales.Sales, hence owners of them are considered as Sales schema’s owner; SalesPerson. Below code shows it;

SELECT USER_NAME( OBJECTPROPERTY(OBJECT_ID('Sales.Sales'), 'OwnerId'))

If the SP contains an object owned by someone else, it breaks Ownership Chain. Let’s test this too. Below code adds a new table, note that no impersonation is established, so, it is created by dbo. Since it is under Sales schema too, the owner of the schema (SalesPerson) becomes the owner of the new table.

CREATE TABLE Sales.SalesCommisions
(
    SaleNumber int
    , Amount money
)
GO

Next, the code of the SP is altered by SalesPerson. Now it contains both tables but still MarketingPerson can execute the SP without any issues even though he has no permission on new table.

EXECUTE AS USER = 'SalesPerson'
GO
 
ALTER PROC Sales.GetSales    
AS
BEGIN
    SELECT * FROM Sales.Sales
    SELECT * FROM Sales.SalesCommisions
END
GO
 
REVERT
GO
 
EXECUTE AS USER = 'MarketingPerson'
GO
-- Can see both tables
EXEC Sales.GetSales
 
-- Both statement throw errors
SELECT * FROM Sales.Sales
SELECT * FROM Sales.SalesCommisions
 
REVERT
GO

Still the Ownership Chain works. Let’s break it. Let’s change the owner of the new table Sales.SalesCommisions as dbo.

ALTER AUTHORIZATION ON OBJECT::Sales.SalesCommisions TO dbo

Now if MarketingPerson executes the SP, he gets an error for Sales.SalesCommisions table:
The SELECT permission was denied on the object 'SalesCommisions', database 'TestDatabase', schema 'Sales'.

The reason is, the second object, Sales.SaleCommisions is not owned by the same person who owns the SP, hence SQL Server checks and sees whether MarketingPerson has permission on Sales.SaleCommissions.

How can we let MarketingPerson to execute the SP without giving explicit permission on Sales.SaleCommissions? Impersonation is the answer. We can use EXECUTE AS explicitly with the stored procedure, instructing SQL Server to execute the stored procedure with someone else privileges without using CALLER’s privileges. See the code below;

EXECUTE AS USER = 'SalesPerson'
GO
 
ALTER PROC Sales.GetSales    
WITH EXECUTE AS 'SalesPerson'
AS
BEGIN
    SELECT * FROM Sales.Sales
    SELECT * FROM Sales.SalesCommisions
END
GO
 
REVERT
GO

Now if MarketingPerson executes the SP, he will see the result without seeing an error because the execution context for the SP’s execution is set with SalesPerson not with MarketingPerson.