Monday, May 29, 2017

Understanding Relational Algebra

When the relational model was introduced, languages were needed for working with the model, that is, for retrieving and updating its data. These are called relational languages. Initially, two languages, Relational Algebra and Relational Calculus, were introduced by Codd during 1971 as the basis of relational languages.

When you start studying databases and their related languages, these are the two main languages you learn first, and of course, they are not very user-friendly. If you have worked with modern Database Management Systems such as Microsoft SQL Server or Oracle, then you know how much more powerful and richer T-SQL and PL/SQL are compared to Relational Algebra and Relational Calculus. However, if you are going to study databases and related languages, it is always better to start with the basis, as it explains the fundamentals.

The Relational Algebra

It is a high-level procedural language. It describes how an operation is performed on relations (tables, the word familiar to us) to produce another relation. The operation is described with expressions, and expressions can be nested, meaning the output of one operation can be used as input for another operation.


This language is considered a theoretical language as well as a relation-at-a-time language: it manipulates all tuples (records) with one statement, without looping through them. There are many different types of operations in Relational Algebra, but Codd originally proposed eight, and they are categorized into two groups: Unary and Binary.

Unary Operations

There are two operations defined under this: Selection and Projection. They are called unary operations since they operate on only one relation.

Binary Operations

There are six operations under this category: Cartesian Product, Union, Set Difference, Join, Intersection and Division. Unlike unary operations, they work on pairs of relations.

Here are some sample images I used in my classes. Note that the SQL statements can be written in different ways to get the same output; the ones I have written are not the only way to get the result.

Symbols

There is a set of symbols used with the operations; here are the standard symbols used with Relational Algebra.


Selection

This operation returns tuples from a single relation based on the specified predicate (condition). Multiple predicates can be combined using AND, OR and NOT.


Projection

This operation returns a relation that contains a vertical subset of the relation used. In other words, it returns a set of tuples only with the specified attributes.
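Since the class images are not reproduced here, the two unary operations can also be sketched with plain Python, treating a relation as a list of tuples (the Customer data below is made up purely for illustration and is not from the original examples):

```python
# A relation sketched as a list of dicts: one dict per tuple, keys are attributes.
customers = [
    {"CustomerId": 1, "Name": "Jane", "Country": "LK"},
    {"CustomerId": 2, "Name": "John", "Country": "UK"},
    {"CustomerId": 3, "Name": "Jill", "Country": "LK"},
]

def selection(relation, predicate):
    # Selection: horizontal subset - tuples satisfying the predicate
    return [t for t in relation if predicate(t)]

def projection(relation, attributes):
    # Projection: vertical subset - only the given attributes, duplicates removed
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

print(selection(customers, lambda t: t["Country"] == "LK"))  # tuples 1 and 3
print(projection(customers, ["Country"]))  # [{'Country': 'LK'}, {'Country': 'UK'}]
```

Note how projection removes duplicates: three customers produce only two distinct countries, just as a relation cannot contain duplicate tuples.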


Cartesian Product

This operation creates a relation combining two relations, concatenating every tuple in one relation with every tuple in the other relation. In simple terms, if the first table has 100 records with 5 attributes and the other table has 50 records with 10 attributes, this creates an output with 5,000 records (100 * 50) and 15 attributes (5 + 10).
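That arithmetic can be verified with a quick sketch (the attribute names `r0..r4` and `s0..s9` are invented for the example):

```python
from itertools import product

def cartesian_product(r, s):
    # Concatenate every tuple of r with every tuple of s
    return [{**a, **b} for a, b in product(r, s)]

# 100 tuples with 5 attributes, and 50 tuples with 10 attributes
r = [{f"r{c}": i for c in range(5)} for i in range(100)]
s = [{f"s{c}": j for c in range(10)} for j in range(50)]

rs = cartesian_product(r, s)
print(len(rs), len(rs[0]))  # 5000 tuples, each with 15 attributes
```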


Union

This operation makes a relation containing all tuples from the first relation, the second relation, or both. Duplicates are eliminated. Remember, both relations used must be union-compatible.


Set Difference

This operation creates a relation containing the tuples in the first relation that are not in the second relation. Just like the previous one, both relations must be union-compatible.


Intersection

This operation creates a relation containing the tuples that appear in both the first and second relations. Both relations must be union-compatible.
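The three set-based operations map directly onto Python's set operators, which makes the union-compatibility requirement visible: both relations must have the same attributes so their tuples can be compared (again, the data is made up for illustration):

```python
# Union-compatible relations: same attributes, so tuples compare directly
r = {("Jane", "LK"), ("John", "UK")}
s = {("John", "UK"), ("Jill", "AU")}

union        = r | s   # all tuples from either relation, duplicates eliminated
difference   = r - s   # tuples in r that are not in s
intersection = r & s   # tuples in both relations

print(sorted(union))         # three distinct tuples
print(sorted(difference))    # [('Jane', 'LK')]
print(sorted(intersection))  # [('John', 'UK')]
```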


Division

This operation creates a relation containing selected attributes from the first relation, matching against every tuple in the second relation. See the image; it tries to answer: which customers are registered from ALL the countries ranked as 2?
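Division is the least intuitive of the eight, so here is a rough sketch of the idea with hypothetical data (the customer/country pairs below are invented; they are not the data from the image):

```python
# Which customers are registered from ALL the listed countries?
registered = {("C1", "LK"), ("C1", "UK"),
              ("C2", "LK"),
              ("C3", "LK"), ("C3", "UK")}
countries = {("LK",), ("UK",)}  # e.g. the countries ranked as 2

def division(r, s):
    # Keep each customer that pairs with every tuple in s
    customers = {c for (c, _) in r}
    return {c for c in customers
            if all((c, k) in r for (k,) in s)}

print(sorted(division(registered, countries)))  # ['C1', 'C3'] - C2 misses UK
```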


I will make a separate post on the Join operation as it has multiple join types.

Wednesday, May 24, 2017

SQL Server bcp Error - Unexpected EOF file encountered in BCP file

Here is another issue I experienced today with the bcp utility (see my previous post related to this: SQL Server bcp Error - Invalid ordinal for field in xml format). The file I tried to import was a UNIX-based file, and bcp did not identify the proper row terminator;


I was sure about the row terminator, but the import did not work even with BULK INSERT. As usual, I did a search and found a suggestion to use the hexadecimal value for the row terminator instead of \n.
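Before changing the terminator, it is worth confirming what the file actually uses. Here is a small check I could have used (the sample byte strings are made up; point it at the first few KB of the real file instead):

```python
def row_terminator(data: bytes) -> str:
    # Windows files end rows with \r\n; UNIX files with \n only
    if b"\r\n" in data:
        return r"\r\n (Windows)"
    if b"\n" in data:
        return r"\n (UNIX) - try the hexadecimal terminator 0x0a"
    return "unknown"

print(row_terminator(b"1,John\n2,Jane\n"))      # UNIX-style file
print(row_terminator(b"1,John\r\n2,Jane\r\n"))  # Windows-style file
```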

And it worked.


You may experience the same; try this and see whether it solves the issue.

Tuesday, May 23, 2017

SQL Server bcp Error - Invalid ordinal for field in xml format

We still use the bcp utility for importing or exporting data, specifically when we have to work with a large amount of data (I have written a post comparing different methods for loading data, read it for more info: How to load/insert large amount of data efficiently). In order to load data from a file, the utility needs to be instructed with the structure of the file and the mapping, and the instruction can be passed either by specifying details with switches along with the bcp command or by adding a format file to the command.

If you need to use a format file, it needs to be generated first. We can use the same bcp command for generating it.

bcp tempdb.dbo.Customer format nul -T -S DINESH-PC\SQL2014 -c -t, -x -f "E:\Temp\Customer_c.xml"

For more information on the bcp command, read this.

Sometimes it does not work as we expect. You may experience the following error when you try to generate the file;

SQLState = HY000, NativeError = 0
Error = [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid ordinal for field 3 in xml format file.

If you experience it, the main reason could be column names with spaces. Check whether the table used for generating the xml file has column names with spaces. If so, you will surely get this error.

Here is an example. As you see, if you try to execute the above bcp command against this table, you will get this error.

USE tempdb;
GO

IF OBJECT_ID('dbo.Customer') IS NOT NULL
 DROP TABLE dbo.Customer;
GO
CREATE TABLE dbo.Customer
(
 CustomerId int PRIMARY KEY
 , [Customer Code] char(5) NOT NULL
 , [Customer Name] varchar(50) NOT NULL
 , [Credit Limit] money NOT NULL
);
GO


If you remove the spaces from the column names, there will be no errors and the file will be generated.


Monday, May 22, 2017

SQL Server does not always write data to files, proportional to the amount of free space in each file

We add multiple data files to file groups to get data distributed among them (it is always better to have them distributed across multiple physical volumes) for improving performance. But does it guarantee that data is always distributed among the files?

This is something every DBA should know. There are certain situations in which SQL Server does not write data to all files allocated to the file group even though there are multiple files in the same file group. Generally, SQL Server uses a round-robin proportional fill algorithm to allocate extents. This means that when two data files are allocated to one file group and a table is created on that file group, allocations are made in each data file proportional to the amount of free space in each file. For example, if the first file has 60MB free space and the second file has 40MB free space, when 10MB is written to the table, 6MB is written to the first file and 4MB to the second file. This makes sure that data is distributed properly for improving performance.
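The arithmetic behind proportional fill can be sketched as below. This is a simplification I have added for illustration; real allocation happens extent by extent, not as one continuous split:

```python
def proportional_fill(free_mb, write_mb):
    # Split a write across files in proportion to each file's free space
    total_free = sum(free_mb)
    return [write_mb * f / total_free for f in free_mb]

# 60MB and 40MB free, 10MB written to the table
print(proportional_fill([60, 40], 10))  # [6.0, 4.0]
```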

This behavior changes when data files are expanded automatically. Files are expanded when their space is exhausted and AutoGrowth is enabled. This expansion happens one file at a time, again in a round-robin way. Continuing with the above example, when the space in the files is exhausted, SQL Server expands the first file based on the AutoGrowth setting, without expanding the second file. When the first file is exhausted again, it expands the second file without expanding the first. This makes data distribution uneven: while only the first file has been expanded, data is written only to the first file, meaning data is not striped across all files. This is where you see the issue.

See this code. It creates a database with three data files; one file under the Primary File Group and two files under the FG1 File Group. Then it creates a table on FG1 and inserts a set of records.

USE master;
GO

DROP DATABASE IF EXISTS TestDatabase;
GO

-- Two additional data files are added under FG1
CREATE DATABASE [TestDatabase]
 ON  PRIMARY 
( NAME = N'TestDatabase'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase.mdf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB), 
FILEGROUP [FG1]  DEFAULT
( NAME = N'TestDatabase_Data1'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data1.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB),
( NAME = N'TestDatabase_Data2'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data2.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB)
 LOG ON 
( NAME = N'TestDatabase_log'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_log.ldf' 
 , SIZE = 10MB , FILEGROWTH = 5MB )
GO


USE TestDatabase;
GO

-- Create a table
-- This will be created on FG1 as it is the default
CREATE TABLE dbo.TestTable
(
 Id int identity(1,1) Primary Key
 , Name char(8000) not null
);
GO

-- Inserting 900 records
INSERT INTO dbo.TestTable
 (Name) VALUES (Replicate('a', 8000));
GO 900

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;

See the result when we check the file sizes. Focus on TestDatabase_Data1 and TestDatabase_Data2 files. They are exhausted.


Since we have enabled AutoGrowth, the files will be expanded if we insert more data. Let's insert some more data and see whether both files get expanded.

-- Inserting 100 more records
INSERT INTO dbo.TestTable
 (Name) VALUES (REPLICATE('a', 8000));
GO 100

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;


See, only the first file has been expanded.


How to avoid this behavior?

This can be overcome either by expanding the files manually or by enabling Trace Flag 1117. When expanding files manually, it is better to disable AutoGrowth to stop automatic expansion.

Saturday, May 20, 2017

Management Studio has new DAX Query Window

We have been writing DAX queries using the MDX Query Editor whenever we need to do something with Analysis Services Tabular Model databases (read my old post on this: How to write SELECT for Analysis Services Tabular using DAX). Now, with this month's Management Studio release (release number 17.0), we have a new Query Window for DAX.


Once the new release is installed, you should see the icon and menu item for DAX Query Window;


Writing DAX Queries

Here is an example of the DAX queries that can be written. As you can see, the first statement uses the SUMMARIZE function to produce a result that shows sales by day of the week and product. The second query creates a measure using DEFINE MEASURE and then uses it with another query.

EVALUATE 
SUMMARIZE(Sales_201501, Sales_201501[CalendarDayOfWeekLabel]
 , Sales_201501[ProductName]
 , "Sales", SUM(Sales_201501[SalesAmount]))

GO

DEFINE MEASURE Sales_201501[NewSales] = SUMX(FILTER(Sales_201501, Sales_201501[IsWorkDay]="WeekEnd")
          , Sales_201501[SalesAmount])
EVALUATE 
SUMMARIZE(Sales_201501, Sales_201501[CalendarDayOfWeekLabel]
 , Sales_201501[ProductName]
 , "Sales", Sales_201501[NewSales])

Here is the result of it;


Friday, May 19, 2017

Management Studio Quick Launch has PresentOn for turning Presentation Mode on

A new option has been added to Management Studio's Quick Launch for turning Presentation Mode on and off. This is really useful during classes, workshops and presentations because it allows us to quickly zoom in, enhancing the visibility of SSMS.



The Presentation Mode is based on the qlt_presentermode_default.xml file, which has pre-defined font family and size settings for both the environment and the editor. When Presentation Mode is turned on, SSMS uses the settings in this file and changes the way it displays.

In order to turn it on or edit the file, all we need to do is go to Quick Launch and start typing Present. As you type, it shows the related items.


If I select PresentEdit, it opens the xml file and allows us to edit it. If we select PresentOn, it sets the environment as per the configuration in the xml file.


To revert to the default settings, type Restore to get the RestoreDefaultFonts item and select it.

Thursday, May 18, 2017

What is Quick Launch in Management Studio?

Have you noticed an input box called Quick Launch at the top-right corner of Management Studio? Have you used it? Do you know why it has been provided?


This input box allows us to search IDE-specific options, templates and menus and open them without going through the standard menu items. For example, if I want to change the font and the size of the font, I have to go through Tools -> Options -> Environment -> Fonts and Colors. But Quick Launch allows us to open it quickly without going through the menu items.

If you click on it (the shortcut is Ctrl + Q) and type @, it shows the possible items that can be searched;


If you type @opt, then it allows us to search for a specific menu under options;


Once searched, it displays the result, and the required item can be opened by clicking it;


As you see, this can be used for opening many items related to IDE without going through multiple steps.


Wednesday, May 17, 2017

Power BI - Convert Direct-Query to Import

Power BI supports three types of connectivity: Import, DirectQuery and Live Connection. We generally use DirectQuery when the dataset is large, and Live Connection for models that have been created with SSAS. Once a Power BI file is created with DirectQuery, can we change it to Import?

This is something I was asked yesterday, and my answer was Yes. We can change the mode used and get the data imported rather than referencing it via metadata.

How to convert DirectQuery to Import?

Let's create a file against AdventureWorksDW2014 with DirectQuery mode;


Once it is created, we can see that only the Relationship View is visible, as no data is imported.


Let's just create a visual and save the file.


Let me close the file and open it again. Once it is opened again, if the file has been created using DirectQuery mode, I will see a button at the bottom-right called DirectQuery: Enabled (click to change).


I can click on it and change the mode. This is what I get when it is clicked.


Let me select Import and continue. Now I see the data imported to my Power BI file.



Monday, May 15, 2017

Reporting Services - An error occurred within the report server database....Database 'ReportServerTempDB' cannot be opened

Yesterday, I posted on an error related to Reporting Services. Here is another Reporting Services error, encountered at one of my client sites. When the Reporting Services portal is visited, it says;


When the Web Service is visited, it clearly says that something is wrong with the Reporting Services Temp DB.

An error occurred within the report server database. This may be due to a connection failure, timeout or low disk condition within the database. (rsReportServerDatabaseError) Get Online Help
Database 'ReportServerTempDB' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

This says that we need to check the database immediately. When it was viewed via Management Studio, we noticed that ReportServerTempDB was missing. This can happen for various reasons. However, if you are experiencing the same, you may be able to fix it without doing much. We fixed it without much effort; the steps used for fixing it are listed below.

To simulate the same, I deleted the mdf file related to ReportServerTempDB, and this is how the database is shown on my server now. It is similar to the issue faced by my client.


What should we do? Here are the steps;

1. Backup the Report Server database if you have no backups (In my case, it is ReportServer)

2. Open Reporting Services Configuration Manager and go to Database. Click on Change Database.


3. Wizard starts. Select Create a new report server database and click Next to continue.


4. Enter a new name for the database. Note that the original database (ReportServer) is not deleted and we are not replacing it. This creates two new databases; ReportServer1 and ReportServer1Temp. Complete the wizard.


5. Once completed, you should see the two new databases. Note the old database too.


6. Since it is a new database now, Portal does not show the reports we had. We need to bring the old database back.


7. Go to Management Studio and rename the current Report Server temp database to the old name. If the old Report Server temp database still exists, it has to be deleted first. Once renamed, my databases are as below.


8. Now we need to go back to Reporting Services Configuration Manager and start Change Database again. Select Choose an existing report server database.


9. Select the old report server database and complete the wizard.


10. Once done, you should be able to see your old content. You may delete the report server database created for fixing this (in my case, it is the ReportServer1 database).


Sunday, May 14, 2017

Reporting Services - HTTP Error 404. The requested resource is not found.

What if you suddenly experience this when you open the Reporting Services Portal?

HTTP Error 404. The requested resource is not found.


There can be many reasons for this error, and you may find the exact reason by reading the Windows Application Log, the Report Server Service Trace Log (the default location is C:\Program Files\Microsoft SQL Server\MSRS13.MSSQLSERVER\Reporting Services\LogFiles) or the SQL Server Error Log. When I went through the trace, I found that it was related to HttpListener and permissions.

What shall we do? It is always better to find the exact root cause and fix it. But if you need a quick fix, there is a simple thing to try.

It is all about changing the Virtual Directories of Reporting Services. This method may not fix your issue, but you can try it and see;

1. Open the Reporting Services Configuration Manager.

2. Go to Web Service URL and change the name of the Virtual Directory. In my case, it was ReportServer and I changed it to ReportServer1.


3. Do the same for Web Portal URL too.


4. Once this is done, you should be able to open the portal. As you can see, nothing has happened to its content; it shows all the items it had.


If you need to restore the same Virtual Directory names you had, you can repeat the steps to rename them back.

** Note that you need to keep regular backups of your report database and all content of the report server. If you cannot get your errors fixed, the only option is reinstalling and setting it up again.

Saturday, May 13, 2017

Power BI - Cumulative (Running) Total - Measure or Calculated Column?

Showing Cumulative (Running) Totals is not an uncommon requirement in reporting; hence it is required with almost all reporting solutions. I wrote two posts on this: Analysis Services (SSAS): Calculating running totals for a given period using MDX and SQL Server: Calculating running totals using T-SQL, which discuss multiple ways of calculating cumulative totals using MDX and T-SQL. Now it is time to do the same using DAX. DAX is the language used for writing expressions with both Analysis Services Tabular Model and Power BI, and it is a rich and easy-to-learn language.

Should I create a Measure or Calculated Column for Cumulative Totals?

Cumulative Totals can be implemented using either Calculated Columns or Measures. It is up to you to decide, because the result may vary based on the way they are visualized. Once decided, you need to understand the context related to the functions used for the calculation, because the behavior of the functions differs between the two.

Generally, if I have to show Cumulative Totals on all detail lines as well as in Totals, then it is better to implement it as a Measure, because with a Calculated Column the calculation does not happen as we expect at the Totals level. For example, if you need to show rows in a Table Visual and you want to see Cumulative Totals for each line as well as in the Totals of the Table Visual, then a Measure is the best choice. If not, a Calculated Column will do.
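The reason the Totals row goes wrong with a Calculated Column can be shown with plain arithmetic, outside DAX. A Calculated Column stores a cumulative value per row, and a naive Total then aggregates those stored values, while a Measure is re-evaluated in the total's own context. A rough sketch of that idea (invented numbers, not DAX semantics):

```python
values = [10, 20, 30]

# Calculated-column style: the cumulative value is stored per row
cumulative_column = [sum(values[: i + 1]) for i in range(len(values))]
print(cumulative_column)       # [10, 30, 60] - correct on each row

# A Total that sums the stored column gives a meaningless number
print(sum(cumulative_column))  # 100

# Measure style: re-evaluated over all rows at the Totals level
print(sum(values))             # 60 - the cumulative total we actually want
```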

The calculations for Cumulative Totals using a Calculated Column and a Measure are different. Because of that, you cannot use the same expression for both.

Again, the calculation is different if the source table has multiple columns. If there are multiple columns, they need to be included in the filtering for calculating the correct value.

Creating Cumulative Totals

Let's take the two tables shown below and see how we can generate Cumulative Totals;


As you can see, the first table, ItemsCompletedByDayAndTeam, contains values per day and team. The second table, ItemsCompletedByDay, contains values only per day.

Let's see how we can add Cumulative Totals using Calculated Columns. Usually we would use the following expression, but it does not produce the totals we need; it shows the total of ItemsCompleted in all rows.

CALCULATE
(
   SUM(Table[ValueColumn])
   , FILTER
     (
        ALL(Table[KeyColumn])
        , Table[KeyColumn] <= MAX(Table[KeyColumn])
     )
)


The reason for this is the MAX function used. Since the context for functions used in Calculated Columns is the entire table, it returns the maximum value considering all rows. What we need is the current row's value for the expression, hence the EARLIER function should be used instead.

Here are the two expressions against two tables;

-- Expression against the ItemsCompletedByDayAndTeam table
CL_CT_1 = CALCULATE
          (
             SUM(ItemsCompletedByDayAndTeam[ItemsCompleted])
             , FILTER
               (
                  ALLEXCEPT(ItemsCompletedByDayAndTeam, ItemsCompletedByDayAndTeam[Team])
                  , ItemsCompletedByDayAndTeam[Day]  <= EARLIER(ItemsCompletedByDayAndTeam[Day]) 
                      && ItemsCompletedByDayAndTeam[Team] <= EARLIER(ItemsCompletedByDayAndTeam[Team])  
               )
          )

-- Expression against the ItemsCompletedByDay table
CL_CT_2 = CALCULATE
          (
              SUM(ItemsCompletedByDay[ItemsCompleted])
              , FILTER
                (
                   ALL(ItemsCompletedByDay[Day])
                   , ItemsCompletedByDay[Day]  <= EARLIER(ItemsCompletedByDay[Day])  
                )
          )


If we use these two with two Table Visuals, this is how it looks like;


Notice the values in the columns and their totals. The expressions we have written work well for each row but not for the totals. If you do not need totals for the Cumulative Totals, you can use Calculated Columns to calculate them.

Although the calculation is possible with Calculated Columns, it is always better to use Measures instead. If a Measure is used, we do not need to worry about totals, as it shows them accurately.

Here are the expressions for measures against both tables;

-- Expression against the ItemsCompletedByDayAndTeam table
M_CT_1 = CALCULATE
         (
            SUM(ItemsCompletedByDayAndTeam[ItemsCompleted])
            , FILTER
              (
                 ALL(ItemsCompletedByDayAndTeam)
                 , ItemsCompletedByDayAndTeam[Day] <= MAX(ItemsCompletedByDayAndTeam[Day]) 
                    && ItemsCompletedByDayAndTeam[Team] <= MAX(ItemsCompletedByDayAndTeam[Team])
              )
         )

-- Expression against the ItemsCompletedByDay table
M_CT_2 = CALCULATE
         (
            SUM(ItemsCompletedByDay[ItemsCompleted])
            , FILTER
              (
                 ALL(ItemsCompletedByDay)
                 , ItemsCompletedByDay[Day] <= MAX(ItemsCompletedByDay[Day]) 
              )
         )


When they are added to visuals;


Now it shows row values as well as totals accurately.


Thursday, May 11, 2017

Think before you restart the SQL Server Service

When we face an issue with SQL Server or any other software, we often think that a restart of the server or the service will solve the issue, and we do it. The situation can be something like: no response, taking a long time to produce a result, or not allowing new connections. Whatever the situation, you need to think twice before restarting the service.

Why should we not restart it immediately? One thing you need to remember is that SQL Server logs all possible information related to its processes in the server logs. By default, SQL Server maintains 6 archive files along with the current file, and the files get replaced with every restart. For example, when you restart, it creates a new log file, makes the old current file Archive #1, the old Archive #1 becomes Archive #2, and so on, deleting the last archive file. If multiple restarts are done, useful information in the log files will be erased and you will not be able to see what caused the issue.
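That rotation can be sketched as below (an illustration I have added, assuming the default of 6 archives):

```python
def restart_rotation(logs):
    # On restart: a new current log starts, the old current log becomes
    # Archive #1, every archive shifts down, and the oldest is deleted
    return ["new current"] + logs[:6]

logs = ["current"] + [f"archive {i}" for i in range(1, 7)]
after = restart_rotation(logs)
print(after)  # 'archive 6' is gone; 'current' is now the first archive
```

A handful of quick restarts is therefore enough to erase every pre-incident log file.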


Even though it maintains 6 files by default, I think it is better to maintain more. A number like 10-15 would do better, even though it consumes some extra space. Who knows, one day they may be very useful if something happens to the server and you need to find the reasons for it.

Considering this, make sure that SQL Server is not restarted unnecessarily. If a restart is needed, it is better to take a copy of all existing log files first.

What if the required log has been overwritten?

You may experience this when you troubleshoot an issue. There is another way to see what has happened with SQL Server, though this container is not as rich as the log: the default trace (thanks to the Data Platform MVP group, who helped me understand this). Generally, the default trace records many events, including errors. By default it is enabled, but if for some reason you need to enable or disable it, that can be done with sp_configure.


By querying sys.traces, the path of the trace files can be found, and the files can be opened using either T-SQL or Profiler.


 

Tuesday, May 9, 2017

Power BI - Creating Measures for Live Connection is now possible

Good news! With the Power BI May 2017 update, we can create measures over a Live Connection made to an Analysis Services Tabular Model. Earlier, we had no way of creating measures when connected to Analysis Services databases, hence we had to use only the measures available in the model. This limited the usage of Power BI with Analysis Services, as users could not create measures as they wanted.

Now this has been addressed to some extent. We can create measures using DAX against both Live Connections to Analysis Services Tabular Models and Power BI Service datasets.

Here is an example of it.

I have connected to the AW Internet Sales Tabular Model 2014 database using a Live Connection. As you can see, the column chart shows Sales Amount against Month Number, filtered for Year 2013. In addition to that, I have added a new measure called SalesAmountYTD using the following DAX;

SalesAmountYTD = TOTALYTD(sum('Internet Sales'[Sales Amount]), 'Date'[Date])


Note that the measure is not part of the Model; it is considered a Report Measure. It can be created either by clicking the New Measure button or by clicking the ellipsis button on any attribute.