Wednesday, June 21, 2017

Power BI Parameters - What are they and when they can be used?

Microsoft Power BI supports adding parameters for queries and use them with various areas. However, since it has some limitations and cannot use with all areas, it has not been noticed much. Let's talk about Power BI parameters and see the way of using it.

Let's take a scenario and see how parameters can help on that. I have a set of CSV files that shows monthly sales. We need to to create a report using one of the files and then publish. Whenever the report needs to be refreshed for a different file, I need to change the data source by changing many things. What if I can just open the report using Power BI Desktop and change the source using a menu and get the report refreshed rather changing many things?

It can be done with Power BI Parameters. We can create a parameter with predefined values (in this case, source file names along with paths) and use it for changing the source.

Power BI Parameter can be created in three ways;
  • Parameter that accepts a value via an input box
  • Parameter that is formed using a predefined list
  • Parameter that is formed using another query
Parameter with a predefined list
Let's focus on the second one first. Here are the steps for creating a Power BI report with a parameter;

1. Open Power BI and connect with a CSV file. In my case, the file name is Sales_201501.csv.


2. Once the file is loaded, click Edit Queries in Home ribbon for opening Query Editor. You should one query under Queries

Parameters can be created only with Query Editor. However, created parameters can be accessed in both Query Editor and Data Model.

3. Click Manage Parameters in Home ribbon. Once the dialog box is opened, click New to create a new Parameter.

4. Name the parameter as SourceFile

5. Select Text for Type.

6. Select List of values for Suggested values.

7. Enter File Names along with paths.

8. Set Default Value as you need. This is the default value for the designer.

9. Set Current Value as you need. This setting is for the end user, or to use with the report level.


Now you should see the parameter in Queries. If you want to open it as another table in the data model, you can right click on it and select Enable Load. We do not need to enable this option for using the parameter at report level.

Now we need to make sure that source of our file is set with the created parameter. There are two ways of doing it. One is open the Advanced Editor and change M Code. Other way is, using Data source settings. Let's use the second option. If you need to know how the M Code is getting changed, open it and see after the change.

10. Click Data source settings in the Home ribbon. This opens the Data Source Setting dialog box.


11. Select the source and click Change Source.... This opens a dialog box that allows you to change settings related to your source.

12. You should notice that the button given for File Path has an option for creating parameters and selecting a parameter.


13. Select Parameter as the option and select name parameter created.


14. Click OK and close the Data source settings.

15. All set. Click Close & Apply for saving the queries and closing Query Editor.

16. Create a visual as you can distinguish results when different file is selected. This is what I created from my file.


Let's change the source using the parameter now.

17. Click down-arrow in Edit Queries in Home ribbon. You should see three options. Select the last one that is Edit Parameter.


18. You should see Enter Parameters dialog box that shows all parameters. Change the source file to different one.


19. Once clicked OK, you should be prompted for Applying. Apply the settings and you should see the report with new source now.


Like this way, we can use the parameter in other sections like Filtering rows in Query Editor, Replacing, etc. We still cannot use the parameter with DAX and it is one limitations.

Saturday, June 17, 2017

Relational Algebra - Joins - Theta Join, Equijoin, Natural Join, Outer Join, Semijoin

I wrote a post on Relational Algebra that discusses most of operations related to it. This is the continuation of it and this explains Join Operations related to Relational Algebra. You may find that it is different with Joins we do now but it is the foundation for all type of joins we do with our relational databases.

Join Operations

Join is one of the main operations we perform against relations (tables) for retrieving data. It is done over the Cartesian Product of the two operand relations, using a SELECT statement with a Predicate. We are familiar with Joins like INNER JOIN, OUTER JOIN and CROSS JOIN but initially there were introduced with types like Theta Join, Equijoin, Natural Join, Outer Join and Semijoin. Modern DBMSs have enhanced these and have different implementations and that is why we do not see these types with mentioned names. But let's try to understand each of these types and how they are represented with Relational Algebra. Note that I have used different SQL Statement but it can be implemented using many techniques. Remember, almost all Joins are based on Cartesian Products.

Theta Join

This is based on a Predicate added to a Cartesian Product. In simple term, if you have joined two tables using CROSS JOIN, then you can add a filter to the result using one of the comparison operators. See the example given. Note that it can be implemented using SELECTION over a Cartesian Product as well.


Equijoin

This is same as Theta Join but the comparison operator is equal. Generally, if the operator of the Theta Join is equal operator (=), then the join is called as Equijoin instead of Theta Join, Here are two examples;



Natural Join

Natural Join is an Equijoin of two relations over all common attributes. In other words, when joining two tables, join is done using all common columns. Therefore, explicit Predicate is not required. See the sample given. I have used NATURAL JOIN which is not available with some DBMSs. Note that Common Attributes are not duplicated.


Outer Join

This join type includes both matching and no matching values from one relation and matching values from the other relation when two relations are joined. The relation that returns all tuples is determined using the Symbol used for the operation. If the Symbol is opened for the Left Relation, it is considered as the relation that returns all tuples. This is implemented using either LEFT or RIGHT in SQL.


Semijoin

Here is the last one. This join performs a join operations over two relations and projects over the attributes of first operand (or the relation). With this join, tuples can be limited for the join operation by adding a predicate, increasing the performance of the join operation.

 

Wednesday, June 14, 2017

Power BI - Scheduled Refresh section is not available in Settings

Power BI Desktop allows us to import data from files such as CSV and Excel. Once imported, it automatically creates a Model that can be used for creating reports. Good thing is, when we publish the report, Dataset that is created with the model is also getting published. This makes sure that we can create a schedule and refresh the Dataset published for seeing latest.

See below image. It shows the Gateway connection and Scheduled refresh settings for an uploaded text file. The Dataset created for the file is PowerBI Test Report.


Now if I get the settings related to the Dataset called Sales_201501;


I will not see same settings. What is the reason? It is also based on a text file.

This will happen when the uploaded Dataset is not a Model. This file has been uploaded using Power BI Services (app.powerbi.com), not using Power BI Desktop. Unlike Power BI Desktop, Power BI Services does not create a Model when a file is uploaded and uploaded file will be sill treated as a Dataset but not as a Model.

If you need to get your Dataset continuously refreshed by creating a schedule, then make sure that it is created using Power BI Desktop.

You can get the uploaded file refreshed automatically if the file is in either SharePoint or OneDrive.

Saturday, June 10, 2017

Power BI - Cannot highlight or pin visuals when grouped with shapes

Power BI allows us to add shapes like rectangle or oval for marking specific area, showing covered visuals as they are related and operate together. This is really useful when you need to highlight a group of visuals that has same behavior or the visuals are related to same business process.


When you have visuals, you can add the required shape and change properties as you need. The below image shows an added rectangle with specific Title and filled color with transperancy set to 75%.


Once added and set, it can be nicely seen with the report. But you may notice that you have lost some of the functionalities. You cannot highlight a slice of Pie Chart and make other visuals filtered. You cannot pin the visuals for Dashboards when they are published to online service.


However, this can be handled. The reason for this is, the added rectangle appears on top of visuals. If you send the rectangle "back", then this issue gets sorted.


As you see, now you can highlight slices in the Pie Chart and you can pin visuals to a Dashboard.


Wednesday, June 7, 2017

Naming columns in the View

How do you name columns returned from your created view? Generally, you do not need to explicitly name them as the names of view-columns are derived from base-tables. But, if there is any ambiguity in the name for a column or calculated column, then it must be named explicitly.

What are the ways of naming columns?

There are two ways of naming columns in the view. One way is, mentioning names along with CREATE VIEW statement. This requires names for all columns, you cannot just name set of columns in the view. Here is an example of it;


Other way is, adding aliases to columns. Most of us prefer this way because we do not need to name all columns explicitly. In most cases, we need to name columns that has ambiguity in the name or it is a calculated column. Remember if above technique is used for naming columns, aliases added will be overwritten.



Tuesday, June 6, 2017

Creating Horizontal Views and Vertical Views

Everyone has either used or created views for getting or creating a dynamic result using one or multiple tables. And everyone knows the usage of views and purpose of creating them. But many do not how they have been categorized; such as Horizontal and Vertical. Here is a simple explanation on them.

Horizontal Views

If you think about the standard definition of a view, it is a virtual relation (table) that is result of one or more relational operations on one or more relations (tables). In simple terms, is a logical table defined with a SELECT statement based on one or more tables. Now, what is a Horizontal View? It is a view that limits the number of records to be produced. We create Horizontal Views by adding WHERE clause.

USE AdventureWorks2014;
GO

CREATE VIEW dbo.Sales_2012
AS
SELECT * FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO

Vertical Views

This view results set of selected columns from one or more tables. We implement this by selecting required columns, optionally adding the WHERE clause.

USE AdventureWorks2014;
GO

CREATE VIEW dbo.Sales_2012_Summary
AS
SELECT SalesOrderNumber, OrderDate, SubTotal 
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO

In addition to that, there is another type called Grouped and joined views that is actually either a Horizontal View or Vertical View with aggregations and multiple joins, hence I do not think that it has to be taken as a major type.


Monday, June 5, 2017

What should be considered on Visualization?

Creating a report with stunning, meaningful and appropriate visuals is the key for delivering the information quickly, hence special consideration has to be made when designing reports. Otherwise, the created report will not be attractive and users will not see what they want and as they want.


There are few things need to be considered when designing reports;
  • Placement - you need to make sure that visuals added to the report are placed or positioned at the right place. How do we decide? Generally, most important visuals (A KPI for an example) should be placed in the top-left corner of the report because it is the place everyone looks at immediately. Not only that, when the report is opened using a mobile device, this makes sure that the important ones are displayed first. In addition to that, make sure the space between visuals is consistence and sized appropriately.
  • Appropriate Visuals - If something can be displayed using a Column Chart does not mean that it is the best visual for showing it. Therefore, right visual should be selected for displaying information. For an example, if just one value has to be shown, something like Revenue for the month, a visual like KPI or Card can be used. If something needs to be shown with values against a set of parameters, something like, showing revenue by month should be displayed with Column Chart. If something needs to be shown against set of parameters using percentage, not values, Pie or Donut will be appropriate.
  • Story Telling - Make sure that you report contains relevant and related information only, it should not be cluttered. If the report is created for Sales Department, make sure it shows information related to it. Not only that, if the report contains set of visuals related another area, example, a process, highlight it or group them. You can add a unfilled rectangle covering visuals related.
  • Formatting - The size of the visual, font used, font size, color and labels, all matters. Make sure that the title is properly displayed for visuals and you can make some visuals large and some small based on the importance of them. Same goes for font size. If it is a KPI and it is most important one, increasing the font size of it will definitely add a value to the report. Another important thing in formatting is, showing numerals properly. No need to show the whole number unless the visual is something like table. Generally, if the number is really large, consider to show it in thousands or millions.
  • Coloring - There are two things to consider with coloring; background color and colors on visuals. Generally light background color will do a lot as visuals can be highlighted with most of the colors. Therefore it is recommended to make the background light, something like white. However, making the background dark has become a trend because it makes the dashboard elegant. But when selecting colors for visuals, you need to be very careful as dark colors in visuals may not blend properly with a dark background. Another consideration is, company theme. Your company may have a color theme for all web applications and you might have to stick to it. If so, select appropriate colors for visuals based on the theme.
    What about colors used with visuals? One thing you need to remember is, the standard color-set used with BI. Generally, we use Green for Excellent, Amber for Neutral and Red for worse. Therefore, you should use a color like Red for indicating a success of something. Another thing is, do not use all these three colors with all types of visuals. For an example, when a KPI is shown, one color is used for either coloring font or background for the visual that is fine. Similarly, these three colors can be used with a tabular visual with all possible cells that is something you need to consider. You may see your table busy when many cells are colored with all three colors, hence it is better to color only one type of cells with one color.
There are many more tiny things to consider but above considered as the most important considerations. List anything if you think that I have missed some.

Sunday, June 4, 2017

Calculate the average value against a large table - SQL Server Brain Basher of the Week #068

Let's talk about another Interview Question related to SQL Server development. This is about one of the aggregate functions that is AVG. Here is the question;

You have a table called SalesOrderHeader and it has millions of records. It has a column called Freight and you need to the average of it. You need to make sure that only 10% of records is used for calculating the average.

What would be the best way? The standard answer is, write a query to get 10% of records using TOP and calculate the average.

SELECT AVG(t.Freight) AverageFreight
FROM (SELECT TOP 10 PERCENT Freight
 FROM Sales.SalesOrderHeader) t

But this might not give you the accurate average as you consider only first set of records. It will be more accurate if you pick records randomly and then calculate the average. You may add ORDER BY to your statement with NEW_ID function.

SELECT AVG(t.Freight) AverageFreight
FROM (SELECT TOP 10 PERCENT Freight
 FROM Sales.SalesOrderHeader
 ORDER BY NEWID()) t

Second method gives the most accurate value as it picks records randomly. However the cost is high with the statement. There is another way to achieve the same. It is using TABLESAMPLE operator.

SELECT AVG(Freight) AverageFreight
FROM Sales.SalesOrderHeader
TABLESAMPLE (10 PERCENT)

Here is the comparison between both methods. Notice the total cost.


As you see, TABLESAMPLE gives better performance than the first method.

Since it picks records randomly, the average it returns different at each execution. If you need to same value for all your executions, REPEATABLE option has to be used with repeat_seed

SELECT AVG(Freight) AverageFreight
FROM Sales.SalesOrderHeader
TABLESAMPLE (10 PERCENT) REPEATABLE (1)

You will get the same average as long as the repeat_seed is same.

Saturday, June 3, 2017

Power BI - No page to enter SQL Server credentials when connecting - Here is the reason

Have you experienced that Power BI just connects with your SQL Server without requesting your login credentials? Or you might need to connect with your SQL Server using someone else credentials but Power BI continues to the next page without prompting the page related to credentials.

This is not a bug or issue, this is because of the cache maintained by Power BI Desktop.

When I try to connect with my SQL Server using Power BI Desktop, I get the initial page where I can enter server name and database name. But when I click on OK, it straightaway opens the page for table selection instead of showing the page related to authentication.


I know that this happens because I have connected to the same server before and my connection is cached. What if I want to connect the same using a different account?

Clearing the data source cache

It is possible to clear the saved information related to previously connected sources. You can open this setting using File menu -> Options and settings -> Data source settings. This page shows all connections you have made previously; Data sources in current file shows all connections you have made with opened file and Global permissions shows all connections you have made with all files


You can clear cached permissions by selecting the connection and clicking Clear Permissions. Once cleared, you should see the page that requests credentials for connecting with the source.


Friday, June 2, 2017

Power BI does not show all columns in SQL Server View

I experienced a strange issue with Power BI today, I might have missed something or there is something unknown to me. This is what I experienced; Power BI does not show all columns when trying to import records from a SQL Server view.

Just for simulating the issue, I created a view with AdventureWorks2014, using three tables, Product, ProductSubCategory, and ProductCategory, and named as vProduct.

USE [AdventureWorks2014]
GO

CREATE OR ALTER VIEW dbo.vProduct
AS
 SELECT p.ProductID, p.Name Product, s.Name SubCategory, c.Name Category
 FROM Production.Product p
  INNER JOIN Production.ProductSubcategory s
   ON p.ProductSubcategoryID = s.ProductSubcategoryID
  INNER JOIN Production.ProductCategory c
   ON s.ProductCategoryID = c.ProductCategoryID;
GO

Then I granted SELECT permission to Jack user.

GRANT SELECT ON dbo.vProduct TO Jack;
GO

If Jack executes a SELECT against the view, he sees records with all columns.


Now, if Jack connects to the database with Power BI, this is what he sees.


Anyone has experienced this? I made a search but could not find a solution. I might have made a mistake or missed something but could not figure it out. Appreciate if you can share your thoughts or a solution if you have already seen this and sorted out.

I used the latest Power BI version that is May 2017.

Note that I see the same result for both Import and Direct Query.