Sunday, March 30, 2014

Magic Quadrant for Business Intelligence and Analytics Platforms – 2014 February

Gartner has published its Magic Quadrant for Business Intelligence and Analytics Platforms, which evaluates 17 capabilities grouped into three categories. The evaluation treats business intelligence and analytics as a software platform.

I have summarized the capabilities used for the evaluation in a simpler format. You can read the entire document at: http://www.gartner.com/technology/reprints.do?id=1-1QYUTPJ&ct=140220&st=sb.

  • Information delivery
    • Reporting: ability to create interactive reports with or without parameters
    • Dashboards: ability to display operational or strategic information using graphical interfaces such as graphs, gauges and maps.
    • Ad hoc report/query: ability to get required information without IT support, working from predefined sources, metrics, hierarchies, etc.
    • Microsoft Office integration: ability to use Microsoft Office (Excel, etc.) as a reporting and analytics client, including advanced features like cell-locking and write-back.
    • Mobile BI: ability to develop and deliver content to mobile devices and use native device capabilities such as the touchscreen, camera, etc.
  • Analysis
    • Interactive visualization: ability to interact with information through advanced visual representations, such as heat maps and tree maps, that go beyond traditional charts.
    • Search-based data discovery: ability to search both structured and unstructured data sources and map them easily into a classification structure of dimensions and measures.
    • Geospatial and location intelligence: ability to combine geographic and location-related data from multiple sources, such as aerial maps and GISs, with enterprise data and display the results as overlays on maps. Advanced features such as distance and route calculations, geofencing and 3D visualizations are considered too.
    • Embedded advanced analytics: ability to leverage a statistical function library and consume analytics methods such as Predictive Model Markup Language (PMML) models, R-based models, forecasting and “what-if” analysis with visualizations.
    • Online analytical processing (OLAP): ability to provide OLAP features such as drill-down, slicing and dicing, and write-back with “what-if” analysis, using data architectures such as relational, multidimensional or hybrid and storage architectures such as disk-based or in-memory.
  • Integration
    • BI infrastructure and administration: ability for all tools in the platform to share the same components, such as security, metadata, etc. The platform should support multitenancy.
    • Metadata management: ability to manage system-of-record semantics and metadata centrally, including user-defined data mashups and metadata.
    • Business user data mashup and modeling: ability to create user-defined analytical models connecting multiple multi-structured sources using code-free, drag-and-drop functionality. Advanced features such as semantic auto-discovery, intelligent joins, etc. are considered too.
    • Development tools: ability to provide a set of programmatic and visual tools for developing objects such as reports, dashboards, models, etc.
    • Embeddable analytics: ability to create and modify analytic content, visualizations and applications, and to embed them into business processes.
    • Collaboration: ability to start discussion threads on information, analysis, analytic content, etc.
    • Support for big data sources: ability to connect with hybrid, columnar and array-based sources such as MapReduce.

Thursday, March 20, 2014

Door-Quiz – SQL Server Sri Lanka User Group Meet-up

We had another Door-Quiz at SS SLUG yesterday and, as usual, the question was tricky. However, many circled the correct answer and one lucky person grabbed the goodie-pack. Here is the question asked:

image

Whenever we see a numerator divided by 0, we expect an error, because division by zero is undefined. If we execute the SELECT statement on its own, SQL Server does throw a divide-by-zero error. The tricky part of this question is combining that fact with the EXISTS predicate. In this scenario, EXISTS only checks whether the SELECT would return a row; it does not evaluate the expressions in the select list the way a standalone SELECT would. Since the predicate evaluates to TRUE, SQL Server goes on to PRINT 'yes' without raising any error.

The answer to the question is “yes”.
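The original question was posted as an image, so the following is only a minimal sketch of the behaviour described above, not necessarily the exact quiz code. The literal `1/0` and the 'no' branch are my own assumptions for illustration.

```sql
-- Run on its own, this SELECT fails with a divide-by-zero error.
SELECT 1/0;
GO

-- Inside EXISTS, SQL Server only checks whether the subquery
-- would return a row; the select list is not evaluated,
-- so no error is raised and 'yes' is printed.
IF EXISTS (SELECT 1/0)
    PRINT 'yes';
ELSE
    PRINT 'no';
GO
```

The GO between the two statements keeps the failing SELECT in its own batch, so the second batch still runs.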

Thursday, March 13, 2014

Who invented dimensional modeling?

Who introduced or invented Dimensional Modeling? Was it invented by Ralph Kimball? Since his business intelligence strategies are completely aligned with Dimensional Modeling, many believe that it came from him. Unfortunately, this is one of the misperceptions about Dimensional Modeling that persist in the industry.

What is Dimensional Modeling?
Dimensional Modeling is a technique that has been used for decades for structuring databases in a simple manner, supporting business users’ requirements such as understanding the data, navigating through it easily, and running both simple and complex data analysis queries. A dimensional model implemented in a relational database management system is called a star schema, and a dimensional model implemented in a multi-dimensional database is called an OLAP cube.
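To make the star schema idea concrete, here is a minimal sketch of one fact table surrounded by two dimension tables. All table and column names (DimDate, DimProduct, FactSales and their columns) are hypothetical and chosen only for illustration.

```sql
-- Dimension tables: descriptive attributes users filter and group by
CREATE TABLE dbo.DimDate
(
    DateKey      int      NOT NULL PRIMARY KEY,
    CalendarDate date     NOT NULL,
    CalendarYear smallint NOT NULL
);

CREATE TABLE dbo.DimProduct
(
    ProductKey  int          NOT NULL PRIMARY KEY,
    ProductName nvarchar(50) NOT NULL,
    Category    nvarchar(50) NOT NULL
);

-- Fact table: numeric measures plus a key to each dimension
CREATE TABLE dbo.FactSales
(
    DateKey     int   NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  int   NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity    int   NOT NULL,
    SalesAmount money NOT NULL
);
GO

-- A typical analysis query: join the fact to its dimensions and aggregate
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
    INNER JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
    INNER JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.CalendarYear, p.Category;
GO
```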

Who invented this?
The recorded history is limited; however, according to the records that exist:

  • The terms “Facts” and “Dimension” were introduced with a joint project developed by General Mills and Dartmouth University in the 1960s.
  • During the 1970s, the Nielsen Marketing Research team used these techniques for grocery and drug store data.
  • In the 1980s, Nielsen Marketing Research and IRI used grocery and drug store scanner data to link with customers’ internal shipment data.

According to this history, no single person holds the credit for introducing the concept; it evolved through the contributions of many.

Read more about Dimensional Modeling: http://en.wikipedia.org/wiki/Dimensional_modeling

Why does SQL Server not drop and recreate the procedure when the code is written properly?

This is a bit funny: it took me a few minutes to figure out why I was getting the “There is already an object named '' in the database.” error when the code for dropping and recreating the object appeared to be written properly. It is one of the key things to remember when coding, hence sharing the experience.

Have a look at the code and the error thrown. Note that this is not the actual code; the code below is written only to show the scenario. Please assume that there are many other statements before and after the code given.

  1.  
  2. ...
  3. ...
  4.  
  5. GO
  6.  
  7. -- dropping GetSalesFor2007 if exist
  8. -- and creating
  9. IF OBJECT_ID('dbo.GetSalesFor2007') IS NOT NULL
  10.     DROP PROC dbo.GetSalesFor2007
  11. GO
  12. CREATE PROCEDURE dbo.GetSalesFor2007
  13. AS
  14. BEGIN
  15.     
  16.     SELECT SalesOrderID, SalesOrderNumber
  17.     FROM Sales.SalesOrderHeader
  18.     WHERE OrderDate BETWEEN '01/01/2007'
  19.         AND '12/31/2007 23:59:59.000'
  20. END
  21.  
  22. -- dropping GetSalesFor2008 if exist
  23. -- and creating
  24. IF OBJECT_ID('dbo.GetSalesFor2008') IS NOT NULL
  25.     DROP PROC dbo.GetSalesFor2008
  26. GO
  27. CREATE PROCEDURE dbo.GetSalesFor2008
  28. AS
  29. BEGIN
  30.     
  31.     SELECT SalesOrderID, SalesOrderNumber
  32.     FROM Sales.SalesOrderHeader
  33.     WHERE OrderDate BETWEEN '01/01/2008'
  34.         AND '12/31/2008 23:59:59.000'
  35. END
  36.  
  37. ...
  38. ...

image

As you can see, SQL Server tries to create the second procedure (Line no: 27) without first dropping it with the DROP PROC statement given (Line no: 25). The reason is simply a missing batch separator between the end of the first procedure and the DROP statement of the second procedure.

What is a batch?
A T-SQL batch is a collection of SQL statements that are parsed, normalized and executed as a single unit. The end of a batch is indicated with GO, which is a batch separator recognized by client tools such as Management Studio and sqlcmd rather than a T-SQL statement (Read more on GO: http://dinesql.blogspot.com/2009/01/separating-batches-using-new-word.html). There are two important points to remember when working with T-SQL batches:

  1. The scope of a variable is bounded by the batch separators, hence a variable declared in one batch cannot be used beyond the next GO. In other words, a variable cannot be used in any batch other than the one in which it is declared.
  2. Some DDL statements, such as CREATE PROCEDURE and CREATE VIEW, must be the first statement in a batch and cannot be combined with other statements.
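The first point can be seen with a short sketch. The variable name @SalesYear is hypothetical and used only for illustration.

```sql
-- Batch 1: the variable is declared and used in the same batch
DECLARE @SalesYear int = 2007;
SELECT @SalesYear AS SalesYear;
GO

-- Batch 2: @SalesYear went out of scope at the GO above, so this
-- batch fails with "Must declare the scalar variable" (Msg 137)
SELECT @SalesYear AS SalesYear;
GO
```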

Once the second point is clearly understood, the reason for the above error can be figured out. Statements like CREATE PROC and CREATE VIEW cannot be combined with other statements in a batch; they must be the first statement in a new batch. For CREATE PROC, everything from that statement to the end of the batch is treated as the body of the procedure, and the END of the BEGIN...END block does not end the definition. If you look at the DROP statement for the second procedure (Line no: 25), you will see that there is no GO between the end of the first procedure and the IF statement that guards the DROP for the second procedure. Therefore the IF and DROP statements become part of the batch that contains the first procedure, and hence part of the body of the first procedure. Have a look at the output below:

image

As you can see, the DROP statement is in the body of the first procedure. If we place a GO at the end of the first procedure, the DROP statement no longer becomes part of it.
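A corrected version of the scenario code might look like the sketch below. It assumes the same Sales.SalesOrderHeader table used above; the final OBJECT_DEFINITION query is my own addition as one way to check what was stored as the body of the first procedure.

```sql
-- dropping GetSalesFor2007 if exist
-- and creating
IF OBJECT_ID('dbo.GetSalesFor2007') IS NOT NULL
    DROP PROC dbo.GetSalesFor2007;
GO
CREATE PROCEDURE dbo.GetSalesFor2007
AS
BEGIN
    SELECT SalesOrderID, SalesOrderNumber
    FROM Sales.SalesOrderHeader
    WHERE OrderDate BETWEEN '01/01/2007'
        AND '12/31/2007 23:59:59.000';
END
GO
-- The GO above ends the batch, and with it the body of GetSalesFor2007

-- dropping GetSalesFor2008 if exist
-- and creating
IF OBJECT_ID('dbo.GetSalesFor2008') IS NOT NULL
    DROP PROC dbo.GetSalesFor2008;
GO
CREATE PROCEDURE dbo.GetSalesFor2008
AS
BEGIN
    SELECT SalesOrderID, SalesOrderNumber
    FROM Sales.SalesOrderHeader
    WHERE OrderDate BETWEEN '01/01/2008'
        AND '12/31/2008 23:59:59.000';
END
GO

-- Optional check: the stored body of the first procedure should
-- no longer contain the DROP statement for GetSalesFor2008
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.GetSalesFor2007')) AS ProcedureDefinition;
GO
```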

Here are the statements that cannot be combined with other statements in a batch and must start a new one: CREATE DEFAULT, CREATE FUNCTION, CREATE PROCEDURE, CREATE RULE, CREATE SCHEMA, CREATE TRIGGER, and CREATE VIEW.

Sunday, March 9, 2014