Showing posts with label TOP. Show all posts
Showing posts with label TOP. Show all posts

Friday, May 27, 2016

SET ROWCOUNT overrides TOP?

While I was going through an implemented set of codes, I noticed that ROWCOUNT is heavily used (of course, it is an old implementations) and not only that, TOP is with newly added codes. There is an interesting behavior when these two are combined, which was one of the issues I was troubleshooting. Let me explain with an example.

First of all, remember, there is a Behavior Change with ROWCOUNT in SQL Server future release. There will be NO effect on INSERT, DELETE and UPDATE statement with ROWCOUNT in SQL Server future release. Therefore avoid ROWCOUNT with these action statements, instead use TOP.

Have a look on below query. It sets ROWCOUNT to 100, instructing that stop processing the query after 100 rows are returned. However, TOP is used with SELECT, limiting records to 10 records. Because of that we get only 10 records. This proves that TOP overrides ROWCOUNT.

USE AdventureWorks2016CTP3;
GO

SET ROWCOUNT 100;
SELECT TOP (10) *
FROM Sales.SalesOrderHeader;


Now let's change the query bit. Let's instruct SQL Server to process only 10 records using ROWCOUNT and have TOP 100 with SELECT statement.

USE AdventureWorks2016CTP3;
GO

SET ROWCOUNT 10;
SELECT TOP (100) *
FROM Sales.SalesOrderHeader;

What we expect is 100 records as we know that TOP overrides ROWCOUNT but it returns only 10.


What is the reason? This is how it works when ROWCOUNT and TOP are combined. ROWCOUNT overrides TOP keyword in the SELECT only if the set value for ROWCOUNT is smaller that value set with TOP. This is something you need to remember. Best is, try to use only TOP without using ROWCOUNT.

Monday, July 13, 2015

MDX Top function

If we need get Top n records from a relational database table, we simply use SELECT TOP (n).... But how do you do the same with Multi-dimensional database like Analysis Services databases? MDX is the language we used for querying Analysis Services databases, or Cubes. MDX function called TopCount  is the known function for similar operation but it is believed that it has to be used with a numeric expression and the result of TopCount is based on values of the numeric expression used.

In a way, it is true. But what if we need to get a set of dimension members using Top functionality without using a numeric expression. To elaborate, if I want to get 5 products from Products table, I can write something like below using TSQL;

USE AdventureWorks2014;
GO

SELECT TOP (5) Name
FROM Production.Product;

If I want to do the same with Product Dimension in a cube, how can it be done? Can I use TopCount without using a numeric expression? Yes, it is possible. The numeric expression for TopCount function is optional, hence it works like below;

SELECT 
 {} ON 0
 , {TopCount([Product].[Product Categories].[Product].Members, 3)
  } ON 1
FROM [Adventure Works];


In addition to that, Head  function can be used too.

SELECT 
 {} ON 0
 , {Head([Product].[Product Categories].[Product].Members, 3)
  } ON 1
FROM [Adventure Works];


Sunday, April 24, 2011

TOP WITH TIES: Have you shown your TOP 10 Customers properly?

Think about a scenario where you have to show your top 10 customers based on their purchases. If the record set is as below;

image

and if you show first 10 records using TOP 10, have you shown all best customers? See the 11th record, it is same as 10th order, hence he should be added to top-10 list. In this case, top-10 list becomes top-11 list. But we cannot change the query as TOP 11 because it is not always guaranteed that 11th record is same as 10th record. In that case, what would be the way?

The solution is, TOP 10 WITH TIES. It adds additional rows to the resultset if any tied rows for 10th are there. See the code below;

image

Note that WITH TIES can be only used with SELECT statement and ORDER BY must be used too.

Monday, February 22, 2010

Understanding the TOP and TABLESAMPLE operators

Result set can be filtered in many ways. Most common ways is, using the 3rd primary property of the SQL SELECT statement which is the WHERE condition which filters specific set of rows from the result set.

This post is not about WHERE condition, it is all about TOP and TABLESAMPLE.

The TOP operator gives you top n number of records as you need. The “top n” can be changed by changing the order of the result set by using ORDER BY clause.

The TABLESAMPLE operator allows you to randomly pick data from the table. You can instruct to SQL Server to return specific number of records or percent of rows.

Where we can use this? I think that this is commonly used for getting the average from a large result set. If you have a table with millions of records and average has to be calculated on one of the columns, it is worthwhile to use TABLESAMPLE (10 PERCENT) rather than using all records.

   1: -- use all rows in the table for calculating
   2: SELECT AVG(Freight)
   3: FROM Sales.SalesOrderHeader
   4: -- use approximately 10% of rows
   5: SELECT AVG(Freight)
   6: FROM Sales.SalesOrderHeader
   7: TABLESAMPLE (10 PERCENT)

REPEATABLE OPTION
If you execute the above two codes again and again, the average returns from first statement is always same but second statement. This is because of the way it picks records. If you need the same average for all the execution of second statement, REPEATABLE option should be used. The REPEATABLE option has to be used with a repeat_seed, and as long as the repeat_seed is same and no records have been change, same average is returned.

   1: SELECT AVG(Freight)
   2: FROM Sales.SalesOrderHeader
   3: TABLESAMPLE (10 PERCENT) REPEATABLE (1)

What is SYSTEM option?
The SYSTEM option is optional, but it is used by default though you do not use it in your query.

   1: SELECT AVG(Freight)
   2: FROM Sales.SalesOrderHeader
   3: TABLESAMPLE SYSTEM (10 PERCENT)

The SYSTEM option returns approximate percentage of rows and generates a random value for each data page. SQL Server decides which data pages to be included for the sample based on the random value generated and the percentage specified in the query. If a page is decided to to be included, all rows in the page is included for sampling, else the page is excluded. If you use set the TABLESAMPLE with number of rows, instead of a percentage, number of rows will be converted to a percent and process same way.