Monday, February 15, 2010

Are you calculating Average correctly?

Almost all engineers have used the AVG aggregate function, but have you ever checked whether the result is correct? This was discussed while I was conducting my new SQL class, here is an example for explaining it, where you might make a mistake;

   1: CREATE TABLE dbo.EmployeeCommision
   2: (
   3:     EmployeeId int PRIMARY KEY,
   4:     CommisionGiven money NULL
   5: );
   6:  
   7: -- Inserting five employees' commison records
   8: -- In real implementation, two tables 
   9: -- will maintain for holding employees records
  10: -- and commison records
  11: INSERT INTO dbo.EmployeeCommison 
  12:     (EmployeeId, CommisionGiven)
  13: VALUES
  14:     (1, 50), (2, 50), (3, NULL), (4, 50), (5,50)
  15:   
  16: -- Calculating the average commision given 
  17: -- to an employee, You may use LEFT OUTER JOIN 
  18: -- if two tables are maintained
  19: SELECT AVG(CommisonGiven) 
  20: FROM dbo.EmployeeCommision
  21: -- Result of above is 50, which is wrong in 
  22: -- this case to correct the issue, use 
  23: -- ISNULL function with AVG function
  24:  
  25: SELECT AVG(ISNULL(CommisonGiven, 0)) 
  26: FROM dbo.EmployeeCommision
  27: -- Now the result is 40, which is correct.

The reason for this is because, all aggregate functions ignore NULL values. Only exception is COUNT(*) that counts NULL values.

2 comments:

Prithiviraj kulasinghan said...

When you execute a query like this, you will see a warning message:

Warning: Null value is eliminated by an aggregate or other SET operation.

This message could be removed by executing SET ANSI_WARNINGS OFF statement.
By Default, if you have a null value it is eliminated from aggregate functions except COUNT(*)

The reason for this lies with the interpretation of NULL. It could be Not applicable. For example, if you have spouse's age in a column and want to find the average, you should eliminate those who don't have one.
Of course, NULL is misused these days. That's one of the reason MS asks people to minimize the use of NULL.

Dinesh Priyankara said...

Thanks for the comment Preethi, for some reason, I have missed the explanation.... updated the post.