Monday, January 2, 2017

Should we use Compress function on string values to reduce the storage cost?

While designing my new database in Azure as an Azure SQL Database, was thinking to manage the cost for the storage, hence thought to apply COMPRESS function where-ever possible even though it does not drastically reduce the cost in terms of finance. However, there are two main things to consider if data is going to be compressed using COMPRESS function;
  • Compressed data cannot be indexed.
  • Compressed data has to be stored in a varbinanry(max) column
If above facts violates the business requirements, then it is not possible to use the function for compressing data. But assume that the column does not need to be indexed and no harm of storing data in compressed format, then it can be simply applied. Now the question is, will it be useful with all values stored in this particular column?

For that, I made a simple test. See code below and the output of it.

DECLARE @Value1 varchar(8000) = 'Compresses the input expression using the GZIP algorithm.'
DECLARE @Value2 varchar(8000) = 'Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max)'
DECLARE @Value3 varchar(8000) = 'Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max). Compressed data cannot be indexed. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. For automatic compression at the row or page level during storage, see Data Compression.'
DECLARE @Value4 varchar(8000) = 'Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max). Compressed data cannot be indexed. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. For automatic compression at the row or page level during storage, see Data Compression.Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max). Compressed data cannot be indexed. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. For automatic compression at the row or page level during storage, see Data Compression.Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max). Compressed data cannot be indexed. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. For automatic compression at the row or page level during storage, see Data Compression.Compresses the input expression using the GZIP algorithm. The result of the compression is byte array of type varbinary(max). Compressed data cannot be indexed. The COMPRESS function compresses the data provided as the input expression and must be invoked for each section of data to be compressed. For automatic compression at the row or page level during storage, see Data Compression.'
 
 SELECT DATALENGTH(@Value1) LengthOfOriginalValueOfValue1
 , DATALENGTH(COMPRESS(@Value1)) LengthOfCompressedValueOfValue1
 , DATALENGTH(@Value2) LengthOfOriginalValueOfValue2
 , DATALENGTH(COMPRESS(@Value2)) LengthOfCompressedValueOfValue2;

 SELECT DATALENGTH(@Value3) LengthOfOriginalValueOfValue3
 , DATALENGTH(COMPRESS(@Value3)) LengthOfCompressedValueOfValue3
 , DATALENGTH(@Value4) LengthOfOriginalValueOfValue4
 , DATALENGTH(COMPRESS(@Value4)) LengthOfCompressedValueOfValue4;


This clearly shows us that we do not get benefits with all types of values stored and it shows and compression works well only with larger values. Therefore I decided to apply this only for few columns. If you have the same requirements, make sure you apply the compression only for columns that has larger values.

No comments: