Tuesday, October 1, 2013

SQL Server: How much stuff in this database is compressed?



One of the best ways to reduce IO waits is to do less IO :).  Compression/decompression comes at a CPU cost*, but if a system has CPU to spare while experiencing a storage bandwidth, latency, or capacity issue then compression should be one of the first considerations.

This is just a starting point for evaluating the use of compression in the database.  Prolly also makes sense to look for compression candidates based on size (might not make sense to compress a table that will never consume more than 8 pages) and qualification for online index rebuild (some data types cannot be rebuilt online).    
 
SELECT
   pt.data_compression_desc,
   COUNT(pt.data_compression_desc) AS [partition_count],
   SUM(ptstats.used_page_count) * 8 AS [UsedPagesKB],
   SUM(ptstats.reserved_page_count) * 8 [AS ReservedPageKB]
FROM
   sys.dm_db_partition_stats AS ptstats
JOIN
   sys.partitions as pt
   ON
   ptstats.partition_id = pt.partition_id
GROUP BY
   pt.data_compression_desc


*Lets talk a little about the CPU cost of compression.  First of all - don't just compare peak CPU busy before and after implementing compression :)  Got to also consider how long a particular workload takes to complete.  If the workload took 4 hours to complete before compression at an average of 40% CPU utilized, and it now takes 2 hours at 85% CPU utilization... its not fair to say that compression had a 45% CPU utilization overhead.  If the workload completed in half the time (2 hours) at twice the CPU utilization (80%), the compression would have had no net overhead.  As it is, in half the time at more than double the CPU utilization it is about a 5% overhead.

Its best if in the baseline, you can actually sum the worker time, physical IO, and logical IO across all queries in the workload.  Compare that to worker time, physical IO, and logical IO after implementing compression.  Sure, go ahead and throw in workload elapsed time and average CPU utilization :)  

Its important to note that the net CPU overhead of compression may be LESS than expected based on the amount of CPU ticks involved in compress/uncompress operations.  Here's why: 
1. You'll likely do less total physical IO.  Transfering physical IO contents into the bpool takes a small amount of CPU.
2. The database pages are compressed in memory - so a given query of page compressed contents (assuming reasonable compression rate) should perform fewer logical reads than the same query against uncompressed data.  Fewer logical reads means less total CPU consumption.

So, compress/uncompress gobbles up some CPU.  But fewer physical and logical IOs will give some back.  And most systems I observe have a lot more CPU resources to spare than IO resources.  

No comments:

Post a Comment