Thursday, August 8, 2013

AIX hotspot hunting and potential benefit of tier 0 cache: lvmstat over-time data

Recently I posted a method for tacking a timestamp onto lvmstat output with awk.  Maybe you think that's not a big deal.  In my world, it is :)
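
The gist was something along these lines - pipe lvmstat into awk and let awk stamp each line as it arrives.  A minimal sketch (the LV name, the top-32 count, and the 60-second interval are placeholders, not necessarily what I used in that post):

    # stamp each lvmstat sample line with the time it arrived
    # (datalv01, -c 32, and the 60-second interval are placeholders)
    lvmstat -l datalv01 -c 32 60 | awk '{
        cmd = "date +%Y-%m-%dT%H:%M:%S"
        cmd | getline ts
        close(cmd)
        print ts, $0
    }'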

The graph above is a heat map of a logical volume with a JFS2 filesystem on top.  It's cio mounted, and has tons of database files in it.  With this particular database engine, all reads and writes to a cio filesystem are 8k (unlike Oracle and SQL Server, which will coalesce reads and writes when possible).  Also, all database reads go into database cache, unlike Oracle, which can elect to perform direct path reads into the PGA rather than into the SGA database cache.  Those considerations contribute to the nice graph above.

The X axis is the logical partition number - the position from beginning to end in the logical volume.
The left primary axis represents the number of minutes, out of 360 elapsed minutes, during which the logical partition was ranked in the top 32 for the logical volume by iocnt.  (Easy to get with the -c parameter of the lvmstat command.)
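
Turning the timestamped lvmstat output into those per-partition counts is just a tally of how many sampling intervals each logical partition shows up in.  A rough sketch, assuming the log has the timestamp in the first field and lvmstat's Log_part number in the second (the log file name is a placeholder):

    # count, per logical partition, how many samples it made the top 32
    awk '$2 ~ /^[0-9]+$/ { appearances[$2]++ }
         END { for (lp in appearances) print lp, appearances[lp] }' \
        datalv01_lvmstat.log | sort -n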

So... what use is such a graph?  For one thing, together with data from lslv, you can map hot logical partitions back to their physical volumes (LUNs)... that is invaluable when trying to eliminate QFULL occurrences at the physical volume level.  For another, together with database cache analysis (insertion rates at all insertion points, expire rates, calculated cache hold time, etc.), heat maps like these can help estimate the value of added database cache.  They can also help estimate the value of a tier 0 flash automated tiering cache - whether the flash is within the storage array, or in an onboard PCIe flash device.  I've worked a bit with EMC XtremSW for x86 Windows... looking forward to working with it for IBM Power in the PCIe flash form factor.  Hoping to test with QLogic Mt Rainier/FabricCache soon, too.

If the overall IO pattern looks like the graph above, then as long as the flash capacity and the caching algorithm together give good coverage of the logical partitions on the upswing of the hockey stick, you can expect good utilization of the tier 0 cache.  I personally prefer onboard cache, because I like as little traffic leaving the server as possible.  In part for lower latency... in part to eliminate queuing concerns... but mostly to be a good neighbor to other tenants of shared storage.
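
For that lslv piece: lslv -m lists each logical partition next to the physical partition and hdisk it sits on, so joining it against the hot-LP list tells you which LUNs are carrying the heat.  A rough sketch (the LV name and hot_lps.txt, a file with one hot LP number per line from the tally above, are placeholders):

    # map hot logical partitions to the hdisks (LUNs) they live on
    lslv -m datalv01 | awk '
        NR > 2 { map[$1 + 0] = $3 }          # LP number -> first-copy hdisk
        END {
            while ((getline lp < "hot_lps.txt") > 0) perdisk[map[lp + 0]]++
            for (d in perdisk) print d, perdisk[d]
        }'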

So... in reality, the system I'm staring at today isn't as simple as a single logical volume heat map.  There are more than six logical volumes that contain persistent database files.  All of those logical volumes have hockey stick shaped graphs like the one above.  It's easy to count the upswing logical partitions across all of those LVs and find out how much data I really want to see in the flash cache.  Now... if the flash cache transfer size is different from the logical partition size, that should be considered.  If my logical partition size is 64 MB, and the flash tiering always promotes contiguous 1 GB chunks from each LUN, containing all of my hot data could require a lot more flash capacity.  On the other hand, if the transfer size into tier 0 cache is 1 MB, and the heat is very uneven among the 1 MB chunks within each hot LP... the total flash cache size needed for a huge benefit might be a lot smaller than the aggregate size of all hot LPs.  Something to think about.  But I can't give away all of my secrets. Keeping at least some secrets is key to sasquatch survival :)
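
To put rough numbers on that trade-off (these figures are made up for illustration, not measurements from this system): 200 hot LPs at 64 MB is about 12.5 GB of genuinely hot data, but 1 GB promotion chunks could drag in up to 200 GB, while 1 MB promotion with only a quarter of each hot LP actually warm needs closer to 3 GB.

    # back-of-the-envelope sizing; every input below is an illustration value
    awk 'BEGIN {
        hot_lps = 200                 # hot LPs counted across all LVs
        lp_mb   = 64                  # logical partition size in MB
        printf "hot data:                   %.1f GB\n", hot_lps * lp_mb / 1024
        printf "1 GB promotion, worst case: %d GB\n",   hot_lps * 1
        printf "1 MB promotion, 25%% warm:   %.1f GB\n", hot_lps * lp_mb * 0.25 / 1024
    }'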
