Thursday, September 17, 2015

Planning for VMware database OIO

I've mentioned before that planning for database disk IO is a negotiation, ideally among application owners, DBAs, and storage administrators (additional stakeholders may be virtualization admins or server administrators).
This blog post will demonstrate why I think of this optimization as a negotiation.

A bunch of thoughts about the blog post below; I'll clean these up and then reference them in the comments there.  Maybe even add pictures?

I don't want this to be received as a rant; it's more a method for planning to accommodate expected levels of database outstanding IO (OIO) on VMware systems.


Disk queues always fill from the application/database level down, e.g. guest LUN => guest vHBA adapter => host LUN => host HBA adapter.

Since there is only one guest LUN for the database, and the maximum guest LUN service queue depth is 254, that's the maximum OIO that LUN will pass to the guest adapter*, which in turn gets passed to the host LUN and on to the host adapter.  Additional queue slots at lower levels won't be used.  The 512-command-per-host-LUN limit leaves lots of empty slots if the guest can only send 254.  Add another guest LUN on the same host LUN, and the max OIO for the two guest LUNs becomes 508, a much better match.
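
Here's a quick sketch of that queue-stacking arithmetic in Python - nothing VMware-specific, just the depths discussed above:

# Rough sketch: how much OIO a stack of service queues will actually pass.
# Each guest LUN offers at most its own service queue depth; the host LUN
# caps the total it will accept from the guest LUNs stacked on it.

GUEST_LUN_QDEPTH = 254   # maximum guest LUN service queue depth
HOST_LUN_QDEPTH = 512    # per-host-LUN command limit discussed above

def max_oio_reaching_host_lun(guest_luns_on_host_lun):
    """Max outstanding IO the guest LUNs can push down to one host LUN."""
    offered = guest_luns_on_host_lun * GUEST_LUN_QDEPTH
    return min(offered, HOST_LUN_QDEPTH)

print(max_oio_reaching_host_lun(1))   # 254 - nearly half the host LUN slots sit empty
print(max_oio_reaching_host_lun(2))   # 508 - a much better match for 512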

*But* the article mentions EMC VNX & VMAX storage. On the VNX, host LUN queue depth should be limited: with virtual provisioning, no greater than 32 to prevent QFULL messages.  Otherwise, (14 * data disks) + 32 is the VNX maximum LUN queue depth to avoid QFULL messages.
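
A one-liner for that traditional-LUN formula, so I don't have to redo the arithmetic each time (the disk counts below are just example RAID group sizes, not recommendations):

def vnx_max_lun_queue_depth(data_disks, virtual_provisioning=False):
    """VNX maximum host LUN queue depth before QFULL risk, per the guidance above."""
    if virtual_provisioning:
        return 32
    return 14 * data_disks + 32

print(vnx_max_lun_queue_depth(4))    # 88  - e.g. a 4+1 RAID 5 group
print(vnx_max_lun_queue_depth(8))    # 144 - e.g. an 8+1 RAID 5 group
print(vnx_max_lun_queue_depth(8, virtual_provisioning=True))   # 32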

The VMAX is a bit more generous: 64 is commonly used as a LUN queue depth, and presented LUNs are guaranteed resources for a queue depth of 64.  If the queue depth is set higher than 64, resources can be borrowed to serve a queue length of up to 384.

So it's not just the front-end port maximum aggregate queue depth that needs to be considered in planning, but the host LUN maximum queue depth as well. Hitachi VSP is another example, with a recommended maximum host LUN queue depth of 32.
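
A rough way to sanity-check the port side of that: make sure the LUNs sharing a front-end port can't collectively offer more outstanding IO than the port's aggregate queue depth. The 1600 below is only a placeholder - substitute the figure for your array and port type.

def port_aggregate_ok(luns_on_port, lun_queue_depth, port_max_aggregate_qdepth):
    """True if the LUNs sharing a front-end port can't collectively overrun it."""
    return luns_on_port * lun_queue_depth <= port_max_aggregate_qdepth

# 1600 is a placeholder port limit, not a vendor number.
print(port_aggregate_ok(luns_on_port=16, lun_queue_depth=32,
                        port_max_aggregate_qdepth=1600))   # True: 512 <= 1600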

The application side determines IO needs, and can design from the top down.  Disk IO needs can be expressed in terms of peak read & write bytes/sec, IOPS, and outstanding IO, along with the duration of the peak.  Further characterization can come from descriptions of burstiness, sequential vs. random access, and the distribution of IO sizes.
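
To make that concrete, here's a minimal sketch of what an application-side IO spec might capture. The keys are my own shorthand, not a standard format, and the numbers are made up:

# Hypothetical application-side disk IO spec for one database.
io_profile = {
    "peak_read_bytes_per_sec":  400000000,    # 400 MB/s
    "peak_write_bytes_per_sec": 150000000,    # 150 MB/s
    "peak_iops":                20000,
    "peak_outstanding_io":      512,
    "peak_duration_sec":        1800,         # how long the peak is sustained
    "bursty":                   True,
    "mostly_sequential":        False,
    "io_size_mix":              {8192: 0.6, 65536: 0.4},   # share of IO by size in bytes
}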

On the hardware side, design from the bottom up.  Read IOPS & read bytes/sec are important considerations for determining a sufficient number of HDDs or other devices to meet the spec. 

Write bytes/sec is used to validate the amount of SAN write cache available.  The rate of destaging that write cache depends on how many devices sit underneath, the RAID level's CPU and write amplification factors, and how many read cache misses must be satisfied while the write cache is destaging.

Total front-end IOPS is used to validate the sufficiency of front-end CPU.

Total bytes/sec (read and write) is used to validate storage network bandwidth end-to-end.
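
Put together, the bottom-up checks could look something like the sketch below. Every capability number in it is a placeholder - get the real figures from the array vendor or your own testing; I'm only showing the shape of the validation.

import math

# Placeholder capability figures - substitute your array's real numbers.
HDD_READ_IOPS      = 180                # assumed per-HDD small-block read IOPS
WRITE_CACHE_BYTES  = 8 * 2**30          # assumed usable SAN write cache (8 GiB)
FRONT_END_MAX_IOPS = 100000             # assumed front-end CPU ceiling
PATH_BANDWIDTH_BPS = 1600000000         # assumed usable end-to-end bytes/sec

def validate_hardware(peak_read_iops, peak_read_bps, peak_write_bps,
                      peak_total_iops, peak_duration_sec):
    hdds_needed = math.ceil(peak_read_iops / HDD_READ_IOPS)
    write_burst = peak_write_bps * peak_duration_sec   # pessimistic: assumes no destaging
    print("HDDs needed for read IOPS:", hdds_needed)
    print("write burst fits write cache (pessimistic):", write_burst <= WRITE_CACHE_BYTES)
    print("front-end CPU sufficient:", peak_total_iops <= FRONT_END_MAX_IOPS)
    print("network bandwidth sufficient:", peak_read_bps + peak_write_bps <= PATH_BANDWIDTH_BPS)

validate_hardware(peak_read_iops=15000, peak_read_bps=400000000,
                  peak_write_bps=150000000, peak_total_iops=20000,
                  peak_duration_sec=30)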

The host LUN queue depth is based on array considerations.
The amount of OIO the application needs then determines the number of host LUNs.  Guest LUNs match that count, with a guest LUN queue depth that agrees with the host LUN queue depth.
Now, present as many host LUNs (with guest LUNs on top) as are needed to meet the outstanding IO requirement.  If the outstanding IO requirement is 512 and the LUN queue depth is 32, a minimum of 16 LUNs are needed.
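
That last step is just a ceiling division; a quick sketch:

import math

def min_luns_for_oio(required_oio, host_lun_queue_depth):
    """Minimum host LUNs (with matching guest LUNs on top) to carry the required OIO."""
    return math.ceil(required_oio / host_lun_queue_depth)

print(min_luns_for_oio(512, 32))   # 16, the example above
print(min_luns_for_oio(512, 64))   # 8, e.g. at a VMAX-style depth of 64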

*An extra consideration, especially for Windows guests: Windows has an aggregate limit of 256 per Windows PhysicalDisk across the wait queue and service queue.  Once (service queue length + wait queue length) reaches 256, additional threads that want to submit IO must sleep until a queue slot opens for them.
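
So if a guest could realistically generate more than 256 concurrent IO requests against a single PhysicalDisk, the files need to be spread over more guest LUNs. A tiny sketch of that check - the 300 is a hypothetical request count:

import math

WINDOWS_PHYSICALDISK_LIMIT = 256   # service queue + wait queue, per PhysicalDisk

def guest_physicaldisks_needed(peak_concurrent_requests):
    """Minimum guest PhysicalDisks so no single disk hits the 256 aggregate limit."""
    return math.ceil(peak_concurrent_requests / WINDOWS_PHYSICALDISK_LIMIT)

print(guest_physicaldisks_needed(300))   # 2 - a single disk would put threads to sleep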




