Beyond SQL on VMware Best Practices

Going beyond SQL in VMware Best Practices has involved settings and configurations that are never mentioned in the SQL Server On VMware Best Practices Guide. Working in a healthcare environment means working with a large EMR (Electronic Medical Record) provider like EPIC. In tuning one of their many large sql databases they had some recommendations. Some of these are throwbacks to my old days of tweaking performance out of a desktop with settings like TCP parameter KeepAliveTime. How many of you go beyond the recommendations in the SQL Best Practice Guide?

floating book in quaint library
Knowledge shared becomes wisdom attained (Photo by Jaredd Craig on Unsplash)

I’ve listed here some of the settings that have been suggested for our larger more accessed and just plain busy sql servers:

ESXi Settings

Adjust the Round Robin IOPS limit from the default 1000 to 1 on each database LUN. Refer to VMware KBA 2069356 for more information on setting this parameter. (We already utilize Round Robin but each lun was set to the default)

Why would you want to make this change?
“The default of 1000 input/output operations per second (IOPS) sends 1000 I/O down each path before switching. If the load is such that a portion of the 1000 IOPS can saturate the bandwidth of the path, the remaining I/O must wait even if the storage array could service the requests. The IOPS or bytes limit can be adjusted downward allowing the path to be switched at a more frequent rate. The adjustment allows the bandwidth of additional paths to be used while the other path is currently saturated. “

How to make this change:

In ESXi 5.x/6.x:
for i in esxcfg-scsidevs -c |awk '{print $1}' | grep naa.xxxx; do esxcli storage nmp psp roundrobin deviceconfig set –type=iops –iops=1 –device=$i; done

Where, .xxxx matches the first few characters of your naa IDs.
 
To verify if the changes are applied, run this command:

esxcli storage nmp device list
 
You see output similar to:
 
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba33:C1:T4:L0, vmhba33:C0:T4:L0

Registry Settings

Configure Windows TCP Parameters in the Registry

The default setting for the Windows TCP parameter KeepAliveTime is two hours. This setting controls how often TCP sends a keep-alive packet to verify that an idle connection is still intact. Reducing it from two hours to five minutes helps windows detect and clean up stale network connections faster.


How to make this change:

Use regedit to create the DWORD KeepAliveTime (if it does not currently exist) at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\ . Then modify the value to 300000 (time in milliseconds).

The default setting for the Windows TCP parameter TCPTimedWaitDelay is four minutes. This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. By reducing the value of this entry, TCP/IP can release closed connections faster and provide more resources for new connections.

How to make this change:

Using regedit to create the REG_DWORD TCPTimedWaitDelay (if it does not currently exist) at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\ . Then modify the value to 30 .

What kind of ESXi settings, Windows registry settings, config file changes, etc. do you implement in your environment that goes beyond the SQL Server On VMware Best Practices Guide ? As always, I look forward to your comments and sharing of knowledge.