Beyond SQL on VMware Best Practices

Going beyond SQL in VMware Best Practices has involved settings and configurations that are never mentioned in the SQL Server On VMware Best Practices Guide. Working in a healthcare environment means working with a large EMR (Electronic Medical Record) provider like EPIC. In tuning one of their many large sql databases they had some recommendations. Some of these are throwbacks to my old days of tweaking performance out of a desktop with settings like TCP parameter KeepAliveTime. How many of you go beyond the recommendations in the SQL Best Practice Guide?

floating book in quaint library
Knowledge shared becomes wisdom attained (Photo by Jaredd Craig on Unsplash)

I’ve listed here some of the settings that have been suggested for our larger more accessed and just plain busy sql servers:

ESXi Settings

Adjust the Round Robin IOPS limit from the default 1000 to 1 on each database LUN. Refer to VMware KBA 2069356 for more information on setting this parameter. (We already utilize Round Robin but each lun was set to the default)

Why would you want to make this change?
“The default of 1000 input/output operations per second (IOPS) sends 1000 I/O down each path before switching. If the load is such that a portion of the 1000 IOPS can saturate the bandwidth of the path, the remaining I/O must wait even if the storage array could service the requests. The IOPS or bytes limit can be adjusted downward allowing the path to be switched at a more frequent rate. The adjustment allows the bandwidth of additional paths to be used while the other path is currently saturated. “

How to make this change:

In ESXi 5.x/6.x:
for i in esxcfg-scsidevs -c |awk '{print $1}' | grep naa.xxxx; do esxcli storage nmp psp roundrobin deviceconfig set –type=iops –iops=1 –device=$i; done

Where, .xxxx matches the first few characters of your naa IDs.
 
To verify if the changes are applied, run this command:

esxcli storage nmp device list
 
You see output similar to:
 
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba33:C1:T4:L0, vmhba33:C0:T4:L0

Registry Settings

Configure Windows TCP Parameters in the Registry

The default setting for the Windows TCP parameter KeepAliveTime is two hours. This setting controls how often TCP sends a keep-alive packet to verify that an idle connection is still intact. Reducing it from two hours to five minutes helps windows detect and clean up stale network connections faster.


How to make this change:

Use regedit to create the DWORD KeepAliveTime (if it does not currently exist) at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\ . Then modify the value to 300000 (time in milliseconds).

The default setting for the Windows TCP parameter TCPTimedWaitDelay is four minutes. This setting determines the time that must elapse before TCP/IP can release a closed connection and reuse its resources. By reducing the value of this entry, TCP/IP can release closed connections faster and provide more resources for new connections.

How to make this change:

Using regedit to create the REG_DWORD TCPTimedWaitDelay (if it does not currently exist) at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\ . Then modify the value to 30 .

What kind of ESXi settings, Windows registry settings, config file changes, etc. do you implement in your environment that goes beyond the SQL Server On VMware Best Practices Guide ? As always, I look forward to your comments and sharing of knowledge.

HPE DL560 Gen9 Will Soon Be on VMware HCG

HPE DL560 Gen9 servers will soon be on the VMware HCG. Like many of you we plan ESXi upgrades to new versions months into the future. As we approach our planned upgrades we hope that our HPE hardware will be on VMware’s HCG (Hardware Compatibility Guide) as approved for the latest ESXi 6.7 version. I can’t remember the last time that our hardware wasn’t on the HCG as approved.

My understanding is that most people that are using HPE server hardware are using the DL360 models. Typically those models get qualified faster. For philosophical reasons we like to use the DL560 models. We currently have quite a few of the DL560 gen9 models that we want to use. So far HPE has not bothered to have the DL560 gen9 model qualified as compatible with ESXi 6.7 but they did qualify the gen10 model. Our HPE guy has hinted that the DL560 is just not purchased nearly as much as the DL360s and that contributes to the slowness of getting the hardware on the HCG.

Well I have good new for those in the same boat as us. HPE has notified us that the qualification is being performed and will be completed around April 2nd. This may or may not come true but I thought I’d share the information.

I am very interested to hear what you are doing in your datacenter with HPE and VMware. Are HPE DL360s more your flavor or do you like the HPE DL560s like we do? Comments are always welcome! I look forward to hearing from all of you especially if you have HPE DL560 gen9 servers in your environment.

Datacenter with full racks of servers with flashing lights.  HPE DL560 gen9 servers are among the other servers