VMware vSAN Witness Host Not Found

I ran into an odd issue while running the vSAN Witness Appliance in VMware Workstation: the health check of the vSAN cluster reported “Witness host not found”. Let's look at the issue and the resolution.

While continuing to work with the 2-node VMware vSAN stretched cluster in the home lab, I ran into an issue where the network showed as partitioned and the witness host reported that it was in STANDALONE mode. This happened even though I could ping between the witness appliance and both hosts, and could also ping over the vmkernel interfaces from the vSAN hosts to the witness appliance and back. As configured in a recent post, I had set up the Witness Appliance inside of VMware Workstation as a proof of concept for running the witness outside of my vSAN cluster in the home lab. As it turns out, I had run into not a connectivity issue but an MTU packet size issue, as I will explain. If you run into the VMware vSAN Witness host not found error, the details below may help.

VMware vSAN Witness Host Not Found

In the issue I experienced, network connectivity between the witness node and the other two hosts appeared to be good, or so I thought. However, I had errors in the health of the vSAN cluster. As you can see below, the error presented was Witness host not found under the Stretched cluster section.

VMware vSAN Witness Host Not Found Error

Clicking on the “witness host not found” message gives a little more detail on the error. We see the message Found 0 witness hosts on stretched cluster. The number of witness host on stretched cluster is not 1.

Found 0 witness hosts on stretched cluster error

A helpful command that verifies the state of the Witness host is the esxcli vsan cluster get command. Note below the Local Node State: STANDALONE, which indicates the witness host is isolated or in a “network partition”. This means it can’t properly see the other vSAN hosts.

Running the esxcli vsan cluster get command
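If you want to check for this state in a script rather than by eye, here is a minimal Python sketch. It assumes you have already captured the text output of esxcli vsan cluster get (for example over SSH) and that the output consists of "Key: Value" lines. The sample text is abbreviated and illustrative, not a full capture from my lab, and the function names are my own.

```python
def parse_vsan_cluster_get(output: str) -> dict:
    """Turn 'Key: Value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_partitioned(fields: dict) -> bool:
    """Treat a STANDALONE local node state as a sign the node is isolated."""
    return fields.get("Local Node State", "").upper() == "STANDALONE"


# Abbreviated, illustrative sample (assumption); real output has more fields.
SAMPLE = """\
Cluster Information
   Enabled: true
   Local Node State: STANDALONE
   Sub-Cluster Member Count: 1
"""

fields = parse_vsan_cluster_get(SAMPLE)
print("Local Node State:", fields.get("Local Node State"))
print("Isolated:", is_partitioned(fields))
```

The check only looks at the Local Node State field, so it flags the same symptom I saw on the witness and nothing more.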

There were a few helpful VMware KB articles that helped to point me in the right direction of what was going on:

vSAN Health Service – Witness host not found (2130585)
vSAN Health Service – Network Health – vSAN Cluster Partition (2108011)
vSAN Health Service – Network Health – Hosts small ping test (connectivity check) and Hosts large ping test (MTU check) (2108285)

VMware Workstation Limitation for VMware vSAN Witness Host

As cool as it is to be able to use VMware Workstation for hosting the VMware vSAN Witness Host, it does have a limitation. Putting the errors together, namely the inability of the stretched cluster to see the Witness host and the failing “large ping” test, I looked at how the Witness host portgroup was configured. I had set the portgroup to jumbo frames. However, the VMware Workstation NIC I was using had not been set to jumbo frames, so it wasn’t able to communicate.

VMware vSAN Witness Appliance port group configuration
NIC settings MTU set to jumbo frames

The problem, as far as I have found, is that VMware Workstation doesn’t support jumbo frames, especially outside of the vSwitch, i.e. for bridged traffic out to the LAN. The resolution for me to get past the network partition was to set the MTU value back to 1500 for the vSAN Witness portgroup. This resolved the partition issue and the vSAN Witness host not found error. However, I am still left with the large ping test failing. This seems to be more of a soft error, as the cluster is now up and running and able to talk to the vSAN Witness host.
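To make the MTU arithmetic concrete, here is a small Python sketch of how large an ICMP echo payload fits into a single unfragmented IPv4 packet for a given MTU. It assumes a 20-byte IPv4 header with no options and an 8-byte ICMP header. These are the sizes you would pass to a don't-fragment ping, for example with vmkping's -s and -d options. The function names are my own and purely illustrative.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return mtu - overhead


def fits_without_fragmentation(payload: int, mtu: int) -> bool:
    """True if a ping with this payload size passes a link with this MTU."""
    return payload <= max_icmp_payload(mtu)


print(max_icmp_payload(9000))  # 8972
print(max_icmp_payload(1500))  # 1472
# A jumbo-sized ping cannot cross a 1500-byte link without fragmenting.
print(fits_without_fragmentation(8972, 1500))  # False
```

This is why ordinary small pings can succeed on a path where jumbo-sized packets are dropped.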

Thoughts

If you run into a network partition issue, be sure your port group, physical NIC, and switchport all match the MTU value you are passing along. If you set the MTU to jumbo frames at the portgroup level and don’t have this configured on the physical layer, you will have issues. In my case the limitation of VMware Workstation is evident, as I am not able to pass jumbo frames bridged to the LAN from the VMware Workstation vSwitch. After configuring the portgroup back to an MTU value of 1500, the VMware vSAN Witness Host is found and the cluster is no longer partitioned.
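The "everything must match" rule can be sketched in a few lines of Python. The hop names and MTU values below are hypothetical stand-ins loosely modeled on my setup, not values read from any real device. The idea is simply that the usable MTU of a path is the smallest MTU along it.

```python
from typing import Dict


def effective_path_mtu(hops: Dict[str, int]) -> int:
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(hops.values())


def undersized_hops(hops: Dict[str, int], desired_mtu: int) -> Dict[str, int]:
    """Return the hops whose MTU is below the MTU you intend to use."""
    return {name: mtu for name, mtu in hops.items() if mtu < desired_mtu}


# Hypothetical path (assumption): names and values are for illustration only.
path = {
    "vSAN witness port group": 9000,
    "Workstation bridged NIC": 1500,
    "physical switch port": 9000,
}

print(effective_path_mtu(path))     # 1500
print(undersized_hops(path, 9000))  # {'Workstation bridged NIC': 1500}
print(undersized_hops(path, 1500))  # {}
```

With the desired MTU dropped to 1500, no hop is undersized, which matches what I saw once the portgroup was set back to 1500.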

Brandon Lee

Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com, and a 7-time VMware vExpert, with over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in various IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications, loves the outdoors and spending time with family. Also, he goes through the effort of testing and troubleshooting issues, so you don't have to.
