
Replacing VMware vSAN Cache Disk and Resyncing vSAN Objects

A look at replacing a VMware vSAN cache disk and resyncing vSAN objects after a cache drive failed in my home lab vSAN environment

In the previous post, VMware vSAN Home Lab Hardware Failure and Resiliency, I covered the first hardware failure in my vSAN home lab environment. The environment has been up and running for around a year on an all-flash, NVMe-based disk configuration. These are commodity drives purchased from Amazon, so not enterprise-grade hardware by any stretch. As covered there, I was pleased with how vSAN handled the failure and with the resiliency of the solution. After ordering a replacement NVMe drive for the failed cache disk, I installed it and was ready to bring the vSAN environment back to a healthy state. In this post, we will look at the steps for replacing a VMware vSAN cache disk and resyncing vSAN objects.

Removing the Dead vSAN Cache Disk

A quick overview of my vSAN environment:

  • VMware vSphere 6.7 Update 1
  • 2-node stretched cluster with witness appliance
  • All NVMe storage (cache and capacity) using commodity Samsung EVO drives

After the failure, the first thing I did was search for a good step-by-step walkthrough or guidance directly from VMware on replacing vSAN disks, which I found in the VMware KB Remove Disk Groups or Devices from vSAN. It provides a good basic overview of the process and a few points to note that I covered in the previous post, including that removing the cache disk removes the entire disk group. I already posted these images in the previous post, but as a quick refresher, this is how I removed the dead drive. Many posts state what seems obvious: you don’t want to physically remove the device before you remove it from vSAN. So, this is what I did.

Removing the dead cache disk from vSAN disk management
Confirming the removal of the dead cache disk from vSAN disk management
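
I did the removal in the vSphere client, but for anyone who prefers the ESXi shell, here is a minimal sketch of how the equivalent call could be assembled. It assumes the `esxcli vsan storage remove` command with its `-s` (cache device) option as I understand it, so check `esxcli vsan storage remove --help` on your build before relying on it. The device identifier is a made-up placeholder, and the code only builds the command; it does not run anything.

```python
from typing import List


def remove_disk_group_command(cache_device: str) -> List[str]:
    """Return the esxcli argv that removes the disk group owning cache_device."""
    if not cache_device.strip():
        raise ValueError("a cache device identifier is required")
    # -s names the cache device; removing it takes the whole disk group with it,
    # which matches the behavior noted in the VMware KB.
    return ["esxcli", "vsan", "storage", "remove", "-s", cache_device]


# Hypothetical device identifier, for illustration only.
print(" ".join(remove_disk_group_command("t10.NVMe____EXAMPLE_CACHE_DEVICE")))
```

Building the argument list separately from executing it makes it easy to review the exact command before it touches a host.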

Replacing VMware vSAN Cache Disk and Resyncing vSAN Objects

Now that the dead cache drive was removed from the vSAN configuration, I powered off the host, removed the dead drive, and installed a new Samsung 970 250 GB NVMe drive along with another 1 TB 970 for a standalone datastore. I then powered the server back up and was ready to bring vSAN back to a healthy state.

Host still in maintenance mode from reboot and has 2 free disks that can be claimed

Now, we simply need to claim the disks for use by vSAN. To do that, highlight your vSAN-enabled cluster and navigate to Configure > vSAN > Disk Management, select the host you want to claim disks for, and click the “Claim unused disks for vSAN” button, which is the first button with the check marks.

Claim Unused Disks for VMware vSAN

I now have a healthy cache disk displaying in the Claim Unused Disks for vSAN dialog. Select which drive(s) to use for the cache tier and which drive(s) to use for the capacity tier.

Selecting the drives to be used for VMware vSAN in the disk group
The task to add disks to the vSAN cluster begins
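
To make the cache/capacity split concrete, here is a small sketch that checks a planned disk group before claiming. It relies on the vSAN rule that a disk group has exactly one cache device and between one and seven capacity devices. The `DiskGroupPlan` class and the device names are hypothetical, and the `esxcli vsan storage add` form with `-s` and `-d` is how I understand the CLI equivalent of the dialog, so treat it as an assumption and verify it on your own host.

```python
from dataclasses import dataclass
from typing import List

MAX_CAPACITY_DEVICES = 7  # per-disk-group limit for capacity devices


@dataclass
class DiskGroupPlan:
    cache_device: str
    capacity_devices: List[str]

    def validate(self) -> None:
        """Raise ValueError if the plan cannot form a valid disk group."""
        if not self.cache_device:
            raise ValueError("exactly one cache device is required")
        count = len(self.capacity_devices)
        if not 1 <= count <= MAX_CAPACITY_DEVICES:
            raise ValueError("a disk group needs 1 to 7 capacity devices")
        if self.cache_device in self.capacity_devices:
            raise ValueError("a device cannot serve both tiers")

    def add_command(self) -> List[str]:
        """Build (but do not run) the assumed esxcli call for this plan."""
        self.validate()
        cmd = ["esxcli", "vsan", "storage", "add", "-s", self.cache_device]
        for device in self.capacity_devices:
            cmd += ["-d", device]
        return cmd


# Hypothetical device names, for illustration only.
plan = DiskGroupPlan("nvme-cache-example", ["nvme-capacity-example"])
print(" ".join(plan.add_command()))
```

On an all-flash setup like mine, the capacity devices may also need to be marked as capacity flash before they can be added from the command line; the dialog in the vSphere client handles the tier selection for you.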

Resyncing the vSAN Objects

Now that disks have been claimed again on the host, the objects need to be resynced. This happens automatically, but you can also kick it off manually from the Resyncing Objects menu by clicking the Resync Now button. As you can see below, I have 217 objects that need to be resynced.

Resyncing objects dashboard shows a number of objects that need to be synchronized

As the resync of the vSAN objects gets underway, the Resyncing Objects, Bytes left to resync, and ETA to compliance values all adjust as the data is calculated and the operation progresses.

Resyncing of vSAN objects begins after disks are claimed
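
As a rough mental model of why the ETA moves around, an estimate like this is just the bytes left divided by the current copy rate. The sketch below is my own back-of-the-envelope arithmetic, not how vSAN computes its figure, and the sizes and throughput are invented numbers.

```python
def eta_seconds(bytes_left: int, throughput_bytes_per_sec: float) -> float:
    """Estimate the remaining time as bytes left divided by current throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return bytes_left / throughput_bytes_per_sec


def format_eta(seconds: float) -> str:
    """Render a duration in seconds as hours and whole minutes."""
    hours, remainder = divmod(int(seconds), 3600)
    return f"{hours}h {remainder // 60:02d}m"


GIB = 1024 ** 3
MIB = 1024 ** 2

# Invented example: 500 GiB left to copy at a steady 200 MiB/s.
print(format_eta(eta_seconds(500 * GIB, 200 * MIB)))  # prints "0h 42m"
```

If more objects are discovered and the bytes left grow while the rate stays flat, the estimate grows with them, which is consistent with the counters climbing early in the resync.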

The Virtual Objects dashboard gives a different view of the data. There, the Placement and Availability status shows the Reduced availability count and the Healthy count.

Virtual Objects shows healthy vs reduced availability objects in the vSphere client

As the data is calculated in the home lab environment, the Resyncing Objects count has gone up considerably, as have the Bytes left to resync and the ETA to compliance.

Resyncing vSAN objects continues to progress

The vSphere client provides a powerful little tool when it comes to resyncing objects. In a production environment, you may want to further reduce the impact of resync operations on production performance. The Resync Throttling control lets you throttle the resync operation: select the option Enable throttling for resyncing objects traffic and adjust the slider to define the threshold.

The vSphere client allows the ability to set Resync Throttling
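
The trade-off with throttling is simple: capping resync traffic protects production I/O but stretches the time to compliance. This sketch illustrates that with a simplified model in which the effective rate is the lower of the natural rate and the cap. The function names, units, and numbers are my own assumptions and do not reflect how the slider is implemented.

```python
from typing import Optional


def effective_rate(unthrottled_rate: float, cap: Optional[float]) -> float:
    """Return the resync rate after applying an optional cap (same units)."""
    if unthrottled_rate <= 0:
        raise ValueError("unthrottled rate must be positive")
    if cap is None:
        return unthrottled_rate  # throttling disabled
    if cap <= 0:
        raise ValueError("cap must be positive")
    return min(unthrottled_rate, cap)


def slowdown_factor(unthrottled_rate: float, cap: Optional[float]) -> float:
    """How many times longer the resync takes once the cap is applied."""
    return unthrottled_rate / effective_rate(unthrottled_rate, cap)


# Invented figures: a resync that would run at 800 units/s, capped at 200.
print(slowdown_factor(800, 200))   # 4.0, so the resync takes four times as long
print(slowdown_factor(800, 1000))  # 1.0, the cap is above the natural rate
```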

Another useful view that the vSAN Resyncing Objects dashboard gives is the individual virtual machines with their Bytes Left to Resync and ETA, which lets you see the time left and data to be copied for specific VMs and their respective VMDKs.

Resyncing objects dashboard shows the ETA to compliance and amount of data to resync per VMDK
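
If you copy the per-VMDK figures out of the dashboard, rolling them up per VM shows which machines have the most data still to move. This is a sketch over invented rows; the VM names, VMDK names, and byte counts are placeholders and not values from my lab.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int]  # (vm name, vmdk name, bytes left to resync)


def bytes_left_per_vm(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Sum bytes left per VM and return the VMs with the most remaining first."""
    totals = defaultdict(int)
    for vm, _vmdk, bytes_left in rows:
        totals[vm] += bytes_left
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows, for illustration only.
sample = [
    ("vm-a", "disk1.vmdk", 40),
    ("vm-b", "disk1.vmdk", 10),
    ("vm-a", "disk2.vmdk", 5),
]
print(bytes_left_per_vm(sample))  # [('vm-a', 45), ('vm-b', 10)]
```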

Under the Capacity dashboard for vSAN, the capacity of my home lab vSAN datastore has now returned to the previously configured 1.82 TB (with deduplication and compression).

The vSAN capacity now reflects the capacity pre disk failure

After allowing the Resyncing Objects operation to run to completion, the VMs are now all listed as Healthy.

After the vSAN resync objects operation completes all objects were showing as healthy once again

I have to say the entire process of replacing the VMware vSAN cache disk and resyncing vSAN objects was easy, intuitive, and informative. Removing the dead cache drive, installing a new one, claiming the disks for vSAN, and letting the objects resync was a breeze.

Takeaways

The home lab environment running vSAN has performed extremely well for me in my testing on an all-NVMe configuration. Using commodity hardware, this is the first failure I have had in a year of running the configuration 24×7 with no downtime. I have bashed on this environment pretty well too, as it gets hit just about every day with new VMs provisioned and torn down, appliances deployed, etc. The process of replacing the vSAN cache disk and resyncing vSAN objects was super simple, and I love the visibility that VMware gives into underlying vSAN processes such as the resync operation right within the vSphere client. You are left with no guesswork about what is going on with the underlying infrastructure.


Brandon Lee

Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com, and a 7-time VMware vExpert, with over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in various IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications, loves the outdoors and spending time with family. Also, he goes through the effort of testing and troubleshooting issues, so you don't have to.
