Ceph Backup: Don’t Lose Your HCI Data

Don't overlook the importance of Ceph backup! Learn why it's vital to have reliable backups for your Ceph or CephFS storage solution.

This is more of a public service announcement than a technical blog post, but we will take a high-level look at a warning for anyone using Ceph, or CephFS on top of Ceph, as HCI storage for virtual machines or files: don’t just trust your hypervisor-based virtual machine backups. Why? Well, let’s take a look.

What is Ceph? Brief overview

Ceph is software-defined storage that takes the local disks in each server host and aggregates them across the servers in a cluster so that the storage pool looks like a single shared storage volume. You can also enable CephFS on top of Ceph to have an HCI solution where you can store your files on top of Ceph.

Why backups matter

Backups are extremely important even with a software-defined solution like Ceph. Why? Well, even though your data is protected from a hardware failure, you still need backups to protect you from things like accidental data deletion or ransomware. Let’s say you are running CephFS on top of Ceph and a user has files stored in CephFS. Their workstation then gets infected with ransomware that encrypts all their files stored in the CephFS share.

The HCI data protection doesn’t prevent the files from being encrypted, nor does it automatically recover those files. To Ceph, these are just normal file changes. So, you need a backup solution for that.
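To make that point concrete, here is a toy model in Python. It is only a sketch: the `ReplicatedStore` class is hypothetical and has nothing to do with Ceph's real API. It simply shows that replication survives a lost replica but faithfully copies an unwanted overwrite to every replica, while a separate point-in-time backup keeps the original.

```python
class ReplicatedStore:
    """Toy model of replicated storage: every write goes to all replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, name, data):
        # Replication copies the change everywhere, wanted or not.
        for replica in self.replicas:
            replica[name] = data

    def read(self, name, replica_index=0):
        return self.replicas[replica_index][name]

    def fail_replica(self, replica_index):
        # Simulate losing one disk or host.
        self.replicas[replica_index] = {}


store = ReplicatedStore()
store.write("report.docx", b"quarterly numbers")

# A backup is a separate, point-in-time copy held outside the store.
backup = dict(store.replicas[0])

# Hardware failure: another replica still has the data.
store.fail_replica(0)
survived_failure = store.read("report.docx", replica_index=1)

# Ransomware: the "encrypted" write is a normal write, so it replicates too.
store.write("report.docx", b"\x8f\x02ENCRYPTED")
replica_copies = [replica.get("report.docx") for replica in store.replicas]
```

After the last write, every entry in `replica_copies` holds the encrypted bytes, and only `backup` still has the original content.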

Traditional VM backups don’t protect CephFS

One of the things I discovered recently, and I’m glad I did (although I should have thought about it beforehand), is that the CephFS mount isn’t backed up the way you might expect in a regular virtual machine backup. In fact, when you try a “File level restore” in a modern backup solution, you won’t see the files listed in your mount directory. Why?

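Well, because the files don’t actually reside in the mount directory. CephFS presents the files at the mount point so users can see and interact with them. However, the data itself lives in the Ceph cluster and is fetched over the network. It is not stored on the virtual machine’s own virtual disk, and that disk is all an image-level VM backup captures.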
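If you want to find out which directories on a Linux guest fall into this category, one option is to read `/proc/mounts` and look at the file system type of each mount. The sketch below does that. The list of type strings is my assumption about what the CephFS kernel client, ceph-fuse, GlusterFS, NFS and CIFS commonly report, so check it against your own systems. The function names are made up for this example.

```python
# File system types whose data lives on the network, not on the VM's disk.
# Assumption: these are the type strings commonly seen in /proc/mounts.
NETWORK_FS_TYPES = {"ceph", "fuse.ceph-fuse", "fuse.glusterfs", "nfs", "nfs4", "cifs"}


def network_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs for network-backed file systems.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mount point, type, options, ...).
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in NETWORK_FS_TYPES:
            found.append((mount_point, fs_type))
    return found


def report(mounts_path="/proc/mounts"):
    """Print every mount an image-level VM backup is likely to miss."""
    with open(mounts_path) as handle:
        for mount_point, fs_type in network_mounts(handle.read()):
            print(f"{mount_point} ({fs_type}): data is not on this VM's disk")
```

Calling `report()` inside the guest prints one line per network mount. Anything it lists needs its own backup plan.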

Case in point. Let me show you two screenshots of the same server: the file-level view from inside the server, and what a modern backup solution actually sees if you try to perform a file-level restore.

Below is the file-level view of the server. As you can see, there are definitely files mounted in this directory.

File level view of a cephfs folder mounted in ubuntu server

Below is a backup I took of one of the VMs participating in the CephFS-enabled cluster. As you can see, when I mount the backup of the virtual machine, no files are present. Keep this in mind if you are using Ceph or another software-defined storage solution such as GlusterFS. Both will show the same thing. Don’t be caught unaware if you have a disaster and need to recover. Also, this has nothing to do with the backup solution itself. The screenshot is from NAKIVO Backup & Replication, but the same test in Veeam produced the same result.

Backup level view of cephfs mount folder
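The comparison in the two screenshots above can be turned into a repeatable check. This is a minimal sketch, assuming you have mounted or restored the backup somewhere on a Linux machine that can also see the live directory. The function names and any paths you pass in are hypothetical, not part of NAKIVO or Veeam.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for current_dir, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(current_dir, name)
            found.add(os.path.relpath(full_path, root))
    return found


def missing_from_backup(live_root, restored_root):
    """Files visible on the live mount but absent from the restored copy."""
    return sorted(list_files(live_root) - list_files(restored_root))
```

For example, `missing_from_backup("/mnt/cephfs", "/mnt/restore/mnt/cephfs")` (both paths are placeholders) would return every file in the live CephFS mount if the backup only captured the empty mount directory.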

Solution for backing up Ceph and CephFS

So, what is the solution for backing up something like Ceph and CephFS? Well, there are a couple of scenarios:

  • Virtual machines running on Ceph
  • Files residing in CephFS

With Ceph storage backing the virtual machines on a hypervisor, the traditional backups you would take of VMs are fine. They will capture the data since the hypervisor is aware of the storage and “sees” it, so the backup solution will be able to as well.

However, CephFS is a little more dangerous here, as it is a file system running on top of Ceph, so there is an additional abstraction layer to think about. How did I work around this?

I used physical machine backup agents running inside the virtual machine. With a backup solution like Veeam that has agents (most other solutions do as well), you can install the agent inside the virtual machine, even though most of these are called physical machine backup agents. The agent then sees the mounted files as a user would see them and can back up your data there.
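The reason an in-guest agent works is that any process running inside the VM reads the files through the mount, just like a user does. The sketch below illustrates that principle with Python's standard `tarfile` module. It is not how the Veeam agent works internally and it is no substitute for a real backup product. The function name is made up, and I am assuming the archive is written somewhere outside the mount being archived.

```python
import os
import tarfile


def archive_mount(mount_path, archive_path):
    """Write every file under mount_path into a gzip-compressed tar archive.

    Returns the number of files archived. archive_path is assumed to be
    outside mount_path so the archive does not try to include itself.
    """
    file_count = 0
    with tarfile.open(archive_path, "w:gz") as archive:
        for current_dir, _subdirs, files in os.walk(mount_path):
            for name in files:
                full_path = os.path.join(current_dir, name)
                # Store paths relative to the mount so restores are relocatable.
                relative = os.path.relpath(full_path, mount_path)
                archive.add(full_path, arcname=relative, recursive=False)
                file_count += 1
    return file_count
```

Run inside the guest against the CephFS mount directory, this reads the actual file contents from the cluster, which is exactly what the image-level backup could not do.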

Guest files restore

Below, I will walk you through what you see with a guest files restore in something like Veeam. Here I have created an agent-based backup of the Ubuntu Server virtual machine. Now I am choosing Restore > Guest files restore.

Choose guest files restore

Choose your operating system type.

Choose your os type

Choose the virtual machine backup you want to mount.

Choose the virtual machine

Choose the restore point.

Choose the restore point

Choose the helper host (the Linux machine that will be used to mount the backup).

Choose the helper host for the guest files restore

Enter a reason (optional).

Restore reason in veeam

Finally, click Browse.

Browse the mounted backup for file restore

You will then get a file browser that will enable you to browse your backed-up data and restore it.

Files are available to restore in the guest files restore

Wrapping up

Even though software-defined storage is awesome and allows us to do so many things in the enterprise data center, be sure to understand the implications of backing up and recovering your data. Don’t assume that because you have a backup of all the virtual machines in the cluster, you will be able to recover your data. The abstraction layer used by storage like CephFS prevents traditional VM-level backups from capturing the data in the mounted folders. You will need a local backup agent inside the guest to capture the files mounted from your HCI storage in something like CephFS.

Brandon Lee

Brandon Lee is the Senior Writer, Engineer and owner at Virtualizationhowto.com, and a 7-time VMware vExpert, with over two decades of experience in Information Technology. Having worked for numerous Fortune 500 companies as well as in various industries, he has extensive experience in various IT segments and is a strong advocate for open source technologies. Brandon holds many industry certifications and loves the outdoors and spending time with family. Also, he goes through the effort of testing and troubleshooting issues, so you don't have to.
