Ceph Storage Calculator to find Capacity and Cost
If you are spinning up a Ceph storage pool to store things like virtual machines in a Proxmox VE cluster, you will want an easy way to calculate the usable storage of your Ceph cluster. Beyond capacity, it also helps to have a feel for what that usable capacity costs. I have created a simple but helpful Ceph storage calculator to find the capacity and cost of your Ceph cluster.
Ceph Calculator
Ceph Usable Storage Calculator
Note the following input fields of the Ceph calculator:
- Number of hosts
- Number of Disks per Host
- Disk Size
- Cost per Disk
- Replication Factor
- Erasure coding calculator
- Data chunks (k)
- Coding Chunks (m)
Ceph Capacity
Ceph is a distributed, software-defined storage platform that is commonly run as a hyper-converged (HCI) storage solution on Proxmox VE clusters for storing virtual machines. Each Proxmox VE server contributes local storage to the overall storage pool, which is presented as a single logical storage volume.
However, an HCI storage solution like Ceph carries capacity overhead, since the underlying resiliency mechanisms consume storage space to protect your data. As a result, there will be a difference between raw and usable capacity.
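To make the raw versus usable difference concrete, here is a minimal sketch for a replicated pool. The function names are hypothetical, and the math assumes every object is simply stored replication-factor times. It ignores metadata overhead and the free space you should keep in reserve, so treat the result as an upper bound.

```python
def raw_capacity_tb(hosts: int, disks_per_host: int, disk_size_tb: float) -> float:
    """Total raw capacity contributed by every disk in the cluster."""
    return hosts * disks_per_host * disk_size_tb


def usable_replicated_tb(raw_tb: float, replication_factor: int) -> float:
    """Usable capacity when each object is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


# Example: 3 hosts with 4 disks of 2 TB each, replication factor 3
raw = raw_capacity_tb(hosts=3, disks_per_host=4, disk_size_tb=2.0)
usable = usable_replicated_tb(raw, replication_factor=3)
print(f"raw: {raw:.1f} TB, usable: {usable:.1f} TB")
```

With these example numbers, 24 TB of raw disk yields roughly 8 TB of usable space.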
Factors that affect Ceph Capacity
There are several factors that can affect the usable space in your Ceph storage cluster. These include:
- Number of servers
- Number of disks per server
- Erasure coding configuration
- Erasure code parity
- Replication configuration
- Data size and type of data you are storing
- Resiliency level
Erasure Coding Considerations
Erasure coding can be hard to wrap your mind around, but it is a way to store distributed data with parity across a number of disks. Each object is split into data chunks (k) and parity, or coding, chunks (m) that hold the redundancy data. The erasure code settings are important to understand since they affect the usable capacity, data protection, and performance of the Ceph cluster.
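As a rough illustration of how k and m affect capacity, the sketch below assumes the usable share of raw space is k / (k + m), which is the standard ratio for a k+m erasure code. The function name is hypothetical, and real clusters lose a little more to metadata and reserved free space.

```python
def usable_erasure_coded_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for a k+m erasure-coded pool.

    Each object is stored as k data chunks plus m coding chunks,
    so only k out of every k + m chunks hold actual data.
    """
    if k < 1 or m < 1:
        raise ValueError("k and m must both be at least 1")
    return raw_tb * k / (k + m)


# Example: 60 TB raw with k=4 data chunks and m=2 coding chunks
ec_usable = usable_erasure_coded_tb(60.0, k=4, m=2)
print(f"usable with 4+2 erasure coding: {ec_usable:.1f} TB")
```

For comparison, the same 60 TB of raw space with a replication factor of 3 would yield about 20 TB, which is why erasure coding is considered the more storage-efficient option.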
[Figure: high-level overview of Ceph storage architecture]
Capacity Planning Best Practices
It is a good idea to use a Ceph storage calculator like we have here to understand the capacity you will have and the cost of your storage in the Ceph storage cluster. Even with proper planning, you will still need to regularly monitor and adjust your capacity to make sure you have the best performance and data protection with your configuration.
You will want to meet at least the minimum host counts for Ceph, and with Ceph more is generally better. Sizing your initial investment correctly lets you select the proper configuration from the beginning.
- The minimum number of hosts for a replicated Ceph environment is 3
- The recommended minimum number of hosts for erasure coding with Ceph is 6
Replication vs. Erasure Coding:
- Replication: This is a simple and fast type of data resiliency that works well in environments where the storage overhead is acceptable and you just want to get up and running with Ceph.
- Erasure Coding: This configuration is more storage-efficient and is the preferred type of Ceph configuration for large-scale deployments where you want to minimize costs.
Fault Tolerance:
- The Replication Factor determines how many disk or host failures your cluster can tolerate without losing your data. With a factor of N, each object can lose up to N - 1 copies.
- The Erasure Coding parameters (k and m) determine the number of failures your cluster can tolerate, with m specifying the number of chunks, and so disks, that can fail per object.
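The two rules above can be sketched as a pair of small helpers. The names are hypothetical, and the numbers count lost copies or chunks per object. Whether one "loss" means a single disk or a whole host depends on the failure domain your placement rules use, which this sketch does not model. It also only covers data loss, not whether the pool keeps serving I/O while degraded.

```python
def replication_losses_tolerated(replication_factor: int) -> int:
    """Copies of an object that can be lost while one copy still survives."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor - 1


def erasure_losses_tolerated(k: int, m: int) -> int:
    """Chunks of an object that can be lost; any k of the k + m chunks rebuild it."""
    if k < 1 or m < 1:
        raise ValueError("k and m must both be at least 1")
    return m


print("replication factor 3:", replication_losses_tolerated(3))
print("replication factor 2:", replication_losses_tolerated(2))
print("erasure coding 4+2:", erasure_losses_tolerated(4, 2))
```

If the failure domain is the host, a tolerance of 1 means one entire host can be lost per object without data loss, but a second overlapping failure would put data at risk.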
Cost Considerations:
- Balancing the Cost per Disk with the Replication Factor or Erasure Coding settings is extremely important for optimizing both performance and budget.
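One way to compare configurations on cost is to look at the price per usable terabyte rather than per raw terabyte. The sketch below is a simplified assumption of how such a figure could be computed. The function name and the `overhead` parameter (raw space consumed per unit of usable space) are hypothetical, and only disk cost is counted, not servers, networking, or power.

```python
def cost_per_usable_tb(hosts: int, disks_per_host: int, disk_size_tb: float,
                       cost_per_disk: float, overhead: float) -> float:
    """Disk cost divided by usable capacity.

    overhead is raw space per unit of usable space:
      replication factor N  -> N
      erasure coding k + m  -> (k + m) / k
    """
    if overhead < 1:
        raise ValueError("overhead must be at least 1")
    disks = hosts * disks_per_host
    total_cost = disks * cost_per_disk
    usable_tb = disks * disk_size_tb / overhead
    return total_cost / usable_tb


# Example: 6 hosts, 4 disks each, 4 TB per disk, 100 per disk
replicated = cost_per_usable_tb(6, 4, 4.0, 100.0, overhead=3.0)          # replication factor 3
erasure_coded = cost_per_usable_tb(6, 4, 4.0, 100.0, overhead=(4 + 2) / 4)  # 4+2 erasure coding
print(f"replication 3: {replicated:.2f} per usable TB")
print(f"erasure coding 4+2: {erasure_coded:.2f} per usable TB")
```

In this example the same disks cost about twice as much per usable terabyte under 3x replication as under 4+2 erasure coding, which is the trade-off to weigh against the performance and simplicity of replication.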
Scalability:
- Setting the Number of Hosts and Disks per Host is an important consideration that will affect the type of fault tolerance you choose and how much capacity you will have
Common Capacity Planning Mistakes
You will want to avoid the common mistakes when planning your Ceph storage cluster. These may include the following:
- Underestimating the data growth you expect to see
- Missing the storage requirements for your workloads
- Not understanding erasure coding and how it can impact usable capacity
- Not taking degraded or maintenance states into account
- Not planning for data loss scenarios
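To illustrate the degraded-state point, here is a minimal sketch that checks whether the data already on disk would still fit if one host were lost and its data had to be rebuilt on the remaining hosts. The function name is hypothetical, and the 0.85 threshold is an assumption based on what I understand to be Ceph's default nearfull ratio, so check the value configured in your own cluster. It also assumes hosts of equal size and evenly spread data.

```python
def fits_after_host_failure(hosts: int, raw_per_host_tb: float,
                            stored_raw_tb: float,
                            nearfull_ratio: float = 0.85) -> bool:
    """True if the stored data still fits under the nearfull threshold
    once one host is gone.

    stored_raw_tb is the space consumed on disk, after replication
    or erasure coding overhead has been applied.
    """
    if hosts < 2:
        raise ValueError("need at least 2 hosts to survive a host failure")
    remaining_raw_tb = (hosts - 1) * raw_per_host_tb
    return stored_raw_tb <= remaining_raw_tb * nearfull_ratio


# Example: 4 hosts with 10 TB raw each
print(fits_after_host_failure(4, 10.0, stored_raw_tb=24.0))  # within the threshold
print(fits_after_host_failure(4, 10.0, stored_raw_tb=27.0))  # over the threshold
```

A cluster that looks comfortably sized on paper can fail this check, which is why it is worth leaving headroom for failures and maintenance rather than planning to fill the usable capacity completely.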
Wrapping up
Hopefully, this Ceph storage calculator will be helpful to those who want a quick and easy way to calculate the usable capacity and cost of their Ceph storage cluster. Even so, you still need to plan your Ceph storage deployment carefully and choose between replication and erasure coding as your preferred means of protecting your data.
10 hosts x 10 disks x 1 TB, RF: 2
Disk Failure Tolerance Per Object: 1
Does that mean 1 host (10 disks) can be down without data loss?