Storage Reliability: Durability vs. Availability

Introduction

Storage is bound to fail. Whether you use HDDs, SSDs, or enterprise-grade media, data integrity is never guaranteed. Many factors can cause storage to fail or become unavailable: bit flips, RAID failures, even a fiber cut can disrupt your access to stored information. In web hosting, content delivery networks (CDNs) with storage APIs, and cloud storage, the terms “storage durability” and “storage availability” are often confused. At best the confusion causes no harm; at worst it leads to the loss of valuable, unrecoverable information.

That said, the distinction between the two terms is subtle but important: durability refers to how safe data is from being lost, while availability refers to how often you can access your stored data.

Storage Durability

When a provider mentions “reliability” or “uptime,” they are usually speaking of guaranteed access. You may be promised, say, that your data is accessible 99.95% of the time during any given month, which allows about 43 seconds of downtime a day, or roughly 4.4 hours a year. Because “reliability” can also refer to the durability of your stored objects, it is crucial to have the provider clarify which one is meant. Otherwise a small fire or a force majeure event (a flood, an earthquake, or anything else that can damage drives) may wipe out a large portion, or even all, of your data.
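The downtime figures for a 99.95% guarantee can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes a 24-hour day and a 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_fraction(availability_percent: float) -> float:
    """Fraction of time the service is allowed to be unavailable."""
    return 1.0 - availability_percent / 100.0


def downtime_seconds_per_day(availability_percent: float) -> float:
    """Allowed downtime per day, in seconds."""
    return downtime_fraction(availability_percent) * SECONDS_PER_DAY


def downtime_hours_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in hours."""
    return downtime_fraction(availability_percent) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{downtime_seconds_per_day(99.95):.1f} seconds per day")
    print(f"{downtime_hours_per_year(99.95):.2f} hours per year")
```

For 99.95%, this works out to about 43.2 seconds a day and about 4.38 hours a year.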

Storage durability can be improved in several ways. To protect important information, cloud platforms often employ RAID setups and backups. The most important, yet often overlooked, measure is off-site replication.

A simple analogy helps: imagine that all of your data is stored on a single thumb drive. If there is a flood, the drive might be irrecoverably damaged and the data lost. If you instead had two mirrored USB drives, with one stored in another state or province, the second drive would still be intact and your data would survive. This can be improved further by setting up local mirrors (RAID) at both sites, so that if one site is lost, the other still has a layer of redundancy.
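To see why a second, distant copy helps so much, consider a rough model in which each copy is lost independently with the same probability in a given year. Both the independence and the 1% figure used below are assumptions made purely for illustration; real failures are often correlated, which is exactly why the second copy should live at another site.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent failures.

    p_single: hypothetical yearly probability of losing one copy (0..1).
    copies:   number of independent copies kept at separate sites.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_single ** copies


if __name__ == "__main__":
    # Hypothetical 1% yearly chance of losing any single drive.
    for n in (1, 2, 3):
        print(n, "copies:", prob_all_copies_lost(0.01, n))
```

Under these assumptions, going from one drive to two mirrored drives in separate locations reduces the chance of losing everything from 1 in 100 to 1 in 10,000.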

Simply put, data durability is a measure of how safe data is from being lost, whether through corruption or complete destruction. If a provider advertises 99.99995% data durability, there is a 0.00005% chance of data being lost during a given year. The loss does not have to be complete: data loss here means the loss of one or more bits.
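The conversion from an advertised durability figure to a yearly loss probability can be sketched as follows. The second function goes one step further and assumes the figure applies to each stored object independently, which is a common reading but not something every provider states; the object count is hypothetical.

```python
def annual_loss_probability(durability_percent: float) -> float:
    """Yearly probability of data loss implied by a durability percentage."""
    return 1.0 - durability_percent / 100.0


def expected_objects_lost(durability_percent: float, object_count: int) -> float:
    """Expected number of objects lost per year.

    Assumes the durability figure applies per object, independently.
    """
    return annual_loss_probability(durability_percent) * object_count


if __name__ == "__main__":
    durability = 99.99995
    print(f"loss probability: {annual_loss_probability(durability):.7%}")
    # Hypothetical store holding ten million objects.
    print(f"expected losses: {expected_objects_lost(durability, 10_000_000):.1f}")
```

Under the per-object assumption, ten million objects stored at 99.99995% durability would be expected to lose about five objects a year, which shows that a tiny percentage still matters at scale.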

Storage Availability

The easiest category of “reliability” to understand is availability. While providers often strive to guarantee network stability and have mirrors to ensure uptime during maintenance, there are always factors that can defeat even the most reliable infrastructure.

If “host A” guarantees availability 99.99% of the time while “host B” guarantees only 99%, the difference in possible downtime is about 3.6 days a year (roughly 53 minutes versus 3.65 days).

(For easier calculations, you can refer to https://uptime.is)
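The comparison between the two hosts can also be reproduced locally. The sketch below assumes a 365-day year; the function name is illustrative.

```python
from datetime import timedelta

DAYS_PER_YEAR = 365  # assumption: non-leap year


def yearly_downtime(availability_percent: float) -> timedelta:
    """Maximum downtime per year permitted by an availability guarantee."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return timedelta(days=DAYS_PER_YEAR * unavailable_fraction)


host_a = yearly_downtime(99.99)  # "host A" from the example above
host_b = yearly_downtime(99.0)   # "host B" from the example above
gap = host_b - host_a

if __name__ == "__main__":
    print("host A:", host_a)
    print("host B:", host_b)
    print("difference:", gap)
```

The gap between the two guarantees comes to a little over 3.6 days a year.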

Depending on the use case, a single second of lost availability might cost hundreds, or even millions, of dollars in lost revenue. Taking storage availability into account is therefore just as important as durability.

Conclusion

When choosing a cloud provider, or even building your own server, data durability and availability are key factors in deciding on a suitable solution. Data loss or corruption can occur at any time, even with the biggest cloud providers. Data availability should not be overlooked either: not being able to access your objects can mean not being able to load product images during, for example, a Christmas sale, resulting in significant financial losses.

In essence, while the two may seem the same, taking both data availability and data durability into account will help you avoid unpleasant surprises.

Glossary

Service Availability

A service's overall availability and guaranteed uptime.

HDD

Hard Disk Drive. A type of non-volatile data storage device that reads and writes data on spinning magnetic disks.

SSD

Solid State Drive. A type of non-volatile data storage device that reads and writes data electronically.