Redundancy and Backups in VPS

Backups and redundancy solve different problems — backups recover from data loss, redundancy prevents downtime — and conflating them produces infrastructure that fails at both.

Overview

A server fails. The team goes to restore from the automated backup. The last successful backup is from eleven days ago — the backup job has been failing silently since a disk quota was hit, and nobody checked. The restore completes. Eleven days of data are gone. The backup system was running. The backup system was not working. These are different things, and the difference only surfaces when the restore is needed.

How to think about it

Redundancy keeps the service running when a component fails. A load balancer distributing traffic across two servers means that when one server fails, the other continues serving requests. The service stays up. No data is lost. No restore is required. Redundancy addresses availability.

Backups allow recovery after data is lost or corrupted. A database that is accidentally dropped, a server that is compromised and wiped, a deployment that destroys data — backups are the mechanism for recovering what was lost. Backups don't keep the service running during the failure. They enable restoration after it. Backups address recoverability.

A service with redundancy but no backups stays up when hardware fails and loses everything when data is corrupted, because every instance shares or replicates the same damaged data. A service with backups but no redundancy can recover from most failures but goes down for the duration of every restore. Most production services benefit from both: they solve different failure modes and neither substitutes for the other.

How it works

Basic backup on VPS: automated snapshots of the server or automated database exports to a location separate from the primary disk. The key word is separate: a backup on the same physical disk as the data it's backing up provides no protection against disk failure. Backups need to be off-server: a separate cloud storage bucket, a different provider's object storage, a remote server. Backup frequency sets the worst-case data loss, and it should be chosen to meet the recovery point objective (RPO), the maximum amount of data loss the service can accept. Daily backups mean up to 24 hours of potential data loss.
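
The arithmetic behind that last point can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the function names and the `missed_runs` parameter are assumptions introduced here, and the model ignores how long a backup takes to run.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, missed_runs: int = 0) -> timedelta:
    """Upper bound on lost data if the failure hits just before the next backup.

    missed_runs counts consecutive scheduled backups that failed or were skipped.
    """
    if missed_runs < 0:
        raise ValueError("missed_runs must be non-negative")
    return interval * (missed_runs + 1)


def meets_rpo(interval: timedelta, rpo: timedelta, missed_runs: int = 0) -> bool:
    """True if the schedule keeps worst-case loss within the recovery point objective."""
    return worst_case_data_loss(interval, missed_runs) <= rpo


daily = timedelta(days=1)
print(worst_case_data_loss(daily))                  # one day of exposure
print(worst_case_data_loss(daily, missed_runs=10))  # ten silent failures in a row
```

A daily schedule with ten consecutive silent failures leaves eleven days of exposure, which is why the schedule alone says little about the real recovery point.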

Backup verification is not optional. A backup that has never been restored is not a backup — it is an untested file. Running a test restore on a regular schedule, confirming the data is intact and usable, is the only way to know the backup system is functioning. This is tedious. The alternative is discovering the backup doesn't work during an actual recovery.
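
A test restore can be automated. The sketch below assumes the backup is a SQLite database file, purely so the example stays self-contained; the function name `verify_restore` and the per-table minimum row counts are hypothetical. For another database the restore step and the checks would differ, but the shape is the same: restore into a scratch location, then confirm the data is intact and usable.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def verify_restore(backup_path, min_rows_by_table):
    """Restore a SQLite backup into a scratch directory and list any problems found."""
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        try:
            # Stand-in for the real restore step (download, decompress, import).
            shutil.copyfile(backup_path, restored)
        except OSError as exc:
            return [f"restore failed: {exc}"]
        conn = sqlite3.connect(restored)
        try:
            # Structural check: SQLite reports "ok" when the file is consistent.
            status = conn.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                problems.append(f"integrity check failed: {status}")
            # Content check: each expected table must hold at least min_rows rows.
            for table, min_rows in min_rows_by_table.items():
                count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
                if count < min_rows:
                    problems.append(f"{table}: {count} rows, expected at least {min_rows}")
        except sqlite3.Error as exc:
            problems.append(f"restored database unusable: {exc}")
        finally:
            conn.close()
    return problems
```

An empty list means the restore passed the checks that were asked for, nothing more. The row-count thresholds are a crude guard against an empty or truncated dump, and they should be replaced with whatever queries prove the data is usable for the application.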

Basic redundancy on VPS: running two instances behind a load balancer in the same datacenter removes single-server failure as a cause of downtime. Spreading the two instances across different availability zones or datacenters also covers datacenter-level failures. Each step up in redundancy reduces the probability of downtime at the price of higher infrastructure cost and operational complexity. The appropriate level depends on what downtime actually costs, not what it might theoretically cost.
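
The effect of adding instances can be estimated with simple probability. The sketch below assumes instances fail independently and that the load balancer itself never fails. Both assumptions are optimistic, especially within one datacenter, so the result is an upper bound. The 99% figure is illustrative, not a measured value for any provider.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of several independent instances is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return 1.0 - (1.0 - instance_availability) ** instances


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


for n in (1, 2, 3):
    a = combined_availability(0.99, n)
    print(f"{n} instance(s): {a:.6f} available, {expected_downtime_hours(a):.2f} h/year down")
```

Under these assumptions each added instance cuts expected downtime by a large factor while cost grows roughly linearly, which is why the decision turns on what an hour of downtime is actually worth.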

Where it breaks

Backups fail because nobody tested them, because they're stored on the same disk as the source, because the backup job fails silently and nobody notices, or because the retention policy deletes old backups before anyone realizes the recent ones are corrupted. Every one of these failures is preventable with monitoring and periodic testing. Every one of them has caused production data loss.
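
A silent failure is detectable if something checks the age of the last successful backup rather than whether the job is scheduled. The following is a minimal sketch: the function name, its parameters, and the size threshold are assumptions, and how the timestamp and size are obtained (object storage listing, job log, provider API) is left out.

```python
from datetime import datetime, timedelta, timezone


def backup_alerts(last_success, now, max_age, size_bytes=None, min_size_bytes=0):
    """Return human-readable alerts for a stale, missing, or suspiciously small backup."""
    alerts = []
    if last_success is None:
        alerts.append("no successful backup on record")
    elif now - last_success > max_age:
        age = now - last_success
        alerts.append(f"last successful backup is {age.days} days old")
    # A backup far smaller than usual often means an empty or truncated export.
    if size_bytes is not None and size_bytes < min_size_bytes:
        alerts.append(f"backup is only {size_bytes} bytes")
    return alerts


now = datetime(2024, 1, 12, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(backup_alerts(stale, now, max_age=timedelta(hours=26)))
```

The check only helps if it runs somewhere other than the backup job itself and its alerts reach a person. It catches staleness and truncation, not corruption, which still needs a test restore.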

Redundancy fails because the application wasn't designed to run on multiple instances — local session state, filesystem writes that don't sync, database connections that don't reconnect after failover. Adding a load balancer in front of a stateful application doesn't produce redundancy. It produces a system that fails in a more confusing way.
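
One of those problems, connections that do not reconnect after failover, can be illustrated with a small retry wrapper. This is a sketch under stated assumptions: `connect` and `operation` are hypothetical callables supplied by the application, the built-in `ConnectionError` stands in for whatever exception a real database driver raises, and the operation is assumed safe to repeat.

```python
import time


def run_with_reconnect(connect, operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation on a fresh connection, reconnecting with backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()  # a new connection per attempt, never a cached dead one
            try:
                return operation(conn)
            finally:
                conn.close()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff gives the failover time to complete.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Retrying is only safe for operations that can run twice without harm. A write that may have been applied before the connection dropped needs idempotency handling that this sketch does not provide.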

In context

Backups are cheap relative to the value of the data they protect. A provider-managed daily snapshot plus weekly exports to object storage adds a small fraction to the monthly bill. The cost of not having them, in the event of data loss, is the full value of what was lost — which for production applications is almost always more than any infrastructure cost. This is the clearest ROI calculation in infrastructure.

Redundancy costs more and requires more architectural thought. Doubling the instance count roughly doubles the compute cost. The load balancer adds more. Database replication or shared database infrastructure adds more still. For applications where downtime costs money per minute, this investment has clear return. For applications where occasional downtime is tolerable, the architectural complexity of redundancy may produce more failure modes than it prevents.

Managed backups from the provider trade control for convenience — the backup happens automatically, the retention is configured once, restores are handled through the control panel. What you give up is visibility into whether backups are succeeding. Provider-managed backup systems fail silently too, just less often. Verifying backup integrity still requires periodic testing regardless of who runs the backup system.

From understanding to decision

The useful exercise before designing backup and redundancy is to write down the three failure scenarios most likely to affect the service, and then check whether the current infrastructure survives each one. Server hardware failure, accidental data deletion, bad deployment that corrupts data — each requires a different protection. Designing for the actual failure modes produces more useful infrastructure than designing for what sounds architecturally impressive.
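
The exercise can be written down as a simple checklist. The mapping below is illustrative only: the pairing of each scenario with a protection is an assumption made for this sketch, and the real list should come from the service's own failure history.

```python
# Hypothetical mapping from failure scenario to the protection that addresses it.
REQUIRED_PROTECTION = {
    "server hardware failure": "second instance behind a load balancer",
    "accidental data deletion": "off-server backups with tested restores",
    "bad deployment that corrupts data": "backup taken before each deployment",
}


def unprotected_scenarios(in_place):
    """Return the scenarios whose required protection is not currently in place."""
    return [
        scenario
        for scenario, protection in REQUIRED_PROTECTION.items()
        if protection not in in_place
    ]


current = {"off-server backups with tested restores"}
for scenario in unprotected_scenarios(current):
    print(f"not covered: {scenario}")
```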

If availability and recoverability are core requirements, not optional
If multi-instance architectures and load balancing are being evaluated
If the architecture needs to support redundancy from the application level up

Where to go next

Hetzner
Cost-conscious developers and teams building European-primary infrastructure
DigitalOcean
Dev teams and startups that need composable cloud infrastructure without dedicated DevOps
Vultr
Developer teams needing global infrastructure reach with a consistent API across 32+ locations