
On March 10, 2021, a fire broke out at an OVHcloud data center in Strasbourg, France. The building that burned housed SBG2, one of four data centers on the site. Within hours, millions of websites were offline - not because they lacked servers, but because all their servers were on that one site. The customers who had been warned, who had set up backups, who had configured replication to a second site, recovered in hours. The rest lost data they would never recover.

The OVHcloud fire is not the only story like this. Flooding, power grid failures, fiber cable cuts, software bugs that cascade through an entire region’s infrastructure - every few years, something takes down a geographic area. The question is not whether it will happen to your systems. The question is how much of your data and traffic can survive when it does.

This is what multi-region architecture is built to answer.


The Vocabulary of Geography

Before building across regions, the terms need to be precise: cloud providers use them with specific meanings, and conflating them leads to wrong assumptions about which failures a system will survive.

A zone (or availability zone) is an isolated failure domain within a region, backed by one or more data centers. It has its own power supplies, cooling systems, and network connections. A failure in one zone - a power trip, a cooling malfunction, a software bug that corrupts infrastructure management - is designed not to propagate to other zones. A GCP region typically has three or more zones (us-central1, for example, includes us-central1-a, us-central1-b, and us-central1-c). They are physically close enough that network latency between them is typically under 1ms, but independent enough that simultaneous failure is very unlikely.

A region is a geographic area - a city or cluster of cities - that contains multiple zones. GCP’s us-central1 is in Iowa, europe-west1 is in Belgium, and asia-east1 is in Taiwan. Regions are generally far enough apart that a single physical event (earthquake, regional power grid failure, hurricane) is unlikely to affect two of them at once. Network latency between regions ranges from tens to hundreds of milliseconds depending on geography.

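A dual-region is a pairing of two regions, usually on the same continent, offered by some cloud providers as a managed abstraction that replicates data between them. GCP’s Cloud Storage dual-region, for example, stores data redundantly across two specific regions (e.g., us-central1 and us-east1) while presenting a single, strongly consistent bucket. Replication between the two regions is asynchronous: by default it targets 99.9% of newly written objects within an hour, and the optional turbo replication targets all new objects within 15 minutes.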

A multi-region deployment spans three or more regions, typically across continents. This is the configuration for globally distributed services that must tolerate regional failures anywhere in the world.


What You Are Actually Protecting Against

The choice of how many regions to use comes down to what failure scenarios you are designing for.

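Single zone deployment protects only against individual machine failures, and only if something restarts or replaces the failed machine (cloud features such as automatic VM restart and managed instance groups can do this for you). It does not protect against zone failures.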

Multi-zone, single region deployment (the minimum for any production system) protects against zone failures. A power outage in us-central1-a does not take down your service if your replicas are spread across -a, -b, and -c. Network latency between zones is low enough that synchronous replication is practical. This adds minimal complexity and is the baseline for production workloads.

Multi-region deployment protects against regional disasters. It introduces latency challenges (tens of milliseconds between regions), data consistency challenges (do you replicate synchronously, accepting latency, or asynchronously, accepting potential data loss?), and cost (cross-region data transfer is billed separately). The decision is not “multi-region is better” - it is “does my business require surviving a regional outage, and what am I willing to pay in complexity and cost to achieve that?”
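Which of these failures a given placement survives can be checked mechanically. A minimal sketch, assuming GCP-style zone names (the region name plus a `-a`, `-b`, ... suffix) and one replica per machine; `region_of` and `survivable_failures` are hypothetical helpers, not a cloud API:

```python
def region_of(zone: str) -> str:
    """Derive the region from a GCP-style zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    region, _, _ = zone.rpartition("-")
    return region


def survivable_failures(replica_zones: list[str]) -> list[str]:
    """Return the failure scenarios after which at least one replica is still running.

    Hypothetical helper; assumes every replica runs on its own machine.
    """
    zones = set(replica_zones)
    regions = {region_of(zone) for zone in zones}
    survives = []
    if len(replica_zones) >= 2:  # losing one machine leaves another replica
        survives.append("machine")
    if len(zones) >= 2:          # losing one zone leaves a replica in another zone
        survives.append("zone")
    if len(regions) >= 2:        # losing one region leaves a replica in another region
        survives.append("region")
    return survives


if __name__ == "__main__":
    print(survivable_failures(["us-central1-a", "us-central1-b", "us-central1-c"]))
    print(survivable_failures(["us-central1-a", "us-central1-b", "us-east1-b"]))
```

Three replicas spread across us-central1-a, -b, and -c survive machine and zone failures but not the loss of us-central1; moving one replica to us-east1 adds region-level survival.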


RPO and RTO: The Two Questions That Define Your Architecture

Before choosing an architecture, you need answers to two questions.

Recovery Point Objective (RPO): How much data can you afford to lose? If your database fails and you restore from the last backup, you lose everything written since that backup. An RPO of zero means you can lose no data - every write must be durably committed in a separate failure domain (for regional disasters, a second region) before you acknowledge it to the user. An RPO of one hour means you can tolerate losing the last hour of writes.

Recovery Time Objective (RTO): How long can your service be down before the business consequences become unacceptable? An RTO of five minutes means you must be able to detect the failure, trigger failover, and resume serving traffic within five minutes.
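An RTO target can be checked against a failover budget by adding up the stages. A minimal sketch, assuming failure is detected by periodic health checks that need several consecutive failures, and ignoring probe timeouts; the function names and every duration are hypothetical:

```python
def worst_case_rto_seconds(check_interval_s: float, unhealthy_threshold: int,
                           promotion_s: float, traffic_shift_s: float) -> float:
    """Worst-case seconds from failure to serving traffic again (hypothetical model).

    Detection: if the failure happens just after a passing check, it takes
    `unhealthy_threshold` more check intervals before the region is marked down.
    """
    detection_s = check_interval_s * unhealthy_threshold
    return detection_s + promotion_s + traffic_shift_s


def meets_rto(budget_s: float, rto_s: float) -> bool:
    """True if the worst-case failover budget fits inside the RTO."""
    return budget_s <= rto_s


if __name__ == "__main__":
    budget = worst_case_rto_seconds(check_interval_s=10, unhealthy_threshold=3,
                                    promotion_s=120, traffic_shift_s=60)
    print(budget, meets_rto(budget, rto_s=5 * 60))
```

With a 10-second check interval, three failed checks, 120 seconds to promote the standby, and 60 seconds to shift traffic, the worst case is 30 + 120 + 60 = 210 seconds, inside a five-minute RTO.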

RPO drives replication strategy. RTO drives failover automation and health checking.

An RPO of zero forces synchronous replication: every write must be committed in two regions before the user receives an acknowledgment. This works but adds at least one inter-region round trip to every write. For us-central1 to us-east1, that is roughly 30ms - acceptable for some workloads, fatal for others (high-frequency trading, real-time gaming).

A relaxed RPO (seconds to minutes) allows asynchronous replication: writes commit locally, then stream to the secondary region. Writes are cheaper and faster. If the primary region fails, you lose the last few seconds of writes - the replication lag at the moment of failure.
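A toy model makes the trade-off visible. This is a simplified in-memory sketch, not a real replication protocol: `ReplicatedLog` is a hypothetical class, the 30 ms round trip is the rough us-central1 to us-east1 figure above, and network delay is reported as a number rather than actually waited on:

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedLog:
    """Hypothetical two-region log contrasting synchronous and asynchronous replication."""
    rtt_ms: float                                  # assumed primary-to-secondary round trip
    synchronous: bool
    primary: list = field(default_factory=list)
    secondary: list = field(default_factory=list)
    pending: list = field(default_factory=list)    # async writes not yet shipped

    def write(self, record) -> float:
        """Apply a write; return the latency replication adds before the client is acked (ms)."""
        self.primary.append(record)
        if self.synchronous:
            # Wait for the secondary to confirm before acknowledging the client.
            self.secondary.append(record)
            return self.rtt_ms
        # Acknowledge immediately and ship the record later.
        self.pending.append(record)
        return 0.0

    def ship_pending(self) -> None:
        """Background replication: deliver queued writes to the secondary."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def fail_primary(self) -> list:
        """Lose the primary region; return the acknowledged writes the secondary never got."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary.clear()
        return lost
```

In synchronous mode every write pays the 30 ms round trip and a primary failure loses nothing. In asynchronous mode writes pay nothing extra, but any write still in `pending` when the primary fails is gone: the replication lag at the moment of failure.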


Cross-Region Replication: Synchronous vs Asynchronous

Synchronous replication gives you RPO = 0 but costs latency on every write. It is most practical when the regions are geographically close, as in a dual-region configuration. Google Cloud Spanner offers synchronous replication across multiple regions: it replicates writes with Paxos and uses the TrueTime API (backed by GPS receivers and atomic clocks) to assign globally consistent commit timestamps without a central timestamp service. The cost is somewhat higher write latency than a single-region database.
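Spanner’s replication is built on Paxos, which commits a write once a majority of voting replicas acknowledge it. The sketch below is not Spanner’s implementation; it only does the latency arithmetic for majority-quorum replication, with hypothetical round-trip times, to show why one distant replica need not slow every write:

```python
def quorum_commit_latency_ms(replica_rtts_ms: list[float]) -> float:
    """Replication latency for a write that commits once a majority of voting replicas ack it.

    `replica_rtts_ms` holds the round trip from the leader to each voting replica,
    including the leader itself (0 ms). The commit waits for the slowest member
    of the fastest majority.
    """
    if not replica_rtts_ms:
        raise ValueError("need at least one voting replica")
    majority = len(replica_rtts_ms) // 2 + 1
    return sorted(replica_rtts_ms)[majority - 1]
```

With five voting replicas at hypothetical round trips of 0, 1, 30, 35, and 100 ms, a majority is three, so commits wait about 30 ms; the 100 ms replica still receives every write but does not hold up commits.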

Asynchronous replication accepts some data loss in exchange for write performance. The primary region acknowledges the write immediately; the secondary region receives it moments later. This is how most database replication works by default. The replication lag - the gap between primary and secondary - is the RPO window. In steady state it may be milliseconds. Under load or during network congestion it may grow.
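Because replication lag is the RPO window, it is worth measuring and alerting on. A minimal sketch, assuming you can read the commit timestamp of the newest write on the primary and the newest write applied on the secondary (how you obtain those depends on your database); the function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def replication_lag(primary_last_commit: datetime, secondary_last_applied: datetime) -> timedelta:
    """Gap between the newest write on the primary and the newest write applied on the secondary."""
    return max(primary_last_commit - secondary_last_applied, timedelta(0))


def within_rpo(lag: timedelta, rpo: timedelta) -> bool:
    """If the primary failed right now, would the lost writes fit inside the RPO?"""
    return lag <= rpo


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    lag = replication_lag(now, now - timedelta(seconds=4))
    print(lag, within_rpo(lag, rpo=timedelta(seconds=30)))
```

A monitoring job that evaluates this continuously turns "the RPO is a few seconds" from an assumption into something you can alert on when lag grows under load.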


Active-Active vs Active-Passive

Active-passive: one region serves all traffic. The other sits warm, replicating data but not serving requests. On failure of the primary, a failover procedure promotes the passive region to primary, and traffic is redirected by DNS or a global load balancer. This is simpler to reason about (no conflict resolution, and split-brain is a risk only during failover itself) but wastes the passive region’s compute capacity during normal operation.
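The failover decision itself is small. A minimal sketch of a hypothetical controller that promotes the standby only after several consecutive failed health checks, so a single dropped probe does not trigger a failover; the class, the threshold of three, and the region names are illustrative assumptions:

```python
class FailoverController:
    """Hypothetical active-passive controller: promote the standby after N consecutive failures."""

    def __init__(self, primary: str, standby: str, unhealthy_threshold: int = 3):
        self.primary = primary
        self.standby = standby
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record_health_check(self, primary_healthy: bool) -> str:
        """Feed one health-check result; return the region that should receive traffic."""
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                # Promote the standby. A real system must also fence the old
                # primary so it cannot keep accepting writes (split-brain).
                self.primary, self.standby = self.standby, self.primary
                self.consecutive_failures = 0
        return self.primary


if __name__ == "__main__":
    controller = FailoverController("us-central1", "us-east1")
    print([controller.record_health_check(ok) for ok in [True, False, False, False]])
```

The routing target stays us-central1 through the first two failures and switches to us-east1 on the third. The threshold is the trade-off: higher values avoid flapping but add to detection time, and therefore to RTO.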

Active-active: all regions serve traffic simultaneously. A user in Europe hits the European region; a user in the US hits the US region. Writes may happen in any region and must be replicated to the others. This is more efficient (capacity is never idle) and more complex (what happens if two regions accept conflicting writes to the same record? Who wins?). Active-active therefore needs a strategy for conflicts: a globally consistent database (Spanner), careful domain partitioning (European users only write to the European database, never to the US one), or an explicit conflict-resolution rule such as last-writer-wins, which silently discards the losing write.
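Domain partitioning avoids write conflicts by giving every record exactly one owning region. A minimal sketch, assuming each user has a fixed home jurisdiction; the `HOME_REGION` table, the region choices, and the action names are hypothetical:

```python
# Hypothetical ownership table: each user's home jurisdiction maps to the one
# region allowed to write that user's records.
HOME_REGION = {"EU": "europe-west1", "US": "us-central1"}


def write_target(user_home: str, receiving_region: str) -> tuple[str, str]:
    """Decide how a region that received a write request should handle it.

    Any region may serve the request, but only the owning region commits the
    write, so two regions never accept conflicting writes for the same user.
    """
    owner = HOME_REGION[user_home]
    if receiving_region == owner:
        return ("commit_locally", owner)
    return ("forward_to_owner", owner)
```

A European user who happens to reach us-central1 still gets served, but the write is forwarded to europe-west1. The cost is an extra cross-region hop for that request; the benefit is that there is never a second copy of the record to reconcile.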


Global Load Balancing

Routing users to the right region requires load balancing at a global level - above the level of any single region’s infrastructure.

GCP’s global load balancer uses Anycast: a single IP address is advertised from many PoPs (Points of Presence) worldwide. When a user resolves your service’s DNS name, they receive that same IP. Internet routing (BGP) carries their traffic to the nearest PoP in network terms, where it enters Google’s private backbone and is forwarded to the closest healthy backend with available capacity.

This means users usually reach the closest region (lowest latency), and if a region becomes unhealthy, the load balancer stops sending traffic to it as soon as health checks mark its backends as failed, typically within seconds - no DNS TTL propagation delay, no manual intervention.
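Anycast routing itself happens in BGP and Google’s network, not in application code, but the decision the load balancer makes can be modeled in a few lines. This is a conceptual sketch with hypothetical latencies and a hypothetical `pick_region` function; the real load balancer also weighs backend capacity and utilization:

```python
from typing import Optional


def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> Optional[str]:
    """Return the lowest-latency region that is passing health checks, or None if none are."""
    candidates = [region for region in latency_ms if region in healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


if __name__ == "__main__":
    # Hypothetical latencies from one user's location to each region.
    latency = {"us-central1": 20, "us-east1": 35, "europe-west1": 110}
    print(pick_region(latency, {"us-central1", "us-east1", "europe-west1"}))
    print(pick_region(latency, {"us-east1", "europe-west1"}))
```

When us-central1 drops out of the healthy set, the same user is sent to us-east1 on the next request, with no DNS change involved.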


Data Residency and Compliance

Multi-region architecture is not purely a reliability decision. For many businesses, it is also a compliance question. GDPR restricts transfers of EU residents’ personal data to countries outside the EEA unless the destination has an adequacy decision or other safeguards (such as standard contractual clauses) are in place, so many companies keep that data in EU regions. Healthcare data in the US is subject to HIPAA. Financial data in various jurisdictions is subject to local regulations.

This creates constraints on where data can live. A truly global database that replicates everywhere may inadvertently store EU user data in a US data center. Cloud providers offer data residency controls - policies that restrict which regions data can be written to and replicated in, with audit logging to prove compliance.
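Platform-level residency controls enforce this for you, but the same rule is easy to check in your own deployment tooling before a replica is ever created. A minimal sketch; the data classes and the `ALLOWED_REGIONS` policy are hypothetical examples, not a legal determination:

```python
# Hypothetical policy: regions where each data classification may be stored or replicated.
ALLOWED_REGIONS = {
    "eu_personal": {"europe-west1", "europe-west4"},
    "public": {"europe-west1", "europe-west4", "us-central1", "us-east1", "asia-east1"},
}


def residency_violations(data_class: str, replica_regions: list[str]) -> list[str]:
    """Return the replica regions that the policy does not allow for this data class."""
    allowed = ALLOWED_REGIONS[data_class]
    return [region for region in replica_regions if region not in allowed]
```

A replication plan that puts EU personal data in europe-west1 and us-central1 is flagged for us-central1, while the same plan for public data passes; this is the reliability-versus-residency tension expressed as a deploy-time check.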

The tension: global replication improves reliability and reduces latency for global users. Data residency requirements restrict where data can go. The engineering problem is designing a system that satisfies both.


The Cost of Going Multi-Region

Multi-region is not free. Most cloud providers bill cross-region data transfer (egress) per gigabyte. A database that replicates 100 GB/day across regions pays for that transfer every day. At GCP’s list prices, cross-region transfer within the US runs roughly $0.01-$0.08 per GB depending on the service; cross-continent transfer costs more. For data-intensive workloads, this cost can be substantial.
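The arithmetic is simple enough to keep in a capacity-planning script. A minimal sketch using the per-GB range above; the function name and the 30-day month are assumptions:

```python
def monthly_transfer_cost_usd(gb_per_day: float, usd_per_gb: float, days: int = 30) -> float:
    """Cost of shipping gb_per_day across regions every day for `days` days (hypothetical helper)."""
    return gb_per_day * days * usd_per_gb


if __name__ == "__main__":
    # Low and high ends of the quoted per-GB range, for 100 GB/day of replication.
    for price in (0.01, 0.08):
        print(f"${price}/GB: ${monthly_transfer_cost_usd(100, price):,.2f} per month")
```

100 GB/day for 30 days is 3,000 GB, so replication alone costs about $30 to $240 per month at those rates, before any cross-continent traffic or the cost of running the second region’s infrastructure.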

Beyond egress, multi-region adds operational complexity: more infrastructure to provision, more failure modes to test, more runbooks to write. The added reliability must be worth the added complexity and cost. For a startup, multi-zone is almost always sufficient. For a service with contractual SLAs measured in nines, multi-region is often the only way to meet them.


Summary

| Configuration | Survives | Latency impact | Complexity |
|---|---|---|---|
| Single zone | Machine failure | None | Minimal |
| Multi-zone, single region | Zone failure | <1ms between zones | Low |
| Dual-region | One region failure | ~30ms per write (sync) or near-zero (async) | Medium |
| Multi-region active-passive | Regional disaster | Write replication latency plus failover time | High |
| Multi-region active-active | Regional disaster | Minimal for reads; writes are complex | Very high |

The architecture you need is the one that meets your RPO and RTO requirements at acceptable cost. Starting with multi-zone is the right default. Adding regions is a deliberate decision, made when the business consequences of regional failure justify the engineering investment.

