Replication Patterns

Replication maintains copies of data on multiple machines to provide fault tolerance, improved read performance, and lower latency for geographically distributed users.

Benefits

| Goal | How Replication Helps |
|------|------------------------|
| High availability | One node fails, others continue serving |
| Lower latency | Serve reads from a nearby replica |
| Read scalability | Spread read load across many nodes |
| Disaster recovery | Survive entire datacenter failures |

Three Approaches

Leader-Follower (Primary-Replica)

One node (the leader) accepts all writes. Followers copy from the leader and serve reads.


PostgreSQL, MySQL, MongoDB, and Redis use this model.

Synchronous vs Asynchronous:

| Type | Mechanism | Trade-off |
|------|-----------|-----------|
| Synchronous | Leader waits for follower ACK before confirming write | Durability, but slower |
| Asynchronous | Leader confirms immediately, replicates in background | Fast, but potential data loss |
| Semi-synchronous | Wait for one follower, others async | Balance |

Most systems use asynchronous or semi-synchronous replication. Fully synchronous replication is rare because every write must wait for the slowest replica.
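
The three modes differ only in how long the leader waits before confirming. A minimal sketch of the leader's write path, assuming a hypothetical Follower class whose replicate() blocks until the record is applied:

```python
# Minimal sketch of the leader's write path; the Follower class and its
# replicate() method are illustrative assumptions, not a real API.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

class Leader:
    def __init__(self, followers, mode="semi-sync"):
        self.followers = followers
        self.mode = mode
        self.pool = ThreadPoolExecutor(max_workers=max(1, len(followers)))

    def write(self, record):
        self.append_to_local_log(record)
        futures = [self.pool.submit(f.replicate, record) for f in self.followers]
        if self.mode == "sync":
            wait(futures)                               # every follower ACKs: durable, slow
        elif self.mode == "semi-sync":
            wait(futures, return_when=FIRST_COMPLETED)  # one ACK is enough
        # mode == "async": confirm immediately; replication continues in background
        return "committed"

    def append_to_local_log(self, record):
        pass  # stand-in for the leader's durable local write
```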

Leader Failure Handling:

Failover process:

  1. Detect the leader is down (timeout-based)
  2. Select a new leader (typically the most up-to-date follower)
  3. Redirect clients to the new leader
  4. Old leader rejoins as follower when it recovers
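
Step 2 is where most of the subtlety lives. A minimal sketch of one common policy, promoting whichever reachable follower has applied the most of the old leader's log (node IDs and LSN values here are illustrative):

```python
# Hypothetical sketch of step 2: promote the follower with the highest
# applied log position (often called an LSN or offset).
def elect_new_leader(followers):
    """followers: (node_id, last_applied_lsn) pairs from reachable nodes."""
    if not followers:
        raise RuntimeError("no follower reachable; cannot fail over")
    node_id, _ = max(followers, key=lambda pair: pair[1])
    return node_id

# Node "b" has applied more of the log than "a" or "c", so it is promoted.
print(elect_new_leader([("a", 1041), ("b", 1057), ("c", 998)]))  # -> b
```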

Potential issues:

  • Data loss: With asynchronous replication, the old leader may have accepted writes that never reached any follower
  • Split brain: Two nodes both assume leadership
  • Duplicate writes: Clients retry during the transition

Multi-Leader (Active-Active)

Multiple nodes accept writes and sync with each other.


Use cases:

  • Multi-datacenter deployments (one leader per DC)
  • Offline-capable applications (device is a leader when disconnected)
  • Collaborative editing

Conflict resolution:

When two leaders accept conflicting writes simultaneously:

| Strategy | Mechanism | Downside |
|----------|-----------|----------|
| Last write wins (LWW) | Timestamp determines winner | Can lose data silently |
| Merge values | Combine both changes | Only works for some data types |
| Keep all versions | Application decides | Complexity pushed to application |
| Custom logic | Business rules | Requires implementation and maintenance |

LWW is common due to simplicity, but it can lose data. If Alice updates at T=1 and Bob updates at T=2, Bob wins even if Alice's update arrived first at most replicas.
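
A minimal LWW merge, assuming each replica stores a (value, timestamp) pair per key; the function and sample values below are illustrative:

```python
# A minimal last-write-wins merge over (value, timestamp) versions.
# The representation is illustrative, not any particular database's.
def lww_merge(a, b):
    """Return the version with the later timestamp; ties go to a."""
    return a if a[1] >= b[1] else b

alice = ("alice's edit", 1)  # written at T=1
bob = ("bob's edit", 2)      # written at T=2
# Bob wins wherever the two versions meet, even on replicas that saw
# Alice's update first -- her write is silently discarded.
print(lww_merge(alice, bob))  # -> ("bob's edit", 2)
```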

Leaderless (Dynamo-style)

No designated leader. Any node can accept writes. Clients write to and read from multiple nodes.


Cassandra, DynamoDB, and Riak use this model.

Quorums:

In a quorum-based system, n is the total number of replicas, w is the number of nodes that must acknowledge a write, and r is the number of nodes queried on a read. For consistent reads, w + r > n must hold.

If writes go to a majority and reads come from a majority, at least one node queried will have the latest value.

| Configuration | Trade-off |
|---------------|-----------|
| w=n, r=1 | Fast reads, slow writes |
| w=1, r=n | Fast writes, slow reads |
| w=n/2+1, r=n/2+1 | Balanced |
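
A sketch of the basic write and read paths, assuming a hypothetical Replica interface with store() and load() methods and a monotonically increasing version per key:

```python
# Sketch of quorum writes and reads, assuming a hypothetical Replica
# with store(key, value, version) -> bool and load(key) -> (value, version).
def quorum_write(replicas, key, value, version, w):
    acks = sum(1 for rep in replicas if rep.store(key, value, version))
    if acks < w:
        raise RuntimeError(f"write failed: {acks} acks, needed {w}")

def quorum_read(replicas, key, r):
    # Because w + r > n, at least one of these r replicas holds the
    # latest write; the highest version wins.
    responses = [rep.load(key) for rep in replicas[:r]]
    value, version = max(responses, key=lambda resp: resp[1])
    return value
```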

Handling stale replicas:

  • Read repair: When nodes disagree during a read, update the stale ones
  • Anti-entropy: Background process that compares and syncs replicas
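
Read repair falls naturally out of the quorum read: after finding the newest version, push it back to any queried replica that returned something older. Again assuming the hypothetical Replica interface sketched above:

```python
# Read repair, continuing the hypothetical Replica interface above:
# after a quorum read, push the newest version to any stale replica.
def read_with_repair(replicas, key, r):
    responses = [(rep, rep.load(key)) for rep in replicas[:r]]
    value, version = max((resp for _, resp in responses),
                         key=lambda resp: resp[1])
    for rep, (_, seen) in responses:
        if seen < version:
            rep.store(key, value, version)  # repair the stale copy
    return value
```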

Replication Lag

With asynchronous replication, followers are always behind the leader. Under load, this lag can grow from milliseconds to seconds or even minutes.

Read-Your-Writes Problem

A user writes data, refreshes the page, and the data is not visible. The write went to the leader, but the read came from a stale follower.

Solutions:

  • Read from leader for user's own data (within a time window)
  • Track the last write timestamp and ensure reads see at least that version
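
A sketch of the second solution, where the client session remembers the version of its last write and skips replicas that have not caught up; Session, leader.write(), and replica.version_for() are illustrative assumptions, not a real client API:

```python
# The session tracks the newest version this user has written and only
# reads from replicas that have applied at least that version.
class Session:
    def __init__(self):
        self.min_version = 0  # newest version this user has written

    def write(self, leader, key, value):
        self.min_version = leader.write(key, value)  # leader returns new version

    def read(self, replicas, key):
        for rep in replicas:
            if rep.version_for(key) >= self.min_version:
                return rep.load(key)  # this replica has seen our write
        raise RuntimeError("no replica caught up; fall back to the leader")
```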

Monotonic Reads Problem

A user sees data, refreshes, and it disappears. Then refreshes again and it reappears. Different replicas have different lag.

Solution: Route users to the same replica (session affinity).
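
A minimal sketch of session affinity: hash the user ID to a stable replica so successive reads from the same user never jump between replicas with different lag (replica names are illustrative):

```python
# Hash the user ID to pick a stable replica for the whole session.
import hashlib

def replica_for(user_id, replicas):
    digest = hashlib.sha256(user_id.encode()).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]

replicas = ["replica-1", "replica-2", "replica-3"]
print(replica_for("alice", replicas))  # same user -> same replica, every time
```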

Consistent Prefix Problem

An observer sees the answer to a question before the question itself, because the two writes reached different replicas out of order.

Solution: Keep causally related data on the same partition.

Consistency Models

From strongest to weakest:

| Model | Guarantee |
|-------|-----------|
| Linearizable | Operations appear atomic and in real-time order |
| Sequential | Operations appear in some total order |
| Causal | Causally related operations are ordered correctly |
| Eventual | Replicas eventually converge |

Linearizability

The strongest guarantee. The system behaves as if there is only one copy of the data.

Required for:

  • Leader election (cannot have two leaders)
  • Unique constraints (no duplicate usernames)
  • Distributed locks

Cost: During a network partition, the system must give up either linearizability or availability (CAP theorem).

Eventual Consistency

The weakest useful guarantee. Given enough time without writes, all replicas converge to the same value.

"Eventually" can range from milliseconds to hours.

Appropriate for:

  • Social media feeds
  • View counts and likes
  • Analytics data

Not appropriate for:

  • Bank account balances
  • Inventory counts
  • Data where correctness is more important than availability

Choosing a Replication Strategy

| Requirement | Recommended Approach |
|-------------|----------------------|
| Simple read scaling | Leader-follower, async |
| High write availability | Multi-leader or leaderless |
| Strong consistency | Leader-follower, sync (or single leader) |
| Multi-datacenter | Multi-leader with one leader per DC |
| Partition tolerance | Leaderless with quorums |

Production Examples

| System | Model |
|--------|-------|
| PostgreSQL | Leader-follower (sync or async) |
| MySQL | Leader-follower |
| MongoDB | Leader-follower (replica sets) |
| Cassandra | Leaderless |
| DynamoDB | Leaderless (single region), multi-leader (global tables) |
| CockroachDB | Leader-follower with Raft consensus |

Summary

  1. Synchronous replication is durable but slow. Asynchronous is fast but can lose data on leader failure.

  2. Multi-leader requires conflict resolution. LWW is common but can lose data.

  3. Leaderless requires quorums. w + r > n for strong consistency.

  4. Replication lag causes consistency anomalies: read-your-writes, monotonic reads, and consistent prefix violations.

  5. Linearizability is expensive. Use only where correctness requires it.

  6. Monitor replication lag. Without visibility into replica state, consistency reasoning is impossible.