Distributed Systems
A distributed system runs on multiple machines that communicate over a network. This section covers the challenges of distributed computing, the CAP theorem, consistency models, consensus algorithms, and distributed data patterns.
Reasons for Distribution
| Reason | Description |
|---|---|
| Scale | Multiple machines provide more capacity than a single machine |
| Reliability | Multiple machines provide redundancy; one failure does not cause total failure |
| Latency | Servers in multiple regions reduce network latency for geographically distributed users |
| Cost | Many commodity machines can be more cost-effective than one high-end machine |
Challenges
Network Unreliability
Networks experience message loss, duplication, reordering, and variable latency. Network partitions can isolate portions of the system.
Clock Unreliability
Machine clocks drift and are not perfectly synchronized. NTP provides approximate synchronization but not perfect agreement. Determining the order of events across machines based on timestamps is unreliable.
Partial Failures
In distributed systems, some components can fail while others continue operating. Distinguishing between a slow component and a failed component is difficult.
CAP Theorem
The CAP theorem states that a distributed system cannot guarantee all three of the following properties simultaneously:
| Property | Description |
|---|---|
| Consistency | Every read returns the most recent write |
| Availability | Every request to a non-failing node receives a non-error response |
| Partition Tolerance | The system continues operating during network partitions |
Network partitions are unavoidable in distributed systems. During a partition, the system must choose between:
- CP (Consistency over Availability): Reject requests rather than return potentially stale data
- AP (Availability over Consistency): Return available data (potentially stale) rather than reject requests
PACELC Extension
PACELC extends CAP to address normal operation (no partition):
- If Partition: Choose Availability or Consistency
- Else (normal operation): Choose Latency or Consistency
Strong consistency requires waiting for replicas to confirm, increasing latency. Lower latency requires accepting potentially stale reads.
Consistency Models
Strong Consistency
Every read returns the most recent write, as if the system were a single machine.
| Aspect | Description |
|---|---|
| Implementation | Single leader handles all writes, synchronously replicates before confirming |
| Trade-off | Slower writes, reduced availability during failures |
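A minimal sketch of the single-leader approach, using in-memory dictionaries as stand-ins for replicas (class and method names are illustrative, not a real database API): the leader confirms a write only after every replica has acknowledged it.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledge the write


class SyncLeader:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Confirm only after every replica acknowledges: reads from any
        # replica now return the latest value, at the cost of write latency
        # and of availability when a replica is unreachable.
        acks = [replica.apply(key, value) for replica in self.replicas]
        return all(acks)


replicas = [Replica(), Replica()]
leader = SyncLeader(replicas)
leader.write("balance", 100)
assert all(r.data["balance"] == 100 for r in replicas)
```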
Eventual Consistency
If writes stop, all replicas eventually converge to the same state. Immediately after a write, different replicas may return different values.
| Aspect | Description |
|---|---|
| Implementation | Asynchronous replication; writes confirm before replicas synchronize |
| Trade-off | Stale reads are possible |
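For contrast, a sketch of asynchronous replication (again with illustrative in-memory classes): the leader confirms immediately and replicas catch up later, so a read from a lagging replica can be stale until replication runs.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class AsyncLeader:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # writes not yet pushed to replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return True  # confirmed before any replica has the write

    def replicate_once(self):
        # Background work: push the oldest pending write to all replicas.
        if self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
leader = AsyncLeader([replica])
leader.write("balance", 100)
print(replica.data.get("balance"))  # None -- stale read before replication
leader.replicate_once()
print(replica.data.get("balance"))  # 100 -- replicas converge once writes stop
```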
Read-Your-Writes
A client always sees its own writes, though it may not see other clients' recent writes.
| Aspect | Description |
|---|---|
| Implementation | Route reads to the server that handled the client's write, or track versions |
| Trade-off | Additional routing or tracking overhead |
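A sketch of the version-tracking approach (hypothetical names): the leader returns a version token with each write, the client presents the token on reads, and a replica serves the read only once it has caught up to that version.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.version = 0

    def read(self, key, min_version):
        # Serve the read only if this replica has applied the client's own write.
        if self.version < min_version:
            raise RuntimeError("replica lagging behind client's write; retry or reroute")
        return self.data.get(key)


class Leader:
    def __init__(self, replicas):
        self.data = {}
        self.version = 0
        self.replicas = replicas

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        return self.version  # version token handed back to the client

    def replicate_up_to(self, version):
        # Stand-in for asynchronous catch-up of the replicas.
        for replica in self.replicas:
            replica.data = dict(self.data)
            replica.version = version


replica = Replica()
leader = Leader([replica])
token = leader.write("profile", "updated")
try:
    replica.read("profile", token)        # fails: replica has not caught up
except RuntimeError:
    pass
leader.replicate_up_to(token)
print(replica.read("profile", token))     # "updated"
```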
Causal Consistency
If event A caused event B, all replicas see A before B. Causally unrelated events can appear in any order.
| Aspect | Description |
|---|---|
| Implementation | Vector clocks to track causal dependencies |
| Trade-off | Complexity of tracking causality |
Consensus
Consensus algorithms enable multiple machines to agree on a value despite message loss and machine failures.
Raft Algorithm
Raft is a consensus algorithm designed for understandability.
| Phase | Description |
|---|---|
| Leader Election | Nodes vote to elect one leader |
| Log Replication | Leader accepts writes and replicates to followers |
| Safety | At most one leader can be elected per term; committed log entries are never lost |
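A sketch of just the vote-granting rule from leader election (not a full Raft implementation; class and method names are illustrative): a node grants at most one vote per term, and only to a candidate whose log is at least as up to date as its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Follower:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0   # term of this node's last log entry
    last_log_index: int = 0  # index of this node's last log entry

    def handle_request_vote(self, term, candidate_id, last_log_term, last_log_index):
        # A newer term cancels any vote cast in an older term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        if term < self.current_term:
            return False  # reject stale candidates
        # Election restriction: candidate's log must be at least as up to date.
        log_ok = (last_log_term, last_log_index) >= (self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id  # at most one vote per term
            return True
        return False


f = Follower(current_term=3, last_log_term=3, last_log_index=7)
print(f.handle_request_vote(4, "node-b", 3, 7))  # True: newer term, log up to date
print(f.handle_request_vote(4, "node-c", 3, 9))  # False: vote already cast for term 4
```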
Paxos
Paxos is an older consensus algorithm with the same goals as Raft but greater complexity.
Consensus Applications
| Application | Description |
|---|---|
| Leader election | Selecting which node coordinates operations |
| Distributed locks | ZooKeeper, etcd |
| Configuration management | Agreement on current system configuration |
| Replicated databases | Determining the order of writes |
Distributed Transactions
Distributed transactions coordinate updates across multiple services atomically.
Two-Phase Commit (2PC)
| Phase | Description |
|---|---|
| Prepare | Coordinator asks all participants if they are ready to commit |
| Commit | If all participants agree, coordinator instructs them to commit; otherwise, abort |
Limitation: If the coordinator fails after the prepare phase, participants that voted to commit remain blocked (often holding locks) until the coordinator recovers.
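A minimal 2PC coordinator over in-memory participants (illustrative names; a real implementation also needs durable logging and timeouts, and still blocks if the coordinator dies after prepare).

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if the local transaction can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (prepare): every participant must vote yes.
    if all(p.prepare() for p in participants):
        # Phase 2 (commit): instruct everyone to commit.
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote (or a timeout, omitted here) aborts the whole transaction.
    for p in participants:
        p.abort()
    return "aborted"


print(two_phase_commit([Participant("orders"), Participant("payments")]))         # committed
print(two_phase_commit([Participant("orders"), Participant("payments", False)]))  # aborted
```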
Saga Pattern
Sagas break a transaction into a sequence of local transactions. Each step has a compensating action that undoes it if a later step fails.
| Style | Description |
|---|---|
| Choreography | Services coordinate via events (decentralized) |
| Orchestration | Central service directs the transaction flow |
Sagas avoid 2PC's blocking and provide better availability, but failure handling is more complex: intermediate states are visible to other transactions, and every step needs a correct compensating action.
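A sketch of the orchestration style, with hypothetical step names: the orchestrator runs local steps in order and, on failure, executes the compensating actions of the completed steps in reverse.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs of callables."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Undo the already-completed steps in reverse order.
            for undo in reversed(completed):
                undo()
            return "rolled back"
    return "completed"


def reserve_inventory(): print("inventory reserved")
def release_inventory(): print("inventory released")
def charge_payment(): raise RuntimeError("card declined")
def refund_payment(): print("payment refunded")


print(run_saga([
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
]))
# inventory reserved
# inventory released
# rolled back
```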
Time and Ordering
Logical Clocks
Logical clocks order events without relying on physical time.
Lamport Timestamps: Each node keeps a counter that increments on every local event. On sending a message, the sender attaches its current counter; on receiving, the node sets its counter to max(local, received) + 1.
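A sketch of a Lamport clock following that rule (node and message structure are illustrative):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event: increment, then attach the timestamp to the message.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # Jump past both our own clock and the sender's, then count the receive.
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t_send = a.send()           # a.time == 1
t_recv = b.receive(t_send)  # b.time == 2, so the receive orders after the send
```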
Vector Clocks: Each node tracks its own counter plus its knowledge of other nodes' counters. This enables detecting concurrent events (neither caused the other).
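A sketch of vector clocks with the comparison that detects concurrency (node ids and helpers are illustrative): event A happened before B only if A's clock is less than or equal to B's in every entry; if neither dominates, the events are concurrent.

```python
class VectorClock:
    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.clock = {n: 0 for n in nodes}

    def tick(self):
        # Local event: advance only this node's entry.
        self.clock[self.node_id] += 1
        return dict(self.clock)

    def merge(self, received_clock):
        # On receive: element-wise max with the sender's clock, then tick.
        for node, count in received_clock.items():
            self.clock[node] = max(self.clock.get(node, 0), count)
        return self.tick()


def happened_before(a, b):
    return all(a[n] <= b[n] for n in a) and a != b


def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)


nodes = ["A", "B"]
x = VectorClock("A", nodes).tick()  # {"A": 1, "B": 0}
y = VectorClock("B", nodes).tick()  # {"A": 0, "B": 1}
print(concurrent(x, y))             # True: neither event caused the other
```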
Physical Clocks
Physical clocks are necessary for some applications (audit logs, time-based expiration).
| Challenge | Description |
|---|---|
| Clock drift | Server clocks diverge from true time |
| NTP limitations | NTP provides approximate, not perfect, synchronization |
| Leap seconds | Periodic adjustments cause edge cases |
TrueTime (Google Spanner): GPS and atomic clock references provide each server with time as an interval of bounded uncertainty (the true time lies between a lower and an upper bound).
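A small sketch of interval-based ordering in that style (illustrative, not the Spanner API): two events can be definitively ordered only when their uncertainty intervals do not overlap.

```python
from dataclasses import dataclass


@dataclass
class TimeInterval:
    earliest: float  # lower bound on true time, in seconds
    latest: float    # upper bound on true time, in seconds


def definitely_before(a: TimeInterval, b: TimeInterval) -> bool:
    # Safe to order a before b only if a's latest possible time precedes
    # b's earliest possible time, i.e. the intervals do not overlap.
    return a.latest < b.earliest


t1 = TimeInterval(10.000, 10.007)  # 7 ms of uncertainty
t2 = TimeInterval(10.005, 10.012)
print(definitely_before(t1, t2))                            # False: overlap, order unknown
print(definitely_before(t1, TimeInterval(10.010, 10.017)))  # True
```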
Distributed Data Patterns
Replication
| Pattern | Description | Trade-off |
|---|---|---|
| Single Leader | One server handles writes; replicas handle reads | Simple but write bottleneck |
| Multi-Leader | Multiple servers accept writes | Requires conflict resolution |
| Leaderless | Any node handles any request; uses quorums | W + R > N makes read and write quorums overlap (see the sketch below) |
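A sketch of the leaderless quorum condition from the table above (versioning here is a plain counter; real systems use vector clocks or last-write-wins): because W + R > N, any read quorum overlaps any write quorum, so at least one read reply carries the latest acknowledged write.

```python
class Node:
    def __init__(self):
        self.data = {}  # key -> (version, value)


def quorum_write(write_set, key, version, value):
    # The write is confirmed once W replicas have accepted it.
    for node in write_set:
        node.data[key] = (version, value)


def quorum_read(read_set, key):
    # Collect R replies and keep the value with the highest version.
    replies = [node.data.get(key, (0, None)) for node in read_set]
    return max(replies)[1]


N, W, R = 3, 2, 2                      # W + R > N
nodes = [Node() for _ in range(N)]
quorum_write(nodes[:W], "k", 1, "v1")  # write quorum: nodes 0 and 1
print(quorum_read(nodes[-R:], "k"))    # read quorum: nodes 1 and 2 -> "v1"
```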
Partitioning (Sharding)
| Strategy | Description | Trade-off |
|---|---|---|
| Range partitioning | Partition by value ranges | Hot spots if distribution uneven |
| Hash partitioning | Hash the key, mod by partition count | Even distribution, but range queries are inefficient and changing the partition count reshuffles most keys |
| Consistent hashing | Keys and nodes on a ring | Adding/removing nodes affects only neighbors |
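A minimal consistent-hashing ring using Python's bisect module (a sketch only; production implementations add virtual nodes for balance): keys and nodes are hashed onto the same ring, each key maps to the first node clockwise, and adding a node remaps only the keys between it and its predecessor.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def remove(self, node):
        self._ring.remove((self._hash(node), node))

    def lookup(self, key):
        # First node at or clockwise after the key's position, wrapping around.
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in ("user:1", "user:2", "user:3", "user:4")}
ring.add("node-d")
moved = [k for k in before if ring.lookup(k) != before[k]]
print(moved)  # only keys falling between node-d and its predecessor move
```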
Design Considerations
| Topic | Considerations |
|---|---|
| Trade-offs | Consistency vs availability vs latency; articulate which to prioritize and why |
| Use case matching | Strong consistency for financial systems; eventual consistency acceptable for social feeds |
| CAP in practice | During a network partition, does the system reject requests or return potentially stale data? |
| Failure modes | What happens when specific components fail? When the network partitions? |
| Operational complexity | Distributed systems are operationally complex; acknowledge this in designs |