Distributed Systems

A distributed system runs on multiple machines that communicate over a network. This section covers the challenges of distributed computing, the CAP theorem, consistency models, consensus algorithms, and distributed data patterns.

Reasons for Distribution

| Reason | Description |
| --- | --- |
| Scale | Multiple machines provide more capacity than a single machine |
| Reliability | Multiple machines provide redundancy; one failure does not cause total failure |
| Latency | Servers in multiple regions reduce network latency for geographically distributed users |
| Cost | Many commodity machines can be more cost-effective than one high-end machine |

Challenges

Network Unreliability

Networks experience message loss, duplication, reordering, and variable latency. Network partitions can isolate portions of the system.

Clock Unreliability

Machine clocks drift and are not perfectly synchronized. NTP provides approximate synchronization but not perfect agreement. Determining the order of events across machines based on timestamps is unreliable.

Partial Failures

In distributed systems, some components can fail while others continue operating. Distinguishing between a slow component and a failed component is difficult.

CAP Theorem

The CAP theorem states that a distributed system can provide at most two of three guarantees simultaneously:

| Property | Description |
| --- | --- |
| Consistency | Every read returns the most recent write |
| Availability | Every request receives a response |
| Partition Tolerance | The system continues operating during network partitions |

Network partitions are unavoidable in distributed systems. During a partition, the system must choose between:

  • CP (Consistency over Availability): Reject requests rather than return potentially stale data
  • AP (Availability over Consistency): Return available data (potentially stale) rather than reject requests

PACELC Extension

PACELC extends CAP to address normal operation (no partition):

  • If Partition: Choose Availability or Consistency
  • Else (normal operation): Choose Latency or Consistency

Strong consistency requires waiting for replicas to confirm, increasing latency. Lower latency requires accepting potentially stale reads.

Consistency Models

Strong Consistency

Every read returns the most recent write, as if the system were a single machine.

| Aspect | Description |
| --- | --- |
| Implementation | A single leader handles all writes and replicates synchronously before confirming |
| Trade-off | Slower writes, reduced availability during failures |

Eventual Consistency

If writes stop, all replicas eventually converge to the same state. Immediately after a write, different replicas may return different values.

| Aspect | Description |
| --- | --- |
| Implementation | Asynchronous replication; writes confirm before replicas synchronize |
| Trade-off | Stale reads are possible |

Read-Your-Writes

A client always sees its own writes, though it may not see other clients' recent writes.

| Aspect | Description |
| --- | --- |
| Implementation | Route reads to the server that handled the client's write, or track versions (sketched below) |
| Trade-off | Additional routing or tracking overhead |
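
A minimal sketch of the version-tracking approach, assuming a single leader plus asynchronous replicas; the `ReplicaStore` and `Client` classes and their fields are hypothetical names used only for illustration:

```python
class ReplicaStore:
    """A toy replica: a key-value map plus a monotonically increasing version."""
    def __init__(self):
        self.data = {}
        self.version = 0

    def apply_write(self, key, value, version):
        self.data[key] = value
        self.version = max(self.version, version)

class Client:
    """Tracks the highest version this client has written (its read-your-writes token)."""
    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = replicas
        self.last_written_version = 0

    def write(self, key, value):
        self.leader.version += 1
        self.leader.apply_write(key, value, self.leader.version)
        self.last_written_version = self.leader.version

    def read(self, key):
        # Prefer any replica that has caught up to this client's last write;
        # otherwise fall back to the leader, which is always up to date.
        for replica in self.replicas:
            if replica.version >= self.last_written_version:
                return replica.data.get(key)
        return self.leader.data.get(key)

leader = ReplicaStore()
stale_replica = ReplicaStore()                  # has not received the write yet
client = Client(leader, replicas=[stale_replica])
client.write("profile", "updated")
print(client.read("profile"))                   # "updated": the stale replica is skipped
```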

Causal Consistency

If event A caused event B, all replicas see A before B. Causally unrelated events can appear in any order.

| Aspect | Description |
| --- | --- |
| Implementation | Vector clocks to track causal dependencies |
| Trade-off | Complexity of tracking causality |

Consensus

Consensus algorithms enable multiple machines to agree on a value despite message loss and machine failures.

Raft Algorithm

Raft is a consensus algorithm designed for understandability.

| Phase | Description |
| --- | --- |
| Leader Election | Nodes vote to elect one leader (majority rule sketched below) |
| Log Replication | Leader accepts writes and replicates them to followers |
| Safety | At most one leader can be elected per term |
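
A minimal sketch of the majority-vote rule behind leader election, assuming each node grants at most one vote per term. Real Raft also uses randomized election timeouts and compares log freshness before granting a vote; the `request_votes` function and the cluster representation below are illustrative, not Raft's actual RPC interface:

```python
def request_votes(candidate_id, term, cluster):
    """Count votes for a candidate; `cluster` maps node id -> that node's voting state."""
    votes = 1  # the candidate votes for itself
    for node_id, state in cluster.items():
        if node_id == candidate_id:
            continue
        # Grant the vote only if the node has not already voted in this term
        # (real Raft also requires the candidate's log to be at least as up to date).
        if state["term"] < term or (state["term"] == term and state["voted_for"] is None):
            state["term"] = term
            state["voted_for"] = candidate_id
            votes += 1
    majority = len(cluster) // 2 + 1
    return votes >= majority  # True means the candidate becomes leader for this term

# Example: a five-node cluster where no one has voted yet in term 2.
cluster = {i: {"term": 1, "voted_for": None} for i in range(5)}
print(request_votes(candidate_id=0, term=2, cluster=cluster))  # True
```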

Paxos

Paxos is an older consensus algorithm with the same goals as Raft but greater complexity.

Consensus Applications

| Application | Description |
| --- | --- |
| Leader election | Selecting which node coordinates operations |
| Distributed locks | Provided by coordination services such as ZooKeeper and etcd |
| Configuration management | Agreement on the current system configuration |
| Replicated databases | Determining the order of writes |

Distributed Transactions

Distributed transactions coordinate updates across multiple services atomically.

Two-Phase Commit (2PC)

| Phase | Description |
| --- | --- |
| Prepare | Coordinator asks all participants if they are ready to commit |
| Commit | If all participants agree, the coordinator instructs them to commit; otherwise, abort |

Limitation: If the coordinator fails after the prepare phase, participants are blocked waiting for a decision.
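
A minimal sketch of the coordinator's side of 2PC, assuming each participant exposes hypothetical `prepare()`, `commit()`, and `abort()` operations:

```python
def two_phase_commit(coordinator_log, participants):
    """Minimal 2PC coordinator over a hypothetical participant interface."""
    # Phase 1: prepare. Every participant must vote yes before anything commits.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(False)        # an unreachable participant counts as a no vote

    # Phase 2: commit only if every vote was yes; otherwise abort everywhere.
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)   # the decision must be durable before it is sent out
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```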

Saga Pattern

Sagas break a transaction into a sequence of local transactions. Each step has a compensating action that undoes it if a later step fails.

| Style | Description |
| --- | --- |
| Choreography | Services coordinate via events (decentralized) |
| Orchestration | A central service directs the transaction flow |

Sagas provide better availability than 2PC but have more complex failure handling.
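
A minimal sketch of an orchestration-style saga, where each step is paired with a compensating action; the order-processing steps are hypothetical:

```python
def run_saga(steps):
    """Run a list of (action, compensation) pairs in order.
    If a step fails, run the compensations for the completed steps in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Roll back by compensating everything that already succeeded.
            for undo in reversed(completed):
                undo()
            return "aborted"
    return "committed"

# Hypothetical order flow: reserve inventory, charge payment, schedule shipping.
steps = [
    (lambda: print("reserve inventory"), lambda: print("release inventory")),
    (lambda: print("charge payment"),    lambda: print("refund payment")),
    (lambda: print("schedule shipping"), lambda: print("cancel shipping")),
]
print(run_saga(steps))
```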

Time and Ordering

Logical Clocks

Logical clocks order events without relying on physical time.

Lamport Timestamps: Each node keeps a counter that it increments on every local event. When sending a message, the node attaches its current timestamp; when receiving, it sets its counter to max(local, received) + 1.
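
A minimal sketch of a Lamport clock; the class and method names are illustrative, not a standard API:

```python
class LamportClock:
    """Minimal Lamport clock: a counter per node, merged on message receipt."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # The message carries the sender's timestamp after incrementing.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # Merge: jump past both the local clock and the incoming timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()      # a.time == 1
b.receive(t)      # b.time == 2, so the receive is ordered after the send
```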

Vector Clocks: Each node tracks its own counter plus its knowledge of other nodes' counters. This enables detecting concurrent events (neither caused the other).
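
A minimal sketch of vector clocks as plain dictionaries, including the comparison that detects concurrent events; the function names are illustrative:

```python
def vc_increment(clock, node):
    """Advance a node's own entry in its vector clock."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(local, received, node):
    """On receive: take the element-wise max, then advance the local entry."""
    merged = {n: max(local.get(n, 0), received.get(n, 0)) for n in set(local) | set(received)}
    return vc_increment(merged, node)

def vc_compare(a, b):
    """Return 'before', 'after', or 'concurrent' for two vector clocks."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and not b_le_a:
        return "before"
    if b_le_a and not a_le_b:
        return "after"
    return "concurrent"   # neither happened before the other (or the clocks are equal)

# Two writes on different nodes with no communication are concurrent.
x = vc_increment({}, "node_a")     # {'node_a': 1}
y = vc_increment({}, "node_b")     # {'node_b': 1}
print(vc_compare(x, y))            # concurrent

# When node_b receives x with a message, it merges and advances its own entry.
z = vc_merge(y, x, "node_b")       # {'node_a': 1, 'node_b': 2}
print(vc_compare(x, z))            # before
```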

Physical Clocks

Physical clocks are necessary for some applications (audit logs, time-based expiration).

| Challenge | Description |
| --- | --- |
| Clock drift | Server clocks diverge from true time |
| NTP limitations | NTP provides approximate, not perfect, synchronization |
| Leap seconds | Periodic adjustments cause edge cases |

TrueTime (Google Spanner): GPS receivers and atomic clocks in each datacenter expose time as a bounded interval: the true time is guaranteed to lie between an earliest and a latest value.
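
A minimal sketch of the bounded-uncertainty idea and the resulting commit-wait rule, assuming a fixed illustrative uncertainty; this paraphrases the concept rather than Spanner's actual TrueTime API:

```python
import time

def now_interval(uncertainty_s=0.007):
    """Return (earliest, latest): the true time lies somewhere in this interval.
    The 7 ms bound is illustrative, not a figure guaranteed by any real system."""
    t = time.time()
    return (t - uncertainty_s, t + uncertainty_s)

def commit_wait(commit_timestamp):
    """Commit-wait idea: do not report a commit until the commit timestamp is
    definitely in the past everywhere, i.e. until earliest > commit_timestamp."""
    while now_interval()[0] <= commit_timestamp:
        time.sleep(0.001)

_, commit_ts = now_interval()   # take the interval's upper bound as the commit timestamp
commit_wait(commit_ts)          # after this returns, no clock can still read an earlier time
```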

Distributed Data Patterns

Replication

| Pattern | Description | Trade-off |
| --- | --- | --- |
| Single Leader | One server handles writes; replicas handle reads | Simple, but the leader is a write bottleneck |
| Multi-Leader | Multiple servers accept writes | Requires conflict resolution |
| Leaderless | Any node handles any request; uses quorums | W + R > N ensures quorum overlap (see sketch below) |
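
A minimal sketch of the quorum overlap rule and a quorum read that keeps the highest-versioned response; the replica representation is hypothetical:

```python
def quorums_overlap(n, w, r):
    """W + R > N guarantees every read quorum intersects every write quorum."""
    return w + r > n

def quorum_read(replicas, r):
    """Read from r replicas and return the value with the highest version.
    Each replica response is a (version, value) pair; newer versions win."""
    responses = replicas[:r]          # in practice: the first r replicas to answer
    return max(responses, key=lambda pair: pair[0])[1]

print(quorums_overlap(n=3, w=2, r=2))                          # True: 2 + 2 > 3
print(quorum_read([(7, "new"), (6, "old"), (7, "new")], r=2))  # "new"
```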

Partitioning (Sharding)

| Strategy | Description | Trade-off |
| --- | --- | --- |
| Range partitioning | Partition by value ranges | Hot spots if the key distribution is uneven |
| Hash partitioning | Hash the key, mod by partition count | Even distribution, but no efficient range queries |
| Consistent hashing | Keys and nodes placed on a ring (see sketch below) | Adding or removing a node affects only its neighbors |
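
A minimal sketch of a consistent hash ring without virtual nodes (real implementations usually add them to smooth the key distribution); the class name is illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring: nodes and keys share one hash space."""
    def __init__(self, nodes=()):
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self.ring, (self._hash(node), node))

    def node_for(self, key):
        # Walk clockwise: the first node at or after the key's position owns it.
        h = self._hash(key)
        idx = bisect.bisect_left(self.ring, (h, ""))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
# Adding a node only moves the keys that fall between it and its predecessor.
ring.add_node("node-d")
```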

Design Considerations

| Topic | Considerations |
| --- | --- |
| Trade-offs | Consistency vs availability vs latency; articulate which to prioritize and why |
| Use case matching | Strong consistency for financial systems; eventual consistency acceptable for social feeds |
| CAP in practice | During a network partition, does the system reject requests or return potentially stale data? |
| Failure modes | What happens when specific components fail? When the network partitions? |
| Operational complexity | Distributed systems are operationally complex; acknowledge this in designs |