Skip to main content

Database Scaling

Database scaling addresses the bottleneck that occurs when a single database cannot handle application load.

Scaling Strategies Overview

Loading diagram...
StrategyScaling MechanismAdvantagesDisadvantages
VerticalLarger hardwareSimple, no code changesHardware limits, expensive
ReplicationIncreased read capacityRead scalability, high availabilityWrite bottleneck remains
ShardingData distributionNear-linear scalingComplex queries, joins
FederationFunctional splitIndependent scalingCross-function queries difficult

Replication

Master-Slave (Primary-Replica)

One master handles writes; replicas handle reads.

Loading diagram...

Operation:

  1. All writes go to master
  2. Master logs changes to binary log
  3. Replicas pull and apply changes
  4. Reads distributed across replicas

Advantages:

  • Read scalability (add more replicas)
  • High availability (promote replica if master fails)
  • Backups without impacting master

Disadvantages:

  • Write bottleneck (single master)
  • Replication lag (eventual consistency)
  • Failover complexity

Master-Master (Multi-Primary)

Multiple nodes accept writes.

Loading diagram...

Advantages:

  • Write scalability
  • Geographic distribution
  • No single point of failure

Disadvantages:

  • Conflict resolution required
  • Increased latency for consistency
  • Complex management

Replication Modes

ModeConsistencyLatencyData Safety
SynchronousStrongHigherNo data loss
AsynchronousEventualLowerPossible data loss
Semi-synchronousStrong for 1 replicaMediumBalanced

Sharding (Horizontal Partitioning)

Sharding splits data across multiple databases based on a shard key.

Loading diagram...

Sharding Strategies

Range-Based Sharding

Loading diagram...

Advantages: Efficient range queries, simple implementation Disadvantages: Hot spots with skewed data, uneven distribution

Hash-Based Sharding

Loading diagram...

Advantages: Even distribution, no hot spots Disadvantages: Range queries span all shards, resharding is expensive

Directory-Based Sharding

Loading diagram...

Advantages: Flexible mapping, easy resharding Disadvantages: Lookup service is single point of failure, extra latency

Sharding Comparison

StrategyDistributionRange QueriesReshardingComplexity
RangeCan be unevenEfficientMediumLow
HashEvenExpensiveHardMedium
DirectoryFlexibleVia lookupEasyHigh

Shard Key Selection

Good Shard Key PropertiesBad Shard Key Properties
High cardinality (many values)Low cardinality (few values)
Even distributionSkewed distribution
Frequently used in queriesRarely queried
ImmutableChanges frequently

Examples:

  • Suitable: user_id, order_id, tenant_id
  • Unsuitable: country (skewed), created_date (hot spot on current date)

Sharding Challenges

Cross-Shard Queries

Loading diagram...

Problem: Query must hit all shards and merge results Solution: Denormalize data to keep related data on same shard

Cross-Shard Joins

Loading diagram...

Solutions:

  • Denormalize (store user info in orders)
  • Application-level joins
  • Use same shard key for related data

Resharding

ApproachDowntimeComplexity
Stop writes, migrateYesLow
Double-write during migrationNoHigh
Consistent hashingMinimalMedium

Federation (Functional Partitioning)

Federation splits databases by function or domain.

Loading diagram...

Advantages:

  • Independent scaling per function
  • Smaller databases, faster queries
  • Team autonomy

Disadvantages:

  • Cross-function joins impossible
  • More databases to manage
  • Application complexity

Denormalization

Denormalization trades storage for read performance by duplicating data.

Normal vs Denormalized

Loading diagram...
AspectNormalizedDenormalized
StorageMinimalDuplicated
Write speedFasterSlower (update copies)
Read speedSlower (joins)Faster (single query)
ConsistencyEasyMust sync duplicates

Appropriate use cases:

  • Read-heavy workloads (10:1+ read:write ratio)
  • Expensive joins required frequently
  • Data changes infrequently

SQL vs NoSQL

AspectSQL (Relational)NoSQL
SchemaFixed, structuredFlexible, dynamic
ScalingVertical (primarily)Horizontal (designed for it)
ConsistencyACIDBASE (typically)
JoinsNative supportApplication-level
Use casesComplex queries, transactionsSimple queries, high scale

NoSQL Types

TypeData ModelExamplesUse Cases
Key-ValueSimple key to valueRedis, DynamoDBCaching, sessions
DocumentJSON documentsMongoDB, CouchbaseContent, catalogs
Wide ColumnColumn familiesCassandra, HBaseTime series, IoT
GraphNodes and edgesNeo4j, NeptuneSocial networks, recommendations

Scaling Decision Tree

Loading diagram...

Production Examples

CompanyStrategyScale
FacebookMySQL + sharding by user_idBillions of users
InstagramPostgreSQL + sharding2B+ monthly users
UberMySQL to Schemaless (custom)Millions of trips/day
NetflixCassandra (NoSQL)200M+ subscribers
PinterestMySQL sharding + HBaseBillions of pins

Performance Reference

ConfigurationApproximate Throughput
Single MySQL~10K queries/second
With replicasMultiply read capacity
ShardedNear-linear scaling