Skip to main content

Storage Engines

Storage engines determine how databases write data to disk and retrieve it. Different engines make different trade-offs between read performance, write throughput, and space efficiency.

Primary Storage Engine Types

Most databases use one of two approaches: B-trees or LSM-trees.

B-Trees

B-trees have been the standard since 1970 and power most relational databases. Data is organized in a tree of fixed-size pages (typically 4KB).

Loading diagram...

Write operation:

  1. Find the correct leaf page
  2. Update it in place
  3. If the page is full, split it

Read operation:

  1. Start at root
  2. Follow pointers down the tree
  3. Completes in O(log n) page reads

Characteristics:

  • Predictable read performance
  • Suitable for read-heavy workloads
  • 50+ years of production use

Limitation: Every write touches the disk at a random location. If a record is updated 10 times, the entire page is rewritten 10 times.

Used by: PostgreSQL, MySQL/InnoDB, SQL Server, Oracle

LSM-Trees

LSM-trees (Log-Structured Merge-trees) batch writes in memory, then flush sorted runs to disk.

Write -> Memtable (RAM) -> SSTable (Disk) -> Compacted SSTables
|
WAL (for durability)

Write operation:

  1. Append to write-ahead log (durability)
  2. Insert into in-memory buffer (memtable)
  3. When buffer fills, flush to disk as sorted file (SSTable)
  4. Background process merges SSTables

Read operation:

  1. Check memtable first
  2. Check SSTables from newest to oldest
  3. Use bloom filters to skip files that do not contain the key

Characteristics:

  • Writes are sequential (faster)
  • Suitable for write-heavy workloads
  • Better compression (sorted data compresses well)

Limitation: Reads may check multiple files. Compaction uses disk bandwidth.

Used by: Cassandra, RocksDB, LevelDB, HBase

Selection Criteria

WorkloadRecommendedRationale
Read-heavy OLTPB-treeConsistent read latency
Write-heavy loggingLSM-treeSequential writes
Mixed workloadDependsTest both
AnalyticsColumnarDifferent access pattern

General guidance:

  • Traditional web applications with more reads than writes: B-tree (PostgreSQL, MySQL)
  • Event ingestion, logs, time-series data: LSM-tree (Cassandra, RocksDB)

Indexes

Indexes are separate data structures that enable row retrieval without full table scans.

Primary index: Determines how the actual data is stored. In a clustered index, rows are physically sorted by the primary key.

Secondary index: A separate structure pointing to the primary data. Multiple secondary indexes are possible, but each one increases write overhead.

Covering index: Includes extra columns to answer queries from the index alone, without accessing the main table.

Trade-off: Every index speeds up reads but slows down writes.

Column-Oriented Storage

For analytics, row-based storage (B-tree or LSM) is suboptimal.

Row-oriented storage:

Row 1: [user_id, name, email, created_at, last_login, ...]
Row 2: [user_id, name, email, created_at, last_login, ...]

Column-oriented storage:

user_ids:    [1, 2, 3, 4, 5, ...]
names: [alice, bob, carol, ...]
emails: [a@x.com, b@x.com, ...]

For a query like SELECT AVG(age) FROM users, a row store reads every field for every user. A column store reads only the age column. For tables with 100 columns and billions of rows, this difference is significant.

Used by: Redshift, BigQuery, Snowflake, ClickHouse

In-Memory Databases

Keeping data in RAM provides significant speed improvements (RAM is approximately 100,000x faster than SSD for random access). In-memory databases still write to disk for durability.

  • Redis: Periodic snapshots + append-only file (AOF)
  • VoltDB: Replication + snapshots
  • Memcached: No durability (pure cache)

Limitation: Storage capacity is limited to available memory.

Summary

Engine TypeRead PerformanceWrite PerformanceUse Case
B-treesConsistentRandom writesTraditional OLTP
LSM-treesVariableSequential writesWrite-heavy workloads
Column storesAnalytics-optimizedNot for transactionsData warehousing
In-memoryFastestFastestCache, real-time