Storage Engines

Storage engines determine how databases write data to disk and retrieve it. Different engines make different trade-offs between read performance, write throughput, and space efficiency.

Primary Storage Engine Types

Most databases use one of two approaches: B-trees or LSM-trees.

B-Trees

B-trees have been the standard since 1970 and power most relational databases. Data is organized in a tree of fixed-size pages (typically 4KB).

Loading diagram...

Write operation:

Find the correct leaf page
Update it in place
If the page is full, split it

Read operation:

Start at root
Follow pointers down the tree
Completes in O(log n) page reads

Characteristics:

Predictable read performance
Suitable for read-heavy workloads
50+ years of production use

Limitation: Every write touches the disk at a random location. If a record is updated 10 times, the entire page is rewritten 10 times.

Used by: PostgreSQL, MySQL/InnoDB, SQL Server, Oracle

LSM-Trees

LSM-trees (Log-Structured Merge-trees) batch writes in memory, then flush sorted runs to disk.

Write -> Memtable (RAM) -> SSTable (Disk) -> Compacted SSTables
             |
          WAL (for durability)

Write operation:

Append to write-ahead log (durability)
Insert into in-memory buffer (memtable)
When buffer fills, flush to disk as sorted file (SSTable)
Background process merges SSTables

Read operation:

Check memtable first
Check SSTables from newest to oldest
Use bloom filters to skip files that do not contain the key

Characteristics:

Writes are sequential (faster)
Suitable for write-heavy workloads
Better compression (sorted data compresses well)

Limitation: Reads may check multiple files. Compaction uses disk bandwidth.

Used by: Cassandra, RocksDB, LevelDB, HBase

Selection Criteria

Workload	Recommended	Rationale
Read-heavy OLTP	B-tree	Consistent read latency
Write-heavy logging	LSM-tree	Sequential writes
Mixed workload	Depends	Test both
Analytics	Columnar	Different access pattern

General guidance:

Traditional web applications with more reads than writes: B-tree (PostgreSQL, MySQL)
Event ingestion, logs, time-series data: LSM-tree (Cassandra, RocksDB)

Indexes

Indexes are separate data structures that enable row retrieval without full table scans.

Primary index: Determines how the actual data is stored. In a clustered index, rows are physically sorted by the primary key.

Secondary index: A separate structure pointing to the primary data. Multiple secondary indexes are possible, but each one increases write overhead.

Covering index: Includes extra columns to answer queries from the index alone, without accessing the main table.

Trade-off: Every index speeds up reads but slows down writes.

Column-Oriented Storage

For analytics, row-based storage (B-tree or LSM) is suboptimal.

Row-oriented storage:

Row 1: [user_id, name, email, created_at, last_login, ...]
Row 2: [user_id, name, email, created_at, last_login, ...]

Column-oriented storage:

user_ids:    [1, 2, 3, 4, 5, ...]
names:       [alice, bob, carol, ...]
emails:      [a@x.com, b@x.com, ...]

For a query like SELECT AVG(age) FROM users, a row store reads every field for every user. A column store reads only the age column. For tables with 100 columns and billions of rows, this difference is significant.

Used by: Redshift, BigQuery, Snowflake, ClickHouse

In-Memory Databases

Keeping data in RAM provides significant speed improvements (RAM is approximately 100,000x faster than SSD for random access). In-memory databases still write to disk for durability.

Redis: Periodic snapshots + append-only file (AOF)
VoltDB: Replication + snapshots
Memcached: No durability (pure cache)

Limitation: Storage capacity is limited to available memory.

Summary

Engine Type	Read Performance	Write Performance	Use Case
B-trees	Consistent	Random writes	Traditional OLTP
LSM-trees	Variable	Sequential writes	Write-heavy workloads
Column stores	Analytics-optimized	Not for transactions	Data warehousing
In-memory	Fastest	Fastest	Cache, real-time

Primary Storage Engine Types​

B-Trees​

LSM-Trees​

Selection Criteria​

Indexes​

Column-Oriented Storage​

In-Memory Databases​

Summary​

Table of Contents