Skip to main content

Databases

Database selection depends on data structure, access patterns, consistency requirements, and scale. This section covers SQL and NoSQL databases, indexing, scaling strategies, and the CAP theorem.

SQL vs NoSQL

SQL (Relational) Databases

Examples: PostgreSQL, MySQL, SQLite

SQL databases store data in structured tables with defined schemas and support ACID transactions.

FeatureDescription
Schema enforcementData consistency through defined schemas
ACID transactionsAtomicity, Consistency, Isolation, Durability
JOINsQuery data across related tables
Standard query languageSQL is widely understood

Appropriate use cases:

  • Data with relationships between entities (users, orders, items)
  • Transactions requiring ACID guarantees (financial systems)
  • Ad-hoc query requirements (analytics, debugging)
  • General-purpose applications where access patterns are not yet defined

NoSQL Databases

NoSQL encompasses multiple database types, each suited for different use cases.

TypeExamplesUse Case
DocumentMongoDB, CouchDBNested/hierarchical data, flexible schemas
Key-ValueRedis, DynamoDBFast lookups by key
Wide-ColumnCassandra, HBaseTime-series data, high write throughput
GraphNeo4j, NeptuneRelationship-centric data (social graphs, recommendations)

Appropriate use cases:

  • Horizontal write scaling requirements
  • Data that does not fit tabular structure
  • Known, simple access patterns
  • Applications that do not require JOINs or complex queries

Database Indexing

Indexes improve query performance by avoiding full table scans. Without indexes, the database must scan every row to find matching records. Indexes add write overhead because each write operation must update the index.

B-Tree Indexes

B-tree indexes are the default index type. They support equality comparisons, range queries, and sorting operations.

Hash Indexes

Hash indexes provide O(1) lookup performance for exact match queries (WHERE id = 123). They do not support range queries or sorting.

Composite Indexes

Composite indexes span multiple columns. Column order determines which queries can use the index.

Example: An index on (user_id, status) supports queries filtering on both columns or just user_id, but not queries filtering only on status.

QueryIndex Used
WHERE user_id = 1 AND status = 'pending'Yes
WHERE user_id = 1Yes
WHERE status = 'pending'No

The index is ordered by the first column, then the second. Queries filtering only on the second column cannot use the index efficiently.

Database Scaling

Vertical Scaling

Vertical scaling increases capacity by upgrading to more powerful hardware (more CPU, RAM, faster storage).

AspectDescription
AdvantagesSimple implementation, no code changes required
DisadvantagesHardware limits, single point of failure, cost increases non-linearly

Read Replicas

Read replicas distribute read traffic across multiple database copies while the primary handles writes.

Loading diagram...

Replication lag consideration: Replicas may be slightly behind the primary. A user who writes data and immediately reads may not see their own write if the read hits a replica that has not yet received the update.

Sharding (Horizontal Partitioning)

Sharding distributes data across multiple databases based on a shard key.

StrategyDescriptionAdvantagesDisadvantages
Range-basedPartition by value rangesSimple to understandHot spots if data is not evenly distributed
Hash-basedHash the key, mod by shard countEven distributionCannot perform range queries across shards
Directory-basedLookup table maps keys to shardsMaximum flexibilityAdditional lookup required for each request

Sharding considerations:

  • JOINs across shards are not possible or are expensive
  • Rebalancing shards requires data migration
  • Foreign key constraints cannot span shards
  • Application complexity increases significantly

Recommendation: Delay sharding until vertical scaling and read replicas are insufficient. Sharding adds complexity that is difficult to remove.

CAP Theorem

The CAP theorem states that a distributed database can provide at most two of three guarantees:

PropertyDescription
ConsistencyEvery read returns the most recent write
AvailabilityEvery request receives a response
Partition ToleranceThe system continues operating during network partitions

Network partitions are unavoidable in distributed systems. During a partition, the system must choose between consistency (reject requests) or availability (serve potentially stale data).

DatabaseCAP Classification
PostgreSQLCP
MongoDBCP (default)
CassandraAP
DynamoDBConfigurable

Design Guidelines

GuidelineDescription
Define access patternsUnderstand query requirements before selecting a database
Start with PostgreSQLRelational databases handle more use cases than commonly assumed
Index strategicallyMissing indexes degrade read performance; excessive indexes degrade write performance
Plan for growthDesign to support future scaling without requiring immediate implementation
Match consistency to requirementsFinancial systems require strong consistency; social feeds can tolerate eventual consistency