Databases

Database selection depends on data structure, access patterns, consistency requirements, and scale. This section covers SQL and NoSQL databases, indexing, scaling strategies, and the CAP theorem.

SQL vs NoSQL

SQL (Relational) Databases

Examples: PostgreSQL, MySQL, SQLite

SQL databases store data in structured tables with defined schemas and support ACID transactions.

Feature	Description
Schema enforcement	Data consistency through defined schemas
ACID transactions	Atomicity, Consistency, Isolation, Durability
JOINs	Query data across related tables
Standard query language	SQL is widely understood

Appropriate use cases:

Data with relationships between entities (users, orders, items)
Transactions requiring ACID guarantees (financial systems)
Ad-hoc query requirements (analytics, debugging)
General-purpose applications where access patterns are not yet defined

NoSQL Databases

NoSQL encompasses multiple database types, each suited for different use cases.

Type	Examples	Use Case
Document	MongoDB, CouchDB	Nested/hierarchical data, flexible schemas
Key-Value	Redis, DynamoDB	Fast lookups by key
Wide-Column	Cassandra, HBase	Time-series data, high write throughput
Graph	Neo4j, Neptune	Relationship-centric data (social graphs, recommendations)

Appropriate use cases:

Horizontal write scaling requirements
Data that does not fit tabular structure
Known, simple access patterns
Applications that do not require JOINs or complex queries

Database Indexing

Indexes improve query performance by avoiding full table scans. Without indexes, the database must scan every row to find matching records. Indexes add write overhead because each write operation must update the index.

B-Tree Indexes

B-tree indexes are the default index type. They support equality comparisons, range queries, and sorting operations.

Hash Indexes

Hash indexes provide O(1) lookup performance for exact match queries (WHERE id = 123). They do not support range queries or sorting.

Composite Indexes

Composite indexes span multiple columns. Column order determines which queries can use the index.

Example: An index on (user_id, status) supports queries filtering on both columns or just user_id, but not queries filtering only on status.

Query	Index Used
`WHERE user_id = 1 AND status = 'pending'`	Yes
`WHERE user_id = 1`	Yes
`WHERE status = 'pending'`	No

The index is ordered by the first column, then the second. Queries filtering only on the second column cannot use the index efficiently.

Database Scaling

Vertical Scaling

Vertical scaling increases capacity by upgrading to more powerful hardware (more CPU, RAM, faster storage).

Aspect	Description
Advantages	Simple implementation, no code changes required
Disadvantages	Hardware limits, single point of failure, cost increases non-linearly

Read Replicas

Read replicas distribute read traffic across multiple database copies while the primary handles writes.

Loading diagram...

Replication lag consideration: Replicas may be slightly behind the primary. A user who writes data and immediately reads may not see their own write if the read hits a replica that has not yet received the update.

Sharding (Horizontal Partitioning)

Sharding distributes data across multiple databases based on a shard key.

Strategy	Description	Advantages	Disadvantages
Range-based	Partition by value ranges	Simple to understand	Hot spots if data is not evenly distributed
Hash-based	Hash the key, mod by shard count	Even distribution	Cannot perform range queries across shards
Directory-based	Lookup table maps keys to shards	Maximum flexibility	Additional lookup required for each request

Sharding considerations:

JOINs across shards are not possible or are expensive
Rebalancing shards requires data migration
Foreign key constraints cannot span shards
Application complexity increases significantly

Recommendation: Delay sharding until vertical scaling and read replicas are insufficient. Sharding adds complexity that is difficult to remove.

CAP Theorem

The CAP theorem states that a distributed database can provide at most two of three guarantees:

Property	Description
Consistency	Every read returns the most recent write
Availability	Every request receives a response
Partition Tolerance	The system continues operating during network partitions

Network partitions are unavoidable in distributed systems. During a partition, the system must choose between consistency (reject requests) or availability (serve potentially stale data).

Database	CAP Classification
PostgreSQL	CP
MongoDB	CP (default)
Cassandra	AP
DynamoDB	Configurable

Design Guidelines

Guideline	Description
Define access patterns	Understand query requirements before selecting a database
Start with PostgreSQL	Relational databases handle more use cases than commonly assumed
Index strategically	Missing indexes degrade read performance; excessive indexes degrade write performance
Plan for growth	Design to support future scaling without requiring immediate implementation
Match consistency to requirements	Financial systems require strong consistency; social feeds can tolerate eventual consistency

SQL vs NoSQL​

SQL (Relational) Databases​

NoSQL Databases​

Database Indexing​

B-Tree Indexes​

Hash Indexes​

Composite Indexes​

Database Scaling​

Vertical Scaling​

Read Replicas​

Sharding (Horizontal Partitioning)​

CAP Theorem​

Design Guidelines​

Table of Contents