System Design Concept Questions

Theory and concept questions for backend system design interviews covering distributed systems fundamentals.

Scalability

Q1: Vertical vs horizontal scaling

Vertical (Scale Up): Larger machine (more CPU, RAM, storage)

  • Simpler, no code changes
  • Hardware limits
  • Single point of failure

Horizontal (Scale Out): More machines

  • Theoretically unlimited scale
  • Requires distributed architecture
  • More complex (load balancing, data partitioning)

Most systems use both: scale up until cost-prohibitive, then scale out.

Q2: Single point of failure (SPOF) elimination

SPOF: Component whose failure causes entire system failure.

Elimination strategies:

  • Redundancy: Multiple instances of critical components
  • Load balancers: Distribute traffic, provide failover
  • Database replication: Primary-replica setup
  • Multi-AZ/region: Geographic redundancy
  • Graceful degradation: System continues with reduced functionality

Q3: Read replicas

Read replicas: Copies of primary database that handle read queries.

Benefits:

  • Offload reads: Primary handles writes only
  • Reduce latency: Place replicas closer to users
  • Improve availability: Replicas can be promoted if primary fails

Considerations:

  • Replication lag: Reads may return stale data
  • Write bottleneck remains on primary
  • Replica failover handling required

Use case: Read-heavy workloads, tolerance for slight staleness.

CAP Theorem and Consistency

Q4: CAP theorem

In a distributed system, at most two of the following three properties can be guaranteed simultaneously:

  • Consistency: All nodes see same data at same time
  • Availability: Every request gets a response (not error)
  • Partition Tolerance: System works despite network failures

Network partitions are unavoidable in practice, so the real choice is between:

  • CP: Consistent but may be unavailable during partition (bank transactions)
  • AP: Available but may return stale data (social media feed)

Q5: Strong vs eventual consistency

Strong consistency: After write completes, all reads return the new value

  • Easier to reason about
  • Higher latency, lower availability
  • Use for: Financial transactions, inventory

Eventual consistency: Reads may return stale data, but will converge

  • Lower latency, higher availability
  • More complex client handling
  • Use for: Social media, analytics, caching

Many systems offer tunable consistency (e.g., Cassandra's consistency levels).

Q6: PACELC theorem

Extension of CAP addressing normal operation:

If Partition:

  • Choose Availability or Consistency (same as CAP)

Else (no partition):

  • Choose Latency or Consistency

Examples:

  • DynamoDB: PA/EL (available during partition, low latency otherwise)
  • Traditional RDBMS: PC/EC (consistent always, higher latency)
  • Cassandra: Tunable (configurable per query)

Database Design

Q7: SQL vs NoSQL selection

| Factor | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, structured | Flexible, schema-less |
| Relationships | Complex joins | Denormalized, embedded |
| Transactions | ACID guaranteed | Usually eventual consistency |
| Scaling | Vertical primarily | Horizontal by design |
| Query flexibility | Ad-hoc queries | Limited query patterns |

Choose SQL: Complex queries, transactions, data integrity critical.
Choose NoSQL: High scale, flexible schema, specific access patterns.

Q8: Database sharding strategies

1. Range-based sharding:

  • Partition by value ranges (users A-M, N-Z)
  • Advantage: Range queries efficient
  • Disadvantage: Hotspots if data not uniform

2. Hash-based sharding:

  • Hash(key) mod N determines shard
  • Advantage: Even distribution
  • Disadvantage: Range queries across all shards

3. Directory-based sharding:

  • Lookup service maps keys to shards
  • Advantage: Flexible, can rebalance
  • Disadvantage: Lookup service is SPOF, bottleneck

4. Geographic sharding:

  • Data stored by region
  • Advantage: Low latency, data locality compliance
  • Disadvantage: Cross-region queries complex
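The hash-based strategy above can be sketched in a few lines (the key format and shard count here are arbitrary; MD5 is just one stable hash choice):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-based sharding: a stable hash of the key picks the shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always maps to the same shard, and keys spread evenly.
shard = shard_for("user:alice", 4)
```

Note the `mod N` weakness mentioned under consistent hashing (Q18): changing `num_shards` remaps almost every key, which is why rings with virtual nodes are preferred for elastic clusters.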

Q9: Denormalization

Denormalization: Adding redundancy to improve read performance.

Examples:

  • Storing count in parent table instead of counting children
  • Duplicating user name in posts table to avoid join
  • Precomputing aggregations

Appropriate when:

  • Read-heavy workloads
  • Joins are too expensive
  • Slight inconsistency acceptable

Trade-offs:

  • Writes become more complex (update multiple places)
  • Data inconsistency risk
  • More storage

Q10: Database indexes

Index: Data structure that speeds up queries at cost of write performance.

B-tree index (default):

  • Suitable for: Range queries, equality, sorting
  • Columns in WHERE, JOIN, ORDER BY

Hash index:

  • Suitable for: Exact equality only
  • O(1) lookup

Composite index:

  • Multiple columns, leftmost prefix rule
  • (a, b, c) works for queries on (a), (a, b), (a, b, c), not (b) or (c)

Trade-offs:

  • Slower writes (must update index)
  • Storage overhead
  • Too many indexes hurt write performance

Caching

Q11: Cache invalidation strategies

1. TTL (Time-to-Live):

  • Cache expires after fixed time
  • Simple, works for eventually consistent data
  • May serve stale data until expiry

2. Write-through:

  • Write to cache and DB simultaneously
  • Cache always fresh
  • Higher write latency

3. Write-behind (write-back):

  • Write to cache, async write to DB
  • Low write latency
  • Risk of data loss

4. Cache-aside:

  • Application manages cache (check cache -> miss -> read DB -> update cache)
  • Most flexible
  • Application complexity
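The cache-aside flow (check cache, miss, read DB, update cache) can be sketched as follows; `db_read` is a hypothetical stand-in for a real query, and the in-memory dict stands in for Redis or Memcached:

```python
import time

cache = {}   # key -> (value, expires_at); stand-in for Redis/Memcached
TTL = 60.0   # seconds

def db_read(key):
    # Hypothetical stand-in for a real database query.
    return f"row-for-{key}"

def get(key):
    """Cache-aside: check the cache; on a miss, read the DB and populate."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = db_read(key)                     # cache miss -> read from DB
    cache[key] = (value, time.time() + TTL)  # populate with a TTL
    return value
```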

Q12: Cache stampede prevention

Cache stampede: Many requests hit the database simultaneously when cache expires.

Prevention strategies:

  • Locking: Only one request fetches from DB, others wait
  • Probabilistic early expiration: Randomly refresh before TTL
  • Background refresh: Async job refreshes cache before expiry
  • Fallback to stale: Serve stale data while refreshing
  • Request coalescing: Collapse duplicate requests
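The locking strategy can be sketched with a double-checked lock; this is in-process only (a distributed cache would need a distributed lock, e.g. a Redis `SET NX` key), and the slow loader is simulated:

```python
import threading
import time

cache = {}
lock = threading.Lock()

def load_from_db(key):
    time.sleep(0.01)  # simulate a slow DB query
    return f"value-{key}"

def get(key):
    """Only one thread rebuilds a missing entry; the rest wait, then hit cache."""
    if key in cache:
        return cache[key]
    with lock:
        # Re-check after acquiring the lock: another thread may have
        # already refilled the entry while we were waiting.
        if key not in cache:
            cache[key] = load_from_db(key)
    return cache[key]
```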

Q13: Redis vs Memcached

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, hashes, sorted sets | Strings only |
| Persistence | Yes (RDB, AOF) | No |
| Replication | Built-in | No |
| Clustering | Redis Cluster | Client-side sharding |
| Pub/Sub | Yes | No |
| Transactions | Yes (MULTI) | No |

Choose Redis: Need data structures, persistence, or advanced features.
Choose Memcached: Simple caching, slightly lower latency.

Message Queues

Q14: Message queue use cases

Use cases:

  • Async processing: Non-blocking slow operations (email, notifications)
  • Decoupling: Services do not need to know about each other
  • Load leveling: Smooth out traffic spikes
  • Reliability: Persist messages if consumer down
  • Fan-out: One message to multiple consumers

Not needed when:

  • Synchronous response required
  • Simple request-response
  • Tight latency requirements

Q15: Message queue vs event stream

| Aspect | Queue (SQS, RabbitMQ) | Stream (Kafka) |
|---|---|---|
| Consumption | Message deleted after consume | Messages retained, replayable |
| Consumers | Competing consumers (one gets message) | Consumer groups (each group gets all messages) |
| Ordering | FIFO within queue | Ordered within partition |
| Retention | Until consumed | Time-based (days/weeks) |
| Use case | Task distribution | Event sourcing, audit log, streaming |

Q16: Exactly-once message processing

Exactly-once is difficult. Approaches:

1. Idempotent consumers:

  • Process message multiple times, same result
  • Use unique message ID to detect duplicates
  • Most practical approach

2. Transactional outbox:

  • Write to DB and outbox table in same transaction
  • Separate process reads outbox, publishes to queue
  • Guarantees delivery without duplication

3. Two-phase commit:

  • Coordinate DB and queue in transaction
  • Complex, performance impact
  • Not commonly used

Practical approach: At-least-once delivery + idempotent consumers.
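An idempotent-consumer sketch, assuming each message carries a unique `id`; the durable dedup store is simplified here to an in-memory set (in production it would live in Redis or the database, updated in the same transaction as the side effect):

```python
processed = set()  # simplified; production would use a durable store

def handle(message: dict) -> str:
    """At-least-once delivery + dedup by message ID ~= exactly-once effect."""
    msg_id = message["id"]
    if msg_id in processed:
        return "duplicate-skipped"
    # ... real side effect goes here (send email, charge card, ...) ...
    processed.add(msg_id)
    return "processed"

# Redelivery of the same message is now harmless:
handle({"id": "m-1"})  # processed
handle({"id": "m-1"})  # duplicate-skipped
```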

Load Balancing

Q17: Load balancing algorithms

| Algorithm | Mechanism | Use Case |
|---|---|---|
| Round Robin | Rotate through servers | Servers are identical |
| Weighted Round Robin | More requests to stronger servers | Heterogeneous servers |
| Least Connections | Send to server with fewest active connections | Long-lived connections |
| IP Hash | Hash client IP to server | Session affinity needed |
| Random | Random server | Simple, decent distribution |

Layer 4 (TCP) vs Layer 7 (HTTP):

  • L4: Faster, fewer features
  • L7: Content-based routing, SSL termination, caching
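Two of the algorithms above can be sketched in a few lines (server names are placeholders; `active` maps each server to its in-flight request count):

```python
import itertools

def round_robin(servers):
    """Round robin: rotate through servers in order, forever."""
    return itertools.cycle(servers)

def least_connections(active):
    """Least connections: pick the server with fewest in-flight requests."""
    return min(active, key=active.get)

rr = round_robin(["s1", "s2", "s3"])
print(next(rr), next(rr), next(rr), next(rr))  # s1 s2 s3 s1
```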

Q18: Consistent hashing

Problem: Adding/removing servers causes massive redistribution with modulo hashing.

Solution:

  1. Map both servers and keys to a ring (hash space)
  2. Walk clockwise from key position to find server
  3. Adding server only moves keys between neighbors
  4. Virtual nodes improve distribution

On average only K/N keys move when adding/removing a node (vs nearly all keys with modulo hashing).

Used in: DynamoDB, Cassandra, CDNs.
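The four steps above can be sketched as a minimal ring with virtual nodes (MD5 and 100 vnodes per server are arbitrary illustrative choices):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                # Each physical node appears vnodes times on the ring,
                # which smooths out the key distribution.
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key: str):
        """Walk clockwise from the key's position to the next node."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]
```

Adding a node only inserts its vnodes into the sorted list, so keys move only from each vnode's clockwise neighbor, not across the whole cluster.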

Availability and Reliability

Q19: Availability vs reliability

Availability: System is operational when needed

  • Measured as uptime percentage (99.9% = 8.76 hours down/year)
  • Focus on reducing downtime

Reliability: System performs correctly over time

  • Measured as MTBF (Mean Time Between Failures)
  • Focus on reducing failures

A system can be available but unreliable (up but giving wrong answers). A system can be reliable but unavailable (works perfectly when running, but often down).

Q20: The nines

| Availability | Annual Downtime |
|---|---|
| 99% (two nines) | 3.65 days |
| 99.9% (three nines) | 8.76 hours |
| 99.99% (four nines) | 52.6 minutes |
| 99.999% (five nines) | 5.26 minutes |

Each nine is 10x harder to achieve. Five nines requires:

  • Redundancy at every level
  • Automated failover
  • Zero-downtime deployments
  • Geographic distribution
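The downtime figures in the table follow from simple arithmetic on a 365-day year:

```python
def annual_downtime_hours(availability: float) -> float:
    """Downtime budget per year at a given availability level."""
    return (1 - availability) * 365 * 24

annual_downtime_hours(0.999)    # ~8.76 hours  -> three nines
annual_downtime_hours(0.99999)  # ~0.0876 hours ~= 5.26 minutes -> five nines
```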

Q21: Circuit breaker pattern

Prevents cascading failures when a service is down.

States:

  1. Closed: Requests flow normally, count failures
  2. Open: After threshold failures, reject requests immediately
  3. Half-open: After timeout, allow test requests

Benefits:

  • Fails fast (no waiting for timeouts)
  • Prevents overwhelming failing service
  • Allows recovery time
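The three states map onto a small state machine; a sketch follows (the threshold and timeout values are illustrative, and production libraries add details like rolling failure windows):

```python
import time

class CircuitBreaker:
    """Minimal closed -> open -> half-open circuit breaker."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "open":
            if time.time() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # allow a test request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed test request, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.time()
            raise
        self.failures = 0
        self.state = "closed"  # success closes the circuit again
        return result
```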

API Design

Q22: REST vs GraphQL vs gRPC

| Aspect | REST | GraphQL | gRPC |
|---|---|---|---|
| Protocol | HTTP | HTTP | HTTP/2 |
| Format | JSON | JSON | Protocol Buffers |
| Typing | Optional (OpenAPI) | Strong schema | Strong (protobuf) |
| Over-fetching | Common | No (client specifies) | No |
| Caching | HTTP caching works | Complex (POST) | Complex |
| Use case | Public APIs, CRUD | Mobile apps, complex UIs | Internal microservices |

Q23: API versioning

1. URL path versioning:

  • /api/v1/users, /api/v2/users
  • Clear, easy caching
  • Not "pure" REST

2. Header versioning:

  • Accept: application/vnd.api+json;version=2
  • Clean URLs
  • Harder to test

3. Query parameter:

  • /api/users?version=2
  • Simple
  • Not cacheable

URL versioning is most common and practical.

Q24: Rate limiting

Rate limiting: Restrict number of requests from a client.

Algorithms:

  • Token bucket: Tokens refill at fixed rate, request consumes token
  • Leaky bucket: Requests queue, process at fixed rate
  • Fixed window: Count requests per time window
  • Sliding window: Rolling window for smoother limiting

Implementation:

  • Redis for distributed systems (INCR with TTL)
  • Return 429 Too Many Requests
  • Include Retry-After header
  • Different limits for different tiers/endpoints
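The token bucket algorithm can be sketched as follows (single-process; a distributed limiter would keep the bucket state in Redis, as noted above):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond 429 with a Retry-After header
```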

Microservices

Q25: Microservices vs monolith

| Aspect | Monolith | Microservices |
|---|---|---|
| Complexity | Lower | Higher (network, deployment) |
| Deployment | All or nothing | Independent |
| Scaling | Scale entire app | Scale individual services |
| Data consistency | Transactions easy | Distributed transactions difficult |
| Team organization | Single team | Multiple autonomous teams |
| Debugging | Stack traces | Distributed tracing needed |

Start with monolith, extract microservices when:

  • Team grows beyond what one codebase supports
  • Different scaling needs for components
  • Need independent deployment

Q26: Distributed transactions

1. Saga pattern:

  • Sequence of local transactions with compensating actions
  • If step fails, execute compensations for completed steps
  • Eventual consistency

2. Two-phase commit (2PC):

  • Coordinator asks participants to prepare
  • If all ready, commit; else abort
  • Blocking, performance impact

3. Outbox pattern:

  • Write to local DB + outbox table atomically
  • Separate process publishes events
  • Reliable event delivery

Best practice: Avoid distributed transactions. Design services with bounded contexts.
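The saga pattern above can be sketched as a list of (action, compensation) pairs; the order/payment step names are hypothetical:

```python
def run_saga(steps):
    """Run local transactions in order; on failure, run the compensating
    actions of the already-completed steps in reverse order."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()
        raise

log = []

def reserve_stock():  log.append("reserve-stock")
def release_stock():  log.append("release-stock")
def charge_card():    raise RuntimeError("payment declined")
def refund_card():    log.append("refund")

try:
    run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
except RuntimeError:
    pass

print(log)  # ['reserve-stock', 'release-stock']
```

Note that only completed steps are compensated: the failed payment step never charged, so no refund runs.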

Q27: Service discovery

Service discovery: How services find each other's network locations.

Client-side discovery:

  • Client queries registry, selects instance
  • More control, but client complexity
  • Example: Netflix Eureka

Server-side discovery:

  • Load balancer queries registry
  • Client simpler, but extra hop
  • Example: AWS ALB, Kubernetes

Registry options: etcd, Consul, ZooKeeper, DNS-based (Kubernetes)

Needed because: Container IPs change, instances scale dynamically.

Security

Q28: API security

  1. Authentication:

    • API keys (simple, limited)
    • JWT tokens (stateless, self-contained)
    • OAuth 2.0 (delegated access)
  2. Authorization:

    • RBAC (Role-Based Access Control)
    • Check permissions on every request
  3. Transport:

    • HTTPS everywhere
    • TLS 1.2+ minimum
  4. Input validation:

    • Never trust client input
    • Parameterized queries (prevent SQL injection)
    • Sanitize output (prevent XSS)
  5. Rate limiting and throttling

  6. Logging and monitoring:

    • Audit sensitive operations
    • Detect anomalies

Q29: OAuth 2.0 flows

Authorization Code (most secure):

  1. User redirected to auth server
  2. User authenticates, gets authorization code
  3. Backend exchanges code for access token
  4. Backend uses token for API calls

Client Credentials:

  • Machine-to-machine (no user)
  • Client directly requests token with credentials

Implicit (deprecated):

  • Token returned directly in redirect
  • Not secure (token exposed in URL)

Refresh tokens:

  • Long-lived token to get new access tokens
  • Access tokens should be short-lived

Monitoring and Observability

Q30: Three pillars of observability

1. Logs:

  • Record of discrete events
  • Structured logs (JSON) for parsing
  • Centralized (ELK, CloudWatch)

2. Metrics:

  • Numeric measurements over time
  • Aggregatable (counters, gauges, histograms)
  • Prometheus, CloudWatch, Datadog

3. Traces:

  • Request path through distributed system
  • Correlation IDs across services
  • Jaeger, X-Ray, Zipkin

Together they answer:

  • What happened? (logs)
  • How much/how fast? (metrics)
  • Where was time spent? (traces)
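Correlation IDs are what tie logs and traces together across services; a sketch of a structured (JSON) log line carrying one (the field names and service names are illustrative):

```python
import json
import time
import uuid

def log_event(service: str, message: str, correlation_id: str, **fields) -> dict:
    """Emit one structured log line; the correlation ID lets a single
    request be followed across every service that handled it."""
    record = {
        "ts": time.time(),
        "service": service,
        "correlation_id": correlation_id,
        "message": message,
        **fields,
    }
    print(json.dumps(record))
    return record

# Generated at the edge, then propagated to downstream services via headers.
cid = str(uuid.uuid4())
log_event("api-gateway", "request received", cid, path="/orders")
log_event("order-service", "order created", cid, order_id=123)
```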