Design a Spreadsheet Application

Concepts: Virtual Grid Rendering, Formula Parser (Lexer/AST), Dependency Graph, Cell References, Undo/Redo Stack, Collaborative Editing (CRDT/OT), Canvas vs DOM Rendering, Clipboard API, Keyboard Shortcuts

Requirements

Functional Requirements

Requirement	Description
Cell grid	Rows and columns of editable cells
Cell content	Support text, numbers, and formulas
Formula support	Functions (SUM, AVERAGE) and cell references
Cell formatting	Bold, colors, borders
Multiple sheets	Tabs for separate worksheets
Real-time collaboration	Multiple users editing simultaneously
Import/export	CSV and Excel format support

Non-Functional Requirements

Requirement	Target	Rationale
Grid size	1M+ cells	Enterprise use cases
Render time	< 100ms	Perceived instant response
Formula calculation	< 500ms	Interactive editing experience
Memory	< 500MB	Browser stability

Technical Challenges

Rendering Large Grids

A spreadsheet with 10,000 rows and 100 columns contains 1 million cells. Rendering this many DOM nodes causes browser failure.

Solution: Virtual scrolling

Render only cells within the visible viewport plus a buffer zone.

Loading diagram...

Cell Dependencies

Cell value changes propagate to dependent cells.

Example:

A1 = 5
B1 = A1 * 2 (depends on A1)
C1 = B1 + 10 (depends on B1)

Solution: Dependency graph

Loading diagram...

When A1 changes:

Topological sort identifies all dependents
Recalculate cells in dependency order
Update display

Formula Parsing

Input: =SUM(A1:A10) + B5 * 2

Parsing steps:

Tokenize formula string
Build Abstract Syntax Tree (AST)
Evaluate with current cell values
Handle errors (circular references, invalid syntax)

Loading diagram...

Architecture

Loading diagram...

Data Model

Sparse Storage

Store only cells containing data, not empty cells.

Inefficient approach: Creating a dense array of arrays for all possible cells consumes memory proportional to the grid dimensions, even for mostly empty spreadsheets.

Efficient approach: Use an object keyed by cell addresses (such as "A1", "B2"). Only cells with content exist in the structure. Empty cells have no representation, reducing memory usage to proportional to the number of filled cells.

Cell Data Structure

Each cell stores:

rawValue: The user's input (text or number)
formula: The formula string if the input starts with "="
computedValue: The evaluated result of the formula or raw value
style: Formatting properties (bold, italic, colors, borders)
type: Classification as text, number, formula, or error

State Structure

State organization: The top-level state contains:

sheets: Object of sheet data keyed by ID, each containing cells (sparse), column widths, and row heights
activeSheetId: Currently displayed sheet
selection: Start, end, and active cell positions for range selection
clipboard: Copied cell data for paste operations
history: Past and future state arrays for undo/redo functionality

Formula Engine

Tokenizer

Implementation approach: Iterate through the formula string character by character. Skip the leading equals sign. Identify token types by their starting characters:

Cell references start with uppercase letters and may include a colon for ranges
Numbers start with digits and may include decimal points
Operators are single characters (+, -, *, /, parentheses)
Function names are sequences of uppercase letters

Build each token by consuming characters until a different type is encountered. Return an array of token objects with type and value properties.

Built-in Functions

Implement common spreadsheet functions:

SUM: Flatten input ranges and sum all numeric values
AVERAGE: Sum divided by count of numeric values
MAX/MIN: Return largest/smallest value from flattened input
COUNT: Count of numeric values in the range
IF: Evaluate condition and return true or false value
CONCAT: Join all values into a single string

Dependency Tracking

DependencyGraph class: Maintain two Maps: one tracking which cells each cell depends on (dependencies), and one tracking which cells depend on each cell (dependents).

Setting dependencies: When a cell's formula changes, remove the cell from the dependent lists of its old dependencies, then add it to the dependent lists of its new dependencies.

Getting dependents: Perform a depth-first traversal to find all cells that directly or transitively depend on a given cell. Return cells in topological order for proper recalculation sequence.

Cycle detection: Before allowing a formula, check if it would create a circular reference by traversing the dependency chain. If the starting cell is encountered during traversal, a cycle exists.

Virtual Grid Implementation

Calculate Visible Range

Implementation approach: Create a custom hook that calculates visible row and column ranges based on scroll position, viewport size, and cell dimensions. Use memoization to avoid recalculating on every render. Find the first visible row by searching for the row at the scroll position, and the last visible row at scroll position plus viewport height. Add buffer rows and columns (such as 5 rows and 2 columns) above and below the visible area to prevent flashing during fast scrolling.

Render Visible Cells

Implementation approach: Iterate through the visible range of rows and columns. For each cell, convert the column index to a letter and combine with the row number to create the cell ID (such as "A1"). Look up cell data from the sparse data structure. Render each cell with absolute positioning, calculating left and top positions by summing widths and heights of preceding cells. Use default dimensions for cells without custom sizes. Return a container with relative positioning to serve as the positioning context for absolutely-positioned cells.

Selection and Editing

Selection States

Mode	Description
single	One cell selected
range	Rectangle of cells
multi	Multiple non-contiguous ranges (Ctrl+click)
editing	Actively typing in cell

Key	Action
Arrow keys	Move selection
Tab	Move right
Enter	Move down, confirm edit
Escape	Cancel edit
Ctrl+C/V	Copy/paste
Ctrl+Z/Y	Undo/redo
F2	Edit current cell

Copy/Paste

Implementation approach: For copy operations, iterate through the selected range, collecting computed values from each cell. Join values within each row with tab characters, and join rows with newline characters to create a tab-separated format compatible with other spreadsheet applications. Use the Clipboard API (navigator.clipboard.writeText) to copy the formatted string. For paste operations, parse the incoming text by splitting on newlines and tabs, then update cells starting from the active cell position.

Performance Optimizations

Batch Updates

Inefficient approach: Dispatching individual update actions for each cell triggers multiple state updates and re-renders.

Efficient approach: Batch all cell updates into a single action that updates multiple cells at once, triggering only one state update and re-render.

Memoize Cell Components

Wrap the Cell component with React.memo and provide a custom comparison function. Only re-render when cell data or style actually changes. This prevents re-rendering cells whose content has not changed when the visible range updates.

Web Workers for Formula Calculation

Main thread: Create a Web Worker for formula calculations. When a cell changes, send the cell data and changed cell ID to the worker. Listen for messages from the worker and dispatch the updated cells to state.

Worker thread: Listen for messages containing cells and the changed cell ID. Calculate all dependent cells using the dependency graph. Post the updated cell values back to the main thread.

This approach keeps the UI responsive during complex recalculations.

Throttle Recalculation

Wrap the recalculation function in a debounce with a delay (such as 100ms). This prevents excessive recalculations during rapid typing. Memoize the debounced function to maintain a stable reference.

Real-Time Collaboration

Operational Transform

When multiple users edit simultaneously, operations must be transformed to maintain consistency.

Scenario	Transformation
User A inserts at position 5	Apply at position 5
User B inserts at position 3 (arrives after A)	Transform to position 3 (before A's position)
User A inserts at position 5 (arrives after B)	Transform to position 6 (shifted by B's insert)

CRDT Alternative

Conflict-free Replicated Data Types (CRDTs) allow concurrent editing without central coordination. For spreadsheets, cells can be modeled as Last-Writer-Wins registers.

Approach	Advantages	Disadvantages
Operational Transform	Precise conflict resolution	Server coordination required
CRDT	No server coordination	More complex data structures

Design Trade-offs

Decision	Options	Recommendation
Rendering	DOM per cell vs Canvas vs Virtual	Virtual DOM for flexibility
Storage	Dense array vs Sparse object	Sparse object for memory efficiency
Formula parsing	Regex vs Parser	Parser with AST for extensibility
Recalculation	Full vs Incremental	Incremental via dependency graph
Heavy computation	Main thread vs Web Worker	Web Worker for non-blocking UI
Collaboration	Polling vs OT vs CRDT	OT or CRDT for real-time updates

Design a Spreadsheet Application

Requirements

Functional Requirements

Non-Functional Requirements

Technical Challenges

Rendering Large Grids

Cell Dependencies

Formula Parsing

Architecture

Data Model

Sparse Storage

Cell Data Structure

State Structure

Formula Engine

Tokenizer

Built-in Functions

Dependency Tracking

Virtual Grid Implementation

Calculate Visible Range

Render Visible Cells

Selection and Editing

Selection States

Keyboard Navigation

Copy/Paste

Performance Optimizations

Batch Updates

Memoize Cell Components

Web Workers for Formula Calculation

Throttle Recalculation

Real-Time Collaboration

Operational Transform

CRDT Alternative

Design Trade-offs

Table of Contents

Requirements​

Functional Requirements​

Non-Functional Requirements​

Technical Challenges​

Rendering Large Grids​

Cell Dependencies​

Formula Parsing​

Architecture​

Data Model​

Sparse Storage​

Cell Data Structure​

State Structure​

Formula Engine​

Tokenizer​

Built-in Functions​

Dependency Tracking​

Virtual Grid Implementation​

Calculate Visible Range​

Render Visible Cells​

Selection and Editing​

Selection States​

Keyboard Navigation​

Copy/Paste​

Performance Optimizations​

Batch Updates​

Memoize Cell Components​

Web Workers for Formula Calculation​

Throttle Recalculation​

Real-Time Collaboration​

Operational Transform​

CRDT Alternative​

Design Trade-offs​

Table of Contents

Requirements

Functional Requirements

Non-Functional Requirements

Technical Challenges

Rendering Large Grids

Cell Dependencies

Formula Parsing

Architecture

Data Model

Sparse Storage

Cell Data Structure

State Structure

Formula Engine

Tokenizer

Built-in Functions

Dependency Tracking

Virtual Grid Implementation

Calculate Visible Range

Render Visible Cells

Selection and Editing

Selection States

Keyboard Navigation

Copy/Paste

Performance Optimizations

Batch Updates

Memoize Cell Components

Web Workers for Formula Calculation

Throttle Recalculation

Real-Time Collaboration

Operational Transform

CRDT Alternative

Design Trade-offs