Skip to main content

Design a Spreadsheet Application

Concepts: Virtual Grid Rendering, Formula Parser (Lexer/AST), Dependency Graph, Cell References, Undo/Redo Stack, Collaborative Editing (CRDT/OT), Canvas vs DOM Rendering, Clipboard API, Keyboard Shortcuts

Requirements

Functional Requirements

RequirementDescription
Cell gridRows and columns of editable cells
Cell contentSupport text, numbers, and formulas
Formula supportFunctions (SUM, AVERAGE) and cell references
Cell formattingBold, colors, borders
Multiple sheetsTabs for separate worksheets
Real-time collaborationMultiple users editing simultaneously
Import/exportCSV and Excel format support

Non-Functional Requirements

RequirementTargetRationale
Grid size1M+ cellsEnterprise use cases
Render time< 100msPerceived instant response
Formula calculation< 500msInteractive editing experience
Memory< 500MBBrowser stability

Technical Challenges

Rendering Large Grids

A spreadsheet with 10,000 rows and 100 columns contains 1 million cells. Rendering this many DOM nodes causes browser failure.

Solution: Virtual scrolling

Render only cells within the visible viewport plus a buffer zone.

Loading diagram...

Cell Dependencies

Cell value changes propagate to dependent cells.

Example:

  • A1 = 5
  • B1 = A1 * 2 (depends on A1)
  • C1 = B1 + 10 (depends on B1)

Solution: Dependency graph

Loading diagram...

When A1 changes:

  1. Topological sort identifies all dependents
  2. Recalculate cells in dependency order
  3. Update display

Formula Parsing

Input: =SUM(A1:A10) + B5 * 2

Parsing steps:

  1. Tokenize formula string
  2. Build Abstract Syntax Tree (AST)
  3. Evaluate with current cell values
  4. Handle errors (circular references, invalid syntax)
Loading diagram...

Architecture

Loading diagram...

Data Model

Sparse Storage

Store only cells containing data, not empty cells.

Inefficient approach: Creating a dense array of arrays for all possible cells consumes memory proportional to the grid dimensions, even for mostly empty spreadsheets.

Efficient approach: Use an object keyed by cell addresses (such as "A1", "B2"). Only cells with content exist in the structure. Empty cells have no representation, reducing memory usage to proportional to the number of filled cells.

Cell Data Structure

Each cell stores:

  • rawValue: The user's input (text or number)
  • formula: The formula string if the input starts with "="
  • computedValue: The evaluated result of the formula or raw value
  • style: Formatting properties (bold, italic, colors, borders)
  • type: Classification as text, number, formula, or error

State Structure

State organization: The top-level state contains:

  • sheets: Object of sheet data keyed by ID, each containing cells (sparse), column widths, and row heights
  • activeSheetId: Currently displayed sheet
  • selection: Start, end, and active cell positions for range selection
  • clipboard: Copied cell data for paste operations
  • history: Past and future state arrays for undo/redo functionality

Formula Engine

Tokenizer

Implementation approach: Iterate through the formula string character by character. Skip the leading equals sign. Identify token types by their starting characters:

  • Cell references start with uppercase letters and may include a colon for ranges
  • Numbers start with digits and may include decimal points
  • Operators are single characters (+, -, *, /, parentheses)
  • Function names are sequences of uppercase letters

Build each token by consuming characters until a different type is encountered. Return an array of token objects with type and value properties.

Built-in Functions

Implement common spreadsheet functions:

  • SUM: Flatten input ranges and sum all numeric values
  • AVERAGE: Sum divided by count of numeric values
  • MAX/MIN: Return largest/smallest value from flattened input
  • COUNT: Count of numeric values in the range
  • IF: Evaluate condition and return true or false value
  • CONCAT: Join all values into a single string

Dependency Tracking

DependencyGraph class: Maintain two Maps: one tracking which cells each cell depends on (dependencies), and one tracking which cells depend on each cell (dependents).

Setting dependencies: When a cell's formula changes, remove the cell from the dependent lists of its old dependencies, then add it to the dependent lists of its new dependencies.

Getting dependents: Perform a depth-first traversal to find all cells that directly or transitively depend on a given cell. Return cells in topological order for proper recalculation sequence.

Cycle detection: Before allowing a formula, check if it would create a circular reference by traversing the dependency chain. If the starting cell is encountered during traversal, a cycle exists.

Virtual Grid Implementation

Calculate Visible Range

Implementation approach: Create a custom hook that calculates visible row and column ranges based on scroll position, viewport size, and cell dimensions. Use memoization to avoid recalculating on every render. Find the first visible row by searching for the row at the scroll position, and the last visible row at scroll position plus viewport height. Add buffer rows and columns (such as 5 rows and 2 columns) above and below the visible area to prevent flashing during fast scrolling.

Render Visible Cells

Implementation approach: Iterate through the visible range of rows and columns. For each cell, convert the column index to a letter and combine with the row number to create the cell ID (such as "A1"). Look up cell data from the sparse data structure. Render each cell with absolute positioning, calculating left and top positions by summing widths and heights of preceding cells. Use default dimensions for cells without custom sizes. Return a container with relative positioning to serve as the positioning context for absolutely-positioned cells.

Selection and Editing

Selection States

ModeDescription
singleOne cell selected
rangeRectangle of cells
multiMultiple non-contiguous ranges (Ctrl+click)
editingActively typing in cell

Keyboard Navigation

KeyAction
Arrow keysMove selection
TabMove right
EnterMove down, confirm edit
EscapeCancel edit
Ctrl+C/VCopy/paste
Ctrl+Z/YUndo/redo
F2Edit current cell

Copy/Paste

Implementation approach: For copy operations, iterate through the selected range, collecting computed values from each cell. Join values within each row with tab characters, and join rows with newline characters to create a tab-separated format compatible with other spreadsheet applications. Use the Clipboard API (navigator.clipboard.writeText) to copy the formatted string. For paste operations, parse the incoming text by splitting on newlines and tabs, then update cells starting from the active cell position.

Performance Optimizations

Batch Updates

Inefficient approach: Dispatching individual update actions for each cell triggers multiple state updates and re-renders.

Efficient approach: Batch all cell updates into a single action that updates multiple cells at once, triggering only one state update and re-render.

Memoize Cell Components

Wrap the Cell component with React.memo and provide a custom comparison function. Only re-render when cell data or style actually changes. This prevents re-rendering cells whose content has not changed when the visible range updates.

Web Workers for Formula Calculation

Main thread: Create a Web Worker for formula calculations. When a cell changes, send the cell data and changed cell ID to the worker. Listen for messages from the worker and dispatch the updated cells to state.

Worker thread: Listen for messages containing cells and the changed cell ID. Calculate all dependent cells using the dependency graph. Post the updated cell values back to the main thread.

This approach keeps the UI responsive during complex recalculations.

Throttle Recalculation

Wrap the recalculation function in a debounce with a delay (such as 100ms). This prevents excessive recalculations during rapid typing. Memoize the debounced function to maintain a stable reference.

Real-Time Collaboration

Operational Transform

When multiple users edit simultaneously, operations must be transformed to maintain consistency.

ScenarioTransformation
User A inserts at position 5Apply at position 5
User B inserts at position 3 (arrives after A)Transform to position 3 (before A's position)
User A inserts at position 5 (arrives after B)Transform to position 6 (shifted by B's insert)

CRDT Alternative

Conflict-free Replicated Data Types (CRDTs) allow concurrent editing without central coordination. For spreadsheets, cells can be modeled as Last-Writer-Wins registers.

ApproachAdvantagesDisadvantages
Operational TransformPrecise conflict resolutionServer coordination required
CRDTNo server coordinationMore complex data structures

Design Trade-offs

DecisionOptionsRecommendation
RenderingDOM per cell vs Canvas vs VirtualVirtual DOM for flexibility
StorageDense array vs Sparse objectSparse object for memory efficiency
Formula parsingRegex vs ParserParser with AST for extensibility
RecalculationFull vs IncrementalIncremental via dependency graph
Heavy computationMain thread vs Web WorkerWeb Worker for non-blocking UI
CollaborationPolling vs OT vs CRDTOT or CRDT for real-time updates