Design a Spreadsheet Application
Concepts: Virtual Grid Rendering, Formula Parser (Lexer/AST), Dependency Graph, Cell References, Undo/Redo Stack, Collaborative Editing (CRDT/OT), Canvas vs DOM Rendering, Clipboard API, Keyboard Shortcuts
Requirements
Functional Requirements
| Requirement | Description |
|---|---|
| Cell grid | Rows and columns of editable cells |
| Cell content | Support text, numbers, and formulas |
| Formula support | Functions (SUM, AVERAGE) and cell references |
| Cell formatting | Bold, colors, borders |
| Multiple sheets | Tabs for separate worksheets |
| Real-time collaboration | Multiple users editing simultaneously |
| Import/export | CSV and Excel format support |
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Grid size | 1M+ cells | Enterprise use cases |
| Render time | < 100ms | Perceived instant response |
| Formula calculation | < 500ms | Interactive editing experience |
| Memory | < 500MB | Browser stability |
Technical Challenges
Rendering Large Grids
A spreadsheet with 10,000 rows and 100 columns contains 1 million cells. Rendering this many DOM nodes causes browser failure.
Solution: Virtual scrolling
Render only cells within the visible viewport plus a buffer zone.
Cell Dependencies
Cell value changes propagate to dependent cells.
Example:
A1 = 5B1 = A1 * 2(depends on A1)C1 = B1 + 10(depends on B1)
Solution: Dependency graph
When A1 changes:
- Topological sort identifies all dependents
- Recalculate cells in dependency order
- Update display
Formula Parsing
Input: =SUM(A1:A10) + B5 * 2
Parsing steps:
- Tokenize formula string
- Build Abstract Syntax Tree (AST)
- Evaluate with current cell values
- Handle errors (circular references, invalid syntax)
Architecture
Data Model
Sparse Storage
Store only cells containing data, not empty cells.
Inefficient approach: Creating a dense array of arrays for all possible cells consumes memory proportional to the grid dimensions, even for mostly empty spreadsheets.
Efficient approach: Use an object keyed by cell addresses (such as "A1", "B2"). Only cells with content exist in the structure. Empty cells have no representation, reducing memory usage to proportional to the number of filled cells.
Cell Data Structure
Each cell stores:
- rawValue: The user's input (text or number)
- formula: The formula string if the input starts with "="
- computedValue: The evaluated result of the formula or raw value
- style: Formatting properties (bold, italic, colors, borders)
- type: Classification as text, number, formula, or error
State Structure
State organization: The top-level state contains:
- sheets: Object of sheet data keyed by ID, each containing cells (sparse), column widths, and row heights
- activeSheetId: Currently displayed sheet
- selection: Start, end, and active cell positions for range selection
- clipboard: Copied cell data for paste operations
- history: Past and future state arrays for undo/redo functionality
Formula Engine
Tokenizer
Implementation approach: Iterate through the formula string character by character. Skip the leading equals sign. Identify token types by their starting characters:
- Cell references start with uppercase letters and may include a colon for ranges
- Numbers start with digits and may include decimal points
- Operators are single characters (+, -, *, /, parentheses)
- Function names are sequences of uppercase letters
Build each token by consuming characters until a different type is encountered. Return an array of token objects with type and value properties.
Built-in Functions
Implement common spreadsheet functions:
- SUM: Flatten input ranges and sum all numeric values
- AVERAGE: Sum divided by count of numeric values
- MAX/MIN: Return largest/smallest value from flattened input
- COUNT: Count of numeric values in the range
- IF: Evaluate condition and return true or false value
- CONCAT: Join all values into a single string
Dependency Tracking
DependencyGraph class: Maintain two Maps: one tracking which cells each cell depends on (dependencies), and one tracking which cells depend on each cell (dependents).
Setting dependencies: When a cell's formula changes, remove the cell from the dependent lists of its old dependencies, then add it to the dependent lists of its new dependencies.
Getting dependents: Perform a depth-first traversal to find all cells that directly or transitively depend on a given cell. Return cells in topological order for proper recalculation sequence.
Cycle detection: Before allowing a formula, check if it would create a circular reference by traversing the dependency chain. If the starting cell is encountered during traversal, a cycle exists.
Virtual Grid Implementation
Calculate Visible Range
Implementation approach: Create a custom hook that calculates visible row and column ranges based on scroll position, viewport size, and cell dimensions. Use memoization to avoid recalculating on every render. Find the first visible row by searching for the row at the scroll position, and the last visible row at scroll position plus viewport height. Add buffer rows and columns (such as 5 rows and 2 columns) above and below the visible area to prevent flashing during fast scrolling.
Render Visible Cells
Implementation approach: Iterate through the visible range of rows and columns. For each cell, convert the column index to a letter and combine with the row number to create the cell ID (such as "A1"). Look up cell data from the sparse data structure. Render each cell with absolute positioning, calculating left and top positions by summing widths and heights of preceding cells. Use default dimensions for cells without custom sizes. Return a container with relative positioning to serve as the positioning context for absolutely-positioned cells.
Selection and Editing
Selection States
| Mode | Description |
|---|---|
| single | One cell selected |
| range | Rectangle of cells |
| multi | Multiple non-contiguous ranges (Ctrl+click) |
| editing | Actively typing in cell |
Keyboard Navigation
| Key | Action |
|---|---|
| Arrow keys | Move selection |
| Tab | Move right |
| Enter | Move down, confirm edit |
| Escape | Cancel edit |
| Ctrl+C/V | Copy/paste |
| Ctrl+Z/Y | Undo/redo |
| F2 | Edit current cell |
Copy/Paste
Implementation approach: For copy operations, iterate through the selected range, collecting computed values from each cell. Join values within each row with tab characters, and join rows with newline characters to create a tab-separated format compatible with other spreadsheet applications. Use the Clipboard API (navigator.clipboard.writeText) to copy the formatted string. For paste operations, parse the incoming text by splitting on newlines and tabs, then update cells starting from the active cell position.
Performance Optimizations
Batch Updates
Inefficient approach: Dispatching individual update actions for each cell triggers multiple state updates and re-renders.
Efficient approach: Batch all cell updates into a single action that updates multiple cells at once, triggering only one state update and re-render.
Memoize Cell Components
Wrap the Cell component with React.memo and provide a custom comparison function. Only re-render when cell data or style actually changes. This prevents re-rendering cells whose content has not changed when the visible range updates.
Web Workers for Formula Calculation
Main thread: Create a Web Worker for formula calculations. When a cell changes, send the cell data and changed cell ID to the worker. Listen for messages from the worker and dispatch the updated cells to state.
Worker thread: Listen for messages containing cells and the changed cell ID. Calculate all dependent cells using the dependency graph. Post the updated cell values back to the main thread.
This approach keeps the UI responsive during complex recalculations.
Throttle Recalculation
Wrap the recalculation function in a debounce with a delay (such as 100ms). This prevents excessive recalculations during rapid typing. Memoize the debounced function to maintain a stable reference.
Real-Time Collaboration
Operational Transform
When multiple users edit simultaneously, operations must be transformed to maintain consistency.
| Scenario | Transformation |
|---|---|
| User A inserts at position 5 | Apply at position 5 |
| User B inserts at position 3 (arrives after A) | Transform to position 3 (before A's position) |
| User A inserts at position 5 (arrives after B) | Transform to position 6 (shifted by B's insert) |
CRDT Alternative
Conflict-free Replicated Data Types (CRDTs) allow concurrent editing without central coordination. For spreadsheets, cells can be modeled as Last-Writer-Wins registers.
| Approach | Advantages | Disadvantages |
|---|---|---|
| Operational Transform | Precise conflict resolution | Server coordination required |
| CRDT | No server coordination | More complex data structures |
Design Trade-offs
| Decision | Options | Recommendation |
|---|---|---|
| Rendering | DOM per cell vs Canvas vs Virtual | Virtual DOM for flexibility |
| Storage | Dense array vs Sparse object | Sparse object for memory efficiency |
| Formula parsing | Regex vs Parser | Parser with AST for extensibility |
| Recalculation | Full vs Incremental | Incremental via dependency graph |
| Heavy computation | Main thread vs Web Worker | Web Worker for non-blocking UI |
| Collaboration | Polling vs OT vs CRDT | OT or CRDT for real-time updates |