Design Search Autocomplete

Design a typeahead/autocomplete system that suggests search queries as users type.

Related Concepts: Trie Data Structure, Prefix Matching, Ranking Algorithm, Caching (Edge/CDN), Sampling for Analytics, MapReduce for Aggregation, Sharding by Prefix, Real-Time Trending Updates

Step 1: Requirements and Scope

Functional Requirements

  • As users type, return top 10 suggestions matching the prefix
  • Suggestions ranked by popularity/relevance
  • Support multiple languages
  • Handle trending queries (fast updates)
  • Personalization (optional)

Non-Functional Requirements

| Requirement | Target | Rationale |
| --- | --- | --- |
| Latency | < 100ms (ideally < 50ms) | Must feel instant |
| Availability | 99.99% | Core search feature |
| Scalability | 10B queries/day | Massive search volume |
| Freshness | Minutes for trending | Current events matter |

Scale Estimation

  • 10 billion searches per day (~120K/second)
  • 5 million unique queries in index
  • Average query length: 20 characters
  • ~5 keystrokes per search (with autocomplete)
  • Suggestion requests: 5 x 120K = 600K/second
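
A quick back-of-envelope check of these figures (the constants below are just the assumptions listed above):

```python
# Back-of-envelope check of the traffic numbers above.
searches_per_day = 10_000_000_000
seconds_per_day = 24 * 60 * 60                    # 86,400

search_qps = searches_per_day / seconds_per_day
print(f"Search QPS: ~{search_qps:,.0f}")          # ~115,741 (~120K/second)

keystrokes_per_search = 5                         # keystrokes that trigger a suggestion request
suggestion_qps = keystrokes_per_search * search_qps
print(f"Suggestion QPS: ~{suggestion_qps:,.0f}")  # ~578,704 (~600K/second)
```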

Step 2: The Core Challenge

Users expect suggestions instantly as they type. With millions of queries and sub-100ms latency requirements, how do you efficiently find and rank matching prefixes?

Step 3: Data Structure - Trie

A trie (prefix tree) is ideal for prefix matching.

Trie Structure

Each node of the trie represents a prefix; edges are labeled with single characters, and nodes that terminate a full query store that query's popularity count.

Trie Lookup Complexity

| Operation | Complexity | Notes |
| --- | --- | --- |
| Find prefix node | O(L) | L = prefix length |
| Get all descendants | O(N) | N = nodes in subtree |
| Get top K | O(N log K) | Need to traverse all |

Problem: Finding Top K is Slow

Traversing the entire subtree for each keystroke is too slow.

Solution: Cache top suggestions at each node.

Now lookup is O(L) where L = prefix length.
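
A minimal in-memory sketch of this approach (class and method names such as `AutocompleteTrie` and `precompute_top_k` are illustrative, not from the source): each node keeps its children plus a precomputed list of the top-K completions in its subtree, so serving a request only walks the prefix.

```python
K = 10  # number of suggestions cached per node


class TrieNode:
    __slots__ = ("children", "count", "top_k")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.count = 0       # popularity if a query ends here, else 0
        self.top_k = []      # cached [(count, query), ...] for this node's subtree


class AutocompleteTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str, count: int) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count = count

    def precompute_top_k(self) -> None:
        """Bottom-up DFS: cache the K most popular completions at every node."""
        self._precompute(self.root, "")

    def _precompute(self, node: TrieNode, prefix: str) -> list:
        candidates = [(node.count, prefix)] if node.count > 0 else []
        for ch, child in node.children.items():
            candidates.extend(self._precompute(child, prefix + ch))
        node.top_k = sorted(candidates, key=lambda c: -c[0])[:K]
        return node.top_k

    def suggest(self, prefix: str) -> list:
        """O(L) lookup: walk the prefix and return its cached suggestions."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top_k]


# Illustrative usage with counts as produced by the batch pipeline:
trie = AutocompleteTrie()
for query, count in [("apple", 500), ("apple stock", 100),
                     ("apple event", 20), ("app store", 80)]:
    trie.insert(query, count)
trie.precompute_top_k()
print(trie.suggest("app"))  # ['apple', 'apple stock', 'app store', 'apple event']
```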

Step 4: High-Level Architecture

Clients send prefix requests through a load balancer to autocomplete servers that hold the trie in memory and merge in trending scores from Redis; an offline pipeline (Step 6) rebuilds the trie from query logs and ships new snapshots to the servers.

Step 5: Query Flow

Request Path

Client → Load balancer → Autocomplete server (in-memory trie lookup) → Trending merge (Redis) → Response.

Response Time Breakdown

| Step | Latency | Notes |
| --- | --- | --- |
| Network (client to LB) | 10-50ms | Varies by location |
| Load balancer | < 1ms | Fast routing |
| Trie lookup | < 1ms | In-memory, O(L) |
| Trending merge | < 1ms | Redis read |
| Network (LB to client) | 10-50ms | Varies |
| Total | ~50-100ms | Target: < 100ms |

Step 6: Data Collection Pipeline

Pipeline Architecture

Raw query logs feed a batch aggregation job (MapReduce-style) that produces the last seven days of query counts; a trie builder turns these into a new trie snapshot for the serving fleet, while a parallel streaming path updates trending counts in Redis.

Aggregation Query

| Step | Input | Output |
| --- | --- | --- |
| Group by query | Raw logs | Query → count |
| Filter recent | All time | Last 7 days |
| Filter spam | All queries | Clean queries |
| Normalize | Mixed case | Lowercase |
| Top N | All queries | Top 5M by count |
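
A simplified, single-machine version of this aggregation, assuming each log record carries a query string and a timestamp (the field names and the spam heuristic are illustrative):

```python
import re
import time
from collections import Counter

SEVEN_DAYS = 7 * 24 * 3600
TOP_N = 5_000_000


def looks_like_spam(query: str) -> bool:
    """Placeholder spam check: overly long queries or queries with no word characters."""
    return len(query) > 100 or not re.search(r"\w", query)


def aggregate(log_records, now=None):
    """Reduce raw search logs to the top-N (query, count) pairs used to build the trie."""
    now = now or time.time()
    counts = Counter()
    for record in log_records:                   # e.g. {"query": "Apple Stock", "ts": 1700000000}
        if now - record["ts"] > SEVEN_DAYS:      # keep only the last 7 days
            continue
        query = record["query"].strip().lower()  # normalize: trim + lowercase
        if not query or looks_like_spam(query):  # drop spam / garbage
            continue
        counts[query] += 1
    return counts.most_common(TOP_N)             # top N queries by count
```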

Trie Building Process

| Step | Action | Complexity |
| --- | --- | --- |
| 1 | Insert all queries | O(Q × L) |
| 2 | Compute top-K at each node | O(N) DFS traversal |
| 3 | Serialize to binary format | O(N) |
| 4 | Compress | ~50% size reduction |

Real-Time Updates

Trending counts are kept in a Redis sorted set per prefix and updated as queries stream in:

| Operation | Command | Purpose |
| --- | --- | --- |
| Increment score | `ZINCRBY trending:app 1 "app store"` | Count recent queries |
| Get top | `ZREVRANGE trending:app 0 9` | Top 10 trending |
| Expire old | `ZREMRANGEBYSCORE trending:app 0 {old_timestamp}` | Remove stale entries |
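
With the redis-py client, this could look roughly as follows; the sketch uses hourly bucketed keys with EXPIRE to age out stale counts, as an assumed stand-in for the ZREMRANGEBYSCORE cleanup above:

```python
import time

import redis  # redis-py client (assumption: trending counts live in Redis sorted sets)

r = redis.Redis()


def record_query(prefix: str, query: str) -> None:
    """Bump the trending count for `query` under `prefix` in the current hourly bucket."""
    bucket = int(time.time() // 3600)   # hour-sized window is an assumption
    key = f"trending:{prefix}:{bucket}"
    r.zincrby(key, 1, query)            # cf. ZINCRBY trending:app 1 "app store"
    r.expire(key, 2 * 3600)             # old buckets age out automatically


def top_trending(prefix: str, k: int = 10):
    """Top-k trending queries for `prefix` (cf. ZREVRANGE trending:app 0 9)."""
    bucket = int(time.time() // 3600)
    key = f"trending:{prefix}:{bucket}"
    return [member.decode() for member in r.zrevrange(key, 0, k - 1)]
```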

Step 7: Ranking

Suggestions are ranked by combining the base popularity score from the batch pipeline with a boost for queries that are currently trending:

| Query | Base Score | Trending Boost | Final Score |
| --- | --- | --- | --- |
| "apple stock" | 100 | +50 (trending) | 150 |
| "apple" | 500 | 0 | 500 |
| "apple event" | 20 | +200 (trending) | 220 |

Step 8: Scaling Strategies

Trie Size Estimation

| Factor | Value |
| --- | --- |
| Unique queries | 5 million |
| Avg query length | 20 characters |
| Node overhead | ~100 bytes (pointers, cached suggestions) |
| Estimated size | 500 MB - 2 GB |

Conclusion: Fits in memory on a single server.

Sharding (If Needed)

| Sharding Strategy | Pros | Cons |
| --- | --- | --- |
| By first character | Simple routing | Uneven distribution (more 's' than 'x') |
| By hash of prefix | Even distribution | Need to query multiple shards |
| Full replication | Simple, fast | Higher memory cost |

Recommendation: Full replication (trie is small enough)

Replication for Availability

| Configuration | Purpose |
| --- | --- |
| 3+ replicas per region | High availability |
| Multi-region deployment | Low latency globally |
| Read-only replicas | Horizontal scaling |

Step 9: Optimizations

Client-Side Optimizations

| Optimization | Implementation | Benefit |
| --- | --- | --- |
| Debouncing | Wait 100ms after typing stops | ~80% fewer requests |
| Prefetching | Request likely next prefix | Perceived zero latency |
| Local caching | Cache recent suggestions | Offline support |
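
Debouncing normally lives in the client, but the pattern is language-agnostic; a minimal timer-based sketch in Python, using the 100ms delay from the table above:

```python
import threading


class Debouncer:
    """Invoke `fn` only after `delay` seconds have passed with no further calls."""

    def __init__(self, fn, delay=0.1):
        self.fn = fn
        self.delay = delay
        self._timer = None

    def __call__(self, *args, **kwargs):
        if self._timer is not None:
            self._timer.cancel()  # a new keystroke cancels the pending request
        self._timer = threading.Timer(self.delay, self.fn, args, kwargs)
        self._timer.start()


# Usage: wrap the function that calls the suggestion API.
fetch_suggestions = Debouncer(lambda prefix: print("GET /suggest?q=" + prefix))
for prefix in ["a", "ap", "app"]:  # rapid keystrokes
    fetch_suggestions(prefix)      # only "app" triggers a request
```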

Server-Side Optimizations

| Optimization | Implementation | Benefit |
| --- | --- | --- |
| Bloom filter | Skip prefixes with no matches | Saves trie traversal |
| Compression | Compress trie nodes | ~50% memory reduction |
| CDN caching | Cache popular prefixes at edge | Lower latency |
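
For the Bloom filter, one option is to populate it with every prefix that has at least one suggestion, so dead-end prefixes can be rejected without a trie lookup; a minimal sketch (sizes and hash count are illustrative):

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, small chance of false positives."""

    def __init__(self, size_bits: int = 1 << 23, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Populate with every prefix of every indexed query (done once at trie-build time).
bloom = BloomFilter()
for query in ["apple", "app store"]:
    for i in range(1, len(query) + 1):
        bloom.add(query[:i])

print(bloom.might_contain("app"))  # True  -> worth doing the trie lookup
print(bloom.might_contain("xyz"))  # almost certainly False -> skip the trie entirely
```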

Filtering Inappropriate Content

| Filter Type | Implementation |
| --- | --- |
| Profanity | Blocklist lookup |
| Spam | Pattern detection |
| Personal info | Regex (SSN, phone) |
| Legal | Country-specific blocks |
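
One way these filters might be applied before a suggestion is served (the blocklist terms and regexes below are illustrative placeholders, not a complete policy):

```python
import re

PROFANITY_BLOCKLIST = {"badword1", "badword2"}               # illustrative placeholder terms
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")                # e.g. 123-45-6789
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")  # simple US-style phone pattern


def is_suggestable(query: str, blocked_in_country: frozenset = frozenset()) -> bool:
    """Return False if the suggestion should be filtered out before serving."""
    words = set(query.lower().split())
    if words & PROFANITY_BLOCKLIST:                        # profanity: blocklist lookup
        return False
    if SSN_RE.search(query) or PHONE_RE.search(query):     # personal info: regex
        return False
    if query.lower() in blocked_in_country:                # legal: country-specific blocks
        return False
    return True
```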

Step 10: Multi-Language Support

Language-Specific Tries

Each supported language (or script) gets its own trie, selected at query time from the user's language setting or the detected input script.

Special Considerations

| Language | Challenge | Solution |
| --- | --- | --- |
| Chinese | No word boundaries | Character-based trie |
| Japanese | Multiple scripts | Separate tries per script |
| Arabic | Right-to-left script | RTL-aware rendering |
| Emoji | Special characters | Unicode handling |

Step 11: Personalization

Personal Suggestions

Personalized suggestions come from layering per-user signals on top of the global top-10 results rather than replacing them.

Personalization Signals

| Signal | Weight | Source |
| --- | --- | --- |
| Recent searches | High | User's last 100 queries |
| Click history | Medium | What the user clicked before |
| Location | Medium | Local businesses, news |
| Demographics | Low | Age, interests (if known) |

Real-World Systems

| System | Scale | Notable Features |
| --- | --- | --- |
| Google | 5B searches/day | ML ranking, personalized results |
| Amazon | Product search | Category-aware suggestions |
| YouTube | Video search | Thumbnail previews |
| Spotify | Music search | Mixed artist/song/playlist results |
| Elasticsearch | Open source | Completion suggester |

Summary: Key Design Decisions

| Decision | Options | Recommendation |
| --- | --- | --- |
| Data structure | Trie, inverted index | Trie with cached top-K |
| Storage | Memory, Redis, Elasticsearch | In-memory trie |
| Updates | Batch only, real-time | Batch + streaming for trending |
| Sharding | By prefix, full replication | Full replication (trie is small) |
| Ranking | Frequency, ML | Frequency with trending boost |
| Personalization | None, full | Optional layer on top |