Skip to main content

Design a Unique ID Generator

Related Concepts: Snowflake ID Architecture, Clock Synchronization (NTP), Bit Manipulation, Distributed Coordination, Database Auto-Increment Limitations, Time-Based Ordering, Worker ID Assignment

Design a system that generates unique identifiers across a distributed system without requiring coordination between servers.

Step 1: Requirements and Scope

Functional Requirements

  • Generate globally unique IDs
  • IDs should be sortable by time (newer IDs are larger)
  • IDs should be numeric (64 bits)
  • No coordination required between ID generators

Non-Functional Requirements

RequirementTargetRationale
Uniqueness100% guaranteedDuplicate IDs cause data corruption
Latency< 1msID generation on critical path
Throughput10K+ IDs/sec/serverHigh-volume systems
Availability99.999%Core infrastructure
OrderingTime-sortableEfficient indexing, debugging

Scale Estimation

  • 100,000 IDs per second system-wide
  • 1000 servers generating IDs
  • 100 IDs per second per server average
  • Burst: 10,000 IDs per second per server

Step 2: Why Common Solutions Fall Short

Auto-Increment Database

Loading diagram...
ProblemImpact
Single point of failureDatabase down = no IDs
Network latency1-10ms per ID
ScalingSharding auto-increment is complex

UUID (128-bit)

ProblemImpact
128 bitsTwice the storage needed
Not time-sortableRandom ordering hurts indexes
Not numericContains letters (a-f)
Poor index localityRandom distribution = random I/O

Ticket Server (Batches)

Loading diagram...
AdvantagesDisadvantages
Simple conceptTicket server is SPOF
No clock dependencyNeed multiple ticket servers
Sequential within rangeRange exhaustion handling

Step 3: The Snowflake Approach

Twitter's Snowflake generates 64-bit IDs with embedded structure.

Bit Layout

Loading diagram...

Component Details

ComponentBitsRangePurpose
Sign10Keeps ID positive
Timestamp4169 yearsTime since custom epoch
Machine ID100-1023Unique server identifier
Sequence120-4095Counter per millisecond

Capacity Analysis

MetricCalculationResult
Time range2^41 milliseconds~69 years
Max machines2^101,024 servers
IDs per ms per machine2^124,096 IDs
IDs per second per machine4,096 x 1,0004 million
Total system capacity1,024 x 4M4 billion/sec

ID Generation Flow

Loading diagram...

Step 4: Alternative Approaches

Multi-Leader Auto-Increment

Loading diagram...
AdvantagesDisadvantages
Uses existing DB infrastructureAdding servers requires reconfiguration
Simple conceptNot time-sortable
No external dependenciesGaps when servers fail

ULID (Universally Unique Lexicographically Sortable ID)

Loading diagram...
AdvantagesDisadvantages
No machine ID needed128 bits (not 64)
Time-sortableAlphanumeric output
No coordinationTheoretical collision risk

Comparison Table

ApproachBitsSortableCoordinationThroughputClock Dependent
Auto-increment64YesPer insertLowNo
UUID v4128NoNoneHighNo
UUID v7128YesNoneHighYes
Snowflake64YesSetup onlyVery HighYes
Ticket server64Within batchPer batchHighNo
ULID128YesNoneHighYes

Step 5: Clock Synchronization

The Problem

Loading diagram...

Solutions

ProblemSolutionTradeoff
Clock backwardReject / wait until caught upBrief unavailability
Clock driftMonitor NTP syncOperational overhead
Clock forwardCap maximum timestampMay pause generation

Handling Backward Clock

StrategyBehaviorWhen to Use
RejectThrow error, fail fastStrict consistency needed
WaitBlock until clock catches upSmall jumps (< 1s)
Logical incrementUse last_timestamp + 1Availability priority

Step 6: Machine ID Assignment

Options

Loading diagram...
MethodAdvantagesDisadvantagesUse Case
Config fileSimple, explicitManual managementStatic clusters
ZookeeperDynamic, handles failuresExtra dependencyLarge clusters
IP-basedNo coordinationCollision riskStateless containers
Cloud metadataCloud-nativeVendor-specificCloud deployments

Zookeeper Assignment Flow

Loading diagram...

Step 7: System Architecture

Loading diagram...

Deployment Options

OptionArchitectureUse Case
LibraryEmbedded in each serviceSimple, low latency
ServiceDedicated ID serviceCentralized management
SidecarContainer per app instanceKubernetes environments

Step 8: Customizing Bit Allocation

Alternative Layouts

Use CaseTimestampDatacenterMachineSequence
Standard41 bits-10 bits12 bits
Multi-DC41 bits5 bits5 bits12 bits
High volume39 bits-10 bits14 bits
Long lifespan43 bits-8 bits12 bits

Tradeoffs

More bits for...GainCost
TimestampLonger lifespanFewer machines or sequence
Machine IDMore serversShorter lifespan or sequence
SequenceHigher throughputFewer machines or shorter lifespan
DatacenterMulti-regionFewer machines

Production Examples

SystemID FormatNotable Features
Twitter Snowflake64-bitOriginal implementation, open-sourced
Instagram64-bitSharded PostgreSQL-based
Discord64-bit SnowflakeSame as Twitter, epoch 2015
MongoDB ObjectId96-bitTimestamp + machine + PID + counter
Cassandra TimeUUID128-bitUUID v1 variant
StripePrefixed IDsch_1234... for readability

Summary: Key Design Decisions

DecisionOptionsRecommendation
ID formatUUID, Snowflake, ULIDSnowflake for 64-bit sortable
Bit allocationStandard vs customStandard unless specific needs
Clock handlingReject vs wait vs incrementWait for small jumps, reject large
Machine IDConfig, Zookeeper, IP-basedZookeeper for dynamic scaling
DeploymentLibrary vs serviceLibrary for simplicity
EpochUnix epoch vs customCustom to maximize timestamp range