Design YouTube
Design a video streaming platform that lets users upload, watch, and share videos.
Related Concepts: Video Encoding/Transcoding (FFmpeg), Adaptive Bitrate Streaming (HLS/DASH), CDN Distribution, Blob Storage (S3), Asynchronous Processing (Message Queue), Chunked Upload, Thumbnail Generation, Metadata Indexing
Step 1: Requirements and Scope
Functional Requirements
- Users can upload videos
- Users can watch videos (streaming playback)
- Users can search for videos
- Users can like, comment, share videos
- Support multiple video qualities (360p, 720p, 1080p, 4K)
- Recommendations (optional)
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Latency | < 200ms start time | User experience |
| Availability | 99.99% | Core feature |
| Consistency | Eventual for metadata | Video data is immutable |
| Durability | No video loss | Content is valuable |
Scale Estimation
YouTube-scale numbers:
- 2 billion monthly active users
- 500 hours of video uploaded per minute
- 1 billion video views per day
- Average video length: 5 minutes
- Storage formats: 360p, 720p, 1080p, 4K
Upload bandwidth:
- 500 hours/minute x 60 minutes = 30,000 hours of video uploaded per hour
- At 720p (1.5 Mbps): 1.5 Mbit/s x 3,600 s = 5.4 Gbit, i.e. ~0.7 GB per video-hour
- Total: ~20 TB uploaded per hour at 720p-equivalent bitrates (raw uploads often run higher)
Storage (rough):
- Store multiple resolutions per video
- 1080p at 4 Mbps: ~1.8 GB per video-hour
- With all encoded formats (360p through 4K): ~10 GB per video-hour
- 30,000 video-hours/hour x ~10 GB = ~300 TB/hour, i.e. multiple exabytes per year
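These estimates are easy to get wrong because bitrates are quoted in megabits while storage is billed in bytes. A quick sanity check of the arithmetic above (all inputs are the stated assumptions, not measured figures):

```python
# Back-of-the-envelope check of the scale estimates above.
# Assumption: bitrates in megabits per second (Mbps), 8 bits per byte.

HOURS_UPLOADED_PER_HOUR = 500 * 60  # 500 hours of video per minute
SECONDS_PER_HOUR = 3_600

def gb_per_video_hour(mbps: float) -> float:
    """GB of storage for one hour of video at the given bitrate."""
    return mbps * SECONDS_PER_HOUR / 8 / 1_000  # Mbit -> MB -> GB

# One video-hour at 720p (1.5 Mbps) is ~0.68 GB (5.4 Gbit, not 5.4 GB).
per_hour_720p = gb_per_video_hour(1.5)

# Ingress if every upload were 720p-equivalent:
ingress_tb_per_hour = HOURS_UPLOADED_PER_HOUR * per_hour_720p / 1_000

# Storage: ~10 GB per video-hour across all encoded formats.
storage_tb_per_hour = HOURS_UPLOADED_PER_HOUR * 10 / 1_000
storage_pb_per_year = storage_tb_per_hour * 24 * 365 / 1_000

print(f"{per_hour_720p:.2f} GB per 720p video-hour")
print(f"~{ingress_tb_per_hour:.0f} TB uploaded per hour")
print(f"~{storage_tb_per_hour:.0f} TB stored per hour, ~{storage_pb_per_year:.0f} PB/year")
```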
Step 2: High-Level Architecture
Key Components:
- CDN: Delivers video content to users globally
- Upload API: Handles video uploads with resumable uploads
- Encoding Pipeline: Transcodes videos to multiple formats
- Blob Storage: Stores raw and processed video files
- Metadata Service: Handles video info, user data, etc.
Step 3: Video Upload Flow
Requirements
Users upload large files (potentially hours of 4K video). The upload must be:
- Resumable (network failures happen)
- Validated (no corrupt files)
- Processed asynchronously (encoding takes time)
Upload Flow
Resumable Uploads
Large files need chunked, resumable uploads:
| Step | Action | Purpose |
|---|---|---|
| 1 | Client requests upload | Get upload_id and signed URL |
| 2 | Client uploads in chunks | Typically 5-10 MB chunks |
| 3 | Server tracks progress | Each chunk acknowledged |
| 4 | On failure, resume from last chunk | Avoid restarting entire upload |
| 5 | Client confirms completion | Trigger processing |
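The resume logic in steps 3-4 can be sketched as a client that tracks which chunks the server has acknowledged and only re-sends the rest. This is a minimal in-memory sketch: the server side is simulated by a local dict, and a real client would POST each chunk to the upload API (or a pre-signed URL) and record the server's acknowledgment.

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, within the 5-10 MB range above

class ChunkedUpload:
    """Client-side sketch of a resumable upload session. The 'server'
    is simulated by the acked dict; a real client would send each
    chunk over HTTP and store the returned acknowledgment."""

    def __init__(self, data: bytes):
        self.data = data
        self.acked = {}  # chunk index -> checksum, as acknowledged

    def upload(self) -> int:
        """Send every chunk not yet acknowledged; return chunks sent.
        Calling again after a failure resumes from the last acked chunk."""
        sent = 0
        for offset in range(0, len(self.data), CHUNK_SIZE):
            idx = offset // CHUNK_SIZE
            if idx in self.acked:  # resume: skip already-acked chunks
                continue
            chunk = self.data[offset:offset + CHUNK_SIZE]
            self.acked[idx] = hashlib.sha256(chunk).hexdigest()
            sent += 1
        return sent
```

A second call to `upload()` after an interruption sends only the missing chunks, which is the whole point of step 4.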
Pre-signed URLs
| Approach | Pros | Cons |
|---|---|---|
| Upload through API | Simple | API servers become bottleneck |
| Pre-signed URL to S3 | Scalable, direct | More complex client |
Recommendation: Pre-signed URLs. Let clients upload directly to object storage.
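The idea behind a pre-signed URL is that the API server signs the path and an expiry time, so the storage layer can verify the upload without the bytes ever passing through the API tier. This stdlib sketch illustrates the signing scheme only; the hostname and query format are hypothetical, and in practice you would call your cloud SDK's signing method (e.g. S3's pre-signed URL support) rather than roll your own.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-signing-key"  # hypothetical; shared with the storage layer

def presign_upload_url(bucket, key, expires_in=3600, now=None):
    """Sign an upload path and expiry so storage can authorize the PUT
    without consulting the API servers."""
    expires = (now if now is not None else int(time.time())) + expires_in
    payload = f"PUT:{bucket}:{key}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": sig})
    return f"https://{bucket}.storage.example.com/{key}?{query}"
```

The storage layer recomputes the same HMAC on each request and rejects the upload if the signature does not match or the expiry has passed.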
Step 4: Video Encoding Pipeline
Purpose
Users upload videos in various formats (MP4, MOV, AVI, MKV...) and resolutions. The system must:
- Normalize to standard formats
- Create multiple quality levels
- Optimize for streaming
Encoding Pipeline
Output Formats
| Resolution | Bitrate | Use Case |
|---|---|---|
| 360p | 0.4 Mbps | Mobile data saver |
| 720p | 1.5 Mbps | Standard mobile |
| 1080p | 4 Mbps | Desktop, good connection |
| 4K | 15 Mbps | High-end devices |
Parallel Encoding
A 10-minute video takes ~10 minutes to encode sequentially.
Solution: Segment-based parallel encoding
| Approach | Time for 10 min video | Workers |
|---|---|---|
| Sequential | ~10 minutes | 1 |
| Parallel (5 workers) | ~2 minutes | 5 |
| Parallel (10 workers) | ~1 minute | 10 |
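Segment-based encoding parallelizes because each segment is independent: split, encode concurrently, reassemble in order. This sketch uses a thread pool with a stand-in encode function; real workers would shell out to ffmpeg per segment file instead.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_segment(segment: bytes, resolution: str) -> bytes:
    """Stand-in for a real encoder; production workers would invoke
    ffmpeg on each segment file."""
    return f"[{resolution}]".encode() + segment

def parallel_encode(video: bytes, segment_size: int, resolution: str,
                    workers: int = 5) -> bytes:
    """Split the video into fixed-size segments, encode them
    concurrently, and reassemble in order (pool.map preserves order)."""
    segments = [video[i:i + segment_size]
                for i in range(0, len(video), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(lambda s: encode_segment(s, resolution), segments))
    return b"".join(encoded)
```

Because segments are independent, wall-clock time scales roughly with segment count divided by worker count, which is where the table's 10x speedup comes from.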
Encoding Infrastructure
| Option | Pros | Cons |
|---|---|---|
| Self-managed EC2/GCE | Full control | Ops overhead |
| AWS Elastic Transcoder | Managed, scalable | Cost at scale |
| Custom encoding farm | Optimized for workload | Complex |
YouTube's approach: Custom encoding infrastructure (Borg) for cost efficiency at scale.
Step 5: Video Storage
Storage Tiers
Not all videos are accessed equally. Optimize storage costs:
| Tier | Storage Type | Access Pattern | Cost |
|---|---|---|---|
| Hot | SSD / Standard S3 | Frequent (popular videos) | $$$ |
| Warm | HDD / S3 IA | Occasional | $$ |
| Cold | Glacier / Archive | Rare (old videos) | $ |
Storage Organization
Videos are organized in a hierarchical folder structure. At the top level, each video has a folder identified by its video_id. Within each video folder:
- A raw subfolder contains the original uploaded file
- An encoded subfolder contains subfolders for each resolution (360p, 720p, 1080p, 4K), with each resolution folder holding numbered segment files plus the HLS/DASH manifest file
- A thumbnails subfolder contains the default thumbnail and preview frames
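In object storage this "folder structure" is just a key naming convention. The layout above can be enumerated as below; the file names and segment count are illustrative, not a fixed scheme.

```python
def video_layout(video_id: str, resolutions=("360p", "720p", "1080p"), segments=3):
    """Enumerate the object-store keys for one video under the layout
    described above (names and counts are illustrative)."""
    keys = [
        f"{video_id}/raw/original.mp4",        # original upload
        f"{video_id}/thumbnails/default.jpg",  # default thumbnail
    ]
    for res in resolutions:
        keys.append(f"{video_id}/encoded/{res}/manifest.m3u8")
        keys += [f"{video_id}/encoded/{res}/segment_{i:04d}.ts"
                 for i in range(segments)]
    return keys
```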
Content Addressing
Use content-addressed storage for deduplication:
| Approach | How It Works | Savings |
|---|---|---|
| Video-level | Hash entire video | Low (any re-encode or trim changes the hash) |
| Segment-level | Hash each segment | Medium (common intros/outros) |
| Block-level | Hash small blocks | High (requires more compute) |
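Segment-level content addressing is a put-if-absent keyed by the hash of the bytes: two videos sharing a segment (a channel intro, say) store it once. A minimal sketch:

```python
import hashlib

def store_segment(store: dict, segment: bytes) -> str:
    """Content-addressed put: the key is the hash of the bytes, so
    identical segments are stored exactly once."""
    digest = hashlib.sha256(segment).hexdigest()
    store.setdefault(digest, segment)  # no-op if already present
    return digest

store = {}
intro = b"channel-intro-segment"
video_a = [store_segment(store, s) for s in (intro, b"episode-1-body")]
video_b = [store_segment(store, s) for s in (intro, b"episode-2-body")]
# Two videos, four segments, but only three unique objects stored.
```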
Step 6: Video Streaming
Streaming Protocols
| Protocol | How It Works | Use Case |
|---|---|---|
| HLS | HTTP-based, chunks | iOS, Safari, default choice |
| DASH | HTTP-based, adaptive | Cross-platform; used by YouTube |
| RTMP | Persistent connection | Legacy, live streaming |
Adaptive Bitrate Streaming
The player automatically switches quality based on network conditions.
Manifest File
The HLS manifest (m3u8 file) lists all available quality levels with their bandwidth requirements and resolutions. It contains entries for each resolution option: 360p at approximately 400 Kbps for low-bandwidth connections, 720p at 1.5 Mbps for standard quality, and 1080p at 4 Mbps for high-definition playback. Each entry points to that resolution's segment playlist, allowing the player to switch between quality levels based on network conditions.
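The master playlist described above can be generated from the rendition table. The `#EXT-X-STREAM-INF` syntax here is real HLS (RFC 8216); the resolutions and per-rendition playlist paths are taken from this design.

```python
RENDITIONS = [  # (name, bandwidth in bits/s, resolution) from the table above
    ("360p", 400_000, "640x360"),
    ("720p", 1_500_000, "1280x720"),
    ("1080p", 4_000_000, "1920x1080"),
]

def master_playlist(renditions=RENDITIONS) -> str:
    """Render an HLS master playlist; each variant entry points at that
    rendition's segment playlist so the player can switch quality."""
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/manifest.m3u8")
    return "\n".join(lines)
```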
Step 7: Content Delivery Network (CDN)
CDN Architecture
Without CDN: User in Tokyo requests video stored in US -> 200ms+ latency
With CDN: Video cached at Tokyo edge server -> 20ms latency
CDN Caching Strategy
| Content Type | Cache Duration | Rationale |
|---|---|---|
| Popular videos | Days-weeks | Frequently accessed |
| Long-tail videos | Hours | May not be accessed again |
| Thumbnails | Weeks | Small, frequently shown |
| Manifests | Minutes | May be updated |
Multi-CDN Strategy
YouTube uses multiple CDNs:
- Google's private network (most traffic)
- ISP peering (cache inside ISP networks)
- Commercial CDNs (backup/overflow)
| Benefit | Description |
|---|---|
| Redundancy | CDN outage does not take down service |
| Cost optimization | Route to cheapest option |
| Performance | Choose fastest for each user |
Step 8: Metadata and Search
Metadata Schema
The videos table stores core video metadata:
- video_id: Unique identifier (primary key)
- user_id: Uploader's account
- title and description: User-provided content
- duration_seconds: Video length
- upload_time: When the video was uploaded
- status: Processing state (processing, ready, or failed)
- view_count and like_count: Engagement metrics
The video_formats table tracks encoded versions with a composite primary key of video_id and resolution. Each row stores the resolution (like "720p"), bitrate, and storage path for that encoded version.
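The two tables above can be expressed as DDL. This uses an in-memory sqlite3 database purely for illustration; at this scale the metadata would live in a distributed store (YouTube uses Bigtable), and column types are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE videos (
    video_id         TEXT PRIMARY KEY,
    user_id          TEXT NOT NULL,
    title            TEXT,
    description      TEXT,
    duration_seconds INTEGER,
    upload_time      TEXT,
    status           TEXT CHECK (status IN ('processing', 'ready', 'failed')),
    view_count       INTEGER DEFAULT 0,
    like_count       INTEGER DEFAULT 0
);
CREATE TABLE video_formats (
    video_id   TEXT REFERENCES videos(video_id),
    resolution TEXT,     -- e.g. '720p'
    bitrate    INTEGER,  -- bits per second
    path       TEXT,     -- object-store location of the encoded files
    PRIMARY KEY (video_id, resolution)
);
""")
```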
View Count Problem
Naive approach: UPDATE videos SET view_count = view_count + 1
At YouTube scale (1B views/day), this creates massive database contention.
Solution: batch counting. Buffer view events (e.g. in Redis), then flush aggregated deltas to the database periodically, trading slightly stale displayed counts for far fewer writes.
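A minimal sketch of the batching idea, with an in-memory buffer standing in for Redis and a Counter standing in for the videos table; production systems would also flush on a timer, not just on a size threshold.

```python
from collections import Counter

class ViewCounter:
    """Buffer view events and flush aggregated deltas in one batch:
    one UPDATE per video with the summed delta, not one per view."""

    def __init__(self, flush_threshold: int = 1000):
        self.buffer = Counter()       # video_id -> unflushed views
        self.pending = 0
        self.flush_threshold = flush_threshold
        self.db = Counter()           # stand-in for the videos table

    def record_view(self, video_id: str) -> None:
        self.buffer[video_id] += 1
        self.pending += 1
        if self.pending >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        for video_id, delta in self.buffer.items():
            self.db[video_id] += delta  # UPDATE ... SET view_count = view_count + delta
        self.buffer.clear()
        self.pending = 0
```

With a threshold of 1,000, a billion views per day becomes roughly a million batched updates, each touching a row once with a summed delta.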
Search Implementation
| Component | Technology | Purpose |
|---|---|---|
| Primary search | Elasticsearch | Full-text search on titles, descriptions |
| Autocomplete | Trie/Prefix tree | Instant suggestions |
| Trending | Redis | Fast access to popular searches |
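The autocomplete row above relies on a prefix tree: walk to the node for the typed prefix, then enumerate completions beneath it. A minimal sketch with no ranking; a real system would weight suggestions by query popularity rather than return them alphabetically.

```python
class Trie:
    """Minimal prefix tree for search autocomplete."""

    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def suggest(self, prefix: str, limit: int = 5):
        """Return up to `limit` words starting with prefix, alphabetically."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out = []

        def walk(n, acc):
            if len(out) >= limit:
                return
            if "$" in n:
                out.append(prefix + acc)
            for ch, child in sorted(n.items()):
                if ch != "$":
                    walk(child, acc + ch)

        walk(node, "")
        return out
```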
Step 9: Handling Failures
Upload Failures
| Failure | Detection | Recovery |
|---|---|---|
| Network timeout | Client timeout | Resume from last chunk |
| Corrupt chunk | Checksum mismatch | Retry chunk upload |
| Storage failure | Write error | Retry to different region |
Encoding Failures
| Failure | Detection | Recovery |
|---|---|---|
| Worker crash | Heartbeat timeout | Re-queue segment |
| Corrupt output | Validation check | Re-encode |
| Resource exhaustion | OOM error | Smaller segments |
Playback Failures
| Failure | Detection | Recovery |
|---|---|---|
| CDN cache miss | 404 response | Fetch from origin |
| Quality unavailable | Manifest lookup | Fall back to lower quality |
| Network degradation | Buffering events | Switch to lower bitrate |
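The "switch to lower bitrate" recovery is the core of adaptive bitrate selection: pick the highest rendition whose bitrate fits within a fraction of measured throughput (the headroom factor is an assumption; players tune it), and fall back to the lowest rendition when nothing fits.

```python
# Bitrates from the output-format table, in bits per second.
LADDER = [
    ("360p", 400_000),
    ("720p", 1_500_000),
    ("1080p", 4_000_000),
    ("4K", 15_000_000),
]

def pick_rendition(measured_bps: float, headroom: float = 0.8) -> str:
    """Highest rendition whose bitrate fits within headroom * throughput;
    lowest rendition if even that does not fit."""
    budget = measured_bps * headroom
    best = LADDER[0][0]  # fallback: lowest quality
    for name, bitrate in LADDER:
        if bitrate <= budget:
            best = name
    return best
```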
Step 10: Cost Optimization
Video platforms are expensive. Key cost drivers:
| Cost Area | At YouTube Scale | Optimization |
|---|---|---|
| Storage | 700+ PB | Tiered storage, dedup |
| CDN/Bandwidth | Massive | Private network, peering |
| Encoding | 500 hrs/min uploaded | Efficient codecs, parallelization |
| Compute | Transcoding, ML | Spot instances, efficient scheduling |
Codec Evolution
| Codec | Bitrate Savings | Adoption |
|---|---|---|
| H.264 | Baseline | Universal |
| H.265/HEVC | 25-50% vs H.264 | Growing |
| VP9 | 30-50% vs H.264 | YouTube default |
| AV1 | 30% vs VP9 | New standard |
YouTube aggressively pushes VP9/AV1 to reduce bandwidth costs.
Real-World Systems
| Company | Notable Design Choice |
|---|---|
| YouTube | VP9/AV1 codecs, private CDN (Google's network), Bigtable for metadata |
| Netflix | Per-title encoding (each video gets optimal settings), Open Connect CDN |
| Twitch | Optimized for live (lower latency transcoding), HLS |
| TikTok | Short-form optimized, aggressive caching, quick startup |
Summary: Key Design Decisions
| Decision | Options | Recommendation |
|---|---|---|
| Upload method | Through API, Direct to storage | Pre-signed URLs to S3 |
| Encoding | Sequential, Parallel | Parallel segment encoding |
| Streaming protocol | HLS, DASH, RTMP | HLS/DASH with adaptive bitrate |
| Storage | Single tier, Tiered | Tiered (hot/warm/cold) |
| CDN | Single CDN, Multi-CDN | Multi-CDN with private network |
| View counting | Real-time, Batched | Batched with Redis buffer |