Design a Fraud Detection System
Design a machine learning system to detect fraudulent credit card transactions in real time.
Requirements
Functional:
- Score transactions in real time
- Decide: approve, decline, or challenge
- Support manual review workflow
- Learn from feedback
Non-functional:
- Latency < 100ms per transaction
- Process 10,000+ transactions per second
- 99.9% availability
- High precision (minimize false declines)
Metrics
Model Metrics
| Metric | Description | Target |
|---|---|---|
| Precision | Fraud caught / Flagged as fraud | > 90% |
| Recall | Fraud caught / Total fraud | > 70% |
| AUC-ROC | Overall discrimination | > 0.95 |
| False Positive Rate | Legitimate transactions flagged / Total legitimate transactions | < 0.5% |
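
Precision, recall and false positive rate are linked through the fraud base rate, which the targets table does not state. The sketch below derives the precision implied by a given recall and FPR, and the largest FPR compatible with a precision target. The 0.5% base rate in the example is a hypothetical figure, chosen to sit under the "under 1%" mentioned in the training section.

```python
def precision_from_rates(base_rate: float, recall: float, fpr: float) -> float:
    """Precision implied by the fraud base rate, recall (TPR) and false positive rate."""
    true_pos = base_rate * recall          # share of all transactions: fraud correctly flagged
    false_pos = (1.0 - base_rate) * fpr    # share of all transactions: legitimate but flagged
    return true_pos / (true_pos + false_pos)


def max_fpr_for_precision(base_rate: float, recall: float, precision: float) -> float:
    """Largest false positive rate that still meets a precision target at a given recall."""
    true_pos = base_rate * recall
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    false_pos = true_pos * (1.0 - precision) / precision
    return false_pos / (1.0 - base_rate)


# Hypothetical base rate of 0.5% fraud; recall, FPR and precision targets from the table.
print(f"{precision_from_rates(0.005, 0.70, 0.005):.1%}")   # precision at FPR = 0.5%
print(f"{max_fpr_for_precision(0.005, 0.70, 0.90):.3%}")   # FPR needed for 90% precision
```

Under that assumed base rate, 70% recall at a 0.5% FPR implies a precision of only about 41%, and reaching 90% precision at the same recall requires an FPR of roughly 0.04%. The precision and FPR targets therefore have to be set together, for the base rate actually observed.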
Business Metrics
| Metric | Description |
|---|---|
| Fraud Loss Rate | Fraud $ / Total $ |
| False Decline Rate | Good $ declined / Total good $ |
| Operational Cost | Cost of manual review |
| Customer Friction | Challenges sent to customers |
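
The business metrics can be computed from decided transactions once their labels have matured. This is a minimal sketch with a hypothetical `Outcome` record. It assumes "Fraud $" means fraudulent dollars that were approved (the actual loss) and that operational cost is a flat, hypothetical per-case review cost.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    amount: float
    is_fraud: bool          # ground-truth label once feedback has matured
    declined: bool
    challenged: bool = False
    reviewed: bool = False  # sent to the manual review queue


def business_metrics(outcomes: list[Outcome], review_cost_per_case: float) -> dict:
    total_amount = sum(o.amount for o in outcomes)
    # Assumption: "Fraud $" = fraudulent dollars that were approved (actual losses)
    fraud_lost = sum(o.amount for o in outcomes if o.is_fraud and not o.declined)
    good_amount = sum(o.amount for o in outcomes if not o.is_fraud)
    good_declined = sum(o.amount for o in outcomes if not o.is_fraud and o.declined)
    return {
        "fraud_loss_rate": fraud_lost / total_amount if total_amount else 0.0,
        "false_decline_rate": good_declined / good_amount if good_amount else 0.0,
        # Hypothetical flat cost per manually reviewed case
        "operational_cost": review_cost_per_case * sum(o.reviewed for o in outcomes),
        "customer_friction": sum(o.challenged for o in outcomes),
    }
```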
Architecture
Feature Engineering
Transaction Features
| Feature | Type | Description |
|---|---|---|
| amount | Numerical | Transaction amount |
| amount_zscore | Numerical | Standard deviations between this amount and the user's average |
| merchant_category | Categorical | MCC code |
| transaction_type | Categorical | Online, in-store, ATM |
| card_present | Binary | Physical card used |
| cvv_provided | Binary | CVV entered |
| international | Binary | Cross-border transaction |
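
One way to compute `amount_zscore`, assuming the user's past amounts are available as a list. The `min_history` cutoff and the $1 floor on the standard deviation are hypothetical choices that avoid meaningless or infinite scores for new or very regular users.

```python
import statistics
from typing import Sequence


def amount_zscore(amount: float, history: Sequence[float],
                  min_history: int = 5, min_std: float = 1.0) -> float:
    """How many standard deviations `amount` lies from the user's historical mean."""
    if len(history) < min_history:
        return 0.0  # assumption: too little history, treat the amount as typical
    mean = statistics.fmean(history)
    # Floor the std so a user with identical past amounts does not produce inf
    std = max(statistics.pstdev(history), min_std)
    return (amount - mean) / std
```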
User Profile Features
| Feature | Type | Description |
|---|---|---|
| avg_transaction_amount | Numerical | Historical average |
| transaction_frequency | Numerical | Transactions per week |
| typical_merchants | Embedding | Usual merchant types |
| typical_locations | Embedding | Usual transaction locations |
| account_age | Numerical | Days since account opened |
| historical_fraud | Binary | Previous fraud on account |
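
Recomputing an average over a user's full history on every transaction would strain the latency budget. A common alternative, shown here as a sketch, is to keep per-user running statistics with Welford's online algorithm. `AmountProfile` is a hypothetical name; its `mean` and `std` could back both `avg_transaction_amount` and `amount_zscore`.

```python
import math
from dataclasses import dataclass


@dataclass
class AmountProfile:
    """Running mean/std of a user's transaction amounts (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)  # uses the updated mean

    @property
    def std(self) -> float:
        # Population std, matching statistics.pstdev
        return math.sqrt(self.m2 / self.count) if self.count else 0.0
```

Each update is O(1) and needs only three numbers per user, so the profile can live in the same low-latency store as the other per-user features.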
Behavioral Features
| Feature | Type | Description |
|---|---|---|
| time_since_last_txn | Numerical | Minutes since last transaction |
| velocity_1h | Numerical | Transactions in last hour |
| velocity_24h | Numerical | Transactions in last 24 hours |
| amount_velocity_1h | Numerical | Amount spent in last hour |
| new_merchant | Binary | First time at this merchant |
| new_location | Binary | First time in this location |
| distance_from_last | Numerical | Miles from last transaction |
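
The velocity and novelty features need per-card state. This sketch keeps a 24-hour sliding window in a deque and computes the features before recording the current transaction, so counts exclude it. It assumes transactions arrive in timestamp order; the 0.1-degree grid used to define a "location" and all class names are hypothetical. Distance uses the haversine great-circle formula.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    ts: float          # epoch seconds
    amount: float
    merchant_id: str
    lat: float
    lon: float


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles."""
    earth_radius_miles = 3958.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def location_cell(t: Txn) -> tuple[float, float]:
    # Assumption: a "location" is a ~0.1 degree lat/lon grid cell
    return (round(t.lat, 1), round(t.lon, 1))


@dataclass
class CardBehavior:
    """Per-card state; assumes transactions arrive in timestamp order."""
    window: deque = field(default_factory=deque)  # transactions in the last 24h
    merchants: set = field(default_factory=set)
    cells: set = field(default_factory=set)
    last: Optional[Txn] = None

    def features(self, txn: Txn) -> dict:
        # Evict transactions that have fallen out of the 24-hour window
        while self.window and txn.ts - self.window[0].ts > 24 * 3600:
            self.window.popleft()
        last_hour = [t for t in self.window if txn.ts - t.ts <= 3600]
        prev = self.last
        feats = {
            "time_since_last_txn": (txn.ts - prev.ts) / 60 if prev is not None else None,
            "velocity_1h": len(last_hour),
            "velocity_24h": len(self.window),
            "amount_velocity_1h": sum(t.amount for t in last_hour),
            "new_merchant": txn.merchant_id not in self.merchants,
            "new_location": location_cell(txn) not in self.cells,
            "distance_from_last": (haversine_miles(prev.lat, prev.lon, txn.lat, txn.lon)
                                   if prev is not None else None),
        }
        # Record the current transaction only after computing its features
        self.window.append(txn)
        self.merchants.add(txn.merchant_id)
        self.cells.add(location_cell(txn))
        self.last = txn
        return feats
```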
Contextual Features
| Feature | Type | Description |
|---|---|---|
| hour_of_day | Categorical | Transaction hour |
| day_of_week | Categorical | Transaction day |
| is_weekend | Binary | Weekend transaction |
| device_fingerprint | Hash | Device identifier |
| ip_risk_score | Numerical | IP reputation |
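
The time features depend on whose clock is used. This sketch assumes the cardholder's local UTC offset is known and derives `hour_of_day`, `day_of_week` and `is_weekend` from it. The device fingerprint is shown as a truncated SHA-256 hash over sorted device attributes, where the attribute names are hypothetical. `ip_risk_score` would come from an external reputation service and is not shown.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def contextual_features(ts_utc: datetime, utc_offset_minutes: int,
                        device_attrs: dict) -> dict:
    # Assumption: time features use the cardholder's local time, given as a UTC offset
    local = ts_utc.astimezone(timezone(timedelta(minutes=utc_offset_minutes)))
    # Sort keys so the same device attributes always hash to the same identifier
    canonical = "|".join(f"{k}={device_attrs[k]}" for k in sorted(device_attrs))
    return {
        "hour_of_day": local.hour,
        "day_of_week": local.weekday(),        # Monday = 0 ... Sunday = 6
        "is_weekend": local.weekday() >= 5,
        "device_fingerprint": hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16],
    }
```

For example, 02:30 UTC on Saturday 1 June 2024 is 21:30 on Friday at UTC-5, so the same transaction is a weekday-evening purchase for that cardholder.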
Model Architecture
Ensemble Approach
Training with Imbalanced Data
Fraud is rare (under 1% of transactions), so the training data is heavily imbalanced. Common techniques:
| Technique | Description | Pros | Cons |
|---|---|---|---|
| Class Weights | Penalize misclassifying fraud more heavily (e.g., 100:1) | Simple, no data change | Can overfit the few fraud examples; raw scores are no longer calibrated probabilities |
| SMOTE | Generate synthetic fraud samples by interpolation | Balances training data | May create unrealistic samples |
| Undersampling | Reduce legitimate samples to match fraud count | Fast training | Loses legitimate patterns |
| Balanced Bagging | Train multiple models on balanced subsets | Robust, preserves all data | More complex |
Recommendation: Start with class weights, add SMOTE if needed.
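
The first two techniques are short to express. The class-weight formula below is the common "balanced" heuristic, n_samples / (n_classes * count_c), which at a 1% fraud rate gives roughly the 100:1 ratio in the table. The oversampler is a minimal SMOTE-style sketch (interpolating between a fraud sample and one of its k nearest fraud neighbours), not a substitute for a library implementation; function names are hypothetical.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels: list) -> dict:
    """weight_c = n_samples / (n_classes * count_c), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def smote_samples(minority: list, n_new: int, k: int = 5, seed: int = 0) -> list:
    """Minimal SMOTE-style oversampling of minority (fraud) feature vectors.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Synthetic samples should be generated from the training split only, so that validation metrics are measured on real transactions.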
Serving
Low-Latency Architecture
Decision Engine
Decisions combine the fraud score with the transaction amount; higher-value transactions get stricter thresholds:
| Score Range | Low Value (< $500) | High Value ($500 to $999) | Very High Value ($1,000+) |
|---|---|---|---|
| 0.9 to 1.0 | Decline | Decline | Decline |
| 0.7 to < 0.9 | Decline | Decline | Decline |
| 0.5 to < 0.7 | Approve | Challenge (3D Secure) | Challenge |
| 0.3 to < 0.5 | Approve | Approve | Challenge |
| 0.0 to < 0.3 | Approve | Approve | Approve |
Hard Rules (override ML):
- Amount exceeds credit limit -> Decline
- Card in blacklist -> Decline
- Merchant in blacklist -> Decline
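
The decision table and hard rules combine into a single function. In this sketch the hard rules are evaluated first, score bands are half-open (a score of exactly 0.5 falls in the 0.5 to 0.7 band), and `credit_limit`, `card_blacklisted` and `merchant_blacklisted` are hypothetical inputs standing in for the real lookups.

```python
def decide(score: float, amount: float, credit_limit: float,
           card_blacklisted: bool = False, merchant_blacklisted: bool = False) -> str:
    """Return 'approve', 'challenge' or 'decline' for one transaction."""
    # Hard rules are checked first and override the model score
    if amount > credit_limit or card_blacklisted or merchant_blacklisted:
        return "decline"
    if score >= 0.7:
        return "decline"
    if score >= 0.5:
        # Low value approves; $500+ gets a challenge (e.g. 3D Secure)
        return "approve" if amount < 500 else "challenge"
    if score >= 0.3:
        # Only very high value ($1,000+) is challenged in this band
        return "challenge" if amount >= 1000 else "approve"
    return "approve"
```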
Feedback Loop
Labeling Strategy
| Outcome | Label | Source | Timing | Confidence |
|---|---|---|---|---|
| Confirmed fraud | Chargeback fraud | Customer dispute | 30-90 days | High |
| Confirmed fraud | Bank confirmed | Issuing bank | 1-7 days | Very High |
| Confirmed fraud | User reported | User notification | Hours | High |
| Confirmed legitimate | No dispute within 90 days | Time passed | 90 days | High |
| Confirmed legitimate | User confirmed | User verification | Hours | Very High |
| Confirmed legitimate | Merchant verified | Merchant callback | 1-7 days | High |
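
Labels arrive from several sources at different times, so each transaction's training label has to be resolved. A minimal sketch with hypothetical signal names: the most confident signal wins, equally confident signals that disagree are routed to manual review (an assumption, not stated in the table), and a transaction with no signals becomes legitimate only after the 90-day window.

```python
from datetime import datetime, timedelta
from typing import Optional

# (label, confidence rank) per feedback signal: High = 1, Very High = 2.
# Signal names are hypothetical identifiers for the rows of the labeling table.
SIGNALS = {
    "chargeback": ("fraud", 1),
    "bank_confirmed": ("fraud", 2),
    "user_reported": ("fraud", 1),
    "user_confirmed": ("legitimate", 2),
    "merchant_verified": ("legitimate", 1),
}
MATURITY = timedelta(days=90)


def resolve_label(txn_time: datetime, now: datetime, signals: list[str]) -> Optional[str]:
    if signals:
        top_rank = max(SIGNALS[s][1] for s in signals)
        top_labels = {SIGNALS[s][0] for s in signals if SIGNALS[s][1] == top_rank}
        # Equally confident signals that disagree go to manual review
        return top_labels.pop() if len(top_labels) == 1 else "needs_review"
    if now - txn_time >= MATURITY:
        return "legitimate"   # no dispute within 90 days
    return None               # not yet mature: keep out of the training set
```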
Continuous Learning
Monitoring
Real-time Dashboards
| Metric | Purpose | Target |
|---|---|---|
| Fraud score distribution | Model calibration | Stable relative to training baseline |
| Decision distribution | Approve/decline/challenge rates | Decline < 5% |
| Latency percentiles | Service health | p99 < 50ms |
| False positive rate estimate | Customer experience | < 0.5% |
| Feature drift | Model staleness | Within 2 std deviations of training baseline |
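
A literal reading of the feature-drift target compares each feature's live mean with its training baseline, in units of the baseline standard deviation. This sketch assumes raw baseline and live samples are available per feature; `drifted_features` is a hypothetical name.

```python
import statistics


def drifted_features(baseline: dict, live: dict, max_std: float = 2.0) -> list:
    """Features whose live mean is more than `max_std` baseline std devs from the baseline mean."""
    flagged = []
    for name, base_values in baseline.items():
        mu = statistics.fmean(base_values)
        sigma = statistics.pstdev(base_values)
        shift = abs(statistics.fmean(live[name]) - mu)
        if sigma == 0:
            # A constant baseline feature drifts as soon as its mean moves at all
            drifted = shift > 0
        else:
            drifted = shift / sigma > max_std
        if drifted:
            flagged.append(name)
    return sorted(flagged)
```

A mean check only catches shifts in location; changes in spread or shape need a distribution-level measure.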
Alerts
| Alert | Condition | Severity |
|---|---|---|
| High decline rate | Decline > 5% | High |
| Model latency | p99 > 100ms | Critical |
| Score drift | Distribution shift | Medium |
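
The alert conditions can be evaluated on a recent window of traffic. This sketch uses the nearest-rank p99 for latency and, for score drift, the Population Stability Index (PSI). PSI and its 0.2 threshold are a commonly used heuristic swapped in here, since the table only says "distribution shift"; function names are hypothetical.

```python
import math


def p99(latencies_ms: list) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * 99 / 100)
    return ordered[rank - 1]


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two score samples, equal-width bins on [0, 1]."""
    def shares(scores):
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        # eps keeps log() finite when a bin is empty
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def evaluate_alerts(decisions: list, latencies_ms: list, baseline_scores: list,
                    live_scores: list, psi_threshold: float = 0.2) -> list:
    alerts = []
    if decisions.count("decline") / len(decisions) > 0.05:
        alerts.append(("High decline rate", "High"))
    if p99(latencies_ms) > 100:
        alerts.append(("Model latency", "Critical"))
    if psi(baseline_scores, live_scores) > psi_threshold:
        alerts.append(("Score drift", "Medium"))
    return alerts
```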
Reference
| Topic | Description |
|---|---|
| Precision vs recall trade-off | Catching more fraud increases false declines. Balance based on business cost. |
| Latency vs accuracy | Faster decisions require fewer features. Trade-off based on transaction value. |
| Automation vs review | Full automation scales but misses edge cases. Reserve review for uncertain decisions. |
| Generic vs specific models | One model is simpler. Specialized models (card-present vs online) may improve accuracy. |
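
The precision vs recall trade-off can be made explicit by choosing the decline threshold that minimizes expected cost on labelled data. This sketch assumes a missed fraud costs the full transaction amount and a false decline costs a flat, hypothetical `false_decline_cost`; a production cost model might also include chargeback fees and lost customer value.

```python
def best_threshold(scores: list, labels: list, amounts: list,
                   false_decline_cost: float) -> float:
    """Decline threshold that minimizes total cost on a labelled validation set."""
    def cost(threshold: float) -> float:
        total = 0.0
        for score, is_fraud, amount in zip(scores, labels, amounts):
            if is_fraud and score < threshold:
                total += amount               # missed fraud: assume the full amount is lost
            elif not is_fraud and score >= threshold:
                total += false_decline_cost   # false decline: hypothetical flat cost
        return total

    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=cost)  # ties resolve to the lowest threshold
```

Raising `false_decline_cost` pushes the chosen threshold up (fewer declines, more missed fraud), which is the business-cost balance described in the table.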