Design a Fraud Detection System
Concepts tested: Class imbalance, anomaly detection, rule-based vs ML hybrid, velocity features, real-time scoring, precision-recall trade-offs, cost-sensitive learning, model explainability
Problem Statement
Design a fraud detection system for payments, account takeover, or fake account prevention.
Clarification Questions
| Question | Design Impact |
|---|---|
| Fraud type | Feature engineering approach |
| Available actions | Block, allow, review, step-up authentication |
| Latency requirement | Real-time vs batch architecture |
| Cost structure | Loss from fraud vs loss from false declines |
| Available data | Transaction history, device, behavioral, network |
Problem Characteristics
Class Imbalance
Fraud rates are typically low:
- Credit card fraud: 0.1% of transactions
- Account takeover: 0.01% of logins
- Fake accounts: 1-5% of signups
Implication: Standard accuracy metrics are misleading. A model that always predicts "not fraud" achieves 99.9% accuracy on credit card transactions while detecting zero fraud.
Adversarial Environment
Fraudsters actively adapt to detection methods:
- Study detection patterns and develop workarounds
- Share techniques in fraud communities
- Use automation (bots, scripts)
- Continuously evolve tactics
Error Costs
| Decision | Error Impact |
|---|---|
| Block legitimate user | Lost revenue, negative experience, churn |
| Allow fraudster | Direct financial loss |
| Manual review backlog | Operational cost, delayed decisions |
System Architecture
Real-Time Scoring Pipeline:
- Feature Enrichment: Add derived features to the transaction
- Rules Engine: Apply deterministic rules
- ML Models: Run supervised, anomaly detection, and graph-based models
- Output: Combined score with explanation
Decision Engine (thresholds sketched after this list):
- Score below 20: ALLOW the transaction
- Score 20-70: Send to REVIEW queue for manual inspection
- Score 70 or above: BLOCK the transaction
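A minimal sketch of this decision logic in Python; the cutoffs are the illustrative values above and in practice would be tuned against business costs (see Threshold Optimization):

```python
def decide(score: float) -> str:
    """Map a combined 0-100 fraud score to an action (illustrative thresholds)."""
    if score < 20:
        return "ALLOW"   # low risk: approve immediately
    if score < 70:
        return "REVIEW"  # ambiguous: queue for manual inspection
    return "BLOCK"       # high risk: decline the transaction
```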
Async Pipeline (batch analysis):
- Pattern Mining: Discover new fraud tactics from historical data
- Network Analysis: Identify fraud rings through connection analysis
- Label Feedback: Update models with confirmed fraud/non-fraud labels
Feature Engineering
Transaction Features
| Feature Category | Examples |
|---|---|
| Transaction details | Amount, merchant category, payment method |
| Amount patterns | Average transaction size, deviation from typical |
| Velocity | Transactions in last 1h/24h/7d |
| Geographic | Distance from typical location, high-risk countries |
| Temporal | Time of day, day of week, holiday |
User Behavior Features
| Feature | Signal |
|---|---|
| Device fingerprint | New vs known device |
| Login patterns | Unusual login times, locations |
| Session behavior | Click patterns, navigation speed |
| Account age | New accounts are higher risk |
| Historical fraud | Past fraud/disputes on account |
Aggregated Features
Velocity features are computed by aggregating a user's transaction history over sliding time windows (a sketch follows the table):
| Feature | Description |
|---|---|
| Transaction count (1h/24h) | Number of transactions in the last 1 hour or 24 hours |
| Transaction amount (1h) | Total transaction amount in the last hour |
| Distinct merchants (24h) | Number of unique merchants in last 24 hours |
| Distinct devices (7d) | Number of unique devices used in last 7 days |
| Average transaction amount (30d) | Historical average transaction size |
| Amount vs average ratio | Current transaction amount divided by 30-day average |
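A minimal pandas sketch of these aggregations, assuming a per-user history DataFrame with hypothetical columns timestamp, amount, merchant_id, and device_id, sorted by time with the current transaction as the last row. A production system would maintain these counters incrementally in a streaming store rather than rescanning history on every request:

```python
import pandas as pd

def velocity_features(history: pd.DataFrame, now: pd.Timestamp) -> dict:
    """Compute the windowed aggregates above for one user."""
    last_1h = history[history["timestamp"] > now - pd.Timedelta(hours=1)]
    last_24h = history[history["timestamp"] > now - pd.Timedelta(hours=24)]
    last_7d = history[history["timestamp"] > now - pd.Timedelta(days=7)]
    last_30d = history[history["timestamp"] > now - pd.Timedelta(days=30)]
    avg_30d = last_30d["amount"].mean()
    current = history["amount"].iloc[-1]  # transaction being scored
    return {
        "txn_count_1h": len(last_1h),
        "txn_amount_1h": last_1h["amount"].sum(),
        "distinct_merchants_24h": last_24h["merchant_id"].nunique(),
        "distinct_devices_7d": last_7d["device_id"].nunique(),
        "avg_amount_30d": avg_30d,
        # guard against an empty 30-day window (mean is NaN)
        "amount_vs_avg_ratio": current / avg_30d if avg_30d > 0 else 1.0,
    }
```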
Network/Graph Features
Fraudsters often operate in rings, reusing devices, addresses, and payment instruments across accounts (a sketch of these counts follows the table).
| Feature | Description |
|---|---|
| Shared device count | Number of users sharing same device |
| Shared address count | Number of accounts at same address |
| Shared payment method | Number of accounts using same card |
| Network cluster risk | Aggregate risk of connected entities |
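A sketch of the shared-entity counts in pandas, assuming a hypothetical accounts table with columns account_id, device_id, address, and card_hash:

```python
import pandas as pd

def shared_entity_counts(accounts: pd.DataFrame) -> pd.DataFrame:
    """Count, for each account, how many accounts share its device,
    address, and payment method; high counts hint at fraud rings."""
    out = accounts.copy()
    for col, feature in [("device_id", "shared_device_count"),
                         ("address", "shared_address_count"),
                         ("card_hash", "shared_payment_count")]:
        out[feature] = out.groupby(col)[col].transform("size")
    return out
```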
Model Approaches
1. Rule-Based System
A rule-based fraud detection system evaluates transactions against predefined conditions:
| Rule | Points | Trigger Condition |
|---|---|---|
| High amount new account | +30 | Account less than 7 days old AND transaction over $500 |
| Unusual location | +20 | Distance from typical location exceeds 1000 miles |
| High velocity | +25 | More than 5 transactions in the last hour |
The system returns the cumulative score and the list of triggered rule reasons, as in the sketch below.
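A minimal sketch of such a rules engine, assuming the transaction arrives as a dict with hypothetical keys like account_age_days and txn_count_1h:

```python
def rule_score(txn: dict) -> tuple[int, list[str]]:
    """Apply the illustrative rules above; return (points, reasons)."""
    score, reasons = 0, []
    if txn["account_age_days"] < 7 and txn["amount"] > 500:
        score += 30
        reasons.append("high_amount_new_account")
    if txn["distance_from_typical_miles"] > 1000:
        score += 20
        reasons.append("unusual_location")
    if txn["txn_count_1h"] > 5:
        score += 25
        reasons.append("high_velocity")
    return score, reasons
```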
| Characteristic | Assessment |
|---|---|
| Interpretability | High |
| Deployment speed | Fast |
| Training data required | None |
| Generalization | Limited |
| Evasion resistance | Low |
2. Supervised Learning
Train a classifier on labeled fraud and non-fraud examples.
Training approach (sketched after this list):
- Apply SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance, generating synthetic fraud examples
- Train a Gradient Boosting Classifier with regularization parameters (max_depth, min_samples_leaf) to prevent overfitting
- Use predict_proba to get fraud probability scores for new transactions
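A minimal training sketch assuming scikit-learn and imbalanced-learn; the hyperparameter values are illustrative, and SMOTE should be applied only to the training split, never to evaluation data:

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier

def train_fraud_model(X_train, y_train):
    """Oversample the minority (fraud) class, then fit a regularized GBM."""
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
    model = GradientBoostingClassifier(
        n_estimators=200,
        max_depth=4,          # shallow trees limit overfitting
        min_samples_leaf=50,  # require support behind each leaf
    )
    return model.fit(X_res, y_res)

# fraud probability for new transactions:
# p_fraud = model.predict_proba(X_new)[:, 1]
```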
Imbalance Handling Techniques:
| Technique | Method |
|---|---|
| Oversampling (SMOTE) | Generate synthetic minority examples |
| Undersampling | Reduce majority class examples |
| Class weights | Penalize minority misclassification more heavily |
| Anomaly detection | Train on normal behavior only |
3. Anomaly Detection
Detect unusual patterns without labeled fraud data.
Isolation Forest approach (sketched after this list):
- Train on normal (non-fraud) transactions only
- Set contamination parameter to expected anomaly rate (e.g., 1%)
- The model learns to isolate unusual patterns
- Score new transactions: negative scores indicate anomalies (more isolated points)
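A minimal scikit-learn sketch; X_normal and X_new are hypothetical feature matrices, and the contamination value is illustrative:

```python
from sklearn.ensemble import IsolationForest

iso = IsolationForest(contamination=0.01, random_state=42)
iso.fit(X_normal)  # fit on (predominantly) non-fraud transactions

# lower (more negative) decision_function values = more isolated = more anomalous
anomaly_scores = iso.decision_function(X_new)
flags = iso.predict(X_new)  # -1 = anomaly, 1 = normal
```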
| Method | Application |
|---|---|
| Isolation Forest | General anomaly detection |
| Autoencoders | High-dimensional data |
| DBSCAN clustering | Finding fraud clusters |
| Z-score | Simple threshold-based detection |
4. Ensemble Approach
Combine multiple signals for robust detection.
Scoring process (sketched after this list):
- Evaluate rule-based score and collect triggered reasons
- Get ML model probability and scale to 0-100
- Get anomaly score from Isolation Forest and normalize to 0-100
- Combine with weighted average: 30% rule score + 50% ML score + 20% anomaly score
- Return final score along with component scores and reasons for explainability
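A sketch of the blend, reusing rule_score from the rule-based sketch and the two models above; x is a hypothetical 1-D feature vector, and both the Isolation Forest normalization and the 30/50/20 weights are illustrative:

```python
import numpy as np

def ensemble_score(txn: dict, x: np.ndarray, ml_model, iso_model) -> dict:
    """Blend rule, ML, and anomaly signals into one 0-100 score."""
    rule_points, reasons = rule_score(txn)                     # already 0-100
    ml = ml_model.predict_proba(x.reshape(1, -1))[0, 1] * 100  # prob -> 0-100
    # decision_function is roughly in [-0.5, 0.5]; map more-negative
    # (more anomalous) values toward 100 -- a crude normalization
    raw = iso_model.decision_function(x.reshape(1, -1))[0]
    anomaly = float(np.clip(0.5 - raw, 0.0, 1.0)) * 100
    final = 0.3 * rule_points + 0.5 * ml + 0.2 * anomaly
    return {"score": final, "rule": rule_points, "ml": ml,
            "anomaly": anomaly, "reasons": reasons}
```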
Evaluation Metrics
Classification Metrics
| Metric | Definition | Business Meaning |
|---|---|---|
| Precision | TP / (TP + FP) | Percentage of blocked transactions that were fraud |
| Recall | TP / (TP + FN) | Percentage of fraud that was caught |
| False Positive Rate | FP / (FP + TN) | Percentage of legitimate users blocked |
Business Metrics
| Metric | Formula |
|---|---|
| Loss prevented | Total attempted fraud amount * recall |
| False decline cost | Legitimate blocked * avg value * margin |
| Review cost | Transactions reviewed * cost per review |
| Net benefit | Loss prevented - false decline cost - review cost |
Threshold Optimization
Cost-based threshold selection:
Define costs for each outcome: fraud loss (e.g., $100), false decline (e.g., $5), and review cost (e.g., $2).
For each candidate threshold:
- Calculate true positives (fraud correctly blocked), false positives (legitimate blocked), and false negatives (fraud missed)
- Compute net benefit: (fraud blocked * fraud loss) - (fraud missed * fraud loss) - (false declines * false decline cost); with a review band, also subtract (reviews * review cost)
- Select the threshold that maximizes net benefit
This approach optimizes for business outcomes rather than statistical metrics; a sketch follows.
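A minimal sketch of the sweep with a single block threshold; the cost values match the illustrative figures above, and a review band would add the review-cost term:

```python
import numpy as np

def best_threshold(scores: np.ndarray, y_true: np.ndarray,
                   fraud_loss: float = 100.0,
                   false_decline_cost: float = 5.0) -> tuple[float, float]:
    """Pick the block threshold that maximizes net benefit."""
    best_t, best_benefit = 0.0, -np.inf
    for t in np.linspace(0, 100, 101):
        blocked = scores >= t
        tp = np.sum(blocked & (y_true == 1))   # fraud blocked
        fp = np.sum(blocked & (y_true == 0))   # legitimate blocked
        fn = np.sum(~blocked & (y_true == 1))  # fraud missed
        benefit = tp * fraud_loss - fn * fraud_loss - fp * false_decline_cost
        if benefit > best_benefit:
            best_t, best_benefit = t, benefit
    return best_t, best_benefit
```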
Real-Time Considerations
Latency Requirements
| Use Case | Latency Requirement |
|---|---|
| Payment authorization | < 100ms |
| Account login | < 200ms |
| Account creation | < 500ms |
Feature Store Architecture
| Feature Type | Update Frequency | Examples |
|---|---|---|
| Batch features | Hourly/daily | 30-day average, historical fraud rate |
| Streaming features | Real-time | Transaction count last 1h, distinct merchants 24h |
| Request features | At request time | Distance from last transaction, device match |
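A sketch of how the three tiers might be merged at scoring time; batch_store and stream_store are hypothetical per-user lookups (e.g., a warehouse snapshot and a streaming aggregate keyed by user_id), and last_device_id is a hypothetical field:

```python
def assemble_features(txn: dict, batch_store: dict, stream_store: dict) -> dict:
    """Merge batch, streaming, and request-time features for one transaction."""
    user = txn["user_id"]
    features = {}
    features.update(batch_store.get(user, {}))   # e.g., avg_amount_30d
    features.update(stream_store.get(user, {}))  # e.g., txn_count_1h
    # request-time features computed from the transaction itself
    features["amount"] = txn["amount"]
    features["device_match"] = txn["device_id"] == txn.get("last_device_id")
    return features
```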
Model Lifecycle
Label Collection
| Source | Delay | Accuracy |
|---|---|---|
| Chargebacks | 30-90 days | High |
| Customer reports | 1-7 days | Medium |
| Manual review | Hours-days | High |
| Rule triggers | Immediate | Low-medium |
Retraining Triggers
- Performance degradation detected
- New fraud pattern identified
- Scheduled interval (weekly/monthly)
Explainability
Fraud decisions require explanation for manual review, customer disputes, and regulatory compliance.
SHAP (SHapley Additive exPlanations) approach (sketched after this list):
- Create a TreeExplainer for the trained model
- Calculate SHAP values for each feature in the transaction
- Identify top contributing features (e.g., transaction_velocity_1h contributed 25% to the score, new_device contributed 20%)
- Present explanations alongside the fraud score for transparency
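A minimal SHAP sketch, assuming the tree-based model from the supervised section, a hypothetical 1-D feature vector x, and a hypothetical feature_names list:

```python
import shap

explainer = shap.TreeExplainer(model)  # tree models are supported natively
shap_values = explainer.shap_values(x.reshape(1, -1))

# rank features by absolute contribution to this transaction's score
contributions = sorted(zip(feature_names, shap_values[0]),
                       key=lambda kv: abs(kv[1]), reverse=True)
top_reasons = contributions[:5]  # surfaced alongside the fraud score
```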
Summary
| Decision | Options | Recommendation |
|---|---|---|
| Approach | Rules, ML, Anomaly, Hybrid | Hybrid (rules + ML + anomaly) |
| Class imbalance | SMOTE, weights, undersampling | Combination (SMOTE + class weights) |
| Model | Logistic, GBM, Neural | GBM (interpretable, handles tabular) |
| Real-time features | Pre-compute, on-demand | Feature store (batch + streaming) |
| Threshold | Fixed, dynamic | Dynamic (optimize for business cost) |
| Explainability | None, SHAP, Rules | SHAP for ML, rule reasons for rules |