
Design a Fraud Detection System

Concepts tested: Class imbalance, anomaly detection, rule-based vs ML hybrid, velocity features, real-time scoring, precision-recall trade-offs, cost-sensitive learning, model explainability

Problem Statement

Design a fraud detection system for payments, account takeover, or fake account prevention.

Clarification Questions

| Question | Design Impact |
| --- | --- |
| Fraud type | Feature engineering approach |
| Available actions | Block, allow, review, step-up authentication |
| Latency requirement | Real-time vs batch architecture |
| Cost structure | Loss from fraud vs loss from false declines |
| Available data | Transaction history, device, behavioral, network |

Problem Characteristics

Class Imbalance

Fraud rates are typically low:

  • Credit card fraud: 0.1% of transactions
  • Account takeover: 0.01% of logins
  • Fake accounts: 1-5% of signups

Implication: Standard accuracy metrics are misleading. A model that always predicts "not fraud" achieves 99.9% accuracy while detecting zero fraud.
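
This failure mode is easy to demonstrate with toy numbers (a hypothetical 0.1% fraud rate):

```python
# Toy illustration: with a 0.1% fraud rate, the trivial "never fraud"
# classifier scores 99.9% accuracy yet catches no fraud at all.
n_total = 100_000
n_fraud = 100                       # 0.1% fraud rate

# Predict "not fraud" for every transaction.
true_negatives = n_total - n_fraud  # every legitimate transaction is "correct"
true_positives = 0                  # no fraud is ever flagged

accuracy = true_negatives / n_total  # 0.999
recall = true_positives / n_fraud    # 0.0

print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")
```

This is why the evaluation section below leans on precision, recall, and cost-based metrics instead of accuracy.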

Adversarial Environment

Fraudsters actively adapt to detection methods:

  • Study detection patterns and develop workarounds
  • Share techniques in fraud communities
  • Use automation (bots, scripts)
  • Continuously evolve tactics

Error Costs

| Decision | Error Impact |
| --- | --- |
| Block legitimate user | Lost revenue, negative experience, churn |
| Allow fraudster | Direct financial loss |
| Manual review backlog | Operational cost, delayed decisions |

System Architecture

Real-Time Scoring Pipeline:

  1. Feature Enrichment: Add derived features to the transaction
  2. Rules Engine: Apply deterministic rules
  3. ML Models: Run supervised, anomaly detection, and graph-based models
  4. Output: Combined score with explanation

Decision Engine:

  • Score below 20: ALLOW the transaction
  • Score 20-70: Send to REVIEW queue for manual inspection
  • Score 70 or above: BLOCK the transaction
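
The routing above is a simple threshold policy; a minimal sketch (the thresholds come from the text, the function name is illustrative):

```python
def decide(score: float) -> str:
    """Map a 0-100 fraud score to an action using the thresholds above."""
    if score < 20:
        return "ALLOW"
    elif score < 70:
        return "REVIEW"   # manual inspection queue
    else:
        return "BLOCK"

print(decide(10), decide(45), decide(85))  # ALLOW REVIEW BLOCK
```

In practice the thresholds themselves are tuned, as described under Threshold Optimization below.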

Async Pipeline (batch analysis):

  • Pattern Mining: Discover new fraud tactics from historical data
  • Network Analysis: Identify fraud rings through connection analysis
  • Label Feedback: Update models with confirmed fraud/non-fraud labels

Feature Engineering

Transaction Features

| Feature Category | Examples |
| --- | --- |
| Transaction details | Amount, merchant category, payment method |
| Amount patterns | Average transaction size, deviation from typical |
| Velocity | Transactions in last 1h/24h/7d |
| Geographic | Distance from typical location, high-risk countries |
| Temporal | Time of day, day of week, holiday |

User Behavior Features

| Feature | Signal |
| --- | --- |
| Device fingerprint | New vs known device |
| Login patterns | Unusual login times, locations |
| Session behavior | Click patterns, navigation speed |
| Account age | New accounts are higher risk |
| Historical fraud | Past fraud/disputes on account |

Aggregated Features

Velocity features are computed by aggregating transaction history over various time windows:

| Feature | Description |
| --- | --- |
| Transaction count (1h/24h) | Number of transactions in the last 1 hour or 24 hours |
| Transaction amount (1h) | Total transaction amount in the last hour |
| Distinct merchants (24h) | Number of unique merchants in last 24 hours |
| Distinct devices (7d) | Number of unique devices used in last 7 days |
| Average transaction amount (30d) | Historical average transaction size |
| Amount vs average ratio | Current transaction amount divided by 30-day average |
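
One way to compute these windowed aggregates is to keep a card's recent events and filter by timestamp; a minimal in-memory sketch (field names are illustrative — production systems typically maintain these counters in a streaming feature store):

```python
from dataclasses import dataclass

@dataclass
class Txn:
    ts: float        # epoch seconds
    amount: float
    merchant: str

def velocity_features(history: list[Txn], now: float) -> dict:
    """Windowed aggregates over one card's transaction history."""
    last_1h  = [t for t in history if now - t.ts <= 3600]
    last_24h = [t for t in history if now - t.ts <= 86400]
    last_30d = [t for t in history if now - t.ts <= 30 * 86400]
    avg_30d = sum(t.amount for t in last_30d) / len(last_30d) if last_30d else 0.0
    return {
        "txn_count_1h": len(last_1h),
        "txn_amount_1h": sum(t.amount for t in last_1h),
        "txn_count_24h": len(last_24h),
        "distinct_merchants_24h": len({t.merchant for t in last_24h}),
        "avg_amount_30d": avg_30d,
    }

history = [Txn(0, 50, "grocer"), Txn(3000, 40, "cafe"), Txn(3500, 500, "electronics")]
feats = velocity_features(history, now=3600)
```

Recomputing from raw history on every request is too slow at scale, which is why streaming features are precomputed (see the Feature Store Architecture section).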

Network/Graph Features

Fraudsters often operate in networks.

| Feature | Description |
| --- | --- |
| Shared device count | Number of users sharing same device |
| Shared address count | Number of accounts at same address |
| Shared payment method | Number of accounts using same card |
| Network cluster risk | Aggregate risk of connected entities |
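
Shared-entity counts like these can be served from simple inverted indexes (entity → set of accounts); a minimal sketch with hypothetical data:

```python
from collections import defaultdict

# device -> accounts observed on it (hypothetical observations)
device_to_accounts = defaultdict(set)
observations = [
    ("dev_1", "alice"), ("dev_1", "bob"), ("dev_1", "carol"),
    ("dev_2", "dave"),
]
for device, account in observations:
    device_to_accounts[device].add(account)

def shared_device_count(device: str) -> int:
    """Number of distinct accounts seen on the same device."""
    return len(device_to_accounts[device])

# Three accounts on dev_1 is a stronger fraud-ring signal than one on dev_2.
```

The same index pattern applies to addresses and payment methods; full network-cluster risk usually requires a graph traversal in the async pipeline rather than at request time.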

Model Approaches

1. Rule-Based System

A rule-based fraud detection system evaluates transactions against predefined conditions:

| Rule | Points | Trigger Condition |
| --- | --- | --- |
| High amount new account | +30 | Account less than 7 days old AND transaction over $500 |
| Unusual location | +20 | Distance from typical location exceeds 1000 miles |
| High velocity | +25 | More than 5 transactions in the last hour |

The system returns a cumulative score and a list of triggered rule reasons.
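
The three rules in the table can be expressed as predicates over the transaction; a minimal sketch (field names are illustrative):

```python
RULES = [
    # (name, points, predicate)
    ("high_amount_new_account", 30,
     lambda t: t["account_age_days"] < 7 and t["amount"] > 500),
    ("unusual_location", 20,
     lambda t: t["distance_from_typical_miles"] > 1000),
    ("high_velocity", 25,
     lambda t: t["txn_count_1h"] > 5),
]

def rule_score(txn: dict) -> tuple[int, list[str]]:
    """Return the cumulative score and the list of triggered rule names."""
    triggered = [(name, pts) for name, pts, pred in RULES if pred(txn)]
    return sum(p for _, p in triggered), [n for n, _ in triggered]

txn = {"account_age_days": 2, "amount": 900,
       "distance_from_typical_miles": 5, "txn_count_1h": 6}
score, reasons = rule_score(txn)  # 55, ["high_amount_new_account", "high_velocity"]
```

Keeping rules as data (name, points, predicate) makes them easy to audit and to update without redeploying the scoring service.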

| Characteristic | Assessment |
| --- | --- |
| Interpretability | High |
| Deployment speed | Fast |
| Training data required | None |
| Generalization | Limited |
| Evasion resistance | Low |

2. Supervised Learning

Train on labeled fraud examples.

Training approach:

  1. Apply SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance, generating synthetic fraud examples
  2. Train a Gradient Boosting Classifier with regularization parameters (max_depth, min_samples_leaf) to prevent overfitting
  3. Use predict_proba to get fraud probability scores for new transactions
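
A sketch of this training loop with scikit-learn on synthetic data; for brevity it uses naive random oversampling as a stand-in for SMOTE (real pipelines typically use `imblearn.over_sampling.SMOTE`, which interpolates new minority points rather than duplicating them):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic imbalanced data: ~1% "fraud", shifted away from the normal cluster.
X_legit = rng.normal(0.0, 1.0, size=(2000, 4))
X_fraud = rng.normal(3.0, 1.0, size=(20, 4))
X = np.vstack([X_legit, X_fraud])
y = np.array([0] * 2000 + [1] * 20)

# Oversample the minority class (simplified stand-in for SMOTE).
fraud_idx = np.flatnonzero(y == 1)
resampled = rng.choice(fraud_idx, size=2000 - len(fraud_idx), replace=True)
X_bal = np.vstack([X, X[resampled]])
y_bal = np.concatenate([y, np.ones(len(resampled), dtype=int)])

# Gradient boosting with regularization parameters to limit overfitting.
model = GradientBoostingClassifier(max_depth=3, min_samples_leaf=20)
model.fit(X_bal, y_bal)

# Fraud probability for a new, fraud-like transaction.
p_fraud = model.predict_proba([[3.0, 3.0, 3.0, 3.0]])[0, 1]
```

Note that resampling is applied only to training data; evaluation must use the original, imbalanced distribution.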

Imbalance Handling Techniques:

| Technique | Method |
| --- | --- |
| Oversampling (SMOTE) | Generate synthetic minority examples |
| Undersampling | Reduce majority class examples |
| Class weights | Penalize minority misclassification more heavily |
| Anomaly detection | Train on normal behavior only |

3. Anomaly Detection

Detect unusual patterns without labeled fraud data.

Isolation Forest approach:

  1. Train on normal (non-fraud) transactions only
  2. Set contamination parameter to expected anomaly rate (e.g., 1%)
  3. The model learns to isolate unusual patterns
  4. Score new transactions: negative scores indicate anomalies (more isolated points)

| Method | Application |
| --- | --- |
| Isolation Forest | General anomaly detection |
| Autoencoders | High-dimensional data |
| DBSCAN clustering | Finding fraud clusters |
| Z-score | Simple threshold-based detection |
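
The Isolation Forest steps above, sketched with scikit-learn on synthetic data (in this API, `decision_function` is negative for points the model considers anomalous):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Train on "normal" transactions only (two features, e.g. scaled amount/velocity).
X_normal = rng.normal(0.0, 1.0, size=(1000, 2))
model = IsolationForest(contamination=0.01, random_state=0).fit(X_normal)

scores = model.decision_function([[0.1, -0.2],   # typical point
                                  [8.0, 8.0]])   # far outside training data
# scores[0] > 0 (normal), scores[1] < 0 (anomalous)
```

Because no fraud labels are needed, this catches novel tactics that a supervised model trained on historical fraud would miss.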

4. Ensemble Approach

Combine multiple signals for robust detection.

Scoring process:

  1. Evaluate rule-based score and collect triggered reasons
  2. Get ML model probability and scale to 0-100
  3. Get anomaly score from Isolation Forest and normalize to 0-100
  4. Combine with weighted average: 30% rule score + 50% ML score + 20% anomaly score
  5. Return final score along with component scores and reasons for explainability
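
The weighted combination reduces to a few lines; a minimal sketch (the rule and anomaly components are assumed already normalized to a 0-100 scale):

```python
def combine_scores(rule_score: float, ml_prob: float, anomaly_score: float) -> dict:
    """Weighted ensemble: 30% rules + 50% ML + 20% anomaly, all on 0-100."""
    ml_score = ml_prob * 100  # scale probability to 0-100
    final = 0.3 * rule_score + 0.5 * ml_score + 0.2 * anomaly_score
    return {
        "final_score": final,
        "components": {"rules": rule_score, "ml": ml_score,
                       "anomaly": anomaly_score},
    }

result = combine_scores(rule_score=55, ml_prob=0.9, anomaly_score=70)
# final_score = 0.3*55 + 0.5*90 + 0.2*70 = 75.5
```

Returning the component scores alongside the final score is what makes the ensemble explainable to reviewers.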

Evaluation Metrics

Classification Metrics

| Metric | Definition | Business Meaning |
| --- | --- | --- |
| Precision | TP / (TP + FP) | Percentage of blocked transactions that were fraud |
| Recall | TP / (TP + FN) | Percentage of fraud that was caught |
| False Positive Rate | FP / (FP + TN) | Percentage of legitimate users blocked |
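
These definitions map directly to code; a small sketch from raw confusion counts (the counts are hypothetical):

```python
def fraud_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, and false positive rate from confusion counts."""
    return {
        "precision": tp / (tp + fp),  # blocked transactions that were fraud
        "recall": tp / (tp + fn),     # fraud that was caught
        "fpr": fp / (fp + tn),        # legitimate transactions blocked
    }

m = fraud_metrics(tp=80, fp=20, fn=40, tn=99860)
# precision = 0.80, recall ~ 0.67, fpr ~ 0.0002
```

Note how tiny the FPR looks even when 20 legitimate customers were blocked; with heavy class imbalance, precision and recall are the more honest lens.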

Business Metrics

| Metric | Formula |
| --- | --- |
| Loss prevented | Fraud amount blocked * recall |
| False decline cost | Legitimate blocked * avg value * margin |
| Review cost | Transactions reviewed * cost per review |
| Net benefit | Loss prevented - false decline cost - review cost |

Threshold Optimization

Cost-based threshold selection:

Define costs for each outcome: fraud loss (e.g., $100), false decline (e.g., $5), and review cost (e.g., $2).

For each candidate threshold:

  1. Calculate true positives (fraud correctly blocked), false positives (legitimate blocked), and false negatives (fraud missed)
  2. Compute net benefit: (true positives x fraud_loss) - (false negatives x fraud_loss) - (false positives x false_decline_cost); if a middle band is routed to manual review, also subtract (reviewed transactions x review_cost)
  3. Select the threshold that maximizes net benefit

This approach optimizes for business outcomes rather than statistical metrics.
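
The sweep over candidate thresholds can be sketched as follows (costs are the illustrative figures from the text; scores and labels are hypothetical, and the review band is omitted for brevity):

```python
def best_threshold(scores, labels, fraud_loss=100.0, false_decline_cost=5.0):
    """Pick the block threshold that maximizes net benefit."""
    best = (None, float("-inf"))
    for thresh in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thresh and y == 1)
        net = tp * fraud_loss - fn * fraud_loss - fp * false_decline_cost
        if net > best[1]:
            best = (thresh, net)
    return best

# Hypothetical scored transactions: (score, is_fraud)
scores = [10, 30, 55, 60, 80, 90]
labels = [0,  0,  0,  1,  1,  1]
thresh, net = best_threshold(scores, labels)  # threshold 60, net benefit 300
```

Because fraud losses dwarf false-decline costs here, the optimum tolerates some false positives rather than letting fraud through; changing the cost ratio moves the chosen threshold.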

Real-Time Considerations

Latency Requirements

| Use Case | Latency Requirement |
| --- | --- |
| Payment authorization | < 100ms |
| Account login | < 200ms |
| Account creation | < 500ms |

Feature Store Architecture

| Feature Type | Update Frequency | Examples |
| --- | --- | --- |
| Batch features | Hourly/daily | 30-day average, historical fraud rate |
| Streaming features | Real-time | Transaction count last 1h, distinct merchants 24h |
| Request features | At request time | Distance from last transaction, device match |

Model Lifecycle

Label Collection

| Source | Delay | Accuracy |
| --- | --- | --- |
| Chargebacks | 30-90 days | High |
| Customer reports | 1-7 days | Medium |
| Manual review | Hours-days | High |
| Rule triggers | Immediate | Low-medium |

Retraining Triggers

  • Performance degradation detected
  • New fraud pattern identified
  • Scheduled interval (weekly/monthly)

Explainability

Fraud decisions require explanation for manual review, customer disputes, and regulatory compliance.

SHAP (SHapley Additive exPlanations) approach:

  1. Create a TreeExplainer for the trained model
  2. Calculate SHAP values for each feature in the transaction
  3. Identify top contributing features (e.g., transaction_velocity_1h contributed 25% to the score, new_device contributed 20%)
  4. Present explanations alongside the fraud score for transparency

Summary

| Decision | Options | Recommendation |
| --- | --- | --- |
| Approach | Rules, ML, Anomaly, Hybrid | Hybrid (rules + ML + anomaly) |
| Class imbalance | SMOTE, weights, undersampling | Combination (SMOTE + class weights) |
| Model | Logistic, GBM, Neural | GBM (interpretable, handles tabular) |
| Real-time features | Pre-compute, on-demand | Feature store (batch + streaming) |
| Threshold | Fixed, dynamic | Dynamic (optimize for business cost) |
| Explainability | None, SHAP, Rules | SHAP for ML, rule reasons for rules |