Design a Fraud Detection System

Design a machine learning system to detect fraudulent credit card transactions in real time.

Requirements

Functional:

  • Score transactions in real time
  • Decide: approve, decline, or challenge
  • Support manual review workflow
  • Learn from feedback

Non-functional:

  • Latency < 100ms per transaction
  • Process 10,000+ transactions per second
  • 99.9% availability
  • High precision (minimize false declines)

Metrics

Model Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Precision | Fraud caught / Flagged as fraud | > 90% |
| Recall | Fraud caught / Total fraud | > 70% |
| AUC-ROC | Overall discrimination | > 0.95 |
| False Positive Rate | Good transactions declined / Total good transactions | < 0.5% |
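
The ratio metrics in the table above can be computed directly from confusion-matrix counts. The following is a minimal sketch; the function name and the traffic numbers are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and false positive rate from confusion counts.

    tp: fraud flagged as fraud        fp: legitimate flagged as fraud
    fn: fraud that was not flagged    tn: legitimate that was not flagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fpr,
    }


# Hypothetical traffic: 1,000,000 transactions, of which 1,000 are fraud.
metrics = classification_metrics(tp=750, fp=50, fn=250, tn=998_950)
```

In this hypothetical, fraud is 0.1% of traffic. An FPR of 0.5% would then mean about 4,995 false positives against 750 true positives (precision of roughly 13%), so at low fraud rates the precision target is the stricter of the two constraints.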

Business Metrics

| Metric | Description |
| --- | --- |
| Fraud Loss Rate | Fraud $ / Total $ |
| False Decline Rate | Good $ declined / Total good $ |
| Operational Cost | Cost of manual review |
| Customer Friction | Challenges sent to customers |

Architecture

Feature Engineering

Transaction Features

| Feature | Type | Description |
| --- | --- | --- |
| amount | Numerical | Transaction amount |
| amount_zscore | Numerical | Amount vs user's typical |
| merchant_category | Categorical | MCC code |
| transaction_type | Categorical | Online, in-store, ATM |
| card_present | Binary | Physical card used |
| cvv_provided | Binary | CVV entered |
| international | Binary | Cross-border transaction |

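As an illustration of the `amount_zscore` feature, the sketch below standardizes a transaction amount against the user's past amounts. The function signature and the choice to return 0.0 when there is too little history are assumptions, not part of the original design.

```python
from statistics import mean, pstdev


def amount_zscore(amount: float, history: list) -> float:
    """Return how many standard deviations `amount` is from the user's mean.

    `history` holds the user's previous transaction amounts.
    Assumption: with fewer than two past transactions, or with zero
    variance, the feature falls back to a neutral 0.0.
    """
    if len(history) < 2:
        return 0.0
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return (amount - mu) / sigma


# Hypothetical user who usually spends between $20 and $60.
past_amounts = [20.0, 30.0, 40.0, 50.0, 60.0]
typical = amount_zscore(40.0, past_amounts)
unusual = amount_zscore(400.0, past_amounts)
```

Here `typical` is 0.0 because the amount equals the historical mean, while `unusual` is far above any reasonable threshold.
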
User Profile Features

| Feature | Type | Description |
| --- | --- | --- |
| avg_transaction_amount | Numerical | Historical average |
| transaction_frequency | Numerical | Transactions per week |
| typical_merchants | Embedding | Usual merchant types |
| typical_locations | Embedding | Usual transaction locations |
| account_age | Numerical | Days since account opened |
| historical_fraud | Binary | Previous fraud on account |

Behavioral Features

| Feature | Type | Description |
| --- | --- | --- |
| time_since_last_txn | Numerical | Minutes since last transaction |
| velocity_1h | Numerical | Transactions in last hour |
| velocity_24h | Numerical | Transactions in last 24 hours |
| amount_velocity_1h | Numerical | Amount spent in last hour |
| new_merchant | Binary | First time at this merchant |
| new_location | Binary | First time in this location |
| distance_from_last | Numerical | Miles from last transaction |
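
The velocity and distance features above could be computed roughly as follows. This is a sketch under stated assumptions: the `VelocityWindow` class and `haversine_miles` helper are hypothetical names, state is kept in memory for a single card, and a production system would more likely keep these counters in a low-latency feature store.

```python
import math
from collections import deque


class VelocityWindow:
    """Sliding time window over one card's recent transactions."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp_seconds, amount), oldest first

    def add(self, ts: float, amount: float) -> None:
        self.events.append((ts, amount))

    def _evict(self, now: float) -> None:
        # Drop events that are at least `window_seconds` old.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def count(self, now: float) -> int:
        """Number of transactions in the window (e.g. velocity_1h)."""
        self._evict(now)
        return len(self.events)

    def total_amount(self, now: float) -> float:
        """Amount spent in the window (e.g. amount_velocity_1h)."""
        self._evict(now)
        return sum(amount for _, amount in self.events)


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


window_1h = VelocityWindow(window_seconds=3600)
window_1h.add(ts=0, amount=10.0)
window_1h.add(ts=1800, amount=20.0)
window_1h.add(ts=4000, amount=5.0)
velocity_1h = window_1h.count(now=4000)
amount_velocity_1h = window_1h.total_amount(now=4000)
```

At `now=4000` the first transaction has aged out of the one-hour window, so only the last two contribute to the features.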

Contextual Features

| Feature | Type | Description |
| --- | --- | --- |
| hour_of_day | Categorical | Transaction hour |
| day_of_week | Categorical | Transaction day |
| is_weekend | Binary | Weekend transaction |
| device_fingerprint | Hash | Device identifier |
| ip_risk_score | Numerical | IP reputation |

Model Architecture

Ensemble Approach

Training with Imbalanced Data

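Fraud is rare (under 1% of transactions), so a classifier trained naively can reach high accuracy by labeling everything legitimate. Techniques for handling the imbalance: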

| Technique | Description | Pros | Cons |
| --- | --- | --- | --- |
| Class Weights | Penalize misclassifying fraud more heavily (e.g., 100:1) | Simple, no data change | Can cause overfitting |
| SMOTE | Generate synthetic fraud samples by interpolation | Balances training data | May create unrealistic samples |
| Undersampling | Reduce legitimate samples to match fraud count | Fast training | Loses legitimate patterns |
| Balanced Bagging | Train multiple models on balanced subsets | Robust, preserves all data | More complex |

Recommendation: Start with class weights, add SMOTE if needed.
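
To make the class-weight idea concrete, here is a minimal pure-Python sketch. It uses the inverse-frequency heuristic `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn applies for `class_weight="balanced"`. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(labels: list, probs: list, weights: dict) -> float:
    """Binary log loss where each example is scaled by its class weight."""
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Toy data: 1% fraud (label 1), 99% legitimate (label 0).
labels = [1] * 10 + [0] * 990
weights = balanced_class_weights(labels)

# A model that always predicts "almost certainly legitimate".
lazy_probs = [0.01] * len(labels)
unweighted = weighted_log_loss(labels, lazy_probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, lazy_probs, weights)
```

With these weights, fraud examples count 99 times as much as legitimate ones, so the lazy model that looks acceptable under the unweighted loss is penalized heavily under the weighted one.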

Serving

Low-Latency Architecture

Decision Engine

Score-based decisions with context-aware thresholds:

| Score Range | Low Value (under $500) | High Value ($500 to under $1,000) | Very High ($1,000+) |
| --- | --- | --- | --- |
| 0.9 - 1.0 | Decline | Decline | Decline |
| 0.7 - 0.9 | Decline | Decline | Decline |
| 0.5 - 0.7 | Approve | Challenge (3D Secure) | Challenge |
| 0.3 - 0.5 | Approve | Approve | Challenge |
| 0.0 - 0.3 | Approve | Approve | Approve |

Hard Rules (override ML):

  • Amount exceeds credit limit -> Decline
  • Card in blacklist -> Decline
  • Merchant in blacklist -> Decline
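
The threshold table and hard rules above could be combined in a single function, sketched below. The `Transaction` fields, the blocklist contents, and the decision to treat each score range's lower bound as inclusive are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    credit_limit: float
    card_id: str
    merchant_id: str


# Hypothetical blocklists; in practice these would be looked up in a fast store.
BLOCKED_CARDS = {"card_blocked"}
BLOCKED_MERCHANTS = {"merchant_blocked"}


def decide(txn: Transaction, score: float) -> str:
    """Return "approve", "challenge", or "decline" for a scored transaction."""
    # Hard rules run first and override the model score.
    if txn.amount > txn.credit_limit:
        return "decline"
    if txn.card_id in BLOCKED_CARDS or txn.merchant_id in BLOCKED_MERCHANTS:
        return "decline"

    # Scores of 0.7 and above are declined at every value tier.
    if score >= 0.7:
        return "decline"

    # Below 0.7 the challenge threshold depends on transaction value.
    if txn.amount >= 1000:
        return "challenge" if score >= 0.3 else "approve"
    if txn.amount >= 500:
        return "challenge" if score >= 0.5 else "approve"
    return "approve"


example = decide(Transaction(700.0, 5000.0, "card_1", "merchant_1"), score=0.6)
```

In this example a $700 transaction with a score of 0.6 lands in the high-value tier and is challenged rather than declined.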

Feedback Loop

Labeling Strategy

| Label | Source | Timing | Confidence |
| --- | --- | --- | --- |
| Confirmed Fraud | | | |
| Chargeback fraud | Customer dispute | 30-90 days | High |
| Bank confirmed | Issuing bank | 1-7 days | Very High |
| User reported | User notification | Hours | High |
| Confirmed Legitimate | | | |
| No dispute 90 days | Time passed | 90 days | High |
| User confirmed | User verification | Hours | Very High |
| Merchant verified | Merchant callback | 1-7 days | High |
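
Because labels arrive at different delays, a training pipeline needs a rule for when a transaction counts as labeled. The sketch below is one possible rule, not the original design: the signal names are hypothetical, and it assumes any fraud signal outranks a legitimacy signal.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical signal names mirroring the label sources in the table above.
FRAUD_SIGNALS = {"chargeback_fraud", "bank_confirmed", "user_reported"}
LEGIT_SIGNALS = {"user_confirmed", "merchant_verified"}
MATURITY_PERIOD = timedelta(days=90)


def resolve_label(txn_time: datetime, signals: set, now: datetime) -> Optional[int]:
    """Return 1 (fraud), 0 (legitimate), or None if the label is not yet known."""
    # Assumption: a fraud signal wins even if a legitimacy signal also exists.
    if signals & FRAUD_SIGNALS:
        return 1
    if signals & LEGIT_SIGNALS:
        return 0
    # No dispute after 90 days is treated as confirmed legitimate.
    if now - txn_time >= MATURITY_PERIOD:
        return 0
    return None


now = datetime(2024, 6, 1)
recent = resolve_label(datetime(2024, 5, 20), set(), now)
matured = resolve_label(datetime(2024, 1, 1), set(), now)
disputed = resolve_label(datetime(2024, 5, 20), {"chargeback_fraud"}, now)
```

Transactions that resolve to `None` would be held out of training until a signal arrives or the 90-day window passes.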

Continuous Learning

Monitoring

Real-time Dashboards

| Metric | Purpose | Target |
| --- | --- | --- |
| Fraud score distribution | Model calibration | Matches expected (baseline) distribution |
| Decision distribution | Approve/decline/challenge rates | Decline < 5% |
| Latency percentiles | Service health | p99 < 50ms |
| False positive rate estimate | Customer experience | < 0.5% |
| Feature drift | Model staleness | Within 2 standard deviations of baseline |
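
One simple reading of the feature drift target is to compare the live mean of a feature against its training baseline, measured in baseline standard deviations. The sketch below follows that interpretation; the function names and sample values are hypothetical, and real systems often use distribution-level measures instead.

```python
from statistics import mean, pstdev


def drift_in_std_units(baseline: list, live: list) -> float:
    """Distance between live and baseline means, in baseline standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    live_mu = mean(live)
    if sigma == 0:
        return 0.0 if live_mu == mu else float("inf")
    return abs(live_mu - mu) / sigma


def has_drifted(baseline: list, live: list, threshold: float = 2.0) -> bool:
    """Flag the feature when the live mean is more than `threshold` std devs away."""
    return drift_in_std_units(baseline, live) > threshold


# Hypothetical values of one numerical feature (e.g. velocity_24h).
training_values = [10, 12, 11, 13, 9, 10, 12, 11]
stable_window = [11, 12, 10, 11]
shifted_window = [20, 22, 19]

stable_flag = has_drifted(training_values, stable_window)
shifted_flag = has_drifted(training_values, shifted_window)
```

Only the shifted window exceeds the threshold, so it alone would count as drift under this rule.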

Alerts

| Alert | Condition | Severity |
| --- | --- | --- |
| High decline rate | Decline rate > 5% | High |
| Model latency | p99 latency > 100ms | Critical |
| Score drift | Distribution shift | Medium |
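
The first two alert conditions can be evaluated over a window of recent requests, as in the sketch below. The function names and the nearest-rank percentile method are assumptions; the score drift alert is omitted because the source does not define its condition precisely.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(latencies_ms: list, decisions: list) -> list:
    """Return (alert name, severity) pairs for every condition that fires."""
    alerts = []
    decline_rate = decisions.count("decline") / len(decisions)
    if decline_rate > 0.05:
        alerts.append(("High decline rate", "High"))
    if percentile(latencies_ms, 99) > 100:
        alerts.append(("Model latency", "Critical"))
    return alerts


# Hypothetical window of 100 requests with a slow tail and many declines.
latencies = [20] * 98 + [150, 200]
decisions = ["approve"] * 90 + ["decline"] * 10
fired = evaluate_alerts(latencies, decisions)
```

Both conditions fire for this window: the decline rate is 10% and the p99 latency is 150ms.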

Reference

| Topic | Description |
| --- | --- |
| Precision vs recall trade-off | Catching more fraud increases false declines. Balance based on business cost. |
| Latency vs accuracy | Faster decisions require fewer features. Trade-off based on transaction value. |
| Automation vs review | Full automation scales but misses edge cases. Reserve review for uncertain decisions. |
| Generic vs specific models | One model is simpler. Specialized models (card-present vs online) may improve accuracy. |