System Architecture
A deep dive into the fraud detection platform's design decisions and component architecture.
Design Principles
1. Latency Budget
Every millisecond matters in payment processing. The system is designed around a strict latency budget:
| Operation | Budget | Actual |
|---|---|---|
| Redis feature lookup | 2ms | 1.5ms |
| Detection engine | 3ms | 2.8ms |
| Policy evaluation | 2ms | 1.2ms |
| Evidence capture | 2ms (async) | 1.8ms |
| Total | 10ms | 7.3ms |
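As a rough illustration of how such a budget could be checked in code, the sketch below compares measured stage timings against the table. The stage keys and the `over_budget` helper are hypothetical names chosen for this example, not identifiers from the platform.

```python
from __future__ import annotations

# Budgets in milliseconds, copied from the table above.
# The stage keys are illustrative names, not identifiers from the real system.
BUDGET_MS = {
    "redis_feature_lookup": 2.0,
    "detection_engine": 3.0,
    "policy_evaluation": 2.0,
    "evidence_capture": 2.0,  # listed as async in the table
}
TOTAL_BUDGET_MS = 10.0


def over_budget(actual_ms: dict[str, float]) -> list[str]:
    """Return the names of stages whose measured time exceeds their budget."""
    return [
        stage
        for stage, spent in actual_ms.items()
        if spent > BUDGET_MS.get(stage, float("inf"))
    ]


# Measured values from the "Actual" column of the table.
actual = {
    "redis_feature_lookup": 1.5,
    "detection_engine": 2.8,
    "policy_evaluation": 1.2,
    "evidence_capture": 1.8,
}
violations = over_budget(actual)
total_ms = round(sum(actual.values()), 1)
headroom_ms = round(TOTAL_BUDGET_MS - total_ms, 1)
```

The per-stage budgets in the table add up to 9ms, so the 10ms total leaves 1ms that is not assigned to any stage.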
2. Fail-Safe Defaults
When components fail, the system degrades gracefully:
- Redis down - Use cached features, log and proceed
- PostgreSQL down - Queue evidence, don't block decisions
- Policy error - Fall back to previous known-good policy
3. Idempotency
Every decision is idempotent. Retrying the same transaction_id returns the cached result, preventing duplicate processing.
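A minimal sketch of that behavior, assuming an in-process dictionary as the result cache. The `IdempotentDecider` class is hypothetical; a real deployment would more likely keep results in a shared store so that every API instance sees them, and this version is not thread-safe.

```python
from __future__ import annotations

from typing import Callable


class IdempotentDecider:
    """Wraps a decision function so each transaction_id is evaluated once."""

    def __init__(self, decide: Callable[[dict], dict]) -> None:
        self._decide = decide
        self._results: dict[str, dict] = {}

    def decide(self, txn: dict) -> dict:
        txn_id = txn["transaction_id"]
        # A retry with the same transaction_id gets the stored result back.
        if txn_id in self._results:
            return self._results[txn_id]
        result = self._decide(txn)
        self._results[txn_id] = result
        return result


calls = []


def fake_decision(txn: dict) -> dict:
    """Stand-in for the real pipeline; records how often it runs."""
    calls.append(txn["transaction_id"])
    return {"transaction_id": txn["transaction_id"], "decision": "ALLOW"}


decider = IdempotentDecider(fake_decision)
first = decider.decide({"transaction_id": "txn-1", "amount": 25})
retry = decider.decide({"transaction_id": "txn-1", "amount": 25})
```

The retry returns the same object as the first call, and the underlying decision function runs only once.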
Component Architecture
Feature Engine
Computes real-time features from velocity counters stored in Redis:
┌─────────────────────────────────────────────────┐
│                 Feature Engine                  │
├─────────────────────────────────────────────────┤
│ Card Velocity   │ Transactions per hour         │
│ Device Velocity │ Cards per device per day      │
│ IP Velocity     │ Cards per IP per hour         │
│ User Velocity   │ Amount per user per day       │
└─────────────────────────────────────────────────┘
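To make the four velocity features concrete, here is a small in-memory sketch. It uses dictionaries as a stand-in for Redis and builds keys in the same shape as the key patterns listed in this section. The class, the feature names, and the transaction fields are hypothetical, and the sketch does not expire old data, which a real 1h or 24h window would need.

```python
from __future__ import annotations

from collections import defaultdict


def velocity_keys(txn: dict) -> dict[str, str]:
    """Build one key per velocity dimension for a transaction."""
    return {
        "card": f"velocity:card:{txn['card_token']}:1h",
        "device": f"velocity:device:{txn['device_id']}:24h",
        "ip": f"velocity:ip:{txn['ip_hash']}:1h",
        "user": f"velocity:user:{txn['user_id']}:24h",
    }


class InMemoryVelocity:
    """Stand-in for the Redis counters; no window expiry is implemented."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = defaultdict(int)
        self.cards_seen: dict[str, set] = defaultdict(set)
        self.amounts: dict[str, float] = defaultdict(float)

    def record(self, txn: dict) -> None:
        keys = velocity_keys(txn)
        self.counts[keys["card"]] += 1                          # transaction count
        self.cards_seen[keys["device"]].add(txn["card_token"])  # unique cards
        self.cards_seen[keys["ip"]].add(txn["card_token"])      # unique cards
        self.amounts[keys["user"]] += txn["amount"]             # total amount

    def features(self, txn: dict) -> dict[str, float]:
        keys = velocity_keys(txn)
        return {
            "card_txn_count_1h": self.counts[keys["card"]],
            "device_unique_cards_24h": len(self.cards_seen[keys["device"]]),
            "ip_unique_cards_1h": len(self.cards_seen[keys["ip"]]),
            "user_amount_24h": self.amounts[keys["user"]],
        }


store = InMemoryVelocity()
base = {"device_id": "d1", "ip_hash": "ip1", "user_id": "u1"}
store.record({**base, "card_token": "c1", "amount": 20.0})
store.record({**base, "card_token": "c1", "amount": 30.0})
store.record({**base, "card_token": "c2", "amount": 5.0})
snapshot = store.features({**base, "card_token": "c1", "amount": 0.0})
```

Here the device and the IP have each seen two distinct cards, card `c1` has two transactions, and the user's total is 55.0.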
Redis Key Patterns:
velocity:card:{card_token}:1h → Transaction count
velocity:device:{device_id}:24h → Unique cards seen
velocity:ip:{ip_hash}:1h → Unique cards seen
velocity:user:{user_id}:24h → Total amount
Detection Engine
Five parallel detectors analyze each transaction:
┌─────────────────────────────────────────────────┐
│                Detection Engine                 │
├───────────────┬──────────────┬──────────────────┤
│ Card Testing  │ Velocity     │ Geographic       │
│ Detector      │ Detector     │ Detector         │
├───────────────┼──────────────┼──────────────────┤
│ Bot           │ Friendly     │                  │
│ Detector      │ Fraud        │                  │
└───────────────┴──────────────┴──────────────────┘
Each detector returns:
- detected: bool - Whether the pattern was found
- confidence: float - Confidence score (0-1)
- signals: list - Specific triggers that fired
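The detector contract above can be sketched as a small dataclass plus a runner that evaluates detectors concurrently. This is an illustration only: `DetectorResult`, `run_detectors`, the two toy detectors, and their cut-off values are hypothetical, and a thread pool is just one way to get the parallelism the section describes.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class DetectorResult:
    detected: bool                 # whether the pattern was found
    confidence: float              # confidence score in [0, 1]
    signals: list = field(default_factory=list)  # specific triggers that fired


def velocity_detector(features: dict) -> DetectorResult:
    """Toy rule: flag cards with more than 10 transactions in the last hour."""
    count = features.get("card_txn_count_1h", 0)
    if count > 10:  # illustrative cut-off, not the platform's value
        return DetectorResult(True, min(1.0, count / 20), [f"card_txn_count_1h={count}"])
    return DetectorResult(False, 0.0)


def bot_detector(features: dict) -> DetectorResult:
    """Toy rule: flag requests that carry an automation marker."""
    if features.get("headless_browser", False):
        return DetectorResult(True, 0.9, ["headless_browser"])
    return DetectorResult(False, 0.0)


def run_detectors(
    detectors: dict[str, Callable[[dict], DetectorResult]], features: dict
) -> dict[str, DetectorResult]:
    """Run every detector on the same features and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {name: pool.submit(fn, features) for name, fn in detectors.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_detectors(
    {"velocity": velocity_detector, "bot": bot_detector},
    {"card_txn_count_1h": 15, "headless_browser": False},
)
```

Each detector sees the same feature dictionary and returns independently, so adding a detector does not change the others.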
Risk Scoring
Combines detector outputs into actionable scores:
# Scoring formula (simplified)
criminal_score = max(
    card_testing.confidence * 0.9,
    velocity.confidence * 0.8,
    geo.confidence * 0.7,
    bot.confidence * 0.95
)
friendly_score = friendly_fraud.confidence * 0.6
overall_risk = criminal_score * 0.7 + friendly_score * 0.3
Policy Engine
Translates scores into business decisions using YAML configuration:
# config/policy.yaml
version: "1.0"
thresholds:
  block: 80      # Score >= 80 → BLOCK
  review: 60     # Score >= 60 → REVIEW
  friction: 40   # Score >= 40 → FRICTION
                 # Score < 40 → ALLOW
rules:
  # High-value transactions from new users
  - name: new_user_high_value
    condition: "amount > 500 AND user_age_days < 7"
    action: FRICTION
  # Known bad actors
  - name: blocklist_match
    condition: "card_token IN blocklist"
    action: BLOCK
Hot-Reload Capability:
# Update policy without restart
curl -X POST http://localhost:8000/policy/reload
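To show how a score could map to a decision, the sketch below applies the simplified scoring formula from the Risk Scoring section and then the thresholds from the policy file. Two things are assumptions: the function names are hypothetical, and the score is multiplied by 100 because the detector confidences are on a 0-1 scale while the thresholds look like a 0-100 scale. The `rules` section is not evaluated here.

```python
from __future__ import annotations

# Thresholds copied from config/policy.yaml.
THRESHOLDS = {"block": 80, "review": 60, "friction": 40}


def overall_risk(
    card_testing: float, velocity: float, geo: float, bot: float, friendly_fraud: float
) -> float:
    """Simplified scoring formula; inputs are detector confidences in [0, 1]."""
    criminal_score = max(
        card_testing * 0.9,
        velocity * 0.8,
        geo * 0.7,
        bot * 0.95,
    )
    friendly_score = friendly_fraud * 0.6
    return criminal_score * 0.7 + friendly_score * 0.3


def decide(score: float, thresholds: dict[str, int] = THRESHOLDS) -> str:
    """Map a 0-100 score to a decision, checking the strictest threshold first."""
    if score >= thresholds["block"]:
        return "BLOCK"
    if score >= thresholds["review"]:
        return "REVIEW"
    if score >= thresholds["friction"]:
        return "FRICTION"
    return "ALLOW"


# Assumption: scale the 0-1 risk to 0-100 before comparing with thresholds.
card_testing_only = 100 * overall_risk(1.0, 0.0, 0.0, 0.0, 0.0)
decision = decide(card_testing_only)
all_detectors_certain = 100 * overall_risk(1.0, 1.0, 1.0, 1.0, 1.0)
```

With only the card testing detector firing at full confidence, the score is about 63 and the decision is REVIEW. With every detector at full confidence, the formula reaches about 84.5 under the assumed scaling, which is worth keeping in mind when tuning the block threshold.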
Evidence Vault
Immutable storage for dispute resolution:
CREATE TABLE evidence (
    id UUID PRIMARY KEY,
    transaction_id VARCHAR(64) UNIQUE NOT NULL,
    decision VARCHAR(16) NOT NULL,
    scores JSONB NOT NULL,
    signals JSONB NOT NULL,
    features JSONB NOT NULL,
    policy_version VARCHAR(16) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    -- Marker only: CHECK (true) always passes and enforces nothing.
    -- Immutability is enforced by the rules below.
    CONSTRAINT no_updates CHECK (true)
);
-- Prevent updates and deletes (both are silently discarded)
CREATE RULE no_update AS ON UPDATE TO evidence DO INSTEAD NOTHING;
CREATE RULE no_delete AS ON DELETE TO evidence DO INSTEAD NOTHING;
Data Flow
Payment Gateway
         │
         ▼ POST /decide
┌─────────────────┐
│   Idempotency   │───▶ Return cached result if exists
│      Check      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Feature     │───▶ Redis: Get velocity counters
│     Engine      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Detection    │     Card Testing, Velocity, Geo, Bot, Friendly
│     Engine      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│      Risk       │     Combine signals into scores
│     Scoring     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Policy      │     Apply rules, determine decision
│     Engine      │
└────────┬────────┘
         │
         ├──────────▶ Response to gateway
         │
         ▼
┌─────────────────┐
│    Evidence     │───▶ PostgreSQL: Store immutable record
│     Capture     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Update      │───▶ Redis: Increment velocity counters
│    Profiles     │
└─────────────────┘
Monitoring Architecture
┌─────────────────────────────────────────────────┐
│                     Grafana                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │ Decision │  │ Latency  │  │   Volume     │   │
│  │  Rates   │  │  P50/99  │  │  per Hour    │   │
│  └──────────┘  └──────────┘  └──────────────┘   │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                   Prometheus                    │
│  fraud_decisions_total{decision="ALLOW"}        │
│  fraud_decision_latency_seconds                 │
│  fraud_detector_triggered{type="card_testing"}  │
└─────────────────────────────────────────────────┘
                         ▲
                         │
┌─────────────────────────────────────────────────┐
│               Fraud Detection API               │
│                /metrics endpoint                │
└─────────────────────────────────────────────────┘
Key Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| fraud_decisions_total | Decision counts by type | Block rate > 15% |
| fraud_decision_latency_seconds | P50, P95, P99 latency | P99 > 50ms |
| fraud_detector_triggered | Detector fire rates | Card testing > 5% |
| fraud_redis_latency_seconds | Feature lookup time | P99 > 5ms |
| fraud_evidence_queue_size | Pending evidence writes | Size > 100 |
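The arithmetic behind three of these thresholds can be sketched in plain Python. In practice such checks would normally live in the monitoring stack rather than in application code. The `alerts` function, its argument names, and the nearest-rank percentile method are assumptions made for this example.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def alerts(
    decisions: dict[str, int], latencies_s: list[float], evidence_queue_size: int
) -> list[str]:
    """Return the names of alert conditions from the table that currently hold."""
    fired = []
    total = sum(decisions.values())
    if total and decisions.get("BLOCK", 0) / total > 0.15:       # Block rate > 15%
        fired.append("block_rate")
    if latencies_s and percentile(latencies_s, 99) > 0.050:      # P99 > 50ms
        fired.append("p99_latency")
    if evidence_queue_size > 100:                                # Size > 100
        fired.append("evidence_queue")
    return fired


# 20% of decisions are blocks, and 2 of 100 requests took 200ms.
noisy = alerts({"ALLOW": 80, "BLOCK": 20}, [0.005] * 98 + [0.2] * 2, 10)
# Healthy traffic, but the evidence queue has backed up.
backlog = alerts({"ALLOW": 90, "BLOCK": 10}, [0.005] * 100, 101)
```

The first call reports the block rate and P99 latency conditions; the second reports only the evidence queue.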
Scalability Considerations
Current (Sprint-1)
- Single API instance
- Single Redis instance
- Local PostgreSQL
Production Path
- API: Kubernetes deployment with HPA
- Redis: Redis Cluster for sharding
- PostgreSQL: Read replicas for evidence queries
- Add Kafka for event sourcing
- Add Flink for streaming features
The architecture is designed to scale horizontally without changing the core decision logic.