System Architecture

A deep dive into the fraud detection platform's design decisions and component architecture.

Design Principles

1. Latency Budget

Every millisecond matters in payment processing. The system is designed around a strict latency budget:

Operation	Budget	Actual
Redis feature lookup	2ms	1.5ms
Detection engine	3ms	2.8ms
Policy evaluation	2ms	1.2ms
Evidence capture	2ms (async)	1.8ms
Total	10ms	7.3ms
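Evidence capture is asynchronous, so the synchronous actuals sum to 1.5 + 2.8 + 1.2 = 5.5ms. The 7.3ms total includes the 1.8ms evidence write. One way to keep the synchronous budget visible is to time each stage and compare it with the table. Below is a minimal sketch under that assumption. `LatencyBudget` and the stage names are hypothetical and not part of the platform.

```python
import time
from contextlib import contextmanager

# Per-stage budgets in milliseconds, taken from the table above.
# Evidence capture is omitted because it runs after the response is sent.
BUDGETS_MS = {
    "feature_lookup": 2.0,
    "detection": 3.0,
    "policy": 2.0,
}


class LatencyBudget:
    """Hypothetical helper that records how long each stage of one request takes."""

    def __init__(self, budgets_ms):
        self.budgets_ms = budgets_ms
        self.elapsed_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.elapsed_ms[name] = (time.perf_counter() - start) * 1000.0

    def over_budget(self):
        """Return the stages whose measured time exceeded their budget."""
        return {
            name: ms
            for name, ms in self.elapsed_ms.items()
            if ms > self.budgets_ms.get(name, float("inf"))
        }


budget = LatencyBudget(BUDGETS_MS)
with budget.stage("feature_lookup"):
    pass  # the Redis reads would go here
with budget.stage("detection"):
    time.sleep(0.005)  # simulate a 5ms detector pass, over its 3ms budget
print(budget.over_budget())
```

In a real service, the `over_budget()` result would more likely feed a metric than a print statement.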

2. Fail-Safe Defaults

When components fail, the system degrades gracefully:

Redis down - Use cached features, log and proceed
PostgreSQL down - Queue evidence, don't block decisions
Policy error - Fall back to previous known-good policy
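The first two rules can be sketched as small wrappers. This is a sketch under stated assumptions. The `fetch` and `write` callables are hypothetical stand-ins for the Redis and PostgreSQL clients. They are assumed to raise the built-in `ConnectionError` during an outage. A real client raises its own exception types, which would need to be caught or translated. The cache and queue live in process memory.

```python
import logging
from collections import deque

log = logging.getLogger("fraud.failsafe")


class FeatureFallback:
    """Redis down: serve the last features fetched for a key, log, and proceed."""

    def __init__(self, fetch):
        self.fetch = fetch   # hypothetical callable: key -> dict of features
        self.cache = {}      # last successfully fetched features per key

    def get(self, key):
        try:
            features = self.fetch(key)
        except ConnectionError:
            log.warning("feature store unavailable; using cached features for %s", key)
            return self.cache.get(key, {})  # empty features if the key was never seen
        self.cache[key] = features
        return features


class EvidenceQueue:
    """PostgreSQL down: queue evidence records instead of blocking the decision."""

    def __init__(self, write):
        self.write = write       # hypothetical callable: record -> None
        self.pending = deque()   # len(pending) could feed fraud_evidence_queue_size

    def submit(self, record):
        self.pending.append(record)
        self.flush()

    def flush(self):
        # Write in arrival order; stop at the first failure and keep the rest queued.
        while self.pending:
            try:
                self.write(self.pending[0])
            except ConnectionError:
                log.warning("evidence store unavailable; %d records queued", len(self.pending))
                return
            self.pending.popleft()
```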

3. Idempotency

Every decision is idempotent. Retrying the same transaction_id returns the cached result, preventing duplicate processing.
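The idempotency check can be sketched with an in-process store. The document does not say where results are cached. A multi-instance deployment would need a shared store, such as Redis or a lookup on the evidence table's UNIQUE `transaction_id`. `IdempotentDecider` and the `decide` callable are hypothetical. Concurrent retries of the same id wait for the first computation instead of deciding twice.

```python
import threading


class IdempotentDecider:
    """Return the stored decision when a transaction_id is retried."""

    def __init__(self, decide):
        self.decide = decide            # hypothetical: transaction dict -> decision dict
        self.results = {}               # transaction_id -> decision
        self.locks = {}                 # transaction_id -> lock for in-flight requests
        self.guard = threading.Lock()   # protects the two dicts above

    def __call__(self, txn):
        txn_id = txn["transaction_id"]
        with self.guard:
            if txn_id in self.results:
                return self.results[txn_id]
            lock = self.locks.setdefault(txn_id, threading.Lock())
        # Concurrent retries of the same id wait here instead of re-deciding.
        with lock:
            with self.guard:
                if txn_id in self.results:
                    return self.results[txn_id]
            result = self.decide(txn)
            with self.guard:
                self.results[txn_id] = result
                self.locks.pop(txn_id, None)
            return result
```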

Component Architecture

Feature Engine

Computes real-time features from velocity counters stored in Redis:

┌─────────────────────────────────────────────────┐
│                 Feature Engine                   │
├─────────────────────────────────────────────────┤
│  Card Velocity    │ Transactions per hour       │
│  Device Velocity  │ Cards per device per day    │
│  IP Velocity      │ Cards per IP per hour       │
│  User Velocity    │ Amount per user per day     │
└─────────────────────────────────────────────────┘

Redis Key Patterns:

velocity:card:{card_token}:1h   → Transaction count
velocity:device:{device_id}:24h → Unique cards seen
velocity:ip:{ip_hash}:1h        → Unique cards seen
velocity:user:{user_id}:24h     → Total amount
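The key patterns map naturally onto Redis commands:

- INCR for the card count.
- A Set of card tokens per device or IP, so SCARD gives the number of unique cards seen.
- INCRBYFLOAT for the user's total amount.

The sketch below assumes a redis-py client. The document does not say how the 1h and 24h windows are enforced. This sketch assumes a fixed window that starts at the first event, using EXPIRE with the NX option (Redis 7.0+, exposed in redis-py as `expire(..., nx=True)`). The feature names such as `card_txn_1h` are hypothetical.

```python
HOUR = 3600
DAY = 86400


def velocity_keys(txn):
    """Build the four velocity keys for one transaction."""
    return {
        "card": f"velocity:card:{txn['card_token']}:1h",
        "device": f"velocity:device:{txn['device_id']}:24h",
        "ip": f"velocity:ip:{txn['ip_hash']}:1h",
        "user": f"velocity:user:{txn['user_id']}:24h",
    }


def read_features(r, txn):
    """Fetch all four counters in one round trip (r is a redis.Redis client)."""
    k = velocity_keys(txn)
    pipe = r.pipeline(transaction=False)
    pipe.get(k["card"])       # transaction count, stored as an integer string
    pipe.scard(k["device"])   # unique cards seen on this device
    pipe.scard(k["ip"])       # unique cards seen from this IP
    pipe.get(k["user"])       # total amount, stored as a float string
    card, device, ip, user = pipe.execute()
    return {
        "card_txn_1h": int(card or 0),
        "device_cards_24h": device,
        "ip_cards_1h": ip,
        "user_amount_24h": float(user or 0),
    }


def update_profiles(r, txn):
    """Record one transaction in every counter; NX keeps the first event's TTL."""
    k = velocity_keys(txn)
    pipe = r.pipeline(transaction=False)
    pipe.incr(k["card"])
    pipe.expire(k["card"], HOUR, nx=True)
    pipe.sadd(k["device"], txn["card_token"])
    pipe.expire(k["device"], DAY, nx=True)
    pipe.sadd(k["ip"], txn["card_token"])
    pipe.expire(k["ip"], HOUR, nx=True)
    pipe.incrbyfloat(k["user"], txn["amount"])
    pipe.expire(k["user"], DAY, nx=True)
    pipe.execute()
```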

Detection Engine

Five parallel detectors analyze each transaction:

┌─────────────────────────────────────────────────┐
│               Detection Engine                   │
├──────────────┬──────────────┬──────────────────┤
│ Card Testing │   Velocity   │  Geographic      │
│   Detector   │   Detector   │   Detector       │
├──────────────┼──────────────┼──────────────────┤
│     Bot      │   Friendly   │                  │
│   Detector   │    Fraud     │                  │
└──────────────┴──────────────┴──────────────────┘

Each detector returns:

detected: bool - Whether the pattern was found
confidence: float - Confidence score (0-1)
signals: list - Specific triggers that fired
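The detector contract and the parallel fan-out can be sketched as follows. The document does not say how "parallel" is implemented. This sketch assumes asyncio, which interleaves the detectors on one event loop. That works well for I/O-bound detectors; CPU-bound ones would need threads or processes for true parallelism. The card-testing rule and its thresholds are invented for illustration. The other four detectors are placeholders.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class DetectorResult:
    detected: bool                                # whether the pattern was found
    confidence: float                             # confidence score (0-1)
    signals: list = field(default_factory=list)   # specific triggers that fired


async def card_testing(txn, features):
    # Hypothetical rule: tiny charges on a card that is already busy this hour.
    signals = []
    if txn["amount"] < 5:
        signals.append("small_amount")
    if features["card_txn_1h"] >= 10:
        signals.append("high_card_velocity")
    confidence = 0.5 * len(signals)
    return DetectorResult(confidence >= 0.5, confidence, signals)


async def not_triggered(txn, features):
    # Placeholder for the velocity, geographic, bot and friendly fraud detectors.
    return DetectorResult(False, 0.0)


DETECTORS = {
    "card_testing": card_testing,
    "velocity": not_triggered,
    "geo": not_triggered,
    "bot": not_triggered,
    "friendly_fraud": not_triggered,
}


async def run_detectors(txn, features):
    """Run every detector concurrently and key the results by detector name."""
    names = list(DETECTORS)
    results = await asyncio.gather(*(DETECTORS[n](txn, features) for n in names))
    return dict(zip(names, results))


results = asyncio.run(run_detectors({"amount": 1.0}, {"card_txn_1h": 12}))
print(results["card_testing"])
```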

Risk Scoring

Combines detector outputs into actionable scores:

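# Scoring formula (simplified)
# Each detector confidence is in [0, 1] and every weight is <= 1,
# so all three scores below also fall in [0, 1].

# Criminal fraud: the strongest weighted signal wins (no averaging).
criminal_score = max(
    card_testing.confidence * 0.9,
    velocity.confidence * 0.8,
    geo.confidence * 0.7,
    bot.confidence * 0.95
)

# Friendly fraud: a single detector, down-weighted.
friendly_score = friendly_fraud.confidence * 0.6

# Weighted blend; 0.7 + 0.3 = 1.0 keeps the result in [0, 1].
overall_risk = criminal_score * 0.7 + friendly_score * 0.3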
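The formula produces scores in [0, 1], but the policy thresholds are written on a 0-100 scale (block: 80). The runnable version below therefore assumes that `overall_risk` is multiplied by 100 before policy evaluation. That rescaling step is an assumption, not something the document states. `score` is a hypothetical helper that takes detector confidences as a plain dict.

```python
def score(conf):
    """conf maps detector name -> confidence in [0, 1]."""
    criminal = max(
        conf["card_testing"] * 0.9,
        conf["velocity"] * 0.8,
        conf["geo"] * 0.7,
        conf["bot"] * 0.95,
    )
    friendly = conf["friendly_fraud"] * 0.6
    overall = criminal * 0.7 + friendly * 0.3
    # Assumption: rescale to 0-100 so it can be compared with the policy thresholds.
    return {"criminal": criminal, "friendly": friendly, "overall": overall * 100}


s = score({"card_testing": 0.9, "velocity": 0.5, "geo": 0.2, "bot": 0.1, "friendly_fraud": 0.3})
print(round(s["overall"], 1))  # 62.1
```

In this example, card testing dominates (0.9 × 0.9 = 0.81), giving 0.81 × 0.7 + 0.18 × 0.3 = 0.621, or 62.1, which lands in REVIEW.

The weights also cap the overall score. With every confidence at 1.0, it tops out at 0.95 × 0.7 + 0.6 × 0.3 = 0.845 (84.5). Reaching the BLOCK threshold of 80 therefore requires a criminal score of at least about 0.886 even with a maximal friendly score. In practice that means only a near-certain bot or card-testing detection can cross it.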

Policy Engine

Translates scores into business decisions using YAML configuration:

# config/policy.yaml
version: "1.0"

thresholds:
  block: 80      # Score >= 80 → BLOCK
  review: 60     # Score >= 60 → REVIEW
  friction: 40   # Score >= 40 → FRICTION
  # Score < 40 → ALLOW

rules:
  # High-value transactions from new users
  - name: new_user_high_value
    condition: "amount > 500 AND user_age_days < 7"
    action: FRICTION

  # Known bad actors
  - name: blocklist_match
    condition: "card_token IN blocklist"
    action: BLOCK

Hot-Reload Capability:

# Update policy without restart
curl -X POST http://localhost:8000/policy/reload
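The reload endpoint's internals are not described. One way to honour the "fall back to previous known-good policy" rule is to validate a candidate policy before swapping it in. The following is a sketch under that assumption. `PolicyHolder`, the `load` callable and the threshold-ordering check are hypothetical. A handler for /policy/reload could call `reload()` and report its result.

```python
import logging

log = logging.getLogger("fraud.policy")


class PolicyHolder:
    """Holds the active policy; a failed reload keeps the previous one."""

    def __init__(self, load):
        self.load = load                        # hypothetical: () -> policy dict
        self.active = self._validated(load())   # fail fast at startup

    @staticmethod
    def _validated(policy):
        th = policy["thresholds"]
        if not th["block"] > th["review"] > th["friction"]:
            raise ValueError(f"thresholds out of order: {th}")
        return policy

    def reload(self):
        """Return True if the new policy is active, False if the old one was kept."""
        try:
            candidate = self._validated(self.load())
        except Exception:
            log.exception("policy reload failed; keeping version %s", self.active["version"])
            return False
        # A single attribute assignment is atomic in CPython, so in-flight
        # requests see either the old policy or the new one, never a mix.
        self.active = candidate
        return True
```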

Evidence Vault

Immutable storage for dispute resolution:

CREATE TABLE evidence (
    id UUID PRIMARY KEY,
    transaction_id VARCHAR(64) UNIQUE NOT NULL,
    decision VARCHAR(16) NOT NULL,
    scores JSONB NOT NULL,
    signals JSONB NOT NULL,
    features JSONB NOT NULL,
    policy_version VARCHAR(16) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Immutability: a CHECK constraint cannot block UPDATE or DELETE
-- (CHECK (true) accepts every row), so the rules below do the work.
-- They silently discard any UPDATE or DELETE against evidence.
-- Rules do not fire on TRUNCATE; restrict that privilege separately.
CREATE RULE no_update AS ON UPDATE TO evidence DO INSTEAD NOTHING;
CREATE RULE no_delete AS ON DELETE TO evidence DO INSTEAD NOTHING;
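The append-only behaviour can be tried without a PostgreSQL server by using Python's built-in sqlite3 as a stand-in. This is a swapped-in technique, not the production schema. SQLite has no CREATE RULE, so BEFORE triggers that call RAISE(IGNORE) play the role of DO INSTEAD NOTHING: the statement runs without error but changes nothing. Other mappings in this sketch:

- JSONB becomes JSON text.
- `record_evidence` is a hypothetical helper.
- INSERT OR IGNORE makes a retried transaction_id a no-op. In PostgreSQL the analogue is `ON CONFLICT (transaction_id) DO NOTHING`.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE evidence (
    id TEXT PRIMARY KEY,
    transaction_id TEXT UNIQUE NOT NULL,
    decision TEXT NOT NULL,
    scores TEXT NOT NULL,          -- JSON text (JSONB in PostgreSQL)
    signals TEXT NOT NULL,
    features TEXT NOT NULL,
    policy_version TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- RAISE(IGNORE) drops the change without an error, like DO INSTEAD NOTHING.
CREATE TRIGGER no_update BEFORE UPDATE ON evidence BEGIN SELECT RAISE(IGNORE); END;
CREATE TRIGGER no_delete BEFORE DELETE ON evidence BEGIN SELECT RAISE(IGNORE); END;
"""


def record_evidence(conn, transaction_id, decision, scores, signals, features, policy_version):
    """Insert one evidence row; a retried transaction_id is silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO evidence "
        "(id, transaction_id, decision, scores, signals, features, policy_version) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), transaction_id, decision, json.dumps(scores),
         json.dumps(signals), json.dumps(features), policy_version),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```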

Data Flow

Payment Gateway
      │
      ▼ POST /decide
┌─────────────────┐
│   Idempotency   │──▶ Return cached result if exists
│     Check       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Feature      │◀──▶ Redis: Get velocity counters
│    Engine       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Detection     │ Card Testing, Velocity, Geo, Bot, Friendly
│    Engine       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Risk        │ Combine signals into scores
│    Scoring      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Policy       │ Apply rules, determine decision
│    Engine       │
└────────┬────────┘
         │
         ├──────────▶ Response to gateway
         │
         ▼
┌─────────────────┐
│   Evidence      │──▶ PostgreSQL: Store immutable record
│    Capture      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Update       │──▶ Redis: Increment velocity counters
│   Profiles      │
└─────────────────┘
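The diagram's key property is that the gateway gets its answer before evidence capture and profile updates run. That ordering can be sketched with asyncio, though the document does not name a framework, so asyncio is an assumption. Every coroutine is a hypothetical stub, with sleeps standing in for the PostgreSQL and Redis I/O.

```python
import asyncio

events = []         # order in which the steps finish
background = set()  # strong references so pending tasks are not garbage-collected


async def decide(txn):
    # Stand-in for idempotency check, features, detectors, scoring and policy.
    events.append("decision")
    return {"transaction_id": txn["transaction_id"], "decision": "ALLOW"}


async def capture_evidence(txn, result):
    await asyncio.sleep(0.01)  # simulated PostgreSQL insert
    events.append("evidence")


async def update_profiles(txn):
    await asyncio.sleep(0.01)  # simulated Redis counter increments
    events.append("profiles")


async def after_response(txn, result):
    # Same order as the diagram: evidence first, then velocity counters.
    await capture_evidence(txn, result)
    await update_profiles(txn)


async def handle(txn):
    result = await decide(txn)
    task = asyncio.create_task(after_response(txn, result))
    background.add(task)
    task.add_done_callback(background.discard)
    events.append("response")  # the caller gets the result without waiting for the task
    return result


async def main():
    await handle({"transaction_id": "t1"})
    await asyncio.gather(*background)  # demo only: wait so the background work is observable


asyncio.run(main())
print(events)  # ['decision', 'response', 'evidence', 'profiles']
```

Because counters are updated after the decision, a transaction's own charge is not included in the velocity features used to score it.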

Monitoring Architecture

┌─────────────────────────────────────────────────┐
│                   Grafana                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │
│  │ Decision │  │ Latency  │  │   Volume     │  │
│  │  Rates   │  │  P50/99  │  │  per Hour    │  │
│  └──────────┘  └──────────┘  └──────────────┘  │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│                 Prometheus                       │
│  fraud_decisions_total{decision="ALLOW"}        │
│  fraud_decision_latency_seconds                 │
│  fraud_detector_triggered{type="card_testing"}  │
└─────────────────────────────────────────────────┘
                      ▲
                      │
┌─────────────────────────────────────────────────┐
│               Fraud Detection API               │
│            /metrics endpoint                    │
└─────────────────────────────────────────────────┘

Key Metrics

Metric	Description	Alert Threshold
fraud_decisions_total	Decision counts by type	Block rate > 15%
fraud_decision_latency_seconds	P50, P95, P99 latency	P99 > 50ms
fraud_detector_triggered	Detector fire rates	Card testing > 5%
fraud_redis_latency_seconds	Feature lookup time	P99 > 5ms
fraud_evidence_queue_size	Pending evidence writes	Size > 100
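In production these alerts would typically be Prometheus alerting rules over the metrics above. The sketch below only illustrates the threshold arithmetic in plain Python. It rests on several assumptions:

- "Block rate" and "Card testing > 5%" are read as fractions of all decisions.
- The function names and inputs are hypothetical.
- Percentiles use the nearest-rank method. Prometheus's `histogram_quantile` interpolates within histogram buckets, so its values will differ.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (q in 0-100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check_alerts(decisions, latencies_s, redis_latencies_s, card_testing_fired, queue_size):
    """Return the names of alerts whose thresholds from the table are breached."""
    total = sum(decisions.values())
    alerts = []
    if total and decisions.get("BLOCK", 0) / total > 0.15:
        alerts.append("block_rate")
    if latencies_s and percentile(latencies_s, 99) > 0.050:
        alerts.append("decision_latency_p99")
    if total and card_testing_fired / total > 0.05:
        alerts.append("card_testing_rate")
    if redis_latencies_s and percentile(redis_latencies_s, 99) > 0.005:
        alerts.append("redis_latency_p99")
    if queue_size > 100:
        alerts.append("evidence_queue")
    return alerts
```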

Scalability Considerations

Current (Sprint-1)

Single API instance
Single Redis instance
Local PostgreSQL

Production Path

API: Kubernetes deployment with HPA
Redis: Redis Cluster for sharding
PostgreSQL: Read replicas for evidence queries
Add Kafka for event sourcing
Add Flink for streaming features

The architecture is designed to scale horizontally without changing the core decision logic.