Telco Payment Fraud Detection Platform

Principal-Level, Production-Grade Design Document

Phase 1: Real-Time Payment Fraud Decisioning for Telecom / MSP


Overview

This document provides an implementation-ready design for a Telco Payment Fraud Detection Platform. It targets payment fraud in Telco/MSP environments, including SIM farm attacks, device resale fraud, card testing, and account takeover via SIM swap. The design is built to survive real attackers, regulatory scrutiny, business pressure, and production failure.

Target Architecture

  • Real-time decisioning: <200ms end-to-end latency (P99)
  • Exactly-once semantics: Idempotent processing, no duplicate effects
  • Net revenue optimization: Profit-based thresholds, not just fraud blocking
  • Adversarial resilience: Probing detection, threshold rotation, safe mode
  • Governance-ready: PCI/PII boundaries, audit trails, evidence immutability
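
The "profit-based thresholds" item can be illustrated with a small expected-value calculation. The sketch below is not the platform's actual policy. It assumes a simplified model in which approving a legitimate payment earns a fixed margin and approving a fraudulent one costs a fixed loss; the function names and the two-parameter cost model are hypothetical.

```python
def expected_profit_of_approval(p_fraud: float, margin: float, fraud_loss: float) -> float:
    """Expected profit of approving one payment.

    p_fraud    -- model-estimated probability that the payment is fraudulent
    margin     -- profit earned if the payment is legitimate (assumed fixed)
    fraud_loss -- total loss if the payment is fraudulent (assumed fixed)
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be in [0, 1]")
    return (1.0 - p_fraud) * margin - p_fraud * fraud_loss


def break_even_fraud_probability(margin: float, fraud_loss: float) -> float:
    """Fraud probability at which approving has zero expected profit."""
    if margin < 0 or fraud_loss <= 0:
        raise ValueError("margin must be >= 0 and fraud_loss must be > 0")
    # Solve (1 - p) * margin - p * fraud_loss = 0 for p.
    return margin / (margin + fraud_loss)


def should_approve(p_fraud: float, margin: float, fraud_loss: float) -> bool:
    """Approve only when the expected profit of approval is positive."""
    return expected_profit_of_approval(p_fraud, margin, fraud_loss) > 0.0
```

Under this model the decision threshold follows from the economics of the transaction, so a high-margin service plan would tolerate a higher fraud probability than a subsidized device sale with a large potential loss.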

Document Structure

Part 1: Technology Stack & Architecture

  • Battle-hardened technology stack with constraint mappings
  • System architecture diagram with data flows
  • Event types and idempotency design
  • Feedback loop and label hygiene
  • Model lifecycle and rollback paths
  • Failure and attack response controls
  • Latency budget breakdown

Part 2: External Entities, Data Schemas & Features

  • Entity profiling architecture (User, Device, Card, IP, Service)
  • Redis data structures and key patterns
  • Canonical PaymentEvent schema
  • Complete feature catalog with formulas
  • Streaming feature computation (Flink pseudo-code)
  • BIN/issuer intelligence integration
  • PII/PCI compliance boundaries
  • Dispute network integration

Part 3: Detection Logic & Policy Engine

  • Criminal fraud detection
    • Card testing / BIN attack detection
    • Velocity attack detection
    • Geographic anomaly detection
    • Bot / automation detection
  • Friendly fraud detection
    • Historical abuse scoring
    • Behavioral consistency analysis
  • Combined risk scoring with ML integration
  • Policy engine architecture
    • YAML configuration model
    • OPA Rego policies
    • Profit-based threshold optimization
  • Champion/challenger framework

Part 4: Evidence Pipeline, Disputes & Economics

  • Evidence vault architecture
    • Complete evidence schema
    • Capture service implementation
    • Immutability enforcement
  • Dispute pipeline
    • Chargeback ingestion and linking
    • Representment automation
    • Dispute outcome processing
  • Training data pipeline
    • Labeled dataset generation
    • Point-in-time feature retrieval
  • Economic optimization service
    • Approval-loss trade-off analysis
    • Risk budget management
    • Business user interface (API)
  • Key performance metrics

Part 5: Testing, Validation, Monitoring & Checklist

  • Offline validation and replay testing
    • Historical replay framework
    • Model validation pipeline
  • Pre-production acceptance criteria
    • Sprint-1 go/no-go checklist
    • Load testing configuration
  • Production monitoring and alerting
    • Grafana dashboard configuration
    • Alert rules (Prometheus/Alertmanager)
    • Metrics collection code
  • Sprint-1 implementation checklist
    • Infrastructure setup
    • Core services
    • Data pipelines
    • Model and policy
    • Monitoring and observability
    • Testing and validation
    • Documentation
    • Go-live checklist

Part 6: Sprint-1 Implementation Guide

  • Working MVP implementation
    • FastAPI decision endpoint
    • Redis velocity counters
    • PostgreSQL evidence storage
  • Detection scenarios covered
    • Card testing attacks
    • Velocity attacks
    • Geographic anomalies
    • Bot/automation detection
    • Friendly fraud scoring
  • Policy configuration
    • Score thresholds
    • Built-in rules
    • Blocklists and allowlists
  • Getting started guide
    • Docker Compose setup
    • Environment configuration
    • API reference

Part 7: Demo Dashboard

  • Professional Streamlit dashboard for demos
  • Transaction simulator with attack presets
    • Normal transactions
    • Card testing attacks
    • Velocity attacks
    • Geographic anomalies
    • Bot/emulator attacks
    • Friendly fraud scenarios
  • Score visualization
    • Interactive gauge charts
    • Color-coded risk levels
    • Detailed score breakdowns
  • Analytics dashboard
    • Decision distribution charts
    • Hourly volume graphs
    • Latency monitoring
  • Decision history from PostgreSQL
  • Policy inspector with YAML viewer

Quick Start

Sprint-1 Scope

Sprint-1 delivers a minimal viable production design for the core payment fraud slice:

  1. Real-time decision API (<200ms)
  2. Velocity features (card, device, IP, user)
  3. Criminal fraud detection (card testing, velocity, geo, bot)
  4. Friendly fraud scoring (historical abuse, behavioral)
  5. Configurable policy engine
  6. Immutable evidence vault
  7. Basic chargeback ingestion
  8. Production monitoring and alerting
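
As a rough illustration of the velocity features in item 2, the sketch below counts events per entity over a sliding time window. It is an in-memory stand-in only: the design names Redis as the fast state store, and the class name, key format, and window length here are assumptions made for the example. Timestamps are assumed to arrive in non-decreasing order per key.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def _evict(self, timestamps: deque, now: float) -> None:
        # Drop timestamps that fell out of the window (now - window, now].
        cutoff = now - self.window_seconds
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()

    def record(self, key: str, timestamp: float) -> int:
        """Record one event and return the current in-window count."""
        timestamps = self._events[key]
        timestamps.append(timestamp)
        self._evict(timestamps, timestamp)
        return len(timestamps)

    def count(self, key: str, now: float) -> int:
        """Return the in-window count without recording a new event."""
        timestamps = self._events.get(key)
        if timestamps is None:
            return 0
        self._evict(timestamps, now)
        return len(timestamps)


# Hypothetical usage: attempts per tokenized card in the last 60 seconds.
counter = SlidingWindowCounter(window_seconds=60)
counter.record("card:tok_123", 0.0)
counter.record("card:tok_123", 30.0)
attempts = counter.record("card:tok_123", 61.0)  # the event at t=0 has expired
```

One counter instance per entity type (card, device, IP, user) and per window length would cover the four velocity dimensions listed above.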

Key Technologies

Component            Choice
Streaming            Apache Kafka
Stream Processing    Apache Flink
Fast State           Redis Cluster
Feature Store        Feast + Delta Lake
Policy Engine        Open Policy Agent (OPA)
Model Serving        Seldon Core / KServe
Evidence Storage     PostgreSQL + S3
Observability        Prometheus + Grafana

Non-Negotiable Constraints

  • Latency: <200ms end-to-end (P99)
  • Idempotency: Exactly-once business effects
  • Availability: 99.9% uptime
  • PCI Compliance: No raw PAN in fraud platform

Key Metrics

Metric                  Target
Approval Rate           >92%
Fraud Detection Rate    >70% of known fraud
False Positive Rate     <10% of blocks
P99 Latency             <200ms
Dispute Win Rate        >50%
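
To make the first three targets concrete, the sketch below computes them from labeled decision counts. The interpretation is an assumption drawn from the wording in the table: Fraud Detection Rate is read as blocked fraud over all known fraud, and False Positive Rate as legitimate transactions among all blocks. The `DecisionCounts` shape and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionCounts:
    """Labeled outcome counts for one reporting period."""
    approved_legit: int
    approved_fraud: int  # fraud that was missed
    blocked_legit: int   # false positives
    blocked_fraud: int   # fraud that was caught


def _ratio(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("empty population: metric is undefined")
    return numerator / denominator


def approval_rate(c: DecisionCounts) -> float:
    """Approved transactions as a share of all transactions."""
    approved = c.approved_legit + c.approved_fraud
    return _ratio(approved, approved + c.blocked_legit + c.blocked_fraud)


def fraud_detection_rate(c: DecisionCounts) -> float:
    """Blocked fraud as a share of all known fraud."""
    return _ratio(c.blocked_fraud, c.blocked_fraud + c.approved_fraud)


def false_positive_share_of_blocks(c: DecisionCounts) -> float:
    """Legitimate transactions as a share of all blocked transactions."""
    return _ratio(c.blocked_legit, c.blocked_legit + c.blocked_fraud)


# Hypothetical counts for illustration only, not measured results.
counts = DecisionCounts(approved_legit=930, approved_fraud=10,
                        blocked_legit=5, blocked_fraud=55)
```

Because fraud labels typically arrive late (for example through chargebacks), these ratios are only meaningful for periods whose labels have matured.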

Phase 2 (Future)

  • IRSF (International Revenue Share Fraud) - Enhanced detection
  • Account Takeover (ATO) - SIM swap correlation
  • Subscription Fraud - Multi-account abuse
  • Batch & Long-Horizon Analytics
  • Automated representment
  • Advanced ML (graph neural networks, sequence models for fraud rings)

Author

Uday Tamma

Principal-level design document for Telecom/MSP fraud detection platform.

Document Version: 1.1
Last Updated: January 2026