Telco Payment Fraud Detection Platform
Principal-Level, Production-Grade Design Document
Phase 1: Real-Time Payment Fraud Decisioning for Telecom / MSP
Overview
This document provides a comprehensive, implementation-ready design for a Telco Payment Fraud Detection Platform. It targets payment fraud in Telco/MSP environments, including SIM farm attacks, device resale fraud, card testing, and account takeover via SIM swap. The design is built to withstand real attackers, regulatory scrutiny, business pressure, and production failures.
Target Architecture
- Real-time decisioning: <200ms end-to-end latency (P99)
- Exactly-once semantics: Idempotent processing, no duplicate effects
- Net revenue optimization: Profit-based thresholds, not just fraud blocking
- Adversarial resilience: Probing detection, threshold rotation, safe mode
- Governance-ready: PCI/PII boundaries, audit trails, evidence immutability
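The net revenue optimization goal can be illustrated with a break-even decision rule. The sketch below is a minimal example built on three assumptions. First, the model emits a calibrated fraud probability. Second, approving a fraudulent payment costs the full amount plus a fixed chargeback fee. Third, approving a legitimate payment earns a margin. `CostModel`, `margin_rate`, `chargeback_fee`, and the example figures are hypothetical and are not values from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical, illustrative economics; real values would come from finance."""
    margin_rate: float     # margin earned on an approved legitimate payment
    chargeback_fee: float  # fixed fee per fraud chargeback (currency units)


def break_even_probability(amount: float, costs: CostModel) -> float:
    """Fraud probability at which approving has zero expected profit.

    Expected profit of approving = (1 - p) * gain - p * loss, with
    gain = amount * margin_rate and loss = amount + chargeback_fee.
    Setting it to zero gives p* = gain / (gain + loss).
    """
    gain = amount * costs.margin_rate
    loss = amount + costs.chargeback_fee
    return gain / (gain + loss)


def decide(fraud_probability: float, amount: float, costs: CostModel) -> str:
    # Approve only while the expected margin outweighs the expected fraud loss.
    if fraud_probability < break_even_probability(amount, costs):
        return "APPROVE"
    return "DECLINE"


costs = CostModel(margin_rate=0.30, chargeback_fee=15.0)
print(break_even_probability(50.0, costs))  # 0.1875
print(decide(0.10, 5.0, costs))             # DECLINE
print(decide(0.10, 50.0, costs))            # APPROVE
```

Because the chargeback fee is fixed, small payments get a stricter cut-off than large ones. With these assumed figures, the same 10% fraud probability declines a 5.00 payment (break-even about 7%) but approves a 50.00 payment (break-even 18.75%). A production optimizer would also price customer friction and manual review, which this sketch leaves out.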
Document Structure
Part 1: Technology Stack & Architecture
- Battle-hardened technology stack with constraint mappings
- System architecture diagram with data flows
- Event types and idempotency design
- Feedback loop and label hygiene
- Model lifecycle and rollback paths
- Failure and attack response controls
- Latency budget breakdown
Part 2: External Entities, Data Schemas & Features
- Entity profiling architecture (User, Device, Card, IP, Service)
- Redis data structures and key patterns
- Canonical PaymentEvent schema
- Complete feature catalog with formulas
- Streaming feature computation (Flink pseudo-code)
- BIN/issuer intelligence integration
- PII/PCI compliance boundaries
- Dispute network integration
Part 3: Detection Logic & Policy Engine
- Criminal fraud detection
  - Card testing / BIN attack detection
  - Velocity attack detection
  - Geographic anomaly detection
  - Bot / automation detection
- Friendly fraud detection
  - Historical abuse scoring
  - Behavioral consistency analysis
- Combined risk scoring with ML integration
- Policy engine architecture
  - YAML configuration model
  - OPA Rego policies
- Profit-based threshold optimization
- Champion/challenger framework
Part 4: Evidence Pipeline, Disputes & Economics
- Evidence vault architecture
  - Complete evidence schema
  - Capture service implementation
  - Immutability enforcement
- Dispute pipeline
  - Chargeback ingestion and linking
  - Representment automation
  - Dispute outcome processing
- Training data pipeline
  - Labeled dataset generation
  - Point-in-time feature retrieval
- Economic optimization service
  - Approval-loss trade-off analysis
  - Risk budget management
  - Business user interface (API)
- Key performance metrics
Part 5: Testing, Validation, Monitoring & Checklist
- Offline validation and replay testing
  - Historical replay framework
  - Model validation pipeline
  - Pre-production acceptance criteria
  - Sprint-1 go/no-go checklist
  - Load testing configuration
- Production monitoring and alerting
  - Grafana dashboard configuration
  - Alert rules (Prometheus/Alertmanager)
  - Metrics collection code
- Sprint-1 implementation checklist
  - Infrastructure setup
  - Core services
  - Data pipelines
  - Model and policy
  - Monitoring and observability
  - Testing and validation
  - Documentation
  - Go-live checklist
Part 6: Sprint-1 Implementation Guide
- Working MVP implementation
  - FastAPI decision endpoint
  - Redis velocity counters
  - PostgreSQL evidence storage
- Detection scenarios covered
  - Card testing attacks
  - Velocity attacks
  - Geographic anomalies
  - Bot/automation detection
  - Friendly fraud scoring
- Policy configuration
  - Score thresholds
  - Built-in rules
  - Blocklists and allowlists
- Getting started guide
  - Docker Compose setup
  - Environment configuration
  - API reference
Part 7: Demo Dashboard
- Professional Streamlit dashboard for demos
- Transaction simulator with attack presets
  - Normal transactions
  - Card testing attacks
  - Velocity attacks
  - Geographic anomalies
  - Bot/emulator attacks
  - Friendly fraud scenarios
- Score visualization
  - Interactive gauge charts
  - Color-coded risk levels
  - Detailed score breakdowns
- Analytics dashboard
  - Decision distribution charts
  - Hourly volume graphs
  - Latency monitoring
  - Decision history from PostgreSQL
- Policy inspector with YAML viewer
Quick Start
Sprint-1 Scope
Sprint-1 delivers a minimum viable production design for the core payment fraud slice:
- Real-time decision API (<200ms P99)
- Velocity features (card, device, IP, user)
- Criminal fraud detection (card testing, velocity, geo, bot)
- Friendly fraud scoring (historical abuse, behavioral)
- Configurable policy engine
- Immutable evidence vault
- Basic chargeback ingestion
- Production monitoring and alerting
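Velocity features for card, device, IP, and user are typically sliding-window event counts keyed by entity. The sketch below is an in-memory stand-in that assumes monotonically increasing event timestamps. In the Sprint-1 design this state lives in Redis. A common pattern there, assumed rather than specified by this document, is one sorted set per key, maintained with `ZADD`, `ZREMRANGEBYSCORE`, and `ZCARD`. The class name and the `vel:card:<token>` key format are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowVelocity:
    """Counts events per entity key within the last `window_seconds` (hypothetical)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_and_count(self, key: str, now: Optional[float] = None) -> int:
        """Record one event for `key` and return the count in (now - window, now].

        Assumes timestamps arrive in non-decreasing order per key, so the
        oldest event is always at the left end of the deque.
        """
        now = time.time() if now is None else now
        events = self._events[key]
        events.append(now)
        # Evict timestamps that have fallen out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        return len(events)


card_velocity = SlidingWindowVelocity(window_seconds=60)
key = "vel:card:tok_123"  # hypothetical key format for a tokenized card
counts = [card_velocity.record_and_count(key, now=t) for t in (0, 10, 20, 30, 75)]
print(counts)  # [1, 2, 3, 4, 3]
```

At t=75 the events from t=0 and t=10 fall outside the 60-second window, so the count drops back to 3. A detection rule would compare such counts against per-entity thresholds from the policy configuration.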
Key Technologies
| Component | Choice |
|---|---|
| Streaming | Apache Kafka |
| Stream Processing | Apache Flink |
| Fast State | Redis Cluster |
| Feature Store | Feast + Delta Lake |
| Policy Engine | Open Policy Agent (OPA) |
| Model Serving | Seldon Core / KServe |
| Evidence Storage | PostgreSQL + S3 |
| Observability | Prometheus + Grafana |
Non-Negotiable Constraints
- Latency: <200ms end-to-end (P99)
- Idempotency: Exactly-once business effects
- Availability: 99.9% uptime
- PCI Compliance: No raw PAN in fraud platform
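The idempotency constraint means a retried or redelivered request must return the original decision without repeating its effects. This is a minimal sketch, assuming the client supplies one idempotency key per payment event. The first call runs the decision and stores the result together with a fingerprint of the request body. Replays return the stored result, and reusing a key with a different body is rejected. The class and exception names are hypothetical. The in-process dict and lock stand in for a shared store, such as a PostgreSQL unique constraint or Redis `SET` with the `NX` option.

```python
import hashlib
import json
import threading
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """The same idempotency key was reused with a different request body."""


class IdempotentDecider:
    """Wraps a decision function so each idempotency key takes effect once (hypothetical)."""

    def __init__(self, decide: Callable[[Dict[str, Any]], Dict[str, Any]]):
        self._decide = decide
        self._results: Dict[str, Tuple[str, Dict[str, Any]]] = {}
        self._lock = threading.Lock()
        self.side_effects = 0  # how many times the wrapped decision actually ran

    @staticmethod
    def _fingerprint(payload: Dict[str, Any]) -> str:
        # Canonical JSON so key order does not change the fingerprint.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, idempotency_key: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        fingerprint = self._fingerprint(payload)
        with self._lock:
            cached = self._results.get(idempotency_key)
            if cached is not None:
                stored_fingerprint, stored_result = cached
                if stored_fingerprint != fingerprint:
                    raise IdempotencyConflict(idempotency_key)
                return stored_result  # replay: no new side effects
            result = self._decide(payload)
            self.side_effects += 1
            self._results[idempotency_key] = (fingerprint, result)
            return result


decider = IdempotentDecider(
    lambda p: {"decision": "APPROVE" if p["amount"] < 100 else "REVIEW"}
)
first = decider.handle("evt-001", {"amount": 40})
retry = decider.handle("evt-001", {"amount": 40})  # client retry after a timeout
print(first == retry, decider.side_effects)  # True 1
```

Holding a single lock while deciding serializes every request. That is acceptable for a sketch but not at production throughput, where a service would lock per key or rely on the store's atomic insert.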
Key Metrics
| Metric | Target |
|---|---|
| Approval Rate | >92% |
| Fraud Detection Rate | >70% of known fraud |
| False Positive Rate | <10% of blocks are legitimate payments |
| P99 Latency | <200ms |
| Dispute Win Rate | >50% |
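The targets above can be computed from labeled decision outcomes. This is a minimal sketch, assuming every decision eventually receives a fraud or legitimate label. In practice, blocked payments are often unlabeled and need review or sampling. The sketch reads "False Positive Rate" as the share of blocks that were legitimate, as the table defines it. Field and function names are hypothetical, and the example counts are made up for illustration. P99 latency is omitted because it comes from monitoring histograms, not outcome counts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DecisionOutcomes:
    approved_legit: int
    approved_fraud: int  # fraud that was approved (surfaces later as chargebacks)
    blocked_legit: int   # legitimate payments that were blocked
    blocked_fraud: int
    disputes_won: int
    disputes_lost: int


def _ratio(numerator: int, denominator: int) -> float:
    # NaN marks "no data"; NaN comparisons are False, so it never passes a target.
    return numerator / denominator if denominator else float("nan")


def kpis(o: DecisionOutcomes) -> Dict[str, float]:
    approved = o.approved_legit + o.approved_fraud
    blocked = o.blocked_legit + o.blocked_fraud
    known_fraud = o.approved_fraud + o.blocked_fraud
    return {
        "approval_rate": _ratio(approved, approved + blocked),
        "fraud_detection_rate": _ratio(o.blocked_fraud, known_fraud),
        "false_positive_rate": _ratio(o.blocked_legit, blocked),
        "dispute_win_rate": _ratio(o.disputes_won, o.disputes_won + o.disputes_lost),
    }


# Targets from the Key Metrics table: (direction, threshold).
TARGETS = {
    "approval_rate": (">", 0.92),
    "fraud_detection_rate": (">", 0.70),
    "false_positive_rate": ("<", 0.10),
    "dispute_win_rate": (">", 0.50),
}


def target_status(metrics: Dict[str, float]) -> Dict[str, bool]:
    status = {}
    for name, (direction, threshold) in TARGETS.items():
        value = metrics[name]
        status[name] = value > threshold if direction == ">" else value < threshold
    return status


week = DecisionOutcomes(approved_legit=9300, approved_fraud=30, blocked_legit=10,
                        blocked_fraud=120, disputes_won=12, disputes_lost=8)
metrics = kpis(week)
print(metrics)
print(target_status(metrics))
```

With these illustrative counts, the approval rate is about 98.6%, detection is 80% (120 of 150 known fraud), 10 of 130 blocks (about 7.7%) are legitimate, and the dispute win rate is 60%, so every target passes.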
Phase 2 (Future)
- IRSF (International Revenue Share Fraud) - Enhanced detection
- Account Takeover (ATO) - SIM swap correlation
- Subscription Fraud - Multi-account abuse
- Batch & Long-Horizon Analytics
- Automated representment
- Advanced ML (graph neural networks, sequence models for fraud rings)
Author
Uday Tamma
Principal-level design document for Telecom/MSP fraud detection platform.
Document Version: 1.1
Last Updated: January 2026