Telco Payment Fraud Detection Platform

Executive Overview

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Problem & Context

Business Context

A mid-size Telco/MSP processes ~150M payment authorization attempts per year across prepaid top-ups, postpaid billing, device financing, and value-added services. The platform handles SIM activations, device upgrades, mobile top-ups, and international service enablement.

| Challenge | Current State | Business Impact |
|---|---|---|
| Annual Fraud Loss | $2.4M+ (1.8% of payment volume) | Direct P&L hit |
| False Positive Rate | 18% of blocks are legitimate | $800K+ lost revenue annually |
| Decision Latency | 2-3 seconds (batch scoring) | Poor UX, cart abandonment |
| Manual Review Volume | 12% of transactions | $400K+ ops cost, 4-hour SLA |
| Chargeback Win Rate | 22% | Recoverable losses left on table |

Root Cause Analysis

  1. Batch-based detection cannot catch velocity attacks that complete in minutes
  2. Static rules cannot adapt to evolving fraud patterns (SIM farms, device resale rings)
  3. Insufficient evidence capture leads to losing winnable disputes
  4. Lack of profit-based thresholds results in over-blocking legitimate customers

Goals & Constraints

Target Metrics

| Metric | Current | Target | Constraint |
|---|---|---|---|
| Approval Rate | 88% | >92% | Cannot drop below 90% |
| Fraud Rate | 1.8% | <0.8% | Industry benchmark |
| P99 Latency | 2,300ms | <200ms | Hard SLA requirement |
| Manual Review | 12% | <3% | Ops budget constraint |
| Dispute Win Rate | 22% | >50% | Evidence quality dependent |
| False Positive Rate | 18% | <10% | Customer experience KPI |

Non-Negotiable Constraints

  • <200ms P99 latency - Payments cannot wait for fraud decisions
  • Exactly-once semantics - No duplicate charges or blocks (see the idempotency sketch after this list)
  • PCI/PII compliance - No raw PAN in fraud platform
  • 99.9% availability - Revenue-critical path
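
The exactly-once constraint is typically met by keying every decision on the gateway's transaction ID and replaying the stored result on retries. Below is a minimal sketch, assuming a hypothetical Redis-backed idempotency cache; the key prefix and TTL are illustrative, not part of the current implementation:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

IDEMPOTENCY_TTL_S = 24 * 3600  # keep decisions long enough to absorb gateway retries

def decide_once(txn_id: str, decide_fn, request: dict) -> dict:
    """Return the stored decision for txn_id if one exists; otherwise compute and store it."""
    key = f"fraud:decision:{txn_id}"           # hypothetical key prefix
    cached = r.get(key)
    if cached:
        return json.loads(cached)              # retry of a payment we already decided
    decision = decide_fn(request)              # run features -> detection -> scoring -> policy
    # SET NX ensures a concurrent retry cannot overwrite the first stored decision.
    if not r.set(key, json.dumps(decision), nx=True, ex=IDEMPOTENCY_TTL_S):
        return json.loads(r.get(key))
    return decision
```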

Internal Latency Budget

| Component | Budget | Actual |
|---|---|---|
| Feature lookup (Redis) | 50ms | 50ms |
| Detection engine | 30ms | 20ms |
| Risk scoring | 20ms | 20ms |
| Policy evaluation | 15ms | 10ms |
| Evidence capture (async) | 30ms | 20ms |
| Total E2E | <200ms | 106ms |

Solution at a Glance

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Payment Gateway                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚ POST /decide (<200ms)
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Fraud Detection API                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ Feature  β”‚  β”‚Detection β”‚  β”‚  Risk    β”‚  β”‚  Policy  β”‚    β”‚
β”‚  β”‚ Engine   β”‚  β”‚ Engine   β”‚  β”‚ Scoring  β”‚  β”‚  Engine  β”‚    β”‚
β”‚  β”‚  (50ms)  β”‚  β”‚  (20ms)  β”‚  β”‚  (20ms)  β”‚  β”‚  (10ms)  β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚             β”‚             β”‚             β”‚
   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”                 β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
   β”‚  Redis  β”‚   β”‚ Detect  β”‚                 β”‚  YAML   β”‚
   β”‚Velocity β”‚   β”‚5 Signal β”‚                 β”‚Hot-Load β”‚
   β”‚Counters β”‚   β”‚ Types   β”‚                 β”‚ Config  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   PostgreSQL    β”‚
                    β”‚ Evidence Vault  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Design Choices

| Decision | Choice | Rationale |
|---|---|---|
| Streaming vs. Batch | Real-time API | Velocity attacks complete in minutes |
| ML vs. Rules (Phase 1) | Rule-based with ML hooks | Faster to market, interpretable |
| Feature Store | Redis velocity counters | Sub-ms lookups, sliding windows |
| Policy Engine | YAML + hot-reload | Business can adjust without deploys |
| Evidence Storage | PostgreSQL (immutable) | Dispute representment requirement |
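
To make the YAML + hot-reload choice concrete, here is a minimal sketch of a policy loader that re-reads the file only when it changes, so analysts can adjust thresholds without a deploy. The rule schema and file path are hypothetical:

```python
import os
import yaml  # PyYAML

# Hypothetical policy file, e.g. policies/fraud_policy.yaml:
#   default_action: APPROVE
#   rules:
#     - name: card_testing_velocity
#       when: {signal: card_testing, min_score: 0.7}
#       action: DECLINE
#     - name: high_value_new_device
#       when: {amount_cents_gte: 50000, device_age_days_lt: 1}
#       action: REVIEW

class PolicyEngine:
    def __init__(self, path: str = "policies/fraud_policy.yaml"):
        self.path = path
        self._mtime = 0.0
        self._policy = {}

    def current(self) -> dict:
        """Re-parse the YAML only when the file's mtime changes; evaluation stays in-memory."""
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path) as f:
                self._policy = yaml.safe_load(f)
            self._mtime = mtime
        return self._policy
```

Because the engine re-reads only on change, a threshold update takes effect within one request with no service restart.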

Detection Coverage (5 Signal Types)

  1. Card Testing - Rapid small transactions, BIN probing, decline patterns
  2. Velocity Attacks - Multi-card device, multi-device card, IP clustering
  3. Geographic Anomaly - Country mismatch, impossible travel, datacenter IPs
  4. Bot/Automation - Emulators, rooted devices, Tor exit nodes
  5. Friendly Fraud - Historical chargebacks, refund abuse patterns
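
As an illustration of how the velocity-based signals above could be computed, here is a sketch using Redis sorted sets as sliding-window counters feeding a simple card-testing heuristic; key names and thresholds are illustrative:

```python
import time
import uuid
import redis

r = redis.Redis(decode_responses=True)

def bump_and_count(entity: str, entity_id: str, window_s: int = 600) -> int:
    """Record one attempt for an entity and return how many attempts fall in the sliding window."""
    now = time.time()
    key = f"vel:{entity}:{entity_id}"                         # hypothetical key layout
    pipe = r.pipeline()
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex[:8]}": now})    # unique member per attempt
    pipe.zremrangebyscore(key, 0, now - window_s)             # evict attempts outside the window
    pipe.zcard(key)
    pipe.expire(key, window_s)
    return pipe.execute()[2]

def card_testing_signal(card_token: str, device_id: str, amount_cents: int) -> float:
    """Crude card-testing heuristic: many small authorizations from one card or device in 10 minutes."""
    card_attempts = bump_and_count("card", card_token)
    device_attempts = bump_and_count("device", device_id)
    if amount_cents < 500 and (card_attempts > 10 or device_attempts > 20):
        return 0.9
    return min(0.9, 0.05 * max(card_attempts, device_attempts))
```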

Phased Roadmap

Phase 1: MVP (Sprint 1-2) - COMPLETE

Real-time decisioning foundation with rule-based detection.

Deliverables:

  • Decision API with <200ms P99 latency
  • 5 detection signal types
  • Redis velocity counters (card, device, IP, user)
  • YAML policy engine with hot-reload
  • Immutable evidence vault (write path sketched after this list)
  • Prometheus/Grafana monitoring
  • 45+ unit tests, load tested to 1000+ RPS
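
A minimal sketch of the evidence-vault write path referenced above, assuming an illustrative table layout; immutability comes from granting the service INSERT/SELECT only:

```python
import psycopg2
from psycopg2.extras import Json

# Illustrative layout; the actual DDL may differ:
#   CREATE TABLE evidence (
#       id BIGSERIAL PRIMARY KEY,
#       transaction_id TEXT NOT NULL,
#       decided_at TIMESTAMPTZ NOT NULL DEFAULT now(),
#       decision TEXT NOT NULL,
#       features JSONB NOT NULL,        -- point-in-time snapshot the decision actually saw
#       policy_version TEXT NOT NULL
#   );
# The fraud service role is granted INSERT and SELECT only, so rows cannot be altered later.

def record_evidence(conn, txn_id: str, decision: str, features: dict, policy_version: str) -> None:
    """Append one evidence row; called off the critical path (async) after the decision returns."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO evidence (transaction_id, decision, features, policy_version) "
            "VALUES (%s, %s, %s, %s)",
            (txn_id, decision, Json(features), policy_version),
        )
    conn.commit()
```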

Current Status: MVP complete, ready for shadow deployment

Phase 2: Hybrid ML + Experiments (Sprint 3-4)

Layer ML scoring while maintaining policy control.

Deliverables:

  • XGBoost/LightGBM criminal fraud model
  • Champion/challenger experiment framework
  • Historical replay for threshold simulation
  • Economic optimization UI for business users (see the expected-value sketch after this list)
  • Automated chargeback ingestion and labeling
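
To show what economic optimization means in practice, here is a sketch of an expected-value decision rule rather than a fixed score cutoff; the margin and chargeback-fee figures are assumptions for illustration:

```python
def expected_value_of_approval(amount_cents: int, p_fraud: float,
                               margin: float = 0.08, chargeback_fee_cents: int = 2500) -> float:
    """Expected profit (cents) of approving: earn the margin if legitimate,
    lose the full amount plus a chargeback fee if fraudulent."""
    gain_if_legit = margin * amount_cents
    loss_if_fraud = amount_cents + chargeback_fee_cents
    return (1 - p_fraud) * gain_if_legit - p_fraud * loss_if_fraud

def decide(amount_cents: int, p_fraud: float) -> str:
    # Approve only when the expected value of approving is positive; the break-even
    # fraud probability therefore depends on amount and margin, not one global cutoff.
    return "APPROVE" if expected_value_of_approval(amount_cents, p_fraud) > 0 else "DECLINE"
```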

ML Model Specification:

  • Features: 25+ velocity + behavioral + entity features
  • Labels: Chargebacks linked with 120-day maturity window
  • Training: Weekly retraining with point-in-time features
  • Deployment: Shadow mode first, then 10% traffic ramp
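
A sketch of the weekly training job implied by this specification, assuming an illustrative feature table with point-in-time features and chargeback labels matured for 120 days; column names are hypothetical:

```python
import pandas as pd
from lightgbm import LGBMClassifier          # XGBoost is an equivalent drop-in choice
from sklearn.metrics import roc_auc_score

def train_weekly(features_path: str = "training/features.parquet"):
    df = pd.read_parquet(features_path)
    # Keep only transactions old enough for chargebacks to have arrived (120-day maturity).
    mature = df[df["days_since_txn"] >= 120]
    X = mature.drop(columns=["is_fraud", "transaction_id", "txn_week", "days_since_txn"])
    y = mature["is_fraud"]
    # Time-based split: train on older weeks, validate on the most recent mature week,
    # which mirrors how the model will be used (point-in-time features, no leakage).
    cutoff = mature["txn_week"].max()
    train, valid = mature["txn_week"] < cutoff, mature["txn_week"] == cutoff
    model = LGBMClassifier(n_estimators=400, learning_rate=0.05)
    model.fit(X[train], y[train])
    print("validation AUC:", roc_auc_score(y[valid], model.predict_proba(X[valid])[:, 1]))
    return model
```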

Phase 3: Scale & External Signals (Sprint 5-6)

Production hardening and expanded detection.

Deliverables:

  • Multi-region deployment (Redis Cluster, PostgreSQL replicas, Kafka event sourcing)
  • External signal integration: TC40/SAFE issuer alerts, Ethoca/Verifi network alerts, BIN intelligence and device reputation, consortium fraud data
  • Enhanced analyst tooling (case management, bulk actions, playbooks)
  • IRSF detection for international calls
  • SIM swap correlation for ATO detection

Impact Summary

Projected Before/After Metrics

| Metric | Before | After (Phase 1) | After (Phase 2) | Methodology |
|---|---|---|---|---|
| Approval Rate | 88% | 91% | 93% | Threshold optimization |
| Fraud Rate | 1.80% | 1.20% | 0.75% | Velocity detection |
| P99 Latency | 2,300ms | 106ms | 120ms | Measured in load test |
| Manual Review | 12% | 5% | 2% | Automation + confidence |
| False Positives | 18% | 12% | 8% | Better signals |
| Dispute Win Rate | 22% | 40% | 55% | Evidence capture |

Financial Impact Model

| Line Item | Annual Impact |
|---|---|
| Fraud loss reduction (1.05 percentage-point improvement) | +$1,400,000 |
| False positive recovery (6 percentage-point improvement) | +$300,000 |
| Ops cost reduction (10 percentage points less manual review) | +$200,000 |
| Dispute win improvement (+28 percentage points in win rate) | +$150,000 |
| Net Annual Benefit | +$2,050,000 |
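
The first line item follows directly from the before/after metrics. A rough reconstruction, with the assumptions shown inline; the remaining items scale the baseline losses from the Problem & Context table and are estimates, not measured values:

```python
# Annual payment volume implied by the baseline fraud figures:
annual_volume = 2_400_000 / 0.018                        # ≈ $133M

# Fraud loss reduction: fraud rate falls from 1.8% to 0.75% (1.05 percentage points):
fraud_loss_reduction = (0.018 - 0.0075) * annual_volume  # ≈ $1.4M, matching the first line item
```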

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Redis failure | Low | High | Fallback to safe mode, cached features |
| ML model drift | Medium | Medium | Weekly retraining, PSI monitoring |
| Threshold misconfiguration | Medium | High | Replay testing, gradual rollout |
| Attack pattern evolution | High | Medium | Champion/challenger experiments |
| Integration delays | Medium | Medium | Shadow mode allows parallel testing |

Executive Recommendation

Proceed with Phase 2 deployment based on:

  1. Phase 1 MVP meets all technical SLAs (106ms P99 vs 200ms target)
  2. Load testing validates 1000+ RPS capacity (4x current peak)
  3. Rule-based detection provides immediate value while ML matures
  4. Evidence capture infrastructure enables dispute win rate improvement
  5. Hot-reload policy engine allows business-led threshold tuning

Next Actions:

  1. Shadow deployment to production traffic (week 1)
  2. ML model training with labeled historical data (weeks 1-2)
  3. Champion/challenger framework implementation (weeks 2-3)
  4. 10% traffic experiment with ML scoring (week 4)

This document is intended for VP/Director-level stakeholders. For technical details, see the Technical Overview section.