Telco Payment Fraud Detection Platform

Executive Overview

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Problem & Context

Business Context

A mid-size Telco/MSP processes ~150M payment authorization attempts per year across prepaid top-ups, postpaid billing, device financing, and value-added services. The platform handles SIM activations, device upgrades, mobile top-ups, and international service enablement.

| Challenge | Current State | Business Impact |
|---|---|---|
| Annual Fraud Loss | $2.4M+ (1.8% of payment volume) | Direct P&L hit |
| False Positive Rate | 18% of blocks are legitimate | $800K+ lost revenue annually |
| Decision Latency | 2-3 seconds (batch scoring) | Poor UX, cart abandonment |
| Manual Review Volume | 12% of transactions | $400K+ ops cost, 4-hour SLA |
| Chargeback Win Rate | 22% | Recoverable losses left on table |

Root Cause Analysis

  1. Batch-based detection cannot catch velocity attacks that complete in minutes
  2. Static rules cannot adapt to evolving fraud patterns (SIM farms, device resale rings)
  3. Insufficient evidence capture leads to losing winnable disputes
  4. Lack of profit-based thresholds results in over-blocking legitimate customers

Goals & Constraints

Target Metrics

| Metric | Current | Target | Constraint |
|---|---|---|---|
| Approval Rate | 88% | >92% | Cannot drop below 90% |
| Fraud Rate | 1.8% | <0.8% | Industry benchmark |
| P99 Latency | 2,300ms | <200ms | Hard SLA requirement |
| Manual Review | 12% | <3% | Ops budget constraint |
| Dispute Win Rate | 22% | >50% | Evidence quality dependent |
| False Positive Rate | 18% | <10% | Customer experience KPI |

Non-Negotiable Constraints

  • <200ms P99 latency - Payments cannot wait for fraud decisions
  • Exactly-once semantics - No duplicate charges or blocks (see the idempotency sketch after this list)
  • PCI/PII compliance - No raw PAN in fraud platform
  • 99.9% availability - Revenue-critical path
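
The exactly-once constraint is typically met by keying every decision on the gateway's transaction ID and replaying the stored result on retries. Below is a minimal sketch, assuming a hypothetical Redis-backed idempotency cache; the key prefix and TTL are illustrative, not part of the current implementation:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

IDEMPOTENCY_TTL_S = 24 * 3600  # keep decisions long enough to absorb gateway retries

def decide_once(txn_id: str, decide_fn, request: dict) -> dict:
    """Return the stored decision for txn_id if one exists; otherwise compute and store it."""
    key = f"fraud:decision:{txn_id}"           # hypothetical key prefix
    cached = r.get(key)
    if cached:
        return json.loads(cached)              # retry of a payment we already decided
    decision = decide_fn(request)              # run features -> detection -> scoring -> policy
    # SET NX ensures a concurrent retry cannot overwrite the first stored decision.
    if not r.set(key, json.dumps(decision), nx=True, ex=IDEMPOTENCY_TTL_S):
        return json.loads(r.get(key))
    return decision
```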

Internal Latency Budget

| Component | Budget | Actual |
|---|---|---|
| Feature lookup (Redis) | 50ms | 50ms |
| Detection engine | 30ms | 20ms |
| Risk scoring | 20ms | 20ms |
| Policy evaluation | 15ms | 10ms |
| Evidence capture (async) | 30ms | 20ms |
| Total E2E | <200ms | 106ms |

Solution at a Glance

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Payment Gateway                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚ POST /decide (<200ms)
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Fraud Detection API                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ Feature  β”‚  β”‚Detection β”‚  β”‚  Risk    β”‚  β”‚  Policy  β”‚    β”‚
β”‚  β”‚ Engine   β”‚  β”‚ Engine   β”‚  β”‚ Scoring  β”‚  β”‚  Engine  β”‚    β”‚
β”‚  β”‚  (50ms)  β”‚  β”‚  (20ms)  β”‚  β”‚  (20ms)  β”‚  β”‚  (10ms)  β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚             β”‚             β”‚             β”‚
   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”                 β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
   β”‚  Redis  β”‚   β”‚ Detect  β”‚                 β”‚  YAML   β”‚
   β”‚Velocity β”‚   β”‚5 Signal β”‚                 β”‚Hot-Load β”‚
   β”‚Counters β”‚   β”‚ Types   β”‚                 β”‚ Config  β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚   PostgreSQL    β”‚
                    β”‚ Evidence Vault  β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Design Choices

| Decision | Choice | Rationale |
|---|---|---|
| Streaming vs. Batch | Real-time API | Velocity attacks complete in minutes |
| ML vs. Rules (Phase 1) | Rule-based with ML hooks | Faster to market, interpretable |
| Feature Store | Redis velocity counters | Sub-ms lookups, sliding windows |
| Policy Engine | YAML + hot-reload | Business can adjust without deploys |
| Evidence Storage | PostgreSQL (immutable) | Dispute representment requirement |
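
To make the YAML + hot-reload choice concrete, here is a minimal sketch of a policy loader that re-reads the file only when it changes, so analysts can adjust thresholds without a deploy. The rule schema and file path are hypothetical:

```python
import os
import yaml  # PyYAML

# Hypothetical policy file, e.g. policies/fraud_policy.yaml:
#   default_action: APPROVE
#   rules:
#     - name: card_testing_velocity
#       when: {signal: card_testing, min_score: 0.7}
#       action: DECLINE
#     - name: high_value_new_device
#       when: {amount_cents_gte: 50000, device_age_days_lt: 1}
#       action: REVIEW

class PolicyEngine:
    def __init__(self, path: str = "policies/fraud_policy.yaml"):
        self.path = path
        self._mtime = 0.0
        self._policy = {}

    def current(self) -> dict:
        """Re-parse the YAML only when the file's mtime changes; evaluation stays in-memory."""
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path) as f:
                self._policy = yaml.safe_load(f)
            self._mtime = mtime
        return self._policy
```

Because the engine re-reads only on change, a threshold update takes effect within one request with no service restart.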

Detection Coverage (5 Signal Types)

  1. Card Testing - Rapid small transactions, BIN probing, decline patterns
  2. Velocity Attacks - Multi-card device, multi-device card, IP clustering
  3. Geographic Anomaly - Country mismatch, impossible travel, datacenter IPs
  4. Bot/Automation - Emulators, rooted devices, Tor exit nodes
  5. Friendly Fraud - Historical chargebacks, refund abuse patterns
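
As an illustration of how the velocity-based signals above could be computed, here is a sketch using Redis sorted sets as sliding-window counters feeding a simple card-testing heuristic; key names and thresholds are illustrative:

```python
import time
import uuid
import redis

r = redis.Redis(decode_responses=True)

def bump_and_count(entity: str, entity_id: str, window_s: int = 600) -> int:
    """Record one attempt for an entity and return how many attempts fall in the sliding window."""
    now = time.time()
    key = f"vel:{entity}:{entity_id}"                         # hypothetical key layout
    pipe = r.pipeline()
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex[:8]}": now})    # unique member per attempt
    pipe.zremrangebyscore(key, 0, now - window_s)             # evict attempts outside the window
    pipe.zcard(key)
    pipe.expire(key, window_s)
    return pipe.execute()[2]

def card_testing_signal(card_token: str, device_id: str, amount_cents: int) -> float:
    """Crude card-testing heuristic: many small authorizations from one card or device in 10 minutes."""
    card_attempts = bump_and_count("card", card_token)
    device_attempts = bump_and_count("device", device_id)
    if amount_cents < 500 and (card_attempts > 10 or device_attempts > 20):
        return 0.9
    return min(0.9, 0.05 * max(card_attempts, device_attempts))
```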

Phased Roadmap

Phase 1: MVP (Sprint 1-2) - COMPLETE

Real-time decisioning foundation with rule-based detection.

Deliverables:

  • Decision API with <200ms P99 latency
  • 5 detection signal types
  • Redis velocity counters (card, device, IP, user)
  • YAML policy engine with hot-reload
  • Immutable evidence vault (write path sketched after this list)
  • Prometheus/Grafana monitoring
  • 45+ unit tests, load tested to 1000+ RPS
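
A minimal sketch of the evidence-vault write path referenced above, assuming an illustrative table layout; immutability comes from granting the service INSERT/SELECT only:

```python
import psycopg2
from psycopg2.extras import Json

# Illustrative layout; the actual DDL may differ:
#   CREATE TABLE evidence (
#       id BIGSERIAL PRIMARY KEY,
#       transaction_id TEXT NOT NULL,
#       decided_at TIMESTAMPTZ NOT NULL DEFAULT now(),
#       decision TEXT NOT NULL,
#       features JSONB NOT NULL,        -- point-in-time snapshot the decision actually saw
#       policy_version TEXT NOT NULL
#   );
# The fraud service role is granted INSERT and SELECT only, so rows cannot be altered later.

def record_evidence(conn, txn_id: str, decision: str, features: dict, policy_version: str) -> None:
    """Append one evidence row; called off the critical path (async) after the decision returns."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO evidence (transaction_id, decision, features, policy_version) "
            "VALUES (%s, %s, %s, %s)",
            (txn_id, decision, Json(features), policy_version),
        )
    conn.commit()
```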

Current Status: MVP complete, ready for shadow deployment

Phase 2: Hybrid ML + Experiments (Sprint 3-4)

Layer ML scoring while maintaining policy control.

Deliverables:

  • XGBoost/LightGBM criminal fraud model
  • Champion/challenger experiment framework
  • Historical replay for threshold simulation
  • Economic optimization UI for business users (see the expected-value sketch after this list)
  • Automated chargeback ingestion and labeling
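
To show what economic optimization means in practice, here is a sketch of an expected-value decision rule rather than a fixed score cutoff; the margin and chargeback-fee figures are assumptions for illustration:

```python
def expected_value_of_approval(amount_cents: int, p_fraud: float,
                               margin: float = 0.08, chargeback_fee_cents: int = 2500) -> float:
    """Expected profit (cents) of approving: earn the margin if legitimate,
    lose the full amount plus a chargeback fee if fraudulent."""
    gain_if_legit = margin * amount_cents
    loss_if_fraud = amount_cents + chargeback_fee_cents
    return (1 - p_fraud) * gain_if_legit - p_fraud * loss_if_fraud

def decide(amount_cents: int, p_fraud: float) -> str:
    # Approve only when the expected value of approving is positive; the break-even
    # fraud probability therefore depends on amount and margin, not one global cutoff.
    return "APPROVE" if expected_value_of_approval(amount_cents, p_fraud) > 0 else "DECLINE"
```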

ML Model Specification:

  • Features: 25+ velocity + behavioral + entity features
  • Labels: Chargebacks linked with 120-day maturity window
  • Training: Weekly retraining with point-in-time features
  • Deployment: Shadow mode first, then 10% traffic ramp
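
A sketch of the weekly training job implied by this specification, assuming an illustrative feature table with point-in-time features and chargeback labels matured for 120 days; column names are hypothetical:

```python
import pandas as pd
from lightgbm import LGBMClassifier          # XGBoost is an equivalent drop-in choice
from sklearn.metrics import roc_auc_score

def train_weekly(features_path: str = "training/features.parquet"):
    df = pd.read_parquet(features_path)
    # Keep only transactions old enough for chargebacks to have arrived (120-day maturity).
    mature = df[df["days_since_txn"] >= 120]
    X = mature.drop(columns=["is_fraud", "transaction_id", "txn_week", "days_since_txn"])
    y = mature["is_fraud"]
    # Time-based split: train on older weeks, validate on the most recent mature week,
    # which mirrors how the model will be used (point-in-time features, no leakage).
    cutoff = mature["txn_week"].max()
    train, valid = mature["txn_week"] < cutoff, mature["txn_week"] == cutoff
    model = LGBMClassifier(n_estimators=400, learning_rate=0.05)
    model.fit(X[train], y[train])
    print("validation AUC:", roc_auc_score(y[valid], model.predict_proba(X[valid])[:, 1]))
    return model
```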

Phase 3: Scale & External Signals (Sprint 5-6)

Production hardening and expanded detection.

Deliverables:

  • Multi-region deployment (Redis Cluster, PostgreSQL replicas, Kafka event sourcing)
  • External signal integration: TC40/SAFE issuer alerts, Ethoca/Verifi network alerts, BIN intelligence and device reputation, consortium fraud data
  • Enhanced analyst tooling (case management, bulk actions, playbooks)
  • IRSF detection for international calls
  • SIM swap correlation for ATO detection

Impact Summary

Projected Before/After Metrics

| Metric | Before | After (Phase 1) | After (Phase 2) | Methodology |
|---|---|---|---|---|
| Approval Rate | 88% | 91% | 93% | Threshold optimization |
| Fraud Rate | 1.80% | 1.20% | 0.75% | Velocity detection |
| P99 Latency | 2,300ms | 106ms | 120ms | Measured in load test |
| Manual Review | 12% | 5% | 2% | Automation + confidence |
| False Positives | 18% | 12% | 8% | Better signals |
| Dispute Win Rate | 22% | 40% | 55% | Evidence capture |

Financial Impact Model

| Line Item | Annual Impact |
|---|---|
| Fraud loss reduction (1.05 percentage-point improvement) | +$1,400,000 |
| False positive recovery (6 percentage-point improvement) | +$300,000 |
| Ops cost reduction (10 percentage points less manual review) | +$200,000 |
| Dispute win improvement (+28 percentage points in win rate) | +$150,000 |
| Net Annual Benefit | +$2,050,000 |
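
The first line item follows directly from the before/after metrics. A rough reconstruction, with the assumptions shown inline; the remaining items scale the baseline losses from the Problem & Context table and are estimates, not measured values:

```python
# Annual payment volume implied by the baseline fraud figures:
annual_volume = 2_400_000 / 0.018                        # ≈ $133M

# Fraud loss reduction: fraud rate falls from 1.8% to 0.75% (1.05 percentage points):
fraud_loss_reduction = (0.018 - 0.0075) * annual_volume  # ≈ $1.4M, matching the first line item
```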

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Redis failure | Low | High | Fallback to safe mode, cached features |
| ML model drift | Medium | Medium | Weekly retraining, PSI monitoring |
| Threshold misconfiguration | Medium | High | Replay testing, gradual rollout |
| Attack pattern evolution | High | Medium | Champion/challenger experiments |
| Integration delays | Medium | Medium | Shadow mode allows parallel testing |

Executive Recommendation

Proceed with Phase 2 deployment based on:

  1. Phase 1 MVP meets all technical SLAs (106ms P99 vs 200ms target)
  2. Load testing validates 1000+ RPS capacity (4x current peak)
  3. Rule-based detection provides immediate value while ML matures
  4. Evidence capture infrastructure enables dispute win rate improvement
  5. Hot-reload policy engine allows business-led threshold tuning

Next Actions:

  1. Shadow deployment to production traffic (week 1)
  2. ML model training with labeled historical data (weeks 1-2)
  3. Champion/challenger framework implementation (weeks 2-3)
  4. 10% traffic experiment with ML scoring (week 4)

This document is intended for VP/Director-level stakeholders. For technical details, see the Technical Overview section.