Telco Payment Fraud Detection Platform
Executive Overview
Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST
Problem & Context
Business Context
A mid-size Telco/MSP processes ~150M payment authorization attempts per year across prepaid top-ups, postpaid billing, device financing, and value-added services. The platform handles SIM activations, device upgrades, mobile top-ups, and international service enablement.
| Challenge | Current State | Business Impact |
|---|---|---|
| Annual Fraud Loss | $2.4M+ (1.8% of payment volume) | Direct P&L hit |
| False Positive Rate | 18% of blocks are legitimate | $800K+ lost revenue annually |
| Decision Latency | 2-3 seconds (batch scoring) | Poor UX, cart abandonment |
| Manual Review Volume | 12% of transactions | $400K+ ops cost, 4-hour SLA |
| Chargeback Win Rate | 22% | Recoverable losses left on the table |
Root Cause Analysis
- Batch-based detection cannot catch velocity attacks that complete in minutes
- Static rules cannot adapt to evolving fraud patterns (SIM farms, device resale rings)
- Insufficient evidence capture leads to losing winnable disputes
- Absence of profit-based thresholds results in over-blocking legitimate customers (see the break-even sketch below)
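For intuition on the last point: a profit-based threshold approves whenever expected margin exceeds expected fraud loss. A minimal sketch (the dollar figures are illustrative, not taken from this platform):

```python
def break_even_threshold(margin: float, fraud_loss: float) -> float:
    """Fraud-probability cutoff above which blocking is profitable.

    Approving a transaction with fraud probability p yields
    (1 - p) * margin - p * fraud_loss, which stays positive while
    p < margin / (margin + fraud_loss).
    """
    return margin / (margin + fraud_loss)

# Illustrative: a $50 top-up with a $5 margin and ~$65 all-in
# chargeback cost should only be blocked above a ~7% fraud score;
# a flat, product-blind cutoff over-blocks exactly these customers.
print(round(break_even_threshold(margin=5.0, fraud_loss=65.0), 3))  # 0.071
```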
Goals & Constraints
Target Metrics
| Metric | Current | Target | Constraint |
|---|---|---|---|
| Approval Rate | 88% | >92% | Cannot drop below 90% |
| Fraud Rate | 1.8% | <0.8% | Industry benchmark |
| P99 Latency | 2,300ms | <200ms | Hard SLA requirement |
| Manual Review | 12% | <3% | Ops budget constraint |
| Dispute Win Rate | 22% | >50% | Evidence quality dependent |
| False Positive Rate | 18% | <10% | Customer experience KPI |
Non-Negotiable Constraints
- <200ms P99 latency - Payments cannot wait for fraud decisions
- Exactly-once semantics - No duplicate charges or blocks (see the idempotency sketch after this list)
- PCI/PII compliance - No raw PAN in fraud platform
- 99.9% availability - Revenue-critical path
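One standard way to satisfy the exactly-once constraint at the API edge is an idempotency key per authorization attempt. A minimal sketch, assuming the redis-py client; the key naming, TTL, and score() stub are illustrative assumptions, not the platform's code:

```python
import json
import redis

r = redis.Redis()

def score(txn: dict) -> dict:
    # Stand-in for the real pipeline: features -> detection -> policy.
    return {"action": "approve", "score": 0.02}

def decide_once(idempotency_key: str, txn: dict) -> dict:
    """Exactly-once decisioning via an idempotency key in Redis.

    SET NX is an atomic claim: of two concurrent retries, only one
    write succeeds, and both callers read back the same decision,
    so a gateway retry can never trigger a second charge or block.
    """
    key = f"decision:{idempotency_key}"
    cached = r.get(key)
    if cached is None:
        # Keep the decision long enough to absorb gateway retries.
        r.set(key, json.dumps(score(txn)), nx=True, ex=86400)
        cached = r.get(key)  # winner's value, ours or the racer's
    return json.loads(cached)
```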
Internal Latency Budget
| Component | Budget | Actual |
|---|---|---|
| Feature lookup (Redis) | 50ms | 50ms |
| Detection engine | 30ms | 20ms |
| Risk scoring | 20ms | 20ms |
| Policy evaluation | 15ms | 10ms |
| Evidence capture (async) | 30ms | 20ms |
| Total E2E | <200ms | 106ms |
Evidence capture runs asynchronously off the critical path, so the measured end-to-end figure reflects the four inline stages plus request overhead.
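A budget like this is typically enforced with a hard per-stage deadline and a conservative fallback on overrun; a minimal asyncio sketch (stage names and fallback behavior are assumptions):

```python
import asyncio

# Per-stage budgets from the table above, in milliseconds.
BUDGET_MS = {"features": 50, "detection": 30, "scoring": 20, "policy": 15}

async def run_stage(name: str, coro, fallback):
    """Run one pipeline stage under its latency budget.

    On overrun, return a conservative fallback (e.g. empty features,
    neutral score) rather than stalling the payment path -- the same
    'safe mode' behavior listed under Redis-failure mitigation.
    """
    try:
        return await asyncio.wait_for(coro, timeout=BUDGET_MS[name] / 1000)
    except asyncio.TimeoutError:
        return fallback
```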
Solution at a Glance
Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                      Payment Gateway                         │
└──────────────────────────────┬───────────────────────────────┘
                               │ POST /decide (<200ms)
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                     Fraud Detection API                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Feature  │  │Detection │  │   Risk   │  │  Policy  │    │
│  │  Engine  │  │  Engine  │  │ Scoring  │  │  Engine  │    │
│  │  (50ms)  │  │  (20ms)  │  │  (20ms)  │  │  (10ms)  │    │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘    │
└───────┼─────────────┼─────────────┼─────────────┼──────────┘
        │             │             │             │
   ┌────▼────┐   ┌────▼────┐   ┌────▼────┐       │
   │  Redis  │   │ Detect  │   │  YAML   │       │
   │Velocity │   │5 Signal │   │Hot-Load │       │
   │Counters │   │  Types  │   │ Config  │       │
   └─────────┘   └─────────┘   └─────────┘       ▼
                                        ┌─────────────────┐
                                        │   PostgreSQL    │
                                        │ Evidence Vault  │
                                        └─────────────────┘
```
Key Design Choices
| Decision | Choice | Rationale |
|---|---|---|
| Streaming vs. Batch | Real-time API | Velocity attacks complete in minutes |
| ML vs. Rules (Phase 1) | Rule-based with ML hooks | Faster to market, interpretable |
| Feature Store | Redis velocity counters | Sub-ms lookups, sliding windows (sketch below) |
| Policy Engine | YAML + hot-reload | Business can adjust without deploys |
| Evidence Storage | PostgreSQL (immutable) | Dispute representment requirement |
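The Feature Store row can be made concrete with a timestamp-scored sorted set per entity, which yields exact sliding windows rather than fixed buckets. A sketch assuming the redis-py client; key names and the 5-minute window are illustrative:

```python
import time
import uuid
import redis

r = redis.Redis()

def bump_velocity(entity: str, window_s: int = 300) -> int:
    """Record one event and return the count in the trailing window.

    A timestamp-scored sorted set gives a true sliding window (e.g.
    attempts per card in the last 5 minutes) -- the signal a batch job
    scoring hours later fundamentally cannot act on in time.
    """
    now = time.time()
    key = f"vel:{entity}"
    pipe = r.pipeline()
    pipe.zadd(key, {uuid.uuid4().hex: now})        # unique member per event
    pipe.zremrangebyscore(key, 0, now - window_s)  # drop expired events
    pipe.zcard(key)                                # count what remains
    pipe.expire(key, window_s)                     # GC idle entities
    return pipe.execute()[2]

# Example: one counter per dimension for a single authorization.
# attempts = bump_velocity("card:token_ab12")
```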
Detection Coverage (5 Signal Types)
- Card Testing - Rapid small transactions, BIN probing, decline patterns (see the detector sketch after this list)
- Velocity Attacks - Multi-card device, multi-device card, IP clustering
- Geographic Anomaly - Country mismatch, impossible travel, datacenter IPs
- Bot/Automation - Emulators, rooted devices, Tor exit nodes
- Friendly Fraud - Historical chargebacks, refund abuse patterns
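As an example of how these signals compose from the velocity counters, a card-testing heuristic might look like the sketch below; field names and thresholds are illustrative, not production values:

```python
from dataclasses import dataclass

@dataclass
class Txn:
    card_id: str
    amount: float
    attempts_last_5m: int       # from the velocity counters
    declines_last_5m: int
    distinct_bins_last_5m: int

def card_testing_signal(txn: Txn) -> bool:
    """Heuristic card-testing detector (thresholds are illustrative).

    Card testers probe with rapid, small authorizations and tolerate a
    high decline rate while searching for live cards and BIN ranges.
    """
    small_and_rapid = txn.amount < 2.00 and txn.attempts_last_5m >= 10
    heavy_declines = (
        txn.attempts_last_5m >= 5
        and txn.declines_last_5m / txn.attempts_last_5m > 0.6
    )
    bin_probing = txn.distinct_bins_last_5m >= 4
    return small_and_rapid or heavy_declines or bin_probing
```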
Phased Roadmap
Phase 1: MVP (Sprints 1-2) - COMPLETE
Real-time decisioning foundation with rule-based detection.
Deliverables:
- Decision API with <200ms P99 latency
- 5 detection signal types
- Redis velocity counters (card, device, IP, user)
- YAML policy engine with hot-reload (see the reload sketch below)
- Immutable evidence vault
- Prometheus/Grafana monitoring
- 45+ unit tests, load tested to 1000+ RPS
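Hot-reload itself needs no machinery beyond an mtime check per request; a minimal sketch using PyYAML, where the file path and the policy schema shown in comments are assumptions rather than the platform's actual format:

```python
import os
import yaml  # PyYAML

POLICY_PATH = "policies.yaml"
_cache = {"mtime": 0.0, "policy": None}

# Illustrative policy shape -- the real schema is not specified here:
# rules:
#   - name: block_high_velocity
#     when: {signal: velocity, score_gte: 0.9}
#     action: block

def current_policy() -> dict:
    """Reload the YAML policy only when the file changes on disk.

    Analysts edit and save the file; the next request picks up the new
    thresholds without a code deploy or service restart.
    """
    mtime = os.path.getmtime(POLICY_PATH)
    if mtime != _cache["mtime"]:
        with open(POLICY_PATH) as f:
            _cache["policy"] = yaml.safe_load(f)
        _cache["mtime"] = mtime
    return _cache["policy"]
```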
Current Status: MVP complete, ready for shadow deployment
Phase 2: Hybrid ML + Experiments (Sprints 3-4)
Layer ML scoring while maintaining policy control.
Deliverables:
- XGBoost/LightGBM criminal fraud model
- Champion/challenger experiment framework (see the bucketing sketch after this list)
- Historical replay for threshold simulation
- Economic optimization UI for business users
- Automated chargeback ingestion and labeling
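The experiment framework's traffic split is typically a deterministic hash bucket rather than a random draw, so each customer stays in one arm across requests; a sketch where the hashed key and 10% default are assumptions:

```python
import hashlib

def assign_variant(entity_id: str, challenger_pct: float = 0.10) -> str:
    """Stable traffic split for champion/challenger experiments.

    Hashing the entity id (instead of sampling per request) keeps a
    customer in the same arm for every transaction, which is what
    makes the two arms' fraud and approval metrics comparable.
    """
    digest = hashlib.sha256(entity_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "challenger" if bucket < challenger_pct else "champion"
```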
ML Model Specification:
- Features: 25+ features spanning velocity, behavioral, and entity signals
- Labels: chargebacks joined to transactions within a 120-day maturity window (see the labeling sketch below)
- Training: Weekly retraining with point-in-time features
- Deployment: Shadow mode first, then 10% traffic ramp
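The maturity window matters because a transaction younger than ~120 days cannot yet be trusted as a "good" label. A pandas sketch of the join, where the column names are assumptions:

```python
import pandas as pd

MATURITY_DAYS = 120

def label_transactions(txns: pd.DataFrame, chargebacks: pd.DataFrame,
                       as_of: pd.Timestamp) -> pd.DataFrame:
    """Label only transactions old enough for chargebacks to have arrived.

    Transactions younger than the maturity window are excluded rather
    than labeled good, which would otherwise bias the model toward
    under-predicting fraud on recent traffic.
    """
    mature = txns[txns["txn_ts"] <= as_of - pd.Timedelta(days=MATURITY_DAYS)]
    labeled = mature.merge(
        chargebacks[["txn_id"]].drop_duplicates().assign(is_fraud=1),
        on="txn_id", how="left",
    )
    labeled["is_fraud"] = labeled["is_fraud"].fillna(0).astype(int)
    return labeled
```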
Phase 3: Scale & External Signals (Sprints 5-6)
Production hardening and expanded detection.
Deliverables:
- Multi-region deployment (Redis Cluster, PostgreSQL replicas, Kafka event sourcing)
- External signal integration:
  - TC40/SAFE issuer alerts
  - Ethoca/Verifi network alerts
  - BIN intelligence and device reputation
  - Consortium fraud data
- Enhanced analyst tooling (case management, bulk actions, playbooks)
- IRSF (International Revenue Share Fraud) detection for international calls
- SIM swap correlation for account takeover (ATO) detection
Impact Summary
Projected Before/After Metrics
| Metric | Before | After (Phase 1) | After (Phase 2) | Methodology |
|---|---|---|---|---|
| Approval Rate | 88% | 91% | 93% | Threshold optimization |
| Fraud Rate | 1.80% | 1.20% | 0.75% | Velocity detection |
| P99 Latency | 2,300ms | 106ms | 120ms | Measured in load test |
| Manual Review | 12% | 5% | 2% | Automation + confidence |
| False Positives | 18% | 12% | 8% | Better signals |
| Dispute Win Rate | 22% | 40% | 55% | Evidence capture |
Financial Impact Model
| Line Item | Annual Impact |
|---|---|
| Fraud loss reduction (1.05 pp improvement) | +$1,400,000 |
| False positive recovery (6 pp improvement) | +$300,000 |
| Ops cost reduction (10 pp less manual review) | +$200,000 |
| Dispute win improvement (+28 pp win rate) | +$150,000 |
| Net Annual Benefit | +$2,050,000 |
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Redis failure | Low | High | Fallback to safe mode, cached features |
| ML model drift | Medium | Medium | Weekly retraining, PSI monitoring |
| Threshold misconfiguration | Medium | High | Replay testing, gradual rollout |
| Attack pattern evolution | High | Medium | Champion/challenger experiments |
| Integration delays | Medium | Medium | Shadow mode allows parallel testing |
Executive Recommendation
Proceed with Phase 2 deployment based on:
- Phase 1 MVP meets all technical SLAs (106ms P99 vs 200ms target)
- Load testing validates 1000+ RPS capacity (4x current peak)
- Rule-based detection provides immediate value while ML matures
- Evidence capture infrastructure enables dispute win rate improvement
- Hot-reload policy allows business-led threshold tuning
Next Actions:
- Shadow deployment to production traffic (week 1)
- ML model training with labeled historical data (weeks 1-2)
- Champion/challenger framework implementation (weeks 2-3)
- 10% traffic experiment with ML scoring (week 4)
This document is intended for VP/Director-level stakeholders. For technical details, see the Technical Overview section.