AI/ML Roadmap & Current Status
Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST
Current Implementation Status
Phase 1: Rule-Driven Detection (COMPLETE)
The MVP implementation uses a rule-based detection engine with hooks for ML integration. This was a deliberate design choice to:
- Deliver value faster - Rules can be tuned immediately without training data
- Ensure interpretability - Every decision has explainable reasons
- Establish infrastructure - Feature pipeline, evidence capture, and policy engine ready for ML
What Is Implemented
| Component | Status | Description |
|---|---|---|
| Feature Engine | Complete | Redis velocity counters with sliding windows |
| Detection Engine | Complete | 5 detector types (card testing, velocity, geo anomaly, bot, friendly fraud) |
| Risk Scoring | Complete | Rule-based combination of detector signals |
| Policy Engine | Complete | YAML configuration with hot-reload |
| Evidence Vault | Complete | Immutable storage with feature snapshots |
| Metrics Pipeline | Complete | Prometheus metrics for all components |
| Load Testing | Complete | Validated 1000+ RPS at 106ms P99 |
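For illustration, the sliding-window velocity counters could be built on Redis sorted sets roughly as follows. This is a minimal sketch assuming the `redis-py` client and illustrative key names, not the platform's actual schema:

```python
import time
import uuid
from typing import Optional

import redis

# Hypothetical key names and client wiring -- not the platform's actual schema.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_attempt(card_id: str, now: Optional[float] = None) -> None:
    """Record one transaction attempt for a card in a time-scored sorted set."""
    now = now if now is not None else time.time()
    key = f"velocity:card:{card_id}:attempts"
    r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})  # unique member, score = timestamp
    r.expire(key, 24 * 3600)                         # keep at most 24h of history

def attempts_in_window(card_id: str, window_seconds: int) -> int:
    """Count attempts in a trailing window, e.g. 600s for card_attempts_10m."""
    now = time.time()
    key = f"velocity:card:{card_id}:attempts"
    r.zremrangebyscore(key, 0, now - 24 * 3600)      # trim entries older than 24h
    return r.zcount(key, now - window_seconds, now)
```

Sorted sets keyed by timestamp make it cheap to serve all trailing windows (10 minutes, 1 hour, 24 hours) from the same underlying data.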
Detection Logic (Current)
# Simplified scoring formula (rule-based)
criminal_score = max(
    card_testing.confidence * 0.9,    # Card testing patterns
    velocity.confidence * 0.8,        # Velocity rule triggers
    geo_anomaly.confidence * 0.7,     # Geographic issues
    bot_detection.confidence * 0.95,  # Automation signals
)
friendly_score = friendly_fraud.confidence * 0.6

# Policy thresholds (configurable)
if criminal_score >= 0.85 or friendly_score >= 0.95:
    return BLOCK
elif criminal_score >= 0.60 or friendly_score >= 0.70:
    return FRICTION
elif criminal_score >= 0.40 or friendly_score >= 0.50:
    return REVIEW
else:
    return ALLOW
Phase 2: Hybrid ML + Rules
ML Model Specification
Criminal Fraud Model
| Attribute | Specification |
|---|---|
| Algorithm | XGBoost (primary), LightGBM (challenger) |
| Objective | Binary classification (is_criminal_fraud) |
| Training Window | 90 days of transactions with 120-day label maturity |
| Retraining Frequency | Weekly (automated pipeline) |
| Feature Count | 25+ features |
| Target AUC | >0.85 |
| Latency Budget | <25ms P99 |
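As a minimal sketch of what the weekly training and validation step could look like under this specification (the `load_training_frame` helper and the hyperparameters are placeholders, not the production pipeline):

```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

# load_training_frame() is a placeholder for joining matured labels with
# point-in-time features retrieved from the evidence vault.
train_df, holdout_df = load_training_frame()

features = [c for c in train_df.columns if c != "is_criminal_fraud"]

model = xgb.XGBClassifier(
    n_estimators=400,        # illustrative hyperparameters, not tuned values
    max_depth=6,
    learning_rate=0.05,
    eval_metric="auc",
    n_jobs=-1,
)
model.fit(train_df[features], train_df["is_criminal_fraud"])

holdout_auc = roc_auc_score(
    holdout_df["is_criminal_fraud"],
    model.predict_proba(holdout_df[features])[:, 1],
)
print(f"holdout AUC = {holdout_auc:.3f}  (target: > 0.85)")
```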
Feature List
Velocity Features (Real-time from Redis):
| Feature | Description | Window |
|---|---|---|
| card_attempts_10m | Transaction attempts on card | 10 min |
| card_attempts_1h | Transaction attempts on card | 1 hour |
| card_attempts_24h | Transaction attempts on card | 24 hours |
| device_distinct_cards_1h | Unique cards on device | 1 hour |
| device_distinct_cards_24h | Unique cards on device | 24 hours |
| ip_distinct_cards_1h | Unique cards from IP | 1 hour |
| user_total_amount_24h | Total spend by user | 24 hours |
| card_decline_rate_1h | Decline rate for card | 1 hour |
Entity Features (From profiles):
| Feature | Description | Source |
|---|---|---|
| card_age_hours | Time since card first seen | Redis |
| device_age_hours | Time since device first seen | Redis |
| user_account_age_days | Account creation age | Profile |
| user_chargeback_count_lifetime | Historical chargebacks | Profile |
| user_chargeback_rate_90d | Recent chargeback rate | Profile |
| card_distinct_devices_30d | Devices using this card | Redis |
| card_distinct_users_30d | Users using this card | Redis |
Transaction Features (From event):
| Feature | Description | Computation |
|---|---|---|
| amount_usd | Transaction amount | Direct |
| amount_zscore | Amount vs user average | (amount - avg) / std |
| is_new_card_for_user | First time card used | Boolean |
| is_new_device_for_user | First time device used | Boolean |
| hour_of_day | Local time hour | Timezone adjusted |
| is_weekend | Weekend flag | Boolean |
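For example, `amount_zscore` can be derived from the user's rolling spend statistics; a small sketch with a guard for users who have no spend variance yet (the source of the rolling average and standard deviation is assumed):

```python
def amount_zscore(amount_usd: float, user_avg_usd: float, user_std_usd: float) -> float:
    """Standardize the transaction amount against the user's spend history.

    Returns 0.0 when the user has no spend variance yet (new or single-purchase
    users), so the feature stays defined for every transaction.
    """
    if user_std_usd <= 0:
        return 0.0
    return (amount_usd - user_avg_usd) / user_std_usd
```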
Device/Network Features:
| Feature | Description | Source |
|---|---|---|
| is_emulator | Device is emulated | Fingerprint |
| is_rooted | Device is rooted/jailbroken | Fingerprint |
| is_datacenter_ip | IP from cloud provider | IP intelligence |
| is_vpn | VPN detected | IP intelligence |
| is_tor | Tor exit node | IP intelligence |
| ip_risk_score | Third-party IP score | External API |
Label Source
Positive Labels (Fraud):
- Confirmed fraud chargebacks (reason codes 10.1-10.5)
- TC40/SAFE issuer alerts with fraud type indicators
- Manual review disposition: "confirmed fraud"
Negative Labels (Legitimate):
- Authorizations without chargebacks or fraud alerts after 120-day aging
- Manual review disposition: "legitimate"
Exclusions (Not Used for Training):
- Partial refunds and service complaints
- Ambiguous disputes (friendly fraud candidates)
- Transactions with incomplete feature capture
Label Maturity: 120 days from transaction
- Reason: Chargebacks can arrive up to 120 days post-transaction
- Consequence: Training data is always roughly 4 months behind
- Mitigation: Use issuer alerts (TC40/SAFE) for an earlier signal
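A sketch of how these labeling rules could translate into code; the reason-code set matches the list above, while the transaction field names are illustrative:

```python
from datetime import datetime, timedelta
from typing import Optional

FRAUD_REASON_CODES = {"10.1", "10.2", "10.3", "10.4", "10.5"}
LABEL_MATURITY = timedelta(days=120)

def label_transaction(txn, now: datetime) -> Optional[int]:
    """Return 1 (fraud), 0 (legitimate), or None (excluded from training).

    Field names on `txn` are illustrative, not the actual data model.
    """
    if txn.chargeback_reason_code in FRAUD_REASON_CODES:
        return 1
    if txn.issuer_fraud_alert or txn.review_disposition == "confirmed fraud":
        return 1
    if txn.review_disposition == "legitimate":
        return 0
    if txn.is_ambiguous_dispute or txn.features_incomplete:
        return None                                   # excluded from training
    if not txn.has_chargeback and now - txn.timestamp >= LABEL_MATURITY:
        return 0                                      # aged out clean at 120 days
    return None                                       # label not yet matured
```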
Retraining Strategy
Weekly Pipeline (Automated):
1. Extract transactions from T-120d to T-30d
2. Join with chargeback outcomes
3. Retrieve point-in-time features from the evidence vault
4. Train a new model version
5. Validate against a holdout set (last 7 days)
6. If AUC drop < 2%: register as challenger (gate sketched below)
7. If AUC drop >= 2%: alert the DS team and keep the previous model

Monthly Review (Manual):
1. Compare champion vs challenger performance
2. Analyze feature importance drift
3. Review false positive cases
4. Decide: promote the challenger or retrain with adjustments
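The AUC-drop gate in weekly steps 6 and 7 might look like this (champion-metric lookup and model-registry calls are omitted):

```python
def gate_retrained_model(new_auc: float, champion_auc: float) -> str:
    """Register the retrained model as challenger only if AUC has not
    regressed by 2% or more relative to the current champion."""
    relative_drop = (champion_auc - new_auc) / champion_auc
    if relative_drop < 0.02:
        return "register_as_challenger"
    return "alert_ds_team_and_keep_previous_model"
```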
Champion/Challenger Framework
Experiment Architecture
Traffic Routing:
┌──────────────────────────────────────────────────────┐
│                     Load Balancer                     │
└──────────────────────────┬───────────────────────────┘
                           │
         ┌─────────────────┼──────────────────┐
         │                 │                  │
         ▼                 ▼                  ▼
   ┌───────────┐     ┌───────────┐     ┌───────────┐
   │ Champion  │     │Challenger │     │  Holdout  │
   │   (80%)   │     │   (15%)   │     │   (5%)    │
   │  Model A  │     │  Model B  │     │Rules Only │
   └───────────┘     └───────────┘     └───────────┘

Routing: Deterministic hash on auth_id (reproducible)
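A minimal sketch of the deterministic routing, assuming a SHA-256 hash of `auth_id` mapped into 100 buckets (the exact hashing scheme is an assumption; the 80/15/5 split mirrors the diagram above):

```python
import hashlib

def route(auth_id: str) -> str:
    """Map an auth_id to the same experiment arm on every call (80/15/5)."""
    bucket = int(hashlib.sha256(auth_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < 80:
        return "champion"      # Model A
    if bucket < 95:
        return "challenger"    # Model B
    return "holdout"           # rules only
```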
Experiment Metrics
| Metric | Champion | Challenger | Threshold |
|---|---|---|---|
| Approval Rate | 91.2% | Must be within 1% | -1% to +2% |
| Fraud Rate (30d lag) | 1.15% | Must improve | <1.15% |
| P99 Latency | 106ms | Must be within 20% | <127ms |
| False Positive Rate | 12% | Must improve | <12% |
Promotion Criteria
Promote Challenger if ALL are true (a sketch of this check follows):
1. Running for >= 14 days
2. Sample size >= 100,000 transactions
3. Fraud rate improved by >= 5% (statistically significant)
4. Approval rate within 1% of champion
5. No latency degradation
6. No anomalies in the score distribution

Rollback Challenger if ANY is true:
1. Fraud rate increased by > 10%
2. Approval rate dropped by > 3%
3. P99 latency exceeded 150ms
4. Error rate exceeded 0.5%
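A sketch of how these criteria could be evaluated in code; the `ExperimentStats` fields are illustrative names for the metrics listed above:

```python
from dataclasses import dataclass

@dataclass
class ExperimentStats:
    days_running: int
    sample_size: int
    fraud_rate_change_pct: float       # negative means fraud rate improved
    fraud_improvement_significant: bool
    approval_rate_delta_pts: float     # challenger minus champion, in points
    p99_latency_ms: float
    error_rate_pct: float
    score_distribution_ok: bool

def should_promote(s: ExperimentStats) -> bool:
    """All promotion criteria must hold."""
    return (
        s.days_running >= 14
        and s.sample_size >= 100_000
        and s.fraud_rate_change_pct <= -5.0
        and s.fraud_improvement_significant
        and abs(s.approval_rate_delta_pts) <= 1.0
        and s.p99_latency_ms <= 127.0          # "no latency degradation" per the table
        and s.score_distribution_ok
    )

def should_rollback(s: ExperimentStats) -> bool:
    """Any rollback trigger is enough."""
    return (
        s.fraud_rate_change_pct > 10.0
        or s.approval_rate_delta_pts < -3.0
        or s.p99_latency_ms > 150.0
        or s.error_rate_pct > 0.5
    )
```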
Replay Framework
Purpose
Historical replay enables:
- Threshold simulation - Test new thresholds on historical data
- Model validation - Compare model predictions to known outcomes
- Policy change estimation - Quantify impact before deployment
Implementation
from datetime import datetime
from typing import Optional

async def replay(
    start_date: datetime,
    end_date: datetime,
    policy_config: Optional[dict] = None,
    model_version: Optional[str] = None,
) -> ReplayResults:
    """
    Replay historical transactions with optional config changes.

    Key: Uses point-in-time features from the evidence vault,
    NOT current features (which would cause look-ahead bias).
    """
    results = []
    for transaction in get_historical_transactions(start_date, end_date):
        # Get features AS THEY WERE at transaction time
        features = get_features_at_time(
            transaction.auth_id,
            transaction.timestamp,
        )
        # Score with the specified model/policy
        new_decision = score_and_decide(
            transaction,
            features,
            model_version,
            policy_config,
        )
        # Compare to the actual outcome
        actual_fraud = was_transaction_fraud(transaction.auth_id)
        # Record for analysis
        results.append({
            "original_decision": transaction.original_decision,
            "new_decision": new_decision,
            "actual_fraud": actual_fraud,
        })
    return analyze_results(results)
Simulation Use Cases
| Use Case | Input | Output |
|---|---|---|
| Threshold change | New threshold values | Approval rate delta, fraud caught delta |
| Model comparison | Model version A vs B | AUC difference, FP rate difference |
| Rule addition | New rule definition | Transactions affected, score changes |
| Seasonal analysis | Date range comparison | Pattern differences by period |
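For example, a threshold-change simulation could call the replay routine like this (the date range and policy key are illustrative, and this assumes the `replay` coroutine above is wired to historical data):

```python
import asyncio
from datetime import datetime

async def simulate_threshold_change() -> None:
    # Hypothetical policy override: raise the criminal-fraud BLOCK threshold.
    results = await replay(
        start_date=datetime(2025, 10, 1),
        end_date=datetime(2025, 12, 31),
        policy_config={"criminal_block_threshold": 0.90},
    )
    # ReplayResults would then expose approval-rate and fraud-caught deltas.
    print(results)

asyncio.run(simulate_threshold_change())
```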
Phase 3: Advanced ML (Future)
Planned Enhancements
| Enhancement | Timeline | Description |
|---|---|---|
| Graph Neural Network | Phase 3 | Detect fraud rings via card-device-user connections |
| Sequence Model | Phase 3 | LSTM/Transformer for transaction sequence patterns |
| Anomaly Detection | Phase 3 | Isolation Forest for unknown attack patterns |
| Real-time Retraining | Phase 3 | Online learning for rapid adaptation |
| External Signals | Phase 3 | TC40/SAFE, Ethoca, BIN intelligence, device reputation |
External Signal Integration (Phase 3)
| Signal | Source | Use Case |
|---|---|---|
| TC40/SAFE | Issuer alerts | Early fraud signal before chargeback |
| Ethoca/Verifi | Network alerts | Pre-dispute notification |
| BIN Intelligence | Card networks | Card risk scoring, country of issuance |
| Device Reputation | Third-party SDK | Known bad devices, emulator detection |
| IP Intelligence | MaxMind/similar | Proxy, VPN, datacenter detection |
| Consortium Data | Industry shared | Cross-merchant fraud patterns |
Guardrails Against Over-Complexity
The roadmap explicitly avoids:
- Deploying highly complex or opaque models without clear interpretability paths
- Adding model variants that materially increase operational risk without proportional fraud/loss benefit
- Creeping scope into generic "AI everywhere" without clear problem statements
- Replacing human judgment in edge cases where model confidence is low
ML remains a tool, not the centerpiece - the platform's value is in robust decisioning, evidence, and economics, with ML reinforcing (not replacing) those pillars.
Graph-Based Fraud Ring Detection
Entity Graph:
- Nodes: Cards, Devices, Users, IPs
- Edges: Transaction relationships

Fraud Ring Indicators:
- Cluster of cards sharing devices
- Star pattern (one device, many cards; illustrated below)
- Circular payments between accounts
- Velocity spikes in a connected subgraph

Implementation:
- Neo4j for graph storage
- Graph embeddings for ML features
- Community detection for ring identification
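A small illustration of the star-pattern indicator using `networkx` (the production plan is Neo4j plus community detection; this sketch only flags devices touching unusually many cards):

```python
from typing import Iterable, List, Tuple

import networkx as nx

def find_star_devices(
    card_device_edges: Iterable[Tuple[str, str]],
    min_cards: int = 5,                       # illustrative threshold
) -> List[Tuple[str, int]]:
    """Flag devices connected to unusually many distinct cards (star pattern)."""
    g = nx.Graph()
    for card_id, device_id in card_device_edges:
        g.add_edge(("card", card_id), ("device", device_id))

    suspicious = []
    for node in g.nodes:
        kind, entity_id = node
        if kind != "device":
            continue
        card_count = sum(1 for n in g.neighbors(node) if n[0] == "card")
        if card_count >= min_cards:
            suspicious.append((entity_id, card_count))
    return suspicious
```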
Architecture for ML Integration
Current Architecture (ML-Ready)
┌───────────────────────────────────────────────────────────────┐
│                           API Layer                            │
│                           (FastAPI)                            │
└───────────────────────────────┬───────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           ▼                    ▼                    ▼
   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
   │Feature Engine │    │   Detection   │    │ Policy Engine │
   │    (Redis)    │    │    Engine     │    │    (YAML)     │
   └───────┬───────┘    └───────┬───────┘    └───────┬───────┘
           │                    │                    │
           │            ┌───────┴───────┐            │
           │            │   Currently   │            │
           │            │   Rule-Based  │            │
           │            │               │            │
           │            │   [ML HOOK]   │◄───────────┘
           │            │    Phase 2    │
           │            └───────────────┘
           │
           ▼
   ┌───────────────┐
   │Evidence Vault │
   │(Feature Store)│
   └───────────────┘
Phase 2 Architecture (With ML)
┌───────────────────────────────────────────────────────────────┐
│                           API Layer                            │
└───────────────────────────────┬───────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           ▼                    ▼                    ▼
   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
   │Feature Engine │    │    Scoring    │    │ Policy Engine │
   │    (Redis)    │    │    Service    │    │    (YAML)     │
   └───────┬───────┘    └───────┬───────┘    └───────┬───────┘
           │                    │                    │
           │         ┌──────────┼──────────┐         │
           │         ▼          ▼          ▼         │
           │     ┌────────┐ ┌────────┐ ┌────────┐    │
           │     │  Rule  │ │   ML   │ │Ensemble│    │
           │     │ Engine │ │ Model  │ │ Layer  │◄───┘
           │     └────────┘ └────────┘ └────────┘
           │                    │
           │         ┌──────────┴──────────┐
           │         ▼                     ▼
           │    ┌──────────┐         ┌──────────┐
           │    │ Champion │         │Challenger│
           │    │  Model   │         │  Model   │
           │    └──────────┘         └──────────┘
           │
           ▼
   ┌───────────────┐
   │Evidence Vault │
   │+ ML Features  │
   └───────────────┘
Ensemble Scoring
class EnsembleScoringService:
    """Combine rule-based and ML scores."""

    def score(self, features: dict) -> RiskScore:
        # Rule-based score (always runs)
        rule_score = self.rule_engine.score(features)

        # ML score (Phase 2+)
        if self.ml_enabled:
            ml_score = self.ml_model.predict(features)
        else:
            ml_score = None

        # Ensemble combination
        if ml_score is not None:
            # Weighted average with rules as safety net
            combined = (
                ml_score * 0.70 +   # ML carries more weight
                rule_score * 0.30   # Rules as backstop
            )
            # Hard overrides (rules always win for certain signals)
            if features.get("is_emulator"):
                combined = max(combined, 0.95)
            if features.get("blocklist_match"):
                combined = 1.0
        else:
            combined = rule_score

        return RiskScore(
            combined=combined,
            rule_score=rule_score,
            ml_score=ml_score,
            model_version=self.model_version,
        )
Timeline Summary
| Phase | Scope | Status | Timeline |
|---|---|---|---|
| Phase 1 | Rule-based MVP | Complete | Done |
| Phase 2a | ML model training | Not started | Weeks 1-2 |
| Phase 2b | Champion/challenger | Not started | Weeks 2-3 |
| Phase 2c | ML in production | Not started | Week 4+ |
| Phase 3 | Advanced ML | Future | TBD |
This document consolidates the AI/ML strategy and its current status, making explicit that ML is planned for Phase 2 and is not yet deployed.