AI/ML Roadmap & Current Status

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Current Implementation Status

Phase 1: Rule-Driven Detection (COMPLETE)

The MVP implementation uses a rule-based detection engine with hooks for ML integration. This was a deliberate design choice to:

  1. Deliver value faster - Rules can be tuned immediately without training data
  2. Ensure interpretability - Every decision has explainable reasons
  3. Establish infrastructure - Feature pipeline, evidence capture, and policy engine ready for ML

What Is Implemented

| Component        | Status   | Description                                                   |
|------------------|----------|---------------------------------------------------------------|
| Feature Engine   | Complete | Redis velocity counters with sliding windows                  |
| Detection Engine | Complete | 5 detector types (card testing, velocity, geo, bot, friendly) |
| Risk Scoring     | Complete | Rule-based combination of detector signals                    |
| Policy Engine    | Complete | YAML configuration with hot-reload                            |
| Evidence Vault   | Complete | Immutable storage with feature snapshots                      |
| Metrics Pipeline | Complete | Prometheus metrics for all components                         |
| Load Testing     | Complete | Validated 1000+ RPS at 106ms P99                              |
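A minimal pure-Python sketch of the sliding-window velocity counting the Feature Engine performs. In production this is the Redis sorted-set pattern (ZADD a timestamped member, trim with ZREMRANGEBYSCORE, count with ZCARD); the class and key names below are illustrative, not the production API:

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Pure-Python stand-in for a Redis sorted-set velocity counter.

    Redis equivalent per key, roughly:
      ZADD key <ts> <event_id>
      ZREMRANGEBYSCORE key 0 <now - window>
      ZCARD key
    """

    def __init__(self):
        self._events = defaultdict(list)  # key -> list of event timestamps

    def record(self, key, now=None):
        self._events[key].append(time.time() if now is None else now)

    def count(self, key, window_seconds, now=None):
        now = time.time() if now is None else now
        cutoff = now - window_seconds
        # Count only events that fall inside the sliding window.
        return sum(1 for ts in self._events[key] if ts > cutoff)


counter = SlidingWindowCounter()
for t in (0, 30, 590, 4000):  # synthetic timestamps in seconds
    counter.record("card_attempts:tok_123", now=t)

print(counter.count("card_attempts:tok_123", 600, now=4000))   # 10-min window -> 1
print(counter.count("card_attempts:tok_123", 3600, now=4000))  # 1-hour window -> 2
```

The same counter instance backs multiple windows (10m/1h/24h) because eviction is computed per query rather than destructively.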

Detection Logic (Current)

# Simplified scoring formula (rule-based)
criminal_score = max(
    card_testing.confidence * 0.9,     # Card testing patterns
    velocity.confidence * 0.8,         # Velocity rule triggers
    geo_anomaly.confidence * 0.7,      # Geographic issues
    bot_detection.confidence * 0.95,   # Automation signals
)

friendly_score = friendly_fraud.confidence * 0.6

# Policy thresholds (configurable)
if criminal_score >= 0.85 or friendly_score >= 0.95:
    return BLOCK
elif criminal_score >= 0.60 or friendly_score >= 0.70:
    return FRICTION
elif criminal_score >= 0.40 or friendly_score >= 0.50:
    return REVIEW
else:
    return ALLOW

Phase 2: Hybrid ML + Rules

ML Model Specification

Criminal Fraud Model

| Attribute            | Specification                                        |
|----------------------|------------------------------------------------------|
| Algorithm            | XGBoost (primary), LightGBM (challenger)             |
| Objective            | Binary classification (is_criminal_fraud)            |
| Training Window      | 90 days of transactions with 120-day label maturity  |
| Retraining Frequency | Weekly (automated pipeline)                          |
| Feature Count        | 25+ features                                         |
| Target AUC           | >0.85                                                |
| Latency Budget       | <25ms P99                                            |

Feature List

Velocity Features (Real-time from Redis):

| Feature                   | Description                  | Window   |
|---------------------------|------------------------------|----------|
| card_attempts_10m         | Transaction attempts on card | 10 min   |
| card_attempts_1h          | Transaction attempts on card | 1 hour   |
| card_attempts_24h         | Transaction attempts on card | 24 hours |
| device_distinct_cards_1h  | Unique cards on device       | 1 hour   |
| device_distinct_cards_24h | Unique cards on device       | 24 hours |
| ip_distinct_cards_1h      | Unique cards from IP         | 1 hour   |
| user_total_amount_24h     | Total spend by user          | 24 hours |
| card_decline_rate_1h      | Decline rate for card        | 1 hour   |

Entity Features (From profiles):

| Feature                        | Description                  | Source  |
|--------------------------------|------------------------------|---------|
| card_age_hours                 | Time since card first seen   | Redis   |
| device_age_hours               | Time since device first seen | Redis   |
| user_account_age_days          | Account creation age         | Profile |
| user_chargeback_count_lifetime | Historical chargebacks       | Profile |
| user_chargeback_rate_90d       | Recent chargeback rate       | Profile |
| card_distinct_devices_30d      | Devices using this card      | Redis   |
| card_distinct_users_30d        | Users using this card        | Redis   |

Transaction Features (From event):

| Feature                | Description            | Computation          |
|------------------------|------------------------|----------------------|
| amount_usd             | Transaction amount     | Direct               |
| amount_zscore          | Amount vs user average | (amount - avg) / std |
| is_new_card_for_user   | First time card used   | Boolean              |
| is_new_device_for_user | First time device used | Boolean              |
| hour_of_day            | Local time hour        | Timezone adjusted    |
| is_weekend             | Weekend flag           | Boolean              |
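The amount_zscore row is the only derived computation in this group; a sketch, assuming the user's running average and standard deviation come from the profile store (function and parameter names hypothetical):

```python
def amount_zscore(amount_usd, user_avg_amount, user_std_amount):
    """Standard score of this transaction amount vs the user's history."""
    if user_std_amount <= 0:
        # No spend variance recorded yet: treat the amount as typical.
        return 0.0
    return (amount_usd - user_avg_amount) / user_std_amount


# A $250 charge for a user who averages $50 (+/- $40) is 5 standard
# deviations above their norm.
print(amount_zscore(250.0, 50.0, 40.0))  # 5.0
```

Guarding the zero-variance case matters for new accounts, where the profile has too little history to define a meaningful deviation.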

Device/Network Features:

| Feature          | Description                 | Source          |
|------------------|-----------------------------|-----------------|
| is_emulator      | Device is emulated          | Fingerprint     |
| is_rooted        | Device is rooted/jailbroken | Fingerprint     |
| is_datacenter_ip | IP from cloud provider      | IP intelligence |
| is_vpn           | VPN detected                | IP intelligence |
| is_tor           | Tor exit node               | IP intelligence |
| ip_risk_score    | Third-party IP score        | External API    |

Label Source

Positive Labels (Fraud):

  • Confirmed fraud chargebacks (reason codes 10.1-10.5)
  • TC40/SAFE issuer alerts with fraud type indicators
  • Manual review disposition: "confirmed fraud"

Negative Labels (Legitimate):

  • Authorizations without chargebacks or fraud alerts after 120-day aging
  • Manual review disposition: "legitimate"

Exclusions (Not Used for Training):

  • Partial refunds and service complaints
  • Ambiguous disputes (friendly fraud candidates)
  • Transactions with incomplete feature capture
Label Maturity: 120 days from transaction
  - Reason: Chargebacks can arrive up to 120 days post-transaction
  - Consequence: Training data is always 4 months behind
  - Mitigation: Use issuer alerts (TC40/SAFE) for earlier signal
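Putting the label sources and the maturity window together, the per-transaction labeling decision reduces to a three-way outcome; a sketch with hypothetical function and field names:

```python
from datetime import datetime, timedelta

LABEL_MATURITY = timedelta(days=120)

# Visa fraud chargeback reason codes per the label spec above.
FRAUD_REASON_CODES = {"10.1", "10.2", "10.3", "10.4", "10.5"}


def label_for_training(txn_time, now, chargeback_reason=None,
                       issuer_fraud_alert=False):
    """Return 1 (fraud), 0 (legitimate), or None (not yet usable).

    Positive labels are usable immediately; negatives only after the
    120-day chargeback window has fully elapsed.
    """
    if chargeback_reason in FRAUD_REASON_CODES or issuer_fraud_alert:
        return 1  # confirmed fraud (chargeback or TC40/SAFE alert)
    if now - txn_time >= LABEL_MATURITY:
        return 0  # aged out clean: legitimate
    return None   # still inside the chargeback window: exclude for now


now = datetime(2026, 1, 6)
print(label_for_training(datetime(2025, 8, 1), now))   # 0 (matured clean)
print(label_for_training(datetime(2025, 12, 1), now))  # None (not mature)
print(label_for_training(datetime(2025, 12, 1), now,
                         chargeback_reason="10.4"))    # 1
```

The `None` branch is what produces the "training data is always 4 months behind" consequence: recent clean transactions simply cannot be labeled yet.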

Retraining Strategy

Weekly Pipeline (Automated):
1. Extract transactions from T-120d to T-30d
2. Join with chargeback outcomes
3. Retrieve point-in-time features from evidence vault
4. Train new model version
5. Validate against holdout (last 7 days)
6. If AUC drop < 2%: Register as challenger
7. If AUC drop >= 2%: Alert DS team, use previous model

Monthly Review (Manual):
1. Compare champion vs challenger performance
2. Analyze feature importance drift
3. Review false positive cases
4. Decide: promote challenger or retrain with adjustments
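The AUC gate in steps 6-7 of the weekly pipeline is a one-line comparison; a sketch, assuming "2%" means an absolute AUC drop against the current champion (names hypothetical):

```python
def validation_gate(new_auc, champion_auc, max_drop=0.02):
    """Decide the fate of a freshly trained model on the 7-day holdout.

    Returns 'register_challenger' when the new model is within the
    allowed AUC drop, else 'alert_and_keep_previous'.
    """
    if champion_auc - new_auc < max_drop:
        return "register_challenger"
    return "alert_and_keep_previous"


print(validation_gate(new_auc=0.86, champion_auc=0.87))  # within 2% drop
print(validation_gate(new_auc=0.84, champion_auc=0.87))  # drop >= 2%
```

Note that an *improved* AUC also passes the gate (the drop is negative), which is the normal weekly path to a new challenger.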

Champion/Challenger Framework

Experiment Architecture

Traffic Routing:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Load Balancer                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                 β”‚                 β”‚
        β–Ό                 β–Ό                 β–Ό
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Champion β”‚      β”‚Challengerβ”‚      β”‚ Holdout  β”‚
   β”‚  (80%)   β”‚      β”‚  (15%)   β”‚      β”‚  (5%)    β”‚
   β”‚ Model A  β”‚      β”‚ Model B  β”‚      β”‚Rules Onlyβ”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Routing: Deterministic hash on auth_id (reproducible)
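A deterministic hash split on auth_id can be sketched as follows; bucket boundaries match the 80/15/5 split above, and the function name is illustrative:

```python
import hashlib


def route(auth_id: str) -> str:
    """Deterministically assign a transaction to an experiment arm.

    Hashing auth_id (instead of random sampling) means every replay,
    audit, and re-run reproduces the exact same 80/15/5 assignment.
    """
    bucket = int(hashlib.sha256(auth_id.encode()).hexdigest(), 16) % 100
    if bucket < 80:
        return "champion"    # 80% of traffic
    if bucket < 95:
        return "challenger"  # 15% of traffic
    return "holdout"         # 5% rules-only baseline


# Same auth_id always lands in the same arm.
print(route("auth_0001") == route("auth_0001"))  # True
```

Using the full hex digest modulo 100 keeps the bucketing uniform; slicing only a few hex characters would also work but biases more easily if changed later.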

Experiment Metrics

| Metric               | Champion | Challenger            | Threshold   |
|----------------------|----------|-----------------------|-------------|
| Approval Rate        | 91.2%    | Must be within 1%     | -1% to +2%  |
| Fraud Rate (30d lag) | 1.15%    | Must improve          | <1.15%      |
| P99 Latency          | 106ms    | Must be within 20%    | <127ms      |
| False Positive Rate  | 12%      | Must improve          | <12%        |

Promotion Criteria

Promote Challenger if ALL true:
  1. Running for >= 14 days
  2. Sample size >= 100,000 transactions
  3. Fraud rate improved by >= 5% (statistically significant)
  4. Approval rate within 1% of champion
  5. No latency degradation
  6. No anomalies in score distribution

Rollback Challenger if ANY true:
  1. Fraud rate increased by > 10%
  2. Approval rate dropped by > 3%
  3. P99 latency exceeded 150ms
  4. Error rate exceeded 0.5%
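The promotion and rollback criteria translate directly into two predicates, one all-of and one any-of; a sketch over a hypothetical experiment-stats dict (field names are assumptions, as is the reading of "no latency degradation" as P99 at or below the champion's):

```python
def should_promote(stats: dict) -> bool:
    """Promote the challenger only when ALL criteria hold."""
    return (
        stats["days_running"] >= 14
        and stats["sample_size"] >= 100_000
        and stats["fraud_rate_improvement"] >= 0.05 and stats["significant"]
        and abs(stats["approval_rate_delta"]) <= 0.01
        and stats["p99_latency_ms"] <= stats["champion_p99_latency_ms"]
        and not stats["score_distribution_anomaly"]
    )


def should_rollback(stats: dict) -> bool:
    """Roll back the challenger when ANY criterion trips."""
    return (
        stats["fraud_rate_increase"] > 0.10
        or stats["approval_rate_drop"] > 0.03
        or stats["p99_latency_ms"] > 150
        or stats["error_rate"] > 0.005
    )
```

Keeping the two checks as separate predicates matters operationally: rollback should be evaluated continuously, while promotion is only evaluated at review time.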

Replay Framework

Purpose

Historical replay enables:

  1. Threshold simulation - Test new thresholds on historical data
  2. Model validation - Compare model predictions to known outcomes
  3. Policy change estimation - Quantify impact before deployment

Implementation

async def replay(
    start_date: datetime,
    end_date: datetime,
    policy_config: Optional[dict] = None,
    model_version: Optional[str] = None
) -> ReplayResults:
    """
    Replay historical transactions with optional config changes.

    Key: Uses point-in-time features from evidence vault,
    NOT current features (which would cause look-ahead bias).
    """

    for transaction in get_historical_transactions(start_date, end_date):
        # Get features AS THEY WERE at transaction time
        features = get_features_at_time(
            transaction.auth_id,
            transaction.timestamp
        )

        # Score with specified model/policy
        new_decision = score_and_decide(
            transaction,
            features,
            model_version,
            policy_config
        )

        # Compare to actual outcome
        actual_fraud = was_transaction_fraud(transaction.auth_id)

        # Record for analysis
        results.append({
            "original_decision": transaction.original_decision,
            "new_decision": new_decision,
            "actual_fraud": actual_fraud
        })

    return analyze_results(results)

Simulation Use Cases

| Use Case          | Input                 | Output                                  |
|-------------------|-----------------------|-----------------------------------------|
| Threshold change  | New threshold values  | Approval rate delta, fraud caught delta |
| Model comparison  | Model version A vs B  | AUC difference, FP rate difference      |
| Rule addition     | New rule definition   | Transactions affected, score changes    |
| Seasonal analysis | Date range comparison | Pattern differences by period           |

Phase 3: Advanced ML (Future)

Planned Enhancements

| Enhancement          | Timeline | Description                                            |
|----------------------|----------|--------------------------------------------------------|
| Graph Neural Network | Phase 3  | Detect fraud rings via card-device-user connections    |
| Sequence Model       | Phase 3  | LSTM/Transformer for transaction sequence patterns     |
| Anomaly Detection    | Phase 3  | Isolation Forest for unknown attack patterns           |
| Real-time Retraining | Phase 3  | Online learning for rapid adaptation                   |
| External Signals     | Phase 3  | TC40/SAFE, Ethoca, BIN intelligence, device reputation |

External Signal Integration (Phase 3)

| Signal            | Source          | Use Case                               |
|-------------------|-----------------|----------------------------------------|
| TC40/SAFE         | Issuer alerts   | Early fraud signal before chargeback   |
| Ethoca/Verifi     | Network alerts  | Pre-dispute notification               |
| BIN Intelligence  | Card networks   | Card risk scoring, country of issuance |
| Device Reputation | Third-party SDK | Known bad devices, emulator detection  |
| IP Intelligence   | MaxMind/similar | Proxy, VPN, datacenter detection       |
| Consortium Data   | Industry shared | Cross-merchant fraud patterns          |

Guardrails Against Over-Complexity

The roadmap explicitly avoids:

  • Deploying highly complex or opaque models without clear interpretability paths
  • Adding model variants that materially increase operational risk without proportional fraud/loss benefit
  • Creeping scope into generic "AI everywhere" without clear problem statements
  • Replacing human judgment in edge cases where model confidence is low

ML remains a tool, not the centerpiece - the platform's value is in robust decisioning, evidence, and economics, with ML reinforcing (not replacing) those pillars.

Graph-Based Fraud Ring Detection

Entity Graph:
  Nodes: Cards, Devices, Users, IPs
  Edges: Transaction relationships

Fraud Ring Indicators:
  - Cluster of cards sharing devices
  - Star pattern (one device, many cards)
  - Circular payments between accounts
  - Velocity spikes in connected subgraph

Implementation:
  - Neo4j for graph storage
  - Graph embedding for ML features
  - Community detection for ring identification
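Even before a Neo4j deployment, the star-pattern indicator (one device fanning out to many cards) can be checked with a plain adjacency scan; a sketch with hypothetical names and an arbitrary threshold (a Cypher query would do the same server-side):

```python
from collections import defaultdict


def find_star_devices(edges, min_cards=5):
    """Flag devices connected to an unusually high number of distinct cards.

    `edges` is an iterable of (device_id, card_id) pairs, one per
    observed transaction. Returns {device_id: distinct_card_count}
    for devices at or above the threshold.
    """
    cards_per_device = defaultdict(set)
    for device_id, card_id in edges:
        cards_per_device[device_id].add(card_id)
    return {d: len(c) for d, c in cards_per_device.items() if len(c) >= min_cards}


# dev_A touches 8 distinct cards (star pattern); dev_B touches 1.
edges = [("dev_A", f"card_{i}") for i in range(8)] + [("dev_B", "card_0")]
print(find_star_devices(edges))  # {'dev_A': 8}
```

The other indicators (shared-device clusters, circular payments, connected-subgraph velocity) genuinely need graph traversal or community detection, which is where Neo4j earns its place.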

Architecture for ML Integration

Current Architecture (ML-Ready)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        API Layer                             β”‚
β”‚                      (FastAPI)                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                    β–Ό                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Feature Engine β”‚    β”‚   Detection   β”‚    β”‚ Policy Engine β”‚
β”‚   (Redis)     β”‚    β”‚    Engine     β”‚    β”‚   (YAML)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                    β”‚                    β”‚
        β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”            β”‚
        β”‚            β”‚   Currently   β”‚            β”‚
        β”‚            β”‚  Rule-Based   β”‚            β”‚
        β”‚            β”‚               β”‚            β”‚
        β”‚            β”‚  [ML HOOK]    β”‚β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚            β”‚   Phase 2     β”‚
        β”‚            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Evidence Vault β”‚
β”‚(Feature Store)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 2 Architecture (With ML)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        API Layer                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                    β–Ό                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Feature Engine β”‚    β”‚   Scoring     β”‚    β”‚ Policy Engine β”‚
β”‚   (Redis)     β”‚    β”‚   Service     β”‚    β”‚   (YAML)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                    β”‚                    β”‚
        β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
        β”‚         β–Ό          β–Ό          β–Ό        β”‚
        β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
        β”‚    β”‚  Rule  β”‚ β”‚   ML   β”‚ β”‚Ensembleβ”‚   β”‚
        β”‚    β”‚ Engine β”‚ β”‚ Model  β”‚ β”‚  Layer β”‚β—„β”€β”€β”˜
        β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                    β”‚
        β”‚         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚         β–Ό                     β–Ό
        β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚    β”‚ Champion β”‚        β”‚Challengerβ”‚
        β”‚    β”‚  Model   β”‚        β”‚  Model   β”‚
        β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Evidence Vault β”‚
β”‚+ ML Features  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Ensemble Scoring

class EnsembleScoringService:
    """Combine rule-based and ML scores."""

    def score(self, features: dict) -> RiskScore:
        # Rule-based score (always runs)
        rule_score = self.rule_engine.score(features)

        # ML score (Phase 2+)
        if self.ml_enabled:
            ml_score = self.ml_model.predict(features)
        else:
            ml_score = None

        # Ensemble combination
        if ml_score is not None:
            # Weighted average with rules as safety net
            combined = (
                ml_score * 0.70 +           # ML carries more weight
                rule_score * 0.30           # Rules as backstop
            )

            # Hard overrides (rules always win for certain signals)
            if features.get("is_emulator"):
                combined = max(combined, 0.95)
            if features.get("blocklist_match"):
                combined = 1.0
        else:
            combined = rule_score

        return RiskScore(
            combined=combined,
            rule_score=rule_score,
            ml_score=ml_score,
            model_version=self.model_version
        )

Timeline Summary

| Phase    | Scope               | Status      | Timeline  |
|----------|---------------------|-------------|-----------|
| Phase 1  | Rule-based MVP      | Complete    | Done      |
| Phase 2a | ML model training   | Not started | Weeks 1-2 |
| Phase 2b | Champion/challenger | Not started | Weeks 2-3 |
| Phase 2c | ML in production    | Not started | Week 4+   |
| Phase 3  | Advanced ML         | Future      | TBD       |

This document consolidates the AI/ML strategy with explicit current status, avoiding the impression that ML is already deployed when it is planned for Phase 2.