Results, Limitations & Personas

Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST


Load Test Results

Test Configuration

Parameter   | Value
------------|---------------------------
Environment | Local (M-series Mac)
API Workers | 4 uvicorn workers
Redis       | Single node, local Docker
PostgreSQL  | Single node, local Docker
Test Tool   | Locust
Duration    | 2 minutes
Users       | 50 concurrent
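
For reference, a minimal Locust profile matching this configuration could look like the sketch below; the decision endpoint path, payload fields, and host are assumptions for illustration, not the platform's actual API contract.

```python
# locustfile.py -- load profile matching the configuration above (sketch).
# Assumption: a POST /decision endpoint on http://localhost:8000 that accepts
# a JSON transaction payload; adjust to the platform's real API contract.
import random
import uuid

from locust import HttpUser, between, task


class FraudDecisionUser(HttpUser):
    host = "http://localhost:8000"  # assumed local API behind 4 uvicorn workers
    wait_time = between(0.05, 0.2)  # short think time; 50 users yields a few hundred RPS

    @task
    def score_transaction(self):
        payload = {
            "transaction_id": str(uuid.uuid4()),
            "user_id": f"user_{random.randint(1, 5000)}",
            "card_id": f"card_{random.randint(1, 8000)}",
            "amount": round(random.uniform(5, 500), 2),
            "currency": "USD",
        }
        # name= groups all requests under a single entry in the Locust stats
        self.client.post("/decision", json=payload, name="/decision")
```

Run headless to reproduce the 2-minute, 50-user profile: `locust -f locustfile.py --headless --users 50 --spawn-rate 10 --run-time 2m`.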

Observed Performance

Metric      | Observed | Target | Status
------------|----------|--------|------------
Throughput  | 260 RPS  | -      | Baseline
P50 Latency | 22ms     | 50ms   | 56% buffer
P99 Latency | 106ms    | 200ms  | 47% buffer
Error Rate  | 0.00%    | <0.1%  | Passing
Failures    | 0        | 0      | Passing

Latency Breakdown

Total P99: 106ms
├── Feature computation (Redis):  ~50ms (47%)
├── Risk scoring (detection):     ~20ms (19%)
├── Policy evaluation:            ~10ms (9%)
├── Evidence capture (async):     ~20ms (19%)
└── Network/serialization:        ~6ms  (6%)

Key Insight: Redis velocity lookups dominate latency at 47% of the total. At scale, this is the first optimization target.
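
One plausible first step is to batch the per-entity velocity reads into a single Redis round trip instead of issuing them sequentially. A minimal sketch using redis-py, where the key naming and window set are assumptions:

```python
# Sketch: fetch all velocity counters for a transaction in one Redis round trip
# using a redis-py pipeline. Key naming (vel:{entity}:{id}:{window}) and the
# window set are assumptions for illustration.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

VELOCITY_WINDOWS = ("1m", "5m", "1h", "24h")


def fetch_velocity_features(user_id: str, card_id: str, device_id: str) -> dict:
    entities = {"user": user_id, "card": card_id, "device": device_id}
    pipe = r.pipeline(transaction=False)  # pure reads; no MULTI/EXEC needed
    feature_names = []
    for entity, value in entities.items():
        for window in VELOCITY_WINDOWS:
            feature_names.append(f"{entity}_txn_count_{window}")
            pipe.get(f"vel:{entity}:{value}:{window}")
    counts = pipe.execute()  # single network round trip for all counters
    return {name: int(c or 0) for name, c in zip(feature_names, counts)}
```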

Capacity Projection

Load Level          | Est. RPS | Est. P99 | Bottleneck         | Mitigation
--------------------|----------|----------|--------------------|-------------------
Baseline (50 users) | 260      | 106ms    | None               | -
2x (100 users)      | 500      | 130ms    | API workers        | Add workers
4x (200 users)      | 900      | 160ms    | Redis connections  | Connection pooling
8x (400 users)      | 1,500    | 200ms    | Redis throughput   | Redis Cluster
16x+ (1000 users)   | 3,000+   | >200ms   | Architecture limit | Kafka + Flink

Identified Bottleneck: At ~4,000 RPS in ramp-up testing, PostgreSQL evidence writes saturated the connection pool, increasing tail latency. This confirms the planned scaling path: PostgreSQL sharding/replicas and event-sourced evidence ingestion.
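
One way to relieve that pressure, consistent with the event-sourced direction above, is to take evidence writes off the request path and flush them in batches over a bounded pool. A minimal sketch using asyncpg, where the table name, columns, DSN, and batch sizing are assumptions:

```python
# Sketch: buffer evidence records in-process and flush them in batches over a
# bounded asyncpg pool, keeping PostgreSQL off the decision path. Table name,
# columns, DSN, and batch sizing are assumptions for illustration.
import asyncio
import json

import asyncpg

EVIDENCE_SQL = "INSERT INTO evidence (txn_id, payload) VALUES ($1, $2::jsonb)"


class EvidenceWriter:
    def __init__(self, pool: asyncpg.Pool, batch_size: int = 200, flush_interval: float = 0.25):
        self.pool = pool
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)
        self.batch_size = batch_size
        self.flush_interval = flush_interval

    async def submit(self, txn_id: str, evidence: dict) -> None:
        # Called from the request path; never touches PostgreSQL directly.
        await self.queue.put((txn_id, json.dumps(evidence)))

    async def run(self) -> None:
        while True:
            batch = [await self.queue.get()]
            try:
                while len(batch) < self.batch_size:
                    batch.append(self.queue.get_nowait())
            except asyncio.QueueEmpty:
                pass
            async with self.pool.acquire() as conn:
                await conn.executemany(EVIDENCE_SQL, batch)
            await asyncio.sleep(self.flush_interval)


async def main() -> None:
    pool = await asyncpg.create_pool("postgresql://localhost/fraud", min_size=2, max_size=10)
    writer = EvidenceWriter(pool)
    flush_task = asyncio.create_task(writer.run())
    await writer.submit("txn_123", {"decision": "ALLOW", "score": 0.12})
    await asyncio.sleep(1)  # let the background flush drain the queue
    flush_task.cancel()
    await pool.close()


if __name__ == "__main__":
    asyncio.run(main())
```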

Replay Validation

Using synthetic historical data with known fraud labels:

Scenario            | Transactions | Fraud Injected | Detected | False Positives
--------------------|--------------|----------------|----------|----------------
Normal traffic      | 10,000       | 1% (100)       | 72/100   | 180/9,900
Card testing attack | 1,000        | 10% (100)      | 94/100   | 45/900
Velocity attack     | 500          | 20% (100)      | 88/100   | 22/400
Mixed realistic     | 15,000       | 2% (300)       | 221/300  | 195/14,700

Summary:

  • Detection rate: 72-94% depending on attack type
  • False positive rate: roughly 1.3-5.5% depending on scenario (see the sketch below for how these rates are derived)
  • Card testing attacks have the highest detection rate
  • Velocity attacks show strong detection with the rule-based approach
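
The rates above reduce to simple ratios over the labeled replay output. A minimal sketch, assuming each replay record carries a ground-truth is_fraud label and the platform's decision:

```python
# Sketch: derive detection and false positive rates from labeled replay output.
# Assumes each record has an is_fraud ground-truth flag and the platform's
# decision; REVIEW/BLOCK are treated as "flagged".

def replay_rates(records: list[dict]) -> dict:
    def flagged(r: dict) -> bool:
        return r["decision"] in {"REVIEW", "BLOCK"}

    fraud = [r for r in records if r["is_fraud"]]
    legit = [r for r in records if not r["is_fraud"]]
    return {
        "detection_rate": sum(map(flagged, fraud)) / max(len(fraud), 1),
        "false_positive_rate": sum(map(flagged, legit)) / max(len(legit), 1),
    }

# Example: the card-testing scenario (94/100 detected, 45/900 false positives)
# corresponds to detection_rate = 0.94 and false_positive_rate = 0.05.
```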

Policy Impact Simulation (Replay)

Comparing baseline rules vs platform policy on 1M synthetic transactions:

Metric                         | Baseline Rules  | Platform Policy | Delta
-------------------------------|-----------------|-----------------|-------
Approval rate                  | 89.0%           | 91.5%           | +2.5%
Criminal fraud caught (recall) | 60%             | 78%             | +18%
Criminal fraud passed          | 40%             | 22%             | -18%
Manual review rate             | 4.2%            | 2.6%            | -1.6%
Estimated fraud loss           | 100% (baseline) | ~62%            | -38%

The replay engine and economics framework give finance and risk partners a quantified view of the trade-offs before changes are deployed.
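
A hedged sketch of how such a side-by-side comparison could be produced from the same replayed transactions is shown below; the policy callables, record fields, and metric definitions are assumptions for illustration, not the replay engine's actual interfaces.

```python
# Sketch: score two policies over the same replayed transactions and report the
# headline metrics above. The policy callables and record fields (amount,
# is_criminal_fraud) are assumptions, not the replay engine's real interfaces.
from typing import Callable, Iterable

Decision = str  # "ALLOW" | "FRICTION" | "REVIEW" | "BLOCK"


def policy_metrics(policy: Callable[[dict], Decision], records: Iterable[dict]) -> dict:
    records = list(records)
    decisions = [policy(r) for r in records]
    stopped = {"REVIEW", "BLOCK"}
    fraud = [(d, r) for d, r in zip(decisions, records) if r["is_criminal_fraud"]]
    return {
        "approval_rate": sum(d == "ALLOW" for d in decisions) / len(records),
        "manual_review_rate": sum(d == "REVIEW" for d in decisions) / len(records),
        "fraud_recall": sum(d in stopped for d, _ in fraud) / max(len(fraud), 1),
        "fraud_loss": sum(r["amount"] for d, r in fraud if d not in stopped),
    }


def compare_policies(baseline, candidate, records) -> dict:
    base = policy_metrics(baseline, records)
    cand = policy_metrics(candidate, records)
    return {k: {"baseline": base[k], "platform": cand[k], "delta": cand[k] - base[k]} for k in base}
```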


Limitations

Infrastructure Limitations

Limitation               | Impact                              | Production Path
-------------------------|-------------------------------------|------------------------------------------
Single node architecture | No failover, limited throughput     | Deploy Redis Cluster, PostgreSQL replicas
Local Docker deployment  | Not representative of cloud latency | Deploy to AWS/GCP with network testing
No load balancer         | Single point of failure             | Add ALB/NLB with health checks
No auto-scaling          | Cannot handle traffic spikes        | Implement Kubernetes HPA
No multi-region          | Geographic latency, DR risk         | Deploy to multiple regions

Data Limitations

Limitation                      | Impact                               | Mitigation Path
--------------------------------|--------------------------------------|----------------------------------------------
Synthetic test data             | May not reflect real attack patterns | Shadow deployment on production traffic
No real chargebacks             | Cannot validate label accuracy       | Integrate with PSP chargeback feed
Limited feature diversity       | May miss real fraud signals          | Add external signals (BIN, device reputation)
No historical baseline          | Cannot compare to existing system    | Run parallel with current fraud system
Point-in-time features untested | Replay may have leakage              | Validate with known delayed labels

Model Limitations

Limitation             | Impact                          | Mitigation Path
-----------------------|---------------------------------|----------------------------------
Rule-based only        | Lower accuracy than ML          | Phase 2 ML integration
No adaptive thresholds | Static rules do not evolve      | Implement threshold optimization
No feedback loop       | Decisions do not improve system | Add analyst feedback to training
Single model           | No redundancy or comparison     | Champion/challenger framework
No drift detection     | Model may degrade silently      | Implement PSI monitoring
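
On drift detection specifically, PSI monitoring reduces to comparing a reference score distribution against recent traffic. A minimal sketch, where the 10-bin quantile scheme and the 0.1/0.2 rules of thumb are common conventions rather than platform settings:

```python
# Sketch: Population Stability Index (PSI) between a reference score sample and
# recent production scores. The 10-bin quantile scheme and the 0.1/0.2 rules of
# thumb are common conventions, not settings taken from this platform.
import numpy as np


def population_stability_index(reference: np.ndarray, recent: np.ndarray, bins: int = 10) -> float:
    """PSI = sum((a_i - e_i) * ln(a_i / e_i)) over score-distribution bins."""
    cut_points = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]  # interior edges
    e_counts = np.bincount(np.searchsorted(cut_points, reference), minlength=bins)
    a_counts = np.bincount(np.searchsorted(cut_points, recent), minlength=bins)
    e = np.clip(e_counts / e_counts.sum(), 1e-6, None)  # avoid log(0)
    a = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

# Rule of thumb: PSI < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 investigate.
```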

Operational Limitations

Limitation           | Impact                             | Mitigation Path
---------------------|------------------------------------|------------------------------------
No analyst UI        | Manual review is cumbersome        | Build case management dashboard
No bulk operations   | Cannot act on patterns efficiently | Add bulk blocklist/threshold tools
Limited alerting     | May miss issues                    | Full Alertmanager integration
No on-call runbooks  | Incident response unclear          | Document response procedures
No disaster recovery | Single region failure = outage     | Multi-region active-passive

Honest Assessment

What This Proves:
  ✓ Architecture meets latency requirements
  ✓ Detection logic catches known fraud patterns
  ✓ Evidence capture is comprehensive
  ✓ Policy engine is configurable
  ✓ System handles expected load

What This Does Not Prove:
  ✗ Performance under real production traffic
  ✗ Detection accuracy on real fraud (vs synthetic)
  ✗ ML model performance (not yet implemented)
  ✗ Operational readiness (no real incidents yet)
  ✗ Economic impact (no real financial data)

Personas & Dashboard Usage

Persona 1: Fraud Analyst

Role: Reviews flagged transactions, makes manual decisions, investigates patterns

Primary Dashboard Panels:

Panel                 | Purpose                              | Key Metrics
----------------------|--------------------------------------|------------------------------
Review Queue          | Transactions needing manual decision | Count, age, priority
Decision Distribution | Current system behavior              | ALLOW/FRICTION/REVIEW/BLOCK %
Recent High-Risk      | Emerging patterns                    | Transactions with score >70%
Triggered Reasons     | Why transactions flagged             | Top 10 triggered signals

Workflow:

1. Check Review Queue
   └── Sort by priority (HIGH first)
   └── Filter by amount (high value first)

2. For each case:
   └── View transaction details (decision, scores, detectors fired, policy version)
   └── Review triggered signals and feature snapshot
   └── Check customer history
   └── Make decision: APPROVE / DECLINE / ESCALATE
   └── Annotate with disposition (confirmed fraud, friendly fraud, service issue)

3. Bulk actions:
   └── Add device to blocklist
   └── Add card to blocklist
   └── Flag user for enhanced monitoring

4. End of shift:
   └── Review queue age metrics
   └── Ensure nothing >4h old

Key Decisions:

  • Accept/decline individual transactions
  • Add entities to blocklists (as sketched below)
  • Escalate suspicious patterns to Risk Lead
  • Annotate cases with dispositions (feeds back into model training labels)
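
For illustration only, a bulk blocklist action might be scripted against the platform API as sketched below. The /blocklist endpoint, payload shape, and fields are hypothetical; the current demo exposes no bulk-action API (see the gap analysis later in this document).

```python
# Sketch: scripting a bulk blocklist action against the platform API. The
# /blocklist endpoint, payload shape, and auth handling are hypothetical; the
# current demo exposes no bulk-action API (see the gap analysis below).
import requests

API_BASE = "http://localhost:8000"


def add_to_blocklist(entity_type: str, values: list[str], reason: str, analyst: str) -> None:
    response = requests.post(
        f"{API_BASE}/blocklist",
        json={
            "entity_type": entity_type,  # e.g. "device" or "card"
            "values": values,
            "reason": reason,            # disposition text feeds future training labels
            "added_by": analyst,
        },
        timeout=5,
    )
    response.raise_for_status()


# Example: block devices tied to a confirmed card-testing cluster.
# add_to_blocklist("device", ["dev_8f3a", "dev_91bc"], "confirmed card testing", "analyst_42")
```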

Persona 2: Risk Lead / Fraud Manager

Role: Sets strategy, monitors KPIs, adjusts thresholds, manages team

Primary Dashboard Panels:

Panel                | Purpose                    | Key Metrics
---------------------|----------------------------|---------------------------
Approval Rate (24h)  | Customer experience health | Target: >92%, Alert: <90%
Block Rate (24h)     | Fraud prevention activity  | Target: <5%, Alert: >8%
Fraud Loss (30d lag) | Actual financial impact    | Rolling 30-day $
Dispute Win Rate     | Evidence effectiveness     | Target: >50%
Review Queue SLA     | Ops efficiency             | % within 4h SLA

Workflow:

1. Morning Review:
   └── Check 24h approval rate
   └── Review any after-hours alerts
   └── Compare block rate to baseline

2. Weekly Metrics Review:
   └── Fraud rate trend (30d lag)
   └── False positive estimate
   └── Dispute outcomes
   └── Threshold performance

3. Threshold Adjustment:
   └── Run replay simulation on proposed change
   └── Review projected impact
   └── If acceptable: Apply via Policy Settings
   └── Monitor for 48h post-change

4. Incident Response:
   └── Spike in block rate? Check for attack or bug
   └── Drop in approval rate? Check threshold misconfiguration
   └── Latency spike? Escalate to Engineering

Key Decisions:

  • Threshold adjustments (friction/review/block levels), as sketched below
  • Policy rule additions or modifications
  • Escalation to Engineering or Security
  • Resource allocation (analyst coverage)
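
Step 3 of the workflow above (replay, review, apply, monitor) lends itself to a simple guardrail. The sketch below assumes hypothetical run_replay() and apply_policy() helpers and illustrative limits; it is not the platform's actual Policy Settings interface.

```python
# Sketch: guardrail around a proposed threshold change. run_replay() and
# apply_policy() are hypothetical stand-ins for the replay engine and the
# Policy Settings API; the guardrail limits are illustrative only.

PROPOSED_THRESHOLDS = {"friction": 0.45, "review": 0.70, "block": 0.90}  # example values


def safe_to_apply(current: dict, projected: dict) -> bool:
    """Accept only if approval rate holds and fraud recall does not regress."""
    approval_drop = current["approval_rate"] - projected["approval_rate"]
    recall_change = projected["fraud_recall"] - current["fraud_recall"]
    return approval_drop <= 0.005 and recall_change >= 0.0


def review_threshold_change(run_replay, apply_policy, proposed=PROPOSED_THRESHOLDS) -> str:
    current = run_replay(policy="active")              # metrics under the live policy
    projected = run_replay(policy_overrides=proposed)  # metrics under the proposal
    if safe_to_apply(current, projected):
        apply_policy(proposed)                          # then monitor for 48h post-change
        return "applied"
    return "rejected: projected impact outside guardrails"
```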

Persona 3: SRE / On-Call Engineer

Role: Maintains system reliability, responds to alerts, handles incidents

Primary Dashboard Panels:

Panel            | Purpose            | Key Metrics
-----------------|--------------------|-------------------------------
P99 Latency      | System performance | Target: <200ms, Alert: >150ms
Error Rate       | System reliability | Target: <0.1%, Alert: >0.5%
Safe Mode Status | Fallback state     | Normal / SAFE MODE
Component Health | Dependency status  | Redis, PostgreSQL, API status
Throughput       | Traffic volume     | RPS vs expected baseline

Workflow:

1. Alert Response:
   └── Check alert source and severity
   └── Verify via dashboard (not just alert)
   └── Follow runbook for specific alert type

2. Latency Spike Response:
   └── Check Redis latency panel
   └── Check PostgreSQL latency panel
   └── Identify bottleneck component
   └── Scale or restart as needed

3. Safe Mode Activation:
   └── Automatic if error rate >5%
   └── Manual if component failure detected
   └── Notify Fraud Ops (decisions will be conservative)
   └── Document reason and duration

4. Post-Incident:
   └── Collect metrics from incident window
   └── Write post-mortem
   └── Update runbooks if needed

Key Alerts:

Alert                    | Threshold           | Response
-------------------------|---------------------|----------------------------------
FraudDecisionLatencyHigh | P99 >200ms for 2min | Check Redis, scale API
FraudErrorRateCritical   | >5% for 1min        | Safe mode, investigate
FraudSafeModeActive      | Any                 | Notify stakeholders, investigate
FraudTrafficDrop         | <10 RPS for 5min    | Check upstream integration
FraudTrafficSpike        | >2x baseline        | Check for attack or event
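
The FraudErrorRateCritical condition (error rate above 5% sustained over a minute, triggering safe mode) can be expressed as a small rolling check. The sketch below assumes an in-process sliding window and a minimum sample size; the production implementation would normally live in Prometheus/Alertmanager rules rather than application code.

```python
# Sketch: rolling error-rate check mirroring FraudErrorRateCritical (error rate
# >5% sustained over a 1-minute window triggers safe mode). The sliding window
# and minimum sample size are assumptions for illustration.
import time
from collections import deque

WINDOW_SECONDS = 60
ERROR_RATE_THRESHOLD = 0.05
MIN_SAMPLES = 100  # avoid tripping the breaker on a handful of requests


class ErrorRateMonitor:
    def __init__(self) -> None:
        self.events: deque = deque()  # (timestamp, is_error) pairs inside the window

    def record(self, is_error: bool, now: float | None = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, is_error))
        while self.events and now - self.events[0][0] > WINDOW_SECONDS:
            self.events.popleft()

    def error_rate(self) -> float:
        return sum(err for _, err in self.events) / len(self.events) if self.events else 0.0

    def should_enter_safe_mode(self) -> bool:
        return len(self.events) >= MIN_SAMPLES and self.error_rate() > ERROR_RATE_THRESHOLD
```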

Dashboard Mapping

Demo Dashboard (dashboard.py) - Current Implementation

Tab                   | Primary Persona | Key Panels
----------------------|-----------------|---------------------------------------
Transaction Simulator | Engineer/Demo   | Test scenarios, attack presets
Analytics Dashboard   | Risk Lead       | Decision distribution, latency charts
Decision History      | Fraud Analyst   | Historical decisions with filters
Policy Inspector      | Risk Lead       | Current rules, thresholds, lists
Policy Settings       | Risk Lead       | Threshold adjustment, rule management

Production Dashboard Needs (Gap Analysis)

Need              | Demo Has | Production Needs
------------------|----------|-------------------------------------
Review queue      | No       | Yes - Priority sorted, age tracking
Case management   | No       | Yes - Assignment, notes, workflow
Bulk actions      | No       | Yes - Multi-select, batch operations
Real-time alerts  | No       | Yes - Integrated alerting
Drill-down        | Limited  | Yes - Click through to transaction
Export            | No       | Yes - CSV/PDF for investigations
Role-based access | No       | Yes - Analyst vs Admin views

This document provides an honest assessment of what the system does and does not prove, and maps the dashboards to real user personas and their workflows.