Results, Limitations & Personas
Author: Uday Tamma | Document Version: 1.0 | Date: January 06, 2026 at 11:33 AM CST
Load Test Results
Test Configuration
| Parameter | Value |
|---|---|
| Environment | Local (M-series Mac) |
| API Workers | 4 uvicorn workers |
| Redis | Single node, local Docker |
| PostgreSQL | Single node, local Docker |
| Test Tool | Locust |
| Duration | 2 minutes |
| Users | 50 concurrent |
Observed Performance
| Metric | Observed | Target | Status |
|---|---|---|---|
| Throughput | 260 RPS | - | Baseline |
| P50 Latency | 22ms | 50ms | 56% buffer |
| P99 Latency | 106ms | 200ms | 47% buffer |
| Error Rate | 0.00% | <0.1% | Passing |
| Failures | 0 | 0 | Passing |
Latency Breakdown
Total P99: 106ms
- Feature computation (Redis): ~50ms (47%)
- Risk scoring (detection): ~20ms (19%)
- Policy evaluation: ~10ms (9%)
- Evidence capture (async): ~20ms (19%)
- Network/serialization: ~6ms (6%)
Key Insight: Redis velocity lookups dominate latency at 47% of total. At scale, this is the first optimization target.
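Because velocity lookups dominate the latency budget, one plausible first optimization is to batch all counter reads for a transaction into a single Redis pipeline instead of issuing one round trip per key. The sketch below is illustrative only: the key schema, window names, and the `fetch_velocity_features` helper are assumptions, not the platform's actual feature code.

```python
# Hypothetical sketch: fetch all velocity counters for a transaction in one
# Redis round trip via pipelining. Key names and windows are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

VELOCITY_WINDOWS = ("1m", "5m", "1h", "24h")

def fetch_velocity_features(card_id: str, device_id: str) -> dict:
    """Return all velocity counters for a transaction using a single pipeline."""
    keys = [f"vel:card:{card_id}:{w}" for w in VELOCITY_WINDOWS]
    keys += [f"vel:device:{device_id}:{w}" for w in VELOCITY_WINDOWS]

    pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC needed
    for key in keys:
        pipe.get(key)
    values = pipe.execute()

    # Missing counters mean no recent activity for that window.
    return {k: int(v) if v is not None else 0 for k, v in zip(keys, values)}
```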
Capacity Projection
| Load Level | Est. RPS | Est. P99 | Bottleneck | Mitigation |
|---|---|---|---|---|
| Baseline (50 users) | 260 | 106ms | None | - |
| 2x (100 users) | 500 | 130ms | API workers | Add workers |
| 4x (200 users) | 900 | 160ms | Redis connections | Connection pooling |
| 8x (400 users) | 1,500 | 200ms | Redis throughput | Redis Cluster |
| 16x+ (1000 users) | 3,000+ | >200ms | Architecture limit | Kafka + Flink |
Identified Bottleneck: In ramp-up testing at ~4,000 RPS, PostgreSQL evidence writes saturated the connection pool and increased tail latency. This confirms the planned scaling path: database sharding/replicas and event-sourced evidence ingestion.
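A minimal sketch of the event-sourced direction, assuming asyncpg and a hypothetical `evidence` table: decisions enqueue evidence records, and a background task flushes them in batches so the request path never waits on a PostgreSQL connection. In production the in-process queue would be replaced by a durable log (the Kafka + Flink path in the projection table above).

```python
# Illustrative sketch: decouple evidence capture from the decision path by
# batching inserts through a small asyncpg pool. Table and column names are
# hypothetical, not the system's actual schema.
import asyncio
import json

import asyncpg

evidence_queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)

async def evidence_writer(dsn: str, batch_size: int = 200) -> None:
    pool = await asyncpg.create_pool(dsn=dsn, min_size=2, max_size=10)
    while True:
        # Block for the first record, then drain up to a full batch.
        batch = [await evidence_queue.get()]
        while len(batch) < batch_size and not evidence_queue.empty():
            batch.append(evidence_queue.get_nowait())
        async with pool.acquire() as conn:
            await conn.executemany(
                "INSERT INTO evidence (txn_id, payload) VALUES ($1, $2)",
                [(e["txn_id"], json.dumps(e)) for e in batch],
            )
```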
Replay Validation
Using synthetic historical data with known fraud labels:
| Scenario | Transactions | Fraud Injected | Detected | False Positives |
|---|---|---|---|---|
| Normal traffic | 10,000 | 1% (100) | 72/100 | 180/9,900 |
| Card testing attack | 1,000 | 10% (100) | 94/100 | 45/900 |
| Velocity attack | 500 | 20% (100) | 88/100 | 22/400 |
| Mixed realistic | 15,000 | 2% (300) | 221/300 | 195/14,700 |
Summary:
- Detection rate: 72-94% depending on attack type
- False positive rate: 1.3-5.5% depending on scenario
- Card testing attacks have the highest detection confidence
- Velocity attacks show strong detection with the rule-based approach (see the metrics sketch below)
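The rates above can be derived directly from labeled replay output. The sketch below shows one way to compute them; the record fields (`label`, `decision`) are assumptions about the replay engine's output format, not its actual schema.

```python
# Illustrative computation of detection rate and false positive rate from
# labeled replay records. Field names are assumed for the example.
from typing import Iterable

def replay_metrics(records: Iterable[dict]) -> dict:
    """Each record: {"label": "fraud" | "legit", "decision": "ALLOW" | "FRICTION" | "REVIEW" | "BLOCK"}."""
    flagged = {"REVIEW", "BLOCK"}  # decisions treated as detections
    tp = fp = fn = tn = 0
    for r in records:
        is_fraud = r["label"] == "fraud"
        is_flagged = r["decision"] in flagged
        if is_fraud and is_flagged:
            tp += 1
        elif is_fraud:
            fn += 1
        elif is_flagged:
            fp += 1
        else:
            tn += 1
    return {
        "detection_rate": tp / max(tp + fn, 1),       # e.g. 94/100 for card testing
        "false_positive_rate": fp / max(fp + tn, 1),  # e.g. 45/900 for card testing
    }
```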
Policy Impact Simulation (Replay)
Comparing baseline rules vs platform policy on 1M synthetic transactions:
| Metric | Baseline Rules | Platform Policy | Delta |
|---|---|---|---|
| Approval rate | 89.0% | 91.5% | +2.5 pp |
| Criminal fraud caught (recall) | 60% | 78% | +18 pp |
| Criminal fraud passed | 40% | 22% | -18 pp |
| Manual review rate | 4.2% | 2.6% | -1.6 pp |
| Estimated fraud loss | 100% (baseline) | ~62% | -38% |
The replay engine and economics framework give finance and risk partners a quantified view of these trade-offs before any change is deployed.
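A hedged sketch of how such a side-by-side comparison might be computed, assuming the replay engine can re-run two policy functions over the same labeled transactions; the policy signature and field names below are illustrative, not the platform's actual replay API.

```python
# Illustrative comparison of baseline vs candidate policy over one labeled
# replay set. The ALLOW/FRICTION/REVIEW/BLOCK strings mirror the tables
# above; everything else is a hypothetical interface.
from typing import Callable

Decision = str  # "ALLOW" | "FRICTION" | "REVIEW" | "BLOCK"

def summarize(transactions: list[dict], policy: Callable[[dict], Decision]) -> dict:
    total = len(transactions)
    fraud = fraud_caught = approved = reviewed = 0
    for txn in transactions:
        decision = policy(txn)
        if decision == "ALLOW":
            approved += 1
        elif decision == "REVIEW":
            reviewed += 1
        if txn["label"] == "fraud":
            fraud += 1
            if decision in ("REVIEW", "BLOCK"):
                fraud_caught += 1
    return {
        "approval_rate": approved / total,
        "recall": fraud_caught / max(fraud, 1),
        "review_rate": reviewed / total,
    }

def compare_policies(transactions: list[dict], baseline, candidate) -> dict:
    before = summarize(transactions, baseline)
    after = summarize(transactions, candidate)
    return {k: after[k] - before[k] for k in before}  # deltas, as in the table above
```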
Limitations
Infrastructure Limitations
| Limitation | Impact | Production Path |
|---|---|---|
| Single node architecture | No failover, limited throughput | Deploy Redis Cluster, PostgreSQL replicas |
| Local Docker deployment | Not representative of cloud latency | Deploy to AWS/GCP with network testing |
| No load balancer | Single point of failure | Add ALB/NLB with health checks |
| No auto-scaling | Cannot handle traffic spikes | Implement Kubernetes HPA |
| No multi-region | Geographic latency, DR risk | Deploy to multiple regions |
Data Limitations
| Limitation | Impact | Mitigation Path |
|---|---|---|
| Synthetic test data | May not reflect real attack patterns | Shadow deployment on production traffic |
| No real chargebacks | Cannot validate label accuracy | Integrate with PSP chargeback feed |
| Limited feature diversity | May miss real fraud signals | Add external signals (BIN, device reputation) |
| No historical baseline | Cannot compare to existing system | Run parallel with current fraud system |
| Point-in-time features untested | Replay may have leakage | Validate with known delayed labels |
Model Limitations
| Limitation | Impact | Mitigation Path |
|---|---|---|
| Rule-based only | Lower accuracy than ML | Phase 2 ML integration |
| No adaptive thresholds | Static rules do not evolve | Implement threshold optimization |
| No feedback loop | Decisions do not improve system | Add analyst feedback to training |
| Single model | No redundancy or comparison | Champion/challenger framework |
| No drift detection | Model may degrade silently | Implement PSI monitoring |
Operational Limitations
| Limitation | Impact | Mitigation Path |
|---|---|---|
| No analyst UI | Manual review is cumbersome | Build case management dashboard |
| No bulk operations | Cannot act on patterns efficiently | Add bulk blocklist/threshold tools |
| Limited alerting | May miss issues | Full Alertmanager integration |
| No on-call runbooks | Incident response unclear | Document response procedures |
| No disaster recovery | Single region failure = outage | Multi-region active-passive |
Honest Assessment
What This Proves:
- Architecture meets latency requirements
- Detection logic catches known fraud patterns
- Evidence capture is comprehensive
- Policy engine is configurable
- System handles expected load

What This Does Not Prove:
- Performance under real production traffic
- Detection accuracy on real fraud (vs synthetic)
- ML model performance (not yet implemented)
- Operational readiness (no real incidents yet)
- Economic impact (no real financial data)
Personas & Dashboard Usage
Persona 1: Fraud Analyst
Role: Reviews flagged transactions, makes manual decisions, investigates patterns
Primary Dashboard Panels:
| Panel | Purpose | Key Metrics |
|---|---|---|
| Review Queue | Transactions needing manual decision | Count, age, priority |
| Decision Distribution | Current system behavior | ALLOW/FRICTION/REVIEW/BLOCK % |
| Recent High-Risk | Emerging patterns | Transactions with score >70% |
| Triggered Reasons | Why transactions flagged | Top 10 triggered signals |
Workflow:
1. Check Review Queue (see the triage sketch after this workflow):
   - Sort by priority (HIGH first)
   - Filter by amount (high value first)
2. For each case:
   - View transaction details (decision, scores, detectors fired, policy version)
   - Review triggered signals and feature snapshot
   - Check customer history
   - Make decision: APPROVE / DECLINE / ESCALATE
   - Annotate with disposition (confirmed fraud, friendly fraud, service issue)
3. Bulk actions:
   - Add device to blocklist
   - Add card to blocklist
   - Flag user for enhanced monitoring
4. End of shift:
   - Review queue age metrics
   - Ensure nothing >4h old
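The queue triage in step 1 amounts to a simple ordering rule: SLA breaches first, then priority, then value. A minimal sketch, assuming hypothetical case fields (`priority`, `amount`, `created_at`) rather than the actual queue schema:

```python
# Illustrative triage ordering for the analyst review queue.
from datetime import datetime, timedelta, timezone

PRIORITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
SLA = timedelta(hours=4)  # "nothing >4h old"

def triage(cases: list[dict]) -> list[dict]:
    """Order cases: SLA breaches first, then priority, then highest amount."""
    now = datetime.now(timezone.utc)
    def sort_key(case: dict):
        breached = (now - case["created_at"]) > SLA
        return (
            0 if breached else 1,                     # SLA breaches surface first
            PRIORITY_ORDER.get(case["priority"], 3),  # HIGH before MEDIUM/LOW
            -case["amount"],                          # high value first
        )
    return sorted(cases, key=sort_key)
```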
Key Decisions:
- Accept/decline individual transactions
- Add entities to blocklists
- Escalate suspicious patterns to Risk Lead
- Annotate cases with dispositions (feeds back into model training labels)
Persona 2: Risk Lead / Fraud Manager
Role: Sets strategy, monitors KPIs, adjusts thresholds, manages team
Primary Dashboard Panels:
| Panel | Purpose | Key Metrics |
|---|---|---|
| Approval Rate (24h) | Customer experience health | Target: >92%, Alert: <90% |
| Block Rate (24h) | Fraud prevention activity | Target: <5%, Alert: >8% |
| Fraud Loss (30d lag) | Actual financial impact | Rolling 30-day $ |
| Dispute Win Rate | Evidence effectiveness | Target: >50% |
| Review Queue SLA | Ops efficiency | % within 4h SLA |
Workflow:
1. Morning Review:
   - Check 24h approval rate
   - Review any after-hours alerts
   - Compare block rate to baseline
2. Weekly Metrics Review:
   - Fraud rate trend (30d lag)
   - False positive estimate
   - Dispute outcomes
   - Threshold performance
3. Threshold Adjustment (see the guardrail sketch after this workflow):
   - Run replay simulation on proposed change
   - Review projected impact
   - If acceptable: Apply via Policy Settings
   - Monitor for 48h post-change
4. Incident Response:
   - Spike in block rate? Check for attack or bug
   - Drop in approval rate? Check threshold misconfiguration
   - Latency spike? Escalate to Engineering
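The "review projected impact" step in threshold adjustment can be made mechanical by checking replay projections against the KPI targets in the panel table above. The sketch below is a hypothetical guardrail function, not the platform's actual policy-change workflow; thresholds and field names are assumptions drawn from the targets listed above.

```python
# Illustrative pre-change guardrail: reject a proposed threshold change if
# the replay projection violates the KPI targets from the panel table.
APPROVAL_RATE_FLOOR = 0.92   # Approval Rate target: >92%
BLOCK_RATE_CEILING = 0.05    # Block Rate target: <5%

def change_is_acceptable(projected: dict, current: dict) -> tuple[bool, list[str]]:
    """projected/current: {"approval_rate": ..., "block_rate": ..., "recall": ...}"""
    reasons = []
    if projected["approval_rate"] < APPROVAL_RATE_FLOOR:
        reasons.append("projected approval rate below 92% target")
    if projected["block_rate"] > BLOCK_RATE_CEILING:
        reasons.append("projected block rate above 5% target")
    if projected["recall"] < current["recall"]:
        reasons.append("projected fraud recall regresses vs current policy")
    return (not reasons, reasons)
```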
Key Decisions:
- Threshold adjustments (friction/review/block levels)
- Policy rule additions or modifications
- Escalation to Engineering or Security
- Resource allocation (analyst coverage)
Persona 3: SRE / On-Call Engineer
Role: Maintains system reliability, responds to alerts, handles incidents
Primary Dashboard Panels:
| Panel | Purpose | Key Metrics |
|---|---|---|
| P99 Latency | System performance | Target: <200ms, Alert: >150ms |
| Error Rate | System reliability | Target: <0.1%, Alert: >0.5% |
| Safe Mode Status | Fallback state | Normal / SAFE MODE |
| Component Health | Dependency status | Redis, PostgreSQL, API status |
| Throughput | Traffic volume | RPS vs expected baseline |
Workflow:
1. Alert Response:
   - Check alert source and severity
   - Verify via dashboard (not just alert)
   - Follow runbook for specific alert type
2. Latency Spike Response:
   - Check Redis latency panel
   - Check PostgreSQL latency panel
   - Identify bottleneck component
   - Scale or restart as needed
3. Safe Mode Activation (see the sketch after this workflow):
   - Automatic if error rate >5%
   - Manual if component failure detected
   - Notify Fraud Ops (decisions will be conservative)
   - Document reason and duration
4. Post-Incident:
   - Collect metrics from incident window
   - Write post-mortem
   - Update runbooks if needed
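The automatic trigger in step 3 is essentially a rolling error-rate check. A minimal sketch, assuming a hypothetical `SafeModeTrigger` helper rather than the system's actual fallback implementation:

```python
# Illustrative automatic safe-mode trigger: flip into safe mode when the
# rolling error rate exceeds 5%. Window size and class name are assumptions.
from collections import deque

class SafeModeTrigger:
    def __init__(self, threshold: float = 0.05, window: int = 1000, min_samples: int = 200):
        self.threshold = threshold
        self.min_samples = min_samples
        self.outcomes: deque[bool] = deque(maxlen=window)  # True = request errored
        self.active = False

    def record(self, errored: bool) -> bool:
        """Record one request outcome; return whether safe mode is active."""
        self.outcomes.append(errored)
        if not self.active and len(self.outcomes) >= self.min_samples:
            error_rate = sum(self.outcomes) / len(self.outcomes)
            if error_rate > self.threshold:
                self.active = True  # downstream decisions fall back to conservative defaults
        return self.active
```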
Key Alerts:
| Alert | Threshold | Response |
|---|---|---|
| FraudDecisionLatencyHigh | P99 >200ms for 2min | Check Redis, scale API |
| FraudErrorRateCritical | >5% for 1min | Safe mode, investigate |
| FraudSafeModeActive | Any | Notify stakeholders, investigate |
| FraudTrafficDrop | <10 RPS for 5min | Check upstream integration |
| FraudTrafficSpike | >2x baseline | Check for attack or event |
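These alerts presuppose that the decision path exports latency, error, and throughput metrics. A minimal sketch of exposing them with prometheus_client follows; the metric names are hypothetical, and the alert rules themselves would be defined in Prometheus/Alertmanager configuration rather than application code.

```python
# Illustrative instrumentation the alerts above could be defined against.
# Metric names and bucket boundaries are assumptions.
from prometheus_client import Counter, Histogram, start_http_server

DECISION_LATENCY = Histogram(
    "fraud_decision_latency_seconds",
    "End-to-end fraud decision latency",
    buckets=(0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.5),
)
DECISIONS = Counter("fraud_decisions_total", "All fraud decisions")
DECISION_ERRORS = Counter("fraud_decision_errors_total", "Failed fraud decisions")

def observe(latency_seconds: float, errored: bool) -> None:
    """Record one decision for latency, throughput, and error-rate alerting."""
    DECISIONS.inc()
    DECISION_LATENCY.observe(latency_seconds)
    if errored:
        DECISION_ERRORS.inc()

if __name__ == "__main__":
    start_http_server(9102)  # expose /metrics as a Prometheus scrape target
```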
Dashboard Mapping
Demo Dashboard (dashboard.py) - Current Implementation
| Tab | Primary Persona | Key Panels |
|---|---|---|
| Transaction Simulator | Engineer/Demo | Test scenarios, attack presets |
| Analytics Dashboard | Risk Lead | Decision distribution, latency charts |
| Decision History | Fraud Analyst | Historical decisions with filters |
| Policy Inspector | Risk Lead | Current rules, thresholds, lists |
| Policy Settings | Risk Lead | Threshold adjustment, rule management |
Production Dashboard Needs (Gap Analysis)
| Need | Demo Has | Production Needs |
|---|---|---|
| Review queue | No | Yes - Priority sorted, age tracking |
| Case management | No | Yes - Assignment, notes, workflow |
| Bulk actions | No | Yes - Multi-select, batch operations |
| Real-time alerts | No | Yes - Integrated alerting |
| Drill-down | Limited | Yes - Click through to transaction |
| Export | No | Yes - CSV/PDF for investigations |
| Role-based access | No | Yes - Analyst vs Admin views |
This document provides an honest assessment of what the system does and does not prove, and maps the dashboards to the user personas and workflows they serve.