Multi-Agent Architecture: Orchestration and Quality Control

A Principal TPM analysis of designing multi-agent AI systems with LangGraph - architecture decisions, quality validation patterns, and production trade-offs.

Tags: ai, langgraph, architecture, agents, interview-prep

Executive Summary

The Ingredient Scanner demonstrates multi-agent AI architecture at production scale. This analysis covers the why behind architectural decisions - particularly valuable for system design interviews involving LLM orchestration.

Key Insight: Multi-agent systems require explicit orchestration. The complexity of coordinating agents is the primary engineering challenge, not the LLM calls themselves.


The Problem Space

User Need: Analyze ingredient lists for safety concerns, personalized to allergies and skin type.

Technical Challenge: No single LLM call can reliably:

1. Research ingredient safety data

2. Generate personalized analysis

3. Validate output quality

Solution: Specialized agents with explicit orchestration.


Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     LangGraph Workflow                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐           │
│   │ Research │────▶│ Analysis │────▶│  Critic  │           │
│   │  Agent   │     │  Agent   │     │  Agent   │           │
│   └────┬─────┘     └────┬─────┘     └────┬─────┘           │
│        │                │                │                  │
│        ▼                ▼                ▼                  │
│   ┌─────────┐      ┌─────────┐     ┌─────────┐            │
│   │ Qdrant  │      │ Gemini  │     │ 5-Gate  │            │
│   │ + Google│      │  2.0    │     │ Validate│            │
│   └─────────┘      └─────────┘     └─────────┘            │
│                                          │                  │
│                          ┌───────────────┤                  │
│                          │ PASS          │ FAIL             │
│                          ▼               ▼                  │
│                      [Return]       [Retry ≤3x]            │
│                                                              │
└─────────────────────────────────────────────────────────────┘
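The control flow in the diagram can be sketched in plain Python. Stub functions stand in for the real LangGraph nodes, and all names and return values here are illustrative, not the project's actual schema:

```python
# Minimal sketch of the Research -> Analysis -> Critic loop with retry <= 3.
# The real system wires these as LangGraph nodes; plain functions stand in
# here so the end-to-end control flow is visible.

MAX_RETRIES = 3

def research_agent(state):
    # Would query Qdrant first, then Google grounded search on a miss.
    state["research"] = {"retinol": "may irritate sensitive skin"}
    return state

def analysis_agent(state):
    # Would call Gemini 2.0 Flash with research context + user profile.
    state["report"] = {"retinol": {"score": 6, "warning": "patch-test first"}}
    return state

def critic_agent(state):
    # Would run the 5 quality gates; simplified to a completeness check.
    state["passed"] = set(state["research"]) <= set(state["report"])
    return state

def run_workflow(state):
    state = research_agent(state)
    for _ in range(MAX_RETRIES):
        state = analysis_agent(state)
        state = critic_agent(state)
        if state["passed"]:
            return state
    state["review_needed"] = True  # degrade gracefully after 3 failures
    return state

result = run_workflow({})
print(result["passed"])  # True on the first pass in this stub
```

In the real graph the PASS/FAIL branch is a conditional edge rather than a Python loop, but the retry bound and the "review needed" degradation path are the same idea.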

Agent Responsibilities

Supervisor Agent

Routes the workflow based on shared state:

  • Determines which agent to invoke next from the current state
  • Handles error states and retry logic
  • Manages conversation context across turns
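Supervisor routing reduces to a function from state to the next node's name. The node names and state keys below are illustrative assumptions, not the project's actual graph:

```python
# Sketch of supervisor routing: inspect shared state, return the next node.
# In LangGraph this would be the function passed to add_conditional_edges.

def route(state: dict) -> str:
    if state.get("error"):
        return "error_handler"
    if "research" not in state:
        return "research"          # nothing gathered yet
    if "report" not in state:
        return "analysis"          # research done, no draft yet
    if not state.get("passed"):
        # Critic failed: retry analysis until the retry budget is spent.
        return "analysis" if state.get("retries", 0) < 3 else "end"
    return "end"

print(route({}))  # research
print(route({"research": {}, "report": {}, "passed": True}))  # end
```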

Research Agent

Gathers ingredient information with fallback strategy:

| Source | Latency | Coverage | Cost | Use When |
|---|---|---|---|---|
| Qdrant vector search | 50-100ms | 80% | Free tier | Primary (cached data) |
| Google grounded search | 500-1000ms | 95%+ | API cost | Fallback (novel ingredients) |

Design Decision: Try fast path first, fallback on miss. This reduces cost while maintaining coverage.
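The fast-path/fallback pattern is a few lines of code. Both backends are stubbed here (a dict stands in for Qdrant, a formatted string for grounded search); only the decision logic is the point:

```python
# Fast path first, fallback on miss: ~80% of lookups never pay search cost.

CACHE = {"niacinamide": "well tolerated; brightening"}  # stands in for Qdrant

def qdrant_lookup(ingredient: str):
    return CACHE.get(ingredient)  # ~50-100ms in production, free tier

def google_grounded_search(ingredient: str):
    return f"fresh web result for {ingredient}"  # ~500-1000ms, API cost

def lookup(ingredient: str) -> str:
    hit = qdrant_lookup(ingredient)
    if hit is not None:
        return hit  # primary path: cached ingredient data
    return google_grounded_search(ingredient)  # fallback: novel ingredient

print(lookup("niacinamide"))  # cache hit
print(lookup("bakuchiol"))    # falls back to search
```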

Analysis Agent

Generates personalized safety reports:

  • Gemini 2.0 Flash for generation (fast, cost-effective)
  • Structured output with Pydantic validation
  • Personalization based on user profile (allergies, skin type, pregnancy status)
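Structured output validation looks roughly like this. The project uses Pydantic; this dependency-free sketch uses stdlib dataclasses to show the same idea, and the field names are illustrative:

```python
# Reject a malformed LLM response before it ever reaches the Critic.
from dataclasses import dataclass

@dataclass
class IngredientReport:
    name: str
    safety_score: int  # expected range 0-10
    warning: str

    def __post_init__(self):
        # A Pydantic model would express this as a field validator.
        if not 0 <= self.safety_score <= 10:
            raise ValueError(f"safety_score out of range: {self.safety_score}")

def parse_report(raw: dict) -> IngredientReport:
    return IngredientReport(**raw)

ok = parse_report({"name": "retinol", "safety_score": 6, "warning": "patch-test"})
print(ok.safety_score)  # 6

try:
    parse_report({"name": "retinol", "safety_score": 42, "warning": ""})
except ValueError as e:
    print("rejected:", e)
```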

Critic Agent

5-Gate Quality Validation - the key differentiator:

| Gate | Check | Failure Action |
|---|---|---|
| 1. Completeness | All ingredients analyzed | Retry with emphasis |
| 2. Accuracy | Safety scores within valid range | Retry with stricter prompt |
| 3. Personalization | Allergen warnings present if applicable | Retry with profile emphasis |
| 4. Format | JSON schema compliance | Retry with format examples |
| 5. Safety Claims | No unsupported medical claims | Edit or reject |

Why 5 Gates? Each gate catches a different failure mode. Together, the Critic catches ~15% of issues that would otherwise reach users.
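The five gates can be sketched as independent checks over a draft report. Each check here is a simplified stand-in for the real validation logic, and the data shapes are assumptions:

```python
# Five independent quality gates over a draft report; empty list == PASS.

def run_gates(ingredients, profile, report):
    failures = []
    # Gate 1: completeness - every scanned ingredient got an entry.
    if set(ingredients) - set(report):
        failures.append("completeness")
    # Gate 2: accuracy - safety scores in the valid 0-10 range.
    if any(not 0 <= r["score"] <= 10 for r in report.values()):
        failures.append("accuracy")
    # Gate 3: personalization - the user's allergens must carry a warning.
    for allergen in profile.get("allergies", []):
        if allergen in report and not report[allergen].get("warning"):
            failures.append("personalization")
            break
    # Gate 4: format - required keys present on every entry.
    if any(not {"score", "warning"} <= set(r) for r in report.values()):
        failures.append("format")
    # Gate 5: safety claims - no unsupported medical language.
    banned = {"cures", "treats", "prevents disease"}
    if any(b in r.get("warning", "") for r in report.values() for b in banned):
        failures.append("safety_claims")
    return failures

report = {"retinol": {"score": 6, "warning": "may irritate; patch-test"}}
print(run_gates(["retinol"], {"allergies": []}, report))  # []
```

Keeping the gates independent is what makes the Critic measurable: each retry prompt can name exactly which gate failed.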


Key Design Decisions

| Decision | Alternatives Considered | Rationale | Trade-off |
|---|---|---|---|
| LangGraph over LangChain chains | LangChain LCEL, raw async | Better state management, conditional routing, debugging | Learning curve |
| 5-gate Critic | Single validation pass | Quality must be explicit and measurable | Adds latency |
| Vector + Search fallback | Search-only, Vector-only | Balance speed with coverage | Complexity |
| Centralized LLM config | Per-agent config | Single point for model changes and tracing | Less flexibility |
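The centralized-config decision is worth a concrete shape. A minimal sketch, assuming a single frozen settings object that every agent imports (field names and defaults are illustrative):

```python
# One settings object shared by all agents: a model swap or tracing toggle
# happens in exactly one place instead of once per agent.
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    model: str = "gemini-2.0-flash"  # illustrative model id
    temperature: float = 0.2
    max_retries: int = 3
    tracing: bool = True             # e.g. LangSmith on/off

LLM_CONFIG = LLMConfig()

def make_client(cfg: LLMConfig = LLM_CONFIG):
    # Each agent calls this instead of hard-coding its own model settings.
    return {"model": cfg.model, "temperature": cfg.temperature}

print(make_client()["model"])
```

The trade-off in the table is visible here: an agent that genuinely needs different settings has to thread its own `LLMConfig` through, which is exactly the lost flexibility.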

Failure Mode Analysis

| Failure | Detection | Recovery | User Impact |
|---|---|---|---|
| Qdrant timeout | 3s timeout | Fall back to Google search | +500ms latency |
| Gemini rate limit | 429 response | Exponential backoff, queue | Delayed response |
| Critic loop | Retry count >3 | Return with "review needed" flag | Degraded quality |
| Invalid input | Pydantic validation | 422 response with details | Clear error |
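The rate-limit row deserves a sketch. Exponential backoff with jitter, with a stubbed flaky call standing in for the Gemini client (delays here are shortened for illustration):

```python
# Retry on rate limits with doubling delays plus jitter, then re-raise.
import random
import time

class RateLimitError(Exception):
    pass  # stands in for the client's 429 exception

def call_with_backoff(fn, max_attempts=4, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error
            # 0.01s, 0.02s, 0.04s... plus jitter to avoid a thundering herd
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.005))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"

print(call_with_backoff(flaky))  # ok, after two retries
```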

Observability Integration

LangSmith provides production-grade observability:

| Capability | Value | Interview Talking Point |
|---|---|---|
| Trace visualization | Debug agent chains | "I can see exactly where failures occur" |
| Latency breakdown | Identify bottlenecks | "Research agent was 70% of latency" |
| Cost tracking | Budget management | "Each request costs ~$0.02" |
| Quality metrics | Critic pass rates | "5-gate validation catches 15% of issues" |

Interview Application

When asked "How would you design an LLM-powered system?":

1. Start with orchestration - How do agents coordinate?

2. Define agent boundaries - What is each agent responsible for?

3. Design for failure - Fallbacks at every layer

4. Add quality gates - Validation before user-facing output

5. Instrument everything - Observability from day one

The differentiator: Showing you understand that LLM reliability requires explicit validation, not just "prompt engineering".


Performance Characteristics

| Metric | Value | Optimization |
|---|---|---|
| Average latency | 47s | Caching reduces to 5s for repeated queries |
| Cache hit rate | 70% | Redis + local LRU |
| Critic pass rate | 85% first try | Prompt improvements over time |
| Cost per request | ~$0.02 | Mostly Gemini API |
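The local LRU layer from the table above is nearly free to add. A sketch with `functools.lru_cache` memoizing a stubbed pipeline call (in production, Redis would sit behind this in-process layer):

```python
# In-process memoization of repeated lookups; the counter shows that the
# second identical request never re-runs the expensive pipeline.
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=1024)
def analyze(ingredients: tuple) -> str:
    calls["n"] += 1  # stands in for the expensive multi-agent pipeline
    return f"report for {', '.join(ingredients)}"

analyze(("retinol", "niacinamide"))
analyze(("retinol", "niacinamide"))  # served from cache
print(calls["n"])  # 1
```

Note the tuple key: `lru_cache` requires hashable arguments, so an ingredient list must be normalized (ordered, deduplicated) before it becomes a cache key.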

*This analysis is part of the AI Ingredient Scanner project. See the [Architecture documentation](/docs/ingredient-scanner/architecture) for implementation details.*