Multi-Agent Architecture: Orchestration and Quality Control
A Principal TPM analysis of designing multi-agent AI systems with LangGraph - architecture decisions, quality validation patterns, and production trade-offs.
Executive Summary
The Ingredient Scanner demonstrates multi-agent AI architecture at production scale. This analysis covers the why behind architectural decisions - particularly valuable for system design interviews involving LLM orchestration.
Key Insight: Multi-agent systems require explicit orchestration. The complexity of coordinating agents is the primary engineering challenge, not the LLM calls themselves.
The Problem Space
User Need: Analyze ingredient lists for safety concerns, personalized to allergies and skin type.
Technical Challenge: No single LLM call can reliably:
1. Research ingredient safety data
2. Generate personalized analysis
3. Validate output quality
Solution: Specialized agents with explicit orchestration.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ LangGraph Workflow │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Research │────▶│ Analysis │────▶│ Critic │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Qdrant │ │ Gemini │ │ 5-Gate │ │
│ │ + Google│ │ 2.0 │ │ Validate│ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │
│ ┌───────────────┤ │
│ │ PASS │ FAIL │
│ ▼ ▼ │
│ [Return] [Retry ≤3x] │
│ │
└─────────────────────────────────────────────────────────────┘
Agent Responsibilities
Supervisor Agent
Routes the workflow based on shared state:
- Determines which agent to invoke next given the current state
- Handles error states and retry logic
- Manages conversation context across turns
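The routing decision above can be sketched as a plain function of the workflow state; in LangGraph it would be registered on the graph with `add_conditional_edges`. The state fields and node names here (`critic_passed`, `retry_count`, `analysis`) are illustrative assumptions, not the project's actual schema.

```python
from typing import TypedDict

class WorkflowState(TypedDict):
    ingredients: list[str]
    report: str
    critic_passed: bool
    retry_count: int

MAX_RETRIES = 3

def route_after_critic(state: WorkflowState) -> str:
    """Supervisor routing: return the report, retry analysis, or flag for review."""
    if state["critic_passed"]:
        return "return_report"
    if state["retry_count"] >= MAX_RETRIES:
        return "flag_for_review"  # degrade gracefully instead of looping forever
    return "analysis"  # send the state back through the Analysis agent

# In a LangGraph StateGraph this function would be wired with:
#   graph.add_conditional_edges("critic", route_after_critic)
```

Keeping the routing in one pure function makes the retry cap testable in isolation, without invoking any LLM.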
Research Agent
Gathers ingredient information with a two-tier fallback strategy:
| Source | Latency | Coverage | Cost | Use When |
|---|---|---|---|---|
| Qdrant vector search | 50-100ms | 80% | Free tier | Primary (cached data) |
| Google grounded search | 500-1000ms | 95%+ | API cost | Fallback (novel ingredients) |
Design Decision: Try the fast path first and fall back on a miss. This reduces cost while maintaining coverage.
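The fast-path-first pattern can be sketched as below; the lookup callables and the treatment of a vector-store timeout as a cache miss are assumptions for illustration, not the project's actual interfaces.

```python
from typing import Callable, Optional

def lookup_ingredient(
    name: str,
    qdrant_lookup: Callable[[str], Optional[dict]],
    google_search: Callable[[str], dict],
) -> tuple[dict, str]:
    """Try the fast cached path first; fall back to grounded search on a miss."""
    try:
        hit = qdrant_lookup(name)  # ~50-100ms when the ingredient is cached
        if hit is not None:
            return hit, "qdrant"
    except TimeoutError:
        pass  # treat a slow vector store like a cache miss
    # ~500-1000ms, but covers novel ingredients the cache has never seen
    return google_search(name), "google"
```

Returning the source alongside the result also makes the cache hit rate trivial to measure in production.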
Analysis Agent
Generates personalized safety reports:
- Gemini 2.0 Flash for generation (fast, cost-effective)
- Structured output with Pydantic validation
- Personalization based on user profile (allergies, skin type, pregnancy status)
Critic Agent
5-Gate Quality Validation - the key differentiator:
| Gate | Check | Failure Action |
|---|---|---|
| 1. Completeness | All ingredients analyzed | Retry with emphasis |
| 2. Accuracy | Safety scores within valid range | Retry with stricter prompt |
| 3. Personalization | Allergen warnings present if applicable | Retry with profile emphasis |
| 4. Format | JSON schema compliance | Retry with format examples |
| 5. Safety Claims | No unsupported medical claims | Edit or reject |
Why 5 Gates? Each gate catches a different class of failure. The Critic catches ~15% of issues that would otherwise reach users.
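The gate table above can be sketched as a single pass that names every failed gate, so the retry prompt can emphasize exactly what broke. The report schema and the individual checks here are simplified assumptions (e.g. a keyword stand-in for the real medical-claim check).

```python
def run_gates(report: dict, ingredients: list[str], allergies: list[str]) -> list[str]:
    """Run all five gates; return the names of the gates that failed."""
    scores = report.get("assessments", {})  # assumed shape: {ingredient: safety_score}
    summary = report.get("summary", "")
    gates = {
        "completeness": all(i in scores for i in ingredients),
        "accuracy": all(1 <= s <= 10 for s in scores.values()),  # assumed 1-10 scale
        "personalization": not allergies or bool(report.get("allergen_warnings")),
        "format": isinstance(report.get("allergen_warnings", []), list),
        "safety_claims": "cures" not in summary.lower(),  # stand-in for a real claim check
    }
    return [name for name, passed in gates.items() if not passed]
```

Returning gate names rather than a boolean is what enables targeted retries ("retry with profile emphasis") instead of a blind re-run.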
Key Design Decisions
| Decision | Alternatives Considered | Rationale | Trade-off |
|---|---|---|---|
| LangGraph over LangChain chains | LangChain LCEL, raw async | Better state management, conditional routing, debugging | Learning curve |
| 5-gate Critic | Single validation pass | Quality must be explicit and measurable | Adds latency |
| Vector + Search fallback | Search-only, Vector-only | Balance speed with coverage | Complexity |
| Centralized LLM config | Per-agent config | Single point for model changes and tracing | Less flexibility |
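The centralized-config decision can be sketched as one frozen dataclass plus a small override registry; the model name, defaults, and the critic override are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    model: str = "gemini-2.0-flash"
    temperature: float = 0.2
    max_output_tokens: int = 2048

# One module-level registry: a model swap or tracing change touches one place.
_OVERRIDES = {
    "critic": LLMConfig(temperature=0.0),  # validation should be deterministic
}

def get_llm_config(agent: str) -> LLMConfig:
    """Every agent pulls its config from here, never from its own constants."""
    return _OVERRIDES.get(agent, LLMConfig())
```

The trade-off noted in the table is visible here: per-agent flexibility exists only through the explicit override map, which is the point.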
Failure Mode Analysis
| Failure | Detection | Recovery | User Impact |
|---|---|---|---|
| Qdrant timeout | 3s timeout | Fall back to Google search | +500ms latency |
| Gemini rate limit | 429 response | Exponential backoff, queue | Delayed response |
| Critic loop | Retry count >3 | Return with "review needed" flag | Degraded quality |
| Invalid input | Pydantic validation | 422 response with details | Clear error |
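The rate-limit recovery row can be sketched as a generic backoff wrapper; `RateLimitError` stands in for the provider's 429 error, and the attempt count and base delay are assumed values.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's 429 response."""

def with_backoff(call, max_attempts: int = 4, base_s: float = 0.5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # surface the error after the final attempt
            # 0.5s, 1s, 2s, ... with jitter to avoid synchronized retries
            time.sleep(base_s * (2 ** attempt) + random.uniform(0, 0.1))
```

The jitter matters under load: without it, every queued request retries at the same instant and re-triggers the limit.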
Observability Integration
LangSmith provides production-grade observability:
| Capability | Value | Interview Talking Point |
|---|---|---|
| Trace visualization | Debug agent chains | "I can see exactly where failures occur" |
| Latency breakdown | Identify bottlenecks | "Research agent was 70% of latency" |
| Cost tracking | Budget management | "Each request costs ~$0.02" |
| Quality metrics | Critic pass rates | "5-gate validation catches 15% of issues" |
Interview Application
When asked "How would you design an LLM-powered system?":
1. Start with orchestration - How do agents coordinate?
2. Define agent boundaries - What is each agent responsible for?
3. Design for failure - Fallbacks at every layer
4. Add quality gates - Validation before user-facing output
5. Instrument everything - Observability from day one
The differentiator: Showing you understand that LLM reliability requires explicit validation, not just "prompt engineering".
Performance Characteristics
| Metric | Value | Optimization |
|---|---|---|
| Average latency | 47s | Caching reduces to 5s for repeated queries |
| Cache hit rate | 70% | Redis + local LRU |
| Critic pass rate | 85% first try | Prompt improvements over time |
| Cost per request | ~$0.02 | Mostly Gemini API |
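The Redis-plus-local-LRU layering behind the cache hit rate can be sketched as below. The remote client is injected with an assumed `get`/`set` interface (a real deployment would add TTLs via `SETEX`); sizes and types are illustrative.

```python
from collections import OrderedDict

class TwoTierCache:
    """Local LRU in front of a shared Redis-like store (get/set interface assumed)."""

    def __init__(self, remote, local_size: int = 256):
        self.remote = remote  # any object exposing get(key) / set(key, value)
        self.local: OrderedDict[str, str] = OrderedDict()
        self.local_size = local_size

    def get(self, key: str):
        if key in self.local:
            self.local.move_to_end(key)  # refresh LRU position
            return self.local[key]
        value = self.remote.get(key)     # shared across app instances
        if value is not None:
            self._store_local(key, value)
        return value

    def put(self, key: str, value: str) -> None:
        self.remote.set(key, value)
        self._store_local(key, value)

    def _store_local(self, key: str, value: str) -> None:
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.local_size:
            self.local.popitem(last=False)  # evict least recently used
```

A local hit skips the network round trip entirely, which is what turns a repeated 47s pipeline run into a ~5s cached response.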
*This analysis is part of the AI Ingredient Scanner project. See the [Architecture documentation](/docs/ingredient-scanner/architecture) for implementation details.*