Multi-Agent Architecture: Orchestration and Quality Control
A Principal TPM analysis of designing multi-agent AI systems with LangGraph - architecture decisions, quality validation patterns, and production trade-offs.
Executive Summary
The Ingredient Scanner demonstrates multi-agent AI architecture at production scale. This analysis covers the why behind architectural decisions - particularly valuable for system design interviews involving LLM orchestration.
Key Insight: Multi-agent systems require explicit orchestration. The complexity of coordinating agents is the primary engineering challenge, not the LLM calls themselves.
The Problem Space
User Need: Analyze ingredient lists for safety concerns, personalized to allergies and skin type.
Technical Challenge: No single LLM call can reliably:
1. Research ingredient safety data
2. Generate personalized analysis
3. Validate output quality
Solution: Specialized agents with explicit orchestration.
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ LangGraph Workflow │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Research │────▶│ Analysis │────▶│ Critic │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Qdrant │ │ Gemini │ │ 5-Gate │ │
│ │ + Google│ │ 2.0 │ │ Validate│ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │
│ ┌───────────────┤ │
│ │ PASS │ FAIL │
│ ▼ ▼ │
│ [Return] [Retry ≤3x] │
│ │
└─────────────────────────────────────────────────────────────┘
Agent Responsibilities
Supervisor Agent
Routes the workflow based on shared state:
- Determines which agent to invoke next given the current state
- Handles error states and retry logic
- Manages conversation context across turns
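The routing decision above can be sketched as a plain function of the workflow state; in LangGraph it would be registered on the graph with `add_conditional_edges`. The state fields and node names here (`critic_passed`, `retry_count`, `analysis`) are illustrative assumptions, not the project's actual schema.

```python
from typing import TypedDict

class WorkflowState(TypedDict):
    ingredients: list[str]
    report: str
    critic_passed: bool
    retry_count: int

MAX_RETRIES = 3

def route_after_critic(state: WorkflowState) -> str:
    """Supervisor routing: return the report, retry analysis, or flag for review."""
    if state["critic_passed"]:
        return "return_report"
    if state["retry_count"] >= MAX_RETRIES:
        return "flag_for_review"  # degrade gracefully instead of looping forever
    return "analysis"  # send the state back through the Analysis agent

# In a LangGraph StateGraph this function would be wired with:
#   graph.add_conditional_edges("critic", route_after_critic)
```

Keeping the routing in one pure function makes the retry cap testable in isolation, without invoking any LLM.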
Research Agent
Gathers ingredient information with a two-tier fallback strategy:
| Source | Latency | Coverage | Cost | Use When |
|---|---|---|---|---|
| Qdrant vector search | 50-100ms | 80% | Free tier | Primary (cached data) |
| Google grounded search | 500-1000ms | 95%+ | API cost | Fallback (novel ingredients) |
Design Decision: Try the fast path first and fall back on a miss. This reduces cost while maintaining coverage.
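The fast-path-first pattern can be sketched as below; the lookup callables and the treatment of a vector-store timeout as a cache miss are assumptions for illustration, not the project's actual interfaces.

```python
from typing import Callable, Optional

def lookup_ingredient(
    name: str,
    qdrant_lookup: Callable[[str], Optional[dict]],
    google_search: Callable[[str], dict],
) -> tuple[dict, str]:
    """Try the fast cached path first; fall back to grounded search on a miss."""
    try:
        hit = qdrant_lookup(name)  # ~50-100ms when the ingredient is cached
        if hit is not None:
            return hit, "qdrant"
    except TimeoutError:
        pass  # treat a slow vector store like a cache miss
    # ~500-1000ms, but covers novel ingredients the cache has never seen
    return google_search(name), "google"
```

Returning the source alongside the result also makes the cache hit rate trivial to measure in production.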
Analysis Agent
Generates personalized safety reports:
- Gemini 2.0 Flash for generation (fast, cost-effective)
- Structured output with Pydantic validation
- Personalization based on user profile (allergies, skin type, pregnancy status)
Critic Agent
5-Gate Quality Validation - the key differentiator:
| Gate | Check | Failure Action |
|---|---|---|
| 1. Completeness | All ingredients analyzed | Retry with emphasis |
| 2. Accuracy | Safety scores within valid range | Retry with stricter prompt |
| 3. Personalization | Allergen warnings present if applicable | Retry with profile emphasis |
| 4. Format | JSON schema compliance | Retry with format examples |
| 5. Safety Claims | No unsupported medical claims | Edit or reject |
Why 5 Gates? Each gate catches a different class of failure. The Critic catches ~15% of issues that would otherwise reach users.
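The gate table above can be sketched as a single pass that names every failed gate, so the retry prompt can emphasize exactly what broke. The report schema and the individual checks here are simplified assumptions (e.g. a keyword stand-in for the real medical-claim check).

```python
def run_gates(report: dict, ingredients: list[str], allergies: list[str]) -> list[str]:
    """Run all five gates; return the names of the gates that failed."""
    scores = report.get("assessments", {})  # assumed shape: {ingredient: safety_score}
    summary = report.get("summary", "")
    gates = {
        "completeness": all(i in scores for i in ingredients),
        "accuracy": all(1 <= s <= 10 for s in scores.values()),  # assumed 1-10 scale
        "personalization": not allergies or bool(report.get("allergen_warnings")),
        "format": isinstance(report.get("allergen_warnings", []), list),
        "safety_claims": "cures" not in summary.lower(),  # stand-in for a real claim check
    }
    return [name for name, passed in gates.items() if not passed]
```

Returning gate names rather than a boolean is what enables targeted retries ("retry with profile emphasis") instead of a blind re-run.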
Key Design Decisions
| Decision | Alternatives Considered | Rationale | Trade-off |
|---|---|---|---|
| LangGraph over LangChain chains | LangChain LCEL, raw async | Better state management, conditional routing, debugging | Learning curve |
| 5-gate Critic | Single validation pass | Quality must be explicit and measurable | Adds latency |
| Vector + Search fallback | Search-only, Vector-only | Balance speed with coverage | Complexity |
| Centralized LLM config | Per-agent config | Single point for model changes and tracing | Less flexibility |
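The centralized-config decision can be sketched as one frozen dataclass plus a small override registry; the model name, defaults, and the critic override are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    model: str = "gemini-2.0-flash"
    temperature: float = 0.2
    max_output_tokens: int = 2048

# One module-level registry: a model swap or tracing change touches one place.
_OVERRIDES = {
    "critic": LLMConfig(temperature=0.0),  # validation should be deterministic
}

def get_llm_config(agent: str) -> LLMConfig:
    """Every agent pulls its config from here, never from its own constants."""
    return _OVERRIDES.get(agent, LLMConfig())
```

The trade-off noted in the table is visible here: per-agent flexibility exists only through the explicit override map, which is the point.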
Failure Mode Analysis
| Failure | Detection | Recovery | User Impact |
|---|---|---|---|
| Qdrant timeout | 3s timeout | Fall back to Google search | +500ms latency |
| Gemini rate limit | 429 response | Exponential backoff, queue | Delayed response |
| Critic loop | Retry count >3 | Return with "review needed" flag | Degraded quality |
| Invalid input | Pydantic validation | 422 response with details | Clear error |
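The rate-limit recovery row can be sketched as a generic backoff wrapper; `RateLimitError` stands in for the provider's 429 error, and the attempt count and base delay are assumed values.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the provider's 429 response."""

def with_backoff(call, max_attempts: int = 4, base_s: float = 0.5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # surface the error after the final attempt
            # 0.5s, 1s, 2s, ... with jitter to avoid synchronized retries
            time.sleep(base_s * (2 ** attempt) + random.uniform(0, 0.1))
```

The jitter matters under load: without it, every queued request retries at the same instant and re-triggers the limit.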
Observability Integration
LangSmith provides production-grade observability:
| Capability | Value | Interview Talking Point |
|---|---|---|
| Trace visualization | Debug agent chains | "I can see exactly where failures occur" |
| Latency breakdown | Identify bottlenecks | "Research agent was 70% of latency" |
| Cost tracking | Budget management | "Each request costs ~$0.02" |
| Quality metrics | Critic pass rates | "5-gate validation catches 15% of issues" |
Interview Application
When asked "How would you design an LLM-powered system?":
1. Start with orchestration - How do agents coordinate?
2. Define agent boundaries - What is each agent responsible for?
3. Design for failure - Fallbacks at every layer
4. Add quality gates - Validation before user-facing output
5. Instrument everything - Observability from day one
The differentiator: Showing you understand that LLM reliability requires explicit validation, not just "prompt engineering".
Performance Characteristics
| Metric | Value | Optimization |
|---|---|---|
| Average latency | 47s | Caching reduces to 5s for repeated queries |
| Cache hit rate | 70% | Redis + local LRU |
| Critic pass rate | 85% first try | Prompt improvements over time |
| Cost per request | ~$0.02 | Mostly Gemini API |
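The Redis-plus-local-LRU layering behind the cache hit rate can be sketched as below. The remote client is injected with an assumed `get`/`set` interface (a real deployment would add TTLs via `SETEX`); sizes and types are illustrative.

```python
from collections import OrderedDict

class TwoTierCache:
    """Local LRU in front of a shared Redis-like store (get/set interface assumed)."""

    def __init__(self, remote, local_size: int = 256):
        self.remote = remote  # any object exposing get(key) / set(key, value)
        self.local: OrderedDict[str, str] = OrderedDict()
        self.local_size = local_size

    def get(self, key: str):
        if key in self.local:
            self.local.move_to_end(key)  # refresh LRU position
            return self.local[key]
        value = self.remote.get(key)     # shared across app instances
        if value is not None:
            self._store_local(key, value)
        return value

    def put(self, key: str, value: str) -> None:
        self.remote.set(key, value)
        self._store_local(key, value)

    def _store_local(self, key: str, value: str) -> None:
        self.local[key] = value
        self.local.move_to_end(key)
        if len(self.local) > self.local_size:
            self.local.popitem(last=False)  # evict least recently used
```

A local hit skips the network round trip entirely, which is what turns a repeated 47s pipeline run into a ~5s cached response.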
*This analysis is part of the AI Ingredient Scanner project. See the [Architecture documentation](/docs/ingredient-scanner/architecture) for implementation details.*