OCR & Translation

The AI Ingredient Scanner supports multi-language ingredient labels, automatically detecting and translating non-English text using Gemini Vision.


Overview

┌──────────────────────────────────────────────────────────────────┐
│                      OCR & TRANSLATION FLOW                      │
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐       │
│  │   Image     │ →  │   Gemini    │ →  │   Language      │       │
│  │   Capture   │    │   Vision    │    │   Detection     │       │
│  └─────────────┘    └─────────────┘    └────────┬────────┘       │
│                                                 │                │
│                                    ┌────────────┴────────────┐   │
│                                    │ English?                │   │
│                                    └────────────┬────────────┘   │
│                                                 │                │
│                               ┌─────────────────┤                │
│                               │ YES             │ NO             │
│                               ▼                 ▼                │
│                        ┌─────────────┐   ┌─────────────┐         │
│                        │ Return Text │   │  Translate  │         │
│                        └──────┬──────┘   └──────┬──────┘         │
│                               │                 │                │
│                               └────────┬────────┘                │
│                                        ▼                         │
│                                 ┌─────────────┐                  │
│                                 │  Analysis   │                  │
│                                 │  Pipeline   │                  │
│                                 └─────────────┘                  │
└──────────────────────────────────────────────────────────────────┘

Supported Languages

Language      Detection Headers
English       Ingredients:, INGREDIENTS:
French        Ingrédients:, COMPOSITION:
Spanish       Ingredientes:
German        Inhaltsstoffe:, Zutaten:
Italian       Ingredienti:
Korean        성분:, 전성분:
Japanese      成分:, 全成分:
Chinese       成分:, 配料:
Portuguese    Ingredientes:
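
Some headers in the table are shared between languages ("Ingredientes:" for Spanish and Portuguese, "成分:" for Japanese and Chinese), so a header alone cannot always identify the language. The sketch below illustrates this with a plain lookup. It is not the scanner's actual code: `HEADER_LANGUAGES` and `candidate_languages` are hypothetical names, and the two-letter codes other than `en` and `ko` are assumed ISO 639-1 values.

```python
# Hypothetical lookup built from the header table; not the scanner's code.
# Keys are case-folded headers, values are assumed ISO 639-1 codes.
HEADER_LANGUAGES = {
    "ingredients:": ["en"],
    "ingrédients:": ["fr"],
    "composition:": ["fr"],
    "ingredientes:": ["es", "pt"],
    "inhaltsstoffe:": ["de"],
    "zutaten:": ["de"],
    "ingredienti:": ["it"],
    "성분:": ["ko"],
    "전성분:": ["ko"],
    "成分:": ["ja", "zh"],
    "全成分:": ["ja"],
    "配料:": ["zh"],
}


def candidate_languages(label_text: str) -> list[str]:
    """Return the languages whose header starts the given label text."""
    text = label_text.strip().casefold()
    for header, codes in HEADER_LANGUAGES.items():
        if text.startswith(header):
            return list(codes)
    return []  # no known header at the start of the text


print(candidate_languages("Ingredientes: Agua, Glicerina"))
```

When more than one code comes back, the ingredient text itself has to settle the question, which is the part the scanner delegates to Gemini Vision.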

How It Works

Step 1: Image Capture

The mobile app captures the ingredient label image and encodes it as base64:

// Mobile App
const takePicture = async () => {
  const photo = await cameraRef.current.takePictureAsync({
    base64: true,
    quality: 0.8,
  });

  // Send to backend
  const response = await api.post('/ocr', {
    image: photo.base64,
  });

  setIngredients(response.data.text);
};

Step 2: OCR with Language Detection

The backend uses Gemini Vision to extract ingredients and detect the language:

# Backend OCR Prompt
OCR_PROMPT = """
Find and extract ONLY the ingredient list from this product label image.

INSTRUCTIONS:
1. Look for ingredient list headers in ANY language:
   - English: "Ingredients:", "INGREDIENTS:"
   - French: "Ingrédients:", "COMPOSITION:"
   - Korean: "성분:", "전성분:"
   - (and others...)

2. Extract the complete list of ingredients

OUTPUT FORMAT:
First line: LANGUAGE_DETECTED: <language code>
Second line onwards: The extracted ingredients
"""

Example Response (Korean Label)
LANGUAGE_DETECTED: ko
정제수, 글리세린, 나이아신아마이드, 부틸렌글라이콜
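
A response in this format can be split into a language code and the ingredient text with a few lines of string handling. The following is a minimal sketch; `parse_ocr_response` is a hypothetical helper name, and falling back to `"en"` when the header line is missing is an assumption that mirrors the default used in the OCR endpoint.

```python
def parse_ocr_response(raw: str) -> tuple[str, str]:
    """Split a model reply into (language_code, ingredients_text).

    Falls back to ("en", raw) when the LANGUAGE_DETECTED header is missing.
    """
    prefix = "LANGUAGE_DETECTED:"
    text = raw.strip()
    # Everything before the first newline is the candidate header line.
    first_line, _, rest = text.partition("\n")
    if first_line.startswith(prefix):
        language = first_line[len(prefix):].strip().lower()
        return language, rest.strip()
    return "en", text


language, ingredients = parse_ocr_response(
    "LANGUAGE_DETECTED: ko\n정제수, 글리세린, 나이아신아마이드, 부틸렌글라이콜"
)
print(language)     # ko
print(ingredients)
```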

Step 3: Translation (if needed)

Non-English ingredients are translated while preserving scientific names:

def _translate_ingredients_to_english(client, ingredients_text):
    """Translate non-English ingredient text to English."""

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=f"""
You are an expert translator specializing in cosmetic and food ingredients.

TASK: Translate the following ingredient list to English.

INSTRUCTIONS:
1. Translate each ingredient name to its standard English equivalent
2. Keep scientific/INCI names unchanged:
   - "Aqua" stays "Aqua"
   - "Sodium Lauryl Sulfate" stays the same
3. Translate common ingredient names:
   - "Eau" → "Water"
   - "정제수" → "Purified Water"
4. Preserve the comma-separated format
5. Return ONLY the translated ingredient list

INGREDIENT LIST TO TRANSLATE:
{ingredients_text}

TRANSLATED INGREDIENTS:"""
    )

    return response.text.strip()

Translation Example
Input: 정제수, 글리세린, 나이아신아마이드
Output: Purified Water, Glycerin, Niacinamide
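
Because the prompt asks the model to preserve the comma-separated format, one cheap sanity check is to compare item counts before and after translation. This is a sketch of such a check, not something the source describes the scanner as doing; `split_ingredients` and `same_item_count` are hypothetical helpers.

```python
def split_ingredients(text: str) -> list[str]:
    """Split a comma-separated ingredient list and drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def same_item_count(original: str, translated: str) -> bool:
    """True when the translation has one item per original item."""
    return len(split_ingredients(original)) == len(split_ingredients(translated))


ok = same_item_count(
    "정제수, 글리세린, 나이아신아마이드",
    "Purified Water, Glycerin, Niacinamide",
)
print(ok)  # True
```

The check is only a heuristic: names that contain a comma themselves, such as "1,2-Hexanediol", would be counted as two items.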

Step 4: Analysis

The translated English ingredients proceed through the normal analysis pipeline.


Backend Implementation

OCR Endpoint

import base64

from google import genai

# app, settings, OCRRequest, OCRResponse and OCR_PROMPT are defined elsewhere
# in the backend.

@app.post("/ocr")
async def extract_text_from_image(request: OCRRequest):
    client = genai.Client(api_key=settings.google_api_key)

    # Decode the base64 payload sent by the mobile app
    image_data = base64.b64decode(request.image)
    image_part = genai.types.Part.from_bytes(
        data=image_data,
        mime_type="image/jpeg"
    )

    # Extract ingredients with language detection
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[image_part, OCR_PROMPT]
    )

    # Parse the language header; response.text can be None when the model
    # returns no text, so fall back to an empty string
    raw_text = (response.text or "").strip()
    lines = raw_text.split('\n', 1)
    detected_language = "en"
    ingredients_text = raw_text

    if lines[0].startswith("LANGUAGE_DETECTED:"):
        detected_language = lines[0].replace("LANGUAGE_DETECTED:", "").strip().lower()
        ingredients_text = lines[1].strip() if len(lines) > 1 else ""

    # Translate only when there is non-English text to translate
    if ingredients_text and detected_language not in ("en", "none"):
        ingredients_text = _translate_ingredients_to_english(client, ingredients_text)

    return OCRResponse(success=True, text=ingredients_text)
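
The endpoint passes the request's image string straight to `base64.b64decode` and always labels the result `image/jpeg`. A stricter decode step could reject malformed input and infer the MIME type from the file signature. The sketch below uses only the standard library; `decode_image` is a hypothetical helper, and accepting a `data:` URL prefix and limiting support to JPEG and PNG are assumptions.

```python
import base64
import binascii

# Leading "magic" bytes for two common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def decode_image(image_b64: str) -> tuple[bytes, str]:
    """Decode a base64 image string and guess its MIME type.

    Raises ValueError for malformed base64 or an unrecognized format.
    """
    # Drop an optional data-URL prefix such as "data:image/jpeg;base64,".
    if image_b64.startswith("data:"):
        _, _, image_b64 = image_b64.partition(",")
    try:
        data = base64.b64decode(image_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is not valid base64") from exc
    for signature, mime_type in _SIGNATURES.items():
        if data.startswith(signature):
            return data, mime_type
    raise ValueError("unsupported image format")
```

In a FastAPI handler, the `ValueError` could be mapped to a 400 response so that a bad upload never reaches the model call.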

Mobile Integration

OCR Service

// services/ocr.ts
import { File } from 'expo-file-system';
import api from './api';

export async function extractIngredients(imageUri: string): Promise<string> {
  // Read image as base64
  const imageFile = new File(imageUri);
  const base64Image = await imageFile.base64();

  // Send to backend
  const response = await api.post('/ocr', {
    image: base64Image,
  });

  if (response.data?.text) {
    return cleanIngredientText(response.data.text);
  }

  return '';
}

function cleanIngredientText(text: string): string {
  return text
    .replace(/\s+/g, ' ')                    // Normalize whitespace
    .trim()
    .replace(/^ingredients?\s*:?\s*/i, '')   // Remove a leading header only
    .trim();
}

Usage in HomeScreen

const handleCapture = async (imageUri: string) => {
  setLoading(true);

  try {
    // extractIngredients expects a file URI and reads the base64 data itself
    const extractedText = await extractIngredients(imageUri);
    setIngredients(extractedText);
    setShowCamera(false);
  } catch (error) {
    Alert.alert('OCR Failed', 'Could not extract ingredients from image');
  } finally {
    setLoading(false);
  }
};

Best Practices

Image Quality

Good
  • Good lighting
  • Steady camera
  • Clear focus on ingredient list
  • Minimal background noise
Avoid
  • Blurry images
  • Partial ingredient lists
  • Glare or reflections
  • Poor lighting

Error Handling

try {
  const ingredients = await extractIngredients(imageUri);

  if (!ingredients) {
    // Allow manual input
    Alert.alert(
      'No Ingredients Found',
      'Please enter ingredients manually or try a clearer photo'
    );
  }
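} catch (error) {
  // Requires `import axios from 'axios'`; narrows the unknown error type
  if (axios.isAxiosError(error) && error.response?.status === 404) {
    // OCR endpoint not available
    console.log('Using manual input mode');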
  } else {
    throw error;
  }
}

Troubleshooting

OCR Not Extracting Text
  • Ensure good lighting conditions
  • Hold camera steady
  • Frame just the ingredient list
  • Try selecting from gallery for clearer images
Translation Errors
  • Scientific/INCI names should remain unchanged
  • Check if the language is supported
  • Very rare ingredients may not translate correctly
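Network Issues
  • OCR plus translation can take a while, so give the API client a generous timeout:

import axios from 'axios';

const api = axios.create({
  baseURL: API_BASE_URL,  // defined elsewhere in the app
  timeout: 120000,  // 2 minutes for OCR + translation
});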
