AI Categorization

The Email Assistant uses Google Gemini AI to intelligently categorize incoming emails, helping you focus on what matters most.


How It Works

┌──────────────────────────────────────────────────────────────────┐
│                     CATEGORIZATION PIPELINE                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌───────────┐    │
│   │  Email  │───▶│ Extract  │───▶│  Build  │───▶│  Gemini   │    │
│   │         │    │ Metadata │    │ Prompt  │    │    AI     │    │
│   └─────────┘    └──────────┘    └─────────┘    └─────┬─────┘    │
│                                                       │          │
│                                                       ▼          │
│   ┌─────────┐    ┌──────────┐                   ┌───────────┐    │
│   │  Store  │◀───│  Parse   │◀──────────────────│ Response  │    │
│   │ Result  │    │ Category │                   │           │    │
│   └─────────┘    └──────────┘                   └───────────┘    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Email Categories

| Category    | Description            | Examples                          |
|-------------|------------------------|-----------------------------------|
| Need-Action | Requires your response | Meeting invites, direct questions |
| FYI         | Informational only     | Status updates, notifications     |
| Newsletter  | Subscriptions          | Daily digests, weekly updates     |
| Promotional | Marketing content      | Sales, offers, advertisements     |
| Social      | Social networks        | LinkedIn, Twitter notifications   |

Categorization Logic

Input Processing

The system extracts key information from each email:

email_data = {
    'subject': email['subject'],
    'from': email['from'],
    'snippet': email['snippet'],  # First 100 chars
    'date': email['date'],
}
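
As a minimal sketch of this extraction step, the helper below builds the same dictionary while tolerating missing fields and enforcing the 100-character snippet limit. The name `extract_metadata`, the `SNIPPET_LIMIT` constant, and the empty-string defaults are assumptions made for illustration; they are not taken from the Email Assistant's code.

```python
SNIPPET_LIMIT = 100  # mirrors the "first 100 chars" note above


def extract_metadata(email: dict) -> dict:
    """Return only the fields the categorizer needs (hypothetical helper)."""
    return {
        'subject': email.get('subject') or '',
        'from': email.get('from') or '',
        # Truncate so the prompt stays short even if the source snippet is long.
        'snippet': (email.get('snippet') or '')[:SNIPPET_LIMIT],
        'date': email.get('date') or '',
    }


raw = {'subject': 'Q3 report', 'from': 'boss@example.com', 'snippet': 'x' * 250}
email_data = extract_metadata(raw)
print(len(email_data['snippet']))  # 100
print(email_data['date'] == '')    # True
```

Defaulting to empty strings keeps the pipeline running when a field is absent, at the cost of giving the model less to work with.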

AI Prompt

CATEGORIZATION_PROMPT = """
Analyze this email and categorize it:

From: {from_address}
Subject: {subject}
Preview: {snippet}

Categories:
1. NEED_ACTION - Requires response or action
2. FYI - Informational, no action needed
3. NEWSLETTER - Subscription content
4. PROMOTIONAL - Marketing/sales
5. SOCIAL - Social network notifications

Return ONLY the category name.
"""

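The placeholders in the template can be filled with `str.format`. The sketch below assumes a helper named `build_prompt` (hypothetical) and uses an abbreviated stand-in for the template so that it runs on its own; the real prompt is the one shown above.

```python
# Abbreviated stand-in for CATEGORIZATION_PROMPT, same placeholders.
PROMPT_TEMPLATE = """
Analyze this email and categorize it:

From: {from_address}
Subject: {subject}
Preview: {snippet}

Return ONLY the category name.
"""


def build_prompt(email_data: dict) -> str:
    """Fill the template from the extracted metadata (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        from_address=email_data['from'],
        subject=email_data['subject'],
        snippet=email_data['snippet'],
    )


prompt = build_prompt({
    'from': 'alice@example.com',
    'subject': 'Lunch on Friday?',
    'snippet': 'Are you free around noon?',
})
print(prompt)
```

Note that the metadata key is `from` while the placeholder is `from_address`, so the mapping has to be explicit. Braces inside the email text are safe, because `str.format` only interprets the template, not the substituted values.
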
Response Parsing

def parse_category(response: str) -> str:
    """Parse Gemini response to category."""
    response = response.strip().upper()

    categories = {
        'NEED_ACTION': 'Need-Action',
        'FYI': 'FYI',
        'NEWSLETTER': 'Newsletter',
        'PROMOTIONAL': 'Promotional',
        'SOCIAL': 'Social',
    }

    for key, value in categories.items():
        if key in response:
            return value

    return 'FYI'  # Default fallback
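
Because the parser matches on the literal key, a reply such as `need-action` or `Need Action` falls through to the `FYI` default. One possible refinement, sketched here under the assumption that such variants should be accepted, is to normalize separators before matching. The name `parse_category_normalized` is hypothetical.

```python
import re

CATEGORIES = {
    'NEED_ACTION': 'Need-Action',
    'FYI': 'FYI',
    'NEWSLETTER': 'Newsletter',
    'PROMOTIONAL': 'Promotional',
    'SOCIAL': 'Social',
}


def parse_category_normalized(response: str) -> str:
    """Map a free-form model reply to a display category (hypothetical variant)."""
    # Collapse each run of non-alphanumeric characters into one underscore,
    # so "need-action", "Need Action" and "NEED_ACTION." all become NEED_ACTION.
    normalized = re.sub(r'[^A-Z0-9]+', '_', response.upper()).strip('_')
    for key, value in CATEGORIES.items():
        if key in normalized:
            return value
    return 'FYI'  # same default fallback as parse_category


print(parse_category_normalized('need-action'))            # Need-Action
print(parse_category_normalized('Category: newsletter.'))  # Newsletter
print(parse_category_normalized('no idea'))                # FYI
```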

Gemini Configuration

Model Settings

{
  "api_settings": {
    "gemini_model": "gemini-2.5-flash-lite",
    "requests_per_minute": 30,
    "max_retries": 3,
    "timeout_seconds": 30
  }
}
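
How the application reads this block is not shown here. As a hedged sketch, the settings could be parsed with the standard `json` module and merged over defaults that mirror the values above. The function name `load_api_settings` and the validation rule are assumptions.

```python
import json

# Defaults copied from the configuration shown above.
DEFAULT_API_SETTINGS = {
    'gemini_model': 'gemini-2.5-flash-lite',
    'requests_per_minute': 30,
    'max_retries': 3,
    'timeout_seconds': 30,
}


def load_api_settings(config_text: str) -> dict:
    """Merge the "api_settings" block of a JSON config over the defaults."""
    config = json.loads(config_text)
    settings = {**DEFAULT_API_SETTINGS, **config.get('api_settings', {})}
    if settings['requests_per_minute'] <= 0:
        raise ValueError('requests_per_minute must be positive')
    return settings


settings = load_api_settings('{"api_settings": {"requests_per_minute": 10}}')
print(settings['requests_per_minute'])  # 10
print(settings['max_retries'])          # 3
```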

Rate Limiting

The system respects API rate limits with a sliding window approach:

import time


class RateLimiter:
    def __init__(self, requests_per_minute: int = 30):
        self.requests_per_minute = requests_per_minute
        self.request_times = []

    def wait_if_needed(self):
        """Wait if rate limit would be exceeded."""
        now = time.time()
        minute_ago = now - 60

        # Drop requests that have left the 60-second window
        self.request_times = [t for t in self.request_times if t > minute_ago]

        if len(self.request_times) >= self.requests_per_minute:
            # Sleep until the oldest request in the window expires
            sleep_time = self.request_times[0] - minute_ago
            time.sleep(sleep_time)

        # Record the actual send time, which is later than `now` if we slept
        self.request_times.append(time.time())
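
The wait calculation can be checked without sleeping by writing it as a pure function of the recorded timestamps. This is a sketch of the same sliding-window arithmetic, not part of the application; the name `seconds_until_allowed` and its parameters are assumptions.

```python
def seconds_until_allowed(request_times, now, limit=30, window=60.0):
    """Return how long to wait before one more request fits in the window."""
    # Keep only requests that are still inside the window.
    recent = sorted(t for t in request_times if t > now - window)
    if len(recent) < limit:
        return 0.0
    # The request `limit` places from the end must expire before we may send.
    oldest_blocking = recent[-limit]
    return oldest_blocking + window - now


# Three requests at t=0, 10, 20 with a limit of 3 per 60 seconds:
print(seconds_until_allowed([0, 10, 20], now=30, limit=3))  # 30.0
print(seconds_until_allowed([0, 10], now=30, limit=3))      # 0.0
```

In the first call the window is full, so the caller must wait until the request made at t=0 is 60 seconds old, which is 30 seconds from now.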

Caching

Categorization results are cached to minimize API calls:

┌──────────────────────────────────────────────────────────────────┐
│                           CACHING FLOW                           │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│        ┌───────────┐                                             │
│        │ New Email │                                             │
│        └─────┬─────┘                                             │
│              │                                                   │
│              ▼                                                   │
│        ┌───────────┐                                             │
│        │ In Cache? │                                             │
│        └─────┬─────┘                                             │
│              │                                                   │
│        Yes   │  No                                               │
│      ┌───────┴───────┐                                           │
│      │               │                                           │
│      ▼               ▼                                           │
│ ┌─────────┐    ┌───────────┐                                     │
│ │ Return  │    │   Call    │                                     │
│ │ Cached  │    │  Gemini   │                                     │
│ └─────────┘    └─────┬─────┘                                     │
│                      │                                           │
│                      ▼                                           │
│                ┌───────────┐                                     │
│                │   Cache   │                                     │
│                │  Result   │                                     │
│                └─────┬─────┘                                     │
│                      │                                           │
│                      ▼                                           │
│                ┌───────────┐                                     │
│                │  Return   │                                     │
│                │ Category  │                                     │
│                └───────────┘                                     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Cache Configuration

{
  "cache_settings": {
    "enabled": true,
    "max_cached_emails": 30,
    "cache_expiry_hours": 24
  }
}
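
The caching flow and these settings could be combined roughly as follows. This is a sketch under stated assumptions: entries are keyed by an email ID, the oldest entry is evicted once `max_cached_emails` is exceeded, and the names `CategoryCache` and `categorize_cached` are hypothetical. The actual cache key and eviction policy are not documented here.

```python
import time
from collections import OrderedDict


class CategoryCache:
    """Small in-memory cache with a size cap and per-entry expiry (sketch)."""

    def __init__(self, max_entries=30, expiry_hours=24, clock=time.time):
        self.max_entries = max_entries
        self.expiry_seconds = expiry_hours * 3600
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._entries = OrderedDict()  # email_id -> (category, stored_at)

    def get(self, email_id):
        entry = self._entries.get(email_id)
        if entry is None:
            return None
        category, stored_at = entry
        if self.clock() - stored_at > self.expiry_seconds:
            del self._entries[email_id]  # expired: treat as a miss
            return None
        return category

    def put(self, email_id, category):
        self._entries[email_id] = (category, self.clock())
        self._entries.move_to_end(email_id)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry


def categorize_cached(email_id, cache, categorize):
    """Follow the caching flow: return a hit, otherwise call and store."""
    cached = cache.get(email_id)
    if cached is not None:
        return cached
    category = categorize(email_id)
    cache.put(email_id, category)
    return category


calls = []


def fake_categorize(email_id):
    """Stand-in for the Gemini call that records each invocation."""
    calls.append(email_id)
    return 'Newsletter'


cache = CategoryCache()
categorize_cached('msg-1', cache, fake_categorize)
categorize_cached('msg-1', cache, fake_categorize)
print(len(calls))  # 1
```

The second lookup is served from the cache, so the stand-in for the API is called only once.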

Cache Benefits

| Metric    | First Run | Cached Run |
|-----------|-----------|------------|
| API Calls | 10-15     | 0-3        |
| Time      | 13-20 sec | 5-8 sec    |
| Cost      | ~$0.10    | ~$0.00     |

Error Handling

Retry Logic

import logging
import time

logger = logging.getLogger(__name__)

# RateLimitError, APIError, and call_gemini_api are assumed to be
# defined elsewhere in the project.


def categorize_with_retry(email: dict, max_retries: int = 3) -> str:
    """Categorize email with retry on failure."""
    for attempt in range(max_retries):
        try:
            return call_gemini_api(email)
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            time.sleep(wait_time)
        except APIError as e:
            logger.error(f"API error: {e}")

    return 'FYI'  # Default once every attempt has failed

Fallback Behavior

If the Gemini API fails, the system will:

  1. Log the error for debugging
  2. Return the default category (FYI)
  3. Continue processing other emails
  4. Report the error in the metrics dashboard
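
The fallback steps above can be sketched as a batch loop in which one failing email never stops the rest. The function name `categorize_batch`, the `id` key on each email, and the returned error count (which a caller could forward to the dashboard) are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)
DEFAULT_CATEGORY = 'FYI'


def categorize_batch(emails, categorize):
    """Categorize every email, falling back to the default on any failure."""
    results = {}
    error_count = 0
    for email in emails:
        try:
            results[email['id']] = categorize(email)
        except Exception as exc:  # broad on purpose: keep the batch going
            logger.error('Categorization failed for %s: %s', email['id'], exc)
            results[email['id']] = DEFAULT_CATEGORY
            error_count += 1
    return results, error_count


def flaky(email):
    """Stand-in categorizer that fails for one specific message."""
    if email['id'] == 'msg-2':
        raise RuntimeError('simulated API failure')
    return 'Newsletter'


results, errors = categorize_batch(
    [{'id': 'msg-1'}, {'id': 'msg-2'}, {'id': 'msg-3'}], flaky
)
print(results)  # {'msg-1': 'Newsletter', 'msg-2': 'FYI', 'msg-3': 'Newsletter'}
print(errors)   # 1
```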

Accuracy Improvements

Tips for Better Categorization

  • Complete Metadata: Ensure the subject and snippet are available for best results.
  • Consistent Senders: Known senders improve categorization accuracy.
  • Clean Inbox: Reduce spam before processing for better results.
  • Tune Prompts: Adjust prompts for your specific use case.

Common Misclassifications

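| Situation      | Expected    | Common Mistake | Fix                      |
|----------------|-------------|----------------|--------------------------|
| Meeting invite | Need-Action | FYI            | Check for invite keyword |
| Bill reminder  | Need-Action | Newsletter     | Check sender domain      |
| Product update | Newsletter  | Promotional    | Check subject patterns   |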
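
One way to apply the fixes in this table is a small rule pass that runs after the model's answer and corrects only the listed mistakes. Everything specific in this sketch is an assumption: the function name `apply_overrides`, the example billing domain, and the subject patterns would all need to be adapted to your own mail.

```python
import re

# Hypothetical values; replace with domains and patterns from your inbox.
BILLING_DOMAINS = frozenset({'billing.example.com'})
UPDATE_PATTERN = re.compile(r'\b(product update|release notes|changelog)\b')


def apply_overrides(email: dict, category: str) -> str:
    """Correct the common misclassifications listed above (sketch)."""
    subject = (email.get('subject') or '').lower()
    sender = (email.get('from') or '').lower()
    # Handles both "a@b.com" and "Name <a@b.com>".
    sender_domain = sender.rsplit('@', 1)[-1].strip('> ')

    if category == 'FYI' and 'invite' in subject:
        return 'Need-Action'  # meeting invite
    if category == 'Newsletter' and sender_domain in BILLING_DOMAINS:
        return 'Need-Action'  # bill reminder
    if category == 'Promotional' and UPDATE_PATTERN.search(subject):
        return 'Newsletter'   # product update
    return category


invite = {'subject': 'Invite: Quarterly planning', 'from': 'pm@example.com'}
print(apply_overrides(invite, 'FYI'))  # Need-Action
```

Each rule fires only when the model produced the specific mistaken category, so correct answers pass through unchanged.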

Metrics

Track categorization performance in the dashboard:

# Tracked metrics
metrics.record_api_call(
    model='gemini-2.5-flash-lite',
    latency=elapsed_time,
    success=True,
    category=result,
)

Available metrics in the dashboard:

  • API calls made
  • Response times
  • Category distribution
  • Error rates
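
To show how these four figures could be derived from the recorded calls, here is a minimal in-memory sketch. The class name `InMemoryMetrics` and the `summary` method are assumptions; only the `record_api_call` arguments come from the example above, and the real dashboard may aggregate differently.

```python
from collections import Counter


class InMemoryMetrics:
    """Aggregates recorded API calls into dashboard-style figures (sketch)."""

    def __init__(self):
        self.calls = 0
        self.failures = 0
        self.total_latency = 0.0
        self.categories = Counter()

    def record_api_call(self, model, latency, success, category=None):
        # `model` is accepted to match the call shown above but not aggregated.
        self.calls += 1
        self.total_latency += latency
        if not success:
            self.failures += 1
        elif category is not None:
            self.categories[category] += 1

    def summary(self):
        divisor = self.calls or 1  # avoid dividing by zero before any call
        return {
            'api_calls': self.calls,
            'avg_latency': self.total_latency / divisor,
            'error_rate': self.failures / divisor,
            'category_distribution': dict(self.categories),
        }


metrics = InMemoryMetrics()
metrics.record_api_call('gemini-2.5-flash-lite', 0.5, True, 'FYI')
metrics.record_api_call('gemini-2.5-flash-lite', 0.25, True, 'Social')
metrics.record_api_call('gemini-2.5-flash-lite', 0.75, False)
print(metrics.summary()['api_calls'])    # 3
print(metrics.summary()['avg_latency'])  # 0.5
```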
