Understanding AI Memory
How AI “Remembers”
When you have a conversation with an AI, it doesn't actually remember in the way humans do. Instead, for each response, you send it:
- The context: everything you want it to know
- Conversation history: the conversation so far
- The current question or prompt

All of this context:
- Costs money (charged per word/token)
- Takes time to process (slower responses)
- Can dilute the AI's focus (harder to find relevant info)
Context Window Limits
Every AI model has a maximum amount of text it can handle at once, called a "context window." What this means in practice:
- A typical conversation uses 100-500 words per exchange
- A 10-turn conversation might be 3,000-5,000 words total
- Add documents for reference, and you can quickly approach limits
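As a quick sanity check, you can estimate whether your combined context still fits by counting words. A minimal sketch (the 8,000-word budget is an assumption; real models count tokens, roughly 1.3 tokens per English word, so treat this as an estimate only):

```python
def fits_context_window(texts, window_words=8000):
    """Rough check: does the combined context fit within the window?

    `window_words` is an assumed budget; real models measure tokens,
    not words, so this is only an approximation.
    """
    total = sum(len(t.split()) for t in texts)
    return total <= window_words, total

ok, total = fits_context_window(["history so far...", "retrieved doc..."])
print(ok, total)
```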
Why Context Management Matters
The Cost Problem
Every message costs money. If your conversation is 1,000 words long, you pay to process all 1,000 words. After 10 exchanges, you're paying to process the same early messages over and over.

Cost growth example:

| Turn | Context Size | Approximate Cost |
|---|---|---|
| Turn 1 | 200 words | $0.01 |
| Turn 5 | 1,000 words | $0.05 |
| Turn 10 | 2,000 words | $0.10 |
| Turn 20 | 4,000 words | $0.20 |
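The table's growth can be sketched in a few lines. The per-word price here is a hypothetical flat rate ($0.00005/word, chosen so 200 words ≈ $0.01, matching the table); real pricing varies by model:

```python
# Hypothetical pricing, chosen to match the table above.
PRICE_PER_WORD = 0.00005
WORDS_PER_TURN = 200  # each exchange adds ~200 words of context

def turn_cost(turn: int) -> float:
    """Cost to process the full context at a given turn."""
    return turn * WORDS_PER_TURN * PRICE_PER_WORD

def cumulative_cost(turns: int) -> float:
    """Total spend: every turn re-sends (and re-pays for) all history."""
    return sum(turn_cost(t) for t in range(1, turns + 1))

print(f"Turn 10 costs ${turn_cost(10):.2f}")            # matches the $0.10 row
print(f"20 turns cost ${cumulative_cost(20):.2f} total")
```

Note the gap: turn 20 alone costs $0.20, but by then the cumulative spend is $2.10, because every early message was paid for again on every later turn.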
The Quality Problem
Too much context hurts quality:
- AI models can get "lost" in long conversations
- They may miss important information buried in the middle
- Responses become slower and less focused
- Old, irrelevant information can confuse the AI
Think of it like giving someone a 50-page document and asking them to find one specific fact: even if it's there, they might miss it or take forever to find it.
The Performance Problem
Longer context = slower responses:
- More text to process means more computation
- Users waiting 5-10 seconds for responses may give up
- Real-time applications become unusable
Common Context Management Strategies
Strategy 1: Keep Everything
How it works: Store and send the entire conversation history every time.

Good for:
- Very short conversations (3-5 exchanges)
- When you absolutely need all context
- Low-volume applications where cost isn’t critical
Strategy 2: Sliding Window
How it works: Only keep the last N messages (like the last 10 exchanges).

Good for:
- Customer support (usually resolved in a few messages)
- Task-focused conversations
- When older context isn’t needed
Common window sizes:
- 5 messages: very short memory, minimal cost
- 10 messages: balanced for most conversations
- 20 messages: longer memory for complex topics
| Benefits | Trade-offs |
|---|---|
| Predictable, controlled costs | Completely forgets old information |
| Simple to understand and implement | Can be confusing if users reference earlier topics |
| Consistent performance | Fixed window may not fit all conversation types |
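A sliding window is simple enough to sketch directly; Python's `deque` with `maxlen` does the dropping for you. The class and message format below are illustrative, not any particular library's API:

```python
from collections import deque

class SlidingWindowMemory:
    """Minimal sliding-window history: keeps only the last N messages."""

    def __init__(self, max_messages: int = 10):
        # A deque with maxlen silently drops the oldest entry when full
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list:
        """The history to send with the next request."""
        return list(self.messages)

memory = SlidingWindowMemory(max_messages=4)
for i in range(6):
    memory.add("user", f"message {i}")
print(len(memory.context()))           # 4
print(memory.context()[0]["content"])  # "message 2" (0 and 1 were dropped)
```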
Strategy 3: Summarization
How it works: Keep recent messages as-is, but summarize older parts of the conversation.

Good for:
- Long conversations where early context matters
- Technical support that builds on previous issues
- Educational or tutoring applications
| Benefits | Trade-offs |
|---|---|
| Retains important information from early conversation | Summarization itself costs money and time |
| More context-aware than sliding window | May lose nuance or specific details |
| Costs are controlled but flexible | Slightly more complex to implement |
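The bookkeeping side of summarization is small; the summarizer itself is an LLM call. In this sketch, `summarize` is a placeholder callable (any function that turns a list of old messages into one short string), so the compaction logic can be shown without a real model:

```python
def compact_history(messages, summarize, keep_recent: int = 6):
    """Summarize all but the last `keep_recent` messages.

    `summarize` stands in for an LLM call that condenses a list of
    messages into one short string; any callable of that shape works.
    """
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[Summary of earlier conversation: {summarize(old)}]"
    return [summary] + recent

# Toy stand-in summarizer, just for demonstration
history = [f"msg {i}" for i in range(10)]
compacted = compact_history(history, summarize=lambda msgs: f"{len(msgs)} earlier messages")
print(len(compacted))   # 7: one summary line + 6 recent messages
```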
Strategy 4: Semantic Filtering
How it works: Analyze which past messages are relevant to the current question and only include those.

Good for:
- Conversations that jump between topics
- Long, multi-topic discussions
- Applications where context relevance is critical
Example: if the user's current question is about shipping, the system:
- Finds messages about shipping
- Ignores messages about product features, returns, etc.
- Includes only relevant messages + recent context
| Benefits | Trade-offs |
|---|---|
| Very efficient use of context | Most complex to implement |
| Highly relevant responses | Requires additional processing to determine relevance |
| Adapts to conversation flow | May miss context that seems irrelevant but isn’t |
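To make the idea concrete, here is a deliberately naive relevance filter using word overlap. Production systems score relevance with embeddings; this toy version only illustrates the select-by-relevance shape:

```python
def relevant_messages(history, question, top_k: int = 3):
    """Toy semantic filter: rank past messages by word overlap.

    Real systems use embedding similarity; plain word overlap is only
    a sketch of the idea.
    """
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(msg.lower().split())), msg) for msg in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [msg for score, msg in scored[:top_k] if score > 0]

history = [
    "my order number is 1234",
    "when will my shipping arrive",
    "the blue model has more features",
]
print(relevant_messages(history, "any update on my shipping", top_k=2))
# ['when will my shipping arrive', 'my order number is 1234']
```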
Memory Systems for AI
AI memory is usually split into short-term memory (within a session) and long-term memory (across sessions); most production systems combine both.

Short-Term Memory
What it is: What the AI remembers during your current conversation.

Typical approach:
- Store the conversation in memory while the user is active
- Clear it when the user closes the chat or session ends
- Usually keeps last 10-20 exchanges

Best for:
- Most chatbots and assistants
- Support conversations
- Any single-session interaction
Managing Retrieved Documents (RAG Systems)
When your AI searches through documents to answer questions, you face additional context challenges.

The Document Context Problem
Example scenario: User asks "What's your return policy?"

What the System Finds
- 20 potentially relevant document sections
- Each section is 200-500 words
- Total: 4,000-10,000 words of retrieved content
- Plus conversation history: 1,000-2,000 words
The Challenge
You can't send all retrieved documents to the AI: it's too much context. You need to be selective about which documents to include.
Strategies for Document Context
Limit Number of Documents
Only use top 3-5 most relevant sections. Most answers don’t need more than this.
Rank and Filter
Score each retrieved section for relevance. Only include those above a threshold for better quality and less noise.
Token Budget Approach
Set a limit (e.g., 3,000 words for documents). Add highest-ranked documents until you hit the limit to ensure you don’t exceed capacity.
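The budget approach is a simple greedy loop. A sketch, assuming the documents arrive already ranked best-first and using words as a stand-in for tokens:

```python
def fit_to_budget(ranked_docs, budget_words: int = 3000):
    """Pack highest-ranked docs first until the word budget is used up.

    `ranked_docs` is assumed to be sorted best-first already.
    """
    selected, used = [], 0
    for doc in ranked_docs:
        words = len(doc.split())
        if used + words > budget_words:
            continue  # too big for the remaining budget; a smaller doc may still fit
        selected.append(doc)
        used += words
    return selected

docs = ["a " * 1500, "b " * 1200, "c " * 800, "d " * 200]
print([len(d.split()) for d in fit_to_budget(docs)])   # [1500, 1200, 200]
```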
Chunk Strategically
Break long documents into smaller, focused sections. Each section answers a specific question, making it easier to select just what’s needed.
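A basic word-count chunker looks like this. The chunk size and overlap are illustrative defaults; real pipelines often chunk on sentence or section boundaries instead:

```python
def chunk_text(text, max_words: int = 150, overlap: int = 20):
    """Split a document into fixed-size word chunks with a small overlap.

    The overlap keeps content that straddles a boundary retrievable
    from both neighboring chunks.
    """
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

doc = "word " * 400
chunks = chunk_text(doc)
print(len(chunks))             # 4
print(len(chunks[0].split()))  # 150
```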
Session Management
What is a Session?
A session is a single conversation period. It starts when a user begins chatting and ends when they leave or after a period of inactivity.

Session Timeout
The problem: If someone stops chatting for 30 minutes, should the AI remember the old conversation when they return?

Common timeout strategies:

Short Timeout
5-15 minutes. Good for customer support and task completion where context is time-sensitive. "If you've been away, we'll start fresh."

Long Timeout
1-4 hours. Good for research and complex tasks where users might need breaks but want continuity. "Welcome back, we were discussing…"

No Timeout
Persistent. Good for long-term projects and personal assistants where context is always relevant. "I remember our conversation from yesterday about…"
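A timeout check is one comparison against the last-activity timestamp. A sketch using the short and long timeouts above (timestamps are plain Unix seconds; the `now` parameter exists so the check is testable):

```python
import time

# Timeout values from the strategies above
SHORT_TIMEOUT = 15 * 60       # 15 minutes, in seconds
LONG_TIMEOUT = 4 * 60 * 60    # 4 hours

def session_expired(last_active: float, timeout: float, now: float = None) -> bool:
    """True if the session has been idle longer than `timeout` seconds."""
    if now is None:
        now = time.time()
    return now - last_active > timeout

# A user who went quiet 30 minutes ago:
idle_since = 1_000_000.0
print(session_expired(idle_since, SHORT_TIMEOUT, now=idle_since + 30 * 60))  # True
print(session_expired(idle_since, LONG_TIMEOUT, now=idle_since + 30 * 60))   # False
```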
Session Storage
Where conversation history is kept:

In-Memory
Temporary storage
- Fast access
- Lost if server restarts
- Good for short sessions and low-cost applications
Database
Persistent storage
- Survives server restarts
- Can be retrieved later
- Good for long-term memory and important conversations
Hybrid
Best of both
- Active sessions in memory (fast)
- Inactive sessions in database (persistent)
- Optimal performance and reliability
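The hybrid pattern above can be sketched as a two-tier store. Here a plain dict stands in for the persistent database (Redis, Postgres, etc.); the read-through and eviction logic is the point, not the storage backend:

```python
class HybridSessionStore:
    """Active sessions in memory, inactive ones in a 'database'.

    The `db` dict is a stand-in for real persistent storage; swap in
    an actual database client in production.
    """

    def __init__(self):
        self.memory = {}   # fast, lost on restart
        self.db = {}       # survives restarts (simulated here)

    def save(self, session_id, history):
        self.memory[session_id] = history

    def load(self, session_id):
        if session_id in self.memory:
            return self.memory[session_id]
        return self.db.get(session_id)   # fall back to persistent storage

    def evict(self, session_id):
        """Move an idle session out of memory into the database."""
        if session_id in self.memory:
            self.db[session_id] = self.memory.pop(session_id)

store = HybridSessionStore()
store.save("s1", ["hello"])
store.evict("s1")
print(store.load("s1"))   # ['hello'] (retrieved from the 'database')
```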
Practical Tips by Use Case
Customer Support Bots
Recommended approach:
- Use sliding window with 10-message history
- 15-minute session timeout
- Don’t store long-term (privacy)
- Include retrieved help articles within 2,000-word budget
Personal Assistants
Recommended approach:
- Use summarization for conversations over 10 exchanges
- Store important preferences and facts long-term
- 2-hour session timeout
- Maintain context across days/weeks
Educational/Tutoring Apps
Recommended approach:
- Use summarization to track learning progress
- Store learning history and preferences long-term
- 1-hour session timeout
- Keep student progress and misconceptions in context
Document Q&A Systems
Recommended approach:
- Short conversation history (5 messages)
- Focus context budget on retrieved documents
- 30-minute session timeout
- Don’t need much conversation memory
Monitoring Your Context Usage
What to Track
Context Size Metrics
- Average words/tokens per conversation
- Maximum context size reached
- How often you hit limits
Cost Metrics
- Cost per conversation
- Cost per message
- Total daily/monthly costs
Quality Metrics
- Are users satisfied with responses?
- Do users repeat information (sign AI forgot)?
- Response times
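The context-size metrics above are easy to compute from per-conversation word counts. A sketch (the 4,000-word limit is an assumed cap for illustration):

```python
def context_stats(word_counts, limit: int = 4000):
    """Summarize per-conversation context sizes.

    `limit` is an assumed cap; set it to your model's real budget.
    """
    return {
        "avg_words": sum(word_counts) / len(word_counts),
        "max_words": max(word_counts),
        "pct_over_limit": 100 * sum(1 for c in word_counts if c > limit) / len(word_counts),
    }

stats = context_stats([800, 1500, 4200, 3900])
print(stats)   # avg 2600.0 words, max 4200, 25% over the assumed limit
```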
Warning Signs
Context is Too Large
Signs:
- Costs are higher than expected
- Responses are slow
- Users complain about speed

Fixes:
- Reduce context window size
- Summarize more aggressively
- Use sliding window instead of keeping everything
Context is Too Small
Signs:
- AI asks for information users already provided
- Users complain AI "forgets" things
- Quality drops mid-conversation

Fixes:
- Increase context window
- Keep more conversation history
- Use summarization instead of truncation
Quick Fixes
If responses are slow: Reduce context size, limit retrieved documents, or use a faster model with smaller context requirements.
Common Mistakes to Avoid
Sending Entire Conversation Every Time
The mistake: Never managing context, just appending to history.
Why it's wrong: Costs spiral, performance degrades, and you eventually hit limits.
Better approach: Choose a strategy (sliding window, summarization) from the start.
Too Aggressive Truncation
The mistake: Only keeping the last 2-3 messages to save costs.
Why it's wrong: The AI can't follow the conversation flow and asks users to repeat themselves.
Better approach: Find a balance: usually 8-12 messages minimum for coherent conversations.
Ignoring Session Boundaries
The mistake: Treating all conversations as one continuous session.
Why it's wrong: Confusion when users return hours/days later, privacy issues, and resource waste.
Better approach: Define clear session timeouts and start fresh when appropriate.
Not Monitoring Costs
The mistake: Setting up context management once and never checking costs.
Why it's wrong: Usage patterns change, costs can creep up, and you miss optimization opportunities.
Better approach: Track costs weekly, review strategy monthly, and adjust as needed.
Getting Started
Establish Baseline (Week 1)
Understand your current situation:
- How long are your conversations typically?
- What’s your average cost per conversation?
- Are users complaining about anything?
- Do you have session timeouts?
Choose a Strategy (Week 2)
Based on your use case:
- Short conversations (3-5 turns): Keep everything; it's simple and costs stay low
- Medium conversations (5-15 turns): Start with sliding window
- Long conversations (15+ turns): Use summarization
- Multi-session: Implement session timeouts
Implement and Test (Week 3)
Set up your chosen strategy:
- Start conservative (keep more context)
- Test with real users
- Monitor quality and costs
- Gather feedback
