Cost Optimization - Arcbeam Documentation

Cost optimization is about making your AI system cost-effective while maintaining good quality. AI can get expensive quickly, but there are many ways to reduce costs without users noticing any difference.

Poor optimization can waste 50-80% of your AI budget

Why AI Costs Matter

AI Is Different from Traditional Software

Traditional software is like owning a car (fixed costs), while AI is like taking taxis everywhere (per-trip costs that add up).

Traditional Software	AI Systems
Fixed cost model	Variable cost model
You pay for servers once	You pay per use (every question costs money)
Costs are predictable	Costs grow with usage
Scaling doesn’t change costs much	More users = significantly higher costs
Budget is stable and forecastable	Poor optimization can waste 50-80% of budget

Where AI Costs Come From

The Four Main Cost Categories

AI Model Calls

60-70% of total costs

Every time you ask the AI a question, you pay
Charged based on how much text it processes
More expensive models cost more
Longer conversations cost more

Document Processing

15-25% of total costs

Converting documents to searchable format
Happens when you add new content
One-time cost per document
Can add up with large document sets

Database and Storage

10-15% of total costs

Storing your documents
Running search databases
Server costs
Network/bandwidth

Other Operational Costs

5-10% of total costs

Monitoring tools
Development time
Quality review
Testing

Example Monthly Breakdown

A typical AI chatbot processing 10,000 questions/day:

Cost Category	Monthly Cost	Percentage
Total Monthly Cost	$3,000	100%
AI model calls	$2,000	67%
Document processing	$500	17%
Database/storage	$400	13%
Other operational costs	$100	3%

Strategies to Reduce Costs

1. Use Cheaper Models When Possible

Not every question needs the most expensive AI model.

Model Pricing Comparison

Model Tier	Examples	Cost per 1,000 Questions	Best For
Premium	GPT-4, Claude Opus	$5-15	Complex analysis, critical decisions
Mid-tier	GPT-4o mini, Claude Sonnet	$0.50-2	Most use cases, balanced quality
Basic	GPT-3.5, Claude Haiku	$0.20-0.50	Simple FAQs, basic facts

Key insight: Most questions (70-80%) are simple and work fine with cheaper models - that’s a 20x cost reduction opportunity!

Simple Question	Complex Question
”What are your business hours?"	"Explain the differences between your three subscription plans and recommend one based on my usage patterns”
Use basic model	Use premium model
Cost: $0.0001	Cost: $0.002
Perfect for FAQs and simple facts	Needed for analysis and recommendations

Potential savings: 50-80% of model costs by routing questions to appropriate models

2. Keep Conversations Shorter

The AI reads the entire conversation history each time it responds. Long conversations get expensive.

Turn	Without Optimization	With Optimization
Turn 1	200 words → $0.001	200 words → $0.001
Turn 5	1,000 words → $0.005	400 words → $0.002
Turn 10	2,000 words → $0.010	600 words → $0.003
Turn 20	4,000 words → $0.020	800 words → $0.004
Result	Cost doubles every few turns!	Costs stay manageable!

Optimization strategies:

Summarize old messages (keep last 5-10 in detail)
Remove unnecessary context
Start fresh after certain time/messages
Don’t repeat information

Potential savings: 60-70% on long conversations

3. Limit Response Length

Longer AI responses cost more. Most users don’t need 500-word answers. Question: “What’s your refund policy?”

Aspect	Unoptimized Response	Optimized Response
Word count	500 words	100 words
Content	Detailed explanation, multiple examples, edge cases	Concise answer, key points only, clear and direct
Cost	$0.005	$0.001
Savings	-	80% per question

Win-win optimization: Users prefer shorter, clearer answers anyway - lower costs AND better user experience!

Implementation tips:

Set reasonable length limits (100-200 words for most answers)
Ask AI to be concise in your system prompts
Test to ensure quality isn’t sacrificed

Potential savings: 50-70% on output costs

4. Cache Common Questions

If people ask the same questions repeatedly, save and reuse answers. Scenario: 10 people ask “What are your business hours?”

Metric	Without Caching	With Caching
First request	$0.001	$0.001 (generate & save)
Next 9 requests	$0.001 each	$0.000 (return saved)
Total cost	$0.01	$0.001
Computation	Same computation 10 times	Computed once, reused 9 times
Savings	-	90%

What to cache:

FAQs (e.g., “How do I reset my password?”)
Common product questions
Policies and procedures
Anything asked multiple times

Cache freshness guidelines:

Keep cached answers for 1-24 hours depending on content type
Update when information changes
Review cache hit rate to measure effectiveness

Typical impact: 20-40% overall cost reduction

5. Search Fewer Documents

When AI searches your knowledge base, each document adds cost.

Step	Expensive Approach	Optimized Approach
Search	Search for 20 documents	Search for 20 documents
Selection	Include all 20 in AI context	Pick best 5 to include
Word count	8,000 words	2,000 words
Cost	$0.020	$0.005
Savings	-	75%

Quality over quantity: 5 well-chosen documents often provide better answers than 20 mediocre ones

Document selection guidelines:

Start with 3-5 documents
Only increase if quality suffers
Use relevance scoring to pick the best matches
Test to find the optimal number for your use case

Potential savings: 60-75% on retrieval costs

6. Use Cheaper Document Processing

Converting text to searchable format costs money. Use efficient methods.

Embedding Model Cost Comparison

Model Type	Cost per Million Words	Best For
Premium embedding	$0.13	Specialized domains requiring highest accuracy
Standard embedding	$0.02	95% of use cases (recommended)
Basic embedding	$0.10	Simple keyword matching

Recommendation: Standard embedding models work for 95% of use cases at a fraction of the cost

Additional optimization strategies:

Remove duplicate content before processing
Don’t re-process unchanged documents
Batch process instead of one-at-a-time
Use incremental updates for document changes

Typical savings: 30-40% on document processing costs

7. Optimize Database Costs

Your vector database doesn’t need to be oversized.

Metric	Over-Provisioned Database	Right-Sized Database
Monthly cost	$500	$100
Capacity	Handles 1M queries/day	Handles 50k queries/day
Actual usage	10k queries/day	10k queries/day
Utilization	1% (99% unused capacity)	20% (plenty of headroom for growth)
Efficiency	Wasting money on unused resources	Optimized for actual needs
Savings	-	$400/month

Questions to guide right-sizing:

How many searches per day do you actually need?
How much data are you storing?
What’s your growth projection for the next 6-12 months?
Are you using a managed service when self-hosted would work?

Potential savings: 30-60% on infrastructure costs

8. Compress and Archive Old Data

Not all data needs to be instantly accessible.

Aspect	Active Data (Last 3 Months)	Archived Data (Older)
Storage type	Fast database	Cheap cold storage
Monthly cost	$200	$20
Access pattern	Used frequently	Accessed rarely
Performance	Needs quick access, optimized for speed	Slower retrieval is acceptable
Savings	-	$180/month

Archiving best practices:

Archive data after 3-6 months of inactivity
Compress before archiving to save additional storage costs
Keep a lightweight search index for archived data
Set up retrieval process for rare access needs (slower but acceptable)

Monitoring and Tracking Costs

Setting Up Alerts

Alert Type	Trigger	Action Required
Budget Alerts
Daily budget exceeded	Spending > daily limit	Immediate action needed
High daily spending	Approaching 80% of daily budget	Warning - review today
Weekly overspend	Week running 20% over expected	Review needed
Monthly trend	Month trending over budget	Time to optimize
Pattern Alerts
Expensive query	Single query cost > $1	Investigate this query
Cost spike	Average cost increased 50%	Something changed - review
Feature spike	Specific feature spiking	Potential issue - check logs
Cache degradation	Cache hit rate dropped	Check cache config

Understanding Your Costs

Example: Cost Breakdown by Query Type

Query Type	Monthly Cost	Percentage	Action
Product questions	$800	40%	Optimize first - biggest cost driver
Support questions	$600	30%	Second priority for optimization
General chat	$400	20%	Consider limiting conversation length
Other	$200	10%	Monitor for patterns

Key insight: Focus optimization efforts on the highest-cost categories first for maximum impact

Example: Cost Breakdown by User Segment

User Type	Monthly Cost	Percentage	Consideration
Free users	$1,200	60%	Consider usage limits or conversion prompts
Paid users	$800	40%	Ensure quality experience is maintained

Analyzing costs by user segment helps you make informed decisions about feature access and pricing tiers

Cost Optimization Checklist

Quick Wins (Week 1)

Easy changes - Start here for immediate impact

Switch to cheaper models for simple queries

Use mid-tier or basic models for FAQ-style questions instead of premium models. Most questions (70-80%) don’t need the most expensive AI.

Cache answers to frequently asked questions

Implement caching for common questions to avoid regenerating the same answers repeatedly.

Set AI response length to 200 words max

Limit output length to reduce costs. Users prefer concise answers anyway.

Reduce retrieved documents from 10 to 5

Decrease the number of documents included in context. Quality over quantity.

Use standard embedding models

Switch from premium to standard embedding models - they work for 95% of use cases.

Expected savings: 40-60% reduction in costs

Medium Effort (Weeks 2-4)

More involved optimizations for additional savings

Conversation Summarization

Summarize long conversations to reduce context size and token usage

Intelligent Caching

Build smart caching system for common query patterns

Query Routing

Route queries through cache → simple model → complex model hierarchy

Right-Size Database

Optimize database resources to match actual usage patterns

Remove Duplicates

Clean up duplicate content before processing

Batch Processing

Process new documents in batches instead of one-at-a-time

Expected additional savings: 20-30% reduction in costs

Advanced (Months 2-3)

Sophisticated optimizations for mature systems

Query Complexity Classifier

Automatically classify query complexity to route to appropriate model tiers

Cascading Model Approach

Try cheap model first, upgrade to premium only if needed

Data Archiving

Move old, rarely-accessed data to cold storage

Database Index Optimization

Fine-tune database indexes for better performance

Custom Model Fine-Tuning

Train specialized models for specific high-volume tasks

Expected additional savings: 10-20% reduction in costs

Calculating Expected Costs

Simple Cost Estimation

Questions to answer for your estimation:

How many queries per day? (example: 10,000)
Average question length? (example: 50 words)
Average answer length? (example: 150 words)
Documents needed per query? (example: 5)
Which model? (example: GPT-4o mini)

Example Calculation Walkthrough

Scenario: 10,000 queries/day using GPT-4o mini

Component	Calculation	Result
Per Query Breakdown
Question	50 words = ~70 tokens	70 tokens
Context (5 documents)	500 words = ~650 tokens	650 tokens
Answer	150 words = ~200 tokens	200 tokens
Total per query		920 tokens

Cost Breakdown
Input cost	720 tokens at $0.15/1M	$0.0001
Output cost	200 tokens at $0.60/1M	$0.0001
Cost per query		$0.0002

Scaling
Daily cost	10,000 × $0.0002	$2
Monthly AI calls	$2 × 30 days	$60
Infrastructure	Database + hosting	$150
Total monthly		$210

Comparing Scenarios

Scenario	Optimizations	Monthly Cost	Savings	% Saved
A: No optimization	Premium model for everything 10 documents per query No caching No length limits	$3,000	-	-
B: Basic optimization	Mid-tier model for most queries 5 documents per query Cache common questions 200-word limit	$800	$2,200	73%
C: Advanced optimization	Smart model routing 3-5 documents (optimized) Aggressive caching Conversation summarization	$400	$2,600	87%

Common Mistakes

Mistake	Why It’s Wrong	Better Approach
Using Premium Models for Everything ”We’ll just use GPT-4 for all queries to ensure quality”	80% of queries are simple and don’t need premium models - you’re paying 20x more for minimal quality gain	Use mid-tier for most queries, premium only when needed. Test to see if users notice any difference
Not Monitoring Costs Set up AI, never check costs until the bill arrives	Costs can spiral quickly and small issues become expensive problems	Daily cost monitoring, weekly reviews, monthly analysis - catch issues early
Optimizing Without Measurement ”This should reduce costs” without testing actual impact	You don’t know if optimization worked or if you hurt quality	Measure before and after, track both costs AND quality metrics
Sacrificing Quality for Cost Make AI so cheap it becomes useless	Users leave, defeating the entire purpose	Find the right balance - cut costs where users don’t notice, preserve quality where they do

These mistakes can waste 50-80% of your AI budget or harm user experience. Always balance cost optimization with quality maintenance.

Getting Started

Week 1: Understand Current Costs

Measure your baseline:

What’s your current monthly bill?
Cost per query?
Most expensive query types?
Where is money going?

Week 2: Implement Quick Wins

Easy optimizations:

Switch to mid-tier model
Add response length limits
Cache common questions
Reduce retrieval documents

Measure impact:

Did costs decrease?
By how much?
Any quality issues?

Week 3: Monitor and Adjust

Track results:

Cost savings achieved
User satisfaction maintained?
Any new issues?
Where to optimize next?

Week 4: Plan Long-term

Set up ongoing optimization:

Regular cost reviews
Budget alerts
Quality monitoring
Continuous improvement

Next Steps

Model Selection

Choose cost-effective models for your use case

Context Management

Manage context to reduce token usage

Observability

Monitor costs and identify optimization opportunities

Data Processing

Optimize data processing for efficiency

​Why AI Costs Matter

​AI Is Different from Traditional Software

​Where AI Costs Come From

​The Four Main Cost Categories

AI Model Calls

Document Processing

Database and Storage

Other Operational Costs

​Example Monthly Breakdown

​Strategies to Reduce Costs

​1. Use Cheaper Models When Possible

​Model Pricing Comparison

​2. Keep Conversations Shorter

​3. Limit Response Length

​4. Cache Common Questions

​5. Search Fewer Documents

​6. Use Cheaper Document Processing

​Embedding Model Cost Comparison

​7. Optimize Database Costs

​8. Compress and Archive Old Data

​Monitoring and Tracking Costs

​Setting Up Alerts

​Understanding Your Costs

​Example: Cost Breakdown by Query Type

​Example: Cost Breakdown by User Segment

​Cost Optimization Checklist

​Quick Wins (Week 1)

​Medium Effort (Weeks 2-4)

Conversation Summarization

Intelligent Caching

Query Routing

Right-Size Database

Remove Duplicates

Batch Processing

​Advanced (Months 2-3)

Query Complexity Classifier

Cascading Model Approach

Data Archiving

Database Index Optimization

Custom Model Fine-Tuning

​Calculating Expected Costs

​Simple Cost Estimation

​Example Calculation Walkthrough

​Comparing Scenarios

​Common Mistakes

​Getting Started

​Next Steps

Model Selection

Context Management

Observability

Data Processing

Why AI Costs Matter

AI Is Different from Traditional Software

Where AI Costs Come From

The Four Main Cost Categories

Example Monthly Breakdown

Strategies to Reduce Costs

1. Use Cheaper Models When Possible

Model Pricing Comparison

2. Keep Conversations Shorter

3. Limit Response Length

4. Cache Common Questions

5. Search Fewer Documents

6. Use Cheaper Document Processing

Embedding Model Cost Comparison

7. Optimize Database Costs

8. Compress and Archive Old Data

Monitoring and Tracking Costs

Setting Up Alerts

Understanding Your Costs

Example: Cost Breakdown by Query Type

Example: Cost Breakdown by User Segment

Cost Optimization Checklist

Quick Wins (Week 1)

Medium Effort (Weeks 2-4)

Advanced (Months 2-3)

Calculating Expected Costs

Simple Cost Estimation

Example Calculation Walkthrough

Comparing Scenarios

Common Mistakes

Getting Started

Next Steps