Model selection is about choosing which AI “brain” to use for different tasks. Just like you wouldn’t use a calculator for writing an essay, different AI models are better suited for different jobs.
Each AI model has trade-offs in terms of quality, speed, and cost. The key is finding the right balance for your use case.

Understanding AI Models

Basic Models

Smart assistants who can handle routine questions

Advanced Models

Subject matter experts who can tackle complex problems

Specialized Models

Specialists trained for specific tasks
The major AI model providers offer a range of models with different capabilities and price points:
The company behind ChatGPT offers models ranging from the premium GPT-4o to the cost-effective GPT-4o mini. Known for strong general-purpose performance. Visit: OpenAI Platform
Model selection changes frequently. Visit the provider documentation above for the latest models, pricing, and capabilities. Use evaluations to test which models work best for your specific use case.

How to Choose the Right Model

Consider What You Need

| Use Case | What to Choose | Example | Recommended Models |
| --- | --- | --- | --- |
| Simple, High-Volume Tasks | Cheaper, faster models | Answering basic FAQs, categorizing requests | GPT-4o mini, Claude Haiku 4.5, or Gemini 2.0 Flash-Lite |
| Complex Reasoning | Premium models | Analyzing contracts, solving complex problems | GPT-4o, Claude Sonnet 4.5, or Gemini 3 Flash |
| Very Long Documents | Models with large “memory” | Summarizing 100-page reports | Gemini 2.0 Pro (2M tokens) or Gemini 2.0 Flash (1M tokens) |
| Budget-Conscious Projects | Most cost-effective model that meets quality needs | Start with GPT-4o mini and test if it’s good enough before upgrading | Gemini 3 Flash ($0.50/$3 per 1M tokens) or GPT-4o mini |

The Three-Factor Balance

The key: Find the cheapest model that meets your quality and speed requirements.
Every model choice involves balancing three factors:

Quality

  • How good are the responses?
  • How often is it correct?
  • Does it understand nuance?

Speed

  • How fast does it respond?
  • Can users wait that long?
  • Does it meet your performance needs?

Cost

  • How much does each response cost?
  • How many responses do you need per day?
  • Does it fit your budget?
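The three-factor balance can be sketched as a simple selection rule: pick the cheapest model that clears your quality and speed thresholds. The model names, quality scores, latencies, and prices below are illustrative placeholders, not real benchmark numbers — you would fill them in from your own tests.

```python
# Illustrative candidates: (name, quality score 0-1, avg latency in seconds,
# cost in USD per 1M output tokens). These figures are made up for the sketch.
CANDIDATES = [
    ("budget-model",  0.78, 0.8,  0.60),
    ("midtier-model", 0.88, 1.5,  3.00),
    ("premium-model", 0.95, 3.0, 15.00),
]

def pick_model(min_quality, max_latency):
    """Return the cheapest candidate meeting both requirements, or None."""
    viable = [m for m in CANDIDATES
              if m[1] >= min_quality and m[2] <= max_latency]
    return min(viable, key=lambda m: m[3])[0] if viable else None
```

For example, `pick_model(min_quality=0.85, max_latency=2.0)` returns the mid-tier option: the budget model fails the quality bar and the premium model is too slow, so the cheapest remaining choice wins.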

Common Model Selection Strategies

Start with Mid-Tier

1

Start with a Balanced Model

Begin with a mid-tier model like GPT-4o mini that offers good quality at reasonable cost
2

Test with Real Questions

Run actual use case questions through the model to see how it performs
3

Upgrade Only If Needed

Only move to a premium model if quality isn’t meeting your requirements
4

Downgrade If Possible

Only move to a cheaper model if costs are too high and quality allows
Why this works: 80% of tasks work fine with mid-tier models

Use Different Models for Different Tasks

You don’t need to use the same model for everything. Example for a customer service AI:

Simple FAQ Questions

Use: Cheaper model. Reason: Saves money on high-volume, simple tasks

Complaint Analysis

Use: Premium model. Reason: Quality matters more for sensitive issues

Product Recommendations

Use: Mid-tier model. Reason: Balance of quality and cost for moderate complexity
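A hypothetical router for this customer service example is just a lookup from request type to model tier. The model names are placeholders; the fallback choice (mid-tier for anything unclassified) is an assumption you would tune for your own traffic.

```python
# Map each request type to a model tier. Names are illustrative placeholders.
ROUTES = {
    "faq":            "cheap-model",     # high volume, simple questions
    "complaint":      "premium-model",   # quality matters for sensitive issues
    "recommendation": "midtier-model",   # moderate complexity
}

def route(task_type: str) -> str:
    # Fall back to the mid-tier model for anything unclassified.
    return ROUTES.get(task_type, "midtier-model")
```

In practice the `task_type` would itself come from a cheap classification step, which is exactly the kind of simple, high-volume task the cheaper model handles well.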

Try Before You Commit

1

Test with Examples

Test with 50-100 example questions that represent your real use case
2

Compare Models

Compare responses from different models side by side
3

Check All Factors

Evaluate quality, speed, and estimated cost for each model
4

Choose Based on Data

Make your decision based on actual test data, not assumptions
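The four steps above can be sketched as a small comparison harness. Here `call_model` stands in for whatever provider SDK you use, and `grade` is your own quality check (exact match, a rubric, or an LLM judge) — both are assumptions you would supply.

```python
import statistics
import time

def compare(models, test_questions, call_model, grade):
    """Run every (question, expected) pair through each model and
    collect mean accuracy and mean latency per model."""
    results = {}
    for model in models:
        scores, latencies = [], []
        for question, expected in test_questions:
            start = time.monotonic()
            answer = call_model(model, question)
            latencies.append(time.monotonic() - start)
            scores.append(grade(answer, expected))
        results[model] = {
            "accuracy": statistics.mean(scores),
            "avg_latency_s": statistics.mean(latencies),
        }
    return results
```

Cost is the one factor not measured here; you would estimate it from token counts and your provider’s price sheet, then weigh all three numbers together.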

Evaluating Model Performance

The best way to choose a model is to test it with real questions from your use case. Create 20-50 test questions that represent what users will actually ask, then compare how different models perform on accuracy, speed, and cost. Quick evaluation checklist:
  • Are answers accurate and complete?
  • Is the tone appropriate for your use case?
  • How fast does each model respond?
  • What would daily costs be at your expected volume?
For a complete guide on evaluating models systematically, including setting up automated testing and measuring performance over time, see Evaluations.

Managing Models Over Time

Track Performance

Monitor how your chosen model performs. Weekly checks:

User Satisfaction

Track user satisfaction scores and feedback

Error Rates

Monitor how often the model produces errors

Response Times

Ensure response times stay within acceptable range

Costs

Track actual spending against budget
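The four weekly checks can be computed from whatever request logging you already have. The record fields below (`satisfied`, `error`, `latency_s`, `cost_usd`) are assumptions about what your logs capture, not a standard schema.

```python
def weekly_report(records):
    """records: list of dicts with 'satisfied' (bool, or None if the user
    gave no feedback), 'error' (bool), 'latency_s' (float), 'cost_usd' (float)."""
    rated = [r for r in records if r["satisfied"] is not None]
    return {
        # Share of rated requests the user was happy with.
        "satisfaction": sum(r["satisfied"] for r in rated) / len(rated) if rated else None,
        # Share of all requests that produced an error.
        "error_rate": sum(r["error"] for r in records) / len(records),
        # 95th-percentile latency: most users wait at most this long.
        "p95_latency_s": sorted(r["latency_s"] for r in records)[int(0.95 * len(records))],
        # Actual spend, to compare against budget.
        "total_cost_usd": sum(r["cost_usd"] for r in records),
    }
```

Reviewing these numbers side by side week over week is what makes a drift in quality or cost visible before users complain.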

Cost Considerations

Understanding Pricing

AI models typically charge per “token” (roughly 3/4 of a word). What affects your costs:

Input Length

How much context you provide with each request

Output Length

How long the AI’s responses are

Volume

How many requests you make per day

Model Choice

Premium models cost more than standard models
Example calculation:
If you send 1,000 requests per day:
  • Average input: 500 words = ~650 tokens
  • Average output: 100 words = ~130 tokens
  • Using GPT-4o mini: ~$1.50/day
  • Using GPT-4o: ~$25/day
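The calculation above can be written as a small function. Per-token prices change often, so they are parameters here rather than hard-coded; the word-to-token ratio of ~1.3 follows the “roughly 3/4 of a word per token” rule of thumb from this section.

```python
def daily_cost(requests_per_day, input_words, output_words,
               input_price_per_m, output_price_per_m):
    """Estimate daily spend in USD. Prices are per 1M tokens, as most
    providers quote them; look up current rates on the provider's pricing page."""
    input_tokens = input_words * 1.3    # ~3/4 of a word per token
    output_tokens = output_words * 1.3
    per_request = (input_tokens / 1e6) * input_price_per_m \
                + (output_tokens / 1e6) * output_price_per_m
    return requests_per_day * per_request

# Example shape of a call, with placeholder prices:
# daily_cost(1000, 500, 100, input_price_per_m=2.50, output_price_per_m=10.00)
```

Rerunning this with each candidate model’s current prices gives you the cost column for the side-by-side comparison.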

Ways to Reduce Costs

Use Shorter Prompts

Don’t send unnecessary context - summarize long history instead of including everything

Limit Response Length

If you only need a short answer, specify that - don’t let the model ramble

Choose Appropriate Models

Don’t use premium models for simple tasks that cheaper models can handle

Cache Common Answers

For common questions, save and reuse answers to reduce duplicate processing
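A minimal sketch of such an answer cache: repeated questions are looked up instead of re-sent to the model. The normalization here (lowercase, collapsed whitespace) is deliberately simple and an assumption — real systems often match paraphrases with embeddings instead.

```python
import hashlib

_cache = {}

def cached_answer(question, call_model):
    """Return a cached answer for this question if one exists;
    otherwise call the model once and store the result."""
    normalized = " ".join(question.lower().split())
    key = hashlib.sha256(normalized.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(question)  # only pay for the first occurrence
    return _cache[key]
```

Caching only helps for questions whose answers do not depend on the individual user or change over time, so scope it to genuinely common, static FAQs.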

Common Mistakes to Avoid

Always Using the Most Expensive Model

The mistake: “We’ll just use the best model for everything to ensure quality.”
Why it’s wrong: Most tasks don’t need the absolute best model. You’ll spend 10x more for 5% better quality.
Better approach: Test whether cheaper models work first. Only upgrade where quality truly matters.

Switching Models Without Testing

The mistake: “This new model is supposed to be better, let’s switch immediately.”
Why it’s wrong: “Better” in general doesn’t mean better for your specific use case.
Better approach: Always test with your actual questions before switching.

Ignoring Speed Requirements

The mistake: Focusing only on quality and cost.
Why it’s wrong: If users have to wait 10 seconds for a response, they’ll leave.
Better approach: Define acceptable wait times upfront and only consider models that meet them.

Not Monitoring Performance

The mistake: Choosing a model once and forgetting about it.
Why it’s wrong: Models, costs, and your needs all change over time.
Better approach: Review model performance monthly and be ready to optimize.

Getting Started

Your First Model Selection

1

Week 1: Define Requirements

  • What tasks will your AI handle?
  • How many requests do you expect per day?
  • What’s your quality threshold?
  • What’s your budget?
  • How fast do responses need to be?
2

Week 2: Create Test Cases

  • Gather 30-50 real example questions
  • Define what “good” answers look like
  • Include mix of easy and hard questions
3

Week 3: Test Models

  • Try 2-3 candidate models
  • Run your test questions through each
  • Measure quality, speed, and cost
  • Pick the best fit for your needs
4

Week 4: Launch and Monitor

  • Start with your chosen model
  • Track real-world performance
  • Collect user feedback
  • Adjust if needed

Questions to Ask Your Team

Before choosing:

Question 1: Volume

“How many requests will we process per day/month?”

Question 2: Budget

“What’s our budget for AI costs?”

Question 3: Speed

“How fast do responses need to be?”

Question 4: Quality Threshold

“What happens if the quality isn’t perfect?”

Question 5: Special Features

“Do we need features like image understanding?”
After launching:

Question 1: Actual Costs

“What’s our actual cost so far?”

Question 2: User Satisfaction

“Are users happy with response quality?”

Question 3: Variance Analysis

“How does this compare to our estimates?”

Question 4: Optimization

“Should we test other models to optimize?”

Next Steps