What Is Retrieval Quality?
Retrieval quality measures how well your vector database returns relevant documents for user queries. High-quality retrieval means:
- Documents are semantically relevant to the query
- Top results contain the information needed
- Relevance scores accurately reflect usefulness
- Retrieved documents lead to good AI responses
Why Retrieval Quality Matters
Poor retrieval is often the root cause of bad AI outputs:
- Wrong documents → AI generates incorrect answers
- Missing documents → AI can’t answer or hallucinates
- Low relevance → AI struggles to extract useful information
- Too many documents → Context window wasted on noise
Measuring Retrieval Quality
Relevance Scores
Check the relevance scores for retrieved documents:
- View a trace with retrieved documents
- Check relevance scores (0.0 to 1.0)
- Evaluate score distribution:
- High (>0.8): Strong semantic match
- Medium (0.6-0.8): Moderate match
- Low (<0.6): Weak match, likely not useful
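These buckets can be encoded as a small helper for triaging traces (thresholds taken from the list above; the function name is ours):

```python
def bucket_relevance(score: float) -> str:
    """Classify a relevance score (0.0 to 1.0) into the buckets above."""
    if score > 0.8:
        return "high"    # strong semantic match
    if score >= 0.6:
        return "medium"  # moderate match
    return "low"         # weak match, likely not useful
```

Running this over all retrieved documents in a trace gives a quick distribution check: a trace dominated by "low" buckets is worth reviewing.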
User Feedback Correlation
Compare retrieval quality to user feedback:
- Filter traces by user feedback (thumbs up/down)
- Check average relevance scores for each group
- If positive feedback correlates with higher scores, retrieval is working well
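A minimal sketch of this comparison, assuming each trace is a dict with a feedback label and per-document relevance scores (the trace shape is our assumption):

```python
from statistics import mean

def relevance_by_feedback(traces):
    """Group traces by user feedback and average their relevance scores.

    Each trace is assumed to look like:
    {"feedback": "up" | "down", "scores": [0.82, 0.74, ...]}
    """
    groups = {}
    for t in traces:
        groups.setdefault(t["feedback"], []).extend(t["scores"])
    return {fb: round(mean(scores), 3) for fb, scores in groups.items()}

traces = [
    {"feedback": "up", "scores": [0.85, 0.78]},
    {"feedback": "down", "scores": [0.52, 0.61]},
]
# Positive feedback pairing with higher averages suggests retrieval is working.
print(relevance_by_feedback(traces))
```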
Retrieved vs Used
Analyze how many retrieved documents are actually used in responses:
- Are all retrieved documents relevant?
- Or does the AI ignore some in the final response?
- A large unused fraction suggests you’re retrieving too many documents
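One rough way to estimate this, using word overlap between each retrieved chunk and the final response as a stand-in for "used" (both the heuristic and the `min_overlap` threshold are our assumptions):

```python
def used_fraction(retrieved_chunks, response, min_overlap=3):
    """Rough heuristic: a chunk counts as 'used' if at least `min_overlap`
    of its words appear in the final response."""
    response_words = set(response.lower().split())
    used = sum(
        1 for chunk in retrieved_chunks
        if len(set(chunk.lower().split()) & response_words) >= min_overlap
    )
    return used / len(retrieved_chunks) if retrieved_chunks else 0.0
```

A consistently low fraction across traces is a signal to lower the number of retrieved documents.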
Common Retrieval Issues
Issue: Low Relevance Scores Across the Board
Symptoms: All documents have scores <0.6
Possible causes:
- Embedding model mismatch (query vs documents)
- Poor document chunking strategy
- Documents don’t cover user queries
Solutions:
- Use same embedding model for queries and documents
- Improve chunking (better size, overlap)
- Add more relevant documents to knowledge base
Issue: Right Documents, Wrong Order
Symptoms: Relevant docs have low scores, irrelevant ones rank higher
Possible causes:
- Distance metric not optimal for your data
- Embeddings not capturing semantic meaning well
Solutions:
- Try different distance metrics (cosine vs euclidean vs dot product)
- Experiment with different embedding models
- Add metadata filters to narrow results
Issue: No Relevant Documents Found
Symptoms: Retrieved documents completely miss the topic
Possible causes:
- Content gap in knowledge base
- Query phrasing doesn’t match document style
- Chunk size too small or too large
Solutions:
- Identify missing topics and add content
- Implement query expansion or rewriting
- Adjust chunk size and overlap
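Query expansion can be sketched with a toy synonym table; in practice an LLM rewrite step is common, and the table and words here are purely hypothetical:

```python
# Toy synonym table; in practice you might ask an LLM to rewrite the query.
SYNONYMS = {"error": ["failure", "exception"], "install": ["setup"]}

def expand_query(query: str) -> list[str]:
    """Return the original query plus variants with synonyms substituted."""
    variants = [query]
    for word, subs in SYNONYMS.items():
        if word in query.split():
            variants.extend(query.replace(word, s) for s in subs)
    return variants
```

Each variant is then sent to the retriever and the result lists are merged and deduplicated, which helps when document phrasing differs from user phrasing.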
Improving Retrieval Quality
Optimize Embedding Models
Choose the right embedding model for your use case.
Tune Search Parameters
Adjust retrieval parameters. Number of results (k):
- Too few → might miss relevant docs
- Too many → adds noise to context
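A simple way to pick k is to sweep it against recall@k on a labeled example; the ranked list and relevant ids below are hypothetical:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of known-relevant documents that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Hypothetical labeled example: ranked retrieval output vs known-relevant docs.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4"}
for k in (1, 3, 5):
    print(k, recall_at_k(ranked, relevant, k))
```

The smallest k where recall plateaus is a reasonable default: beyond it, extra results mostly add noise.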
Improve Document Chunking
Better chunks lead to better retrieval. Chunk size:
- Too small (< 200 tokens): Lacks context
- Too large (> 1000 tokens): Too generic
- Optimal: 300-600 tokens
Overlap:
- Add 10-20% overlap between chunks
- Ensures important info isn’t split across boundaries
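The guidance above can be sketched as a word-based chunker with configurable overlap (word counts stand in for tokens; the 15% default is one point inside the 10-20% range):

```python
def chunk_words(words, size=400, overlap_frac=0.15):
    """Split a word list into chunks of `size` words with fractional overlap."""
    step = max(1, int(size * (1 - overlap_frac)))  # stride between chunk starts
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # last chunk already reaches the end
    return chunks
```

For real documents, replace word splitting with your embedding model's tokenizer so the size limits match what the model actually sees.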
Add Metadata Filters
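Metadata filtering restricts the candidate set before similarity ranking. A minimal pure-Python sketch, assuming each document carries a metadata dict (vector stores typically expose this as a query-time filter instead):

```python
def filter_by_metadata(docs, **conditions):
    """Keep only documents whose metadata matches every condition.

    Each doc is assumed to look like {"id": ..., "metadata": {...}}.
    """
    return [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in conditions.items())
    ]

docs = [
    {"id": "a", "metadata": {"source": "faq", "lang": "en"}},
    {"id": "b", "metadata": {"source": "blog", "lang": "en"}},
]
print(filter_by_metadata(docs, source="faq"))  # keeps only doc "a"
```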
Narrow retrieval with metadata such as source, date, or document type.
Use Hybrid Search
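A weighted blend of vector and keyword scores is one common hybrid scheme; this sketch uses simple term overlap for the keyword side, and the 0.7/0.3 weighting is an arbitrary assumption to tune:

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the document text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0

def hybrid_rank(query, docs, vector_scores, alpha=0.7):
    """Blend vector similarity with keyword overlap; alpha weights the vector side."""
    scored = [
        (d["id"],
         alpha * vector_scores[d["id"]]
         + (1 - alpha) * keyword_score(query, d["text"]))
        for d in docs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In production the keyword side is usually BM25 from a text index, but the blending logic is the same.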
Combine vector search with keyword search.
Analyzing Retrieval Patterns
By Query Type
Group traces by query type to see patterns:
- Create collections for different query types (factual, procedural, troubleshooting)
- Compare average relevance scores across types
- Identify which types have poor retrieval
- Improve those specific areas
Over Time
Track retrieval quality trends:
- Filter traces by date range
- Plot average relevance scores over time
- Look for degradation (might indicate stale data)
- Correlate with data updates
By Dataset
If using multiple datasets:
- Compare retrieval quality across datasets
- Identify which datasets perform well
- Learn from high-performing datasets
- Improve or remove low-performing ones
Best Practices
Monitor Continuously
- Check retrieval metrics weekly
- Set up alerts for drops in average relevance
- Review low-scoring traces regularly
Test Before Deploying
- Create test collections with known queries
- Measure retrieval quality on test set
- Only deploy changes that improve metrics
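A sketch of that deployment gate, assuming each retriever returns ranked document ids and the test set pairs queries with known-relevant ids (all names and ids are hypothetical):

```python
def mean_recall_at_k(retriever, test_set, k=5):
    """Average recall@k over a test collection of (query, relevant_ids) pairs."""
    total = 0.0
    for query, relevant in test_set:
        top = retriever(query)[:k]
        total += len(set(top) & relevant) / len(relevant)
    return total / len(test_set)

def baseline(query):   # hypothetical current retriever
    return ["d1", "d2", "d3"]

def candidate(query):  # hypothetical new configuration
    return ["d2", "d1", "d4"]

test_set = [("how to reset password", {"d2", "d4"})]

# Only deploy the candidate if it beats the baseline on the test set.
if mean_recall_at_k(candidate, test_set) > mean_recall_at_k(baseline, test_set):
    print("deploy candidate")
```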
Balance Precision and Recall
- Precision: Are retrieved docs relevant?
- Recall: Are all relevant docs retrieved?
- Adjust k and the relevance threshold to optimize both
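Both metrics at a cutoff k can be computed per query as:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision@k and recall@k for a single query."""
    top = set(ranked_ids[:k])
    hits = len(top & set(relevant_ids))
    return hits / k, hits / len(relevant_ids)
```

Sweeping k makes the trade-off visible: raising k tends to increase recall while diluting precision, so pick the cutoff that balances them for your context budget.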
Document Your Findings
- Note what works and what doesn’t
- Track changes to embedding models, chunk size, etc.
- Share insights with team
