What Is Retrieval Quality?
Retrieval quality measures how well your vector database returns relevant documents for user queries. High-quality retrieval means:
- Documents are semantically relevant to the query
- Top results contain the information needed
- Relevance scores accurately reflect usefulness
- Retrieved documents lead to good AI responses
Why Retrieval Quality Matters
Poor retrieval is often the root cause of bad AI outputs:
- Wrong documents → AI generates incorrect answers
- Missing documents → AI can’t answer or hallucinates
- Low relevance → AI struggles to extract useful information
- Too many documents → Context window wasted on noise
Measuring Retrieval Quality
Relevance Scores
Check the relevance scores for retrieved documents:
- View a trace with retrieved documents
- Check relevance scores (0.0 to 1.0)
- Evaluate score distribution:
  - High (>0.8): Strong semantic match
  - Medium (0.6-0.8): Moderate match
  - Low (<0.6): Weak match, likely not useful
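These bands can be expressed as a small helper for bucketing scores when reviewing traces (the thresholds are the ones above; the function names are just for illustration):

```python
def score_bucket(score: float) -> str:
    """Classify a relevance score (0.0-1.0) into the bands described above."""
    if score > 0.8:
        return "high"      # strong semantic match
    if score >= 0.6:
        return "medium"    # moderate match
    return "low"           # weak match, likely not useful

def score_distribution(scores):
    """Count how many retrieved documents fall into each band."""
    dist = {"high": 0, "medium": 0, "low": 0}
    for s in scores:
        dist[score_bucket(s)] += 1
    return dist
```

For example, `score_distribution([0.91, 0.72, 0.55])` returns `{"high": 1, "medium": 1, "low": 1}`.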
User Feedback Correlation
Compare retrieval quality to user feedback:
- Filter traces by user feedback (thumbs up/down)
- Check average relevance scores for each group
- If positive feedback correlates with higher scores, retrieval is working well
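A minimal sketch of that comparison, assuming each trace is a dict with a `feedback` label and a list of relevance `scores` (the field names are hypothetical, not a specific tracing API):

```python
def avg_relevance_by_feedback(traces):
    """Average the per-trace mean relevance score for each feedback group.

    Each trace is assumed to look like:
      {"feedback": "up" | "down", "scores": [0.81, 0.77, ...]}
    """
    totals = {}  # feedback label -> [running sum, trace count]
    for t in traces:
        mean = sum(t["scores"]) / len(t["scores"])
        acc = totals.setdefault(t["feedback"], [0.0, 0])
        acc[0] += mean
        acc[1] += 1
    return {fb: s / n for fb, (s, n) in totals.items()}
```

If the "up" group's average is clearly above the "down" group's, retrieval quality and user satisfaction are moving together.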
Retrieved vs Used
Analyze how many retrieved documents are actually used in responses:
- Are all retrieved documents relevant?
- Or does the AI ignore some in the final response?
- This indicates if you’re retrieving too many documents
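One rough way to estimate this offline is a verbatim-overlap heuristic: count a retrieved document as "used" if a short phrase from it appears in the response. This is only a sketch; it catches obvious copying but misses paraphrase, so treat it as a lower bound:

```python
def usage_ratio(retrieved_docs, response, min_phrase_words=5):
    """Fraction of retrieved docs that appear to be used in the response.

    A doc counts as 'used' if any window of min_phrase_words consecutive
    words from it appears verbatim (case-insensitively) in the response.
    """
    resp = response.lower()
    used = 0
    for doc in retrieved_docs:
        words = doc.lower().split()
        windows = (" ".join(words[i:i + min_phrase_words])
                   for i in range(max(1, len(words) - min_phrase_words + 1)))
        if any(w in resp for w in windows):
            used += 1
    return used / len(retrieved_docs) if retrieved_docs else 0.0
```

A consistently low ratio suggests you are retrieving more documents than the model actually draws on.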
Common Retrieval Issues
Issue: Low Relevance Scores Across the Board
Symptoms: All documents have scores <0.6
Possible causes:
- Embedding model mismatch (query vs documents)
- Poor document chunking strategy
- Documents don’t cover user queries
Solutions:
- Use the same embedding model for queries and documents
- Improve chunking (better size, overlap)
- Add more relevant documents to the knowledge base
Issue: Right Documents, Wrong Order
Symptoms: Relevant docs have low scores, irrelevant ones rank higher
Possible causes:
- Distance metric not optimal for your data
- Embeddings not capturing semantic meaning well
Solutions:
- Try different distance metrics (cosine vs euclidean vs dot product)
- Experiment with different embedding models
- Add metadata filters to narrow results
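To see why the metric choice matters: dot product and cosine similarity can rank the same documents differently when embedding vectors have different magnitudes, since cosine normalizes length away. A self-contained sketch in plain Python:

```python
import math

def dot(a, b):
    """Dot product: rewards both alignment and vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_sim(a, b):
    """Cosine similarity: direction only, magnitude normalized away."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_dist(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A long vector can win on dot product while a short, perfectly
# aligned vector wins on cosine similarity:
query = [1.0, 0.0]
doc_a = [10.0, 1.0]   # large magnitude, slightly off-direction
doc_b = [0.5, 0.0]    # small magnitude, exactly aligned
```

If your embeddings are normalized to unit length, cosine and dot product rank identically; if not, the choice can reorder results.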
Issue: No Relevant Documents Found
Symptoms: Retrieved documents completely miss the topic
Possible causes:
- Content gap in knowledge base
- Query phrasing doesn’t match document style
- Chunk size too small or too large
Solutions:
- Identify missing topics and add content
- Implement query expansion or rewriting
- Adjust chunk size and overlap
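Query expansion can be sketched as: generate variants of the query, retrieve for each, and keep every document's best score. Here `retrieve` and the synonym map are hypothetical stand-ins for your search call and rewriting step (real systems often use an LLM to rewrite):

```python
def expand_query(query, synonyms):
    """Generate simple variants by swapping in synonyms for matched words."""
    variants = [query]
    for word, alts in synonyms.items():
        if word in query:
            variants.extend(query.replace(word, alt) for alt in alts)
    return variants

def retrieve_expanded(query, synonyms, retrieve, k=5):
    """Run retrieval for every variant, keep each doc's best score.

    `retrieve(q)` is a stand-in for your vector store's search call,
    assumed to return a list of (doc_id, score) pairs.
    """
    best = {}
    for variant in expand_query(query, synonyms):
        for doc_id, score in retrieve(variant):
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

This helps when users phrase questions differently from how documents are written.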
Improving Retrieval Quality
Optimize Embedding Models
Choose the right embedding model for your use case.
Tune Search Parameters
Adjust retrieval parameters.
Number of results (k):
- Too few → might miss relevant docs
- Too many → adds noise to context
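One way to pick k empirically is to sweep several values over a small labeled test set and compare average recall@k. Below, `retrieve` is a hypothetical stand-in returning ranked doc ids:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the known-relevant docs that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def sweep_k(queries, retrieve, ks=(1, 3, 5, 10)):
    """Average recall@k over labeled queries for several k values.

    `queries` is a list of (query_text, set_of_relevant_doc_ids);
    `retrieve(q)` is a stand-in for your search call returning ranked ids.
    """
    return {
        k: sum(recall_at_k(retrieve(q), rel, k) for q, rel in queries) / len(queries)
        for k in ks
    }
```

Pick the smallest k where recall plateaus; raising k beyond that mostly adds noise to the context window.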
Improve Document Chunking
Better chunks lead to better retrieval.
Chunk size:
- Too small (< 200 tokens): Lacks context
- Too large (> 1000 tokens): Too generic
- Optimal: 300-600 tokens
Chunk overlap:
- Add 10-20% overlap between chunks
- Ensures important info isn’t split across boundaries
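A minimal chunker along these lines, using words as a rough proxy for tokens (a production version would use the embedding model's tokenizer):

```python
def chunk_text(text, chunk_size=400, overlap_ratio=0.15):
    """Split text into word-based chunks with proportional overlap.

    chunk_size is in words here as an approximation of tokens;
    overlap_ratio controls how much consecutive chunks share.
    """
    words = text.split()
    # Step forward by less than a full chunk so chunks overlap.
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the text
    return chunks
```

With `overlap_ratio=0.15`, each chunk repeats about the last 15% of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.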
Add Metadata Filters
Narrow retrieval with metadata.
Use Hybrid Search
Combine vector search with keyword search.
Analyzing Retrieval Patterns
By Query Type
Group traces by query type to see patterns:
- Create collections for different query types (factual, procedural, troubleshooting)
- Compare average relevance scores across types
- Identify which types have poor retrieval
- Improve those specific areas
Over Time
Track retrieval quality trends:
- Filter traces by date range
- Plot average relevance scores over time
- Look for degradation (might indicate stale data)
- Correlate with data updates
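A sketch of the aggregation step, assuming each trace reduces to a (date, average relevance score) pair (this shape is an assumption, not a specific tracing API):

```python
from collections import defaultdict
from datetime import date

def weekly_avg_scores(traces):
    """Average relevance score per ISO week.

    `traces` is a list of (datetime.date, avg_relevance_score) pairs.
    Returns {(iso_year, iso_week): average_score}, in chronological order.
    """
    buckets = defaultdict(list)
    for day, score in traces:
        iso_year, iso_week, _ = day.isocalendar()
        buckets[(iso_year, iso_week)].append(score)
    return {wk: sum(v) / len(v) for wk, v in sorted(buckets.items())}
```

Plot the resulting weekly averages; a sustained downward drift is a signal to check for stale or shifting data.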
By Dataset
If using multiple datasets:
- Compare retrieval quality across datasets
- Identify which datasets perform well
- Learn from high-performing datasets
- Improve or remove low-performing ones
Best Practices
Monitor Continuously
- Check retrieval metrics weekly
- Set up alerts for drops in average relevance
- Review low-scoring traces regularly
Test Before Deploying
- Create test collections with known queries
- Measure retrieval quality on test set
- Only deploy changes that improve metrics
Balance Precision and Recall
- Precision: Are retrieved docs relevant?
- Recall: Are all relevant docs retrieved?
- Adjust k and the score threshold to optimize both
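Both metrics can be computed directly from a ranked result list and a set of known-relevant doc ids:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Of the top-k retrieved docs, what fraction are relevant?"""
    top = ranked_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / len(top)

def recall_at_k(ranked_ids, relevant_ids, k):
    """Of all relevant docs, what fraction made it into the top k?"""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)
```

Raising k tends to increase recall while lowering precision, which is exactly the trade-off to tune.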
Document Your Findings
- Note what works and what doesn’t
- Track changes to embedding models, chunk size, etc.
- Share insights with team
Next Steps
- Trace Issues to Source Data: Debug problems using data lineage
- See What Data Is Used: Analyze document usage patterns
- Add Data Sources: Connect vector databases
- Compare Versions: A/B test retrieval strategies
