What Is Source Attribution?
Every document in your vector database came from somewhere. Source attribution tracks which of these original sources are actually being cited in AI responses.
PDF Files
Product manuals, research papers, and documentation
Web Pages
Blog posts, documentation sites, and online resources
Markdown Files
READMEs, wiki pages, and technical documentation
Code Files
Source code with docstrings and inline documentation
Database Records
FAQ entries, support tickets, and structured data
Why Source Attribution Matters
Build Trust with Stakeholders
When showing AI outputs to non-technical stakeholders, source attribution transforms unverifiable claims into trustworthy information. Instead of “The AI said we have a 30-day return policy,” you can say “The AI said we have a 30-day return policy, based on our official Return Policy PDF last updated in March 2024.”
Find Outdated Content
If traces cite “Pricing Guide 2022.pdf” in 2025, you know immediately that this source is outdated and needs to be updated or removed, and that AI responses may be quoting the wrong pricing.
Prioritize Content Updates
Focus on updating high-impact sources. Source A used in 500 traces should be updated first, while Source B used in 2 traces can be lower priority.
Audit Compliance
For regulated industries, track which official documents were cited, ensure AI only uses approved sources, and demonstrate compliance in audits.
Viewing Source Attribution
Source-Level Metrics
For each source file, you see these key metrics that help you understand source performance and value:
Document Count
How many documents in your vector database came from this source.
Example: “product-guide.pdf” was chunked into 45 documents.
Retrieval Count
How many times documents from this source were retrieved across all traces.
Shows which source files are most valuable.
Usage Rate
Percentage of documents from this source that have been retrieved at least once.
| Rate | Meaning |
|---|---|
| Above 70% | Highly relevant source, most chunks are useful |
| 40-70% | Good source, many chunks are used |
| Below 40% | Sparse source, many chunks unused |
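As a rough illustration of how the usage rate and its bands fit together, the sketch below computes the percentage from two counts and maps it to the labels in the table. The function names and the treatment of the exact 40% and 70% boundaries are assumptions made for this example; they are not part of Arcbeam's API.

```python
def usage_rate(retrieved_documents: int, total_documents: int) -> float:
    """Percentage of a source's documents retrieved at least once."""
    if total_documents <= 0:
        return 0.0
    return 100.0 * retrieved_documents / total_documents


def usage_band(rate: float) -> str:
    """Map a usage rate (0-100) to the bands in the table above."""
    if rate > 70:
        return "Highly relevant source"
    if rate >= 40:  # assumption: the 40% and 70% edges fall in the middle band
        return "Good source"
    return "Sparse source"


# Hypothetical numbers: 45 chunks, 36 of them retrieved at least once.
rate = usage_rate(36, 45)
print(f"{rate:.0f}% -> {usage_band(rate)}")
```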
Last Retrieved
When a document from this source was most recently used.
| Timeframe | Status |
|---|---|
| Recent | Actively cited |
| Old (over 90 days) | Possibly outdated, check if still relevant |
Average Relevance
Mean relevance score for documents from this source when retrieved.
| Score | Interpretation |
|---|---|
| High | Well-written source, good for embeddings |
| Low | Poorly structured source, hard to retrieve from |
User Satisfaction
Feedback correlation: how users rate traces that cited this source.
| Feedback | Interpretation |
|---|---|
| High thumbs up | Trusted, accurate source |
| High thumbs down | Problematic source, needs review |
Source Details Page
Click on any source to see comprehensive information about that source file.
Full Source Information
File Path or URL
Where this source lives
Last Updated
When the original file was modified
Size
Original file size
Format
PDF, HTML, Markdown, etc.
Owner
Who maintains this file (if tracked)
Documents from This Source
List of all document chunks that came from this source:
- Document content (preview)
- Retrieval count per document
- Relevance scores
Traces Using This Source (Coming Soon)
Recent traces that retrieved documents from this source:
- Trace ID and link
- User query
- Which document from this source was used
- Timestamp
- User feedback
This view helps you see how this source is being used in practice and understand the context in which it’s retrieved.
Related Sources
Other sources frequently cited alongside this one. For example, “Refund Policy PDF” is often cited with “Returns FAQ HTML”. This shows which sources cover related topics and helps you understand content relationships.
Use Cases
Identify High-Impact Sources
Sort by Retrieval Count
Sort sources by Retrieval Count from high to low to see which sources are used most frequently.
Result: Focus maintenance efforts on high-value sources that directly impact AI response quality.
Find Outdated Sources
Cross-check Retrieval Count
Check the Retrieval Count for these old sources. A high retrieval count combined with an old Last Updated date means an update is urgent.
Result: Keep AI responses accurate and current by proactively catching stale content.
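The cross-check can be sketched as a simple filter over exported source metrics. Everything here is hypothetical: the `SourceStats` fields, the sample data, and both thresholds (365 days, 100 retrievals) are assumptions chosen for illustration, not values defined by Arcbeam.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceStats:
    name: str
    last_updated: date
    retrieval_count: int


def urgent_updates(sources, today, max_age_days=365, min_retrievals=100):
    """Return old, heavily retrieved sources, most retrieved first."""
    flagged = [
        s for s in sources
        if (today - s.last_updated).days > max_age_days
        and s.retrieval_count >= min_retrievals
    ]
    return sorted(flagged, key=lambda s: s.retrieval_count, reverse=True)


# Hypothetical data for illustration only.
sources = [
    SourceStats("Pricing Guide 2022.pdf", date(2022, 1, 10), 500),
    SourceStats("Returns FAQ.html", date(2025, 2, 1), 320),
    SourceStats("Old Changelog.md", date(2021, 6, 5), 2),
]
for s in urgent_updates(sources, today=date(2025, 6, 1)):
    print(s.name, s.retrieval_count)
```

Only the first source is flagged: the FAQ is heavily used but recent, and the changelog is old but rarely retrieved.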
Audit Which Sources Are Used
Result: Compliance with internal policies and confidence that only approved content is cited.
Remove Unused Sources
Review Each Source
Review each source to determine if it’s truly irrelevant or if it might be needed in the future.
Result: Leaner, faster vector database that focuses on relevant content.
Track Source Quality
Sort by User Satisfaction
Sort sources by User Satisfaction from low to high to identify problematic sources.
Review Bottom Sources
Check sources with the lowest satisfaction scores and read documents from those sources.
Result: Higher quality AI responses through continuous source quality improvement.
Grouping Sources
Group related sources for easier management using these common organizational strategies:
By Type
- Product Documentation: All product guide PDFs
- Marketing Content: Blog posts, landing pages
- Technical Docs: API references, code docs
- Support Materials: FAQs, troubleshooting guides
By Department
- Engineering: Technical specifications, architecture docs
- Product: Product requirements, roadmaps
- Customer Success: Support articles, training materials
- Legal: Policies, terms of service
By Recency
| Category | Last Updated |
|---|---|
| Current | In last 6 months |
| Recent | 6-12 months ago |
| Old | 1-2 years ago |
| Stale | Over 2 years ago |
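The recency buckets can be assigned mechanically from each source's last-updated date. This is a minimal sketch; the day counts used to approximate "6 months", "1 year", and "2 years" are assumptions, as is the function name.

```python
from datetime import date


def recency_category(last_updated: date, today: date) -> str:
    """Bucket a source by the age of its last update, per the table above."""
    age_days = (today - last_updated).days
    if age_days <= 182:   # roughly the last 6 months
        return "Current"
    if age_days <= 365:   # 6-12 months ago
        return "Recent"
    if age_days <= 730:   # 1-2 years ago
        return "Old"
    return "Stale"        # over 2 years ago


today = date(2025, 6, 1)
for updated in (date(2025, 3, 1), date(2024, 9, 1), date(2023, 9, 1), date(2022, 1, 10)):
    print(updated.isoformat(), recency_category(updated, today))
```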
Source Update Workflow
When a source needs updating, follow this workflow to ensure quality improvements:
Identify the Issue
From Arcbeam, look for warning signs:
• Source is outdated (last updated over 1 year ago)
• High retrieval count combined with low user satisfaction
• Negative feedback on traces using this source
Update the Original File
Edit the PDF, webpage, or markdown file to correct outdated information and improve clarity if needed.
Update Vector Database
Replace old chunks with new ones, or add the new file and deprecate the old one. Ensure embeddings are regenerated for the updated content.
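The replace-and-re-embed step might look like the following sketch. It uses a plain dictionary as a stand-in for the vector database, and `replace_source_chunks`, the record layout, and `fake_embed` are all hypothetical; a real pipeline would call its own vector store client and embedding model instead.

```python
from typing import Callable, Dict, List


def replace_source_chunks(
    store: Dict[str, dict],
    source: str,
    new_chunks: List[str],
    embed: Callable[[str], List[float]],
) -> None:
    """Drop every chunk from `source`, then insert re-embedded new chunks."""
    stale_ids = [cid for cid, rec in store.items() if rec["source"] == source]
    for cid in stale_ids:
        del store[cid]
    for i, text in enumerate(new_chunks):
        store[f"{source}#{i}"] = {
            "source": source,
            "text": text,
            "embedding": embed(text),  # regenerated, old vectors are not reused
        }


# Placeholder embedding for the demo only.
def fake_embed(text: str) -> List[float]:
    return [float(len(text))]


store = {
    "pricing.pdf#0": {"source": "pricing.pdf", "text": "Old price", "embedding": [9.0]},
    "faq.html#0": {"source": "faq.html", "text": "Returns", "embedding": [7.0]},
}
replace_source_chunks(store, "pricing.pdf", ["New price", "New tiers"], fake_embed)
print(sorted(store))
```

Deleting all chunks for the source before inserting avoids leaving orphaned chunks behind when the updated file produces fewer chunks than the old one.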
Source Versioning (Coming Soon)
Track changes to sources over time to understand how content evolution impacts AI responses.
Version History
When a source file is updated, version tracking helps you understand which content was used:
- V1: Original content
- V2: Updated content (March 2024)
- V3: Latest revision (January 2025)
See which version was used in each trace to debug issues like “This trace used the old pricing from V1, before we updated it” or “All traces after March use V2, which has the corrected information”.
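Since versioning is not yet available, the lookup below is only a sketch of the idea: given a dated version history, find which version was in effect when a trace ran. The exact days of the month and the V1 date are invented for the example, and `version_at` is a hypothetical helper.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history: (effective date, label), sorted by date.
VERSIONS = [
    (date(2023, 1, 1), "V1"),  # assumed date for the original content
    (date(2024, 3, 1), "V2"),
    (date(2025, 1, 1), "V3"),
]


def version_at(trace_date: date, versions=VERSIONS):
    """Return the label of the version in effect on `trace_date`, or None."""
    dates = [d for d, _ in versions]
    idx = bisect_right(dates, trace_date)
    return versions[idx - 1][1] if idx else None


print(version_at(date(2024, 2, 15)))  # trace ran before the March 2024 update
print(version_at(date(2024, 6, 1)))   # trace ran after it
```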
Compliance and Governance
For organizations with compliance requirements, source attribution provides critical audit capabilities:
| Capability | What It Provides | How To Use It |
|---|---|---|
| Approved Sources List | Control which sources can be used in AI responses | Maintain a list of approved sources (official documentation only, no personal notes or drafts, only sources reviewed by legal/compliance). Set alerts if unapproved sources are detected. |
| Source Audit Trail | Complete history of source management | Track who added each source, when it was added, who approved it, and when it was last reviewed. |
| Citation Requirements | Enforcement of citation standards | Configure Arcbeam to always include source attribution in responses, warn if responses lack source citations, or block responses without verifiable sources (strict mode). |
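To make the approved-list and strict-mode rules concrete, here is a small sketch of the decision logic. It does not reflect Arcbeam's actual configuration or API; `APPROVED_SOURCES`, `check_citations`, and the `ok`/`warn`/`block` outcomes are hypothetical names for illustration.

```python
APPROVED_SOURCES = {"return-policy.pdf", "product-guide.pdf"}  # hypothetical list


def check_citations(cited_sources, approved=APPROVED_SOURCES, strict=False):
    """Classify a response as 'ok', 'warn', or 'block' from its citations.

    Returns (decision, unapproved_sources).
    """
    unapproved = sorted(set(cited_sources) - approved)
    if not cited_sources:
        # No verifiable source: block in strict mode, otherwise warn.
        return ("block" if strict else "warn"), unapproved
    if unapproved:
        # At least one cited source is not on the approved list.
        return "warn", unapproved
    return "ok", unapproved


print(check_citations(["return-policy.pdf"]))
print(check_citations(["return-policy.pdf", "notes.md"]))
print(check_citations([], strict=True))
```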
Best Practices
Review High-Usage Sources Quarterly
Regularly review your top 20 sources by retrieval count to verify they’re still accurate, check for updates in source files, and re-sync if changes were made.
Set Update Reminders
For critical sources, establish a review cadence with clear ownership:
- Set calendar reminders to review
- Assign owners for each major source
- Track updates in a spreadsheet
Correlate with Business Events
After major changes, immediately update and re-sync affected sources:
| Business Event | Required Source Updates |
|---|---|
| Product launch | Update product docs |
| Policy change | Update policy PDFs |
| Rebranding | Update all marketing content |
Use Source Tags
Tag sources for easier organization and filtering:
| Tag | Purpose |
|---|---|
| official | Approved, authoritative sources |
| draft | Work-in-progress, not for production |
| deprecated | Old sources, scheduled for removal |
| external | Third-party sources |
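One way tags could be used for filtering is to exclude `draft` and `deprecated` sources before content reaches production. The sketch below assumes tags are available as a simple mapping from source name to tag list; that structure, the sample sources, and `production_sources` are assumptions for illustration.

```python
BLOCKED_TAGS = {"draft", "deprecated"}


def production_sources(source_tags, blocked=BLOCKED_TAGS):
    """Return names of sources that carry none of the blocked tags."""
    return [
        name for name, tags in source_tags.items()
        if not (set(tags) & blocked)
    ]


# Hypothetical tag assignments.
source_tags = {
    "return-policy.pdf": ["official"],
    "pricing-draft.md": ["draft"],
    "vendor-spec.html": ["external", "official"],
    "old-guide.pdf": ["official", "deprecated"],
}
print(production_sources(source_tags))
```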
