Source attribution connects AI responses back to the original files they came from—PDFs, web pages, internal docs, or code repositories. This transparency builds trust and helps you maintain your knowledge base.

What Is Source Attribution?

Every document in your vector database came from somewhere. Source attribution tracks which of these original sources are actually being cited in AI responses.

PDF Files

Product manuals, research papers, and documentation

Web Pages

Blog posts, documentation sites, and online resources

Markdown Files

READMEs, wiki pages, and technical documentation

Code Files

Source code with docstrings and inline documentation

Database Records

FAQ entries, support tickets, and structured data

Why Source Attribution Matters

Build Trust with Stakeholders

When showing AI outputs to non-technical stakeholders, source attribution transforms unverifiable claims into trustworthy information. Instead of “The AI said we have a 30-day return policy,” you can say “The AI said we have a 30-day return policy, based on our official Return Policy PDF last updated in March 2024.”

Find Outdated Content

If traces are still citing “Pricing Guide 2022.pdf” in 2025, you immediately know the source is outdated, needs to be updated or removed, and may be feeding wrong pricing into AI responses.

Prioritize Content Updates

Focus on updating high-impact sources. Source A used in 500 traces should be updated first, while Source B used in 2 traces can be lower priority.

Audit Compliance

For regulated industries, track which official documents were cited, ensure AI only uses approved sources, and demonstrate compliance in audits.

Viewing Source Attribution

1. Navigate to Datasets: Go to Data → Datasets in the main navigation.
2. Select a Dataset: Click on the dataset you want to analyze.
3. Open Sources Tab: Navigate to the Sources tab to see the breakdown by original source file.

Source-Level Metrics

For each source file, you see these key metrics that help you understand source performance and value:

Document Count

How many documents in your vector database came from this source. Example: “product-guide.pdf” was chunked into 45 documents.

Retrieval Count

How many times documents from this source were retrieved across all traces. Shows which source files are most valuable.

Usage Rate

Percentage of documents from this source that have been retrieved at least once.
  • Above 70%: Highly relevant source, most chunks are useful
  • 40-70%: Good source, many chunks are used
  • Below 40%: Sparse source, many chunks unused
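The usage-rate calculation and bands above can be sketched in a few lines; a minimal illustration assuming you have per-chunk retrieval counts for a source (the field names and thresholds mirror the table, not any Arcbeam API):

```python
def usage_rate(retrieval_counts):
    """Percentage of a source's chunks retrieved at least once."""
    if not retrieval_counts:
        return 0.0
    used = sum(1 for count in retrieval_counts if count > 0)
    return 100.0 * used / len(retrieval_counts)

def classify(rate):
    """Map a usage rate to the bands described above."""
    if rate > 70:
        return "highly relevant"
    if rate >= 40:
        return "good"
    return "sparse"

# 45 chunks from "product-guide.pdf"; 36 retrieved at least once
counts = [3] * 36 + [0] * 9
rate = usage_rate(counts)
print(rate, classify(rate))  # 80.0 highly relevant
```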

Last Retrieved

When a document from this source was most recently used.
  • Recent: Actively cited
  • Old (over 90 days): Possibly outdated, check if still relevant

Average Relevance

Mean relevance score for documents from this source when retrieved.
  • High: Well-written source, good for embeddings
  • Low: Poorly structured source, hard to retrieve from

User Satisfaction

Feedback correlation - How users rate traces that cited this source.
  • High thumbs up: Trusted, accurate source
  • High thumbs down: Problematic source, needs review

Source Details Page

Click on any source to see comprehensive information about that source file.

Full Source Information

File Path or URL

Where this source lives

Last Updated

When the original file was modified

Size

Original file size

Format

PDF, HTML, Markdown, etc.

Owner

Who maintains this file (if tracked)

Documents from This Source

List of all document chunks that came from this source:
  • Document content (preview)
  • Retrieval count per document
  • Relevance scores
Click through to see individual document analytics.

Traces Using This Source (Coming Soon)

Recent traces that retrieved documents from this source:
  • Trace ID and link
  • User query
  • Which document from this source was used
  • Timestamp
  • User feedback
This view helps you see how this source is being used in practice and understand the context in which it’s retrieved.
Related Sources

Other sources frequently cited alongside this one. For example, “Refund Policy PDF” is often cited with “Returns FAQ HTML.” This shows which sources cover related topics and helps you understand content relationships.

Use Cases

Identify High-Impact Sources

1. Sort by Retrieval Count: Sort sources by Retrieval Count from high to low to see which sources are used most frequently.
2. Note Top Sources: Identify the top 10 sources that are being retrieved most often.
3. Prioritize Updates: Make these sources your priority for keeping up to date, and set alerts if they become outdated.
Result: Focus maintenance efforts on high-value sources that directly impact AI response quality.
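Sketched in code, the steps above amount to a sort-and-slice over source records; the record shape here is hypothetical (e.g. data exported from the Sources tab), not an official Arcbeam format:

```python
# Hypothetical source records, e.g. exported from the Sources tab
sources = [
    {"name": "brand-guide.pdf", "retrieval_count": 2},
    {"name": "return-policy.pdf", "retrieval_count": 500},
    {"name": "pricing-guide.pdf", "retrieval_count": 120},
]

def top_sources(sources, n=10):
    """Sort by retrieval count, high to low, and keep the top n."""
    ranked = sorted(sources, key=lambda s: s["retrieval_count"], reverse=True)
    return ranked[:n]

for source in top_sources(sources, n=2):
    print(source["name"], source["retrieval_count"])
```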

Find Outdated Sources

1. Filter by Last Updated: Filter sources to show only those last updated more than 1 year ago.
2. Cross-check Retrieval Count: Check the Retrieval Count for these old sources. A high retrieval count plus an old date means an urgent update is needed.
3. Update and Re-sync: Update the source file and re-sync to Arcbeam to ensure current information is being used.
Result: Keep AI responses accurate and current by proactively catching stale content.
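The urgency rule in step 2 (old date plus high retrieval count) can be expressed directly; the thresholds below are illustrative, not Arcbeam defaults:

```python
from datetime import date, timedelta

def needs_urgent_update(last_updated, retrieval_count, today,
                        max_age_days=365, min_retrievals=50):
    """Old source + heavy use = urgent update candidate."""
    stale = (today - last_updated) > timedelta(days=max_age_days)
    return stale and retrieval_count >= min_retrievals

today = date(2025, 6, 1)
print(needs_urgent_update(date(2022, 3, 1), 500, today))  # True: old and heavily used
print(needs_urgent_update(date(2022, 3, 1), 3, today))    # False: old but rarely used
```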

Audit Which Sources Are Used

1. Review Sources with Retrievals: Review all sources that have been retrieved at least once.
2. Check Against Approved List: Check each source against your approved sources list.
3. Remove Unapproved Sources: If an unapproved source is being cited, remove it and re-sync the data source.
Result: Compliance with internal policies and confidence that only approved content is cited.
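The cross-check in step 2 is a set difference between cited sources and the approved list; the file names below are illustrative:

```python
# Approved sources list maintained by legal/compliance
approved = {"return-policy.pdf", "terms-of-service.pdf"}

# Sources actually retrieved in traces, per the Sources tab
cited = {"return-policy.pdf", "personal-notes.md"}

# Anything cited but not approved should be removed and re-synced
unapproved = sorted(cited - approved)
print(unapproved)  # ['personal-notes.md']
```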

Remove Unused Sources

1. Filter Zero Retrievals: Filter to sources with zero retrievals over the past 90 days.
2. Review Each Source: Review each source to determine if it’s truly irrelevant or if it might be needed in the future.
3. Clean Up Database: Remove irrelevant sources from your vector database and re-sync.
Result: Leaner, faster vector database that focuses on relevant content.
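The 90-day filter in step 1 can be sketched as follows, assuming each exported record carries a last_retrieved date (None if never retrieved); the record shape is hypothetical:

```python
from datetime import date, timedelta

def unused_sources(sources, today, window_days=90):
    """Names of sources with no retrievals inside the window."""
    cutoff = today - timedelta(days=window_days)
    return [s["name"] for s in sources
            if s["last_retrieved"] is None or s["last_retrieved"] < cutoff]

sources = [
    {"name": "faq.html", "last_retrieved": date(2025, 5, 20)},
    {"name": "legacy-guide.pdf", "last_retrieved": date(2024, 11, 1)},
    {"name": "never-used.md", "last_retrieved": None},
]
print(unused_sources(sources, today=date(2025, 6, 1)))
# ['legacy-guide.pdf', 'never-used.md']
```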

Track Source Quality

1. Sort by User Satisfaction: Sort sources by User Satisfaction from low to high to identify problematic sources.
2. Review Bottom Sources: Check sources with the lowest satisfaction scores and read documents from those sources.
3. Take Action: Determine the issue and take appropriate action:
  • Content is wrong → Update source
  • Content is confusing → Rewrite for clarity
  • Source is irrelevant → Remove it
Result: Higher quality AI responses through continuous source quality improvement.

Grouping Sources

Group related sources for easier management using these common organizational strategies:

By Type

  • Product Documentation - All product guide PDFs
  • Marketing Content - Blog posts, landing pages
  • Technical Docs - API references, code docs
  • Support Materials - FAQs, troubleshooting guides

By Department

  • Engineering - Technical specifications, architecture docs
  • Product - Product requirements, roadmaps
  • Customer Success - Support articles, training materials
  • Legal - Policies, terms of service

By Recency

  • Current: Last updated in the last 6 months
  • Recent: Last updated 6-12 months ago
  • Old: Last updated 1-2 years ago
  • Stale: Last updated over 2 years ago
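The recency categories above map to a small bucketing function; this sketch uses approximate calendar-month arithmetic (days within the month are ignored):

```python
from datetime import date

def recency_bucket(last_updated, today):
    """Bucket a source by age, matching the categories above."""
    months = (today.year - last_updated.year) * 12 + (today.month - last_updated.month)
    if months < 6:
        return "Current"
    if months < 12:
        return "Recent"
    if months < 24:
        return "Old"
    return "Stale"

today = date(2025, 6, 1)
print(recency_bucket(date(2025, 2, 1), today))  # Current
print(recency_bucket(date(2023, 1, 1), today))  # Stale
```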

Source Update Workflow

When a source needs updating, follow this workflow to ensure quality improvements:
1. Identify the Issue: From Arcbeam, look for warning signs:
  • Source is outdated (last updated over 1 year ago)
  • High retrieval count combined with low user satisfaction
  • Negative feedback on traces using this source
2. Update the Original File: Edit the PDF, webpage, or markdown file to correct outdated information and improve clarity if needed.
3. Update Vector Database: Replace old chunks with new ones, or add the new file and deprecate the old one. Ensure embeddings are regenerated for the updated content.
4. Re-sync to Arcbeam: Go to Settings → Data Sources, click Sync Now, and wait for the sync to complete.
5. Verify Improvement: Check new traces using this source, monitor user feedback, and confirm responses are better.

Source Versioning (Coming Soon)

Track changes to sources over time to understand how content evolution impacts AI responses.

Version History

When a source file is updated, version tracking helps you understand which content was used:
  • V1: Original content
  • V2: Updated content (March 2024)
  • V3: Latest revision (January 2025)
See which version was used in each trace to debug issues like “This trace used the old pricing from V1, before we updated it” or “All traces after March use V2, which has the corrected information.”

Compliance and Governance

For organizations with compliance requirements, source attribution provides critical audit capabilities:
  • Approved Sources List: Control which sources can be used in AI responses. Maintain a list of approved sources (official documentation only, no personal notes or drafts, only sources reviewed by legal/compliance), and set alerts if unapproved sources are detected.
  • Source Audit Trail: Complete history of source management. Track who added each source, when it was added, who approved it, and when it was last reviewed.
  • Citation Requirements: Enforcement of citation standards. Configure Arcbeam to always include source attribution in responses, warn if responses lack source citations, or block responses without verifiable sources (strict mode).

Best Practices

Review High-Usage Sources Quarterly

Regularly review your top 20 sources by retrieval count to verify they’re still accurate, check for updates in source files, and re-sync if changes were made.
Set up a recurring calendar reminder to review high-impact sources every quarter to stay ahead of potential issues.

Set Update Reminders

For critical sources, establish a review cadence with clear ownership:
  • Set calendar reminders to review
  • Assign owners for each major source
  • Track updates in a spreadsheet
Example tracking:
  • Source: Product Pricing PDF
  • Owner: Sales Team
  • Review Frequency: Quarterly
  • Last Review: Jan 2025
  • Next Review: Apr 2025
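If you track the cadence in a script rather than a spreadsheet, the next review date is simple month arithmetic; a minimal sketch (day-of-month edge cases are noted in the comment):

```python
from datetime import date

def next_review(last_review, frequency_months=3):
    """Add the review cadence to the last review date (quarterly by default).

    Note: days past the 28th can produce invalid dates (e.g. Jan 31 -> Apr 31
    raises ValueError); a production version would clamp the day.
    """
    month = last_review.month + frequency_months
    year = last_review.year + (month - 1) // 12
    month = (month - 1) % 12 + 1
    return last_review.replace(year=year, month=month)

print(next_review(date(2025, 1, 15)))   # 2025-04-15
print(next_review(date(2025, 11, 15)))  # 2026-02-15
```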

Correlate with Business Events

After major changes, immediately update and re-sync affected sources:
  • Product launch: Update product docs
  • Policy change: Update policy PDFs
  • Rebranding: Update all marketing content
Failing to update sources after major business changes can lead to AI responses with outdated or incorrect information.

Use Source Tags

Tag sources for easier organization and filtering:
  • official: Approved, authoritative sources
  • draft: Work-in-progress, not for production
  • deprecated: Old sources, scheduled for removal
  • external: Third-party sources
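Tag-based filtering is straightforward once tags are attached to source records; the records below are illustrative, not an Arcbeam data format:

```python
# Hypothetical tagged source records
sources = [
    {"name": "return-policy.pdf", "tags": {"official"}},
    {"name": "pricing-draft.md", "tags": {"draft"}},
    {"name": "old-faq.html", "tags": {"official", "deprecated"}},
]

def with_tag(sources, tag):
    """Names of sources carrying the given tag."""
    return [s["name"] for s in sources if tag in s["tags"]]

print(with_tag(sources, "official"))    # ['return-policy.pdf', 'old-faq.html']
print(with_tag(sources, "deprecated"))  # ['old-faq.html']
```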

Monitor for Deleted Sources

Set up alerts to catch when a source file is deleted from its original location but is still being cited in traces. When this happens, update your vector database to remove the obsolete source.
