Connect Your Data Sources

Connecting your to Arcbeam creates a complete picture of your AI system. You’ll see not just what your model said, but exactly which documents from your knowledge base influenced each response.

Why Connect Data Sources?

When you only see traces without data sources, you know your AI system produced an answer but you don’t know where it came from. Was it hallucinating? Did it retrieve the right documents? Which parts of your knowledge base are actually being used? With data sources connected, every trace shows:

Document Attribution

Which specific documents were retrieved and their exact content

Usage Analytics

How often each document is used across all queries

Quality Insights

Which documents need updating or improvement

Actionable Data

Turn observability into concrete improvements

How It Works

Choose your source

Navigate to Data Platform → Data Sources and choose your data source.

Connect your vector database

Tell Arcbeam how to access your .

Map your schema

Type which fields contain your document content, IDs, and metadata.

For nested columns type them out with a . in between each level.Example: If the field you want to reference is col3 in {'col1' : {'col2': {'col3': 'value'}}}, then type in col1.col2.col3

Sync once

Arcbeam pulls metadata about your documents (not the vectors themselves).

See the connections

When traces come in, Arcbeam automatically links retrieved documents to each query.

Your data sources are now connected and enriching your traces!

What Gets Synced

Arcbeam syncs metadata about your documents, not the documents themselves.

Data Type	Synced?	Description
Document IDs	✓ Yes	To match retrieved docs in traces
Source attribution	✓ Yes	Which file/URL each document came from
Basic metadata	✓ Yes	Timestamps, document names, etc.
Document content	✓ Yes	Text content of documents (stays in your DB, referenced in traces)
Vector embeddings	✗ Never	Your are never synced
Unmapped fields	✗ Never	Only the metadata fields you explicitly specify are copied

The actual document content stays in your vector database. Arcbeam only stores what’s needed to show you which documents were used in each trace.

Supported Vector Databases

pgvector

PostgreSQL with pgvector extension (fully supported)

Coming Soon

Pinecone, Weaviate, Chroma, and others

Privacy and Security

What Data is Accessed

Arcbeam only reads the fields you explicitly map:

Field Type	Accessed?	Description
Document IDs	✓ Yes	Required to match documents in traces
Document content (text)	✓ Yes	Document text content for display
Source attribution fields	✓ Yes	Track which file/URL documents came from
Metadata fields	✓ Optional	Only fields you explicitly map in configuration
Timestamp fields	✓ Optional	Only if you configure last updated tracking
Vector embeddings	✗ Never	Your embeddings are never accessed or synced
Unmapped fields	✗ Never	Only explicitly mapped fields are read

Connection Security

Connection strings are encrypted at rest
Database credentials are never logged or exposed
All connections use SSL/TLS when available
Read-only access is recommended

Create a dedicated read-only database user for Arcbeam to minimize security risk.

Self-Hosted Option

If your data can’t leave your infrastructure:

Run Arcbeam in your own VPC
Keep all data within your network
Full control over data storage and access

Learn more about self-hosting →

When to Connect Data Sources

Use Case	Connect Data Sources?	Why
Using	✓ Yes	Track which documents are retrieved and their impact
Track which documents are most useful	✓ Yes	See usage analytics and document performance
Debug why certain answers were given	✓ Yes	Trace answers back to source documents
Measure knowledge base quality	✓ Yes	Identify gaps and improvement opportunities
Only using direct LLM calls (no retrieval)	✗ Skip	No document retrieval to track
Just want to track costs and errors	✗ Skip	Traces alone provide this information
Using function calling without RAG	✗ Skip	No vector database retrieval involved

Quick Example

Here’s what connecting a pgvector database looks like:

# Your vector database has a table with this structure:
# CREATE TABLE knowledge_base (
#   id TEXT PRIMARY KEY,
#   content TEXT,
#   source TEXT,
#   updated_at TIMESTAMP
# );

# In Arcbeam dashboard:
# 1. Add pgvector integration
# 2. Provide connection string
# 3. Map schema:
#    - ID field: "id"
#    - Document field: "content"
#    - Source field: "source"
#    - Last updated: "updated_at" (optional)

Once connected, traces automatically show which documents were retrieved:

Chunks from your source data your AI system used.

What You Can Do With Connected Data

Track Document Usage

See which documents are retrieved most often, which are never used, and which ones lead to good vs bad responses.

Debug RAG Pipelines

When a trace shows wrong information, see exactly which documents were retrieved and whether they contained the right content.

Measure Knowledge Base Quality

Find gaps in your knowledge base by seeing which queries don’t retrieve useful documents.

Improve with User Feedback

When users give thumbs down, see which documents were involved and update them accordingly.

Next Steps

Connect pgvector

Step-by-step guide to connecting a pgvector database

Syncing Data

Learn how to keep your data source in sync

View Retrieved Documents

Explore retrieved documents in traces

Track Document Usage

Analyze which documents are being used

Common Questions

Does this slow down my database?

No. Syncing is a one-time operation that reads metadata. It doesn’t run queries during normal operation. Your application’s vector database queries are completely separate.

What if my schema changes?

Update the schema mapping in Arcbeam, then trigger a re-sync. Arcbeam will refresh the metadata.

Can I connect multiple databases?

Yes. You can connect multiple vector databases or different tables/indices within the same database. Each becomes a separate dataset in Arcbeam.

Do I need read-write access?

No. Arcbeam only needs read access to your vector database. Using a read-only user is recommended for security.

What if I'm using multiple vector databases?

Connect each one separately. Arcbeam will track documents across all of them and show you which database was used for each retrieval.

Get Started

How To

Debug & Improve AI Performance

Understand Data Impact

Collaborate with Your Team

Deploy & Manage

Integrations

Learn

Resources

Why Connect Data Sources?

Document Attribution

Usage Analytics

Quality Insights

Actionable Data

How It Works

What Gets Synced

Supported Vector Databases

pgvector

Coming Soon

Privacy and Security

When to Connect Data Sources

Quick Example

What You Can Do With Connected Data

Track Document Usage

Debug RAG Pipelines

Measure Knowledge Base Quality

Improve with User Feedback

Next Steps

Connect pgvector

Syncing Data

View Retrieved Documents

Track Document Usage

Common Questions

Get Started

How To

Debug & Improve AI Performance

Understand Data Impact

Collaborate with Your Team

Deploy & Manage

Integrations

Learn

Resources

​Why Connect Data Sources?

Document Attribution

Usage Analytics

Quality Insights

Actionable Data

​How It Works

​What Gets Synced

​Supported Vector Databases

pgvector

Coming Soon

​Privacy and Security

​When to Connect Data Sources

​Quick Example

​What You Can Do With Connected Data

Track Document Usage

Debug RAG Pipelines

Measure Knowledge Base Quality

Improve with User Feedback

​Next Steps

Connect pgvector

Syncing Data

View Retrieved Documents

Track Document Usage

​Common Questions

Why Connect Data Sources?

How It Works

What Gets Synced

Supported Vector Databases

Privacy and Security

When to Connect Data Sources

Quick Example

What You Can Do With Connected Data

Next Steps

Common Questions