RAG — retrieval-augmented generation — has become the default answer to "how do I connect my AI agent to company data?" And it's a good default for many cases. But it's not the right answer for every case, and applying it universally leads to agents that are slower, less accurate, and harder to audit than they need to be.
This article lays out a decision framework for choosing between RAG-style vector retrieval and live data mounts, based on the specific requirements of your agent's task.
RAG pipelines pre-process your data into vector embeddings and store them in a vector database. Then, at query time, the pipeline embeds the query, finds semantically similar chunks, and stuffs those chunks into the LLM's context window.
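That flow can be sketched end to end. This is a toy illustration only: the bag-of-words `embed` function and cosine ranking stand in for a real embedding model and vector database, and the chunks are made up.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Index time": embed every chunk once and store the vectors.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Our API rate limit is 100 requests per minute.",
    "Employees accrue 20 days of paid leave per year.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    # "Query time": embed the query, rank stored chunks by similarity,
    # and return the top-k to stuff into the LLM's context window.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("when are refunds issued after a return"))
# → ['Refunds are issued within 14 days of a return request.']
```

Note what this does and doesn't give you: the top-ranked chunk is the most *similar* text, not a verified answer, which is the distinction the rest of this article turns on.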
The key insight is that RAG is a search problem. It finds the most semantically relevant chunks of your data given a query. It's good at answering questions like "what does our policy say about X?" or "find documentation about Y." It's not good at answering questions like "how many orders did customer #12345 place last month?" or "what is the current status of ticket #67890?"
RAG is fundamentally an approximation. It finds relevant content, not authoritative content. For many tasks, that approximation is sufficient; for others, it's an architectural mismatch. RAG is the right fit in the following situations.
Semantic search over unstructured content. If the agent needs to answer questions based on documents, knowledge base articles, PDFs, or other unstructured text, RAG is purpose-built for this. The semantic search capability is exactly what you need.
Data that changes infrequently. RAG requires maintaining an index. If your underlying data changes frequently, keeping the index fresh is an operational burden. If it changes rarely — documentation, policies, static reference data — RAG's batch indexing model works well.
High query volume, latency-tolerant. Vector search is fast once the index is built. For high-volume question-answering workloads where approximate answers are acceptable, RAG scales economically.
When the agent doesn't need the raw data. RAG returns excerpts and chunks. If the agent needs summaries or explanations based on document content, RAG is appropriate. If it needs to extract specific structured values or verify exact data, RAG is not.
Live mounts are the right fit when the task looks different. Structured, queryable data. If your agent needs to query a database, filter by specific criteria, aggregate values, or look up a specific record by ID, RAG cannot do this; you need direct database access. A vector index over a relational database is not just unnecessary; it discards the very query capabilities that make relational databases valuable.
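For contrast with the retrieval sketch above, here is what the structured path looks like. The `orders` schema and the data are hypothetical, and an in-memory SQLite database stands in for your live system, which a mount would reach with scoped, read-only credentials.

```python
import sqlite3

# Hypothetical orders table standing in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, placed_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 12345, 49.99, "2024-05-03"),
        (2, 12345, 15.00, "2024-05-21"),
        (3, 99999, 80.00, "2024-05-10"),
    ],
)

# "How many orders did customer #12345 place last month?" is one
# deterministic query -- no similarity search, no approximation.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(total) FROM orders "
    "WHERE customer_id = ? AND placed_at BETWEEN ? AND ?",
    (12345, "2024-05-01", "2024-05-31"),
).fetchone()
print(count, total)
```

The answer is exact, current as of the query, and trivially auditable: you can log the SQL and parameters and replay them later. None of that is true of a similarity search over an index.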
Data freshness requirements. If the agent is answering questions about the current state of something — the current status of an order, the current inventory level, a real-time price — RAG's indexed data will be stale. The staleness depends on your indexing cadence, but it's never zero. Live mounts return current data on every call.
Transactional operations. If the agent needs to write data — create a record, update a status, trigger a workflow — RAG provides no write capability. You need direct data access with write permissions.
Exact value retrieval. If the agent needs to return a specific value — "what is customer #12345's account balance?" — RAG's probabilistic retrieval is wrong for this task. The answer exists in exactly one place in your system; retrieval should be deterministic, not probabilistic.
The question is not "should I use RAG or live mounts?" but "does my agent's task require semantic search or structured queries?" The answer determines the architecture.
Most production agents need both. A customer support agent might use RAG to search the knowledge base for relevant documentation (semantic search over unstructured content) and live mounts to look up the specific customer's account history (structured query of current data).
This hybrid pattern is common and appropriate. The key is being intentional about which pattern serves which data access need, rather than defaulting to one approach for everything.
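The routing decision at the heart of the hybrid pattern can be sketched as follows. The `lookup_live` and `search_knowledge_base` functions are hypothetical stand-ins for the two data paths, and the keyword heuristic stands in for the LLM's own tool-selection step, which makes this decision in a real agent.

```python
import re

def answer(question: str) -> str:
    # Trivial routing heuristic, illustrative only: record IDs and
    # "current ..." phrasing signal a structured lookup; everything
    # else goes to semantic search over the knowledge base.
    if re.search(r"#\d+", question) or "current" in question.lower():
        return lookup_live(question)        # live mount: authoritative, fresh
    return search_knowledge_base(question)  # RAG: semantically relevant docs

# Hypothetical stand-ins for the two data access paths.
def lookup_live(question: str) -> str:
    return "live-mount query"

def search_knowledge_base(question: str) -> str:
    return "RAG retrieval"

print(answer("what is the current status of ticket #67890?"))  # live-mount query
print(answer("what does our policy say about refunds?"))       # RAG retrieval
```

The point is not the heuristic, which a real agent replaces with tool calling; it's that the two paths remain distinct tools, each scoped and secured on its own terms.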
In a hybrid architecture:
- RAG serves semantic search over unstructured content (documentation, knowledge base articles).
- Live mounts serve structured queries against current, authoritative data (account history, order status).
- The agent routes each data access to whichever path fits the request.
RAG has an interesting security property: the data that reaches the LLM is a small chunk retrieved by similarity search. An agent using RAG over a large corpus may never see most of the underlying content in any given interaction. This limits the blast radius of a compromised context window.
The downside is that RAG's retrieval process is opaque. The agent (and the user) can't verify which specific records were retrieved or confirm that the retrieved content is the authoritative current state. This is fine for question-answering but problematic for tasks that require accuracy guarantees.
Live mounts have the opposite security profile. They return authoritative current data, with deterministic query results. But they require stronger access controls: every query returns real, current data that may be sensitive, and the access pattern is more predictable (and therefore more predictably exploitable by prompt injection).
The security answer for live mounts is a proper mount architecture: scoped credentials, query-level audit logging, and anomaly detection. This is infrastructure that RAG architectures don't typically need to the same degree.
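A minimal sketch of what such a mount layer involves follows. The `AuditedMount` class, its table allowlist, and the hardcoded set of known table names are all illustrative assumptions, not a real API; a production mount would parse SQL properly, use per-agent database credentials, and ship audit records off-box.

```python
import datetime
import sqlite3

class AuditedMount:
    # Sketch of a mount wrapper: scoped table access plus a
    # query-level audit trail. Hypothetical, not a real API.
    def __init__(self, conn, allowed_tables):
        self.conn = conn
        self.allowed = set(allowed_tables)
        self.audit_log = []

    def query(self, agent_id, sql, params=()):
        # Crude scope check: reject writes, and reject any statement
        # naming a table outside this agent's scope. (A real mount
        # parses the SQL; this token check is illustrative only.)
        if not sql.lstrip().lower().startswith("select"):
            raise PermissionError("read-only mount")
        tokens = set(sql.lower().replace(",", " ").split())
        for table in tokens & {"customers", "orders", "payroll"}:
            if table not in self.allowed:
                raise PermissionError(f"table {table!r} out of scope")
        # Every query is recorded before it runs: who, what, when.
        self.audit_log.append({
            "agent": agent_id,
            "sql": sql,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })
        return self.conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (67890, 'shipped')")

mount = AuditedMount(conn, allowed_tables={"orders"})
rows = mount.query("support-agent-1", "SELECT status FROM orders WHERE id = ?", (67890,))
print(rows)                  # [('shipped',)]
print(len(mount.audit_log))  # 1
```

The audit log is what makes a prompt-injected query detectable after the fact: an agent that suddenly queries tables outside its normal pattern shows up in the trail.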
Use RAG when:
- the task is semantic search over unstructured content,
- the underlying data changes infrequently, and
- approximate, excerpt-based answers are acceptable.

Use live mounts when:
- the agent needs structured queries, exact values, current state, or write operations, and
- answers must be authoritative and auditable.

Use both when:
- the agent mixes open-ended questions over documents with lookups of specific, current records — which describes most production agents.
The infrastructure implications are different for each path. RAG requires maintaining a vector index and an embedding pipeline. Live mounts require a secure data access layer with proper credential management and audit logging. Both are solvable problems — the goal is to solve the right one for your actual requirements.
When you need live mounts, do them securely
Agent Mounts handles the secure data access layer for structured, real-time data sources — with scoped permissions and a complete audit trail.
Get early access