RAG (Retrieval-Augmented Generation)
An LLM application pattern in which the model looks things up before answering, rather than relying solely on its training data.
Retrieval-Augmented Generation (RAG) is an LLM application pattern where the model retrieves relevant information from a knowledge base or document store before generating a response. RAG grounds LLM outputs in specific, current, and often private data, reducing hallucination and enabling answers about domain-specific information.
Context
AI search products (Google AI Overviews, ChatGPT Search, Perplexity) are all RAG systems at their core. They retrieve candidate passages from a search index, then synthesize an answer citing those passages.
Enterprise applications of RAG include customer support (retrieving from the company's knowledge base), sales enablement (retrieving from product docs), and research tools (retrieving from a curated document set). The quality of a RAG system depends primarily on retrieval quality, not on the LLM's reasoning: if the retriever surfaces the wrong passages, even a strong model will produce a wrong or ungrounded answer.
A SaaS company's AI support agent runs RAG: when a customer asks a question, the system retrieves the 5 most relevant help-center articles, passes them to an LLM along with the question, and generates a grounded answer citing the source articles. Hallucination drops substantially versus asking the LLM to answer from its pretraining alone.
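The support-agent pipeline above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the three-article knowledge base, the keyword-overlap scorer (a stand-in for embedding or BM25 retrieval), and the prompt wording are all assumptions, and the final LLM call is left as a placeholder since no specific model API is given.

```python
from collections import Counter

# Hypothetical mini knowledge base; a real system would index thousands
# of help-center articles and retrieve the top 5, not the top 2.
ARTICLES = {
    "reset-password": "To reset your password, open Settings, then Security, and click Reset.",
    "billing-cycle": "Invoices are issued on the first of each month; billing is monthly.",
    "export-data": "You can export your data as CSV from the Reports page.",
}

def tokenize(text):
    # Crude tokenizer: lowercase and strip trailing punctuation.
    return [t.strip(".,;:?!").lower() for t in text.split()]

def retrieve(question, k=2):
    """Rank articles by token overlap with the question (a simplistic
    stand-in for vector or BM25 search) and return the top-k matches."""
    q = Counter(tokenize(question))
    scored = []
    for doc_id, text in ARTICLES.items():
        overlap = sum((q & Counter(tokenize(text))).values())
        scored.append((overlap, doc_id, text))
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:k]]

def build_prompt(question, passages):
    """Assemble the grounded prompt: retrieved passages first, then the
    question, with an instruction to answer only from those passages."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using ONLY the passages below, citing their IDs.\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

question = "How do I reset my password?"
passages = retrieve(question)
prompt = build_prompt(question, passages)
# `prompt` would then be sent to an LLM; a grounded answer
# should cite [reset-password].
```

The key design point is that the LLM never sees the whole knowledge base, only the few passages the retriever selected, which is why retrieval quality dominates the end-to-end quality of the answer.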
RAG reduces hallucination but doesn't eliminate it. LLMs can still misinterpret retrieved content, combine facts incorrectly across retrieved passages, or generate plausible-sounding additions. RAG outputs destined for public use still need review.