What is Retrieval-Augmented Generation (RAG)?
A technique that supplements an LLM with relevant documents fetched at query time, grounding its answers in up-to-date, verifiable sources.
Full Definition
Retrieval-Augmented Generation (RAG) is an architecture that improves LLM accuracy and reduces hallucinations by retrieving relevant documents from an external knowledge base and injecting them into the model's context window before generating a response. The pipeline typically involves: (1) embedding the user's query into a vector representation, (2) performing a semantic similarity search over a vector database of pre-indexed documents, (3) prepending the top-k retrieved chunks to the prompt, and (4) generating a grounded answer with citations. RAG enables models to access knowledge beyond their training cutoff, cite specific sources, and work with proprietary data without retraining. It is the foundational pattern behind enterprise AI assistants, AI search engines like Perplexity, and document Q&A systems.
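The four-step pipeline above can be sketched in miniature. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the final prompt would be sent to an LLM rather than printed. All function names here are hypothetical, not from any particular library.

```python
# Minimal RAG pipeline sketch. A real system would call an embedding
# model, query a vector database, and pass the prompt to an LLM.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Steps 1-2: embed the query, rank documents by semantic similarity.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Step 3: prepend the top-k chunks, numbered so the model can cite them.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"

docs = [
    "RAG retrieves documents and injects them into the prompt.",
    "Vector databases store embeddings for similarity search.",
    "LLMs are trained up to a fixed knowledge cutoff.",
]
query = "How does RAG ground answers?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)  # Step 4 would send this grounded prompt to the LLM.
```

In production, the documents are chunked and embedded ahead of time so that only the query needs embedding at request time, which is what makes the similarity search fast.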
Tools that use Retrieval-Augmented Generation (RAG)
Perplexity
AI-powered search engine with real-time citations and source transparency
Notion
All-in-one workspace with AI-powered writing, projects, and knowledge management
ChatGPT
The most widely used AI assistant with 900M+ weekly users
Claude
Best-in-class reasoning with 1M token context window