What is Retrieval-Augmented Generation (RAG)?
A technique that supplements an LLM with relevant documents fetched at query time, grounding its answers in up-to-date, verifiable sources.
Full Definition
Retrieval-Augmented Generation (RAG) is an architecture that improves LLM accuracy and reduces hallucinations by retrieving relevant documents from an external knowledge base and injecting them into the model's context window before generating a response. The pipeline typically involves: (1) embedding the user's query into a vector representation, (2) performing a semantic similarity search over a vector database of pre-indexed documents, (3) prepending the top-k retrieved chunks to the prompt, and (4) generating a grounded answer with citations. RAG enables models to access knowledge beyond their training cutoff, cite specific sources, and work with proprietary data without retraining. It is the foundational pattern behind enterprise AI assistants, AI search engines like Perplexity, and document Q&A systems.
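The four-step pipeline above can be sketched in miniature. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the final prompt would be sent to an LLM rather than printed. All function names here are hypothetical, not from any particular library.

```python
# Minimal RAG pipeline sketch. A real system would call an embedding
# model, query a vector database, and pass the prompt to an LLM.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Steps 1-2: embed the query, rank documents by semantic similarity.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    # Step 3: prepend the top-k chunks, numbered so the model can cite them.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"

docs = [
    "RAG retrieves documents and injects them into the prompt.",
    "Vector databases store embeddings for similarity search.",
    "LLMs are trained up to a fixed knowledge cutoff.",
]
query = "How does RAG ground answers?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)  # Step 4 would send this grounded prompt to the LLM.
```

In production, the documents are chunked and embedded ahead of time so that only the query needs embedding at request time, which is what makes the similarity search fast.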
Tools that use Retrieval-Augmented Generation (RAG)
Perplexity
AI-powered search engine with real-time citations and source transparency
Notion
All-in-one workspace with AI-powered writing, projects, and knowledge management
ChatGPT
The most widely used AI assistant with 900M+ weekly users
Claude
Best-in-class reasoning with 1M token context window