## The question that keeps coming up
Every few weeks, someone building an AI-powered product asks me the same question in different words: "Should I use RAG, fine-tune a model, or just write better prompts?"
The question is phrased as if these are three options on the same menu. They are not. They solve different problems, have different cost profiles, and usually belong together rather than as alternatives.
The confusion mostly comes from blog posts that compare them head to head as if they were competitors. After building production AI features for a handful of clients over the last year, here is the decision framework I actually use.
## What each one actually does
Before you can pick, you need a clear picture of what each technique adds.
Prompting is how you talk to the model. The model's weights do not change. What changes is the context window: the instructions, examples, and reference material you provide each time you ask a question. A better prompt gets a better answer from the same model.
RAG (Retrieval-Augmented Generation) is a way to dynamically inject relevant information into the prompt at query time. You store your knowledge in a searchable form (a vector database, a keyword index, or both), and at runtime you retrieve the pieces relevant to the user's question and add them to the prompt. The model is unchanged. The prompt is built on the fly from your knowledge base.
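The "prompt built on the fly" idea fits in a few lines. This is a toy sketch, not a production retriever: it scores relevance by word overlap where a real system would use embeddings, and the documents and scoring function are mine, not from any particular library.

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: word overlap. Real systems use embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to this query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def rag_prompt(query: str, docs: list[str]) -> str:
    """Build the prompt at query time from retrieved knowledge."""
    context = "\n".join(retrieve(query, docs))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Invoices are emailed on the first of each month.",
]
prompt = rag_prompt("How long do refunds take?", docs)
```

Note that the model never changes here; only the prompt does, which is the whole point.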
Fine-tuning changes the model itself. You take a base model and train it further on examples that match the behavior you want. The resulting model has different weights. It will respond in the style of your training data even without you repeating the instructions in every prompt.
## The crucial distinction most people miss
Prompting and RAG add or modify information the model has access to. Fine-tuning changes how the model behaves by default.
That distinction drives almost every decision. If your problem is "the model does not know my specific facts", you need RAG or good prompting, not fine-tuning. If your problem is "the model does not respond in my required format or style reliably enough", fine-tuning helps, but so does a good prompt with examples.
When people say "fine-tune the model on our documentation", they usually mean "I want the model to know our documentation". Fine-tuning is the wrong answer for that. The model will not reliably recall specific facts from its training data; it will generalize them in ways you cannot predict. RAG is the right answer.
## When to choose each
### Start with prompting
If you are asking whether to use RAG or fine-tuning, the honest first answer is usually: "Have you actually tried a well-engineered prompt first?"
Most people have not. They wrote one prompt, got okay results, and jumped to the conclusion that they needed infrastructure. A careful prompt with:
- A clear role
- 2-3 examples of ideal output
- Explicit constraints
- A specific output format
...will outperform a lot of early-stage RAG systems and most amateur fine-tuning attempts. It also costs nothing to iterate, which is a huge advantage.
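Those four components are mechanical enough to template. A minimal sketch, assuming a hypothetical `build_prompt` helper; the role, examples, and constraints here are invented for illustration.

```python
def build_prompt(role, examples, constraints, output_format, question):
    """Assemble a prompt from the four components listed above:
    role, examples of ideal output, constraints, and output format."""
    parts = [f"You are {role}.", "", "Examples of ideal output:"]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts += ["", "Constraints:"]
    parts += [f"- {c}" for c in constraints]
    parts += ["", f"Respond in this format: {output_format}",
              "", f"Question: {question}"]
    return "\n".join(parts)

prompt = build_prompt(
    role="a support agent for Acme's billing product",
    examples=[("Refund status for order 123?",
               "Status: pending. ETA: 3 days.")],
    constraints=["Answer in two sentences or fewer",
                 "Never invent order details"],
    output_format="Status: <status>. ETA: <days> days.",
    question="When will my refund arrive?",
)
```

Because this is just string assembly, iterating costs nothing: change a constraint, rerun, compare.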
For a deeper walkthrough of what a well-engineered prompt looks like, our Claude guide goes into specifics. But the rule is: master prompting before you build anything on top of it.
### Add RAG when the model needs specific knowledge it cannot hold
Use RAG when:
- You have a body of knowledge that changes over time (customer support docs, product catalogs, policy documents)
- The model needs to cite specific sources
- You cannot fit all the relevant information into the context window
- Answers need to be grounded in verifiable facts rather than generated
RAG is the right tool for every "chat with your documents" style product. It is also the right tool for customer support bots, product search, internal knowledge tools, and anything where the answer depends on information the user or the business owns.
The setup cost is real. You need a document ingestion pipeline, an embedding model, a vector store (or a hybrid keyword+vector search), and logic to build prompts from retrieved chunks. The maintenance cost is also real: your retrieval quality drifts as the source data changes, and you need to monitor what is actually being retrieved versus what should be.
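The ingestion side of that pipeline usually starts with chunking. A minimal sketch, with sizes I picked arbitrarily: overlapping the chunks keeps sentences that straddle a boundary retrievable from at least one chunk.

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping fixed-size chunks for indexing.
    Each chunk would then be embedded and stored in the vector store."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
        if start + size >= len(text):
            break
    return chunks
```

Real pipelines split on sentence or section boundaries rather than raw character counts, but the shape of the problem is the same.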
But RAG avoids the biggest pitfall of fine-tuning: the model cannot tell you its knowledge is out of date. A RAG system with fresh documents always has current information. A fine-tuned model is a snapshot of your data at training time.
### Add fine-tuning when the behavior needs to change, not the knowledge
Use fine-tuning when:
- You need the model to respond in a very specific format, tone, or style that is hard to describe in a prompt
- The task is repetitive enough that baking behavior into the model is cheaper than sending examples every time
- You need lower latency and cost per query (fine-tuned models often allow shorter prompts)
- You are using the same model at high volume for a narrow task
Classic fine-tuning use cases: classification models, structured output extractors, code transformation tools, and domain-specific assistants where the tone matters (legal, medical, customer service).
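For those structured-output cases, the expensive part is curating example conversations that demonstrate the behavior. A sketch of what that dataset looks like, assuming the chat-style JSONL format used by OpenAI's fine-tuning API; the classification task and labels are invented for illustration.

```python
import json

# Each example bakes the desired format into the model, so the
# instructions no longer need to ship with every request.
examples = [
    {"messages": [
        {"role": "system", "content": "Reply as terse JSON."},
        {"role": "user", "content": "Classify: 'Card was charged twice.'"},
        {"role": "assistant",
         "content": '{"label": "billing", "urgency": "high"}'},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Notice that the assistant turns teach format and tone, not facts; a model trained on hundreds of these learns to answer tersely in JSON, not to know your billing policy.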
The thing fine-tuning is bad at: teaching the model new facts. If you fine-tune a model on 1,000 customer support tickets hoping it will know your product, you will get a model that sounds like it knows your product but will hallucinate specifics. RAG solves that. Fine-tuning makes it worse.
## When to stack them
The best production AI systems usually use all three. A typical pattern:
- A fine-tuned model that knows how to respond in the right format and style
- RAG to inject the specific facts relevant to this query
- A prompt that combines both, along with the user's actual question
The fine-tuning handles style. The RAG handles knowledge. The prompt handles task framing. Each does what it is good at; nothing has to do everything.
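The stacked pattern can be sketched as a single request builder. The model ID, system instruction, and retrieved facts below are all hypothetical; the point is only how the three layers meet in one payload.

```python
def stacked_request(question: str, retrieved_chunks: list[str]) -> dict:
    """Combine the three layers: a fine-tuned model (style),
    retrieved facts (knowledge), and a prompt (task framing)."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return {
        # Hypothetical ID of a model fine-tuned on your tone and format.
        "model": "ft:gpt-4o-mini:acme:support:abc123",
        "messages": [
            {"role": "system",
             "content": "Answer using only the facts below."},
            {"role": "user",
             "content": f"Facts:\n{context}\n\nQuestion: {question}"},
        ],
    }

req = stacked_request(
    "How long do refunds take?",
    ["Refunds are processed within 5 business days."],
)
```

Swap the facts per query, keep the model and framing stable: that separation is what makes the stack cheap to iterate on.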
## The cost reality
A few numbers to orient the decision.
Prompting costs nothing to iterate on. Every token you trim from your prompt is a saving on every future query, and you can A/B test prompts in an afternoon.
RAG costs maybe $500-5,000 in engineering time to set up a first version, plus ongoing infrastructure (vector DB, embedding costs, retrieval tuning). The marginal cost per query is low.
Fine-tuning costs a data curation step (often the most expensive part, because you need good examples), plus the training run itself. For an OpenAI fine-tune on a mid-tier model, expect $500-2,000 for a decent-sized dataset. Self-hosted fine-tuning on open models is cheaper per run but has much higher setup costs.
The ratio is roughly: prompting is free, RAG is a four-digit investment, and fine-tuning is a four-to-five-digit investment once you count data curation and ongoing maintenance. Budget accordingly, and do not jump to the expensive option without proof the cheap option failed.
## The mistake I see most often
The most common mistake is fine-tuning as the first move. A startup has a product idea, decides AI is involved, hears that fine-tuning is "the proper way" to customize a model, and spends weeks and thousands of dollars producing a fine-tuned model before anyone has seriously tried a well-engineered prompt.
The fine-tuned model is usually worse than a careful prompt. It is definitely worse than a careful prompt plus RAG. And it locks in behavior that is hard to iterate on, because every change now requires a new training run.
The discipline that separates teams that ship good AI products from teams that struggle is the willingness to exhaust prompting before adding infrastructure, and to exhaust RAG before fine-tuning. The fancier tool is not always the better tool.
## The decision tree
My simplified version:
- Can you get acceptable results with a well-engineered prompt including examples? Ship it.
- Is the failure mode "the model does not know my specific facts"? Add RAG.
- Is the failure mode "the model cannot consistently produce the right format or style"? Try prompt improvements with few-shot examples first. If still failing at scale, fine-tune.
- Both failure modes together? Do both.
Never start at step 3.
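The tree above is simple enough to write down as a function, which makes the "never start at step 3" rule explicit: fine-tuning only ever appears on top of a prompt you have already pushed as far as it goes.

```python
def choose_approach(prompt_works: bool,
                    missing_facts: bool,
                    wrong_style: bool) -> list[str]:
    """The decision tree above, as code. Start cheap; add only the
    layers your observed failure modes justify."""
    if prompt_works:
        return ["prompting"]
    stack = ["prompting"]  # always the base layer, never skipped
    if missing_facts:
        stack.append("RAG")
    if wrong_style:
        stack.append("fine-tuning")
    return stack

choose_approach(prompt_works=False, missing_facts=True, wrong_style=False)
# → ["prompting", "RAG"]
```

The inputs are judgment calls, of course; the function just enforces the ordering.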
For practical tool comparisons on the underlying models you might use, our ChatGPT vs Claude comparison covers which models are better suited for which kinds of customization. But model choice is a much smaller decision than getting the architecture right.
Prompting, RAG, and fine-tuning are not three flavors of the same thing. They are three different kinds of surgery. Most projects need one of them. Some need two. Very few need all three. Start with the cheapest, add more only when proven necessary.
