## The tricks, and where they came from
The standard "prompt engineering" advice people repeat has a specific origin story. Nearly all of the canonical tricks were popularised by papers published in 2022 and 2023, measured on the models that existed then. The two most influential:
Chain-of-thought / "Let's think step by step" comes from Kojima et al., Large Language Models are Zero-Shot Reasoners, NeurIPS 2022 (arXiv:2205.11916). The paper measured accuracy jumps on text-davinci-002: MultiArith 17.7% → 78.7%, GSM8K 10.4% → 40.7%. Those are the numbers that launched a thousand "think step by step" additions to prompts.
EmotionPrompt / "this is very important to my career" comes from Li et al., Large Language Models Understand and Can Be Enhanced by Emotional Stimuli, 2023 (arXiv:2307.11760). The paper reported an 8.00% relative performance improvement on Instruction Induction and 115% on BIG-Bench across Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4.
These are both real papers with real results on real models. The problem is that everything since has eroded how well those specific tricks map to the current generation of frontier models, and the official prompting guidance from the model labs themselves has moved on.
## What the evidence on modern models actually says
There are three honest things to say about the classic tricks on 2026 frontier models like Claude Opus 4.7 (released 16 April 2026), Claude Sonnet 4.6, Claude Haiku 4.5, and GPT-5.
First, there is no public replication of the original CoT or EmotionPrompt numbers on current frontier models. The headline percentages you see quoted are from the old model generation and should not be assumed to transfer. They might, they might not. The honest position is uncertainty.
Second, for reasoning-enabled models, "think step by step" is structurally redundant. Models like Claude with Extended Thinking, GPT-5 in reasoning mode, and OpenAI's o-series already produce internal chain-of-thought before responding. Adding the instruction to the prompt does not invoke a new behaviour; it duplicates one the model is already doing. The official Anthropic guidance reflects this: the Claude 4 best practices documentation emphasises being explicit about what you want, giving examples, and structuring your input, not wrapping the prompt in a reasoning-induction ritual.
Third, role prompting's effect on current models is mixed, not clearly positive. A 2025 RecSys evaluation (Revisiting Prompt Engineering, ACM 2025) found role-play prompting "did not always improve... and actually reduced performance in some cases" on recommendation tasks. An earlier arXiv paper (2509.23501) tested role prompting across GPT-3.5, GPT-4o, and Llama 2 and reported results were "task-sensitive", not consistently positive. The older intuition that "you are an expert X" reliably unlocks expert capability is not supported by current systematic evaluation.
None of this is a claim that the old techniques are dead. It is a claim that they are less load-bearing than their reputation suggests, and that the frontier-model guidance has shifted elsewhere.
## What the official guides actually recommend now
Anthropic's current prompt engineering overview and Claude 4 best practices focus on:
- Being explicit and direct about the desired output.
- Providing context and motivation for the task.
- Using examples for the desired format.
- Structuring prompts with XML tags or clear sections.
- Letting Claude think when useful, either by asking or by using extended thinking mode.
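The structuring advice in that list is mechanically simple: delimit the distinct parts of the input so the model can tell instructions from data. A minimal sketch in Python; the tag names here are illustrative, not a required schema:

```python
# Build a prompt whose sections are delimited with XML-style tags, so
# instructions, reference material, and the question stay distinguishable.

def build_prompt(instructions: str, document: str, question: str) -> str:
    return (
        f"<instructions>\n{instructions}\n</instructions>\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"<question>\n{question}\n</question>"
    )

prompt = build_prompt(
    instructions="Answer using only the document. Quote the sentence you used.",
    document="The contract renews on 1 March unless cancelled in writing.",
    question="When does the contract renew?",
)
print(prompt)
```

The point is not the angle brackets themselves but the unambiguous boundaries: the model never has to guess where the reference material ends and the task begins.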
OpenAI's Prompt Engineering Guide, the GPT-4.1 prompting cookbook, and the GPT-5 prompt guidance hit similar themes: clarity, structure, examples, explicit success criteria. GPT-4.1 and GPT-5 are described as following instructions more literally than older models, which means clarity of instructions matters more, not less.
Read across both sets of guides, the picture is consistent. The high-leverage moves in 2026 are not ritual phrases. They are:
- Stating the goal precisely in one or two sentences.
- Providing the right context — including reference material, examples of the desired output style, and explicit success criteria.
- Structuring long prompts so the model can find the parts that matter.
- Using the right model for the task, including whether to turn on reasoning mode.
I think of this as prompt design rather than prompt engineering. The skill has shifted from "know the clever phrases" to "know how to specify what you want and provide what the model needs to deliver it". That skill has always been the load-bearing one. The tricks were a shortcut that worked when models were less capable at inferring intent.
## The job title collapse
If the techniques have shifted, the labour market has moved with them. Microsoft's Work Trend Index 2025, surveying 31,000 people across 31 countries, found "prompt engineer" ranked near the bottom of planned new roles. Fortune reported on 7 May 2025 that prompt-engineering job postings "peaked at roughly 0.3% of AI-related listings in 2024 and declined sharply", and TechRepublic covered the same trend.
The job collapse is telling because hiring is a revealed preference. When companies were actually unsure how to work with LLMs, "prompt engineer" was a meaningful specialty. As the model labs published better guides and the base models became easier to work with, the specialty got absorbed into ordinary AI-assisted knowledge work. That is not a tragedy — it is a normal pattern for any category where the hard part gets commoditised.
The specialty work that is still real has moved up the stack. Building prompt pipelines where the output of one call feeds another, designing retrieval layers where context is built programmatically, setting up evaluations to measure whether an AI feature is actually working — these are substantive engineering tasks. They are also not what "prompt engineer" meant in 2023.
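A "prompt pipeline" in the sense above is just model calls composed in ordinary code, where one call's output becomes the next call's context. A toy sketch; `call_model` is a stub standing in for a real API client, and the two-step task is invented for illustration:

```python
from typing import Callable

def call_model(prompt: str) -> str:
    # Stub: echoes a truncated prompt. Swap in a real SDK call in practice.
    return f"[model output for: {prompt[:40]}...]"

def summarise_then_translate(text: str,
                             call: Callable[[str], str] = call_model) -> str:
    # Step 1: the first call produces a summary...
    summary = call(f"Summarise in two sentences:\n{text}")
    # Step 2: ...which becomes the input of the second call.
    return call(f"Translate into French:\n{summary}")

result = summarise_then_translate("A long report about quarterly revenue.")
print(result)
```

Taking `call` as a parameter is the design choice that matters: it makes each stage testable with a stub, which is exactly the evaluation plumbing the paragraph above describes.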
## What to practise instead
If you want to get better at working with current-generation models, the useful habits are:
Write clear specifications. Most bad output comes from fuzzy requests. Practise stating what you want in five specific sentences.
Build a personal library of working contexts. Save prompts and context patterns that worked, with notes on why. Your writing voice, your audience profiles, your code style. Reuse them.
Read the official guides, not the blogs. Anthropic's and OpenAI's published guidance is free, maintained, and more current than the average 2023-era prompt-engineering blog post.
Test model-to-model. When output is bad, try a different model with the same prompt. Sometimes the problem is the prompt. Sometimes the problem is the model. Knowing which is a skill.
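That habit is easy to mechanise: run the identical prompt against several models and compare the outputs side by side. A sketch with the client call stubbed out; the model names are hypothetical placeholders, not real model ids:

```python
# Run one prompt against several models and collect outputs for comparison.

def ask(model: str, prompt: str) -> str:
    # Stub standing in for a real API call; replace with an SDK client.
    return f"({model}) answer"

def compare_models(prompt: str, models: list[str]) -> dict[str, str]:
    return {m: ask(m, prompt) for m in models}

outputs = compare_models(
    "List three risks of hard-coding credentials.",
    ["model-a", "model-b"],  # hypothetical model names
)
for model, text in outputs.items():
    print(f"--- {model} ---\n{text}\n")
```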
Use reasoning mode deliberately. For tasks with verifiable correct answers or where the model tends to make systematic errors, turn on extended thinking. For creative tasks and latency-sensitive work, leave it off. Choosing is the job.
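With the Anthropic API, that choice is a request parameter rather than a prompt phrase. The sketch below only assembles the request dict (sending it requires an API key and client); the model id and token budget are illustrative, so check the current docs before relying on them:

```python
# Assemble a Messages API request with extended thinking toggled on or off.
# Model id and budget are example values; verify against current Anthropic docs.

def build_request(prompt: str, think: bool) -> dict:
    request = {
        "model": "claude-sonnet-4-5",  # example model id
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    if think:
        # Reasoning is enabled here, not by adding "think step by step"
        # to the prompt text.
        request["thinking"] = {"type": "enabled", "budget_tokens": 1024}
    return request

req = build_request("Is 2^31 - 1 prime? Show your check.", think=True)
```

Making `think` an explicit argument keeps the decision visible at every call site, which is the "choosing is the job" point in code form.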
## Further reading
- Debugging Claude Prompts for what to do when output is wrong.
- Reasoning Models: When Are They Worth It? for when extended thinking pays off.
- Vibe Coding Is a Lie for the related data on AI coding tool productivity.
Prompt engineering was a temporary discipline that filled a gap between what models could do and what users could ask. The gap has closed enough that the discipline has lost its specificity. What remains is a broader skill called prompt design, and the evidence is that it maps onto the same habits of clear thinking and careful specification that have always separated good technical communication from bad.
## Sources
- Kojima et al., Large Language Models are Zero-Shot Reasoners, arXiv:2205.11916: https://arxiv.org/abs/2205.11916
- Li et al., EmotionPrompt, arXiv:2307.11760: https://arxiv.org/abs/2307.11760
- Revisiting Prompt Engineering (RecSys 2025, ACM): https://dl.acm.org/doi/10.1145/3705328.3748159
- Role prompting evaluation (arXiv:2509.23501): https://arxiv.org/html/2509.23501v1
- Anthropic Prompt Engineering Overview: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
- Anthropic Claude 4 Best Practices: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices
- OpenAI Prompt Engineering Guide: https://platform.openai.com/docs/guides/prompt-engineering
- OpenAI GPT-4.1 Prompting Guide: https://cookbook.openai.com/examples/gpt4-1_prompting_guide
- OpenAI GPT-5 Prompt Guidance: https://developers.openai.com/api/docs/guides/prompt-guidance
- Fortune on prompt-engineering jobs, May 2025: https://fortune.com/2025/05/07/prompt-engineering-200k-six-figure-role-now-obsolete-thanks-to-ai/
- TechRepublic on prompt-engineering jobs: https://www.techrepublic.com/article/news-prompt-engineering-ai-jobs-obsolete/
- Claude Opus 4.7 launch: https://llm-stats.com/blog/research/claude-opus-4-7-launch
