AI Tool Radar
Deep Dives

Vibe Coding Is a Lie. Here Is What Senior Engineers Actually Use Claude Code For

The 'vibe coding' pitch makes it look like you describe a feeling and an app appears. Karpathy's original framing was half-joking. The productivity data that has emerged since is more sobering than either the boosters or the skeptics predicted.

7 min read · 2026-04-19 · By Roland Hentschel
Tags: claude code, vibe coding, ai coding, engineering, workflow

Where the term actually came from

"Vibe coding" was coined by Andrej Karpathy in a tweet on February 2, 2025. The full post is worth reading because it is very different from how the term got used in the twelve months that followed:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper..."

Source: Andrej Karpathy on X, 2 Feb 2025.

Karpathy was playfully describing his personal experience on weekend projects. Within weeks, the term had been absorbed into a wave of tutorials, YouTube channels, and TikToks built around a stronger claim: that you no longer need to understand the code, that the AI does it all, that you describe what you feel the app should do and ship it.

That stronger claim is the one I want to dismantle here, because the evidence that has arrived since Karpathy's tweet runs against it in interesting ways.

The productivity data that surprised everyone

In July 2025, METR (the Model Evaluation and Threat Research non-profit) published the first well-designed randomized controlled trial of AI coding tools with experienced developers. The study had 16 experienced open-source developers complete 246 real tasks in mature projects they had worked on for an average of five years, using Cursor Pro with Claude 3.5/3.7 Sonnet.

The result: developers were 19% slower with AI tools than without. They had predicted they would be 24% faster going in, and estimated after the fact that they were 20% faster. Reality was the opposite. Paper: arXiv 2507.09089; study write-up: metr.org.

This finding is not the whole story. Earlier field experiments on GitHub Copilot (GitHub/MIT/Microsoft/Accenture) found junior developers sped up by roughly 35-39% and senior developers by 8-16%. Paper: MIT Copilot study.

Put together, the emerging picture is:

  • Junior developers benefit most from AI coding tools.
  • Senior developers benefit less in general-purpose tasks, and in mature codebases that they already know well, they can actually slow down.
  • Self-reported perception of productivity is consistently higher than measured productivity — developers feel faster even when they are not.

This is not what the vibe coding pitch predicted. The pitch was that AI coding tools would flatten the difference between junior and senior, or invert it. The data says the opposite: for the people who know what they are doing in a codebase they know, the tools are at best a modest help and at worst actively slow them down on certain tasks.

What senior engineers actually use these tools for

Given that data, the question becomes: what are the cases where Claude Code, Cursor, and similar tools pay off for experienced engineers?

I have been using Claude Code in my own work for the last year. Between that experience and conversations with other senior engineers, a pattern emerges that is less glamorous than the marketing but more honest.

The biggest wins are in the boring middle layer:

  • Scaffolding new modules or test harnesses from a spec.
  • Mechanical refactors across many files where the pattern is clear but tedious: renames, signature changes, null-handling additions.
  • Writing test cases from existing code.
  • Translating between similar frameworks or languages.
  • Exploring an unfamiliar codebase to answer specific questions.
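To make "mechanical" concrete: a rename is tedious because call sites are scattered across files, but trivial because each site is unambiguous, which is exactly the profile the model handles well. A minimal sketch of that enumerability, using Python's standard `ast` module on an illustrative snippet (the `fetch_user` name is hypothetical):

```python
import ast

# Illustrative source: the kind of file a rename pass would sweep.
SOURCE = """
from utils import fetch_user

def handler(uid):
    user = fetch_user(uid)
    return fetch_user(uid) if user is None else user
"""

def call_sites(source: str, name: str) -> list[int]:
    """Line numbers where `name` is called: the mechanical part of a rename."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    )

print(call_sites(SOURCE, "fetch_user"))
```

Every hit is checkable by eye in seconds, which is why delegating this class of edit is low-risk even before the tests run.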

These are the tasks where my own experience lines up with the direction of the GitHub field studies: real speedup, and limited risk, because wrong output is easy to spot and cheap to discard. They are also the tasks that previously ate a disproportionate share of an engineer's week.

The smaller, harder wins are in genuinely novel work, where the model functions less as a coder and more as an interactive rubber duck: a thing to argue with while you figure out the architecture. That is real value but hard to measure.

The cases where the tools underperform or slow you down:

  • Deep debugging of a subtle bug in a system you already know well. The model generates plausible-looking wrong fixes faster than it generates right ones.
  • Work that requires holding a lot of implicit context about the codebase in your head. The model does not share your mental model, and explaining enough of it for the model to be useful can cost more time than just doing the work.
  • High-stakes touches to auth, billing, data-layer code, where every line has to be right and reviewed.

This is the shape of the METR finding, translated into lived experience. The model is a leverage tool that amplifies your input. If your input is a junior-sized understanding, it amplifies that into a meaningful productivity gain. If your input is a senior-sized understanding of a specific codebase, the amplification works less well because the bottleneck was never typing.

Habits that actually matter

A few observations about how I and other senior engineers I have talked to use these tools effectively, distinct from the vibe-coding framing.

Read the code. I read every line the model writes that goes into my repo. Not out of distrust — after a year of Claude Code usage I know roughly where it is reliable. I read because I need to know what the system does, for the same reason I review a colleague's pull request. Not reading means losing my mental model of the system.

Scope is still a skill. The biggest differentiator between a senior and a junior using these tools is scope management. Asking the model to do too much at once produces output that is harder to review than to have written yourself. Senior users tend to break work into chunks the model handles well.

More tests, not fewer. Cheap test generation is one of the most valuable capabilities these tools have. Asking Claude Code to write tests is cheap, and the tests are how I know the generated code actually does what I intended. This is one of the places where the productivity gain is real and measurable for me.
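A concrete sketch of that loop, with a hypothetical model-generated helper standing in: I ask the model for tests, then read them, because the assertions are where my intent either matches the generated code or visibly does not.

```python
import re

# Hypothetical model-generated helper under review.
def slugify(title: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Tests I would ask the model to draft, then read and adjust:
# they encode intent, not just the current behavior of the code.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  --spaces--  ") == "spaces"
    assert slugify("") == ""           # edge case: empty input
    assert slugify("Émile") == "mile"  # known limitation: non-ASCII dropped

test_slugify()
```

The last assertion is the interesting one: reading it forces a decision about whether dropping non-ASCII characters is acceptable, a question the generated code alone would never have surfaced.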

Ownership does not move. The model writes the code. I ship the code. Whatever goes into the repo is my responsibility. The "AI did it" excuse does not exist in any codebase I would want to work in.

What this means for the hype curve

Cursor closed a Series D at $29.3 billion valuation in November 2025 (CNBC coverage), with reports in April 2026 of further funding talks at higher valuations. Anthropic launched Claude Code in preview in February 2025 and generally available in May 2025 alongside Claude 4, with a web version following in October 2025.

These are not small products. They are real businesses with real users producing real work. What they are not is a replacement for engineering judgment. The METR result is the first rigorous measurement of what happens when you drop AI coding tools into the workflow of experienced engineers on mature codebases, and the answer is that the impact is mixed and context-dependent. That is a useful result, and it argues against both extremes of the debate.

The vibe-coding framing, in its stronger form, was always wrong about senior work. The data is now catching up to what practitioners noticed. The tools are useful, sometimes dramatically so, for specific kinds of work. They are not a universal solvent for software engineering, and they are particularly not a shortcut around the understanding that makes engineering work in the first place.

Roland Hentschel

AI & Web Technology Expert

Web developer and AI enthusiast helping businesses navigate the rapidly evolving landscape of AI tools. Testing and comparing tools so you don't have to.
