Developers who use AI think they're quicker and better, but they're actually slower and worse. Not surprised to hear it's similar in other fields. AI in general, just like drugs, makes people feel good but without any actual substance behind the feeling.
(If feeling without substance is all you need, then it's okay to use AI. AI Dungeon, for example, was pretty cool. Or slide backgrounds that would have otherwise been solid colours because they're worth $0 and you wouldn't have paid a designer.)
This story is not about developers, but setting that aside: the reason it's not damning is that the results can't be generalized. It's mostly self-reported "minutes per issue" anecdata from 16 experienced OSS maintainers working on their own, mature repos, most of whom were new to Cursor and did not have a chance to develop or adopt frameworks for AI-assisted development.
That has not been my experience at all. Whenever I tried asking the AI to do something, it took an inordinate amount of time and thought to go through its changes.
The mistakes were subtle but critical. Like copying a mutex by value. Now, if I were writing the code myself, I would not have made that mistake. But when reviewing, it almost slipped by.
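For anyone who hasn't been bitten by that one, here's roughly what it looks like in Go, where it's a classic trap (the comment doesn't say which language, so treat this as an illustration rather than the actual code):

    package counter

    import "sync"

    type Counter struct {
        mu sync.Mutex
        n  int
    }

    // Buggy: the value receiver copies the whole struct, mutex included,
    // so each call locks its own private copy. There is no mutual
    // exclusion, and the increment is lost when the copy goes away.
    func (c Counter) IncBroken() {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.n++
    }

    // Correct: a pointer receiver locks and mutates the original.
    func (c *Counter) Inc() {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.n++
    }

go vet's copylocks check will flag the value receiver, but in a human review it's exactly the kind of detail that slides past.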
And that's where the issue is: you have to review the code as if it's written by a clueless junior dev. So, building up the mental model when reviewing, going through all the cases and thinking of the mistakes that could possibly have happened... sorry, no help.
Maybe 10% of it is typing it out, but when I think about it, it's actually taking more time, because I have to build the mental model in my mind, then build a second mental model out of the thing the AI typed, and then compare the two. That latter process is much more time-consuming than building the model in my mind and typing it out myself.
I think that programming languages (well, at least some of them, maybe not all) have succeeded in being good ways of expressing programmatic thought. If I know what I want in (say) C, it can be faster for me to write C code than to write English that describes C code.
I guess it depends on what you use it for. I found it quite relaxing to get AI to write a bunch of unit tests for existing code. Simple and easy to review, and not fun to write myself.
> Developers who use AI think they're quicker and better, but they're actually slower and worse.
You responded that this is a "gross overgeneralization of the content of the actual study", but the study appears to back up the original statement. To quote the summary:
> When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.
(I realise newer models have been released since the study, but you didn't claim that the findings have been superseded.)
Sure! The study focused on experienced devs working in complex codebases they already knew thoroughly. This was and is the worst case for using AI tooling from a cold start, _particularly_ AI tooling as it existed at the time.
There were also only 16 developers involved in the study.
Time has passed since the study, and we've had an entirely new class of tool introduced (the agentic CLI a la Claude Code) as well as two subsequent generations of model improvement (Sonnet 3.7 to Sonnet 4 to Sonnet 4.5). Given that the results of the METR study were stated as an eternal, unqualified truth, it's worth noting that tooling and models are much better now than when the study was conducted.
Which would be non-news if the developers also thought it wasn't going to be helpful because they already knew their codebases thoroughly. Or at least if they did the task, and then reported that AI made it harder. But in reality, they expected it to be faster, and then after doing it slower, said they'd done it faster. That's weird.
I appreciate the clarification. From my perspective, the most striking observation was the gap between perception and reality. Whether the recent model advances have widened or narrowed that gap is unclear.
> You responded that this is a "gross overgeneralization of the content of the actual study", but the study appears to back up the original statement.
It doesn't, and the study authors themselves are pretty clear about the limitations. The irony is that current foundation models are pretty good at helping to identify why this study doesn't offer useful general insights into the productivity benefits (or lack of) of AI-assisted development.
This first chart should be absolutely damning: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...