Selected developer quotes:
“I’m torn. I’d like to help provide updated data on this question but also I really like using AI!” — a developer from the original early-2025 study, when asked to participate in the late-2025 study.
“I found I am actually heavily biased sampling the issues … I avoid issues like AI can finish things in just 2 hours, but I have to spend 20 hours. I will feel so painful if the task is decided as AI-disallowed.” — a developer from the new study noting selection effects when choosing what tasks to include in the study.
“my head’s going to explode if I try to do too much the old fashioned way because it’s like trying to get across the city walking when all of a sudden I was more used to taking an Uber.” — a developer from the new study describing how quickly AI-assisted workflows became their baseline.



The gains, where they exist, are nowhere near that much. In some cases, it makes developers slower (even though they think they’re a bit faster):
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Have you actually read the study? People keep citing this study without reading it.
They grabbed like 8 devs who did not have pre-existing workflows set up for optimizing AI usage, and just threw them into it as a measure of “does it help”
Imagine if I grabbed 8 devs who had never used neovim before and threw them into it without any plugins installed or configuration and tried to use that as a metric for “is nvim good for productivity”
People need to stop quoting this fuckass study lol, it’s basically meaningless.
I’m a developer using agentic workflows, with over 17 years of experience.
I am telling you right now, with the right setup, I turn 20-hour jobs into 20-minute jobs on a weekly basis.
Predominantly large “bulk” operations that are mostly just necessary boilerplate code, where the AI has a huge existing codebase to draw from as samples and I just give it instructions like “see what already exists? implement more of that, following <spec>”
A great example is integration testing where like 99% of the code is just boilerplate.
Arrange the same setup every time. Arrange your request following an openapi spec file. Send the request. Assert on the response based on the openapi spec.
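The arrange/send/assert pattern described above can be sketched as a loop over a spec. This is a minimal toy sketch, not real OpenAPI tooling: the `SPEC` dict, `fake_client`, and all field names are illustrative assumptions standing in for a parsed spec file and a real HTTP client.

```python
# Toy sketch of the boilerplate pattern: one generated test per endpoint
# in a hypothetical, heavily simplified OpenAPI-style spec dict.
SPEC = {
    "/users":  {"method": "GET", "status": 200, "required_fields": ["id", "name"]},
    "/orders": {"method": "GET", "status": 200, "required_fields": ["id", "total"]},
}

def fake_client(path, method):
    # Stand-in for a real HTTP client; returns canned (status, body) pairs.
    canned = {
        "/users":  (200, {"id": 1, "name": "alice"}),
        "/orders": (200, {"id": 7, "total": 42.0}),
    }
    return canned[path]

def run_spec_tests(spec, client):
    """Arrange, send, assert -- the exact same shape for every endpoint."""
    results = {}
    for path, expected in spec.items():
        status, body = client(path, expected["method"])        # send
        ok = status == expected["status"] and all(             # assert
            field in body for field in expected["required_fields"]
        )
        results[path] = ok
    return results

print(run_spec_tests(SPEC, fake_client))
# {'/users': True, '/orders': True}
```

Because every test has the same shape, generating 120 of them from a spec is mostly mechanical, which is exactly why this kind of task suits an agent.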
I had an agent pump out 120 integration tests based on a spec file yesterday, and they were, for the most part, 100% correct. In like an hour.
The same volume of work would’ve easily taken me way longer.
What about developer burnout rates? Because those same studies also say there was significantly less dev burnout happening.
If anything my personal experience is the opposite. When using AI the way work wants me to, with multiple agents going in the background, I’ve completely lost any sort of “flow state” I normally get when focused on a problem. It’s no fun anymore, and the only thing keeping me going is working on my personal projects without AI in my free time… I didn’t get in to this to become an AI babysitter.
Yeah I get that. I just like avoiding having to do boring tasks is all, so that I can work on the core problem I’m trying to solve. I don’t want to deal with code refactoring manually, I’d rather babysit this thing to do that piece by piece. It’d probably take me longer, because I’d do something else on the side that I actually wanted to work on, but I’d be more content not having to manually do the tedious refactoring myself.
AI is not a catch-all for every problem; if I’m thinking something through, I use a very different set of tools for that. I might use an LLM, but mainly as an interface over a vector DB to help me look things up, and not to write or show me any code ever. Essentially a contextual grep or rg.
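The “contextual grep” idea above boils down to ranking snippets by similarity to a query instead of by literal match. A toy sketch, assuming bag-of-words vectors as a stand-in for real learned embeddings and a real vector store:

```python
# Toy "contextual grep": rank code snippets by cosine similarity to a
# query. Real setups would use learned embeddings + a vector DB; the
# bag-of-words embedding here is purely an illustrative stand-in.
from collections import Counter
import math

def embed(text):
    # Toy embedding: lowercase bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

SNIPPETS = [
    "def parse_config(path): open yaml config file",
    "def send_request(url): http client post request",
    "def assert_response(resp): check status and body fields",
]

def search(query, snippets, top_k=1):
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]

print(search("http request client", SNIPPETS))
# ['def send_request(url): http client post request']
```

Unlike plain `grep`/`rg`, this surfaces the most relevant snippet even when the query words are only partially present, which is the lookup role being described for the LLM here.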
Sorry you’re being forced to use a hammer to make a surgical precision cut. That really sucks man.