I promise this blog will have more than an engineer yelling at the clouds about LLM pain. I was actually working on a retrospective on X-Men Legends, and on a write-up of my experiences building an NES emulator in Rust. However, slop PRs started hitting a codebase I maintain at work. And like any healthy individual, I wish to commiserate about this with others on the internet.
I’d like to think I have some ability for introspection. Is it me that’s the problem? Am I just not with it anymore? It’s one thing to see the spam come in and go “man this sucks”. But it’s another thing to have to explain to misguided individuals why their behavior is destructive. This is my attempt at delving into the “why”. Why do these slop PRs bother me so much? Why do they feel like such a drain?
With more and more places pushing LLM-generated code, I’m getting real review fatigue. Code reviews used to be very reasonable, and you could be very mindful about another person’s code. Now it’s increasingly just LLM-generated code, which makes reviewing tedious: you read the same sloppy stuff in the hope that you can build your mental model of the changes to the codebase before the next LLM slop bomb hits the PR queue.
One huge issue is that LLMs do weird and stupid things in different ways than humans do.
If you’ve developed an eye for reading human-made changes, you’re not necessarily going to recognize new and surprising failure modes as easily. It’s literally harder than regular code review.
Humans with modern tooling, for example, rarely hallucinate field/class/method/object names because non-spicy autocomplete keeps them on the rails. LLMs seem much more willing to decide the menu bar is .menuBar and not .topMenu, probably because their training corpus is full of the former.
Exactly.
Another problem with LLMs is that they are actually useful for some tasks, and they can generate very good quality code if you’re a diligent enough developer. I have also built personal tools with them, but I don’t have knowledge of the code the LLM has hallucinated, which means that before I push this code forward I basically have to familiarise myself with it the same way I would in a code review.
The knowledge you gain from this is also different from that of actually writing and running the code yourself. I have seen people use LLMs to write commit messages, which is the last thing you should do. Commit messages are probably the only place where we can meaningfully store the knowledge gathered during development, and the more LLM commits I see, the more I lose hope.
Surely we can come up with networks of trust for this sort of thing, so that you don’t have to deal with PRs from people with no references.
Everyone starts off without references, and the pipeline from user to helpful contributor to fellow maintainer is already thinner than most projects want, even before adding more chokepoints. There isn’t a solution without downsides while there are people using LLMs.
That’s true, but as a maintainer you could encourage those helpful maintainers to triage issues from regular users.
I think the real benefit would come from taking a user’s reputation into account across projects.
At the end of the day, you can’t have low-effort pull requests and expect maintainers to look at everything. It’s the same spam problem as in any other domain.
This works if you have the luxury of selecting the people whose PRs you review, but in a corporate environment you just don’t have that option. I would love to just reject obvious LLM code, but that’s not going to keep me employed. Instead I’m stuck figuring out how to meaningfully review LLM changes and how to maintain my mental model amid these rapid changes.
Humans are also lazy. I won’t make 5,000-line changes just because I feel like refactoring while adding a field to a query. An LLM might, and I don’t want to read all of that.