

I keep running into the “it’s good for prototyping” argument that they make here in real life, too.
There are real cases where bugs aren’t a huge deal.
Take shell scripts. Bash is designed to make it really fast to write throwaway, often one-line programs that can accomplish a lot in minimal time.
As a programming language, Bash isn’t very optimized for catching corner cases or for writing highly secure or highly maintainable code. The great majority of the bash code that I have written is throwaway code, stuff that I will use once and not even bother to save. It doesn’t have to handle every situation or be hardened; it just has to fill the niche of code that can be written really quickly. But that doesn’t mean it isn’t valuable. I can imagine generated code with some bugs not being a huge problem there: if it runs once and appears to work for the inputs in that particular scenario, that may be totally fine.
Or, take test code. I’m not going to spend a lot of time making test code perfect. If it fails, it’s probably not the end of the world. There are invariably cases that I won’t have written test code for. “Good enough” is often just fine there.
And instead of (or in addition to) human-written commit messages, it might be possible to generate descriptions of commits down the line for someone browsing the code.
I still feel like I’m stretching, though. Like… I feel like what people are envisioning is some kind of self-improving AI software package, or just setting an LLM loose and having it pump out a new version of Microsoft Office. And I’m deeply skeptical that we’re going to get there on the back of LLMs alone. I think that we’re going to need more sophisticated AI systems.
I remember working on one large, multithreaded codebase where a developer who wasn’t familiar with, or wasn’t following, the thread-safety constraints would create an absolute maintenance nightmare for everyone else: you’d spend way more time tracking down and fixing the breakages they introduced than they saved by not coming up to speed on the constraints their code needed to conform to. And existing code-generation systems just aren’t in a great position to come up to speed on those constraints. Part of what a programmer does when writing code is to look at the human-language requirements, notice the undefined cases, and either go back and clarify the requirements with the user or use real-world knowledge to make reasonable calls. Training an LLM to map from an English-language description to code creates a system that simply doesn’t have the capability to do that sort of thing.
But, hey, we’ll see.

So, this is an area where I’m also pretty skeptical. It might be possible to address some of the security issues by making minor shifts away from a pure-LLM system. There are (conventional) security code-analysis tools out there, stuff like Coverity. Maybe if one says “all of the code coming out of this LLM gets rammed through a series of security-analysis tools”, you catch enough to bring security flaws down to a tolerable level.
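To make the “ram everything through a series of analyzers” idea a bit more concrete, here’s a minimal Python sketch of such a gate. The two checkers are toy, hypothetical stand-ins I made up for illustration; a real setup would call out to actual tools like Coverity, whose interfaces I’m not modeling here. Generated code is accepted only if no checker reports a finding.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    checker: str
    message: str


# A checker takes generated source text and returns zero or more findings.
Checker = Callable[[str], List[Finding]]


def flags_eval(source: str) -> List[Finding]:
    # Hypothetical toy checker: flag any line containing "eval(".
    return [Finding("flags_eval", f"line {i}: eval() call")
            for i, line in enumerate(source.splitlines(), start=1)
            if "eval(" in line]


def flags_shell_true(source: str) -> List[Finding]:
    # Hypothetical toy checker: flag subprocess calls that go through a shell.
    return [Finding("flags_shell_true", f"line {i}: shell=True")
            for i, line in enumerate(source.splitlines(), start=1)
            if "shell=True" in line]


def gate(source: str, checkers: List[Checker]) -> List[Finding]:
    """Run every checker over the generated code; an empty list means it may pass."""
    findings: List[Finding] = []
    for check in checkers:
        findings.extend(check(source))
    return findings


# Pretend this string came out of an LLM; it is only scanned, never executed.
generated = "import subprocess\nsubprocess.run(cmd, shell=True)\n"
problems = gate(generated, [flags_eval, flags_shell_true])
print("rejected" if problems else "accepted")
for p in problems:
    print(p.checker, p.message)
```

The design choice here is that a checker is just a function from source text to findings, so bolting on another analyzer means appending it to the list. How much that catches depends entirely on how good the real analyzers are.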
One item that they highlight is the problem of API keys being committed. I’d bet that there’s already software that runs on git commit hooks and tries to red-flag those, for example. Yes, in theory an LLM could embed them into code in some sort of obfuscated form that slips through, but I’d bet that heuristics can catch most of that well enough, and that such software isn’t terribly difficult to write.
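As a rough illustration of the kind of heuristics I mean, here’s a Python sketch. It assumes two made-up rules of my own, not any real tool’s rule set: a regex for the well-known shape of AWS access key IDs, and a Shannon-entropy check on long base64-ish tokens, with a hypothetical threshold of 4.0 bits per character. `scan_staged_diff` is likewise a hypothetical hook entry point that scans only the lines being added in the staged diff.

```python
import math
import re
import subprocess
import sys
from collections import Counter
from typing import List

# Assumed heuristic 1: AWS access key IDs look like "AKIA" + 16 uppercase letters/digits.
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Assumed heuristic 2: runs of 20+ base64-ish characters, filtered by entropy below.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; hypothetical cut-off, tune on real data


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the string's own character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def suspicious_lines(text: str) -> List[int]:
    """Return 1-based line numbers that look like they contain a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_RE.search(line):
            hits.append(lineno)
        elif any(shannon_entropy(tok) >= ENTROPY_THRESHOLD
                 for tok in TOKEN_RE.findall(line)):
            hits.append(lineno)
    return hits


def scan_staged_diff() -> int:
    """Hypothetical pre-commit entry point: scan only added lines in the staged diff."""
    diff = subprocess.run(["git", "diff", "--cached", "-U0"],
                          capture_output=True, text=True, check=True).stdout
    added = "\n".join(line[1:] for line in diff.splitlines()
                      if line.startswith("+") and not line.startswith("+++"))
    hits = suspicious_lines(added)
    if hits:
        print(f"possible secrets in {len(hits)} added line(s); commit blocked",
              file=sys.stderr)
        return 1
    return 0


sample = ("x = compute_total(items)\n"
          "key = 'AKIAIOSFODNN7EXAMPLE'\n"
          "token = 'q8Zx3Lm0Vt7Rb2Ny5Kd1'\n")
print(suspicious_lines(sample))  # → [2, 3]
```

From a `.git/hooks/pre-commit` script, something like `sys.exit(scan_staged_diff())` would block the commit on a hit, since git aborts the commit when that hook exits non-zero. The entropy rule is what gives some coverage against keys that don’t match a known pattern, at the cost of false positives on long, random-looking identifiers.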
But in general, I think that LLMs and image diffusion models are, in their present form, more useful for generating output that a human will consume than output that a CPU will consume. CPUs don’t tolerate errors in code. Humans often just need an approximately right answer to cue our brains, which already hold the right information to construct the desired mental state. An oil painting isn’t a perfect rendition of the real world, but it’s good enough: it can hint at what the artist wanted to convey by cuing up the appropriate information about the world that we already have in our brains.
This Monet isn’t a perfect rendition of the world. But because we have knowledge in our brain about what the real world looks like, there’s enough information in the painting to cue up the right things in our head to let us construct a mental image.
Ditto for rough concept art. Similarly, a diffusion model can get an image approximately right — some errors often just aren’t all that big a deal.
But a lot of what one is producing when programming is going to be consumed by a CPU that doesn’t work the way that a human brain does. A significant error rate isn’t good enough; the CPU isn’t going to patch over flaws and errors itself using its knowledge of what the program should do.
EDIT:
Yes. Here are instructions for setting up trufflehog to run on git pre-commit hooks to do just that.
EDIT2: Though you’d need to disable this trufflehog functionality for marking false positives in the code itself and use some out-of-band method for flagging them instead; otherwise, an LLM trained on code that flags false positives could learn to bypass the security-auditing code: