Agentic coding from Galapagos Island (what is the appeal to this?)

HaraldvonBlauzahn@feddit.org · edit-2 23 hours ago


HaraldvonBlauzahn@feddit.org · edit-2 22 hours ago

And another aspect is: You can, of course, engineer reliable things from unreliable components. Much of hardware works like that. Even my bicycle needs to have two brakes, for redundancy. Cloud computing and things like distributed databases and file systems work like that, at the price of massive complexity.
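As a rough illustration of the point above, here is a minimal sketch of majority voting over three unreliable components (the pattern often called triple modular redundancy). The function names and the 10% failure rate are assumptions for illustration, and the arithmetic assumes that components fail independently, which real systems rarely guarantee.

```python
from collections import Counter
from math import comb


def majority_vote(outputs):
    """Return the most common value among the component outputs."""
    value, _count = Counter(outputs).most_common(1)[0]
    return value


def majority_failure_probability(p, n=3):
    """Probability that a majority of n components fail, given each fails
    independently with probability p (independence is an assumption)."""
    needed = n // 2 + 1  # failures needed to outvote the working components
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


# One component returns a wrong answer; the other two outvote it.
print(majority_vote([42, 42, 7]))  # 42

# Hypothetical 10% per-component failure rate.
print(round(majority_failure_probability(0.10), 4))  # 0.028
```

Under these assumptions the voted result fails about 2.8% of the time instead of 10%, but the price is three components plus the voting logic, which is the added complexity mentioned above.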

I can see that some intelligent people are attracted by the challenge. Like a juggler who tries to keep more balls in the air.

But for generating code and algorithms, with the price being intelligibility and maintainability: is this a good idea?

litchralee@sh.itjust.works · 8 hours ago

“You can, of course, engineer reliable things from unreliable components.”

I think the only way this statement can hold true in all circumstances is if we select an arbitrary boundary for what constitutes “reliable”. And that’s no small matter, because the threshold of reliability in a consumer IoT device would be inappropriate in a commercial or automotive setting, would be deeply wrong for industrial personnel safety, would be manifestly unlawful for a military or aerospace application, and potentially fatal for medical use.

Engineering is all about balancing a set of objectives: cost, time to market, efficiency, size, weight, competitive advantage, and more. Doubling up as a way to improve reliability necessarily affects size, complexity, and efficiency, but that’s tolerable for large data centers where the customer counts servers by the number of floors, not the number of Rack Units (RU). But no one would accept installing two pacemakers because one of them might fail early; that’s an intolerable solution to the product’s base objective.

As it happens, most USA jurisdictions only require a single brake on a bicycle, and it doesn’t even have to be on the more-effective front wheel. But the idea in law is to enforce the absolute minimum of requirements: having no brakes at all is where the line has been drawn, for a mode of transport that rarely gets above 50 km/h (~30 mph). But even then, all commercial bicycles for sale must have two brakes, so the law implicitly allows for some lost redundancy, because even one brake should be enough.

Could a bicycle brake be developed such that it is inherently always able to stop? Likely yes. Would it appreciably improve macro safety objectives such as by reducing collisions with stationary objects? No, not really.

And that’s the rub: just because engineers can double up things to get more redundancy, is this any better than the alternative? If an LLM is used as a search engine, is that appreciably better than using “grep”, a battle-tested, secure, locally run application with a lineage harkening back to the 1970s?

The drawback with inherent unreliability is that it can only be statistically reduced, but never eliminated. NASA understands this risk better than most, because cost pressures mean they can’t be using military-grade hardware for everything. Perhaps then, it can be better said that engineers also have to balance risk in their decisions, and as it stands right now, the risk/uncertainty for LLM output is unquantifiable by any existing approach.
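To put a number on “statistically reduced, but never eliminated”: under the simplifying assumption that redundant copies fail independently, each with probability p, all n copies fail together with probability p^n. The sketch below uses a hypothetical 1% failure rate, not a figure from any real system.

```python
def all_fail_probability(p, n):
    """Probability that all n redundant copies fail, assuming independent
    failures with per-copy probability p (an idealized assumption)."""
    return p**n


# Hypothetical per-copy failure rate of 1%.
for n in range(1, 5):
    print(n, all_fail_probability(0.01, n))
```

Each extra copy shrinks the residual risk by a factor of p, but it stays above zero for any p > 0, and correlated failures (shared power, shared bugs) make the real figure worse than this estimate.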

Academics have long been researching ways to make LLMs “safe”, so that their outputs are constrained in concrete ways. But I believe they’ve long concluded that the current approach of generative transformers simply cannot have safety “bolted on” after the fact. New constructions for machine learning will have to be invented with safety from day zero. The academics continue to work on that, while the commercial AI vendors are barreling ahead with LLMs, in spite of their risks and in the pursuit of a return.

I’ve not seen anything that would suggest the academics are wrong, nor that industry has managed to produce large safety or reliability improvements, so at this point, I only see a plateau and dead end for the industry. Maybe if the industry would put more into R&D and theoretical work, this would be a lot more graceful as they run up against the buffer stops.

MagicShel@lemmy.zip · 19 hours ago

One key piece of getting good results from LLMs is not to have them do anything you can’t do yourself. I catch AI doing weird things all the time and just fix it or have AI fix it accordingly.

Left to its own devices, AI will generally produce bad output once the output gets large enough. This is why I argue AI will ultimately not replace developers. Even the best models I’ve seen just make more sophisticated errors. The product must be reviewed and fixed by someone who actually understands how to write it.

The question is more the threshold at which AI costs more than is gained in efficiency. As we’ve seen, a lot of folks don’t gain efficiency; that’s obvious in some cases. Yet other folks do see gains, and the question is whether this is a domain issue or a technique issue.


Agentic test processes, LLM benchmarks, and other notes on agentic coding from Galapagos Island