No apps, no code, just intent and execution.
So the only problems you’re left with are:
- Making a precise description of what you want, at high and low levels of detail with consistent terminology
 - Verifying that the system is behaving as you expect, by exercising specific parts of it in isolation
 - The ability to make small incremental steps from one complete working state to the next complete working state, so you don’t get stuck by painting yourself into a corner
 
Problems which… code is much better than English at handling.
And always will be.
Almost like there’s a reason code exists other than just “Idk let’s make it hard so normies can’t do it mwahaha”.
It’s really funny to think about.
Equations and algorithms used to be written in human language and there were many problems.
So over thousands of years we made this great thing called math language.
And now some people are saying it’s elitist to be against writing algorithms in human language.
Okay, this is fun, but it’s time for an old programmer to yell at the cloud, a little bit:
The cost per AI request is not trending toward zero.
Current ludicrous costs are subsidized by money from gullible investors.
The whole cost-model house of cards desperately depends on the poorly supported belief that costs will rocket downward very, very soon thanks to some future incredible discovery.
We’re watching an endurance test between irrational investors and the stubborn, boring, nearly completely spent tail end of Moore’s law.
My money is in a mattress waiting to buy a ten pack of discount GPU chips.
Hallucinating a new unpredictable result every time will never make any sense for work that even slightly matters.
But this test is still super fucking cool. I can think of half a dozen novel, valuable ways to apply it for real-world use. Of course, the reason I can think of those is that I’m an actual expert in computers.
Finally - I keep noticing that the biggest AI apologists I meet tend to be people who aren’t experts in computers, and are tired of their “million dollar” secret idea being ignored by actual computer experts.
I think it is great that the barrier of entry is going down for building each unique million dollar idea.
For the ideas that turn out to actually be market viable, I look forward to collaborating with some folks in exchange for hard cash, after the AI runs out of lucky guesses.
If we can’t make an equitable deal, I look forward to spending a few weeks catching up to their AI start-up proof-of-concept, and then spending 5 years courting their customers to my new solution using hard work and hard earned decades of expert knowledge.
This cool AI stuff does change things, but it changes things far less than the tech bros hope you will believe.
The conclusion of this experiment is objectively wrong when generalized. At work, to my disappointment, we have been trying for years to make this work, and it has been failure after failure (and I wish we’d just stop, but eventually we moved to more useful stuff like building tools adjacent to the problem, which is honestly the only reason I stuck around).
There are several reasons why this approach cannot succeed:
- The outputs of LLMs are nondeterministic. Most problems require determinism. For example, REST API standards require idempotency from some kinds of requests, and an LLM without a fixed seed and a temperature of 0 will return different responses at least some of the time (there’s a sketch of this point right after the list).
 - Most real-world problems are not simple input-output machines. When calling, let’s say for example, an API to post a message to Lemmy, that endpoint does a lot of work. It needs to store the message in the database, federate the message, and verify that the message is safe. It also needs to validate the user’s credentials before all of this, and it needs to record telemetry for observability purposes. LLMs are not able to do all this. They might, if you’re really lucky, be able to generate code that does this, but a single LLM call can’t do it by itself.
 - Some real-world problems operate on unbounded input sizes. Context windows are bounded and, as currently designed, cannot handle unbounded inputs. See signal processing for an example of this, and for an example of a problem an LLM cannot solve because it cannot even receive the input.
 - LLM outputs cannot be deterministically improved. You can make changes to prompts and so on but the output will not monotonically improve when doing this. Improving one result often means sacrificing another result.
 - The kinds of models you want to run are not in your control. Using Claude? OK, Anthropic updated the model, and now your outputs have all changed and you need to update your prompts again. This fucked us over many times.
 
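To make the first point concrete, here is a minimal sketch assuming an OpenAI-style chat API (the model name and prompt are placeholders, not anything from the article): even with every knob turned toward determinism, reproducibility is only best-effort.

```python
# Sketch: even with temperature=0 and a fixed seed, identical prompts are only
# *best-effort* reproducible -- the provider does not guarantee identical output.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model, not the one from the article
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # remove sampling randomness
        seed=42,              # best-effort reproducibility only
    )
    return resp.choices[0].message.content

a = ask("Return the JSON response for POST /comments with body {'text': 'hi'}")
b = ask("Return the JSON response for POST /comments with body {'text': 'hi'}")
print(a == b)  # not guaranteed to be True -- a problem for idempotent endpoints
```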
The list keeps going on. My suggestion? Just don’t. You’ll spend less time implementing the thing than trying to get an LLM to do it. You’ll save operating expenses. You’ll be less of an asshole.
The future is here! And it costs $10-$50 per 1000 HTTP requests.
Yes, it sounds ridiculous, but how does that ratio change once we take into account the cost of hiring a programmer and the cost of implementing a niche feature, versus getting what this experiment provides for the cost of LLM inference?
Also: we can cache and reuse endpoint implementations.
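Roughly what I mean, as a hypothetical sketch (llm_generate is a stand-in, not the experiment’s actual code): pay for inference once per endpoint and reuse the result afterwards.

```python
# Hypothetical "cache and reuse the endpoint implementation" sketch: only the
# first request to a given endpoint pays for LLM inference; later requests
# reuse the stored result. llm_generate() stands in for the real model call.
from typing import Dict, Tuple

_cache: Dict[Tuple[str, str], str] = {}

def llm_generate(method: str, path: str) -> str:
    # placeholder for the real (expensive) LLM call
    return f"<generated implementation for {method} {path}>"

def handle(method: str, path: str) -> str:
    key = (method, path)
    if key not in _cache:   # cache miss: pay for inference once
        _cache[key] = llm_generate(method, path)
    return _cache[key]      # cache hit: no inference cost

# Caveat: this only helps where the output does not depend on the request body
# or on server-side state, which rules out most real endpoints.
```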
Play tic-tac-toe a few times against ChatGPT. I wouldn’t trust an LLM that can’t win tic-tac-toe against four-year-olds with production code 🤣
The cost of an HTTP request with a normal web server is fractions of a penny, perhaps even less.
$50 for 1000 requests is $5 per request. Per request. One page load on Lemmy can be 100 requests.
Your company is bankrupt in 24 hours.
Yes, it’s much cheaper to hire a guy to create a feature than it is to _have an LLM hallucinate a new HTTP response in real time_ each time a browser sends a packet to your webserver.
And from a .ml user too, I’d like to think you’d see through this LLM horseshit, brother. It’s a capitalist mind trap, they’re creating a religion around it to allow magical thinking to drive profits.
$50 for 1000 requests is $5 per request
me when i use chat gpt to do maths
Considering that most techbro startups are going to be dead within a year, I’d say AI wins.
Plus most of the competent programmers already have high resistance to technobabble bullshit, and will simply refuse to work on something like an online contacts app (are you copying Facebook or what?)
I like writing code myself; it’s a process I enjoy. If the LLM writes it for me, then I would only do the worse part of the job: debugging. Also, for many people, letting the AI write the code means less understanding. Otherwise you could have written it yourself. However, there are things where the AI is helpful, especially for writing tests in a restrictive language such as Rust. People forget that writing the code is one part of the job; the other is to depend on it, debug it, and build other stuff on top.
However, there are things where the AI is helpful, especially for writing tests in a restrictive language such as Rust.
For generating the boilerplate surrounding it, sure.
But the contents of the tests are your specification. They’re the one part of the code where you should be thinking about what needs to happen, and they should be readable.

A colleague at work generated unit tests and it’s the stupidest code I’ve seen in a long while, with all imports repeated in each test case, as well as tons of random assertions also repeated in each test case, like some shotgun approach to regression testing.
It makes it impossible to know which parts of the asserted behaviour are actually intended and which parts just got caught in the crossfire.

I think maybe the biggest conceptual mistake in computer science was calling them “tests”.
That word has all sorts of incorrect connotations to it:
- That they should be made after the implementation
 - That they’re only useful if you’re unsure of the implementation
 - That they should be looking for deviations from intention, instead of giving you a richer palette with which to paint your intention
 
You get this notion of running off to apply a ruler and a level to some structure that’s already built, adding notes to a clipboard about what’s wrong with it.
You should think of it as a pencil and paper — a place where you can be abstract, not worry about the nitty-gritty details (unless you want to), and focus on what would be right about an implementation that adheres to this design.
Like “I don’t care how it does it, but if you unmount and remount this component it should show the previous state without waiting for an HTTP request”.
Very different mindset from “Okay, I implemented this caching system, now I’m gonna write tests to see if there are any off-by-one errors when retrieving indexed data”.
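For illustration only, with a made-up CachedStore and pytest-style assertions, the difference in mindset looks something like this: the test states the guarantee, not the mechanism.

```python
# Illustrative only: a hypothetical cache-backed store, tested as a specification.
# The test says *what* must hold ("a remount serves the previous state without a
# new fetch"), not how the cache is implemented internally.
class FakeBackend:
    def __init__(self):
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        return f"value-for-{key}"

class CachedStore:  # hypothetical implementation under test
    def __init__(self, backend):
        self.backend = backend
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self.backend.fetch(key)
        return self._cache[key]

def test_remount_serves_previous_state_without_refetch():
    backend = FakeBackend()
    store = CachedStore(backend)
    first = store.get("profile")   # initial mount: one fetch is allowed
    second = store.get("profile")  # "remount": must not hit the backend again
    assert second == first
    assert backend.calls == 1
```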
I think that, very often, writing tests after the impl is worse than not writing tests at all. Cuz unless you’re some sort of wizard, you probably didn’t write the impl with enough flexibility for your tests to be flexible too. So you end up with brittle tests that break for bad reasons and reproduce all of the same assumptions that the impl has.
You spent extra time on the task, and the result is that when you have to come back and change the impl you’ll have to spend extra time changing the tests too. Instead of the tests helping you write the code faster in the first place, and helping you limit your tests to only what you actually care about keeping the same long-term.
It’s actually the first time I’ve tried AI-assisted unit test creation. There were multiple iterations, and sometimes it didn’t work well. And the most important part is, as you say, to think through and read every single test case and edit or replace it if necessary. Some tests are really stupid, especially stuff that is already encoded in Rust’s type system. I mean, you still need a head for revision and you need to know what you want to do.
I still wonder if I should have just given it the function signature without the inner workings of the function. That’s an approach I want to explore next time. I really enjoyed working with it for the tests, because writing tests is very time-consuming. Although I am not much of a test guy, so maybe the results aren’t that good anyway.
Edit: In about 250 unit tests (which sadly do not cover all functions) for a CLI JSON-based tool, several bugs were found thanks to this approach. I wouldn’t have written them manually.

But you can do all this anyway; this isn’t new or groundbreaking.
You can load up Claude Code in a completely empty directory and tell it to build something, and it will do it. It’ll do it slowly and most of the time incorrectly, but it’ll eventually build “something” that will sort of work. Unless I’m still waiting for my coffee to kick in and I’m missing something here, companies already do this. Hell, a lot of my current clients do this: no code, nothing to base anything off, they just tell Claude an idea for a project and to build it.
Yeah so what I’m getting from the description is that this LLM doesn’t generate code, at all.
This feeds HTTP traffic directly to an LLM that is prompted how to respond to those requests.
This isn’t an LLM being served prompts to write code to create an HTTP server; the model’s output IS the HTTP server. The model itself is being the webserver, instead of being an autocomplete for an IDE.
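Roughly, the shape of the thing is something like this minimal sketch (assuming an OpenAI-style API; the system prompt and model name are my own placeholders, not the author’s actual setup):

```python
# Minimal sketch of "the model IS the webserver": every incoming HTTP request is
# forwarded to an LLM, and whatever text it returns is sent back as the response.
from http.server import BaseHTTPRequestHandler, HTTPServer
from openai import OpenAI

client = OpenAI()

SYSTEM = ("You are a web application. Given a raw HTTP request, "
          "reply with only the body of an appropriate HTTP response.")

class LLMHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self._respond()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._respond(self.rfile.read(length).decode(errors="replace"))

    def _respond(self, body: str = ""):
        # Hand the raw request text to the model and use its output verbatim.
        raw = f"{self.command} {self.path} HTTP/1.1\n{self.headers}\n{body}"
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": raw}],
        )
        payload = resp.choices[0].message.content.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), LLMHandler).serve_forever()
```

Every request pays full inference cost and latency, and nothing guarantees that two identical requests get the same response back.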
The author seems to acknowledge that “the future where it’s just us and our LLMs and intent, no code and no apps” is “science fiction” but he wanted to see how close we could get with today’s tech.
Thanks for making this clear. Certainly a fun little experiment, but the sheer inefficiency of the whole thing just boggles the mind. Hopefully this is not the direction tech is going, though; it’s not like we should curb our energy needs anyway…
Ah ok, thanks for explaining, that makes sense. Yeah, I clearly missed it.
Kinda reminds me of this: I built the most expensive CPU ever! (Every instruction is a prompt)







