With AI seemingly consuming every available hardware resource, I’m wondering what parts of those current systems we could see trickling down into componentry for desktop PCs as they become outdated for AI tasks.

I know most of this hardware is pretty specific and integrated, but I do wonder if an eventual workaround to these hardware shortages is through recycling and repurposing of the very systems causing the shortage. We have seen things like DRAM, flash, and even motherboard chipsets get pulled from server equipment and find their way into suspiciously cheap hardware on eBay and AliExpress, so how much of the current crop of hardware will turn up there in the future?

How much of that hardware could even be useful to us? Will Nvidia repossess old systems and shoot them into the sun to keep them out of the hands of gamers? Perhaps only time will tell.

  • tal@lemmy.today · 10 hours ago

    If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used. But it's equally likely we could see the chips (and DIMMs and CPUs) deliberately shredded to prevent them from falling into competitors' hands.

    By the time the things are cycled out, they may not be terribly compute-competitive, in which case…shrugs

    Also, a major unknown is where models go. Say a bunch of people decide that they can’t get access to parallel compute hardware or a lot of memory, so they focus their research on models split up into MoEs or otherwise broken into pieces. Recent LLMs have been oriented towards MoE architectures, and llama.cpp (and, I assume, the other engines capable of running LLMs) can offload experts that don’t fit in GPU memory to main memory while they aren’t actively being used. Then maybe having a bank of consumer-level 24GB GPUs or something like that is fine, and having chips with direct access to very large amounts of memory isn’t all that interesting. At that point, what’s essential to being competitive changes.
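
    To make that concrete, here’s a toy sketch (plain Python/NumPy, nothing to do with llama.cpp’s actual code) of why MoE matters for memory placement: the router only activates a couple of experts per token, so only those experts’ weights need to be in fast memory at that moment, and everything else can sit in main memory.

    import numpy as np

    D, N_EXPERTS, TOP_K = 64, 8, 2
    rng = np.random.default_rng(0)

    # Pretend this dict is main memory; a real engine would keep only hot experts on the GPU.
    experts_in_main_memory = {i: rng.standard_normal((D, D)).astype(np.float32)
                              for i in range(N_EXPERTS)}
    router_weights = rng.standard_normal((D, N_EXPERTS)).astype(np.float32)

    def moe_forward(x):
        """Route one token through only its top-k experts."""
        logits = x @ router_weights
        chosen = np.argsort(logits)[-TOP_K:]                   # experts picked for this token
        gate = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()
        out = np.zeros_like(x)
        for g, idx in zip(gate, chosen):
            out += g * (x @ experts_in_main_memory[int(idx)])  # only these weights are touched
        return out

    token = rng.standard_normal(D).astype(np.float32)
    print(moe_forward(token)[:4])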

    EDIT: I also think it’s safe to say that more memory will probably always help. But it’s also very likely that our existing models are staggeringly inefficient with memory. We are still doing early passes at this.

    Let me give you an example. I have a Llama 3-based model currently loaded on my Framework Desktop that’s using 96GB of memory for the model and associated storage.

    Prompt: What is 1+1?

    Response: I can answer that. 1+1 = 2.

    Prompt: How about 37 times 12?

    Response: 37 times 12 is 444.

    Now, those are correct answers. But…in order to make an LLM capable of providing that correct response purely by running a neural net trained on natural language, we had to stick a really inefficient amount of data into memory. The same hardware I’m running it on can do billions of integer computations per second. As of today, the software running that model doesn’t give it access to that hardware, and the model was never trained to use it. But…it could be. And if it were, suddenly a lot of the need to store edges in a neural net wasted on arithmetic goes away.
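
    As a minimal sketch of what “give it access to the hardware” could look like (ask_model() here is a purely hypothetical stand-in for whatever inference API is in use, hard-coded so the example runs on its own): the model emits a tool call, and the exact arithmetic happens as ordinary integer math on the CPU instead of inside the net’s weights.

    import re
    from fractions import Fraction

    def ask_model(prompt):
        # Hypothetical stand-in: a real setup would call an LLM trained to emit
        # CALC(...) whenever it sees arithmetic.
        return "CALC(783901/76523)"

    def run_tools(model_output):
        # The host does the division exactly, on ordinary hardware, instead of
        # the net approximating it.
        match = re.fullmatch(r"CALC\((\d+)/(\d+)\)", model_output.strip())
        if match:
            exact = Fraction(int(match.group(1)), int(match.group(2)))
            return f"{float(exact):.2f}"
        return model_output

    print(run_tools(ask_model("What about 783901/76523?")))   # -> 10.24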

    Plus, we could get better results:

    Prompt: What about 783901/76523?

    Response: 783901 divided by 76523 is approximately 10.23.

    That’s not far off — it is approximately 10.23 — but it should have been rounded to 10.24.

    $ maxima -q
    
    (%i1) float(783901/76523);
    
    (%o1)                         10.243992002404506
    (%i2) 
    

    So we could probably get more-useful models that don’t waste a ton of space on this sort of thing if we gave the model access to the computational hardware that’s presently sitting idle and trained it to use it. That’s an off-the-cuff example, but I think it highlights how we’re solving problems inefficiently in terms of memory.

    The same goes for a lot of other problems where we already have immensely more efficient (and probably more accurate) software packages. If you can train the model to use those, and run the software in an isolated sandbox rather than having the model attempt the task itself, then we don’t need to blow space in the LLM on those capabilities, and the model can shrink.
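
    As a sketch of that handoff, using nothing beyond the same maxima -q shown above, a driver process can pipe the expression to Maxima and read back the answer. A real deployment would want genuine sandboxing (a container, resource limits, and so on) around the child process; this only shows the shape of it.

    import subprocess

    def ask_maxima(expression, timeout_s=5.0):
        # Hand one expression to an external Maxima process and return what it prints.
        proc = subprocess.run(
            ["maxima", "-q"],
            input=f"{expression};\n",
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.stdout.strip()

    print(ask_maxima("float(783901/76523)"))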

    If we reduce the memory requirements enough that a lot of the problems people want solved can be handled with a much smaller amount of memory, or with a much-less-densely-connected set of neural networks, the hardware that people care about may radically change. In early 2026, the most-in-demand hardware is hugely power-hungry parallel processors with immense amounts of memory directly connected to them. But maybe, in 2028, we figure out how to get models to use existing software packages designed for mostly-serial computation, and suddenly what everyone is falling over themselves to get hold of is more-traditional computer hardware. Maybe the neural net isn’t even where most of the computation is happening for most workloads.

    Maybe the future is training a model to use a library of software and to write tiny, throwaway programs that run on completely different hardware optimized for this “model scratch computation” purpose, with the model mostly consulting those results.
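
    Sketched out, that loop might look like the following: the model emits a tiny program, the host runs it in a separate process with a timeout, and the captured output comes back as the scratch result. The program text is hard-coded here so the sketch stands on its own; a real version needs proper isolation around the child process.

    import subprocess
    import sys

    # Imagine the model emitted this as its throwaway scratch program.
    model_written_program = "print(round(783901 / 76523, 2))"

    def run_scratch_program(source, timeout_s=2.0):
        # Run the model's program in a separate interpreter process and capture stdout.
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.stdout.strip()

    print(run_scratch_program(model_written_program))   # -> 10.24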

    Lot of unknowns there.

    • CameronDev@programming.dev · 11 hours ago

      Agreed on all points. I think OP is hoping the bubble will burst and the big players will have to unload their excess hardware all at once, but I don’t think that’s likely, tbh.