With AI seemingly consuming all hardware manufacturing capacity, I'm wondering what parts of those current systems we could see trickling down into components for desktop PCs as they become outdated for AI tasks.

I know most of this hardware is pretty specific and integrated, but I do wonder if an eventual workaround to these hardware shortages is recycling and repurposing the very systems causing the shortage. We have seen things like DRAM, flash, and even motherboard chipsets pulled from server equipment and find their way into suspiciously cheap hardware on eBay and AliExpress, so how much of the current crop of hardware will turn up there in the future?

How much of that hardware could even be useful to us? Will Nvidia repossess old systems and shoot them into the sun to keep them out of the hands of gamers? Perhaps only time will tell.

  • empireOfLove2@lemmy.dbzer0.com · 18 points · 13 hours ago

    Memory and CPUs are about it.

    GPUs have all shifted to bespoke hardware that is physically impossible to run on consumer platforms. All the Blackwell and similar chips are insanely dense. Most GPUs built for datacenter use don't even have video output hardware, so they're somewhat useless for desktop duty.

    Memory (DIMMs) is somewhat standard. Most servers use registered ECC, which doesn't work in consumer platforms, but the actual memory chips could be desoldered and moved onto normal consumer DIMMs, as the chips themselves are basically universal.

    x86 CPUs are still CPUs, at least. You might need weird motherboards, but those can still be run by us plebs.

    • Hamartiogonic@sopuli.xyz · 2 points · 8 hours ago

      You could just buy one of those workstations that are actually almost servers. Some of them have 2 CPUs, 8 slots for RAM and a PCIe slot for your GPU. Those motherboards can handle ECC.

    • jj4211@lemmy.world · 1 point · 6 hours ago

      The server boards would pretty much have to come with them. Also, if those CPUs go as high as 500W, a lot of homes might not have a powerful enough wall socket to feed them. Even without GPUs, you might realistically need something like a dryer outlet to power one.

      • CameronDev@programming.dev · 1 point · 2 hours ago

        500W isn't that high; sockets in Aus can push out 2,000W+ without any issues, and you're not gonna spend 1,500W on the rest of the system.

        A quick Google suggests the USA can do 1,800W on a 15A circuit, or 2,400W on a 20A circuit, so plenty of headroom there as well.

        Remember, people plug space heaters into sockets, and those will easily outdraw even a high-end CPU.

        • jj4211@lemmy.world · 1 point · 1 hour ago

          Keep in mind these are dual-socket systems, and that's CPU without any GPU yet. So with the CPUs populated and a consumer-grade high-end GPU added, those components are at 1,500W, ignoring PSU inefficiencies and other components that can consume non-trivial power.

          In the USA you almost never see a 20A circuit; most are 15A, and even then that rating is for short-term consumption. For sustained loads you're supposed to stay at 80%, so you're down to 1,440W. Space heaters in the USA usually max out at 1,400W when they're expected to plug into a standard outlet, for exactly this reason. A die-hard enthusiast might figure out how to spread multiple non-redundant PSUs across circuits, or have a rare 20A circuit run, but that's going to be a very, very small niche.
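
          To make that budget concrete, here's a quick back-of-the-envelope sketch in Python; the per-socket and GPU wattages and the PSU efficiency are assumptions for illustration, not measurements:

          volts = 120
          amps = 15
          continuous_limit_w = volts * amps * 0.80   # 80% continuous-load rule -> 1440 W

          cpu_tdp_w = 500        # assumed per-socket draw
          sockets = 2
          gpu_tdp_w = 450        # assumed high-end consumer GPU
          psu_efficiency = 0.90  # assumed

          wall_draw_w = (cpu_tdp_w * sockets + gpu_tdp_w) / psu_efficiency
          print(f"~{wall_draw_w:.0f} W at the wall vs {continuous_limit_w:.0f} W continuous limit")
          # ~1611 W at the wall vs 1440 W continuous limit -> over budget on a single 15A circuit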

    • CameronDev@programming.dev · 8 points · 12 hours ago

      Removing and replacing memory chips is likely so labor expensive that it’ll never happen.

      If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used, but equally likely we could see the chips (and DIMMs and CPUs) deliberately shredded to prevent them falling into competitor hands.

      My money is on an unprecedented level of e-waste, and nothing trickling down to consumers…

      • tal@lemmy.today · 3 points · 9 hours ago

        If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used, but equally likely we could see the chips (and DIMMs and CPUs) deliberately shredded to prevent them falling into competitor hands.

        By the time the things are cycled out, they may not be terribly compute-competitive, in which case…shrugs

        Also, a major unknown is where models go. Say a bunch of people decide that they can't get access to parallel compute hardware or a lot of memory, and research shifts toward models split up into MoEs or otherwise broken apart. Recent LLMs have been oriented towards MoEs. Llama.cpp, and I assume the other engines capable of running LLMs, has the ability to offload experts that don't fit in GPU memory to main memory when they aren't actively being used. Then maybe having a bank of consumer-level 24GB GPUs or something like that is fine, and having chips with direct access to very large amounts of memory isn't all that interesting. At that point, what it takes to be competitive changes.
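
        To put rough numbers on that (these figures are made up for illustration, not taken from any particular model), a sparse MoE only needs its active slice resident on the GPU:

        total_params_b  = 120    # assumed: parameters in the full model, billions
        active_params_b = 12     # assumed: parameters active per token (shared layers + a few experts)
        bytes_per_param = 0.5    # assumed: ~4-bit quantization

        total_gb  = total_params_b * bytes_per_param    # 60 GB for the whole model, lives in main memory
        active_gb = active_params_b * bytes_per_param   # 6 GB hot working set per token

        gpu_vram_gb = 24
        print(f"whole model ~{total_gb:.0f} GB, hot slice ~{active_gb:.0f} GB")
        print(f"hot slice fits on a {gpu_vram_gb} GB card: {active_gb <= gpu_vram_gb}")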

        EDIT: I also think it's safe to say that more memory will probably always help. But it's also very likely that our existing models are staggeringly inefficient with memory. We are still doing early passes at this.

        Let me give you an example. I have a Llama 3-based model currently loaded on my Framework Desktop that's using 96GB of memory for the model and associated storage.

        Prompt: What is 1+1?

        Response: I can answer that. 1+1 = 2.

        Prompt: How about 37 times 12?

        Response: 37 times 12 is 444.

        Now, those are correct answers. But…in order to make an LLM capable of providing that correct response, purely by running a neural net trained on natural language, we had to stick a really inefficient amount of data into memory. The same hardware I'm running it on can do billions of integer computations per second. As of today, the software running that model doesn't give it access to that hardware, and the model was never trained to use it. But…it could be. And if it were, suddenly a lot of the need for storing edges in some neural net wasted on arithmetic goes away.

        Plus, we could get better results:

        Prompt: What about 783901/76523?

        Response: 783901 divided by 76523 is approximately 10.23.

        That's not far off, but it isn't quite right either: the actual value is about 10.244, so it should have rounded to 10.24.

        $ maxima -q
        
        (%i1) float(783901/76523);
        
        (%o1)                         10.243992002404506
        (%i2) 
        

        So we could probably get more useful models that don't waste a ton of space if we gave the model access to the computational hardware that's presently sitting idle and trained it to use it. That's an off-the-cuff example, but I think it highlights how inefficiently we're solving problems in terms of memory.

        Same sort of thing applies to a lot of other problems for which we already have immensely more efficient (and probably more accurate) software packages. If you can train the model to use those, running the software in an isolated sandbox rather than having the model try to do it itself, then we don't need to blow space in the LLM on those capabilities, and it can shrink.
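
        As a purely hypothetical sketch of that hand-off (not how any current engine actually wires it up), the "calculator" the model learns to call could be as dumb as this:

        # Hypothetical tool the model is trained to call instead of doing
        # arithmetic in its weights: evaluate a simple expression with
        # ordinary CPU math and hand the string back to the model.
        def calculator(expression: str) -> str:
            allowed = set("0123456789+-*/(). ")
            if not set(expression) <= allowed:
                raise ValueError("unsupported expression")
            result = eval(expression, {"__builtins__": {}}, {})
            return f"{result:.6g}"

        # The model would emit something like {"tool": "calculator", "args": "783901/76523"}
        # and splice the returned string into its answer.
        print(calculator("37*12"))          # 444
        print(calculator("783901/76523"))   # 10.244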

        If we can solve a lot of the problems people care about with a much smaller amount of memory, or with a much less densely connected set of neural networks, the hardware people want may radically change. In early 2026, the most in-demand hardware is hugely power-hungry parallel processors with immense amounts of directly attached memory. But maybe in 2028 we figure out how to get models to use existing software packages designed for mostly serial computation, and suddenly what everyone is falling over themselves to get hold of is more traditional computer hardware. Maybe the neural net isn't even where most of the computation happens for most workloads.

        Maybe the future is training a model to use a library of software and to write tiny, throwaway programs that run on completely different hardware optimized for this kind of "model scratch computation", with the model mostly consulting those.

        Lot of unknowns there.

        • CameronDev@programming.dev · 3 points · 10 hours ago

          Agreed on all points. I think OP is hoping the bubble will burst and the big players will have to unload their excess hardware all at once, but I don't think that's likely tbh.

      • stoy@lemmy.zip · 2 points · 12 hours ago

        I sadly believe you are right; there is probably a clause in the contracts between manufacturers and AI companies stating that the chips can't be used outside of their intended purpose.

        I envision a situation possibly similar to HDDs after the tsunami: if AI companies go bust after the chips have been manufactured, who will use them?

        They are made to the specs used in AI data centers, which doesn't mean they are good for general-purpose use.

        I could see the stock being sold off cheap to low-cost memory module manufacturers, producing some weird and possibly failure-prone memory modules.

  • cecilkorik@lemmy.ca · 2 points · 12 hours ago

    Nvidia P40s with 24GB of VRAM are relatively cheap for what they are and are available in bulk. They have no video output and no cooling (you can 3D-print a fan duct or probably buy one, or just lower the power limit and run them semi-passively until they start throttling).

    If you want to putter around a bit with machine learning technologies (I refuse to call it "AI" because it's not), they're useful and reasonably capable tools, although far from the fastest compared to what's out there nowadays.

  • tal@lemmy.today · 2 points · 12 hours ago

    I posted in a thread a bit back about this, but I can’t find it right now, annoyingly.

    You can use the memory on GPUs as swap, though on Linux, that’s currently through FUSE — going through userspace — and probably not terribly efficient.

    https://wiki.archlinux.org/title/Swap_on_video_RAM

    Linux apparently can use it via HMM: the memory will show up as system memory.

    https://www.kernel.org/doc/html/latest/mm/hmm.html

    Provide infrastructure and helpers to integrate non-conventional memory (device memory like GPU on board memory) into regular kernel path

    It will have higher latency due to the PCI bus. It sounds like it basically uses main memory as a cache, and all attempts to directly access a page on the device trigger an MMU page fault:

    Note that any CPU access to a device page triggers a page fault and a migration back to main memory. For example, when a page backing a given CPU address A is migrated from a main memory page to a device page, then any CPU access to address A triggers a page fault and initiates a migration back to main memory.

    I don’t know how efficiently Linux deals with this for various workloads; if it can accurately predict the next access, it might be able to pre-request pages and do this pretty quickly. That is, it’s not that the throughput is so bad, but the latency is, so you’d want to mitigate that where possible. There are going to be some workloads for which that’s impossible: an example case would be just allocating a ton of memory, and then accessing random pages. The kernel can’t mitigate the PCI latency in that case.
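
    To get a feel for how much the access pattern matters, here's a crude model with assumed numbers (the latency and bandwidth figures are guesses for illustration, not measurements):

    page_kib        = 4
    fault_latency_s = 20e-6   # assumed: round trip to fault and migrate one page over PCIe
    link_gb_per_s   = 25      # assumed: usable PCIe bandwidth with good prefetching

    random_mb_per_s    = (page_kib / 1024) / fault_latency_s   # pay full latency per page
    streaming_mb_per_s = link_gb_per_s * 1000

    print(f"random page faults: ~{random_mb_per_s:.0f} MB/s, prefetched streaming: ~{streaming_mb_per_s:.0f} MB/s")
    # random page faults: ~195 MB/s, prefetched streaming: ~25000 MB/s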

    There’s someone who wrote a driver to do this for old Nvidia cards, something that starts with a “P”, that I also can’t find at the moment, which I thought was the only place where it worked, but it sounds like it can also be done on newer Nvidia and AMD hardware. Haven’t dug into it, but I’m sure that it’d be possible.

    A second problem with using a card as swap is going to be that a Blackwell card uses extreme amounts of power, enough to overload a typical consumer desktop PSU. That power presumably only gets drawn if you're actually using the compute hardware, which you wouldn't be if you're just moving memory around; existing GPUs normally use much less power when they aren't crunching numbers. But if you're running a GPU on a PSU that can't actually provide enough power for it at full blast, you have to be sure that you never actually power up that compute hardware.

    EDIT: For an H200 (141 GB memory):

    https://www.techpowerup.com/gpu-specs/h200-nvl.c4254

    TDP: 600 W

    EDIT2: Just to drive home the power issue:

    https://www.financialcontent.com/article/tokenring-2025-12-30-the-great-chill-how-nvidias-1000w-blackwell-and-rubin-chips-ended-the-era-of-air-cooled-data-centers

    NVIDIA’s Blackwell B200 GPUs, which became the industry standard earlier this year, operate at a TDP of 1,200W, while the GB200 Superchip modules—combining two Blackwell GPUs with a Grace CPU—demand a staggering 2,700W per unit. However, it is the Rubin architecture, slated for broader rollout in 2026 but already being integrated into early-access “AI Factories,” that has truly broken the thermal ceiling. Rubin chips are reaching 1,800W to 2,300W, with the “Ultra” variants projected to hit 3,600W.

    A standard 120V, 15A US household circuit tops out at 1,800W even if you load it fully. Even if you get a PSU capable of delivering that and dedicate the entire circuit to the machine, anything beyond that means something like multiple PSUs on independent circuits, or 240V service, or similar.

    I have a space heater in my bedroom that can do either 400W or 800W.

    So one question, if you want to use the card just for its memory, is what ceiling on power draw you can guarantee while most of the on-board hardware sits idle.
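
    Something like this is the sanity check I'd want to do before trying it (the idle figure and the rest-of-box numbers are pure assumptions):

    psu_w         = 850    # assumed consumer PSU
    rest_of_box_w = 350    # assumed CPU, drives, fans under load
    card_tdp_w    = 600    # H200 NVL TDP cited above
    card_idle_w   = 75     # assumed draw with compute idle but memory active

    print("fits if the card stays idle: ", rest_of_box_w + card_idle_w <= psu_w)
    print("fits if the card ramps to TDP:", rest_of_box_w + card_tdp_w <= psu_w)
    # True / False -> you'd need to hard-cap the card's power limit, or be certain
    # the compute side can never spin up, for the PSU to be safe.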

  • Godort@lemmy.ca · 2 points · 13 hours ago

    Hard to say.

    The GPUs will slot in just fine, those run on the same PCIe slots.

    Disks are a maybe. Getting U.2 and SAS interfaces into consumer hardware is challenging, but some of those machines might have some regular M.2 or SATA disks.

    RAM is also a maybe. Most server RAM will be ECC, and a lot of consumer motherboards and processors simply are not compatible; however, some are.

    • empireOfLove2@lemmy.dbzer0.com · 10 points · 13 hours ago

      The GPUs will slot in just fine, those run on the same PCIe slots.

      Not anymore. The GPGPU parallel compute chips being pushed by Nvidia, which occupy most fresh datacenter buildout space, are bespoke hardware requiring their own custom mainboards. PCIe is falling by the wayside.

      • bizarroland@lemmy.world · 3 points · 12 hours ago

        You can already get SXM2 adapters and external boards that run to PCI-E X16 slots.

        I wouldn't say it's the smartest way to blow a thousand dollars, but you can already add a few of the older, circa-2019 models to a computer for about that much.

        SXM4, I believe, has also been cracked, but it's a lot more expensive, and I'm sure SXM5 won't be too far behind. The main difference between the PCI-E and SXM models, as far as I'm aware, other than the interconnect and the built-in linking between multiple GPUs, is that the SXM parts can run on 48-volt power. That means the amperage running through the wires is much lower, and you're less likely to cause everything to burst into flames randomly.
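
        The amperage point is easy to see with a quick calculation (the 700W figure is just an assumed per-module draw):

        power_w = 700                    # assumed draw for one SXM module
        for volts in (12, 48):
            amps = power_w / volts
            print(f"{power_w} W at {volts} V -> {amps:.0f} A through the connector")
        # 700 W at 12 V -> 58 A; at 48 V -> 15 A, so far less heat in the cabling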

        • jj4211@lemmy.world · 4 points · 6 hours ago

          Yes, they connect over PCIe, so the physical mismatch can be overcome, but they're also now drawing 15kW, which is more wattage than any circuit in my residential breaker box can handle.

          Even if you did, there's not even a whiff of driving circuitry for a video port, so your only application would be local models, and if the bubble bursts, that would seem to indicate that use case isn't all that popular.

          No, I would expect these systems to get rented out or sold to supercomputer concerns for super cheap if a bubble pop should occur.