My understanding is that they’re fundamentally different - HBM is a 3D-stacked pile of DRAM dies with a massive data bus, and it has to live on-package (like, millimeters away from the processor).
DDR5 has 2x32-bit channels per DIMM. HBM4 has a massive 2,048-bit bus per stack. There’s absolutely no way to run the number of traces it would need through a motherboard…
Presumably you would put a chip next to it to ‘downmix’ the wide bus. Just saying, if there’s enough of it floating around unused, a way will be found. The chip wouldn’t need to do much, and then we’re back to normal traces. It’s a bit of a waste of potential bandwidth, but better than losing general-purpose compute for the masses.
Could always do something like stick a Blackwell next to it and pop it on the PCIe bus and do a kernel mapping ;) (yes that’s a video card, or AI accelerator, that’s the joke, but also…)
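For a rough sense of scale, here is a back-of-the-envelope sketch of the bus widths mentioned above and of how much peak bandwidth a ‘downmix’ chip would give up. The bus widths come from the thread; the per-pin data rates (DDR5-6400 and 8 Gb/s per HBM4 pin) are my own assumptions for illustration, and the numbers ignore ECC bits, command/address pins and real-world efficiency.

```python
# Back-of-the-envelope comparison. Bus widths are from the thread;
# the per-pin data rates below are ASSUMPTIONS chosen for illustration.
DDR5_BITS = 2 * 32         # two 32-bit subchannels per DIMM (data bits only)
HBM4_BITS = 2048           # data bits per HBM4 stack
DDR5_GBPS_PER_PIN = 6.4    # assumed: DDR5-6400
HBM4_GBPS_PER_PIN = 8.0    # assumed: 8 Gb/s per pin


def peak_gb_per_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: bits * (Gb/s per pin) / 8."""
    return bus_bits * gbps_per_pin / 8


width_ratio = HBM4_BITS // DDR5_BITS
ddr5_bw = peak_gb_per_s(DDR5_BITS, DDR5_GBPS_PER_PIN)
hbm4_bw = peak_gb_per_s(HBM4_BITS, HBM4_GBPS_PER_PIN)

# If a bridge chip exposes the stack through a single DDR5-style interface,
# this is the fraction of the stack's peak bandwidth that survives.
retained = ddr5_bw / hbm4_bw

print(f"HBM4 bus is {width_ratio}x wider than a DDR5 DIMM's data bus")
print(f"DDR5 peak: {ddr5_bw:.1f} GB/s, HBM4 peak: {hbm4_bw:.1f} GB/s")
print(f"Bandwidth retained behind a DDR5-width bridge: {retained:.1%}")
```

Under those assumptions the bridge keeps only a few percent of the stack's peak bandwidth, which is the "waste" being traded for ordinary motherboard traces.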