These unnamed companies have collectively pre-purchased several exabytes of storage capacity.
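For a rough sense of scale, here is a back-of-envelope sketch; the exabyte total and per-drive capacity below are assumed figures for illustration, not reported numbers.

```python
# Back-of-envelope: how many hard drives would "several exabytes" be?
# All figures are assumptions for illustration, not reported numbers.

EXABYTE_IN_TB = 1_000_000        # 1 EB = 1,000,000 TB (decimal units)
purchased_eb = 3                 # assumed stand-in for "several exabytes"
drive_capacity_tb = 24           # assumed high-capacity enterprise HDD size

drives_needed = purchased_eb * EXABYTE_IN_TB / drive_capacity_tb
print(f"{purchased_eb} EB at {drive_capacity_tb} TB/drive ≈ {drives_needed:,.0f} drives")
# -> 3 EB at 24 TB/drive ≈ 125,000 drives
```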
So… what’s the point of this? Even the OpenAI RAM purchase was bizarre to me, since they’ve been able to train their LLMs just fine without gobbling up all the RAM; then it was the SSDs, and now hard disks? It’s not even just the write cache that sold out, the hard disks themselves did.
I don’t think there’s any untapped source of data for LLM training, is there? Datasets can’t get any larger, so I fail to see why this would be necessary.
Does anyone know the reason they’re buying up storage? No cynical doomerism, please.
Even aside from training datasets, as long as they keep hosting chats for people using these genAI services, their storage needs will keep growing over time as people upload and generate more pictures and videos.
I really hope all this urgency to grab as much hardware as possible is because the AI companies see the funding drying up soon and are trying to maximize what they can get out of it while it lasts. More likely, though, they’re just focusing on video now and want all of YouTube (and the like) in their training sets.
I’m not sure there are many options other than cynical doomerism for analysing this situation. My uneducated guess? They probably already ran out of real-world data and are now forced to produce ridiculous amounts of LLM-generated data to try to keep the training process going.
Other alternatives I can think of:
- they might be creating multiple model versions and keeping them around for iteration metrics
- they might need to ingest a lot more real-world data to continue. Since video has been such a focus lately, maybe they’re building huge video libraries for the models? Or maybe they’re generating their own high-detail real-world data.
- my most doomer take is that this is the beginning of a vastly deeper authoritarian online state, where a LOT more data gets collected from EVERYONE and fed both into new training data for models and into the knowledge bases that give models context to work on top of.
We’ve known for a while that they’re running out of training data, so it makes sense to either generate more data with models, create it at scale without AI, or collect even more data from everyone online in even more invasive ways. There’s literally no other reason I can think of to buy up the entire 2026 stock of WD drives two months into the year.