I’m not sure there are many options other than cynical doomerism for analysing this situation. My uneducated guess? They probably already ran out of real-world data, and are now forced to produce ridiculous amounts of LLM-generated data to keep the training process going.
Other alternatives I can think of:
- They might be creating multiple model versions and keeping them around for iteration metrics.
- They might need to ingest a lot more real-world data to continue. Since video has been such a focus as of late, maybe they’re building huge video libraries for the models? Or maybe they’re generating their own high-detail real-world data.
- My most doomer take is that this is the beginning of a vastly deeper authoritarian online state, where a LOT more data gets collected from EVERYONE and fed both into new training data for models and into knowledge bases that give those models context to work on top of.
We’ve known for a while that they’re running out of training data, so it makes a lot of sense to either generate more data with models, create it at scale without AI, or collect even more data in even more invasive ways from everyone online. There’s literally no other reason I can think of to buy up WD’s entire 2026 stock of drives two months into the year.