AI’s Memorization Crisis | Large language models don’t “learn”—they copy. And that could change everything for the tech industry.

silence7@slrpnk.net · 2 days ago

leftzero@lemmy.dbzer0.com · edited 2 days ago

The images in the article clearly show that they’re not storing the data. They’re storing enough information about the data to reconstruct a rough and mostly useless approximation of it. And they do so in such a way that the information about one piece of data can be combined with the information about another to produce a rough and mostly useless approximation of a combination of the two, which was not in the original dataset.

It’s like playing a telephone game with a description of an image, with the last person drawing the result.

The legal and ethical failure is in commercially using artists’ works (as training data) without permission, not in storing or even reproducing them, since the slop they produce is evidently an approximation and not the real thing.

TheBlackLounge@lemmy.zip · edited 1 day ago

The law disagrees. Compression has never been a valid argument. A crunchy 360p rip of a movie is a mostly useless approximation, but sharing it is definitely illegal.

Fun fact: you can use MPEG as a very decent perceptual image comparison algorithm (e.g. for facial recognition) by encoding the two images as a two-frame video and looking at the file size: the more alike the frames, the smaller the file. This works for mostly the same theoretical reasons as neural-network-based methods. Of course, MPEG was built by humans using legally obtained videos for evaluation, but it compresses without being able to reproduce any of those videos at all. So being able to reproduce the data a compressor was developed on is not a requirement for compression.
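
The file-size trick can be sketched without a video encoder. The snippet below is only an illustration under stated assumptions: it swaps MPEG for Python's zlib and swaps video frames for raw byte strings, and it uses the normalized compression distance as the score. The function names `compressed_size` and `ncd` are made up for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (stand-in for an MPEG file size)."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumption: zlib replaces the MPEG encoder, and concatenating a and b
    replaces the two-frame video. If b shares a lot with a, compressing them
    together costs little more than compressing one alone, so the score is low.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

Calling `ncd(x, y)` on two near-identical inputs should give a noticeably lower score than calling it on two unrelated inputs. zlib knows nothing about images, so this is far weaker than a real video codec for pictures, but the principle is the same: the compressor measures shared structure without storing anything it was tuned on.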