

Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.
This is false, by omission. Many of the AI companies have been downloading content through means other than scraping, such as BitTorrent, to access and compile copyrighted data that is not publicly scrape-able. That includes Meta, OpenAI, and Google.
The day we ban ingress of copyrighted works into whatever TF people want is the day the Internet stops working.
That is also false. Just because you don’t understand the legal distinction between scraping content to summarize in order to direct people to a site (there was already a lawsuit against Google that established this, as well as its boundaries), versus scraping content to generate a replacement that obviates the original content, doesn’t mean the law doesn’t understand it.
My comment right here is copyrighted. So is yours! I didn’t ask your permission before my Lemmy client downloaded it. I don’t need to ask your permission to use your comment however TF I want until I distribute it. That’s how the law works. That’s how it’s always worked.
The DMCA also protects the sites that host Lemmy instances from copyright lawsuits. Because without that, they’d be guilty of distribution of copyrighted works without the owner’s permission every damned day.
And none of this matters, because AI companies aren’t just reading content, they’re taking it and using it for commercial purposes.
Perhaps you are unaware, but (at least in the US) while it is legal for you to view a video on YouTube, if you download it for offline use that would constitute copyright infringement if the owner objects. The video being public does not grant anyone and everyone the right to use it however they wish. Ditto for something like making an mp3 of a song on Spotify using Audacity.
People who hate AI are supporting an argument that the movie and music studios made in the 90s: That “downloading is theft.” It is not! In fact, because that is not theft, we’re all able to enjoy the Internet every day.
First off, I do not hate AI, I use it myself (locally-run). My issue is with AI companies using it to generate profit at the expense of the actual creators whose art AI companies are trying to replace (i.e. not directing people to it, like search results).
Secondly, no one is arguing that it is theft; they are arguing that it is copyright infringement, which is what all of us are also subject to under the DMCA. So we’re actually arguing that AI companies should be held to the same standard that we are.
Also, note that AI companies have argued in court (in the case brought by Stephen King et al.) that their use of copyrighted material shouldn’t fall under the DMCA at all (i.e. that it’s not about Fair Use), on the grounds that AI training is not the ‘intended use’ of the source material and so does not eat into its commercial use. That argument leaves copyright infringement liability intact for the rest of us, while solely exempting them from liability. No thanks.
Luckily, them arguing they’re apart and separate from Fair Use also means that this can be rejected without affecting Fair Use! Double-win!








Might have to break this into a couple of replies, because this is a LOT to work through.
Meta is being sued by several groups over this, including porn companies who caught them torrenting. Their defense has been to claim that the 2,400 videos downloaded to their corporate IP space were downloaded for “personal use”.
OpenAI is also being accused of pirating books (not scraping), and it has been unable to prove legal procurement of them.
Interestingly, it’s actually Meta’s most recent partial win that explicitly helps disprove this. Apart from just generally ripping into Meta for clearly infringing copyright, the judge wrote (page 3)
So yes, Fair Use absolutely does take into account market harms.
I never asserted this, and I am well aware of the distinction between the copyright infringement that involved illegally obtaining copyrighted material and the AI training itself. You seem to be bringing a whole host of objections you get from others and applying them to me.
I think it’s perfectly reasonable to require that AI companies legally acquire a copy of any copyrighted material. Just as it would not be legal for me to torrent a movie even if I wanted to do something transformative with it, AI companies should not be able to do so either.