AI’s Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source

yoasif@fedia.io · 26 days ago

calcopiritus@lemmy.world · 22 days ago

No, you can’t. In the same way, you can’t watch a Mickey Mouse movie and then draw your own Mickey Mouse from what you recall of the movie.

Copying can be done manually, from memory; it doesn’t need to be a 1:1 match. Otherwise you could take a GPL-licensed file, change the name of one variable, and make it proprietary code.

LLMs are just fancy lossy compression algorithms you can interact with. If I save a Netflix series on my hard drive and then re-encode it, it is still protected by copyright, even if the bytes don’t match.

atzanteol@sh.itjust.works · 22 days ago

No, you can’t. In the same way, you can’t watch a Mickey Mouse movie and then draw your own Mickey Mouse from what you recall of the movie.

Yes, I can. I can create a legally distinct mouse-based cartoon.

You’re right that if an LLM gives you copyrighted code, that would be a potential problem. But the article saying that it somehow “strips the code of any copyright” is ridiculous.

calcopiritus@lemmy.world · 21 days ago

Is there anything in an LLM’s code preventing it from emitting copyrighted code? Nobody outside the LLM companies knows, but I’m willing to bet there isn’t.

Therefore, LLMs DO emit copyrighted code, because they are trained on copyrighted code and because of their statistical nature.

Does the LLM tell its users that the code it output is copyrighted? I’m not aware of any instance of that happening. In fact, LLMs are probably programmed not to put a copyright header at the start of files, even if the code they “learnt” from had one. So in the literal sense, they are stripping the code of copyright notices.

Does the justice system prosecute LLMs for outputting copyrighted code? No, it doesn’t.

I don’t know what definition you use for “strip X of copyright”, but I’d say that if you can copy something openly and nobody does anything about it, you are stripping its copyright.

atzanteol@sh.itjust.works · 21 days ago

I don’t know what definition you use for “strip X of copyright”, but I’d say that if you can copy something openly and nobody does anything about it, you are stripping its copyright.

Just what was stated in the fucking article:

By incorporating copyleft data into their models, the LLMs do share the work - but not alike. Instead, the AI strips the work of its provenance and transforms it to be copyright free.

That’s bullshit.