TL;DR: The big tech AI company LLMs have gobbled up all of our data, but the damage they have done to open source and free culture communities are particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.

  • calcopiritus@lemmy.world
    link
    fedilink
    arrow-up
    1
    ·
    8 hours ago

    Is there anything in the LLMs code preventing it from emitting copyrighted code? Nobody outside LLM companies know, but I’m willing to bet there isn’t.

    Therefore, LLMs DO emit copyrighted code. Due to them being trained on copyrighted code and the statistical nature of LLMs.

    Does the LLM tell its users that the code it outputted has copyright? I’m not aware of any instance of that happening. In fact, LLMs are probably programmed to not put a copyright header at the start of files, even if the code it “learnt” from had them. So in the literal sense, it is stripping the code of copyright notices.

    Does the justice system prosecute LLMs for outputting copyrighted code? No it doesn’t.

    I don’t know what definition you use for “strip X of copyright” but I’d say if you can copy something openly and nobody does anything against it, you are stripping it’s copyright.

    • atzanteol@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      1
      ·
      2 hours ago

      I don’t know what definition you use for “strip X of copyright” but I’d say if you can copy something openly and nobody does anything against it, you are stripping it’s copyright.

      Just what was stated in the fucking article

      By incorporating copyleft data into their models, the LLMs do share the work - but not alike. Instead, the AI strips the work of its provenance and transforms it to be copyright free.

      That’s bullshit.