TL;DR: The LLMs of Big Tech AI companies have gobbled up all of our data, but the damage they have done to open source and free culture communities is particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.

  • yoasif@fedia.io (OP)
    5 days ago

    Do you understand how free software works? Did you read the post? I’d love to clarify, but I’m not going to rewrite the article.

    • atzanteol@sh.itjust.works
      4 days ago

      Yes. And this is kinda hand-wavy bullshit.

      > By incorporating copyleft data into their models, the LLMs do share the work - but not alike. Instead, the AI strips the work of its provenance and transforms it to be copyright free.

      That’s not how it works. Your code is not “incorporated” into the model in any recognizable form. Training produces a model of vectors (weights). There isn’t a file with your for loop in there.

      I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license. So can an LLM.

      • calcopiritus@lemmy.world
        23 hours ago

        No, you can’t. In the same way, you can’t watch a Mickey Mouse movie and then draw your own Mickey Mouse from what you recall of the movie.

        Copying can be done manually from memory; it doesn’t need to be a 1:1 match. Otherwise you could take a GPL-licensed file, change the name of one variable, and make it proprietary code.
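        To make the renaming point concrete, here is a minimal Python sketch, assuming only the standard library `tokenize` module: it canonicalizes identifiers to `v0`, `v1`, and so on in first-seen order, so a consistently renamed copy yields the same token sequence as the original. `canonical_tokens` and both snippets are hypothetical; this is a crude clone-detection trick, not a legal test for copying.

```python
import io
import keyword
import token
import tokenize

# Token types that carry no meaning for this comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens kept by name so block structure still counts.
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def canonical_tokens(source: str) -> list:
    """Tokenize Python source, renaming identifiers to v0, v1, ... in first-seen order."""
    names = {}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in LAYOUT:
            out.append(token.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # A consistent rename maps to the same placeholder either way.
            names.setdefault(tok.string, f"v{len(names)}")
            out.append(names[tok.string])
        else:
            out.append(tok.string)
    return out


original = """
def total(items):
    acc = 0
    for x in items:
        acc += x
    return acc
"""

# Same code with one variable renamed and a comment added.
renamed = """
def total(items):
    running_sum = 0  # my own code now?
    for x in items:
        running_sum += x
    return running_sum
"""

print(original == renamed)                                      # False: the text differs
print(canonical_tokens(original) == canonical_tokens(renamed))  # True: the structure does not
```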

        LLMs are just fancy lossy compression algorithms you can interact with. If I save a Netflix series on my hard drive and then re-encode it, it is still protected by copyright, even if the bytes don’t match.

        • atzanteol@sh.itjust.works
          9 hours ago

          > No, you can’t. In the same way, you can’t watch a Mickey Mouse movie and then draw your own Mickey Mouse from what you recall of the movie.

          Yes, I can. I can create a legally distinct mouse-based cartoon.

          You’re right that if an LLM gives you copyrighted code, that would be a potential problem. But the article saying that it somehow “strips the code of any copyright” is ridiculous.

          • calcopiritus@lemmy.world
            6 hours ago

            Is there anything in an LLM’s code preventing it from emitting copyrighted code? Nobody outside the LLM companies knows, but I’m willing to bet there isn’t.

            Therefore, LLMs DO emit copyrighted code, because they are trained on copyrighted code and because of their statistical nature.
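            A toy sketch of the “statistical nature” point, assuming a character-level n-gram model in Python: the model keeps only counts of which character follows each 12-character context, so no copy of the file is stored as such, yet greedy decoding reproduces a training snippet verbatim whenever every context in it is unique. `train`, `generate`, `N`, and the snippet are all hypothetical; real LLMs are vastly more complex, and this does not show what any particular model emits.

```python
from collections import Counter, defaultdict

N = 12  # context length in characters (hypothetical choice)


def train(text: str, n: int = N) -> dict:
    """Count which character follows each n-character context in `text`."""
    model = defaultdict(Counter)
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model


def generate(model: dict, seed: str, max_chars: int, n: int = N) -> str:
    """Greedy decoding: repeatedly append the most frequent next character."""
    out = seed
    for _ in range(max_chars):
        context = out[-n:]
        if context not in model:  # unseen context: nothing to predict
            break
        out += model[context].most_common(1)[0][0]
    return out


snippet = "int add(int a, int b) { return a + b; }"
model = train(snippet)
print(generate(model, snippet[:N], max_chars=200) == snippet)  # True
```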

            Does the LLM tell its users that the code it output is under copyright? I’m not aware of any instance of that happening. In fact, LLMs are probably programmed not to put a copyright header at the start of files, even if the code they “learnt” from had one. So, in the literal sense, it is stripping the code of its copyright notices.

            Does the justice system prosecute LLMs for outputting copyrighted code? No it doesn’t.

            I don’t know what definition you use for “strip X of copyright”, but I’d say if you can copy something openly and nobody does anything against it, you are stripping its copyright.

            • atzanteol@sh.itjust.works
              25 minutes ago

              > I don’t know what definition you use for “strip X of copyright”, but I’d say if you can copy something openly and nobody does anything against it, you are stripping its copyright.

              Just what was stated in the fucking article:

              > By incorporating copyleft data into their models, the LLMs do share the work - but not alike. Instead, the AI strips the work of its provenance and transforms it to be copyright free.

              That’s bullshit.

      • yoasif@fedia.io (OP)
        4 days ago

        > I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license.

        Why is clean-room design a thing, then?

        • atzanteol@sh.itjust.works
          4 days ago

          > create my own code with the knowledge gained from your code

          Not copy your code. Use it to learn what algorithms it uses and ideas on how to implement it.

      • VoterFrog@lemmy.world
        4 days ago

        > I can read your code, learn from it, and create my own code with the knowledge gained from your code without violating an OSS license. So can an LLM.

        Not even just an OSS license. No license backed by law is any stronger than copyright. And you are allowed to learn from or statistically analyze even fully copyrighted work.

        Copyright is just a lot more permissive than I think many people realize, and a lot of good comes from that. It has enabled things like API emulation, reverse engineering, and being able to leave our programming jobs to go work somewhere else without getting sued.

    • atzanteol@sh.itjust.works
      4 days ago

      Also - this conclusion is ridiculous:

      > By incorporating copyleft data into their models, the LLMs do share the work - but not alike. Instead, the AI strips the work of its provenance and transforms it to be copyright free.

      That is absolutely not true. It doesn’t remove the copyright from the original work and no court has ruled as such.

      If I wrote a “random code generator” that just happened to create the source code for Microsoft Windows in entirety it wouldn’t strip Microsoft of its copyright.