• mindbleach@sh.itjust.works
    edit-2 · 9 hours ago

    You think these billion-dollar companies keep hyper-illegal images around, just to train their hideously expensive models to do the things they do not want those models to do?

    Like combining unrelated concepts isn’t the whole fucking point?

    • CerebralHawks@lemmy.dbzer0.com
      3 hours ago

      Yes, and they’ve been proven to do so. Meta (Facebook) recently made the news for pirating a bunch of ebooks to train its AI.

      Anna’s Archive, a site associated with training AI, recently scraped some 99.9% of Spotify’s songs. They say at some point they will make torrents so common people can download it, but for now they’re using it to teach AI to copy music. (Note: Spotify streams at lower quality than other currently available sources, so AA will offer nothing new if/when they ever do release these torrents.)

      So, yes, that is exactly what they’re doing. They are training their models on all the data, not just all the legal data.

    • mcv@lemmy.zip
      link
      fedilink
      English
      arrow-up
      9
      ·
      7 hours ago

      No, I think these billion-dollar companies are incredibly sloppy about curating the content they steal to train their systems on.

    • stray@pawb.social
      8 hours ago

      It literally can’t combine unrelated concepts, though. Not too long ago there was the issue where one model (DALL-E?) couldn’t make a picture of a full glass of wine, because every glass of wine it had been trained on was half full; that’s generally how we prefer to photograph wine. It has no concept of “full” the way actual intelligences do, so it couldn’t connect the dots. It had to be trained on actual full glasses of wine to gain the ability to produce them itself.