Father, Hacker (Information Security Professional), Open Source Software Developer, Inventor, and 3D printing enthusiast

  • 1 Post
  • 56 Comments
Joined 2 years ago
Cake day: June 23rd, 2023


  • To be fair, that’s what an AI video generator thinks an FPS is. That’s not the same thing as AI-assisted coding. Though it’s still hilarious! “Press F to pay respects” 🤣

    For reference, using AI to automate your QA isn’t a bad idea. There’s a bunch of ways to handle such things but one of the more interesting ones is to pit AIs against each other. Not in the game, but in their reports… You tell AI to perform some action and generate a report about it while telling another AI to be extremely skeptical about the first AI’s reports and to reject anything that doesn’t meet some minimum standard.

    That’s what they’re doing over at Anthropic (internally) with Claude Code QA tasks and it’s super fascinating! Heard them talk about that setup on a podcast recently and it kinda blew my mind… They have more than just two “Claudes” pitted against each other too: in the example they talked about, they had four. One generating PRs, another reviewing/running tests, another one checking the work of the testing Claude, and finally a Claude set up to perform critical security reviews of the final PRs.
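
    To make the “pit AIs against each other” idea concrete, here’s a rough sketch of the worker/skeptic loop. Everything in it is made up for illustration: `Report`, `Verdict`, `skeptical_review`, `run_with_review`, and `fake_worker` are hypothetical names, the “AIs” are plain Python functions standing in for model calls, and the “minimum standard” is just “at least one piece of evidence”. It is not how Anthropic (or anyone else) actually wires this up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Report:
    task: str
    claims: list[str]
    evidence: list[str]


@dataclass
class Verdict:
    accepted: bool
    reasons: list[str]


# Hypothetical interfaces: a worker turns (task, feedback) into a report,
# a reviewer turns a report into a verdict. Real ones would call a model.
Worker = Callable[[str, list[str]], Report]
Reviewer = Callable[[Report], Verdict]


def skeptical_review(report: Report, min_evidence: int = 1) -> Verdict:
    """Stand-in for the skeptical AI: reject anything below a minimum standard."""
    reasons = []
    if not report.claims:
        reasons.append("report makes no claims")
    if len(report.evidence) < min_evidence:
        reasons.append(f"need at least {min_evidence} piece(s) of evidence")
    return Verdict(accepted=not reasons, reasons=reasons)


def run_with_review(task: str, worker: Worker, reviewer: Reviewer,
                    max_rounds: int = 3) -> tuple[Report, int]:
    """Re-run the worker with the reviewer's objections until a report is accepted."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        report = worker(task, feedback)
        verdict = reviewer(report)
        if verdict.accepted:
            return report, round_no
        feedback = verdict.reasons  # hand the objections back to the worker
    raise RuntimeError(f"no acceptable report after {max_rounds} rounds")


def fake_worker(task: str, feedback: list[str]) -> Report:
    """Toy worker: only attaches evidence once it has been pushed back on."""
    evidence = ["test log: 12 passed"] if feedback else []
    return Report(task=task, claims=["login flow works"], evidence=evidence)


report, rounds = run_with_review("check login", fake_worker, skeptical_review)
```

    With the toy worker above, the first report gets rejected for having no evidence and the second one goes through. A four-stage setup like the one described would chain more reviewers in the same way, each one only seeing the previous stage’s report.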






  • Many of the AI companies have been downloading content through means other than scraping, such as bittorrent, to access and compile copyrighted data that is not publicly scrape-able. That includes Meta, OpenAI, and Google.

    Anthropic is the only company to have admitted publicly to doing this. They were sued and settled out of court. Google and OpenAI have had no such accusations as far as I’m aware. Furthermore, Google had the gigantic book scanning project where it was determined in court that the act of scanning as many fucking books as you want is perfectly legal (fair use). Read all about it: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.

    In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeal upheld the District Court’s summary judgment in October 2015, ruling Google’s “project provides a public service without violating intellectual property law.” The U.S. Supreme Court subsequently denied a petition to hear the case.

    You say:

    That is also false. Just because you don’t understand the legal distinction between scraping content to summarize in order to direct people to a site (there was already a lawsuit against Google that established this, as well as its boundaries), versus scraping content to generate a replacement that obviates the original content, doesn’t mean the law doesn’t understand it.

    There is no such legal distinction. Scraping content is legal no matter WTF you plan to do with it. This has been settled in court many, many times. Here’s some court cases for you to learn the actual legality of scraping and storing of said scraped data:

    To summarize all this: You are 100% wrong. I have cited my sources. I was there (“3000 years ago…”) when all this went down. Pepperidge Farm remembers.

    You say:

    And none of this matters, because AI companies aren’t just reading content, they’re taking it and using it for commercial purposes.

    This is a common misconception of copyright law: Remember Napster? They were sued and argued in court that because users don’t profit from sharing songs with their friends, it is legal. The court rejected this argument: https://en.wikipedia.org/wiki/A%26M_Records,_Inc._v._Napster,_Inc. See also: https://en.wikipedia.org/wiki/Capitol_Records,_Inc._v._Thomas-Rasset and https://en.wikipedia.org/wiki/Harper_%26_Row_v._Nation_Enterprises and https://en.wikipedia.org/wiki/American_Geophysical_Union_v._Texaco,_Inc. where the courts all ruled the same way.

    You say:

    Perhaps you are unaware, but (at least in the US) while it is legal for you to view a video on YouTube, if you download it for offline use that would constitute copyright infringement if the owner objects. The video being public does not grant anyone and everyone the right to use it however they wish. Ditto for something like making an mp3 of a song on Spotify using Audacity.

    Downloading a YouTube video for offline use is legal… depending on the purpose. This is one of those very, very nuanced areas of copyright law where fair use intersects with the DMCA and also intersects with the CFAA. The DMCA states, “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.” Since YouTube videos have some technical measures to prevent copying (depending on the resolution and platform!), it is illegal to circumvent them. However, the Librarian of Congress can grant exceptions to this rule and has done so for many situations. For example, archiving (https://www.arl.org/news/librarian-of-congress-expands-dmca-exemption-for-text-and-data-mining/) which is just plain wacky, IMHO.

    Regardless, if YouTube didn’t put an anti-circumvention mechanism into their videos it would be perfectly legal to download the videos. Just like it’s legal to record TV shows with a VCR. That was the ruling in Sony Corp. of America v. Universal City Studios (already cited). There’s no reason why it wouldn’t still apply to YouTube videos. The fact that no one has been sued for doing this since then (that I could find) seems to indicate that this is a very settled thing.

    You say:

    no one is arguing that it is theft, they are arguing that it is copyright infringement, which is what all of us are also subject to under the DMCA. So we’re actually arguing that AI companies should be held to the same standard that we are.

    No. Fuck no. A shitton of people are saying it’s “theft”. Have you been on the Internet recently? LOL! I see it every damned day and I’m sick of it. I keep repeating, “it’s not theft, it’s copyright infringement” and I get downvoted for “being pedantic”. Like it’s not a very fucking important distinction!

    …but also: What an AI model does isn’t copyright infringement (usually). You ask it to generate an image or some text and it just does what you ask it to do. The fact that it’s possible for it to infringe copyright shouldn’t matter because it’s just a tool like a Xerox machine/copier. It has already been ruled fair use for an AI company to train their models with copyrighted works (great summary of that here: https://www.debevoise.com/insights/publications/2025/06/anthropic-and-meta-decisions-on-fair-use ). Despite these TWO court rulings, people are still saying that training AI models is both “theft” and somehow “illegal”. We’re already past that.

    AI models are terrible copyright violators! Everything they generate—at best—can only ever be, “kinda sorta like” a copyrighted work. You can get closer and closer if you get clever with prompts and tell the model to generate say, 10000 images of the same thing. Then you can look at your prayers to the RNG gods and say, “Aha! Look! This image looks very very similar to Indiana Jones!”

    You say:

    Also, note that AI companies have argued in court (in the case brought by Steven King et al) that their use of copyrighted material shouldn’t fall under DMCA at all (i.e. arguing that it’s not about Fair Use), because their argument is that AI training is not the ‘intended use’ of the source material, so this is not eating into that commercial use. That argument leaves copyright infringement liability intact for the rest of us, while solely exempting them from liability. No thanks.

    Luckily, them arguing they’re apart and separate from Fair Use also means that this can be rejected without affecting Fair Use! Double-win!

    Where TF did you see this? I did some searching and I cannot see anything suggesting that the AI companies have rejected any kind of DMCA protection.


  • If you believe AI companies should NOT be allowed to train AI with copyrighted works you should stop using Internet search engines. Because the same rules that allow Google to train their search with everyone’s copyrighted websites are what allow the AI companies to train their models.

    Every day, Google and others download huge swaths of the Internet directly into their servers and nobody bats an eye. An AI company does the same thing and now people say that’s copyright infringement.

    What the fuck! I don’t get it. It’s the exact same thing. Why is an AI company doing that any different‽

    It’d be one thing if people were bitching about just the output of AI models but they’re not. They’re bitching about the ingress step!

    The day we ban ingress of copyrighted works into whatever TF people want is the day the Internet stops working.

    My comment right here is copyrighted. So is yours! I didn’t ask your permission before my Lemmy client downloaded it. I don’t need to ask your permission to use your comment however TF I want until I distribute it. That’s how the law works. That’s how it’s always worked.

    The DMCA also protects the sites that host Lemmy instances from copyright lawsuits. Because without that, they’d be guilty of distribution of copyrighted works without the owner’s permission every damned day.

    People who hate AI are supporting an argument that the movie and music studios made in the 90s: That “downloading is theft.” It is not! In fact, because that is not theft, we’re all able to enjoy the Internet every day.

    Ever since the Berne convention, literally everything is copyrighted. Everything.







  • Imagine you have a magic box that can generate any video you want. Some people ask it to generate fan fiction-like videos, some ask it to generate meme-like videos, and a whole lot of people ask it to generate porn.

    Then there’s a few people that ask it to generate videos using trademarked and copyrighted stuff. It does what the user asks because there’s no way for it to know what is and isn’t copyrighted. What is and isn’t parody or protected fair use.

    It’s just a magic box that generates videos… Whatever the human asks for.

    This makes some people and companies very, very upset. They sue the maker of the magic box, saying it’s copying their works. They start PR campaigns, painting the magic box in a bad light. They might even use the magic box quite a lot themselves but it doesn’t matter. To them, the magic box is pure evil; indirectly preventing them from gaining more profit… Somehow. Just like Sony was sued for making a machine that let people copy whatever videos they wanted (https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios%2C_Inc.).

    Before long, other companies make their own magic boxes and then everyday people get access to their own, personal magic boxes that no one can see the output from unless they share.

    Why is this different from the Sony vs Universal situation? The AI magic box is actually worse at copying videos than a VCR.

    When a person copies—and then distributes—a movie do we say the maker of the VCR/DVD burner/computer is at fault for allowing this to happen? No. It’s the person that distributed the copyrighted work.






  • For reference, every AI image model uses ImageNet (as far as I know) which is just a big database of publicly accessible URLs and metadata (classification info like, “bird” <coordinates in the image>).

    The “big AI” companies like Meta, Google, and OpenAI/Microsoft have access to additional image data sets that are 100% proprietary. But what’s interesting is that the image models that are constructed from just ImageNet (and other open sources) are better! They’re superior in just about every way!

    Compare what you get from say, ChatGPT (DALL-E 3) with a FLUX model you can download from Civitai… you’ll get such superior results from FLUX it’s like night and day! Not only that, but you have an enormous plethora of LoRAs to choose from to get exactly the type of image you want.

    What we’re missing is the same sort of open data sets for LLMs. Universities have access to some stuff but even that is licensed.
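
    If “a big database of URLs and metadata” sounds abstract, here’s roughly what one entry in that kind of data set looks like. This is only a sketch: `BoundingBox`, `ImageRecord`, and `labels_index` are names I made up, the URLs are placeholders, and the real ImageNet file formats are different. The point is just that the data set stores pointers and labels, not the images themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Pixel coordinates of the labeled object inside the image (hypothetical layout).
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        return max(0, self.x_max - self.x_min) * max(0, self.y_max - self.y_min)


@dataclass(frozen=True)
class ImageRecord:
    url: str           # where the image is publicly hosted
    label: str         # classification info, e.g. "bird"
    box: BoundingBox   # where in the image the labeled thing is


def labels_index(records: list[ImageRecord]) -> dict[str, list[str]]:
    """Group image URLs by label, which is the lookup a training pipeline needs."""
    index: dict[str, list[str]] = {}
    for record in records:
        index.setdefault(record.label, []).append(record.url)
    return index


# Placeholder data, not real ImageNet entries.
records = [
    ImageRecord("https://example.com/a.jpg", "bird", BoundingBox(10, 20, 110, 220)),
    ImageRecord("https://example.com/b.jpg", "bird", BoundingBox(0, 0, 50, 50)),
    ImageRecord("https://example.com/c.jpg", "dog", BoundingBox(5, 5, 25, 45)),
]
index = labels_index(records)
```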


  • Listen, if someone gets physical access to a device in your home that’s connected to your Wi-Fi, all bets are off. Having a password to gain access via adb is irrelevant. The attack scenario you describe is absurd: If someone’s in a celebrity’s home they’re not going to go after the robot vacuum when the thermostat, tablets, computers, TV, router, access point, etc. are right there.

    If an attacker is physically in the home, the owner has already been compromised. The fact that the owner of a device can open it up and gain root is irrelevant.

    Furthermore, since the owner has root they can add a password themselves! That’s something they can’t do with a lot of other things in their home that they supposedly “own” but don’t have that kind of control over (and which I’m 100% certain have vulnerabilities).