We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB). It’s the world’s first “preservation archive” for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.
I’d strongly suggest they take all the cheap AI-generated music out of this “backup” and save themselves some space.
I’m not sure how they would go about doing that at scale without also getting some false positives and removing human-made music too.
You could cut off your search around the time AI tracks started to appear. Not sure when that was, maybe 2023. You’d miss a lot of recent stuff, but you’d filter out a lot of spam too
I see your point, but as you say, there would still be the tradeoff of missing more recent stuff. That might only mean missing a couple of years’ worth of music now, but AI isn’t going away any time soon, so an increasing amount of human-made music would go unarchived. One of the things I like about Anna’s Archive is that they seem to look at this problem in a long-term, information-infrastructure kind of way, so I imagine they wouldn’t be keen on stopping the archive at 2023.
It seems they’ve opted for a different tradeoff instead: lower-popularity songs are archived at a lower bitrate, and even the higher-popularity stuff has some compression. Some archives go for quality and thus prioritise high-quality FLACs, so Anna’s Archive is aiming to fill a different niche. I can respect that.
Do you have any numbers on the AI share? I doubt it’s more than 2%, so I assume you are just virtue signalling on a completely unrelated topic here :-)
AI slop can be made and distributed in ginormous numbers. I wouldn’t be surprised if at least 3/4 of uploads from the past 2 years are AI.
See, that’s 75% of the output of 2 years vs 100 years of music production. Also, popularity was factored in.
A bot could put 100 AI generated tracks on Spotify per hour. 50 bots doing the same is 120,000 tracks per day.
can you run me the numbers for 200 bots?
120,000 x 4 = 480,000 tracks per day
This is easy to do with a calculator.
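Or with a few lines of Python, if you’d rather script it. This is only a sketch of the back-of-envelope arithmetic from the comments above: the rate of 100 tracks per bot per hour is the hypothetical figure from this thread, not a measured number, and `tracks_per_day` is just an illustrative helper name.

```python
def tracks_per_day(bots: int, tracks_per_bot_per_hour: int = 100, hours_per_day: int = 24) -> int:
    """Daily upload count, assuming every bot runs nonstop at the same rate."""
    return bots * tracks_per_bot_per_hour * hours_per_day


# Hypothetical rate from the thread: 100 tracks per bot per hour.
for bots in (50, 200):
    print(f"{bots} bots: {tracks_per_day(bots):,} tracks per day")
```

The count scales linearly with the number of bots, so 200 bots is simply four times the 50-bot figure.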