Edit to add: I also found someone who recorded a voice chat of the same thing. This isn’t that someone uploaded a song, or that AI didn’t actually process the file. These models really are this sycophantic:
Edit to add: I also found someone who recorded a voice chat of the same thing. This isn’t that someone uploaded a song, or that AI didn’t actually process the file. These models really are this sycophantic:
I wonder if they actually linked it to an algorithm that analyses the sound that is looking for certain patterns or something, and that is why you get the “atmosphere piece” thing.
According to their tech/marketing papers, it’s supposedly multi-modal, encoding audio to tokens.
Jesus, what a complety fucking useless waste of resources
yeah but that doesn’t mean anything, does it? I don’t think they just tokenize the raw audio, that wouldn’t make sense, right?
I mean, you could. Just encode 100ms chunks or whatever into tokens then push them through the same model. I’m pretty sure that’s what the claim to do (though with MoE/routing now, maybe).