Are there currently usable FOSS tools for speech-to-text conversion (transcription) under GNU/Linux? The purpose is transcribing things like downloaded podcasts. I don’t need or want any kind of GUI tool, just a CLI program that takes an audio file and converts it to text. I know there are various proprietary systems that do this, such as YouTube’s transcription. One of my questions is whether the free tools out there are anywhere near as good. I’m not too concerned about the input format (I can convert with ffmpeg) or about CPU time, within reason (I don’t mind letting my server spend all night crunching a 1-hour audio file). I’d prefer not to require a GPU, but if one helps a lot, I can get hold of one as needed.
To be clear, the question is about speech-to-text (STT). I’m not asking about the opposite, text-to-speech (TTS). For some reason people often confuse the two.
Thanks!


On my potato-powered laptop (a mid-range ThinkPad from 2018) it does not run in real time on the CPU, particularly if you want to use a decent model, which I need because of my foreign accent.
I would say the quality generally exceeds YouTube’s, even with the worst model.
Thanks. My old i5-something server is probably in the same speed range as your laptop. It’s good to hear about the transcription quality. If conversion is slower than real time, I can live with it. I can just throw a bunch of files at it and let it run overnight. Faster is always nicer of course.
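For the overnight batch run, a rough sketch along these lines might do. It assumes the input files are `.mp3`, that the transcriber wants 16 kHz mono WAV (common for speech models, but check your tool), and that the transcriber is a CLI you supply yourself as a command template. The function names and the `{wav}` placeholder are made up for illustration; only the ffmpeg flags are real.

```python
import subprocess
import sys
from pathlib import Path


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call converting src to 16 kHz mono WAV (assumed input format)."""
    return ["ffmpeg", "-nostdin", "-y", "-i", str(src),
            "-ac", "1", "-ar", "16000", str(dst)]


def transcribe_command(template: list[str], wav: Path) -> list[str]:
    """Fill the hypothetical {wav} placeholder in a user-supplied transcriber command."""
    return [part.replace("{wav}", str(wav)) for part in template]


def run_batch(audio_dir: Path, template: list[str]) -> None:
    """Convert and transcribe every .mp3 in audio_dir, one file at a time."""
    for src in sorted(audio_dir.glob("*.mp3")):
        wav = src.with_suffix(".wav")
        # check=True stops the batch on the first failing file
        subprocess.run(ffmpeg_command(src, wav), check=True)
        subprocess.run(transcribe_command(template, wav), check=True)


if __name__ == "__main__" and len(sys.argv) > 2:
    # First argument: directory of audio files; the rest: transcriber command template
    run_batch(Path(sys.argv[1]), sys.argv[2:])
```

You would start it with the podcast directory followed by whatever command line your transcriber expects, putting `{wav}` where the input file goes, and leave it running under `nohup` or in a `tmux` session.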