The venerable eSpeak is a mainstay of Linux distributions. It is a clever Text-To-Speech (TTS) program that reads the written word aloud in a phenomenally wide variety of languages and accents. The only problem is that it sounds robotic. It has the same vocal fidelity as a 1980s Speak & Spell toy: monotonous, clipped, and painful to listen to. For some people, this is a feature, not a…



Piper is VERY lightweight, kinda like eSpeak. I got it working on a Raspberry Pi 3 once, and it's good enough for my phone.
There's more human-sounding stuff out there, but it relies on very computationally intensive models.
I just listened to the samples and they seem a bit hit-or-miss. Some of them still stumble over words, have stilted pacing, or just sound off in some other way (raspiness, speed). The quality seems to vary more from voice to voice than with the quality setting.
I mean, I’m sure some of these voices are fine, and probably better than other AI models in terms of performance, though they are a bit uncanny-valley. I still think a voice meant to sound robotic (while still having personality) is probably an easier target. I didn’t notice anything like that in the samples, though I did see a couple of YouTube videos with a fairly accurate-sounding GLaDOS voice that mention Piper (though I know such a voice likely wouldn’t be front-and-center due to licensing).