So yes, espeak exists and still sounds terrible, even worse than picoTTS (last update 4 yrs ago?). So what else is there? I looked at mimic3, and its page says the project is dead and one should go for piper instead: https://github.com/MycroftAI/mimic3. Following the link to piper, I get: https://github.com/rhasspy/piper "This repository was archived by the owner on Oct 6, 2025. It is now read-only."

OK, so coqui? https://github.com/coqui-ai/TTS No update in over 12 months… how bad can it be? https://coqui.ai/ … great, it is a gambling page now.

So, what are you using? gTTS is not offline.

  • solrize@lemmy.ml · English · 1 upvote · 1 hour ago

    Tbh I generally prefer that robot voices actually sound like robots. I don’t want human-sounding TTS since I don’t like anthropomorphized machines. So I’d stay with something low-tech.

  • Konraddo@lemmy.world · English · 4 upvotes · 2 hours ago

    Try alltalk_tts v2. One of the features is you can provide an audio sample and the AI will imitate the voice. The overall quality is pretty good, if you choose a larger model and let it run.

  • NarrativeBear@lemmy.world · English · 3 upvotes · edited 2 hours ago

    Faster Whisper could be an option; there are various GUI options available as well.

    https://github.com/SYSTRAN/faster-whisper

    And if you are looking for something that you can “just install”, I recommend Balabolka. The voices are natural, and you can use some of the Windows built-in voices to make it sound more natural.

    https://www.cross-plus-a.com/balabolka.htm

    Make Azure natural TTS voices accessible to any SAPI 5-compatible application.

    https://github.com/gexgd0419/NaturalVoiceSAPIAdapter

  • damnthefilibuster@lemmy.world · English · 6 upvotes · 3 hours ago

    Kokoro is your best bet right now. It works wonderfully even in a Docker container with no GPU. There are others, but I don’t have the list right now. Will throw another update on here when I do.

    The rhasspy guy was very invested in Coqui. He built a lot of his own stuff, for his home automation and such. But Coqui was superior, so he started spending time on that.

    Unfortunately, the coqui team (based out of Mozilla) was very distracted and didn’t ship a lot of stuff on time or at all. It doesn’t even have basic stuff like SSML support right now, if I recall correctly. So the rhasspy guy also lost steam.

    Of course, with the OpenAI model of audio generation, you’re expected to not use SSML at all and just use the black box API to get “good enough” results. That really sucks.

    Oh, I just remembered which other one I wanted to mention: someone has built an open source version of NotebookLM, complete with multi-voice support. But it requires a GPU, I believe. Do what you will with that. I’ll add a link if I find it.

    I prefer kokoro because it’s really solid and works really well on CPU.

  • Iced Raktajino@startrek.website · English · 12 upvotes · 5 hours ago

    https://github.com/marytts/marytts

    I’ve used MaryTTS semi-recently. It’s older but works well enough for my use cases. I have it running on a server (locally), and my endpoints make a call to it and play back the returned audio file.

    On Android, I use SherpaTTS which has good voices, but I’m not aware of a desktop/Linux option. It mentions using voices from Coqui which you linked, so I would guess that would be the way to go for desktop.
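
    The MaryTTS setup described above (a local server that endpoints call to get an audio file back) can be sketched roughly as follows. This is only an illustration, not the commenter's actual code: it assumes the server listens on MaryTTS's usual default port 59125 and is queried through its `/process` HTTP endpoint, and the helper names `build_marytts_url` and `synthesize_to_file` are made up for this example.

```python
import urllib.parse
import urllib.request

# Assumptions: a MaryTTS server on this machine, on its usual default port.
MARYTTS_HOST = "localhost"
MARYTTS_PORT = 59125


def build_marytts_url(text, host=MARYTTS_HOST, port=MARYTTS_PORT, locale="en_US"):
    """Build a /process request URL asking MaryTTS for a WAV rendering of text."""
    query = urllib.parse.urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    })
    return f"http://{host}:{port}/process?{query}"


def synthesize_to_file(text, path, **kwargs):
    """Fetch the audio from the server and write it to path (hypothetical helper)."""
    with urllib.request.urlopen(build_marytts_url(text, **kwargs), timeout=30) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)  # number of bytes written
```

    With a server running, something like `synthesize_to_file("Hello from the local server.", "hello.wav")` should leave a WAV file that any audio player can play back.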

  • eodur@piefed.social · English · 3 upvotes · 3 hours ago

    It really depends on what you want to do with it. I run wyoming-piper as part of my Home Assistant deployment and it’s been rock solid. The Wyoming protocol is pretty well documented too, so you should be able to integrate with it pretty easily.
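
    To give a feel for what integrating with Wyoming involves, here is a minimal sketch of its message framing as I understand it from the protocol documentation: each event is one line of JSON (with `type`, `data_length` and `payload_length` fields), optionally followed by a JSON data block and a binary payload. The function names `encode_event` and `read_event` are made up for this example, and the details should be checked against the official spec before relying on them.

```python
import io
import json


def encode_event(event_type, data=None, payload=b""):
    """Serialize one event: JSON header line, then optional data and payload bytes."""
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    header = {
        "type": event_type,
        "data_length": len(data_bytes),
        "payload_length": len(payload),
    }
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes + payload


def read_event(stream):
    """Read one event from a binary file-like object; returns None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = dict(header.get("data") or {})  # some senders inline data in the header
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload


# In-memory round trip, so no server is needed to try the framing.
demo_stream = io.BytesIO(encode_event("synthesize", {"text": "hello"}))
demo_event = read_event(demo_stream)
```

    Against a real wyoming-piper instance you would open a TCP socket (port 10200 is a common default, but treat that as an assumption), send a `synthesize` event, then call `read_event` on `sock.makefile("rb")` in a loop and collect the payloads of the `audio-chunk` events until `audio-stop` arrives.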

  • handsoffmydata@lemmy.zip · English · 5 upvotes · 5 hours ago

    I highly recommend mlx-audio for anyone doing TTS on Apple Silicon. It offers great performance, leverages Kokoro-82M, and plays well with streaming frontends like Open WebUI. The one-shot voice cloning feature is also pretty cool.

    • Rhaedas@fedia.io · 4 upvotes · 4 hours ago

      Kokoro was the one I was going to mention. I played around with it a bit, was very impressed with the speed and quality. And then I realized I had been using it in CPU mode. GPU is incredible.

  • RheumatoidArthritis@mander.xyz · English · 4 upvotes · 5 hours ago

    Save this post for the next time someone with too much time on their hands asks what project they should start or contribute to.

    • flux@lemmy.ml · English · 2 upvotes · 4 hours ago

      I’m curious how it could be unclear that this post is about the TTS part. Espeak is provided as an example.