Kevin William David

Fish Audio S2 - Real Expressive AI Voices

We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.

Replies

Lien Chueh

Really cool how fast it can be to clone my voice. Should I be giving it multiple recordings at different emotions so that it has a better register of what I sound like?

Kyle Cui

@lienchueh You absolutely can! Just ten seconds of high-quality audio of your voice, recorded with a good mic, will take you most of the way there, though. With the new open-domain emotion tags, you can direct emotions in the speech with precision.

Michael Pohl

Just found Fish Audio this year and was surprised by the API and the S1 model. Well, S2 is now absolutely mind-blowing. Great work!

Kyle Cui

@michael_pohl Awesome to hear Michael, thank you!

Yotam Dahan

As a content creator - I've been looking for a product like this for a long time! Hope it'll match my expectations.

Helena

@yotam_dahan i think fish s2 would be the best for content creators! excited for you to try it, let us know what you think :)

Christian

Amazing stuff. Congrats on your launch 👏🏽

Kyle Cui

@christian73 Thank you so much Christian!

Weizhi Li

Fish Audio is hands down one of the most impressive TTS tools I've come across. I fed it a short clip and the output genuinely sounded like me. You can make your cloned voice whisper, laugh, get excited; it's funny and a little surreal hearing yourself say things in ways you never actually did. Can't believe this is open source. Great stuff, keep it up!

Fraser

As someone who used to lead a team that created dozens of voice-overs for different markets, these tools are a game-changer.

Elvis Bueno

Congrats on the launch! 🎉

The focus on emotion and nuance in TTS is really interesting. A lot of voice models sound technically good but still feel a bit flat, so the idea of capturing rhythm and speaking habits is compelling.

Also impressive that voice cloning works with just ~10 seconds of audio. Curious how you're handling consent and voice ownership safeguards as this gets adopted more widely?

Wood Peng

Congrats on this launch!

sean

Fish Audio has outstanding technical strength. Their voice synthesis is natural, expressive, and highly stable, showing both strong research capability and excellent engineering execution.