Kevin William David

Fish Audio S2 - Real Expressive AI Voices

We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
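As a rough illustration of the inline-cue format described above, here is a minimal sketch that separates bracketed cues such as [whisper] from the spoken text. It is not Fish Audio's parser or API; the helper names `extract_cues` and `strip_cues` are hypothetical, and the only assumption is that cues are written in square brackets, as in the examples here.

```python
import re

# Assumption: a cue is any non-empty text inside one pair of square brackets.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_cues(script: str) -> list[str]:
    """Return the cue labels in the order they appear in the script."""
    return [match.group(1).strip() for match in CUE_PATTERN.finditer(script)]


def strip_cues(script: str) -> str:
    """Return the script with cues removed and whitespace collapsed."""
    plain = CUE_PATTERN.sub("", script)
    return re.sub(r"\s+", " ", plain).strip()


script = "[whisper] Don't wake them. [laughing nervously] I think we're lost."
print(extract_cues(script))
print(strip_cues(script))
```

A helper like this can be handy for previewing a script or counting spoken characters before sending the cue-annotated text to the model unchanged.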

Helena

Hi our beloved PH!
[excited] [slightly nervous]

Today we’re launching Fish Audio S2, our new text-to-speech model.

[long pause]

Hear Fish S2 Read This!

This is a big step beyond S1, redefining expressive voice AI. Write emotion cues anywhere in the text and hear the speech flow exactly how [emphasis] YOU direct it.

And, [inhale] we’re open-sourcing all of it.

GitHub: https://github.com/fishaudio/fish-speech/
HuggingFace: https://huggingface.co/fishaudio/s2-pro/

Shout out to SGLang for powering our stack.

There’s much more to S2.
Try it yourself now: https://fish.audio/s2/

As always, we want to give back to the community. For the launch, we’re offering free generation credits and an exclusive 50% OFF promo code: PH-FishS2

Go build weird things with it :)

We’d love to hear what you make.

Rissa Cao

@hehe6z incredibly proud of this one, amazing job team!

Helena

@rissa_cao teamwork 👾

CY

@hehe6z Congrats on the launch! Curious how you see Fish Audio compared with ElevenLabs — what do you think are the biggest advantages or differences today?

Rajnish Kumar Dubey

@hehe6z Best AI tool I've ever found for my work.

Javi Fandos

Can I use this in a Raspberry Pi voice assistant that I have at home?
What about voice cloning for use in phone calls?
ElevenLabs is not that good... (or I don't know how to set it up)

Helena

@javierfandos Hi Javi, this is a great point. Yes, you absolutely can! For example, Home Assistant has direct Fish Audio support; you can check out the details here: https://www.home-assistant.io/integrations/fish_audio/. Voice cloning is also one of the flagship features our users love because of the extreme realism :)

Javi Fandos

I'm launching something soon! I need to find something! Will take a look! Thanks!

Helena

@javierfandos that's awesome looking forward to your launch!!

Javi Fandos

@hehe6z WOW! Just cloned my voice. It's actually better than ElevenLabs!

Kelly An

Big Fish Audio fan for a long time; I've witnessed the team always go above and beyond. Let's gooooo S2! Congrats on this launch.

Helena

@kellyann3644 Thank you, Kelly, for the long-time support. We appreciate you so much <3

Ansh Deb

Oh my, this is mind-blowing. Does it support streaming when self-hosted?

Helena

@ansh_deb Oh hey Ansh good to see you again!! Yes it surely does!

Ansh Deb

@hehe6z That's amazing! Would love to give it a try soon!

Helena

@ansh_deb let me know if we can support with anything!

Denis Akindinov

How does Fish Audio maintain consistent emotional prosody and rhythmic nuance across long-form content, and what specific architectural improvements over So-VITS-SVC allow for such high-fidelity cloning from only 10 seconds of source audio?

Helena

@mordrag Great question, Denis! S2 moves beyond systems like So-VITS-SVC and instead generates speech with a large speech-language model that operates on discrete audio tokens, which lets it keep prosody and rhythm consistent over long passages. Because S2 is heavily pretrained on large-scale speech data, the reference clip mainly anchors speaker identity and style, so it can clone voices extremely well from just 15 seconds of sample audio.

David Parrelli

This is a big unlock for anyone building voice-driven products. Directing voices with natural language cues like [whisper] or [laughing nervously] instead of fiddling with sliders is so much more intuitive. Love that it's open source too. What languages are you seeing the most community demand for?

Kyle Cui

@dparrelli Besides English, a lot of Spanish, Chinese, and Japanese! Thank you for your support, David!

Oratis

exactly what we need, gonna try it now

Helena

@oratis thanks oratis! let us know what you think!!

Vladimir Osipov

Excited to see the new version coming! Will it support any new languages?

Helena

@vladimir_osipov Thank you Vladimir! Yeah the language support has expanded significantly compared to S1. S2 Pro supports 80+ languages.

Tier 1: Japanese (ja), English (en), Chinese (zh)

Tier 2: Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)

Other supported languages: sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
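For anyone wiring this list into an app, here is a small sketch of a tier lookup built from the codes above. The thread does not say what the tiers mean in practice, so the labels are only labels; `language_tier` is a hypothetical helper, not part of any Fish Audio SDK.

```python
# Tier membership copied from the list above.
TIER_1 = {"ja", "en", "zh"}
TIER_2 = {"ko", "es", "pt", "ar", "ru", "fr", "de"}


def language_tier(code: str) -> str:
    """Map a two-letter language code to the tier named in the list above."""
    normalized = code.strip().lower()
    if normalized in TIER_1:
        return "tier 1"
    if normalized in TIER_2:
        return "tier 2"
    # Everything else is either in the long "other supported" list or unlisted.
    return "other or unlisted"


print(language_tier("EN"))
print(language_tier("es"))
print(language_tier("sv"))
```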

Kshitij Mishra

This is gold, mate! Keep making more products like this.

Helena

@kshitij_mishra4 thanks man!!

Lifan Wang

Good job!

Kyle Cui

@lifan_wang Thanks for your support Lifan! Hope you have fun trying it out, let us know your thoughts!
