Expressive Text-to-Speech and Voice Cloning

Fish Audio S2 - Real Expressive AI Voices

by•2mo ago

We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.

Replies

Fish Audio

Maker

Hi our beloved PH!
[excited] [slightly nervous]

Today we’re launching Fish Audio S2, our new text-to-speech model.

[long pause]

Hear Fish S2 Read This!

This is a big step beyond S1, redefining expressive voice AI. Write emotion cues anywhere in the text and hear the speech flow exactly how [emphasis] YOU direct it.
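For anyone scripting around this, here is a minimal sketch of how inline cues could be pulled out of a script before sending it anywhere. It only illustrates the bracket syntax shown in this post; `split_cues` and `CUE_PATTERN` are hypothetical names, not part of the Fish Audio API, and the model itself takes the cues inline, so this is just for inspecting or logging a script.

```python
import re

# Assumption: a cue is any non-nested text wrapped in square brackets,
# e.g. [whisper] or [laughing nervously]. This mirrors the examples in the
# post, not an official grammar.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_cues(script):
    """Return (cues, plain_text) for a script containing inline [cue] tags."""
    cues = [match.group(1).strip() for match in CUE_PATTERN.finditer(script)]
    # Drop the tags, then collapse the extra whitespace they leave behind.
    plain = CUE_PATTERN.sub("", script)
    plain = re.sub(r"\s{2,}", " ", plain).strip()
    return cues, plain


if __name__ == "__main__":
    cues, plain = split_cues("And, [inhale] we're open-sourcing all of it.")
    print(cues)
    print(plain)
```

Keeping the cue list separate makes it easy to check which directions a script uses before generating audio.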

And, [inhale] we’re open-sourcing all of it.

GitHub: https://github.com/fishaudio/fish-speech/
HuggingFace: https://huggingface.co/fishaudio/s2-pro/

Shout out to SGLang for powering our stack.

There’s much more to S2.
Try it yourself now: https://fish.audio/s2/

As always, we want to give back to the community. For the launch, we’re offering free generation credits and an exclusive 50% OFF promo code: PH-FishS2

Go build weird things with it :)

We’d love to hear what you make.

2mo ago

Fish Audio

Maker

@hehe6z incredibly proud of this one, amazing job team!

2mo ago

Fish Audio

Maker

@rissa_cao teamwork 👾

2mo ago

Vozo AI — Video localization

@hehe6z Congrats on the launch! Curious how you see Fish Audio compared with ElevenLabs — what do you think are the biggest advantages or differences today?

2mo ago

@hehe6z Best AI tool I've ever found for my work.

17d ago

Calling Clones

Can I use this in a Raspberry Pi voice assistant that I have at home?
What about using the voice cloning for phone calls?
ElevenLabs is not that good... (or I don't know how to set it up)

2mo ago

Fish Audio

Maker

@javierfandos Hi Javi, this is a great point. Yes, you absolutely can! For example, Home Assistant has direct Fish Audio support; you can check out the deets here: https://www.home-assistant.io/integrations/fish_audio/. Voice cloning is also one of the flagship features our users love because of the extreme realism :)

2mo ago

Calling Clones

I'm launching something soon! I need to find something! Will take a look! Dankeee

2mo ago

Fish Audio

Maker

@javierfandos that's awesome, looking forward to your launch!!

2mo ago

Calling Clones

@hehe6z WOW! Just cloned my voice. It's actually better than ElevenLabs!

2mo ago

Big Fish Audio fan for a long time, and I've witnessed the team always go above and beyond. Let's gooooo S2! Congrats on this launch.

2mo ago

Fish Audio

Maker

@kellyann3644 Thank you, Kelly, for the long-time support. We appreciate you so much <3

2mo ago

Klariqo AI Voice Assistants

Oh my, this is mind-blowing. Does it support streaming when self-hosted?

2mo ago

Fish Audio

Maker

@ansh_deb Oh hey Ansh good to see you again!! Yes it surely does!

2mo ago

Klariqo AI Voice Assistants

@hehe6z That's amazing! Would love to give it a try soon!

2mo ago

Fish Audio

Maker

@ansh_deb let me know if we can support with anything!

2mo ago

How does Fish Audio maintain consistent emotional prosody and rhythmic nuance across long-form content, and what specific architectural improvements over So-VITS-SVC allow for such high-fidelity cloning from only 10 seconds of source audio?

2mo ago

Fish Audio

Maker

@mordrag Great question, Denis! S2 moves beyond systems like So-VITS-SVC and instead generates speech with a large speech-language model that operates on discrete audio tokens, which lets it maintain prosody and rhythm over long passages. Because S2 is heavily pretrained on large-scale speech data, the reference clip mainly anchors speaker identity and style, so it can clone voices extremely well from just 15 seconds of sample audio.

2mo ago

Cue

This is a big unlock for anyone building voice-driven products. Directing voices with natural language cues like [whisper] or [laughing nervously] instead of fiddling with sliders is so much more intuitive. Love that it's open source too. What languages are you seeing the most community demand for?

2mo ago

Fish Audio

Maker

@dparrelli Besides English, a lot of Spanish, Chinese, and Japanese! Thank you for your support, David!

2mo ago

HakkoAI

exactly what we need, gonna try it now

2mo ago

Fish Audio

Maker

@oratis thanks oratis! let us know what you think!!

2mo ago

Excited to see the new version coming! Will it support any new languages?

2mo ago

Fish Audio

Maker

@vladimir_osipov Thank you, Vladimir! Yes, language support has expanded significantly compared to S1. S2 Pro supports 80+ languages.

Tier 1: Japanese (ja), English (en), Chinese (zh)

Tier 2: Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)

Other supported languages: sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
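If you want to check a language code against the tiers above in your own tooling, a minimal sketch could look like the following. The tier data is copied from this reply; `support_tier` is a hypothetical helper, not something shipped with S2. Since the list ends in "and more", a result of `None` only means the code is not listed here, not that it is unsupported.

```python
# Tier data copied from the reply above. The "other" list is explicitly
# incomplete there ("and more"), so treat a miss as "not listed".
TIER_1 = {"ja", "en", "zh"}
TIER_2 = {"ko", "es", "pt", "ar", "ru", "fr", "de"}
OTHER_LISTED = set(
    """
    sv it tr no nl cy eu ca da gl ta hu fi pl et hi la ur th vi jw bn yo sl
    cs sw nn he ms uk id kk bg lv my tl sk ne fa af el bo hr ro sn mi yi am
    be km is az sd br sq ps mn ht ml sr sa te ka bs pa lt kn si hy mr as gu fo
    """.split()
)


def support_tier(code):
    """Return 'tier 1', 'tier 2', 'other', or None if the code is not listed."""
    code = code.strip().lower()
    if code in TIER_1:
        return "tier 1"
    if code in TIER_2:
        return "tier 2"
    if code in OTHER_LISTED:
        return "other"
    return None


if __name__ == "__main__":
    for lang in ("ja", "ko", "sv", "xx"):
        print(lang, support_tier(lang))
```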

2mo ago