Fish Audio is the most expressive and emotionally rich text-to-speech model. It generates lifelike voices that capture emotion, rhythm, and nuance with remarkable realism. Fish Audio Voice Clone recreates a natural voice from just 10 seconds of audio—preserving accent, tone, and speaking habits. Proudly built by the open-source team behind So-VITS-SVC and Bert-VITS2, giving a soul to every voice.
This is the 4th launch from Fish Audio.

Fish Audio S2
Launched this week
We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
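As a rough illustration of the bracketed cues mentioned above, here is a minimal Python sketch that builds a script with inline cues. The helper name `with_cue` is hypothetical, and placing the cue at the start of a line is an assumption; only the bracketed form itself (`[whisper]`, `[laughing nervously]`) comes from the description. It only prepares text and does not call any Fish Audio API.

```python
def with_cue(cue: str, text: str) -> str:
    """Prefix a line of text with a bracketed cue such as [whisper].

    Hypothetical helper: cue placement at the start of the line is an
    assumption, not documented behavior.
    """
    # Accept the cue with or without brackets and normalize it.
    cue = cue.strip().strip("[]").strip()
    if not cue:
        return text
    return f"[{cue}] {text}"


# Build a short two-line script using the cues quoted in the description.
script = "\n".join([
    with_cue("whisper", "I think someone is in the house."),
    with_cue("laughing nervously", "It's probably just the cat."),
])
print(script)
```

The resulting string could then be passed to whichever S2 interface you use as ordinary input text.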






Free

Excited to see the new version coming! Will it support any new languages?
Fish Audio
@vladimir_osipov Thank you, Vladimir! Yes, language support has expanded significantly compared to S1: S2 Pro supports 80+ languages.
Tier 1: Japanese (ja), English (en), Chinese (zh)
Tier 2: Korean (ko), Spanish (es), Portuguese (pt), Arabic (ar), Russian (ru), French (fr), German (de)
Other supported languages: sv, it, tr, no, nl, cy, eu, ca, da, gl, ta, hu, fi, pl, et, hi, la, ur, th, vi, jw, bn, yo, sl, cs, sw, nn, he, ms, uk, id, kk, bg, lv, my, tl, sk, ne, fa, af, el, bo, hr, ro, sn, mi, yi, am, be, km, is, az, sd, br, sq, ps, mn, ht, ml, sr, sa, te, ka, bs, pa, lt, kn, si, hy, mr, as, gu, fo, and more.
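For anyone routing text by language, the tiers above can be captured in a small lookup. This is a minimal Python sketch: the names `TIER_1`, `TIER_2`, and `language_tier` are hypothetical, and what each tier implies for quality is not stated here, so the function only reports the tier label. Because the "other" list ends with "and more", codes outside the two tiers are reported as "other or unlisted" rather than as unsupported.

```python
# Tier membership copied from the list above (language codes as given there).
TIER_1 = {"ja", "en", "zh"}
TIER_2 = {"ko", "es", "pt", "ar", "ru", "fr", "de"}


def language_tier(code: str) -> str:
    """Return the tier label for a language code (hypothetical helper)."""
    code = code.strip().lower()
    if code in TIER_1:
        return "tier 1"
    if code in TIER_2:
        return "tier 2"
    # The remaining list is open-ended, so this does not imply "unsupported".
    return "other or unlisted"


for code in ("ja", "ko", "sv"):
    print(code, "->", language_tier(code))
```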
Klariqo AI Voice Assistants
Oh my, this is mind-blowing. Does it support streaming when self-hosted?
Fish Audio
@ansh_deb Oh hey Ansh, good to see you again! Yes, it does!
Klariqo AI Voice Assistants
@hehe6z That's amazing! Would love to give it a try soon!
Fish Audio
@ansh_deb Let me know if we can help with anything!
As a content creator, I've been looking for a product like this for a long time! I hope it matches my expectations.
Fish Audio
@yotam_dahan I think Fish Audio S2 would be the best for content creators! Excited for you to try it, let us know what you think :)
Congrats on the launch! 🎉
The focus on emotion and nuance in TTS is really interesting. A lot of voice models sound technically good but still feel a bit flat, so the idea of capturing rhythm and speaking habits is compelling.
Also impressive that voice cloning works with just ~10 seconds of audio. Curious how you’re handling consent and voice ownership safeguards as this gets adopted more widely?
Runner AI
Fish Audio is hands down one of the most impressive TTS tools I've come across. I fed it a short clip and the output genuinely sounded like me. You can make your cloned voice whisper, laugh, or get excited, and it's funny and a little surreal hearing yourself say things in ways you never actually did. Can't believe this is open source. Great stuff, keep it up!
Congrats on the launch! I'm curious: if I’m building a real-time voice agent where latency and fine-grained emotion are dealbreakers, what specific benchmarks or features make Fish Audio a better bet than ElevenLabs right now?
I just found Fish Audio this year and was surprised by the API and the S1 model. The S2 is now absolutely mind-blowing. Great work!
Fish Audio
@michael_pohl Awesome to hear Michael, thank you!