Fish Audio is the most expressive and emotionally rich text-to-speech model. It generates lifelike voices that capture emotion, rhythm, and nuance with remarkable realism. Fish Audio Voice Clone recreates a natural voice from just 10 seconds of audio—preserving accent, tone, and speaking habits. Proudly built by the open-source team behind So-VITS-SVC and Bert-VITS2, giving a soul to every voice.
This is the 4th launch from Fish Audio.

Fish Audio S2
Launched this week
We've open-sourced Fish Audio S2, a new generation of expressive TTS that lets you direct voices with natural language. Add cues like [whisper] or [laughing nervously], generate multi-speaker dialogue in one pass, and create scary-real voices across 80+ languages.
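
To make the inline-cue idea concrete, here is a minimal sketch of how a tagged, multi-speaker script might be assembled as plain text before it is sent to a TTS model. The bracketed cues come from the description above; the helper names (`tag_line`, `build_dialogue`) and the `Speaker: text` labelling convention are assumptions made for illustration, not Fish Audio's documented input format, and no Fish Audio API is called.

```python
def tag_line(text, cue=None):
    """Prefix a line with an inline cue such as [whisper] (hypothetical helper)."""
    return f"[{cue}] {text}" if cue else text


def build_dialogue(turns):
    """Join (speaker, cue, text) turns into one script, one turn per line.

    The "Speaker: text" layout is an assumed convention for this sketch only.
    """
    return "\n".join(
        f"{speaker}: {tag_line(text, cue)}" for speaker, cue, text in turns
    )


script = build_dialogue([
    ("Alice", "whisper", "Did you hear that?"),
    ("Bob", "laughing nervously", "It's probably just the wind."),
    ("Alice", None, "Let's go check."),
])
print(script)
```

Keeping cues as plain bracketed text means the same script can be reviewed or edited by hand before synthesis; check the model's own documentation for the actual tag and speaker syntax it expects.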

Free
Launch Team / Built With
Sway
Fish Audio is a technological gem for its fidelity and speed, enabling flawless cloning. However, removing the temperature setting is a terrible mistake: it takes the soul out of the narration. By automating emotion, they turn an expressive tool into a flat robot, depriving us of the chaos needed to convey real feelings.
I've always wanted a TTS that can handle a bunch of tags while preserving fairly good similarity and flow. Fish Audio S2 Pro surprised me with all of these and... I'm loving it! Hope to explore more use cases with the s2-pro model!
Just came here to say I found Fish Audio via some deep research I had Opus 4.6 do on the best TTS providers and after several rounds of discussion about the findings, it was clear that Fish was the winner. Starting to integrate it into our AI Mental Training app now, and so far so good!
Sway
Fish Audio
@christian73 Thank you so much Christian!
The pricing alone makes a huge difference compared to competitors, and the quality is on par with most "high-end" TTS models.
Very cool! In voice AI, the lack of emotions is the main problem.