Ryan Hoover

Fish Audio S1 - Expressive Voice Cloning and Text-to-Speech

Fish Audio S1 is the most expressive and emotionally rich TTS model, creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice in 10 seconds, preserving accent, tone, and speaking habits with unmatched realism.

Jason Shen

I've tested at least 30 voice-cloning models... BUT this one looks so legit, especially for emotion control. Looking forward to using it 🚀🚀

Zhizhuo Zhou

@jason_shen3 Thanks so much, Jason! We hope you use Fish Audio for some cool stuff!

Jason Shen

@zhizdev A random thought: can I clone someone else's voice using my own custom text, instead of strictly reading the script you provide? That would let me clone someone's voice when all I have is existing recordings of them.

Helena

@zhizdev @jason_shen3 The text in the landing-page voice clone demo is just a guide in case it's hard to come up with something to say; you can say essentially anything you need. We don't put a limit on voice clone slots right now, so you can test out a few different samples with different pace/emotion/tone and see the results for yourself :) It really depends on what you're trying to do with it.

Sean Tiffonnet

Honestly, this is impressive; the results are good! Can't wait to see the directions this gets used in.

Zhizhuo Zhou

@seantiffonnet Thanks so much Sean! We really appreciate it!

Yichen Guo

Great voice product - congrats!!

Zhizhuo Zhou

@yichen_guo1 Thanks so much! This means a lot!

Cruise Chen

Few audio models produce great-quality voices with emotion, and I do hope Fish Audio can be the one. Will you also offer an end-to-end agent product for prosumers, or just API services?

Zhizhuo Zhou

@cruise_chen Thanks so much for the support! Our immediate roadmap is to make voice better!

Helena

@cruise_chen Our product for prosumers now includes a full story studio that makes speech generation workflows a lot easier, e.g. audiobooks, video narration, etc. And we're adding a bunch of prosumer-focused features over the next 2 months too :)

Jules-Camille Doré

Congrats

Zhizhuo Zhou

@jules_camille_dore Thanks, appreciate it!

Abdul Rehman

This is amazing! How do you handle different accents with just 10 seconds of audio?

Zhizhuo Zhou

@abod_rehman Thanks so much for your support! Our model has been trained across multiple languages, so it can capture accents as well as speaking behavior!

Michelle Fang

Congrats, Helena and team, on the launch! Incredible work behind the scenes, and so thrilled to see this come to life! How can I help get more startups building with your TTS? And are conversational agents next on the roadmap?

Helena

@michelle_fang2 Thank you for being there for me all the time behind the scenes ♥️ Lemme cook harder and we'll make some noise together HAHA
And yes, it's on the roadmap, and we're currently training a new model for it!

Dave Fontenot

insane.

Zhizhuo Zhou

@hellyeah insanity

Max Chang

Good voice tech is usually extremely expensive. Nice to see a high-quality option that's actually affordable.

Zhizhuo Zhou

@max_chang3 Thanks so much for your support Max!

Helena

@max_chang3 thank you MAXIMILIAN

Jason Yu

🎧 Just gave Fish Audio a spin — and wow, the emotional depth is next level. I’ve played around with other TTS tools before, but they often fall flat when it comes to tone and expressiveness. This one? Feels like it gets the soul of the voice. 😮‍💨

I’m especially impressed by the 10-second voice cloning — tested it with a friend’s audio snippet, and the result was uncanny.

Curious: how does it compare with commercial models like ElevenLabs in multilingual scenarios? Have you stress-tested accents or emotion transfer in other languages?

Massive props to the team behind So-VITS-SVC and Bert-VITS2 — open-source + this level of polish is rare. 🔥

Zhizhuo Zhou

@kui_jason Thanks for giving us a spin! We support multiple languages by default, and they can be mixed within a single sentence. We are working hard to get emotion right in more languages!

Owen T

@zhizdev How do you mix multiple languages in a sentence? I've been trying to do that and burning through my credits. Use case: travel videos with English narration that correctly pronounce French/Italian/Chinese attraction names. That's the only thing stopping me from getting the annual subscription!

Helena

@zhizdev @owen_t Oh hmm, that's a great point you're raising. Right now it'd be hard, because the output adapts to the speaking patterns of the original accent of whichever voice model you use, which is also why S1 captures voice traits so well. We're working on a newer model with more options to cover different use cases, including the one you mentioned. Please stay tuned!