Ryan Hoover

Fish Audio S1 - Expressive Voice Cloning and Text-to-Speech

Fish Audio S1 is the most expressive and emotionally rich TTS model—creating lifelike voices that capture emotion, rhythm, and nuance. Clone any voice in 10 seconds, preserving accent, tone, and speaking habits with unmatched realism.

JaredL

Really impressive demo — the emotion and expressiveness feel far closer to a real human than most TTS I’ve tried.

Helena

@jaredl thank you, Jared, so glad to hear that!!

kenny sacsara aspajo

I've tried a lot of tools, but Fish Audio is the best, with very realistic voices, and it keeps improving. Thank you, Fish Audio, and keep innovating.

Helena

@kenny_sacsara_aspajo Thank you so much for your kind words! :)

We're very glad to hear that you like the voices and that you notice the continuous improvement. We'll keep innovating so that every voice feels even more real and expressive.

Egavas 79

I just hope new fish audio can read text up to 10 minutes, without the voice speeding up.

Helena

@egavas_79 It can read text up to 10 minutes without speeding up! If it gets longer, to ~25+ minutes, some distortion starts to happen though. We're actively working to improve this for the next-generation model!

mastersystem1111

Really great realism and completely UNCENSORED (unlike elevenlabs or the other big "fishes" in the pond). Really recommend it! Use it for ASMR role plays and it's so much more realistic than the competition!

Helena

@mastersystem1111 thank you!! and love the "fish" pun hahah

Jules-Camille Doré

Congrats

Vladimir Lugovsky

Love seeing such expressive TTS tech becoming accessible. Fantastic launch!

Helena

@vladimir_lugovsky thank you so much Vladimir, we're working hard every day to make it even more accessible!

Hang Huang

Impressed by how expressive the voices are; the emotion sits in the pauses and timing. I dropped in a 10-second sample and it sounded surprisingly human, with little quirks that made it feel like a real person.

Zhizhuo Zhou

@hanghuang thanks so much for your support! we really appreciate it!

Timecrest Lore

I’m still learning how to use the website, so I’m not sure if you already have something like this in mind, but one feature I’d love to see is the ability to record my own voice and use that as a reference. What I mean is, I’d like the model to capture the cadence, tone, and emotional range of my voice, while still keeping the generated voice intact unless I choose to fully replace it. For example, if I wanted to add more emotion to a line, like sadness, excitement, or frustration, it would be great if the AI could analyze my sample and then mirror that same energy or inflection. Right now, some of the voices, even though they sound great, don’t always capture those subtle nuances or the emotional texture I’m aiming for.

It would also be helpful to have an option to upload a short voice sample, maybe a few seconds long, without having to go through complicated prompts or on-screen steps. That would make it much easier for people who can’t see what’s on the screen or find it hard to navigate the interface visually. Ideally, the system could take that single clean sample and let me adjust the tone, like making it slightly higher, softer, or deeper, while maintaining the original emotional feel. Maybe there could also be a toggle or slider for mood, so if I wanted to sound calmer or more intimate, I could easily tweak that.

It would also help to be able to get a female and a male version of the voices directly on the main interface. And it’d be amazing if the system eventually allowed for accent customization too, like being able to choose between English, Scottish, or other regional accents while still reflecting my own speaking rhythm.

Basically, what I’m hoping for is a way to be a more active participant in the process: using my voice not just as text input but as an emotional guide for how I want the final result to sound.

Helena

@timecrestlore Thank you for the really amazing input. I hear you on the accent customization part; that would be a really nice feature to have and we're looking into it.

As for using a short sample of your voice as a reference, having that be the emotional guide, and then tweaking on top of it: that's actually the core functionality we currently provide, if I understood you correctly. You simply create a voice here: https://fish.audio/app/voice-cloning/ and then you can start using it in the playground: https://fish.audio/app/text-to-speech/

Jason Yu

The voice quality is insanely realistic. Can’t believe it only needs 10 seconds of input!🔥