Chatterbox Turbo is a 350M parameter open-source TTS model. It features paralinguistic tags (control laughs, sighs, etc.), zero-shot cloning, and runs 6x faster than real-time. Uniquely includes built-in PerTh watermarking for safety.
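On the paralinguistic tags: the model consumes inline markers such as [laugh] or [sigh] embedded directly in the input text. Purely as an illustration of that interface (this parser is a hypothetical sketch, not part of the Chatterbox API, and the tag set here is only the two tags mentioned in the announcement):

```python
import re

# Tags mentioned in the announcement; illustrative set, not exhaustive.
PARALINGUISTIC_TAGS = {"laugh", "sigh"}

def split_tagged_text(text):
    """Split TTS input into ("speech", str) and ("event", tag) chunks.

    Hypothetical pre-processing sketch -- not the Chatterbox API.
    """
    parts = []
    for piece in re.split(r"(\[[a-z]+\])", text):
        if not piece:
            continue
        # Bracketed tokens that match a known tag become events;
        # everything else is passed through as speech text.
        tag = piece[1:-1] if piece.startswith("[") and piece.endswith("]") else None
        if tag in PARALINGUISTIC_TAGS:
            parts.append(("event", tag))
        else:
            parts.append(("speech", piece))
    return parts

chunks = split_tagged_text("That was close! [laugh] Anyway, back to work. [sigh]")
```

The appeal of this design is that emotion control lives in the same string as the text, so no separate markup channel or SSML document is needed.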
@zaczuo Agree! The quality is quite good and comparable to the ElevenLabs v3.0 model in alpha. Hope this is made available in voice agent platforms like LiveKit soon?
Replies
Flowtica Scribe
Hi everyone!
This is a really generous release from the Resemble AI team. The "paralinguistic tags" feature is super interesting: being able to simply type [laugh] or [sigh] to control the emotion is a very practical touch for getting natural results.
I also really appreciate that it includes the PerTh watermarking by default. It is rare to see safety features baked directly into an MIT-licensed model like this.
Fast, expressive, and traceable. This model has huge potential in the open-source TTS space.
Resemble AI
@zaczuo @a_r48 Thanks! Already available in LiveKit: https://docs.livekit.io/agents/models/tts/plugins/resemble/
@zaczuo Congrats on the Chatterbox Turbo launch! Shipping native watermarking by default is a strong signal in the current voice-AI risk landscape.
I run a security firm focused specifically on AI abuse and adversarial testing rather than traditional web pentesting. This week we completed our first AI-focused penetration test, where we validated a real-world weakness in the application logic itself (not the infrastructure) related to how AI safety assumptions were enforced under abuse scenarios.
That experience is what prompted me to reach out. We help voice and generative-media companies pressure-test areas like watermark evasion, consent bypass paths, API abuse, and output-based model extraction before those issues are discovered externally.
I’m not reaching out to sell tooling, more to see if you’d be open to a short conversation on how adversaries actually try to bypass safeguards like watermarking and voice controls, and what testing has proven most useful so far. Happy to share concrete examples if useful.
Wow, man! I'll give it a try. All the best here!
Capacity
I will definitely implement it on storyshort!
Makers Page
Neat. I do light VO/podcast stuff and the speech-to-speech + quick edits are what I care about. Zero-shot clone for pickups sounds handy. Big plus on watermarking + detection—feels safer. Curious how natural the laughs/sighs controls come out.
Triforce Todos
This is amazing, man! Audio editing is usually a pain. If this actually simplifies it, that’s a big win.
I gave it a try; what you did is really nice.
I'll give it a try for my new project!
Wow, this is definitely a game changer!
Very interesting! Which languages are supported? If I provide a sample in one language, can I copy my voice and have the service read something in another language using my voice?