VoxCPM - Tokenizer-free TTS for true-to-life voice
VoxCPM is a new open-source, tokenizer-free text-to-speech (TTS) model. By modeling speech in a continuous space, it avoids the limitations of discrete tokens and delivers highly expressive, context-aware speech generation and realistic zero-shot voice cloning.



Replies
Flowtica Scribe
Hi everyone!
The next big challenge for TTS isn't just clarity; it's expressiveness. Many models sound clear but still feel a bit robotic because they break speech down into discrete tokens, which loses some of the natural flow of the human voice.
VoxCPM from the OpenBMB and ModelBest teams takes a different path. It's a "tokenizer-free" model, and you can really hear the difference in the final output.
Two things really stand out to me. First, its context-aware generation: it can read a piece of text and automatically infer whether to sound like a storyteller or a weather reporter. Second, the zero-shot voice cloning is incredibly realistic, capturing not just the timbre but also the speaker's unique accent and emotional tone.
It's an open-source model and runs efficiently on consumer GPUs.