Voice cloning has a quiet problem nobody talks about
A user on r/ElevenLabs described their cloned voice as "too pristine - like the AI restored my voice to a younger, less character-filled version of itself."
That one sentence nailed it. Most cloning models are trained on clean, professional audio.
So when your voice goes in, it doesn't just get copied, it gets corrected.
The rasp smoothed out, the pacing evened, and the natural drop in energy at the end of a thought, gone.
Technically better audio. Emotionally hollow output.
A November 2025 study from Queen Mary University of London found that listeners are fooled by synthetic voices more than half the time.
But fooled isn't the same as convinced. Something feels off. They just can't name it.
What they're feeling is the absence of imperfection. The fingerprints that make a voice actually yours.
Most tools optimise for clean. We decided to take a different road.
We built Velo's voice clone to optimise for you - preserving the rhythm, texture, and natural unevenness that make your voice recognisably yours.
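If you want to see what that "smoothing" looks like in the signal itself, here's a rough, illustrative sketch (not Velo's internals, just a back-of-the-envelope check using librosa) that compares an original recording with its clone on three crude proxies for the traits above: pitch variability, pause structure, and end-of-phrase energy decay. The file names and thresholds are placeholders.

```python
# Illustrative sketch only (assumes librosa is installed; not how any specific
# cloning product works). Quantifies how "smoothed" a cloned voice is relative
# to the original, using three rough proxies: pitch variability (intonation),
# pause ratio (pacing), and end-of-phrase energy decay (trailing off).
import numpy as np
import librosa

def voice_texture_profile(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)

    # Pitch contour: more variance ~= more natural unevenness in intonation.
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    pitch_std = np.nanstd(f0)  # NaNs mark unvoiced frames, so use nanstd

    # Frame-level energy, used for both pause detection and decay.
    rms = librosa.feature.rms(y=y)[0]
    silence = rms < 0.1 * rms.max()
    pause_ratio = silence.mean()  # fraction of near-silent frames

    # Energy in the final 10% of the clip vs the middle: a natural voice
    # often trails off; an over-corrected clone tends to stay flat.
    tail = rms[int(0.9 * len(rms)):].mean()
    body = rms[int(0.2 * len(rms)):int(0.8 * len(rms))].mean()
    end_decay = tail / (body + 1e-8)

    return {"pitch_std_hz": pitch_std,
            "pause_ratio": pause_ratio,
            "end_decay": end_decay}

# Usage (placeholder file names):
# original = voice_texture_profile("me_original.wav")
# cloned   = voice_texture_profile("me_cloned.wav")
# Lower pitch_std, lower pause_ratio, and end_decay closer to 1.0 in the clone
# suggest it has been "corrected" toward studio-clean speech.
```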
Our next feature launch is focused on exactly that: perfecting voice clones and streaming them faster.
This is an exciting time to be building in AI voice. I'd love to connect with fellow builders in the voice cloning space.
Let's catch up and exchange ideas.



Replies
@ajaykumar1018 This hits hard. Most tools polish away the personality, but that’s literally the whole point of a voice. Love that you’re focusing on keeping the imperfections, that’s where the real identity lives.