We asked what felt off about AI voices, you told us. We're fixing it.

by Sourav Sanyal

Over the past few months, we’ve been talking to a lot of you using Velo.

Real conversations: people trying it out, sending clips, pointing things out.

And almost everyone said some version of the same thing: “It sounds like me but something feels missing.”

At first, we thought it was about accuracy. Maybe the voice wasn't close enough. But the more we listened, the clearer it became: that wasn't the issue.

The issue was how it felt. The tone stays a bit too samey. The emphasis doesn’t always land where you expect it to. And the little natural shifts that make your voice yours just aren’t fully there yet. It sounds right, but it doesn’t feel alive.

So we went back and started reworking how we think about voice cloning at Velo. Not just matching how you sound, but capturing how you express. The way your voice changes when you’re explaining something, when you’re just talking casually, or when you actually care about what you’re saying.

That’s what we’re building now. The next version of Velo is focused on higher fidelity voice cloning. More nuance. Better pacing. More natural expression.

Something that doesn’t feel like a generated voice reading your script, but closer to you actually speaking.

We’re still building it, but it’s coming together fast. We’re planning to ship this soon.

If you've used Velo before, we'd love to know: what do you think about Velo's voice cloning and other workflows? What would make it feel right?

We’re listening.



Replies

Judit

Totally agree. The problem isn’t accuracy anymore, it’s expression.
The “it sounds like me but doesn’t feel like me” is exactly where most tools break.

Elena K

This feels like exactly the right insight.

“Sounds like me” and “feels like me” are completely different product thresholds. A lot of AI products can get surprisingly far on resemblance, but people notice very quickly when expression, emotional timing, and natural variation are missing.

That's where the uncanny feeling usually lives: not in the obvious errors, but in the absence of subtle life.

We think about something similar at SpeakUp: in any workflow connected to people, communication, and trust, the missing layer is often not functionality but nuance.

Really strong direction. If you can make voice cloning feel less like playback and more like presence, that’s a meaningful leap.

Sai Tharun Kakirala

The uncanny valley of AI voices is so real. The specific thing that gets me is the rhythm - most AI voices nail individual word pronunciation but miss the natural flow of how humans accelerate or slow down across a sentence based on meaning and emphasis. It ends up feeling like someone reading words rather than saying them. Building Hello Aria (text-based AI assistant via WhatsApp and iOS), we deliberately stayed text-first partly for this reason - the text medium has more tolerance for AI-style communication than voice does. But the teams cracking the voice problem are doing something genuinely hard. Really looking forward to hearing how the fixes you're shipping actually change the listening experience.