Over the past few months, we've been talking to a lot of you using Velo.
Real conversations: people trying it out, sending clips, pointing things out.
And almost everyone said some version of the same thing: it sounds like me, but something feels missing.
At first, we thought it was about accuracy. Maybe the voice wasn't close enough. But the more we listened, the clearer it became: that wasn't the issue.
The issue was how it felt. The tone stays a bit too samey. The emphasis doesn't always land where you expect it to. And the little natural shifts that make your voice yours just aren't fully there yet. It sounds right, but it doesn't feel alive.
So we went back and started reworking how we think about voice cloning at Velo. Not just matching how you sound, but capturing how you express. The way your voice changes when you're explaining something, when you're just talking casually, or when you actually care about what you're saying.
That's what we're building now. The next version of Velo is focused on higher-fidelity voice cloning. More nuance. Better pacing. More natural expression.
Something that doesn't feel like a generated voice reading your script, but closer to you actually speaking.
We're still building it, but it's coming together fast. We're planning to ship it soon.
If you've used Velo before, we'd love to know: what do you think about Velo's voice cloning or other workflows? What would make it feel right?
We're listening.
Building Velo made one thing very clear to us — browser agents don’t fail because they’re weak, they fail because they’re blind.
Most tools try to fix this with better models. We went a different route — better grounding.
A simple user walkthrough (clicks, intent, flow) becomes the instruction layer for the agent. That single shift reduced unnecessary steps and made the system far more reliable.
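To make the idea concrete, here is a minimal sketch of what "a walkthrough becomes the instruction layer" could look like. All names here (`WalkthroughStep`, `build_instructions`, the field layout) are invented for illustration, not Velo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class WalkthroughStep:
    """One recorded user action: what was done, where, and why (hypothetical schema)."""
    action: str    # e.g. "click", "type"
    selector: str  # element selector captured during the walkthrough
    intent: str    # short description of what the user was trying to do

def build_instructions(steps):
    """Turn a recorded walkthrough into a grounded, ordered instruction list for the agent."""
    return [f"{i + 1}. {s.intent}: {s.action} on '{s.selector}'"
            for i, s in enumerate(steps)]

steps = [
    WalkthroughStep("click", "#login", "Open the login form"),
    WalkthroughStep("type", "input[name=email]", "Enter the account email"),
]
print("\n".join(build_instructions(steps)))
```

The point of the shift is visible even in this toy version: the agent follows steps the user actually performed, each tied to an intent, instead of guessing a plan from a bare prompt.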
Still early, but we’re excited about where this can go. Would genuinely love feedback from the community.
Velo
@sundeepjoshi That shift to grounding made a huge difference for us.
Feels like this is just the beginning
Grass
@sundeepjoshi This is such an underrated observation. The grounding problem is real and it shows up in coding agents too, not just browser agents.
We ran into a version of this building Grass. Agents running in the cloud with no feedback loop back to the developer. They'd go sideways at minute 10 and just keep going. The fix wasn't a better model either. It was better visibility. Real-time tool call approval, diffs mid-session, the ability to steer before things go wrong.
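That "steer before things go wrong" loop can be sketched in a few lines. This is an illustrative toy, not Grass's implementation; `run_with_approval` and the step names are made up:

```python
def run_with_approval(agent_steps, approve):
    """Execute agent tool calls one at a time, gated by an approval callback.

    approve is a human- or policy-backed check that sees each call before it
    runs; returning False stops the session instead of letting it drift.
    (Illustrative only; names are hypothetical.)"""
    executed = []
    for step in agent_steps:
        if not approve(step):
            break  # steer: halt before the agent goes sideways
        executed.append(step)
    return executed

# Approve everything except the risky "deploy" call.
print(run_with_approval(["read_file", "write_file", "deploy"],
                        lambda s: s != "deploy"))  # ['read_file', 'write_file']
```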
Different domain, same root problem. Excited to see where you take this.
@sunnyjoshi This resonates a lot. The “no feedback loop” point is exactly where agents tend to drift.
Also interesting that "better models didn’t help" — better visibility and steering did. Feels like we’re converging on the same idea: grounding helps at the start, but ongoing visibility is what keeps agents reliable.
Different domain, same root problem indeed. Excited to see where this goes.
Velo
@sunnyjoshi @sundeepjoshi Couldn't agree more
Velo
@sundeepjoshi @sunnyjoshi You should read this blog we wrote about how we got the browser agent to work. https://www.usevelo.ai/articles/agentic-screen-recording-by-velo
@ajaykumar1018 If the AI generates a polished script and voiceover that is significantly shorter or longer than my original screen recording, how does the tool handle the visual timing? Does it automatically speed up/slow down the footage, or does it intelligently use freeze frames or jump cuts to keep the visuals synced to the new audio?
Velo
@ajaykumar1018 @deepali_mathur
Yup, we handle the entire audio-video sync automatically. You should try it out!
recording a “quick video” and then re-recording it 5 times is way too real 😅. feels like the hardest part isn’t recording, it’s sounding like a normal human while doing it. if this actually fixes that, it’s a big deal
Velo
@webappski Would love for you to try it!!
Konfide
Nice idea
There's a lot of friction just to start using it, including the Chrome browser install. It would be good to have a 2-3 clicks max experience for a user to try it.
Velo
@felipe_daguila Thanks for your feedback! We're constantly improving the product experience, and we'd love for you to keep trying everything we launch.
DronaHQ
Congratulations team Velo! Love the focus on closing the gap between raw intent and polished output.
Regarding the AI Voiceover sync, if a user decides to edit the generated script after the video is processed, how does the engine handle the re-syncing of the visual timing? Does the browser agent actually "re-record" the sequence to match the new pacing of the speech?
Velo
@gayatri_sachdeva Thanks Gayatri, this is a great question! If someone edits the script after the video is generated, we don’t re-record everything. Instead, we adjust the timing of the scenes to match the updated voiceover.
We use visual cues in the video (like clicks, hovers, and page changes) as anchors, and then tweak the pacing so everything stays in sync.
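A rough sketch of that anchor-based re-timing, under assumptions of ours: each anchor-to-anchor span of footage gets a playback-speed factor so it fills the duration of its regenerated narration. The function name and scheme are invented for illustration, not Velo's actual engine:

```python
def retime_scenes(scenes, narration_durations):
    """Compute a playback-speed factor per scene so visuals match the new voiceover.

    scenes: list of (start, end) times in the original footage, one per
            anchor-to-anchor span (anchors being clicks, hovers, page changes).
    narration_durations: seconds of regenerated narration for each span.
    Returns one speed factor per scene: >1 speeds footage up, <1 slows it down.
    (Hypothetical scheme for illustration.)"""
    factors = []
    for (start, end), narration in zip(scenes, narration_durations):
        visual = end - start
        factors.append(visual / narration)
    return factors

# A 4-second scene paired with 8 seconds of narration plays at half speed.
print(retime_scenes([(0.0, 4.0)], [8.0]))  # [0.5]
```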
Not bad, time to move away from loom
Velo
@klashkil Spot onnnn
ProdShort
I just recorded a video demo for my launch, with another app.
I will definitely test Velo to compare.
Velo
@bengeekly Would love to hear how it compares
Let me know what you think once you try it
ProdShort
@sourav_sanyal @ajaykumar1018
I used Velo to make a video and I enjoyed building it. The process is straightforward, and the initial result is already good. What I liked the most is also what bothers me a bit: having the audio completely regenerated fixes my English mistakes, which is great, but what I lose is the video's authenticity. In the future, having something that keeps my voice but corrects mistakes would be perfect.
Here is the result video, and it was easy to do: https://app.usevelo.ai/share/0a21d88e-071b-41d5-8131-85cd42074ac7
I will be launching tomorrow, I might use the video on my launch.
Velo
@ajaykumar1018 @bengeekly Thank you so much for your feedback, glad you liked it.