VertoX update — getting very close to launch
Hey everyone
Quick update on VertoX.
We’ve made a lot of progress on the backend and core systems. We’re building our own open-source ASR → NMT → TTS pipeline, aiming for ~1-second latency for real-time translation.
Right now, we support 17 output languages and 10 input languages, with plans to expand further.
We’re also working on voice cloning from the start, so translations keep your tone, emotions, and your actual voice. Plus, we’re adding voice detection to make conversations feel more natural.
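For anyone curious how a cascaded pipeline like this fits together, here's a minimal orchestration sketch. This is purely illustrative and not VertoX's actual code: the three stage functions are stand-ins for real models, and only the hand-off between stages and the latency accounting are shown.

```python
import time
from dataclasses import dataclass

@dataclass
class TranslationResult:
    source_text: str
    translated_text: str
    audio: bytes
    latency_ms: float

def asr(audio_chunk: bytes) -> str:
    # Stand-in: a real system streams audio into a speech recognizer here.
    return "hello world"

def nmt(text: str, src: str, tgt: str) -> str:
    # Stand-in: a real system calls a neural translation model here.
    return f"[{src}->{tgt}] {text}"

def tts(text: str, speaker_embedding=None) -> bytes:
    # Stand-in: a real system synthesizes speech, optionally conditioned
    # on a cloned-voice speaker embedding.
    return text.encode("utf-8")

def translate_chunk(audio_chunk: bytes, src: str, tgt: str) -> TranslationResult:
    # Run the three stages in sequence and record end-to-end latency,
    # which is what a ~1-second budget would be measured against.
    start = time.perf_counter()
    text = asr(audio_chunk)
    translated = nmt(text, src, tgt)
    audio = tts(translated)
    latency_ms = (time.perf_counter() - start) * 1000
    return TranslationResult(text, translated, audio, latency_ms)

result = translate_chunk(b"\x00" * 320, "en", "es")
print(result.translated_text)  # [en->es] hello world
```

In practice each stage would stream partial results to the next rather than waiting for a full utterance, which is where most of the latency savings come from.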
We’re very close to our first release now.
If you’re curious, you can check the website through my profile (still in progress).
Would love to have early testers soon; your feedback will be extremely valuable as we launch 🚀

Replies
Wow, real-time translation with voice cloning? That's incredible progress! I'll definitely check out the site and would love to test it out.
If you're up for it, I'm launching The Sponge on PH soon... an AI-powered flashcard app that turns webpages into study material with spaced repetition. Would appreciate a follow (PRODUCT HUNT LAUNCH link in profile).
@rianbrob Really appreciate it!
Yeah, the idea is that instead of tools like Zoom or Google Meet, you’ll be able to use VertoX Meet with real-time translation built in. And on mobile, it’ll be more like 1-on-1 conversations where you can speak any language and still be understood naturally.
Also, I’ve already come across The Sponge before; it looks like a cool product, and I’m already following it.
Happy to support your Product Hunt launch!
Good luck on the launch! Curious, how does this product handle accents? Does the cloned voice also transfer the accent or does it get neutralized?
@olga_kargopolova thanks a lot, really appreciate the support!
Right now, we’re mainly focused on preserving your tone, emotions, and your actual voice in real-time translation.
Accent handling will definitely be part of the product as we scale. There are hundreds of accents globally, so it’s an important direction for us. I’m also open to potential partnerships in this area if it helps us move faster and do it right. We’ll keep improving the voice quality over time to make it as natural and close to the original as possible.
VertoX is a fascinating build — real-time ASR+NMT+TTS at sub-second latency is genuinely hard. I've shipped streaming backend systems in NestJS + Node. If you need a second pair of hands on infra or pipeline scaling before launch, check my profile and reach out — would love to help.
@mohammad_zeeshan13 really appreciate that, means a lot!
Right now, we’re keeping the team lean and focused, so we’re not actively looking for additional engineering support yet. That said, I’d be happy to stay in touch, feel free to follow along, and I can add you to the waitlist to share updates as we get closer to launch. Once we scale further, it would definitely make sense to reconnect 👍
@mohammad_zeeshan13 I noticed you’re a full-stack engineer, nice background.
We’re currently opening a role for a Senior Frontend Engineer at VertoX. Given your 7–8 years of experience, I think it could be a great fit. If you’re interested in working in a startup environment, I’d be happy to take a look at your CV and set up a call to discuss the role further.
Let me know 👍
This looks incredible! That ~1 second latency goal is a game-changer for real-time flow. Since you're implementing voice cloning and emotion retention from the start, how are you handling the prosody transfer? Are you using a reference encoder in the TTS stage, or are you injecting speaker embeddings directly into the NMT output?
@g_nithish_kumar Great question! Right now we’re experimenting with multiple approaches around prosody and voice preservation. We’re testing both reference-based methods and speaker embeddings in the TTS stage, keeping the pipeline flexible as we iterate. At the moment, we’re leveraging open-source models as a foundation, while building and refining our own system over time. Long term, we plan to develop our own models as we scale.
Still early, so we’re focused on finding what delivers the most natural result in real-time settings.
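To make the two options in this exchange concrete, here's a toy sketch (again, not VertoX's implementation) of how a reference-based style embedding differs from a fixed speaker embedding at the TTS stage. Real reference encoders use conv + RNN stacks over the mel-spectrogram; mean pooling stands in for that here, and the shapes are made up for illustration.

```python
import numpy as np

EMB_DIM = 4  # assumed style/speaker embedding size

def reference_encoder(mel: np.ndarray) -> np.ndarray:
    """Pool a reference utterance's mel frames into a fixed-size
    'prosody' embedding. Mean pooling is a stand-in for a learned
    conv + GRU reference encoder."""
    pooled = mel.mean(axis=0)   # (n_mels,) average over time frames
    return pooled[:EMB_DIM]     # truncate to the embedding size

def condition_tts(text_encoding: np.ndarray, style: np.ndarray) -> np.ndarray:
    """Inject the embedding into the TTS decoder input by broadcasting
    it across time and concatenating it to each text-encoder frame.
    A fixed per-speaker embedding would be injected the same way."""
    frames = text_encoding.shape[0]
    style_tiled = np.tile(style, (frames, 1))          # (frames, EMB_DIM)
    return np.concatenate([text_encoding, style_tiled], axis=1)

mel = np.random.rand(100, 80)    # reference audio: 100 frames, 80 mel bins
text_enc = np.random.rand(20, 8) # text encoding: 20 frames, 8 dims
style = reference_encoder(mel)
conditioned = condition_tts(text_enc, style)
print(conditioned.shape)  # (20, 12)
```

The practical difference: a reference encoder captures per-utterance prosody (tone, emotion) from the incoming speech, while a lookup speaker embedding only captures a speaker's average voice, which is why systems aiming to preserve emotion typically combine both.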