Chris Messina

Visual Translate by Vozo - Translate text in your videos without recreating visuals

Fully translated videos — finally. Visual Translate adds the final layer — translating text inside videos — on top of voice dubbing, lip-sync, and subtitles. It detects and translates on-screen text, from slides and diagrams to callouts and labels, while preserving the original layout, style, and animation. Turn slide videos and explainers into multilingual versions and reach a global audience — without recreating visuals from scratch.

CY

👋 Hi Product Hunt! CY here, founder of Vozo.

I’m an ex-Google researcher who helped build core video technology for Android, Glass, and Photos.

Visual Translate is Vozo’s 3rd launch on Product Hunt — bringing the last missing layer of video translation: the text inside videos. It builds on our previous successful PH launches around AI dubbing, lip-sync, subtitles, and translation quality.

👉 Fully translated videos — finally possible.

With Visual Translate, Vozo can now translate the text inside videos — slides, diagrams, UI labels, and callouts — while keeping the translated text fully editable.

This turns out to be surprisingly tricky: the system has to decide what to translate, what to keep, and how to recreate visuals without breaking layout, style, or animation — but we’ve finally made it work.

We’re starting with slide videos and explainer videos, where much of the information lives directly in the visuals. With this final layer solved, important videos can finally travel across languages instead of being locked inside one.

🚀 We’re opening FREE beta access today: sign up with Gmail and try Visual Translate. Let us know what videos you’d translate first.

Josie OY

@lightfield Hi everyone — I’m Josie, the PM & designer behind Visual Translate at Vozo.

Really excited that Visual Translate is finally live after several weeks of development and early user trials.

Here are a few sample demos:

• DJI promo video

• A slide-based video

• A training video

• A Gemini intro video

• Geoffrey Hinton’s Royal Institution talk on AI


You can also check out a short how-to video showing how it works.

Over the past few weeks, users from different industries have already used Visual Translate to localize videos such as medical explainers, internal training, and safety instruction videos. It’s exciting to see it being used in real workflows.

Happy to answer any questions! Feel free to ask about how Visual Translate works under the hood, or tell us what kind of videos you’d like to translate.

Steven Austen Lynn

@lightfield Translating the actual text inside videos feels like a huge missing piece in video localization. Subtitles and dubbing solve the audio layer, but the visuals are where a lot of the real information lives in explainer and slide videos.

Keeping the translated text editable while preserving the layout seems like the hardest part of that problem.

Curious how often the system has to recreate visuals versus just replacing text elements.

CY

@hpsimulator Great observation! Often the text comes with visual elements like backgrounds or borders, which sometimes need to be recreated. Aside from those cases, we try to keep the original visuals unchanged and only replace the text.

swati paliwal

@lightfield Congrats on this to your team! Quick question: how does editable text ensure brand consistency across 110+ languages without manual tweaks every time?

CY

@swati_paliwal Great question! We provide a glossary feature to help keep brand terms and key phrases consistent across translations. And since everything stays editable, teams can easily fine-tune wording if needed.

Helena

@lightfield congrats on the launch bro! super useful tool

Josie OY

@lightfield @hehe6z Thanks Helena! Really appreciate the support!

Linh Hwang

@lightfield Tried your product with English-to-Vietnamese translation and honestly, the accuracy surprised me a bit. It reads quite naturally and gets the context right, which is not easy.

The only thing is the waiting time feels a bit long. If I’m testing a few times in a row, it starts to slow things down. Might be worth looking into scaling or optimizing the processing a bit so users don’t have to sit in the queue too long.

Overall though, this is really solid.

Josie OY

A small backstory on how Visual Translate started.

The idea goes back to October 2025. Around that time we noticed that many great educational videos weren’t being translated well. A big reason was that a lot of key information wasn’t only in the narration, but in text inside the visuals — slides, diagrams, labels, and callouts.

When we looked at existing video translation tools, we realized that this layer was still largely unsolved.

So we decided to try building it.

Huge credit to our engineer Naro. She started experimenting with the idea back in October and built the very first prototype and pipeline herself. The demo she showed the team was still rough, but the results were already surprisingly impressive.

Naro is honestly one of those engineers who are both brilliant and delightful to work with — sharp, curious, and incredibly creative when exploring new ideas. That early experiment she built convinced us this was worth turning into a real product, and the rest of the team quickly rallied around the idea.

Josie OY

The project was officially green-lit in December, when we started the actual product design. Our founder CY, Tech Lead Fei, and I worked closely together to define the product direction and what Visual Translate should really be.

I also want to give a big thanks to our former engineer Yetong, who co-created the editor UX with me. Building the editing experience was one of the most challenging parts. There really wasn’t anything comparable on the market to reference.

Designing something this new has been incredibly exciting for me. I’m genuinely proud that we were able to create an editor that makes this workflow possible.

Josie OY

Once the project officially started, a huge part of the work moved to the modeling and engineering side.

I want to give special thanks to our algorithm engineers Xin Jin, Pengfei, and Boya. They built and shipped the models and pipelines behind this feature.

Handling text inside videos is actually a very challenging problem. The text can appear in many different forms — different languages, fonts, layouts, styles, and even animations. Every video can present a new combination of cases.

Thanks to their work, we’ve already gone through three iterations of the system, and each version has become more stable and more accurate.

They are truly brilliant algorithm engineers, and their work laid a strong foundation for this entire project.

Josie OY

Another person we’re deeply grateful to is our Tech Lead Fei.

He designed the core text layout algorithm behind Visual Translate. When translating from one language to another, the length of the text often changes significantly. Making sure the translated text still fits naturally in the video while keeping the visuals clean is not trivial.

It becomes even more complex when there are multiple text elements on the same frame — their spatial relationships and layout all need to be preserved.

Fei’s algorithm makes this possible. It ensures the translated text can be reflowed and presented properly inside the original visuals. His work played a key role in making the whole system usable in real-world videos.
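
To give a flavor of the problem, here is a toy sketch in Python (not Fei’s actual algorithm; the width model and names are made up): when several translated strings must fit back into their original boxes, one simple approach is to shrink them all by a single shared factor so their relative sizes stay intact.

```python
# Toy sketch: fit translated strings back into their original boxes while
# preserving the elements' relative font sizes. All constants are made up.
from dataclasses import dataclass

@dataclass
class TextElement:
    box_width: float   # pixel width of the original text box
    font_size: float   # original font size in px
    translated: str    # translated string to place back in the box

def char_width(font_size: float) -> float:
    # Crude width model: assume an average glyph is ~0.55x the font size.
    return 0.55 * font_size

def shared_scale(elements: list[TextElement]) -> float:
    """Largest uniform scale (<= 1.0) so every translated string fits its
    box on one line while all elements keep their relative font sizes."""
    scale = 1.0
    for el in elements:
        needed = len(el.translated) * char_width(el.font_size)
        if needed > el.box_width:
            scale = min(scale, el.box_width / needed)
    return scale

elements = [
    TextElement(box_width=300, font_size=32, translated="Quarterly revenue overview"),
    TextElement(box_width=180, font_size=24, translated="Key takeaways"),
]
s = shared_scale(elements)
for el in elements:
    print(f"{el.translated!r}: {el.font_size:.0f}px -> {el.font_size * s:.1f}px")
```

The real system has to handle far more than this (line breaks, animation, multiple frames), but the core tension is the same: longer text, same box, same relative layout.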

Josie OY

I also want to thank the engineers who made the product actually come to life.

Our frontend engineers Junjun and Kiya built the entire frontend workflow and the Visual Translate editor. As mentioned earlier, this is a completely new type of product — there wasn’t really anything similar on the market to reference. Many of the interactions and editing behaviors had to be designed and implemented from scratch.

Their work turned these ideas into a real, elegant editing experience. The editor you see today — smooth, intuitive, and flexible — would not exist without their craftsmanship and attention to detail.

I also want to thank our backend engineer Lucas. He built the backend architecture for this feature almost entirely by himself. At the same time, he was also responsible for several other feature developments across Vozo, which made the work especially demanding.

Despite that, he managed to design and implement a solid backend foundation for Visual Translate. We’re very grateful for his dedication and the reliability of the work he delivered.

Finally, I want to give another special thanks to Naro, who served as the engineering lead for this entire project.

Her role went far beyond frontend development. She was deeply involved in shaping the overall technical approach — from algorithms and pipelines to the product workflow. She also coordinated engineering progress across the team and kept the project moving forward.

Thanks to her leadership and persistence, the whole project was delivered on time and at a very high quality. We’re incredibly grateful for everything she contributed to making Visual Translate possible.

Josie OY

We’d also like to thank our QA engineers Menghan and Meixia, who were responsible for testing this entire feature.

Since this was a brand-new product, there were many aspects that needed to be tested — from product logic and interaction flows to algorithm performance and the quality of the generated results.

The first version of this feature was especially challenging. The system went through multiple rounds of model and algorithm updates, and the behavior could change significantly between iterations. That made the testing process far more demanding.

Thanks to their persistence and attention to detail, we were able to keep improving the system while maintaining stability. Their work played a crucial role in making sure Visual Translate could launch on time with reliable quality.

Josie OY

I’d also like to thank our website designer Ushuan. She led the upgrade of our entire website’s visual design and handled the design, build, and implementation almost entirely herself.

For example, the landing page for Visual Translate was designed and built by her. She turned a complex product concept into a clear and elegant presentation that helps people quickly understand what this feature can do.

If you’re curious to see it, take a look here:
https://www.vozo.ai/visual-translate

Jessica Miller

What happens when the translated text is longer than the original space allows?

Josie OY

@jessica_miller_7 Great question — especially since different languages can vary a lot in length. For example, Chinese text can become much longer when translated into English.

Our system analyzes the video frame, text length, and layout to compute a new layout that fits best. It can automatically adjust font size, reflow the text, and handle line breaks.

This way, the translated text stays within the visual boundaries and keeps the video looking clean and natural.
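
Here’s a simplified illustration of the kind of fitting logic involved (a toy Python sketch with made-up width and line-height constants, not our production layout engine): step the font size down until greedily wrapped lines fit inside the box.

```python
# Toy sketch: shrink the font until wrapped lines fit the box, assuming an
# average glyph width of ~0.55x the font size and 1.2x line spacing.
import textwrap

def fit_text(text: str, box_w: float, box_h: float,
             start_size: float = 32.0, min_size: float = 10.0):
    size = start_size
    while size >= min_size:
        chars_per_line = max(1, int(box_w / (0.55 * size)))
        lines = textwrap.wrap(text, width=chars_per_line)
        if len(lines) * 1.2 * size <= box_h:
            return size, lines
        size -= 1.0  # step the font size down and retry
    # Fall back to the minimum size if nothing fits.
    chars_per_line = max(1, int(box_w / (0.55 * min_size)))
    return min_size, textwrap.wrap(text, width=chars_per_line)

size, lines = fit_text(
    "Una traducción que resulta bastante más larga que el texto original",
    box_w=220, box_h=90)
print(size, lines)
```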

Elaine Lu

@jessica_miller_7 Nice catch! This is where the magic happens. Give it a try and you’ll see how deeply our AI model understands the correct layout based on the surrounding context and text.

JoJo

@jessica_miller_7 Great question, Jessica! I can tell you’re a localization expert 😄 Hope Josie’s reply helps. Feel free to give it a try; we’d love to hear what you think!

Ali Goldberg

Omg, living in a foreign country you have no idea how amazing this is! I am so excited to try. Do you have Hebrew?

JoJo

@ali_goldberg Hi Ali, thanks for your kind words! Visual Translate Beta doesn’t support Hebrew yet, but our Translate & Dub feature does. Feel free to give it a try and let us know what you think :)

Elaine Lu

@ali_goldberg Thanks for the encouragement! We don’t currently support Hebrew for the Visual Translation feature because it’s a right-to-left language, which makes layout reconstruction more challenging. In some cases, visual elements may also need to be flipped to look correct.

That said, we’re definitely working to improve our AI models and system so we can support this important language in the future.

Josie OY

One design choice we cared a lot about while building Visual Translate is editability.

A lot of AI tools today focus on full generation. That works well for creating something from scratch, but in many real workflows people aren’t starting from zero. They already have a finished video and just need to adapt it for another language or audience.

Instead of regenerating the video, Visual Translate separates the text layer inside the video, translates it, and rebuilds it back into the visuals while keeping everything editable. You can adjust wording, layout, or styling directly in the editor.
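
To make that concrete, here is a rough sketch of the “editable layer” idea (illustrative Python types, not Vozo’s actual schema): detected text lives as structured elements rather than baked-in pixels, so translating just rewrites fields and the frames are re-rendered from that data.

```python
# Illustrative data model for an editable text layer; field names are made up.
from dataclasses import dataclass, field

@dataclass
class TextElement:
    frames: tuple[int, int]          # first and last frame where the text is visible
    bbox: tuple[int, int, int, int]  # x, y, width, height in the source frame
    source_text: str
    translated_text: str = ""
    font_size: float = 24.0

@dataclass
class TextLayer:
    elements: list[TextElement] = field(default_factory=list)

    def translate(self, translate_fn):
        # Only the text fields change; geometry, timing, and style stay
        # as editable data instead of being baked into pixels.
        for el in self.elements:
            el.translated_text = translate_fn(el.source_text)

layer = TextLayer([TextElement((0, 120), (40, 60, 300, 48), "Overview")])
layer.translate(lambda s: {"Overview": "Übersicht"}.get(s, s))
print(layer.elements[0].translated_text)  # -> Übersicht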

For us, this approach fits much better with how video localization actually happens in practice. It’s been really exciting to see teams in different industries already using it for training videos, explainers, and internal communication.

Elaine Lu

@josie_oy Thanks for highlighting this important feature! This will definitely help make it production-ready for users.

JoJo

@josie_oy A round of applause for our product and R&D team! Thank you for making it possible for important information to travel across language barriers! 🎉

Josie OY

@jojo_li Thank you Jojo! And really appreciate all the effort you put into this campaign as well. Couldn't have done it without the team. 🙌

Fu

How does Vozo handle very small or faint text?

Josie OY

@flora07 In most cases, if the text is visible to the human eye, Vozo can detect and translate it.

Very small or faint text can sometimes be more challenging, and like any model we can’t guarantee perfect handling for every edge case. We’re continuously improving the detection and translation quality to make it more robust over time.

Elaine Lu

@flora07 One more thing worth mentioning: it’s not a one-shot process. If the model misses some text, you can select the region and trigger a more detailed detection just for that area.

This greatly increases the chances of capturing and translating the text correctly. More details are available in our docs.
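
Conceptually, the region re-detection works something like this sketch (a hypothetical Python helper using Pillow; Vozo exposes this through the editor, not a public API): crop the selected region, upscale it, and run detection again on the enlarged crop.

```python
# Hypothetical sketch of region-triggered re-detection; not Vozo's actual code.
from PIL import Image  # pip install pillow

def redetect_region(frame: Image.Image, box, detect_fn, upscale: int = 3):
    """Crop the user-selected box, upscale it, and run detection again so
    small or faint text gets more pixels to work with."""
    x, y, w, h = box
    crop = frame.crop((x, y, x + w, y + h))
    big = crop.resize((w * upscale, h * upscale), Image.LANCZOS)
    # detect_fn is any detector returning (text, (bx, by, bw, bh)) pairs.
    hits = detect_fn(big)
    # Map detections back into original-frame coordinates.
    return [(text, (x + bx // upscale, y + by // upscale,
                    max(1, bw // upscale), max(1, bh // upscale)))
            for text, (bx, by, bw, bh) in hits]
```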

CY

@flora07 From our testing, small text is often detected surprisingly well. If anything gets missed, Visual Translate lets you manually select the area and trigger translation for that region.

Lily Liu

Can Vozo translate screenshots embedded inside videos?

Josie OY

@lily_liu8 Vozo can detect and translate explanatory text that appears inside videos.

However, we usually don’t automatically translate screenshots or UI elements embedded in the video. In many cases those are meant to stay exactly as they are.

If you do want them translated, you can manually select the text area in the editor and click “Regenerate” to translate it. Our editor is designed to be flexible, so you can easily adjust and translate elements that weren’t processed automatically.

JoJo

@lily_liu8 Hi Lily, thanks for your question. As @josie_oy replied, we don’t currently support automatically translating screenshots, but you can select them to add translation. Here is a how-to video; hope it helps :)

CY

@lily_liu8 Great question! Our model tries to infer whether text should be translated based on the context. For example, logos or text that belongs to real-world objects are usually left unchanged.

Screenshots can vary, so it may depend on the specific case. But you can always manually tell Vozo which areas you want or don’t want translated in the editor.
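
As a toy stand-in for that decision (the real system uses a learned model with visual context; these rules and category labels are purely illustrative):

```python
# Toy stand-in for the translate-vs-keep decision; labels are made up.
KEEP_CATEGORIES = {"logo", "real_world_object", "screenshot_ui"}

def should_translate(text: str, category: str) -> bool:
    if category in KEEP_CATEGORIES:
        return False  # brand marks and in-scene text stay as-is
    return any(c.isalpha() for c in text)  # skip pure numbers/symbols

print(should_translate("Quarterly results", "slide_text"))  # True
print(should_translate("DJI", "logo"))                      # False
```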

Dmitry Zakharov

Hi Vozo team, looks amazing! 🙌 I’m curious: what’s the difference between your product and HeyGen?

Elaine Lu

@dmitry_zakharov_ai Thanks!

We’re launching a new feature that translates on-screen text inside videos and makes it editable after translation.

This means you can add localized subtitles, dub with lip-sync, and localize on-screen text all in one place. In practice, it lets you fully localize a video instead of just translating the audio. HeyGen, on the other hand, focuses on AI avatars plus dubbing with lip-sync.

Give it a try :)

CY

@dmitry_zakharov_ai Great question! Vozo focuses on video localization — translating voice, subtitles, and on-screen text so a video can be fully adapted for different languages.

HeyGen is more focused on AI avatars / digital humans. There’s some overlap around dubbing, but the core use cases are a bit different.

Sandy Liu

Congrats on the launch! The demo looks great, and I’m definitely interested in trying it out.

JoJo

@sandy_liusy Thanks so much, Sandy, for your kind words! Really appreciate it! Looking forward to hearing your feedback.

Josie OY

@sandy_liusy Thank you so much for the support Sandy, really appreciate it!

We’d love for you to give it a try, and we’re very curious to see what kinds of videos you’d use it for. Looking forward to hearing how it works for you.

Thea

What scenarios do you think Vozo works best for today, and where does it struggle the most?

Elaine Lu

@thea5 Thanks for asking! For visual translation, slides and product demo–style videos work best. This includes content such as e-learning, training materials, and marketing videos.

At the moment, it doesn’t handle videos with animated backgrounds or moving text perfectly, such as entertainment-style callouts. We’re actively working to improve those cases and bring a more universal experience to all users.

CY

@thea5 Great question!

Right now, Visual Translate works best with slide-style and explainer videos, where a lot of key information appears visually on screen.

Think of scenarios like training materials, presentations, product introductions, financial briefings, or talking-head videos with text overlays. These formats are usually information-heavy, with slides, labels, diagrams, or callouts that stay on screen long enough for the system to detect and translate while preserving the layout.

Where it can still struggle today is with highly dynamic visuals, like moving text or complex animated backgrounds. We’re actively improving those cases so the experience becomes more universal across different video styles.
