Chris Messina

Visual Translate by Vozo - Translate text in your videos without recreating visuals

Fully translated videos — finally. Visual Translate adds the final layer — translating text inside videos — on top of voice dubbing, lip-sync, and subtitles. It detects and translates on-screen text, from slides and diagrams to callouts and labels, while preserving the original layout, style, and animation. Turn slide videos and explainers into multilingual versions and reach a global audience — without recreating visuals from scratch.

Sylvia

Is visual translation a separate module or part of the main workflow?

Elaine Lu

@sylvia_weng99 Great thoughts! Currently, it’s a dedicated workflow, but we’re planning to merge all video translation capabilities — subtitles, dubbing with lip-sync, and visual translation — into a single, unified experience.

Harry Zhang

Hey team, congrats on the launch! Super polished product with a validated real world use case. Professional demo. Excited to try it out. Wondering if you offer an open API?

Josie OY

@harryzhangs Thanks a lot for the kind words — really appreciate it!

We’re currently in beta, so we haven’t opened up a public API yet. If we see strong enterprise demand, we may consider offering API access in the future.

That said, we believe the SaaS workflow works best for this kind of product. Video localization usually requires review and edits during the process. Our editor lets you visually compare the original and translated video side by side, and directly adjust the text, layout, and styling in context, which makes the workflow much more intuitive.

Elaine Lu

@harryzhangs Thanks! We’re currently in beta, and we’ll definitely consider offering an open API in the future, including possible support for AI agents to interact with it.

Asti Pili

Does it preserve voice and emotion, or does it sound like Netflix's international movie dubbing? :)

Josie OY

@asti_pili Our dubbing feature is designed to preserve the speaker’s voice and emotional tone, so it doesn’t sound like traditional movie-style dubbing.

For this launch, though, we’re introducing Visual Translate, which focuses on translating text that appears inside the video itself — things like slides, labels, diagrams, and on-screen callouts — while keeping the original layout and visuals intact.

So together with dubbing, subtitles, and lip-sync, it helps localize the entire video.

JoJo

@asti_pili Hahaha, should we tag Netflix here? Just kidding 😄

BTW, really great to see you here! I love your product, and your intro video is so well done: the storytelling is brilliant and super engaging.

CY

@asti_pili Great question! Our translate & dub feature is designed to preserve the speaker’s voice tone and emotion during translation.

Many users are already using it to localize international films and even the recent wave of mini-dramas, with pretty natural-sounding results.

Frank Li

This could save a lot of manual After Effects work.

Elaine Lu

@frank_li13 Yes!

It makes large-scale on-screen text translation much easier. Give it a try — we’d love to hear your feedback.

JoJo

@frank_li13 Exactly! Our in-house designer loved Visual Translate so much!

Klaus - WONG CHI HUNG

How much manual cleanup is usually needed after auto visual translation?

Josie OY

@hkklaus97 Our system already handles the extraction and rebuilding of the visual text elements automatically. In most cases, the remaining work is mainly reviewing the result and making small adjustments if needed.

That’s also why we made the editor fully editable — so users can quickly refine wording, layout, or styling when necessary.

CY

@hkklaus97 Good question! In most cases the results are already quite complete, especially for slide videos and explainer videos. Sometimes people still tweak the generated text style a bit to match their preferences.

Jamie

When space is limited, how does Vozo handle it? Does it prioritize readability or literal accuracy?

Elaine Lu

@tabmanj Thanks for asking! We handle this in a few different ways:

• Adjusting the font size

• Breaking the text into multiple lines

• Shortening the translation when necessary

A well-tuned AI system dynamically selects the best option based on the context and layout of the video.
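
As a rough illustration only, the three options above could be combined in a simple fallback loop. This is a minimal sketch and not Vozo's implementation: the function name `fit_text`, the character-width and line-height ratios, and the optional `shorter` translation are all hypothetical, and the fixed ordering (wrap at each size, shrink the font, then fall back to a shorter translation) is a stand-in for the context-aware selection described above.

```python
import textwrap

# Assumed rough text metrics; a real renderer would measure glyphs with the actual font.
CHAR_WIDTH_RATIO = 0.6   # average character width as a fraction of font size
LINE_HEIGHT_RATIO = 1.2  # line height as a fraction of font size


def fit_text(text, box_w, box_h, font_size, min_font_size=10, shorter=None):
    """Hypothetical helper: return (strategy, lines, font_size), or None if nothing fits."""
    candidates = [(text, False)]
    if shorter is not None:
        candidates.append((shorter, True))

    for candidate, is_shortened in candidates:
        # Try the original size first, then shrink one point at a time.
        for size in range(font_size, min_font_size - 1, -1):
            chars_per_line = max(1, int(box_w // (size * CHAR_WIDTH_RATIO)))
            max_lines = int(box_h // (size * LINE_HEIGHT_RATIO))
            # Wrap on word boundaries only, so an overlong word shows up as a too-wide line.
            lines = textwrap.wrap(candidate, width=chars_per_line, break_long_words=False)
            too_wide = any(len(line) > chars_per_line for line in lines)
            if not lines or too_wide or len(lines) > max_lines:
                continue
            if is_shortened:
                strategy = "shortened"
            elif size < font_size:
                strategy = "shrunk"
            elif len(lines) > 1:
                strategy = "wrapped"
            else:
                strategy = "as_is"
            return strategy, lines, size
    return None


print(fit_text("Quarterly revenue overview", box_w=200, box_h=60, font_size=20))
```

The returned strategy label makes it easy to flag shortened translations for human review, since shortening is the only option here that changes the wording rather than the layout.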

CY

@tabmanj Great question! Our model considers multiple factors — layout, readability, and context — to choose the best possible way to fit the translated text into the available space.

Denis Akindinov

Hey, congrats on the launch!

Josie OY

@mordrag Thanks so much! Really appreciate the support.

Leo Jiang

How well does Vozo handle translating UI-heavy SaaS walkthroughs?

Josie OY

@leo_aj Great question — Visual Translate is currently optimized mainly for slide-based videos and explainer-style videos, where a lot of explanatory text appears on screen.

Vozo can detect and translate explanatory text in the interface — things like tooltips, labels, highlights, or annotations that appear during the walkthrough.

For actual UI screenshots or product interfaces, we usually keep them unchanged by default, since those elements often need to stay consistent with the real product UI. If you do want them translated, you can simply select the text area in the editor and click “Regenerate” to translate it.

This way you can keep the UI authentic while still localizing the explanatory layer around it.

CY

@leo_aj Great question! The current beta isn’t specifically optimized for SaaS walkthroughs yet, but it can still handle many cases. Improving support for UI-heavy videos is something we’re hoping to roll out soon.

Kiya L.

This is perfect for educational videos where visuals carry as much meaning as the narration. Congrats on the launch!

One quick question: do you offer an API?

Josie OY

@kiyaaa_ Thanks for the kind words!

We’re currently in beta, so we haven’t opened up a public API yet. If we see strong enterprise demand, it’s something we may consider in the future.

For now, we’ve focused on building a SaaS workflow, because video localization usually involves review and edits along the way. Our editor lets you compare the original and translated visuals side by side, and directly adjust the text, layout, and styling when needed.

Kiya L.

@josie_oy Oh nice 👍, it's great that you can edit the translated visuals directly. If the system detects and translates some on-screen text but the user actually wants to keep the original, is it also possible to skip or revert that translation?

Josie OY

@kiyaaa_ Yes, we support that.

In the very first version we launched, there wasn’t an easy way to handle this case. But we quickly realized it can create problems in real production scenarios. For example, a brand name or product term might appear on screen and shouldn’t be translated, but the system may translate it automatically.

So in an update we shipped last week, we added a “Revert to Original” option. You can simply select the translated text and revert it back to the original text and styling from the source video, without affecting any other translated elements in the frame.

Kiya L.

@josie_oy What about text that the system didn't detect, though? I know we can add new text elements in the editor, but since the original text is still visible in the frame, how do you deal with those cases?

JoJo

@kiyaaa_ Thank you so much, Kiya! Really appreciate it. And yes, exactly!

A lot of important videos contain key on-screen text, and we want to make sure that information can still be clearly understood across languages.

Janice

In what scenarios does visual translation make the biggest difference compared to subtitles?

Elaine Lu

@janicelewis00 Thanks for asking! Visual translation is especially useful when important information appears in the video itself rather than being spoken.

For example, in slide-based training videos or product demos with detailed specifications, the visuals often convey much more information than the audio or subtitles.

CY

@janicelewis00 Great question! Visual translation makes the biggest difference in videos like product demos, technical talks, or business presentations with lots of slides and on-screen text.

Subtitles preserve the spoken information, while Visual Translate preserves the information shown on screen — so viewers don’t miss either layer.