Translate every layer: voice, subtitles & on-screen text

Visual Translate by Vozo - Translate text in your videos without recreating visuals

Raycast

•22d ago

Fully translated videos — finally. Visual Translate adds the final layer — translating text inside videos — on top of voice dubbing, lip-sync, and subtitles. It detects and translates on-screen text, from slides and diagrams to callouts and labels, while preserving the original layout, style, and animation. Turn slide videos and explainers into multilingual versions and reach a global audience — without recreating visuals from scratch.

Replies

Sounds really cool! How many languages are supported, and do you clone voices?

20d ago

Vozo AI — Video localization

Maker

@mykyta_semenov_ Thanks! Visual Translate currently supports 68 target languages, and our dubbing supports 73 languages. Our dubbing feature also supports voice cloning to preserve the speaker’s voice.

19d ago

The onboarding experience is seamless, the UI is exceptionally well-designed, and the final results are impressive. Excellent work on this, though the queue time seems long; I guess it has to do with launch-day traffic.

22d ago

Vozo AI — Video localization

Maker

@x_ronxo Thanks a lot for the kind words!

There actually isn’t a queue on our side. The waiting time mainly comes from processing the video visuals themselves, so depending on the video length and complexity it may take a bit of time to finish.

22d ago

Elser AI

Does Vozo show which areas of the frame were detected as text?

22d ago

Vozo AI — Video localization

Maker

@airmusic Yes, our AI model separates the video into different visual layers across the entire frame, allowing it to analyze each area throughout the video. It also detects the exact frames at which text appears and disappears, so the text replacement is accurately timed.
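
As a rough illustration of the timing step described above (not Vozo's actual implementation, which is not public), the sketch below assumes some detector has already produced one true/false flag per frame saying whether text is visible. The function name `text_intervals` and the `has_text` input are hypothetical; the code only shows how first and last frames could be recovered from such flags.

```python
from typing import Iterable, List, Tuple


def text_intervals(has_text: Iterable[bool]) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) for each run of frames where text is visible.

    `has_text` is a hypothetical per-frame detection result, one flag per frame.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # index of the frame where the current run of text began
    last = -1     # index of the most recent frame seen
    for i, flag in enumerate(has_text):
        last = i
        if flag and start is None:
            start = i                         # text just appeared
        elif not flag and start is not None:
            intervals.append((start, i - 1))  # text disappeared on this frame
            start = None
    if start is not None:
        intervals.append((start, last))       # text still visible at the end
    return intervals


flags = [False, True, True, True, False, False, True, True]
print(text_intervals(flags))  # [(1, 3), (6, 7)]
```

Each pair marks the frame range over which a translated replacement would need to be drawn.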

22d ago

Elser AI

How well does Vozo work for tutorial videos with heavy UI overlays?

22d ago

Vozo AI — Video localization

Maker

@fanyifanzaiqingdao Good question.

Vozo can work well with tutorial videos that have UI overlays, especially when the overlays include explanatory text such as labels, callouts, or annotations.

For actual UI screenshots or product interfaces, we usually keep them unchanged by default since they often need to stay consistent with the real product UI. If you do want them translated, you can simply select the text area in the editor and regenerate the translation.

Right now Visual Translate works best with videos like training videos, slide videos, and explainers, where the text layer helps explain what’s happening on screen.

22d ago

Vozo AI — Video localization

Maker

@fanyifanzaiqingdao Complex overlaps are something our model can handle reasonably well in many cases today. Feel free to give it a try and see how it works on your videos!

22d ago

ZenMux

Could I generate EN / JP / ES versions from one source video?

22d ago

Vozo AI — Video localization

Maker

@olivia_ma Yes! You can generate multiple language versions with one click.

22d ago

Documentation.AI

That's really interesting. What are you using at the backend to do so?

22d ago

Vozo AI — Video localization

Maker

@roopreddy Thanks!
We develop our own AI models and system pipeline, combined with some of the most advanced LLMs, to address this problem, since no existing solution on the market does this.

22d ago

Would this allow my Loom video to be translated? Are there any integrations, or will I have to upload the video to the platform?

22d ago

Vozo AI — Video localization

Maker

@zerotox Yes, Loom videos can definitely be translated.

At the moment, the workflow is to download the video and upload the file to Vozo for processing. We don’t have a direct Loom integration yet.

Once uploaded, Vozo can translate the voice, subtitles, and on-screen text together.

If Loom integration would be useful for your workflow, we’d love to hear more about it!

22d ago

Really impressive work on the on-screen text layer — that's been the missing piece for years. I run explainer videos for a SaaS product and dubbing audio was easy, but our slide content always stayed in English. Quick question: do you support batch processing for multiple videos at once, or is it currently one-by-one? Would love to know if enterprise/API access is on the roadmap since we'd use this heavily.

22d ago

Vozo AI — Video localization

Maker

@ilya_lee Thanks for the thoughtful comment — really glad the on-screen text layer resonates with you!

We’re currently in beta. Processing is handled concurrently, but batch uploading multiple videos isn’t supported yet. It’s on our roadmap and something we plan to add soon.

For API access, we’ll consider opening it up once we see stronger enterprise demand. In the meantime, you’re very welcome to try our SaaS product and share any feedback.

If you’d like to discuss enterprise use cases in more detail, feel free to reach out to our BD team at bd@vozo.ai.

22d ago

Vozo AI — Video localization

Maker

@ilya_lee Appreciate the thoughtful question!

Right now videos are processed individually. Batch uploads and APIs are on our roadmap.

22d ago

Really impressive approach to full-layer translation. Most tools only handle subtitles, but ignoring on-screen text is a huge gap. How accurate is the lip-sync for languages with very different syllable structures, like Turkish or Japanese?

22d ago

Vozo AI — Video localization

Maker

@listsgenie Thanks!

Our lip-sync system is language-independent and works based on audio signals rather than specific languages. In general, if the sounds are similar across languages, the lip movements will also appear similar.

Our LipReal model is trained on a large multilingual dataset, which helps handle these cases well. However, some languages involve different mouth movements that can produce similar sounds, which may occasionally lead to minor inaccuracies.

Feel free to give it a try and see how it works for your use case — we’d love to hear your feedback.

22d ago

Vozo AI — Video localization

Maker

@listsgenie Thanks for the thoughtful question!

Our lip-sync is audio-driven rather than language-specific, so it generally adapts well even across languages like Turkish or Japanese. It’s still improving, but we’ve seen solid results across many multilingual videos.

22d ago

How does Vozo handle very small or faint text?

21d ago

Vozo AI — Video localization

Maker

@zack_zheng Generally, if the text is visible and readable, our system can detect and translate it.

If some text isn’t detected automatically, you can simply select it in the editor and regenerate that region — the system will then process and translate it.

21d ago
