Visual Translate by Vozo - Translate text in your videos without recreating visuals
Fully translated videos — finally.
Visual Translate adds the final layer — translating text inside videos — on top of voice dubbing, lip-sync, and subtitles. It detects and translates on-screen text, from slides and diagrams to callouts and labels, while preserving the original layout, style, and animation. Turn slide videos and explainers into multilingual versions and reach a global audience — without recreating visuals from scratch.

Replies
Vozo AI — Video localization
@nah_na Yes, Vozo supports a glossary.
Your glossary acts as a reusable asset and can be applied across our different translation tools, including Visual Translate, Translate & Dub, and Translate Subtitles.
This helps ensure that key terms, brand names, and preferred translations stay consistent across all your videos, no matter which workflow you use.
Vozo AI — Video localization
@nah_na Yes, Vozo supports a glossary to keep brand terms and key phrases consistent across translations. And we’re continuing to improve the glossary feature as we learn from real user workflows.
Great! The product presenters and YouTubers (like me) have been longing for is here! I'm so excited to try this out because this empowers presenters to go global. I have a few questions.
If there's moving text on the screen, large enough to cover the whole screen like a book page, can it be fully translated without cutting off the text at the edges?
Which video formats does it support?
Congrats on the launch!
Vozo AI — Video localization
@atwijukire_ariho_seth Thanks for the thoughtful questions!
First, we currently support MP4, MOV, WEBM, AVI, and WMV formats.
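As a rough illustration of the format list above, a pre-upload check could look like the sketch below. The helper name and the extension-based check are assumptions for illustration only; they are not part of Vozo's product or API.

```python
from pathlib import Path

# Formats named in the reply above. The helper itself is hypothetical.
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi", ".wmv"}


def is_supported_video(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Note that an extension check says nothing about the codec inside the container, so a real upload could still be rejected.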
Regarding the case you mentioned:
At the moment we mainly support entry and exit animations for on-screen text.
For text that keeps moving continuously across the frame, the results may not be perfect yet. Improving this is one of the next areas we’re actively working on.
About the situation you described where the text covers almost the entire screen like a book page — I’d love to understand that case a bit better:
Is it because the font size is very large?
Or because the text content itself is very long?
Our current layout logic tries to avoid letting long translated text overflow beyond the screen boundaries.
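To make the overflow point concrete, here is a minimal sketch of one common way to keep longer translated text inside a fixed box: wrap the text, and shrink the font size until the wrapped lines fit. This is not Vozo's layout logic. The function name, the character-width ratio, and the line-height ratio are all assumptions chosen for illustration.

```python
import textwrap


def fit_text(text, box_w, box_h, start_size=48, min_size=8,
             char_ratio=0.6, line_ratio=1.2):
    """Find the largest font size at which wrapped text fits the box.

    Assumes each character is about char_ratio * size wide and each line
    is line_ratio * size tall (rough approximations, not real font metrics).
    Returns (size, lines), or None if nothing fits down to min_size.
    """
    for size in range(start_size, min_size - 1, -1):
        chars_per_line = max(1, int(box_w // (size * char_ratio)))
        lines = textwrap.wrap(text, width=chars_per_line)
        if len(lines) * size * line_ratio <= box_h:
            return size, lines
    return None
```

A short label keeps the starting size, while a long translated sentence is wrapped and scaled down until it fits. A production system would measure real glyph widths instead of using a fixed ratio.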
If possible, could you share a YouTube link and mention the timestamp where this happens? That would really help us take a closer look at the exact case.
Ohhh, thanks for the clarification @josie_oy. I don't have a specific video in mind with that case; I was just trying to imagine possible scenarios.
But I really love the idea behind Vozo. Thanks for your time.
Vozo AI — Video localization
@atwijukire_ariho_seth Thanks for the kind words! Really appreciate the encouragement.
We’d love for you to give Vozo a try and see how it works on real videos. We’re continuing to improve the product, and feedback from creators like you is incredibly helpful.
Vozo AI — Video localization
@atwijukire_ariho_seth Thanks for the thoughtful questions! These are great points. Let me answer them one by one.
Moving text
This is indeed a challenging case. At the moment, we don’t support continuously moving text very well (for example, text that scrolls across the screen like a webpage). Entry and exit animations usually work fine, but screen recordings with page scrolling can still be difficult. It’s an area we’re actively working on improving.
Text near the boundaries
Our AI model analyzes the text it detects as a whole across multiple frames. Even if part of the text is only partially visible in a single frame, the system can reference the frames before and after to better understand it. When placing the translated text, the layout is carefully recalculated so the full text appears properly within the video frame.
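The multi-frame idea can be pictured with a small sketch: given several per-frame detections of the same text element, keep the most complete reading and the union of the boxes. This is only an illustration of the general approach. The `Detection` type and the longest-string heuristic are assumptions, not a description of how Vozo's model works.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Detection:
    frame: int
    text: str
    box: Box


def merge_detections(detections: List[Detection]) -> Tuple[str, Box]:
    """Combine detections of one text element seen across several frames.

    Heuristic (an assumption): the longest string is the most complete
    reading, and the union of all boxes covers the full text region.
    """
    if not detections:
        raise ValueError("need at least one detection")
    best = max(detections, key=lambda d: len(d.text))
    left = min(d.box[0] for d in detections)
    top = min(d.box[1] for d in detections)
    right = max(d.box[2] for d in detections)
    bottom = max(d.box[3] for d in detections)
    return best.text, (left, top, right, bottom)
```

For example, a title that is half revealed in one frame and fully visible in the next would be merged into the full string and the wider box.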
I hope this helps clarify things! Feel free to give it a try, and we’d love to hear your feedback as a presenter/YouTuber.
BIOS
Congrats on the launch! Translations seem super natural! 🎉
Vozo AI — Video localization
@mihailojovanovich Thank you Mihailo, glad you liked it!
IMAI Studio
Amazing tool, I love the concept.
Vozo AI — Video localization
@sammy_xf Thank you for your support!
Tonkotsu
Congrats on the launch! This is really cool. How do you think about real-time translation? I can see this being useful at conferences and live events.
Vozo AI — Video localization
@derekattonkotsu Thanks! Real-time translation is definitely an exciting direction, especially for conferences and live events.
Right now we focus mainly on post-production video localization, where accuracy and layout reconstruction are critical for translating subtitles, dubbing, and on-screen text together. Real-time scenarios introduce additional challenges around latency and visual reconstruction, but it’s something we’re actively exploring for the future.
Tried Vozo and was really impressed by the lip-sync accuracy—it’s a huge step up from generic tools! My main curiosity is around edge cases: How well does the model handle profile shots or moments of high emotion (like shouting or laughing) where mouth shapes are very dynamic? Curious how robust the "human-level" sync is in those tricky scenarios.
Vozo AI — Video localization
@wasil_abdal Great question. Vozo’s lip-sync actually models a fairly large region — from the face down to the neck — which helps capture a wider range of expressions and motion.
That said, very high-emotion moments (shouting, laughing, etc.) are still challenging and really push the boundary of current lip-sync tech. We’re continuing to improve those edge cases as the models evolve.
Cue
Love this. The on-screen text translation is the piece most video localization tools completely skip over. Being able to translate slides and diagrams inside the video without rebuilding the visuals is a huge time saver. Curious how it handles text that's baked into animations or motion graphics?
Vozo AI — Video localization
@dparrelli In most cases our AI model reconstructs the background behind the original text and then renders the translated text back into the scene.
So even when the text is baked into animations or graphics, the system can remove the original layer and place the translated version while keeping the visuals consistent.
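For readers curious what "reconstructing the background" means in the simplest possible terms, the sketch below fills masked text pixels by repeatedly averaging their already-known neighbors. This is a toy neighbor-averaging fill on a grayscale grid, used here as a stand-in. It is not Vozo's model, and the function name and data layout are assumptions.

```python
def fill_masked(image, mask):
    """Fill masked pixels from their known 4-neighbors, working inward.

    image: 2D list of floats (grayscale); mask: 2D list of bools where
    True marks a text pixel to remove. Returns a new image; the input
    is left unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    unknown = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    while unknown:
        updates = {}
        for y, x in unknown:
            neighbors = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            vals = [out[ny][nx] for ny, nx in neighbors
                    if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in unknown]
            if vals:
                updates[(y, x)] = sum(vals) / len(vals)
        if not updates:
            break  # every pixel is masked, nothing to borrow from
        for (y, x), value in updates.items():
            out[y][x] = value
        unknown.difference_update(updates)
    return out
```

On a flat or gently varying background this gives a plausible fill. On textured or detailed backgrounds it blurs, which is one reason complex backgrounds are harder for any approach.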
Happycapy
Can subtitle translation and visual translation be handled together?
Vozo AI — Video localization
@min_zhou Great question. This is something we’ll be supporting very soon. The goal is to let users handle subtitle translation and visual text translation together in the same workflow.
@min_zhou @josie_oy Same question here. Looking forward to the day when all Vozo capabilities are fully connected 🙌
Minara
Nice work! @lightfield Congrats on the launch!
Can Vozo keep text aligned with moving objects?
Vozo AI — Video localization
@lightfield @amberjolie Thanks for the support!
Right now, in the beta version, we mainly support entry and exit animations for on-screen text. Continuous motion (for example, text that keeps moving with an object across the frame) is still a challenging case and not something we handle very well yet.
At the moment, Visual Translate works best with videos that have simpler motion, such as slide videos and explainer videos where text appears with basic animations.
Supporting more complex motion and alignment is definitely something we’re actively working on next.
Vozo AI — Video localization
@amberjolie Great question! For now, complex motion isn’t handled very well yet. Visual Translate works best with simpler animations, and better support for complex movement is something we’re working on.
Typeless
How does Vozo handle text over complex backgrounds or gradients?
Vozo AI — Video localization
@yuki1028
Hi Yuki, first of all, I really love Typeless! Great work!
For complex backgrounds and gradients, I would say it depends. The AI needs to estimate what is behind the text, which can be hard when the background is complex, so our model performs better on simpler backgrounds. You are welcome to give it a try!
Vozo AI — Video localization
@yuki1028 Thanks for the question! Overall our model handles different backgrounds reasonably well in many cases. Feel free to give it a try and see how it works with your videos.