
Vozo AI — Video localization
Translate every layer: voice, subtitles & on-screen text
4.5•13 reviews•3.1K followers
Vozo AI delivers complete video translation — across voice, subtitles, lip-sync, and on-screen text.
Unlike traditional dubbing tools, Vozo translates every layer while keeping speech natural, lips perfectly synced, and visuals consistent. Turn one video into multilingual versions that look and feel native.
This is the 3rd launch from Vozo AI — Video localization.
Visual Translate by Vozo
Launched this week
Fully translated videos — finally.
Visual Translate adds the final layer — translating text inside videos — on top of voice dubbing, lip-sync, and subtitles. It detects and translates on-screen text, from slides and diagrams to callouts and labels, while preserving the original layout, style, and animation. Turn slide videos and explainers into multilingual versions and reach a global audience — without recreating visuals from scratch.











Vozo AI — Video localization
A small backstory on how Visual Translate started.
The idea goes back to October 2025. Around that time we noticed that many great educational videos weren’t being translated well. A big reason was that a lot of key information wasn’t only in the narration, but in text inside the visuals — slides, diagrams, labels, and callouts.
When we looked at existing video translation tools, we realized that this layer was still largely unsolved.
So we decided to try building it.
Huge credit to our engineer Naro. She started experimenting with the idea back in October and built the very first prototype and pipeline herself. The demo she showed the team was still rough, but the results were already surprisingly impressive.
Naro is honestly one of those engineers who are both brilliant and delightful to work with — sharp, curious, and incredibly creative when exploring new ideas. That early experiment she built convinced us this was worth turning into a real product, and the rest of the team quickly rallied around the idea.
Vozo AI — Video localization
The project was officially green-lit in December, when we started the actual product design. Our founder CY, Tech Lead Fei, and I worked closely together to define the product direction and what Visual Translate should really be.
I also want to give a big thanks to our former engineer Yetong, who co-created the editor UX with me. Building the editing experience was one of the most challenging parts. There really wasn’t anything comparable on the market to reference.
Designing something this new has been incredibly exciting for me. I’m genuinely proud that we were able to create an editor that makes this workflow possible.
Vozo AI — Video localization
After the project was officially started, a huge part of the work moved into the modeling and engineering side.
I want to give special thanks to our algorithm engineers Xin Jin, Pengfei, and Boya. They built and shipped the models and pipelines behind this feature.
Handling text inside videos is actually a very challenging problem. The text can appear in many different forms — different languages, fonts, layouts, styles, and even animations. Every video can present a new combination of cases.
Thanks to their work, we’ve already gone through three iterations of the system, and each version has become more stable and more accurate.
They are truly brilliant algorithm engineers, and their work laid a strong foundation for this entire project.
Vozo AI — Video localization
Another person we’re deeply grateful to is our Tech Lead Fei.
He designed the core text layout algorithm behind Visual Translate. When translating from one language to another, the length of the text often changes significantly. Making sure the translated text still fits naturally in the video while keeping the visuals clean is not trivial.
It becomes even more complex when there are multiple text elements on the same frame — their spatial relationships and layout all need to be preserved.
Fei’s algorithm makes this possible. It ensures the translated text can be reflowed and presented properly inside the original visuals. His work played a key role in making the whole system usable in real-world videos.
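To illustrate why several text elements on one frame are harder than one, here is a minimal sketch of a single possible constraint: shrinking every element by the same factor so a title stays proportionally larger than a label. This is not Fei's algorithm, which is not described in this thread. The `TextElement` type, the one-line-per-element simplification, and the 0.6 glyph-width ratio are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: average glyph width as a fraction of font size.
CHAR_WIDTH_RATIO = 0.6


@dataclass
class TextElement:
    """Hypothetical description of one piece of on-screen text."""
    translated: str
    box_width: float  # width of the original text region, in pixels
    font_size: float  # font size of the original text, in pixels


def shrink_factor(el: TextElement) -> float:
    """Scale (at most 1.0) needed for the translation to fit on one line."""
    needed = len(el.translated) * el.font_size * CHAR_WIDTH_RATIO
    if needed <= 0:
        return 1.0
    return min(1.0, el.box_width / needed)


def uniform_font_sizes(elements: list[TextElement]) -> list[float]:
    """Apply the tightest shrink factor to every element on the frame,
    so the relative size hierarchy between elements is unchanged."""
    if not elements:
        return []
    factor = min(shrink_factor(el) for el in elements)
    return [el.font_size * factor for el in elements]


title = TextElement("Umsatzentwicklung", box_width=204, font_size=40)
label = TextElement("Q1", box_width=100, font_size=20)
sizes = uniform_font_sizes([title, label])
```

In this example only the title needs to shrink, but the label shrinks with it, so the title remains twice the size of the label. A real system would also have to handle wrapping, alignment, and animation, which this sketch ignores.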
Vozo AI — Video localization
I also want to thank the engineers who made the product actually come to life.
Our frontend engineers Junjun and Kiya built the entire frontend workflow and the Visual Translate editor. As mentioned earlier, this is a completely new type of product — there wasn’t really anything similar on the market to reference. Many of the interactions and editing behaviors had to be designed and implemented from scratch.
Their work turned these ideas into a real, elegant editing experience. The editor you see today — smooth, intuitive, and flexible — would not exist without their craftsmanship and attention to detail.
I also want to thank our backend engineer Lucas. He built the backend architecture for this feature almost entirely by himself. At the same time, he was also responsible for several other feature developments across Vozo, which made the work especially demanding.
Despite that, he managed to design and implement a solid backend foundation for Visual Translate. We’re very grateful for his dedication and the reliability of the work he delivered.
Finally, I want to give another special thanks to Naro, who served as the engineering lead for this entire project.
Her role went far beyond frontend development. She was deeply involved in shaping the overall technical approach — from algorithms and pipelines to the product workflow. She also coordinated engineering progress across the team and kept the project moving forward.
Thanks to her leadership and persistence, the whole project was delivered on time and at a very high quality. We’re incredibly grateful for everything she contributed to making Visual Translate possible.
Vozo AI — Video localization
We’d also like to thank our QA engineers Menghan and Meixia, who were responsible for testing this entire feature.
As a brand new product, there were many aspects that needed to be tested — from product logic and interaction flows to algorithm performance and the quality of the generated results.
The first version of this feature was especially challenging. The system went through multiple rounds of model and algorithm updates, and the behavior could change significantly between iterations. That made the testing process far more demanding.
Thanks to their persistence and attention to detail, we were able to keep improving the system while maintaining stability. Their work played a crucial role in making sure Visual Translate could launch on time with reliable quality.
Vozo AI — Video localization
I’d also like to thank our website designer Ushuan. She led the upgrade of our entire website’s visual design and handled nearly all of the design, build, and implementation herself.
For example, the landing page for Visual Translate was designed and built by her. She turned a complex product concept into a clear and elegant presentation that helps people quickly understand what this feature can do.
If you’re curious to see it, take a look here:
https://www.vozo.ai/visual-translate
The onboarding experience is seamless, the UI is exceptionally well-designed, and the final results are impressive. Excellent work on this, though the queue time seems long. I guess it has to do with the launch-day traffic.
Vozo AI — Video localization
@x_ronxo Thanks a lot for the kind words!
There actually isn’t a queue on our side. The waiting time mainly comes from processing the video visuals themselves, so depending on the video length and complexity it may take a bit of time to finish.
Is visual translation a separate module or part of the main workflow?
Vozo AI — Video localization
@sylvia_weng99 Great thoughts! Currently, it’s a dedicated workflow, but we’re planning to merge all video translation capabilities — subtitles, dubbing with lip-sync, and visual translation — into a single, unified experience.
Minara
When space is limited, how does Vozo handle it? Does it prioritize readability or literal accuracy?
Vozo AI — Video localization
@tabmanj Thanks for asking! We handle this in a few different ways:
• Adjusting the font size
• Breaking the text into multiple lines
• Shortening the translation when necessary
A well-tuned AI system dynamically selects the best option based on the context and layout of the video.
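The three options above can be sketched as a simple fallback cascade. This is only an illustration under stated assumptions, not Vozo's model: the function names, the 0.6 glyph-width ratio, the 1.2 line-height ratio, and the idea of passing in a pre-made shorter translation are all hypothetical. The sketch wraps the text into lines, shrinks the font step by step, and only then falls back to the shorter translation.

```python
import textwrap
from dataclasses import dataclass

# Assumptions: average glyph width and line height as fractions of font size.
CHAR_WIDTH_RATIO = 0.6
LINE_HEIGHT_RATIO = 1.2


@dataclass
class Fit:
    lines: list[str]
    font_size: float
    used_short_form: bool


def wrap_to_box(text, box_w, box_h, font_size):
    """Return wrapped lines if the text fits the box at font_size, else None."""
    chars_per_line = int(box_w // (font_size * CHAR_WIDTH_RATIO))
    if chars_per_line < 1:
        return None
    lines = textwrap.wrap(text, width=chars_per_line, break_long_words=False)
    # A single word longer than the line cannot be wrapped without breaking it.
    if any(len(line) > chars_per_line for line in lines):
        return None
    if len(lines) * font_size * LINE_HEIGHT_RATIO > box_h:
        return None
    return lines


def fit_text(full, short, box_w, box_h, font_size, min_font_size=10.0, step=1.0):
    """Try line breaks, then smaller fonts, then the shorter translation."""
    for used_short, candidate in ((False, full), (True, short)):
        size = font_size
        while size >= min_font_size:
            lines = wrap_to_box(candidate, box_w, box_h, size)
            if lines is not None:
                return Fit(lines, size, used_short)
            size -= step
    return None  # nothing fits; a real system would need another strategy


result = fit_text("Quarterly revenue growth", "Revenue growth",
                  box_w=120, box_h=40, font_size=20)
```

Here the full translation fits once the font drops from 20 to 14 and the text breaks into two lines, so the shorter form is never used. A fixed order like this is the simplest possible policy; the reply above describes a system that chooses among the options based on context rather than always trying them in sequence.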
Vozo AI — Video localization
@tabmanj Great question! Our model considers multiple factors — layout, readability, and context — to choose the best possible way to fit the translated text into the available space.
I'm a handmade craft creator and I run my own shop on Etsy. I've already tried this product and I'm honestly amazed!
I have some videos of myself sculpting clay. Before I found this visual translator, I had to manually translate them from Chinese into English, which took a lot of work: I needed to prepare the translated text myself and then produce a separate English version of each video.
Now I can just upload my Chinese video, and the Chinese text displayed in the video is automatically translated and replaced with English. It's incredibly fast and saves me so much time! I'm sure I'll keep using this product!!
Vozo AI — Video localization
@ushuanc Thank you so much for sharing this. It’s really great to hear how you’re using it for your clay sculpting videos.
Helping creators translate videos without recreating everything from scratch is exactly what we hoped to make easier. Really glad it’s saving you time.
If you ever have ideas or feedback while using it, we’d love to hear them!
Vozo AI — Video localization
@nah_na Yes, Vozo supports a glossary.
Your glossary acts as a reusable asset and can be applied across our different translation tools, including Visual Translate, Translate & Dub, and Translate Subtitles.
This helps ensure that key terms, brand names, and preferred translations stay consistent across all your videos, no matter which workflow you use.
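As a rough illustration of what glossary consistency means in practice, the sketch below checks whether a translation uses the preferred rendering of each glossary term found in the source text. It is a hypothetical check written for this thread, not Vozo's glossary implementation; the function name, the dictionary format, and the sample entries are assumptions.

```python
import re


def glossary_violations(source, translation, glossary):
    """List glossary entries whose source term appears in `source`
    but whose preferred translation is missing from `translation`."""
    violations = []
    for term, preferred in glossary.items():
        # Whole-word, case-insensitive match of the source term.
        pattern = r"\b" + re.escape(term) + r"\b"
        in_source = re.search(pattern, source, flags=re.IGNORECASE)
        if in_source and preferred.lower() not in translation.lower():
            violations.append((term, preferred))
    return violations


# Hypothetical glossary: brand name kept as-is, one preferred Spanish term.
glossary = {
    "Visual Translate": "Visual Translate",
    "lip-sync": "sincronización labial",
}
source = "Visual Translate works with lip-sync."
good = "Visual Translate funciona con sincronización labial."
bad = "Traducción Visual funciona con sincronización de labios."

ok_result = glossary_violations(source, good, glossary)
bad_result = glossary_violations(source, bad, glossary)
```

The first translation passes, while the second is flagged for both entries because it translates the brand name and uses a different phrase for lip-sync. Because the glossary is plain data, the same check could run against subtitles, dubbing scripts, or on-screen text alike.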
Vozo AI — Video localization
@nah_na Hope it’s useful for your team!
Vozo AI — Video localization
@nah_na Yes, Vozo supports a glossary to keep brand terms and key phrases consistent across translations. And we’re continuing to improve the glossary feature as we learn from real user workflows.
AdFox (formerly GoodsFox)
In what scenarios does visual translation make the biggest difference compared to subtitles?
Vozo AI — Video localization
@janicelewis00 Thanks for asking! Visual translation is especially useful when important information appears in the video itself rather than being spoken.
For example, in slide-based training videos or product demos with detailed specifications, the visuals often convey much more information than the audio or subtitles.
Vozo AI — Video localization
@janicelewis00 Great question! Visual translation makes the biggest difference in videos like product demos, technical talks, or business presentations with lots of slides and on-screen text.
Subtitles preserve the spoken information, while Visual Translate preserves the information shown on screen — so viewers don’t miss either layer.