Half your video isn’t being translated
We just launched on-screen text translation — not just subtitles.
As a quick example, we translated Jensen Huang’s GDC talk into Chinese:
audio → Chinese voice, tone preserved
on-screen content → localized, style preserved
👉 audio + visuals, fully aligned
This is one of the core use cases behind our recent launch (#2 Product of the Month, right behind Google’s Stitch 2).
More examples coming — curious what you think.

Replies
This feels way more complete than subtitle-only localization. How big a difference has on-screen text translation made in output quality?
Vozo AI — Video localization
@oliver_nathan2 Bigger than expected.
On-screen text often carries info that never shows up in the audio—and when it’s not translated, it breaks immersion instantly.
Once that layer is localized, it finally feels complete.
Audio translation gets most of the attention, but visual text is usually what breaks immersion. Was that the main gap you wanted to solve?
Vozo AI — Video localization
@paige_lauren1 Yeah—that was a big part of it. Audio gets most of the attention, but on-screen text often carries key info that never makes it into the voice.
"Audio + visuals fully aligned" is the part that stands out most here. What was the hardest part technically?
Vozo AI — Video localization
@jade_melissa1 Solving that meant treating video as a multimodal problem: not just voice, but visuals + layout together.
This seems like one of those features that feels obvious once it exists. What made you prioritize it now?
Vozo AI — Video localization
@ian_maxwell2 It felt obvious in hindsight, but much harder to actually solve.
We finally worked out an AI approach that can handle both audio and on-screen elements together—but getting there took longer than expected. The edge cases around layout + visuals are no joke.
Felt like the right time once the tech actually caught up.
Vozo AI — Video localization
@chrismessina thanks again for the push to share more examples here — and again for hunting us 🙏
This Jensen video is one of the first concrete use cases from the launch. We’ll share more in the coming weeks.
Curious how you handle layouts. Does preserving style across different languages create a lot of edge cases?
Vozo AI — Video localization
@miles_anthony2 Yeah, as you'd expect, keeping style consistent across languages is hard.
This feels especially useful for global YouTube content. Are longer-form videos the strongest use case right now?
Vozo AI — Video localization
@oliver_nathan3 Yes, YouTubers are one of the strongest use cases.
Translating on-screen content probably changes the perceived quality a lot more than people expect. Have users reacted more to that than the voice layer?
Vozo AI — Video localization
@sadie_charlotte1 Great question — we’re still figuring that out. Early signals suggest people really notice the on-screen text layer more than expected, but it’s a bit hard to separate from the overall “everything feels more native” effect.
This is solving a real pain point for performance marketers running international campaigns.
The problem with "translated" video ads right now: you dub the voice, add subtitles, and call it done — but the on-screen text still says "Limited Time Offer" in English while the voiceover is in Thai. The disconnect kills trust and conversion rates in non-English markets.
Building ad-vertly.ai, we work with brands running paid social across markets and the localization gap is consistently one of the top creative complaints. A "translated" ad that's only 70% translated isn't really localized — it's just accented.
Full-layer translation (audio + on-screen + subtitles all in sync) is table stakes for any brand taking APAC or LATAM seriously. The Jensen Huang example is a perfect demo because it shows how much on-screen text carries the actual meaning of a technical talk.
Curious: are you seeing more adoption from marketing/creative teams or from content creators? I'd expect the use case to be quite different between the two.
This is a big gap most people overlook.
Subtitles alone don’t carry tone, context, or on-screen meaning — especially for content-heavy videos.
Full localization (voice + visuals + intent) feels like the real unlock for global reach.