fmerian

TwelveLabs Marengo 3.0 - The most powerful embedding model for video understanding

Marengo 3.0 is TwelveLabs' most significant model to date, delivering human-like video understanding at scale. A multimodal embedding model, Marengo fuses video, audio, and text for holistic video understanding to power precise video search and retrieval.

Emily Kurze

Hi, Emily here from @TwelveLabs!

Why we built Marengo 3.0: Modern multimodal models break down on the things that actually matter in production: long videos, fast-moving sports, mixed-modality queries, noisy real-world audio, and multilingual content. We built Marengo 3.0 to solve those exact pain points. Instead of optimizing for short clips or English-only benchmarks, we focused on understanding the world as it really is—messy, long-form, multilingual, and multimodal.

What’s new and unique: Marengo 3.0 introduces a more efficient unified embedding space that works across video, audio, text, images, and even composed queries (e.g., image + text together). That unlocks new capabilities like action-level sports retrieval, long descriptive queries, accurate speech and non-speech audio retrieval, and native multilingual search across 36 languages. And it does this while being 3–6× more storage-efficient than alternative models.

What we’re most proud of: The biggest milestone: there’s no longer a trade-off between multimodality and performance. Marengo 3.0 hits state-of-the-art results across composed retrieval, sports, OCR, long-form understanding, audio, and multilingual tasks—while staying lightweight and production-friendly. Instead of chasing synthetic benchmarks, we designed a model that excels in real-world use.

Curious to hear what the Product Hunt community thinks! What would you build with access to multimodal video understanding that actually works at production scale?

Masum Parvej

@emilykurze The storage efficiency detail caught my eye

Nika

Only people with a film history background will understand the logo :) Love the idea behind it :)

fmerian

do you refer to the TriStar Pictures logo? good ol' memories indeed!

Emily Kurze

@fmerian Nope! The first movie was of a horse running to see if all four hooves left the ground at the same time.

Nika

@fmerian I mean this logo:

As @emilykurze said, the video as we know it today comes from capturing the motion of a horse (at a certain point, a horse, when running, has all 4 legs above the ground) – that's how we identified movement on the camera/photos. https://en.wikipedia.org/wiki/The_Horse_in_Motion

Emily Kurze

@busmark_w_nika I love that you got the reference!

Milo McCloud

Congratulations guys!! Could you use TwelveLabs to review the final cut of a video edit before publishing, for example for YouTube content creation? It could be really useful for final-cut reviews and catching missed keyframes.

Emily Kurze

@milo_mccloud That's a great use case. I believe we've had customers do this, but let me check with the team and get back to you with more details.

Milo McCloud

@emilykurze Brilliant stuff, it's definitely one of the biggest time investments for creators atm

fmerian

I had a blast collaborating with @emilykurze and the @TwelveLabs team on this launch.

Read the behind-the-scenes here in the Product Forum /p/twelvelabs and go to playground.twelvelabs.io to start playing around with the product. Enjoy!

Amelia Brooks

I like the direction you’re taking with this. What kind of feedback from early users influenced this version?

Emily Kurze

@amelia_brooks3 We work really closely with those building with our models so I'd say most features are influenced by feedback from our users.

Shubham

it's insanely fast! do you think i can use this to detect if some animation is broken?

(it's hard to define what broken even means)

check out unfold to see what i mean (we just launched yesterday on PH), and all the best guys - you have #1 vibes!

Emily Kurze

@unlikefraction I always recommend testing it (you get 10 free hours in our API playground) to see how it performs for your use case. I'm not even sure I know what broken animation looks like!

Justin Jincaid

It looks amazing! Does it handle fast-moving sports like action retrieval? I am starting a new product and kinda need something like that.

Emily Kurze

@justin2025 Yes, Marengo 3.0 has enhanced sports understanding for American football, ice hockey, baseball, basketball and soccer/football. Sports lingo can be so tricky, so in this version we really focused on strengthening the model for sports action recognition.

Siful

Congratulations on the launch! Curious: when working with long videos, how fast and accurate is the search? Can it reliably find moments based on vague queries?

Emily Kurze

@getsiful One of my favorite examples of a vague query is "find scenes with unresolved romantic tension" and, surprisingly, Marengo can find them.

THAS REVIEWS

The unified embedding space for video+audio+text is huge. Most tools treat these as separate streams, but real-world content is inherently multimodal.

Question for Emily and team: What's the typical use case where Marengo 3.0 outperforms separate video/audio/text models? Sports analysis? Content moderation?

Also curious about the multilingual capabilities - how many languages are supported? This could be game-changing for global content creators. Congrats on the launch!

Emily Kurze

@thas_reviews Hi Thas, honestly, since we offer API-accessible models, our use cases can vary a lot by industry, but our two biggest sectors are media & entertainment and the public sector. Really, anywhere with lots of video content that is core to generating value. We see any-to-any semantic search, long-to-short-form content, video content personalization, and rough cut assembly.

And Marengo 3.0 now supports 36 languages.

Giovanni Parlangeli

Wow. A couple of months ago I was working with open-source VLMs to develop database intelligence and was wondering how they would work on videos... your product seems to be smashing it. Can't wait to try it on tennis match clips

Emily Kurze

@giovanni_parlangeli Thanks Giovanni, let us know how your testing goes, sports can be tricky with all the lingo. Marengo 3.0 has enhanced understanding for American football, soccer/football, baseball, basketball and ice hockey.
