Marengo 3.0 by TwelveLabs - The most powerful embedding model for video understanding
Marengo 3.0 is TwelveLabs' most significant model to date, delivering human-like video understanding at scale. A multimodal embedding model, Marengo fuses video, audio, and text for holistic video understanding to power precise video search and retrieval.


Replies
DesignRevision
TwelveLabs is impressive in pushing the limits of video AI. It seems powerful and efficient. How does it handle complex scenes to ensure accurate context understanding across different video genres?
TwelveLabs
Hey Product Hunt! 👋 This is Allie from @TwelveLabs!
Today we’re launching Marengo 3.0 (M3) — our biggest upgrade yet in multimodal AI.
If you’ve ever tried to build on top of models that say they understand video but collapse on long content, sports, or anything beyond short clips… M3 is built for you.
🚀 What’s M3?
M3 is a unified multimodal foundation model powering our Search API and Embed API.
It understands video, audio, images, and text in a single space — fast, efficient, and built for production.
🔥 Highlights
⚡ Breakaway speed on long-form video processing — practical at massive scale
💾 512-d embeddings → up to 6× more storage-efficient with top-tier accuracy
🎥 True multimodality across video, audio, image, and text
🌍 Native multilingual support (English, Korean, Japanese, and more)
🏀 Elite sports intelligence: fine-grained action recognition, player tracking, and temporal reasoning
🧠 Handles hour-long videos, long queries, and composed queries (image + text)
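The "single space" idea above is what makes cross-modal search work: a text query and a video clip map to comparable vectors, so retrieval reduces to nearest-neighbor lookup. A minimal sketch of that mechanic, using random stand-in vectors (only the 512-d size comes from the launch notes; the data and the slight query perturbation are illustrative, not real Marengo output):

```python
# Sketch: text-to-video retrieval in a shared multimodal embedding space.
# Vectors are random stand-ins; 512-d matches the Marengo 3.0 launch notes.
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Unit-normalize so dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Pretend these came from the Embed API: one 512-d vector per video clip.
clip_embeddings = normalize(rng.standard_normal((100, 512)))

# A text query lands in the same space; here we fake one that is "about"
# clip 42 by nudging that clip's vector with a little noise.
query = normalize(clip_embeddings[42] + 0.05 * rng.standard_normal(512))

# Retrieval is a single matrix-vector product over unit vectors.
scores = clip_embeddings @ query
best = int(np.argmax(scores))   # index of the best-matching clip
```

Compact embeddings matter here because the index above scales linearly with dimension: at 512-d, a million clips fit in roughly 2 GB of float32, which is where the storage-efficiency claim comes from.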
💡 What you can build
Search platforms, AI agents that watch content, sports analytics tools, compliance systems, media workflows — anything that needs real video understanding.
🛠️ Try Marengo 3.0
Available via:
TwelveLabs SaaS (Search API + Embed API)
AWS Bedrock
I’m so proud of the research-first team behind this release — and excited to see what you build with M3.
Ask me anything below 👇
Hi, can I use it for my game promo video?
Do you have plans to integrate Marengo 3.0 with professional video editing tools (e.g., Adobe Premiere Pro) so teams can pull retrieved clips directly into timelines? Also, will the model support analysis of extra-long-form footage (e.g., 12+ hour raw interviews), common in documentary and investigative work?
Great work on TwelveLabs. The notion of AI that understands video — visuals, audio, context — like a human does, but at scale, feels like the next big leap for video workflows. Curious how well it handles noisy, real-world footage.
Hey there!
This looks incredibly useful for anyone drowning in video content. The ability to actually search through video based on what's happening, not just transcripts or tags, would save our team hours every week. I can't count how many times I've had to skim through a long design tutorial or a recording just to find one specific segment I know was in there somewhere. If Pegasus is as good as it sounds, it might finally stop me from saying, "I know I saw that somewhere in the video..." Solid solution to a very real problem.
I recommend TwelveLabs—it's a powerful AI platform that truly understands video. Using advanced multimodal models like Marengo and Pegasus, the service takes searching, analyzing, and generating text from video content to a whole new level.
Congrats on the launch! The multimodal performance looks seriously impressive, especially the long-form and multilingual handling. How does it perform on noisy user-generated content in real workflows?
Congratulations on the new release! We once made a similar service: we recognized text from videos, translated it, and generated videos with the translation. This way, YouTube bloggers could automatically create videos in 70+ languages. YouTube even officially recommended this service later.
Unloop
This is great!! Congrats on the launch
Would love to test this for auto-generating summaries of short films. Does it handle narrative structure well, or is it more optimized for action/object detection?