Pegasus 1.5 by TwelveLabs - AI model for transforming video into Time-Based Metadata
Pegasus 1.5 transforms raw video into consistent, structured, timestamped data on the fly. Video becomes a queryable, computable asset shaped by your company's custom requirements. Define a schema of what matters in your domain, point it at any video up to 2 hours, and get back structured, time-based metadata in a single API call. And it's multimodal: pass in an image and find every time that reference appears in your video. Your video library, finally queryable for humans and agents.
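For a rough sense of the shape of that single API call, here is a minimal sketch in Python. The endpoint path, field names, and response structure are assumptions for illustration only, not the documented TwelveLabs API; consult the official docs for the real interface.

```python
import requests

API_KEY = "tlk_..."  # your TwelveLabs API key

# Hypothetical request shape for illustration only; the endpoint and
# field names below are assumptions, not the documented API.
payload = {
    "video_url": "https://example.com/allhands-recording.mp4",
    # A custom schema describing what matters in your domain:
    "segment_definition": "Segment every time the speaker changes.",
    "metadata_fields": ["speaker_name", "topic", "transcript"],
}

resp = requests.post(
    "https://api.twelvelabs.io/v1/analyze",  # assumed path
    headers={"x-api-key": API_KEY},
    json=payload,
    timeout=600,
)
resp.raise_for_status()

# Assumed response: a list of timestamped segments matching the schema.
for seg in resp.json().get("segments", []):
    print(f'{seg["start"]:>8.1f}s - {seg["end"]:>8.1f}s  {seg["topic"]}')
```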


Replies
TwelveLabs
Hi, Jae here. I’m the CEO of @TwelveLabs
Today, we’re launching Pegasus 1.5, the first video language model turning video into queryable data assets. What would you build if your video was as queryable as text? Try the free Playground: twelvelabs.io
Video is the most opaque data source: there's no way to know what's in a video without actually watching it. Pegasus 1.5 lets you understand your video library autonomously, on the fly, and at scale. More than that, it future-proofs your archive and lets agents actually navigate it with enriched, custom-defined metadata.
What’s New:
Time-Based Metadata: Generate custom, time-coded metadata based on your exact needs. Some examples: segment every time the speaker changes, segment every time my favorite basketball player dunks, segment every time my logo appears on screen.
On-the-Fly Processing: Start with just one video and get value immediately. If you're a creator who needs to chapterize your content for YouTube, with transcription and key events, upload the video to TwelveLabs and Pegasus 1.5 will give you exactly what you need.
Multimodal Prompting: Pass in an image and tell the model to show you every time the object in the image appears. Try it for product placement or for tracking your favorite player across a game (see the sketch after this list).
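As a hedged sketch of what multimodal prompting could look like against such an API: the reference_image field, endpoint path, and response keys below are assumptions for illustration, not the documented interface.

```python
import base64
import requests

API_KEY = "tlk_..."  # your TwelveLabs API key

# Encode a reference image (e.g. your logo) to send alongside the prompt.
with open("logo.png", "rb") as f:
    logo_b64 = base64.b64encode(f.read()).decode()

# Hypothetical payload shape; field names are assumptions.
payload = {
    "video_id": "my-game-footage",  # assumed: a previously uploaded video
    "prompt": "Segment every time the object in the reference image appears on screen.",
    "reference_image": logo_b64,    # assumed field name
}

resp = requests.post(
    "https://api.twelvelabs.io/v1/analyze",  # assumed path
    headers={"x-api-key": API_KEY},
    json=payload,
)
resp.raise_for_status()
for seg in resp.json().get("segments", []):
    print(f'reference visible {seg["start"]}s - {seg["end"]}s')
```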
We're proud to make a model that actually helps you understand your video content, the way you want. We outperform top general models on segmentation and on multimodal inputs. We support 2 hours of video, more than twice what other models handle. And we're far more cost-efficient. Check it out; we'd love your feedback!
@TwelveLabs @jaelee_ is this a purely cloud-based api or is there an on-prem/vpc option for enterprise security? for raw video data, moving 2-hour files to the cloud is always the bottleneck. love the 'on-the-fly' processing promise.
TwelveLabs
@priya_kushwaha1 Excellent question! While we follow rigorous security standards in our cloud-based API, we know that for many industries & use cases, the compute needs to go to the data and not the other way around.
We're actively building deployment options to meet customers where their data already lives - whether that's a VPC, on-prem or an air-gapped environment. Video AI shouldn't force you to move your most sensitive content to get value from it. We will have something exciting to announce in the near future. Stay tuned ;).
I had a blast collaborating with @emilykurze and the @TwelveLabs team on this launch.
Pegasus 1.5 is a significant leap in generative video AI: autonomous and reliable segmentation, long-form video support (up to 2 hours), and SOTA performance (30% better than Gemini 3 Pro, 3.1 Pro, and 3 Flash).
Try the free Playground at twelvelabs.io - looking forward to seeing what you build!
Serand
Impressive to see long-form support up to 2 hours. That’s where most current tools struggle, so this feels like a meaningful improvement.
TwelveLabs
@anthony_adams_ Yes! And we hope to keep expanding on this!
I'm curious, how does the technology itself work? Do you cut the frames into images and analyze each one individually?
TwelveLabs
@natalia_iankovych I can't say too much, but treating videos as a sequence of images has limitations, such as losing temporal information, so we treat the video input holistically instead!
The time-based metadata angle is really clever. Most video AI just gives you a single summary, but being able to query specific timestamps changes how you can build products on top of it. I've been working with AI coding tools, and one thing I keep wishing for is better video tutorial indexing — like "show me the part where they set up the database" instead of scrubbing through a 45 min tutorial. Does Pegasus 1.5 handle that kind of instructional content well?
TwelveLabs
@ethanfrostlove Yes! While Pegasus 1.5 wasn't specifically trained on instructional videos, it's a video language model and should perform well on topic changes. Try a segment definition like: "Segment only on sections where the database is being set up." You could include metadata fields like a transcript of the spoken content or an instruction_type, which could capture whether the tutorial is showing a demo or a visual aid.
Try it for yourself on the playground at twelvelabs.io, and let me know what you end up trying!
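For the tutorial use case above, a request payload along these lines would express that segment definition. It reuses the assumed /analyze endpoint and field names from the earlier sketch, purely for illustration, not the documented TwelveLabs API.

```python
import requests

# Illustrative only: endpoint and field names are assumptions carried
# over from the earlier sketch, not the documented API.
payload = {
    "video_url": "https://example.com/45min-db-tutorial.mp4",
    "segment_definition": (
        "Segment only on sections where the database is being set up."
    ),
    # e.g. distinguish a live demo from a slide or visual aid
    "metadata_fields": ["transcript", "instruction_type"],
}

resp = requests.post(
    "https://api.twelvelabs.io/v1/analyze",  # assumed path
    headers={"x-api-key": "tlk_..."},
    json=payload,
)
resp.raise_for_status()
for seg in resp.json().get("segments", []):
    print(f'{seg["start"]}s - {seg["end"]}s  {seg.get("instruction_type")}')
```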
does this handle multi-speaker video well, like panel discussions where different people are talking over each other? been trying to build a search layer over recorded meetings and the timestamp accuracy is always the bottleneck.
TwelveLabs
@lumm Yes it does! Multi-speaker videos were one of the key use cases we had in mind when developing Pegasus 1.5. Please give it a try!