MiniMax M2.5 - The first open model to beat Sonnet, built for productivity
Introducing M2.5, an open-source frontier model designed for real-world productivity. SOTA performance on coding (SWE-Bench Verified 80.2%), search (BrowseComp 76.3%), agentic tool-calling (BFCL 76.8%), and office work. Optimized for efficient execution: 37% faster on complex tasks. At $1 per hour with 100 tps, infinite scaling of long-horizon agents is now economically possible.
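For the curious, the pricing claim works out to roughly $2.78 per million tokens. A back-of-the-envelope sketch, assuming the quoted 100 tps is sustained, billable output throughput (real agent runs mix input and output tokens, so treat this as a rough bound):

```python
# Rough cost math for the "$1/hour at 100 tps" claim.
tps = 100                    # claimed tokens per second
cost_per_hour = 1.00         # claimed USD per hour

tokens_per_hour = tps * 3600                              # 360,000 tokens
cost_per_million = cost_per_hour / tokens_per_hour * 1e6  # USD per 1M tokens

print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_million:.2f} per 1M tokens")
# 360,000 tokens/hour -> $2.78 per 1M tokens
```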


Replies
Big news for open models: MiniMax-M2.5 is out with SOTA performance on coding (SWE-Bench Verified 80.2%). The first open model to beat Sonnet. Only Opus from @Claude by Anthropic and @OpenAI's GPT-5.2 Codex score higher.
The paths of open and proprietary models are converging...
Pro tip: If you want to quickly experiment with it, @MiniMax-M2.5 is free for a week on @Kilo Code (until Thursday, Feb 19).
OSS ftw!
@fmerian How do you define “productivity” in the context of an AI model? How should users expect the model to change daily workflows?
80%+ on SWE-Bench Verified for an open model is wild — especially if it’s actually usable in real workflows and not just benchmark-flexing. Curious how it holds up on messy, legacy codebases vs clean benchmark repos?
Awesome!
is it available for opencode yet?
apparently! see pricing: https://opencode.ai/docs/zen/#pricing
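If you'd rather script against it directly than go through an editor integration, any OpenAI-compatible host should work. A minimal sketch (the base_url and model ID below are placeholders; check your provider's docs for the real values):

```python
# Minimal sketch: calling MiniMax-M2.5 through an OpenAI-compatible endpoint.
# base_url and model are hypothetical placeholders, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="minimax-m2.5",  # assumed model ID; check provider docs
    messages=[{"role": "user", "content": "Write a unit test for a debounce helper."}],
)
print(resp.choices[0].message.content)
```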
The claim of beating Sonnet on SWE-Bench is bold for an open model! :o How does the context window size compare to Sonnet when handling large codebases?
Impressive benchmarks, especially on SWE-Bench and tool-calling.
I’m curious though: in real-world workflows, where does M2.5 feel meaningfully different from existing frontier models?
For example, does the 37% speed gain translate into noticeably better agent reliability on longer tasks, or is it mostly execution time?
Would love to understand where it actually changes day-to-day usage.
That SWE-Bench score is wild for an open model. I've been running Sonnet for most of my coding workflows, and honestly the cost adds up fast when you're doing long agentic runs. $1/hr at 100 tps would be a game changer if the quality holds up in practice. Curious - how does it handle multi-file refactors? That's where I see most models fall apart; they lose context across files even when the benchmarks look great.