Composer 2 by Cursor - Fast, token-efficient frontier-level coding model

by Rohan Chaubey
Composer 2 by Cursor is a frontier-level coding model built for complex, long-horizon development tasks. It combines strong benchmark performance with highly efficient pricing ($0.50/M input, $2.50/M output). Powered by continued pretraining and reinforcement learning, it delivers smarter code generation with better cost-performance, plus a faster variant for real-time workflows.

Replies
Rohan Chaubey
Hunter

Composer 2 by @Cursor is a frontier-level coding model designed to solve complex, long-horizon programming tasks with high efficiency and strong benchmark performance.

It tackles the problem of limited coding accuracy and high costs in AI dev tools by combining improved intelligence with optimized pricing.

What makes it different is its continued pretraining + reinforcement learning on multi-step coding tasks, enabling it to handle hundreds of actions with better results across benchmarks like Terminal-Bench and SWE-bench Multilingual.

Key highlights:

  • Strong coding performance (61.7 on Terminal-Bench 2.0)

  • More cost-efficient ($0.50/M input, $2.50/M output)

  • Fast variant with same intelligence but quicker responses

  • Built for real-world, long-horizon dev workflows

Great for developers, teams, and builders working on complex codebases, automation, and AI-assisted programming. If you're building with AI, this is worth checking out!
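For a rough sense of what those listed rates mean in practice, here is a minimal cost-estimate sketch. The per-token prices come from the post; the workload token counts are made-up assumptions for illustration only, not measurements of any real session:

```python
# Rough cost estimate at Composer 2's listed rates
# ($0.50 per 1M input tokens, $2.50 per 1M output tokens).
# Token counts below are hypothetical, for illustration only.

INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens (from the post)
OUTPUT_PRICE_PER_M = 2.50  # USD per 1M output tokens (from the post)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a long refactor that reads 800k tokens and emits 200k.
cost = session_cost(800_000, 200_000)
print(f"${cost:.2f}")  # → $0.90
```

At these rates, input volume dominates only for read-heavy sessions; output tokens cost 5× more per token, so generation-heavy work shifts the bill toward the output side.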

P.S. Here's an interesting (unscientific) comparison of Composer 2 vs Opus 4.6 vs GPT 5.4. Composer 2 is 10× cheaper than Opus 4.6 and is supposed to rival it.

P.P.S. I hunt the latest and greatest launches in tech, SaaS, and AI; follow @rohanrecommends to be notified.

swati paliwal

@rohanrecommends How does Composer 2's long-horizon reasoning via RL on multi-step tasks compare to Claude 4 Opus in real-world dev workflows like refactoring large codebases; any early user benchmarks or tips for switching?

Chinedu Chidi Ikejiani

@rohanrecommends The "long-horizon" claim is the one I keep testing with every coding model. In my experience the real failure mode isn't losing context across files, it's when the model starts making local decisions that are individually correct but globally inconsistent. Does Composer 2 do anything differently there, or is it still up to the developer to catch that drift?

Chris Messina

While Windsurf confuses their pricing model, Cursor keeps trucking with their own tech. Inspiring stuff.

Rohan Chaubey

@chrismessina Thanks for sharing the forum thread, Chris. I didn't know Windsurf was in the soup.

Mike Staub

Is this a fine-tuned Kimi 2.5 model?

fmerian

@mikestaub yes, that's the base they started from. @leerob clarified on X:

Composer 2 started from [Kimi K2.5] (...) ~1/4 of the compute spent on the final model came from the base, the rest is from our training. (...) And yes, we are following the license through our inference partner terms.

Source: Twitter/X

Marc Kean Paker

@mikestaub @leerob @fmerian
I notice they mention Fireworks here; I thought they were using Together AI for Composer 2, at least that's what Together announced. Or are they using both?

fmerian

@openmarkai Kimi confirmed:

@Cursor accesses k2.5 via Fireworks' hosted RL and inference platform as part of an authorized commercial partnership.

Source: Twitter/X

Zac Zuo

@mikestaub Cursor has openly acknowledged this, so it's no secret (the Kimi OSS license just requires conditional attribution).

The real takeaway for me is how a strong base model like @Kimi K2.5 + heavy RL can push performance to this level 🤔

Himani Sah

That's fascinating. Cursor isn't just an app or AI model company; it's both. I think this dual identity is the biggest differentiator Cursor has among hundreds of coding agents and editors.

Mykola Kondratiuk

the token efficiency angle is interesting - most coding models optimize for correctness first and leave efficiency as an afterthought. curious what the tradeoffs look like in practice. do you find it handles multi-file refactors well or is that still where longer context wins?

Mihir Kanzariya

The pricing is what gets me. $0.50/M input is wild for a model that's beating Opus 4.6 on coding benchmarks. Been burning through tokens on long refactors and this could cut my bill in half.

Curious how it handles multi-file edits across a full monorepo though. That's where I've seen most coding models start to lose context and make weird decisions. The "long-horizon" claim sounds promising but I'll believe it when I see it on a real 50-file refactor.

Matteo Avalle

Just gave it a spin. Loving the speed and the cost efficiency, even if it still needs a lot of hand-holding. It's great for planning, though, and for carrying out simple tasks :) It will totally become my daily driver.

M. Aziz Ulak

My early tests of Composer 2 look very promising. It feels like using Claude 4.6 Opus, but faster and more cost-efficient. I was considering switching to Zed or Windsurf before this update, but this release has kept me on Cursor (for now). That said, Cursor is still a heavy RAM consumer in my workflows, and I'd prefer a more memory-efficient IDE that offers the same level of capability.

Jay

How do benchmark gains translate to messy, real codebases with legacy patterns, unclear requirements, or incomplete context?

Aleksandr Kichev

My Cursor is incredibly fast now on auto model
