Composer 2 by Cursor - Fast, token-efficient frontier-level coding model
Composer 2 by Cursor is a frontier-level coding model built for complex, long-horizon development tasks. It combines strong benchmark performance with highly efficient pricing ($0.50/M input, $2.50/M output). Powered by continued pretraining and reinforcement learning, it delivers smarter code generation with better cost-performance, plus a faster variant for real-time workflows.


Replies
Composer 2 by @Cursor is a frontier-level coding model designed to solve complex, long-horizon programming tasks with high efficiency and strong benchmark performance.
It tackles the problem of limited coding accuracy and high costs in AI dev tools by combining improved intelligence with optimized pricing.
What makes it different is its continued pretraining + reinforcement learning on multi-step coding tasks, enabling it to handle hundreds of actions with better results across benchmarks like Terminal-Bench and SWE-bench Multilingual.
Key highlights:
Strong coding performance (61.7 on Terminal-Bench 2.0)
More cost-efficient ($0.50/M input, $2.50/M output)
Fast variant with same intelligence but quicker responses
Built for real-world, long-horizon dev workflows
Great for developers, teams, and builders working on complex codebases, automation, and AI-assisted programming. If you're building with AI, this is worth checking out!
P.S. Here's an interesting comparison between Composer 2 vs Opus 4.6 vs GPT 5.4 (unscientific). Composer 2 is 10× cheaper than Opus 4.6 and is supposed to rival it.
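To make the "10× cheaper" claim concrete, here's a quick sketch of what a session costs at Composer 2's listed prices ($0.50/M input, $2.50/M output). The token counts in the example are hypothetical; only the prices come from the launch post.

```python
# Composer 2's listed prices (USD per 1M tokens) from the launch post.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at Composer 2's listed prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical heavy refactoring day: 20M input tokens, 2M output tokens.
print(f"${session_cost(20_000_000, 2_000_000):.2f}")  # → $15.00
```

At 10× these rates, the same hypothetical workload would run $150, which is why the cost-performance angle keeps coming up in the replies below.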
P.P.S. I hunt the latest and greatest launches in tech, SaaS and AI, follow to be notified → @rohanrecommends
@rohanrecommends How does Composer 2's long-horizon reasoning via RL on multi-step tasks compare to Claude 4 Opus in real-world dev workflows like refactoring large codebases; any early user benchmarks or tips for switching?
@rohanrecommends The "long-horizon" claim is the one I keep testing with every coding model. In my experience the real failure mode isn't losing context across files, it's when the model starts making local decisions that are individually correct but globally inconsistent. Does Composer 2 do anything differently there, or is it still up to the developer to catch that drift?
Raycast
While Windsurf confuses their pricing model, Cursor keeps trucking with their own tech. Inspiring stuff.
@chrismessina Thanks for sharing the forum thread, Chris. I didn't know Windsurf was in the soup.
Is this a fine-tuned Kimi 2.5 model?
@mikestaub yes, that's the base they started from. @leerob clarified on X:
Source: Twitter/X
@mikestaub @leerob @fmerian
I notice they mention Fireworks here. I thought they were using Together AI for Composer 2, at least that's what Together announced. Or are they using both?
@openmarkai Kimi confirmed:
Source: Twitter/X
Flowtica Scribe
@mikestaub Cursor has openly acknowledged this, so it's no secret (the Kimi OSS license just requires conditional attribution).
The real takeaway for me is how a strong base model like Kimi K2.5 plus heavy RL can push performance to this level 🤔
That's fascinating. Cursor isn't just an app or AI model company; it's both. I think this dual identity is the biggest differentiator Cursor has among hundreds of coding agents and editors.
the token efficiency angle is interesting - most coding models optimize for correctness first and leave efficiency as an afterthought. curious what the tradeoffs look like in practice. do you find it handles multi-file refactors well or is that still where longer context wins?
The pricing is what gets me. $0.50/M input is wild for a model that's beating Opus 4.6 on coding benchmarks. Been burning through tokens on long refactors and this could cut my bill in half.
Curious how it handles multi-file edits across a full monorepo though. That's where I've seen most coding models start to lose context and make weird decisions. The "long-horizon" claim sounds promising but I'll believe it when I see it on a real 50-file refactor.
Bench for Claude Code
Just gave it a spin. Loving the speed and the cost efficiency, even if it still needs a lot of hand-holding. It's great for planning and carrying out simple tasks, though :) It will totally become my daily driver.
Ollang DX
My early tests of Composer 2 look very promising. It feels like using Claude 4.6 Opus, but faster and more cost-efficient. I was considering switching to Zed or Windsurf before this update, but this release has kept me on Cursor (for now). That said, Cursor is still a heavy RAM consumer in my workflows, and I'd prefer a more memory-efficient IDE that offers the same level of capability.
How do benchmark gains translate to messy, real codebases with legacy patterns, unclear requirements, or incomplete context?
My Cursor is incredibly fast now on the auto model.