Composer 2 by Cursor - Fast, token-efficient frontier-level coding model
Composer 2 by Cursor is a frontier-level coding model built for complex, long-horizon development tasks. It combines strong benchmark performance with highly efficient pricing ($0.50/M input, $2.50/M output). Powered by continued pretraining and reinforcement learning, it delivers smarter code generation with better cost-performance, plus a faster variant for real-time workflows.


Replies
Composer 2 by @Cursor is a frontier-level coding model designed to solve complex, long-horizon programming tasks with high efficiency and strong benchmark performance.
It tackles the problem of limited coding accuracy and high costs in AI dev tools by combining improved intelligence with optimized pricing.
What makes it different is its continued pretraining + reinforcement learning on multi-step coding tasks, enabling it to handle hundreds of actions with better results across benchmarks like Terminal-Bench and SWE-bench Multilingual.
Key highlights:
Strong coding performance (61.7 on Terminal-Bench 2.0)
More cost-efficient ($0.50/M input, $2.50/M output)
Fast variant with same intelligence but quicker responses
Built for real-world, long-horizon dev workflows
Great for developers, teams, and builders working on complex codebases, automation, and AI-assisted programming. If you're building with AI, this is worth checking out!
P.S. Here's an interesting comparison between Composer 2 vs Opus 4.6 vs GPT 5.4 (unscientific). Composer 2 is 10× cheaper than Opus 4.6 and is supposed to rival it.
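To make the "10× cheaper" claim concrete, here's a quick sketch of what a session costs at Composer 2's listed prices ($0.50/M input, $2.50/M output). The token counts in the example are hypothetical; only the prices come from the launch post.

```python
# Composer 2's listed prices (USD per 1M tokens) from the launch post.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one session at Composer 2's listed prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical heavy refactoring day: 20M input tokens, 2M output tokens.
print(f"${session_cost(20_000_000, 2_000_000):.2f}")  # → $15.00
```

At 10× these rates, the same hypothetical workload would run $150, which is why the cost-performance angle keeps coming up in the replies below.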
P.P.S. I hunt the latest and greatest launches in tech, SaaS and AI, follow to be notified → @rohanrecommends
@rohanrecommends How does Composer 2's long-horizon reasoning via RL on multi-step tasks compare to Claude 4 Opus in real-world dev workflows like refactoring large codebases; any early user benchmarks or tips for switching?
@rohanrecommends The "long-horizon" claim is the one I keep testing with every coding model. In my experience the real failure mode isn't losing context across files, it's when the model starts making local decisions that are individually correct but globally inconsistent. Does Composer 2 do anything differently there, or is it still up to the developer to catch that drift?
Raycast
While Windsurf confuses their pricing model, Cursor keeps trucking with their own tech. Inspiring stuff.
@chrismessina Thanks for sharing the forum thread, Chris. I didn't know Windsurf was in the soup.
Is this a fine-tuned Kimi 2.5 model?
@mikestaub yes, that's the base they started from. @leerob clarified on X:
Source: Twitter/X
@mikestaub @leerob @fmerian
I notice they mention Fireworks here. I thought they were using Together AI for Composer 2, at least that's what Together announced. Or are they using both?
@openmarkai Kimi confirmed:
Source: Twitter/X
Flowtica Scribe
@mikestaub Cursor has openly acknowledged this, so it's no secret (the Kimi OSS license just requires conditional attribution).
The real takeaway for me is how a strong base model like Kimi K2.5 plus heavy RL can push performance to this level 🤔
That's fascinating. Cursor isn't just an app or AI model company; it's both. I think this dual identity is the biggest differentiator Cursor has among hundreds of coding agents and editors.
the token efficiency angle is interesting - most coding models optimize for correctness first and leave efficiency as an afterthought. curious what the tradeoffs look like in practice. do you find it handles multi-file refactors well or is that still where longer context wins?
The pricing is what gets me. $0.50/M input is wild for a model that's beating Opus 4.6 on coding benchmarks. Been burning through tokens on long refactors and this could cut my bill in half.
Curious how it handles multi-file edits across a full monorepo though. That's where I've seen most coding models start to lose context and make weird decisions. The "long-horizon" claim sounds promising but I'll believe it when I see it on a real 50-file refactor.
Bench for Claude Code
Just gave it a spin. Loving the speed and the cost efficiency, even if it still needs a lot of hand-holding. It's great for planning and carrying out simple tasks, though :) It will totally become my daily driver.
Ollang DX
My early tests of Composer 2 look very promising. It feels like using Claude 4.6 Opus, but faster and more cost-efficient. I was considering switching to Zed or Windsurf before this update, but this release has kept me on Cursor (for now). That said, Cursor is still a heavy RAM consumer in my workflows, and I'd prefer a more memory-efficient IDE that offers the same level of capability.
How do benchmark gains translate to messy, real codebases with legacy patterns, unclear requirements, or incomplete context?
My Cursor is incredibly fast now on the auto model.