New AI models pop up every week. Some developer tools, like @Cursor, @Zed, and @Kilo Code, let you choose between different models, while more opinionated products, like @Amp and @Tonkotsu, default to a single model.
Curious what the community recommends for coding tasks? Any preferences?
Replies
I'm using Sonnet 4.5 the vast majority of the time. It's fast, precise, and very robust!
Humans in the Loop
@Claude by Anthropic is leading the way
@fmerian For my side projects, I currently use DeepSeek for agentic work. I usually refine features in ChatGPT first, then hand them off to DeepSeek for execution. So far, the setup works really well at minimal cost: about $2–3 per day of coding, plus my ChatGPT subscription, which I'd have anyway even if I weren't coding.
Text Affirmations
D) All of the above?
Write/negotiate product brief with ChatGPT 5.2 web thinking high.
Write/negotiate architecture+plan, based on the brief, with Opus 4.5 in Cursor
Implement the doc set (brief+architecture+plan) with Codex 5.2 CLI
Debug in Cursor if necessary, escalating Gemini 3 -> Sonnet 4.5 -> Opus 4.5, in that order, when the bug is being difficult.
Grok or Gemini 3 for codebase questions (where's this thing?)
Attempting to get some of the more complex thinking from Opus 4.5 without burning tokens implementing everything with it. I sometimes get through 3-5 briefs/features in a day by running in parallel, so my token burn gets pretty steep.
I find that enough documentation helps most models get decent results, but I do feel a difference with the frontier high-thinking modes in Opus and Codex: less to clean up when finalizing the feature, and fewer bad-coder behaviors like deleting failing tests to get a passing test suite.
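The doc-set handoff described above could be organized something like this (the file names and layout below are my own illustration, not the poster's actual setup):

```text
feature-x/
├── brief.md         # product brief, negotiated with ChatGPT
├── architecture.md  # architecture decisions, negotiated with Opus in Cursor
└── plan.md          # implementation plan handed to the coding CLI
```

Keeping all three documents in one folder makes it easy to point any coding agent at the full context in a single prompt.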
CreateOS
Humans in the Loop
@navedux Devstral 2 (Small) by @Mistral AI was released under Apache 2.0, afaik
CreateOS
Humans in the Loop
@navedux Good q! Personally, I had good frontend results with both Opus 4.5 and Gemini 3. Hope it helps!
Humans in the Loop
@navedux ICYMI Moonshot just announced Kimi K2.5, "the strongest open-source model to date for coding, with particularly strong capabilities in front-end development." (source) currently free on @Kilo Code btw.
hope it helps!
Lightfern for Email
Definitely Sonnet 4.5, with occasional Opus 4.5 mixed in when it can't handle the task. It's pretty crazy how quickly it's improving too.
Still significant hallucinations, but a good AGENTS.md can dramatically reduce the ones that repeatedly pop up (e.g. assuming a certain testing framework, etc.)
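For anyone who hasn't tried this, a hypothetical AGENTS.md might pin down the assumptions the model keeps getting wrong. The framework names below are just examples, not a recommendation:

```markdown
# AGENTS.md

## Testing
- This repo uses Vitest, not Jest. Run the suite with `npm test`.
- Never delete or skip a failing test to make the suite pass; fix the code instead.

## Conventions
- TypeScript strict mode is on; do not add `any` to silence type errors.
```

Listing the exact testing framework up front is what prevents the "assuming a certain testing framework" hallucination mentioned above.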
Humans in the Loop
spot on - btw if you use @Next.js, they recently included bundled docs for agents and it significantly improves their performance results [1]
[1]: AI Agents Evaluations for Next.js
My go-to is GPT-5.2
Humans in the Loop
@dastion do you use @OpenAI's model for every type of task (plan, code, debug)?
Claude Sonnet 4.5 for me.
Consistent, predictable, and easier to work with over longer sessions.
Humans in the Loop
@Claude by Anthropic is leading the way!
BayesLab
I've used all of them except Devstral. Sonnet 4.5 is the best at staying to the point on larger projects without overdoing things or derailing.
NoteThisDown
Opus 4.5 is missing from this list, isn't it? ;)
Humans in the Loop
+1 @Claude by Anthropic family
@fmerian The problem for most of us: who has the budget to pay for most of these and compare? Yikes.
How can a single developer know what actually works across different setups and what's just marketing hype? Especially on a "hope it's free" budget...
Most of us do NOT have unlimited personal budgets, so I, for one, have relied on free tools that, in exchange, I help train with a thumbs up or a thumbs down.
My current setup is VS Code running Windsurf with the default model, which is free. I also have a local version of DeepSeek running on my Mac. Windsurf has saved me hours: it's very good at quick autocomplete without trying to write everything itself, it has helped me refine existing code to be shorter and more readable, and it's helped me find places where I'm not error checking.
DeepSeek I use mostly to explain code, but I find it subpar at generating good, usable code. That said, it took me a long time and a lot of reading to settle on Windsurf in particular; the number of choices was overwhelming.
I have started narrow, basically focusing on refactoring code to be more efficient and on autocomplete for long JSON files. However, I'm seeing that I can probably expand to generating documentation.
What are you using in your own development environment, and what do you see as the pros and cons? Cost is definitely a limiter in my case, so I settled on a free tool, which also limits the number of models I can use.
@cassi_cassi Totally relatable. I spent a long time on free tiers myself until I realized I was spending more time "making it work" than actually saving money. Here's what helped: many tools (including Cursor) offer trial periods or $5–10 in credits, so you can run your real tasks through different models over a week and see what actually speeds things up. Then the math's simple: if a model saves you 2 hours a week, it pays for itself even at $20/month.
Windsurf is a solid pick too, their request caching actually works and doesn’t eat tokens on repeated edits
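The break-even math in the reply above can be sketched quickly. The $20/month and 2 hours/week figures come from the reply; the hourly rate is my own placeholder, so plug in your own:

```python
# Back-of-envelope: does a paid model subscription pay for itself?
monthly_cost = 20.0          # $/month subscription (figure from the reply)
hours_saved_per_week = 2.0   # hours saved per week (figure from the reply)
hourly_rate = 50.0           # placeholder: what your time is worth, $/hour

# Average weeks per month = 52 weeks / 12 months
monthly_value = hours_saved_per_week * 52 / 12 * hourly_rate

print(f"value: ${monthly_value:.0f}/mo vs cost: ${monthly_cost:.0f}/mo")
print("worth it" if monthly_value > monthly_cost else "not worth it")
```

Even at a modest hourly rate, the saved time dwarfs the subscription cost, which is the point the reply is making.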
RiteKit Company Logo API
It's a credit-burner, but I find Opus 4.5 the best
Humans in the Loop
@osakasaul Maybe Sonnet 4.6 solves it - "The power of Opus 4.5 at lower cost."
RiteKit Company Logo API
@fmerian Yes. I try to be thrifty, already paying through the nose for Claude... Gotta get off Lovable next, as well.