Sonnet 4.6 - The most capable Sonnet model yet
Claude Sonnet 4.6 is a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design, and it ships with a 1M token context window in beta. Sonnet 4.6 improves on benchmarks across the board and shows a major jump in computer use skills, approaching Opus-level intelligence at a price point that makes it practical for far more tasks.
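For developers who want to try the long-context beta, here is a minimal sketch using the Anthropic Python SDK. The model id and beta flag shown are assumptions for illustration; check the current docs for the exact identifiers before relying on them.

```python
# Minimal sketch: calling Sonnet 4.6 with the long-context beta enabled.
# The model id and beta flag below are assumptions for illustration only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-6",          # assumed model id
    betas=["context-1m-2025-08-07"],    # assumed 1M-context beta flag
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize the attached codebase."},
    ],
)
print(response.content[0].text)
```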



Replies
Stellify
Claude has been a game changer when it comes to development with Stellify. A year ago I wouldn't have been able to imagine the progress that we've made.
What stands out to me is how calm and focused the experience feels. No ads, no noise, just a space to think and work through problems.
Migma AI
Let's gooooo!!! <3
Been running 3.5 Sonnet pretty heavily for code generation in my side projects and the jump to 4.6 is wild. The computer use improvements are what I'm most curious about though - I've been building agent workflows that need to interact with UIs and the reliability gap was always the blocker. Has anyone stress tested the 1M context window with actual production workloads? My experience with long contexts is they tend to degrade in the middle sections even when the benchmark numbers look good.
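For anyone who wants to probe that mid-context degradation themselves, a rough depth-sweep sketch along these lines works: plant a known fact at different depths in a long filler document and check recall. The model id is an assumption and the filler/needle strings are arbitrary placeholders.

```python
# Rough sketch of a "needle at varying depth" probe for long-context recall.
# Model id is an assumption; filler and needle text are arbitrary placeholders.
import anthropic

client = anthropic.Anthropic()

# Scale the repeat count up toward the context size you actually want to test.
FILLER = ("Lorem ipsum dolor sit amet. " * 2000).strip()
NEEDLE = "The deployment passphrase is 'cobalt-harbor-42'."

for depth in (0.1, 0.5, 0.9):  # near the start, the middle, and the end
    cut = int(len(FILLER) * depth)
    document = FILLER[:cut] + "\n" + NEEDLE + "\n" + FILLER[cut:]
    response = client.messages.create(
        model="claude-sonnet-4-6",  # assumed model id
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": document + "\n\nWhat is the deployment passphrase?",
        }],
    )
    print(f"depth {depth:.0%}: {response.content[0].text.strip()}")
```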
ResumeUp.AI
Big congrats on the launch!
Sonnet 4.6 looks like a huge step forward - love the focus on real task execution. Excited to try it out!
I've been using Claude for some projects and I'm still fairly new to the operational side of things. Do you see the Sonnet line converging toward agentic autonomy, or remaining a human-in-the-loop reasoning engine?
I have been trying Cowork for more than two weeks now, and overall it is very good and easy to use. I still need to get familiar with all the features Cowork brings. The first development work together with Claude Code went well and saves a lot of time in my daily work.
Running bill.dock.io means I live inside document parsing pipelines all day, so I'm genuinely curious how Sonnet 4.6 handles edge cases like rotated invoice scans or mixed-language documents with inconsistent formatting: the stuff that breaks most models in production. The expanded context window is interesting for our use case, but raw extraction accuracy on messy real-world docs is what actually moves the needle for SMB tooling. Are there any benchmarks specifically on structured document understanding rather than general reasoning tasks?