Gemini 3.1 Pro - A smarter model for your most complex tasks
3.1 Pro is designed for tasks where a simple answer isn’t enough. Building on the Gemini 3 series, it represents a step forward in core reasoning and sets a smarter, more capable baseline for complex problem-solving.
Replies
The AI race continues. OpenAI launched GPT-5.3-Codex 2 weeks ago. Anthropic, Sonnet 4.6 this week. And Google? They just announced @Gemini 3.1 Pro, "a smarter, more capable model for complex problem-solving."
Available in products like @Google AI Studio, @Kilo Code, and @Raycast.
Game on!
VYVE
@fmerian I'm now enjoying doing so much research work in Gemini, things I used to do with Deep Research. It's like I'm swinging between capabilities.
getviktor.com
Gemini is always good at benchmarks, but usually not great at agentic behaviour. The models have very weird behaviour, almost as if the Gemini team isn't really testing them themselves.
I can't keep using Antigravity: there's no update available, and I can't use the previous model.
vibecoder.date
Does Google read these?
I'll give it a shot in Gemini CLI and see what's up
Folderly
I like it
Hey there, congrats on this launch!!
For SaaS use cases involving long-context multimodal inputs (e.g., analyzing full user-uploaded PDFs + screenshots + code snippets to generate UI code, migration scripts, or automated test plans), what's the practical sweet spot you've seen for token efficiency and accuracy in the 200k–1M token range?
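To make the question concrete, the request shape I mean looks roughly like this with the google-genai Python SDK, pushing the large artifacts through the Files API instead of inlining them. This is a sketch, not a verified recipe: the gemini-3.1-pro model id and the file names are assumptions.

```python
# Minimal sketch of a long-context multimodal request (pip install google-genai).
# The model id "gemini-3.1-pro" and file names are assumed for illustration.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Upload the big artifacts via the Files API so the request body stays small
# and the context window goes toward the documents themselves.
pdf = client.files.upload(file="user_upload.pdf")
screenshot = client.files.upload(file="dashboard.png")

with open("legacy_component.jsx") as f:
    code_snippet = f.read()

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed model id
    contents=[
        pdf,
        screenshot,
        "Legacy component source:\n\n" + code_snippet,
        "Using the spec in the PDF and the screenshot as the target layout, "
        "produce a migration plan and the updated UI code.",
    ],
)
print(response.text)
```

The Files API route matters at this scale because inline payloads hit request-size limits long before the 1M-token window does.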
Feels like reasoning quality is becoming the real differentiator now, not just speed. Curious to see how Gemini 3.1 Pro performs on real dev workflows compared to others.
Multi-step reasoning is where I actually see model improvements matter - not on benchmarks but when you're chaining tool calls and the model needs to track state across a longer context. How does 3.1 compare to 2.0 Pro on that kind of work? I've been testing various models on agentic workflows lately and the gap between 'can reason' and 'reasons reliably without losing context' is pretty big in practice.
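For anyone testing the same thing, here's a minimal sketch of the loop I mean, using the google-genai SDK's automatic function calling so the model has to carry state (the ticket id) across chained tool calls. The model id and both tools are hypothetical stubs.

```python
# Minimal sketch of a chained tool-calling workflow with automatic function
# calling in the google-genai SDK. Model id and both tools are hypothetical.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def get_open_tickets(project: str) -> list[dict]:
    """Stub: return open tickets for a project."""
    return [{"id": 42, "title": "Login page 500s", "assignee": None}]

def assign_ticket(ticket_id: int, engineer: str) -> dict:
    """Stub: assign a ticket and return the updated record."""
    return {"id": ticket_id, "assignee": engineer, "status": "assigned"}

# A chat session keeps earlier tool results in context, so the model must
# carry the ticket id from the first call into the second, which is exactly
# the "reasons reliably without losing context" case.
chat = client.chats.create(
    model="gemini-3.1-pro",  # assumed model id
    config=types.GenerateContentConfig(tools=[get_open_tickets, assign_ticket]),
)
reply = chat.send_message(
    "Find the unassigned open tickets in project 'web' and assign each one "
    "to the on-call engineer, 'dana'."
)
print(reply.text)
```

The failure mode to watch for is the model re-fetching tickets it already has, or passing a stale id into the second tool, rather than anything a single-turn benchmark would catch.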
Nice benchmark numbers. My concern is always the gap between benchmarks and the actual developer experience. I use Claude primarily for coding because, in my experience, it follows instructions pretty closely (though there's always room for improvement). Gemini has historically been frustrating for me, inserting comments and refactoring code I didn't ask it to touch. Would love to hear from anyone who's tested 3.1 Pro on real coding workflows, not benchmarks, and whether that's actually improved.