Nemotron 3 Super - Open hybrid Mamba-Transformer MoE for agentic reasoning
Nemotron 3 Super is NVIDIA's open 120B-parameter model with 12B active parameters, a 1M-token context window, and a hybrid Mamba-Transformer MoE design. It is built for coding, long-context reasoning, and multi-agent workloads without the usual thinking tax.

Replies
Flowtica Scribe
Hi everyone!
Nemotron 3 Super really stands out because NVIDIA is framing it around two very real agent problems: the "thinking tax" of using a huge reasoning model for every step, and the "context explosion" that happens when long tool loops keep resending history and drift off goal.
This is an open 120B model with 12B active parameters, built to make those workloads more practical. It uses a hybrid Mamba-Transformer LatentMoE design, a native 1M-token context window, and multi-token prediction for faster long generations.
NVIDIA is also releasing more than just weights here. Datasets, recipes, and deployment cookbooks are part of the package too.
P.S. It is also free on @opencode Zen and @OpenRouter right now!
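Since the model is listed on @OpenRouter, it should be reachable through OpenRouter's standard OpenAI-compatible chat-completions endpoint. Here is a minimal sketch of building such a request; the model slug `nvidia/nemotron-3-super` is an assumption on my part, so check the OpenRouter catalog for the exact id before using it.

```python
import json

# Assumed model slug -- verify the exact id in the OpenRouter model catalog.
MODEL = "nvidia/nemotron-3-super"

# OpenRouter's OpenAI-compatible chat-completions endpoint.
API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for OpenRouter."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("Summarize the tradeoffs of hybrid Mamba-Transformer MoE designs.")
print(json.dumps(payload, indent=2))

# To actually send it, POST the payload to API_URL with an
# "Authorization: Bearer <OPENROUTER_API_KEY>" header, e.g. via requests.post.
```

The payload follows the standard OpenAI chat schema, so any OpenAI-compatible client library should work by pointing its base URL at `https://openrouter.ai/api/v1`.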