Olmo Hybrid is a fully open 7B model that combines transformer attention with linear RNN layers. It interleaves Gated DeltaNet and attention layers in a 3:1 ratio, and matches Olmo 3's MMLU accuracy while using 49% fewer training tokens.
AI2’s new Olmo Hybrid is the first hybrid 7B that clearly beats a pure transformer baseline (Olmo 3) in a fair fight.
It's the same size as Olmo 3 and trains at the same speed, but matches its accuracy with half the data and crushes long-context evals. The 3:1 RNN+attention mix just works.
Super clean weights on HF. And you can even run the full model 100% locally in your browser on WebGPU!
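For intuition, the 3:1 layer layout can be sketched in a few lines. This is a hypothetical illustration of the interleaving pattern described above, not Ai2's actual code; the function and layer-type names are made up for clarity.

```python
# Illustrative sketch (assumed names): lay out a stack of blocks where every
# fourth block is full attention and the other three are Gated DeltaNet
# (linear RNN) layers, i.e. a 3:1 RNN-to-attention ratio.

def layer_types(num_layers: int, rnn_per_attn: int = 3) -> list[str]:
    """Return the block type for each layer in a 3:1 hybrid stack."""
    period = rnn_per_attn + 1  # one attention layer per group of four
    return [
        "attention" if (i + 1) % period == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

print(layer_types(8))
# Three "gated_deltanet" entries, then "attention", repeating.
```

The upside of this mix is that the linear RNN layers run in constant memory per token, while the periodic attention layers retain full-context recall, which is one common rationale for why hybrids hold up on long-context evals.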