Olmo Hybrid is a fully open 7B model that combines transformer attention with linear RNN layers. It interleaves Gated DeltaNet and attention layers in a 3:1 ratio, and matches Olmo 3's MMLU accuracy while using 49% fewer training tokens.
AI2’s new Olmo Hybrid is the first hybrid 7B that clearly beats a pure transformer baseline (Olmo 3) in a fair fight.
It's the same size as Olmo 3 and trains at the same speed, but matches its accuracy with half the data and crushes long-context evals. The 3:1 RNN+attention mix just works.
Super clean weights on HF. And you can even run the full model 100% locally in your browser on WebGPU!
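For intuition, the 3:1 layer layout can be sketched in a few lines. This is a hypothetical illustration of the interleaving pattern described above, not Ai2's actual code; the function and layer-type names are made up for clarity.

```python
# Illustrative sketch (assumed names): lay out a stack of blocks where every
# fourth block is full attention and the other three are Gated DeltaNet
# (linear RNN) layers, i.e. a 3:1 RNN-to-attention ratio.

def layer_types(num_layers: int, rnn_per_attn: int = 3) -> list[str]:
    """Return the block type for each layer in a 3:1 hybrid stack."""
    period = rnn_per_attn + 1  # one attention layer per group of four
    return [
        "attention" if (i + 1) % period == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

print(layer_types(8))
# Three "gated_deltanet" entries, then "attention", repeating.
```

The upside of this mix is that the linear RNN layers run in constant memory per token, while the periodic attention layers retain full-context recall, which is one common rationale for why hybrids hold up on long-context evals.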