Mercury 2 ditches sequential decoding for parallel refinement. As the first reasoning diffusion LLM, it updates many tokens at once across a few refinement passes rather than emitting them one at a time, hitting 1,000+ tokens/sec. That delivers reasoning-grade quality inside tight latency budgets for your agentic loops.
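To make the contrast with sequential decoding concrete, here is a toy sketch of diffusion-style parallel refinement: start from a fully masked sequence and, each pass, commit every position whose prediction clears a confidence threshold, all positions at once. Everything below (`toy_predict`, the threshold, the mask token) is illustrative stand-in code, not Mercury's actual algorithm.

```python
import random

MASK = "_"

def toy_predict(seq, target):
    """Stand-in for the model: propose a token and a confidence per slot.
    (A real diffusion LLM predicts these jointly from the partial sequence.)"""
    return [(t, random.random()) for t in target]

def diffusion_decode(target, steps=4, threshold=0.5):
    seq = [MASK] * len(target)
    for _ in range(steps):
        proposals = toy_predict(seq, target)
        # Parallel refinement: commit every confident prediction in one pass,
        # instead of generating one token per forward pass left-to-right.
        seq = [tok if cur != MASK or conf >= threshold else MASK
               for cur, (tok, conf) in zip(seq, proposals)]
        if MASK not in seq:
            break
    # Final pass: fill any slots still masked after the refinement budget.
    return [t if s == MASK else s for s, t in zip(seq, target)]

print("".join(diffusion_decode(list("hello world"))))
```

The latency win comes from the loop bound: the number of forward passes is a small fixed `steps`, not the sequence length, so cost stays roughly flat as outputs get longer.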
Mercury, from Inception Labs, is the first commercial diffusion LLM. It runs up to 10x faster than autoregressive models while delivering comparable or better quality on coding tasks.