Zac Zuo

GLM-5 - Open-weights model for long-horizon agentic engineering

A 744B MoE model (40B active) built for complex systems & agentic tasks. #1 open-source on Vending Bench 2, narrowing the gap with Claude Opus 4.5. Features DeepSeek Sparse Attention and "slime" RL infra.

Zac Zuo

Hi everyone!

To put it simply: this is the model that's been appearing as "Pony Alpha" on @OpenRouter.

GLM-5 is a monster. It scales to 744B total params, with 40B active per token, and integrates @DeepSeek’s Sparse Attention (DSA) to keep costs down while handling long contexts.
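(Rough sizing, my own back-of-envelope numbers rather than anything from the launch post: serving memory tracks the 744B total, per-token compute tracks the 40B active.)

```python
# Back-of-envelope sizing for a 744B-total / 40B-active MoE.
# Assumptions (mine): FP8 weights, ~2 FLOPs per active parameter per token.
TOTAL_PARAMS = 744e9    # every expert has to live in memory
ACTIVE_PARAMS = 40e9    # params actually used per generated token
BYTES_PER_PARAM = 1     # FP8

weight_mem_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
flops_per_token = 2 * ACTIVE_PARAMS

print(f"weights: ~{weight_mem_gb:,.0f} GB  (serving cost tracks total params)")
print(f"compute: ~{flops_per_token / 1e9:,.0f} GFLOPs/token  (latency tracks active params)")
```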

But the real story is agentic capability.

On Vending Bench 2, which simulates running a business over a year, it ranks #1 among open-source models with a final balance of $4,432, comparable to Claude Opus 4.5 (in the ~$5k range).

They built a new async RL infra called "slime" to fix post-training inefficiency, and it shows.

Also, Z.ai itself has evolved: you can now toggle Agent mode (instead of just Chat) and let it actually execute tasks. Give it a spin!
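If you'd rather hit it over the API, here's a minimal sketch against OpenRouter's OpenAI-compatible endpoint. The "z-ai/glm-5" slug is my guess, so check the model page for the exact ID:

```python
# Minimal sketch: calling GLM-5 through OpenRouter's OpenAI-compatible API.
# Assumptions: OPENROUTER_API_KEY is set, and "z-ai/glm-5" is the right slug.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="z-ai/glm-5",  # assumed slug -- verify on the OpenRouter model page
    messages=[
        {"role": "user", "content": "Sketch a migration plan from Flask to FastAPI for a 50-endpoint service."}
    ],
)
print(resp.choices[0].message.content)
```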

Piroune Balachandran

@zaczuo How does Z.ai Agent mode sandbox tools and persist state across long runs? Clear permissions plus replayable traces would make GLM-5 easier to trust when it's doing real work.

Djbutterrock

744B MoE with 40B active is serious scale; impressive to see it close the gap with frontier models. Would love more transparency on real-world agent benchmarks beyond synthetic evals.

Curious Kitty

If a team already gets strong results from closed-model coding agents, what are the two or three concrete scenarios where GLM-5 wins enough to justify switching?

Zac Zuo

@curiouskitty I'd say these:

  1. If your agent loop runs for hours and you need Opus-level planning but likely can't justify the API bill, GLM-5 hits that specific "smart enough + cost-effective" sweet spot.

  2. Since it's open weights, you can deploy it on your own infra (or your preferred provider) for sensitive codebases that can't leave your VPC (quick sketch below).
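For point 2, the nice part of running it behind any OpenAI-compatible server (vLLM, SGLang, etc.) is that "switching" is mostly a base_url change. A minimal sketch, with the internal hostname and model name as placeholders for your own deployment:

```python
# Sketch: same client code as before, pointed at a GLM-5 server inside your VPC.
# "glm5.internal" and the model name below are placeholders, not real endpoints.
from openai import OpenAI

client = OpenAI(
    base_url="http://glm5.internal:8000/v1",  # your self-hosted endpoint
    api_key="unused",                         # local servers often ignore the key
)

resp = client.chat.completions.create(
    model="zai-org/GLM-5",  # whatever name your server registers the weights under
    messages=[{"role": "user", "content": "Audit this diff for secrets before review."}],
)
print(resp.choices[0].message.content)
```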