Where SLMs beat GPT-5
We’ve been seeing a consistent pattern across agent systems:
GPT-5 works well as a judge on average cases,
but breaks down on edge cases and policy boundaries.
That’s exactly where reliability matters.
In our recent work, we took a different approach (rough code sketch below):
Generate adversarial edge cases from the spec
Resolve ambiguity via multi-agent debate
Train a task-specific small model (SLM) on that data
Paper: https://huggingface.co/papers/2604.25203
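
A minimal sketch of those three steps, assuming a generic `llm()` chat wrapper. All names here (`generate_edge_cases`, `debate_label`, `build_dataset`) are illustrative, not the paper's actual interface:

```python
from dataclasses import dataclass

@dataclass
class LabeledCase:
    text: str
    label: str  # "PASS" or "FAIL" against the policy


def llm(prompt: str, model: str = "gpt-5") -> str:
    """Stand-in for whatever chat-completion client you use."""
    raise NotImplementedError


def generate_edge_cases(spec: str, n: int = 200) -> list[str]:
    # Step 1: ask a strong model for inputs that sit on the policy boundary.
    prompt = (
        f"Policy spec:\n{spec}\n\n"
        f"Write {n} inputs that are ambiguous or adversarial with respect "
        "to this policy, one per line."
    )
    return [line for line in llm(prompt).splitlines() if line.strip()]


def debate_label(case: str, spec: str, rounds: int = 2) -> str:
    # Step 2: two agents argue opposite verdicts, then a judge adjudicates.
    # The debate surfaces spec ambiguity before the case enters training data.
    transcript = ""
    for _ in range(rounds):
        transcript += "\nPRO: " + llm(
            f"Argue that this input PASSES the policy.\n{spec}\n{case}\n{transcript}"
        )
        transcript += "\nCON: " + llm(
            f"Argue that this input FAILS the policy.\n{spec}\n{case}\n{transcript}"
        )
    return llm(f"Given this debate, answer PASS or FAIL only.\n{transcript}")


def build_dataset(spec: str) -> list[LabeledCase]:
    # Step 3 input: labeled boundary cases the SLM is fine-tuned on.
    # The fine-tuning itself is standard SFT on a small model, omitted here.
    return [LabeledCase(c, debate_label(c, spec)) for c in generate_edge_cases(spec)]
```

The debate step matters because boundary cases are exactly where a single judge's label is least trustworthy.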
What we’re seeing:
SLMs outperform GPT-5 on boundary decisions
More consistent (less flip-flopping on similar inputs)
Fast enough for real-time, per-interaction evaluation
This leads to a different stack (request path sketched below):
GPT-5 → generation
SLMs → evaluation + guardrails
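
At runtime, the split looks roughly like this. `generate_with_gpt5`, `slm_judge`, and `fallback` are placeholders, not real APIs:

```python
def generate_with_gpt5(request: str) -> str: ...        # placeholder generator
def slm_judge(request: str, draft: str) -> str: ...     # placeholder SLM verdict: "PASS"/"FAIL"
def fallback(request: str) -> str: ...                  # placeholder safe response

def handle(user_request: str) -> str:
    draft = generate_with_gpt5(user_request)    # big model: generation
    verdict = slm_judge(user_request, draft)    # small model: cheap enough to run on every interaction
    return draft if verdict == "PASS" else fallback(user_request)  # block, regenerate, or escalate
```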
Curious if others are seeing similar behavior in production.
(If relevant, we also turned this into a product: https://www.producthunt.com/products/plurai)


Replies
insightful read. what struck me most: SLMs could produce 43% fewer failures, at an 8x lower cost, and in less than 100 ms.
@ilankad23 curious if you could get similar results with other models?