Regrada: CI for AI behavior
Hey Product Hunt 👋
I’m Matias, founder of Regrada.
Over the last year I kept running into the same problem while building with LLMs. Everything looked fine in CI, tests passed, deploy went through… and somehow the behavior still changed in production.
A prompt tweak, a model update, a provider change. Suddenly the responses users see are a little worse. Not broken, just different enough to cause issues.
Traditional CI was never built for systems where outputs are probabilistic.
So I started building Regrada.
Regrada captures real LLM traces from your application and lets you replay them during CI to check whether behavior has drifted beyond the threshold you set.
Instead of guessing whether a change is safe, you can actually verify it before shipping.
Right now Regrada can:
• Capture prompts, responses, latency, and token usage
• Store traces from real production traffic
• Replay them in CI against new models, prompts, or providers
• Compare outputs and fail builds when drift exceeds your threshold
• Show diffs so you can review exactly what changed
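To make the "fail builds when drift exceeds your threshold" idea concrete, here's a minimal sketch of that kind of check. The function names, trace fields, and the plain string-similarity metric are all illustrative assumptions, not Regrada's actual API — a real comparison would likely use semantic similarity rather than character diffs.

```python
# Hypothetical sketch of a CI drift check over replayed LLM traces.
# All names and fields here are illustrative, not Regrada's real API.
from difflib import SequenceMatcher

def drift(baseline: str, candidate: str) -> float:
    """Drift = 1 - string similarity between baseline and replayed response."""
    return 1.0 - SequenceMatcher(None, baseline, candidate).ratio()

def check_traces(traces: list[dict], threshold: float = 0.2) -> list[tuple[str, float]]:
    """Return the traces whose replayed response drifted past the threshold.

    A non-empty result would fail the CI build.
    """
    failures = []
    for t in traces:
        d = drift(t["baseline_response"], t["new_response"])
        if d > threshold:
            failures.append((t["id"], round(d, 3)))
    return failures

# Example: one stable trace, one that drifted after a model/prompt change.
traces = [
    {"id": "t1",
     "baseline_response": "Your order ships in 2 days.",
     "new_response": "Your order ships in 2 days."},
    {"id": "t2",
     "baseline_response": "Refunds take 5-7 business days.",
     "new_response": "We do not offer refunds."},
]
print(check_traces(traces))  # only the drifted trace is reported
```

The point of gating on a threshold rather than exact equality is exactly the probabilistic-outputs problem above: small wording variation is expected, so the build should only fail when responses change beyond the tolerance you chose.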
The goal is simple: bring release discipline to AI-powered software.
If you’re building with LLMs, I’d love to hear how you’re currently handling regressions.
Early access is open and I’m actively looking for teams to test it with real workloads.

