TareqAziz

We tried blocking bad PRs with AI before merge — here’s what actually worked (and what failed)


We kept running into the same annoying issue:

PR looks clean

Review is done

Tests pass

→ merge

→ something breaks in production anyway

After a few of these, it stopped feeling like “edge cases” and started feeling like a gap in the process.

So I started building something to sit right before merge and basically ask:

“are we really safe to ship this?”

Right now it:

- looks at PR changes (not just test results)

- tries to catch risky patterns (logic gaps, weird assumptions, missing coverage)

- and can block the merge if it feels off
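For anyone wondering how a check like this actually blocks a merge: the usual wiring is to run it as a PR status check and mark that check as required in branch protection. This is a generic sketch, not MergAI’s actual setup — the workflow name and script path are made up.

```yaml
# Hypothetical example: run a risk analyzer on every PR and expose it as a
# status check. Marking this check "required" in branch protection settings
# is what makes a failure actually block the merge button.
name: pre-merge-risk-check
on:
  pull_request:
    branches: [main]
jobs:
  risk-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Analyze PR diff for risky patterns
        run: ./scripts/risk-check.sh   # placeholder for the real analyzer
```
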

One thing I got wrong early:

I tried to automate everything → way too many false positives

What worked better:

Giving it a confidence score + letting humans override when needed
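The gate logic that cut the false positives is roughly this shape — a minimal sketch, assuming a numeric risk score and a PR label as the override channel (the threshold, function name, and `merge-anyway` label are all illustrative, not MergAI’s actual API):

```python
# Sketch of a "confidence score + human override" merge gate.
# RISK_THRESHOLD and the "merge-anyway" label are made-up names.

RISK_THRESHOLD = 0.8  # block only when the analyzer is fairly confident

def should_block_merge(risk_score: float, override_labels: set[str]) -> bool:
    """Block the merge only if the risk score clears the threshold
    and no human has applied an override label to the PR."""
    if "merge-anyway" in override_labels:
        return False  # a human saw the warning and accepted the risk
    return risk_score >= RISK_THRESHOLD

print(should_block_merge(0.9, set()))             # → True  (blocks)
print(should_block_merge(0.9, {"merge-anyway"}))  # → False (override wins)
print(should_block_merge(0.5, set()))             # → False (low confidence: comment, don't block)
```

Below-threshold findings surface as review comments instead of hard blocks, which is where most of the noise went away.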

Still figuring this out, but curious —

Have you ever shipped something that passed review + CI and still broke production?

What was the reason?

If you want to take a look, I just launched it here:

https://www.producthunt.com/products/mergai



Replies

Rian Robertson

Love this approach to catching those sneaky production bugs before they hit! As a dev, I've definitely shipped code that passed review and CI...only to break in prod due to overlooked edge cases in async flows.

Reminds me of why I built The Sponge: needed tools to master massive knowledge for Jeopardy, so created an AI-powered flashcard app with a browser extension that turns webpages into spaced repetition study material.

If you're up for it, I'm launching on PH soon...would appreciate a follow (for the launch; link is in my profile).

TareqAziz

@rianbrob Yeah async flows are exactly where things get tricky.

We saw the same pattern — everything looks fine in isolation, but once timing/order changes, things break in ways CI just doesn’t catch.

A lot of the issues we’ve been seeing are around:

- assumptions about execution order

- missing edge-case handling in async chains

- “works locally” but behaves differently under load
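A made-up minimal example of the first bullet — code that implicitly assumes `load` completes before `use`, where the completion order actually depends on timing, so it flips the moment production latency differs from local:

```python
import asyncio

# Illustrative only: the caller assumes "load" finishes before "use",
# but nothing in the code enforces that order.
results = []

async def load():
    await asyncio.sleep(0.02)  # stands in for a slow network call in prod
    results.append("loaded")

async def use():
    await asyncio.sleep(0.01)  # the faster path wins the race
    results.append("used after load" if "loaded" in results
                   else "used before load")

async def main():
    # Run both concurrently: completion order follows timing, not source order.
    await asyncio.gather(load(), use())
    return results

print(asyncio.run(main()))  # → ['used before load', 'loaded']
```

Flip the two sleep durations and the “bug” disappears — which is exactly why CI with fast, local dependencies never catches it.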

Curious — in your case, was it more about race conditions or just missed edge cases in logic?

Also The Sponge sounds interesting — turning real content into spaced repetition is a cool angle 👀