Flux - Fix production bugs by replaying them locally

by Shashi Sharma

Flux records API executions so you can replay failures locally, fix them, and resume execution safely. Instead of guessing from logs, you get the exact request, inputs, and behavior. Same request. Same IO. Same outcome.

Replies
Shashi Sharma
Hey everyone 👋 I built Flux because debugging production bugs always felt like guesswork. You look at logs, try to reproduce locally, add more logs, redeploy… and repeat.

Flux changes that. It records every request (including external calls), so you can replay the exact failure locally. Fix the bug → replay safely → then resume the same execution with real IO. No mocks. No staging. No duplicate side effects.

I'm especially curious: would you trust something like this in your debugging workflow? Happy to answer anything — especially how replay/resume works under the hood.
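To make the record/replay idea concrete, here's a minimal sketch in Python. This is not Flux's actual implementation or API — the `Recorder` class and log format are made up for illustration. The point is the shape of the idea: in record mode every external-call result is written to a log; in replay mode results come back from that log in the same order, so no real IO happens and no side effects are duplicated.

```python
import json
from pathlib import Path

class Recorder:
    """Hypothetical record/replay sketch (not Flux's real API)."""

    def __init__(self, log_path, mode="record"):
        self.path = Path(log_path)
        self.mode = mode
        # In replay mode, load the previously recorded call log.
        self.log = json.loads(self.path.read_text()) if mode == "replay" else []
        self.cursor = 0

    def call(self, name, fn, *args):
        if self.mode == "replay":
            # Replay: return the recorded result, never touch real IO.
            entry = self.log[self.cursor]
            self.cursor += 1
            assert entry["name"] == name, "replay diverged from recording"
            return entry["result"]
        # Record: perform the real call and persist its result.
        result = fn(*args)
        self.log.append({"name": name, "result": result})
        self.path.write_text(json.dumps(self.log))
        return result
```

In a live run you'd wrap external calls like `rec.call("get_user", fetch_user, user_id)`; during replay the same call site returns the recorded payload without hitting the network.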
Shashi Sharma

One thing that surprised me while building this:

The hardest part wasn’t capturing requests — it was making them replayable deterministically.

Especially when:

- external APIs change

- async workflows are involved

- retries behave differently

That’s where most debugging tools break.

Curious — for people working with APIs or AI pipelines:

What’s the hardest bug you’ve had to debug in production?

Alan

@shashisrun Had a webhook that started sending different payload shapes on weekends. The third party's A/B testing was hitting a different serializer, but only on Saturdays. Took two days of adding logs and waiting for the next Saturday to reproduce it. Staging never saw it because their test environment didn't have the same A/B config.

Being able to just replay the actual request would've cut that from days to minutes.

Shashi Sharma

@alan_silverstreams that’s such a perfect example — the “only on Saturdays” bugs are the worst 😅

A/B configs + third-party behavior is exactly where things become impossible to reproduce reliably.

And yeah — that’s the core idea. Instead of adding more logs and waiting for it to happen again, just replay the exact request with the same context.

Curious — in cases like this, do you usually end up adding more observability, or building custom replay/debug tooling internally?

swati paliwal

@shashisrun How do you deal with non-deterministic bits like timestamps or external API flakiness during replay?

Shashi Sharma

@swati_paliwal great question — this is actually the hardest part.

What I’ve been doing is separating deterministic vs non-deterministic parts of execution.

- For things like timestamps/randomness → they get recorded and replayed as-is

- For external APIs → responses are captured and stubbed during replay

- For retries/async flows → the sequence + timing is preserved from the original execution

So instead of trying to simulate behavior, you're effectively "re-running" the same execution with controlled inputs.

Still evolving this, but that's the general approach so far.
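The first bullet — pinning timestamps and randomness — can be sketched in a few lines. This is a hypothetical illustration, not Flux's code: during the live run each clock reading is appended to a tape; during replay the same values are fed back in order, so every branch that depends on time takes the identical path.

```python
import time

class DeterministicClock:
    """Hypothetical sketch: record time.time() live, replay the tape later."""

    def __init__(self, recorded=None):
        # recorded=None means live/record mode; otherwise replay mode.
        self.recorded = recorded
        self.tape = [] if recorded is None else list(recorded)

    def now(self):
        if self.recorded is None:
            t = time.time()          # live: capture the real value
            self.tape.append(t)
            return t
        return self.tape.pop(0)      # replay: reuse the recorded value
```

The same tape trick extends to random numbers and retry jitter — anything non-deterministic becomes a recorded input instead of a fresh source of divergence.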

Mykola Kondratiuk

The resume-after-fix part is the piece I haven't seen before. Most replay tools let you reproduce the bug, but you still have to re-trigger the whole flow manually. How does the resumption work in practice - does Flux hold state between the failure and the fix, or is it more like re-running from a checkpoint?