SURYANSH GUPTA

We’re experimenting with AI-assisted DevOps incident recovery — would you trust this in production?

We’re building Unideploy, a DevOps automation platform that integrates directly with Claude / ChatGPT via MCP (Model Context Protocol), with no separate UI and no new dashboards.

The idea we’re exploring now is AI-assisted incident recovery:

Instead of jumping between CloudWatch, kubectl, CI/CD logs, and Slack during an incident, you ask the AI:

“Production API latency is high. What changed and what’s the safest way to fix it?”

Behind the scenes:

  • The AI gets real metrics, logs, and recent change history

  • It does not execute anything on its own

  • It proposes safe recovery options (rollback, scale, restart, config revert)

  • Each option includes risk, blast radius, and cost impact

  • A human explicitly approves before anything runs
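
To make the approval gate concrete, here is a minimal Python sketch of the flow above. It is an illustration under assumptions, not Unideploy's actual implementation: the names `RecoveryOption`, `Proposal`, `rank_by_risk`, and `execute`, the three-level risk scale, and the hourly cost field are all hypothetical. The point it shows is that a proposed action carries its risk, blast radius, and cost, and cannot run until a named human has approved it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Risk(Enum):
    # Hypothetical three-level scale; a real system may score risk differently.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class RecoveryOption:
    """One proposed action. All fields are illustrative assumptions."""
    action: str                   # e.g. "rollback", "scale", "restart", "config_revert"
    target: str                   # hypothetical service name
    risk: Risk
    blast_radius: str             # free-text description of what could be affected
    est_cost_usd_per_hour: float  # assumed cost-impact estimate


@dataclass
class Proposal:
    """Wraps an option together with its approval state."""
    option: RecoveryOption
    approved_by: Optional[str] = None

    def approve(self, approver: str) -> None:
        # Require a non-empty name so every approval is attributable.
        if not approver.strip():
            raise ValueError("approver must be a named human")
        self.approved_by = approver


class ApprovalRequired(Exception):
    """Raised when something tries to run an unapproved proposal."""


def rank_by_risk(options: List[RecoveryOption]) -> List[RecoveryOption]:
    # Lowest risk first; ties are broken by lower estimated cost.
    return sorted(options, key=lambda o: (o.risk.value, o.est_cost_usd_per_hour))


def execute(proposal: Proposal, runner: Callable[[RecoveryOption], str]) -> str:
    # The gate: nothing reaches the runner without explicit human approval.
    if proposal.approved_by is None:
        raise ApprovalRequired(
            f"{proposal.option.action} on {proposal.option.target} is not approved"
        )
    return runner(proposal.option)
```

In this sketch the AI side would only ever build `RecoveryOption` objects and call `rank_by_risk`. The `runner` callable (whatever actually performs the rollback or scaling) is reachable only through `execute`, which raises `ApprovalRequired` until someone has called `approve` with their name.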

The goal is not “AI agents replacing DevOps”, but:
👉 Reducing decision stress during incidents
👉 Making production changes safer
👉 Capturing incident knowledge so fixes aren’t lost

Curious to hear from:

  • DevOps / SREs: what part of incident response hurts most?

  • Founders: would this increase confidence in on-call teams?

  • Skeptics: what would make you not trust this?

We’re early and still validating the idea, so honest feedback is welcome.
