We’re experimenting with AI-assisted DevOps incident recovery — would you trust this in production?
We’re building Unideploy, a DevOps automation platform that integrates directly with Claude / ChatGPT via MCP — no separate UI, no new dashboards.
The idea we’re exploring now is AI-assisted incident recovery:
Instead of jumping between CloudWatch, kubectl, CI/CD logs, and Slack during an incident, you ask the AI:
“Production API latency is high. What changed and what’s the safest way to fix it?”
Behind the scenes:
The AI gets real metrics, logs, and recent change history
It does not execute anything on its own
It proposes safe recovery options (rollback, scale, restart, config revert)
Each option includes risk, blast radius, and cost impact
A human explicitly approves before anything runs
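
To make the flow above concrete, here is a rough Python sketch of the "propose, then wait for a human" gate. It is an illustration only, not Unideploy's actual code or API: `RecoveryOption`, `ApprovalRequired`, `safest_first`, `run_option`, and the low/medium/high risk scale are all hypothetical names and assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RecoveryOption:
    """One proposed fix, carrying the metadata each option is described as having."""
    action: str        # e.g. "rollback", "scale", "restart", "config_revert"
    risk: str          # assumed scale: "low", "medium", or "high"
    blast_radius: str  # free-text description of what could be affected
    cost_impact: str   # free-text estimate of the cost change


class ApprovalRequired(Exception):
    """Raised when an option is run without explicit human sign-off."""


# Assumed ordering used to present the lowest-risk option first.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def safest_first(options: List[RecoveryOption]) -> List[RecoveryOption]:
    """Return the proposals sorted from lowest to highest risk."""
    return sorted(options, key=lambda o: RISK_ORDER[o.risk])


def run_option(
    option: RecoveryOption,
    runner: Callable[[RecoveryOption], str],
    approved_by: Optional[str] = None,
) -> str:
    """Execute an option only if a named human has approved it."""
    if not approved_by:
        raise ApprovalRequired(f"'{option.action}' needs human approval before it runs")
    return runner(option)


if __name__ == "__main__":
    proposals = [
        RecoveryOption("restart", "medium", "all API pods", "none"),
        RecoveryOption("rollback", "low", "API service only", "none"),
    ]
    best = safest_first(proposals)[0]
    # The runner is a stand-in; a real one would call the deploy tooling.
    print(run_option(best, lambda o: f"executed {o.action}", approved_by="on-call engineer"))
```

The point of the design is that the AI side only ever produces `RecoveryOption` values; the one code path that changes production refuses to run without an approver.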
The goal is not “AI agents replacing DevOps”, but:
👉 Reducing decision stress during incidents
👉 Making production changes safer
👉 Capturing incident knowledge so fixes aren’t lost
Curious to hear from:
DevOps / SREs: what part of incident response hurts most?
Founders: would this increase confidence in on-call teams?
Skeptics: what would make you not trust this?
We’re early and validating — honest feedback welcome.