Gautam Khosla

I stopped being on-call for my own software. Here's how.

Six months ago, I had a wake-up call. Literally.

2am page. App down. Spent 4 hours diagnosing an issue; my logs had the answer the entire time. That incident is why @madaj2 and I built Solen.

AI is great at reading error logs, understanding code context, and diagnosing common failure modes. Developers are doing this manually every time there's an incident. That's a workflow that should be automated.

Solen is an AI platform that builds web applications from natural language and then monitors and repairs them autonomously. The self-healing loop: detect the failure, diagnose the root cause, generate a fix, validate it, redeploy, and roll back if the fix made things worse.
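Roughly, one pass of that loop has the shape below. This is an illustrative Python sketch, not Solen's production code: `Stages`, `heal_once`, and every hook name are hypothetical placeholders, and each hook would wrap real monitoring, model, and deploy tooling.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stages:
    """Hypothetical hooks for each step of the loop (assumed names, not a real API)."""
    detect: Callable[[], Optional[str]]   # returns an error log, or None if healthy
    diagnose: Callable[[str], str]        # error log -> root cause
    generate_fix: Callable[[str], str]    # root cause -> candidate patch
    validate: Callable[[str], bool]       # does the patch pass pre-deploy checks?
    deploy: Callable[[str], None]         # ship the patch
    is_healthy: Callable[[], bool]        # post-deploy health check
    rollback: Callable[[], None]          # restore the previous release


def heal_once(s: Stages) -> str:
    """Run one pass of the loop and report the outcome as a short label."""
    error_log = s.detect()
    if error_log is None:
        return "healthy"
    cause = s.diagnose(error_log)
    patch = s.generate_fix(cause)
    if not s.validate(patch):
        return "fix_rejected"      # never deploy a patch that fails validation
    s.deploy(patch)
    if not s.is_healthy():
        s.rollback()               # the fix made things worse, so undo it
        return "rolled_back"
    return "repaired"
```

Keeping each stage behind a plain callable means the loop can be exercised with stubs before it touches a real deployment.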

We've had 9 autonomous repairs in production. 100% success rate. Average downtime: 4 minutes vs. what would have been a multi-hour manual process.

I'm sharing this here specifically because I think the PH community will have strong opinions on where this breaks down. The obvious objection is: "what if the AI makes the wrong fix?" That's exactly why we have confidence thresholds, attempt limits, and automatic rollback. But I'm sure there are failure modes I haven't thought of.
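To make those three guardrails concrete, here is a minimal Python sketch of how they could gate an automated fix. It is illustrative only: `Policy`, `next_action`, `should_roll_back`, and the default values (0.8 confidence, 3 attempts) are assumptions made up for this example, not Solen's real settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_confidence: float = 0.8   # assumed threshold, for illustration only
    max_attempts: int = 3         # assumed attempt limit, for illustration only


def next_action(confidence: float, attempts_so_far: int,
                policy: Policy = Policy()) -> str:
    """Decide whether to try an automated fix or hand off to a human."""
    if attempts_so_far >= policy.max_attempts:
        return "escalate"         # attempt limit reached
    if confidence < policy.min_confidence:
        return "escalate"         # diagnosis is not confident enough
    return "auto_fix"


def should_roll_back(error_rate_before: float, error_rate_after: float) -> bool:
    """Roll back when the deployed fix left the error rate higher than before."""
    return error_rate_after > error_rate_before
```

For example, `next_action(0.95, 0)` allows an automated attempt, while `next_action(0.5, 0)` or `next_action(0.95, 3)` hands the incident to a person.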

If you're a developer who's ever lost sleep over a production incident, I'd love your feedback, both on the concept and on where you'd expect it to fail.

solenai.ca, early access open.
