Launched this week

Relvy
Your AI On-call Engineer
111 followers
Relvy is a specialized AI agent that investigates on-call alerts autonomously. Compared to general-purpose agents (Claude Code + Datadog MCP), Relvy is more accurate and cheaper per alert, and each investigation produces a notebook with rich visualizations that engineers can review to build trust in the AI. Teams in fintech and consumer software use Relvy as the first line of defence on their on-call Slack channels. SOC 2 Type II compliant with self-host options. Setup takes under 15 minutes.






On-call was the single biggest source of engineer burnout on my team when I was CTO scaling from 15 to 120 engineers. The problem was never that people couldn't diagnose issues - it was that the investigation step ate all the cognitive energy. An engineer gets paged at 2 AM, spends 40 minutes just figuring out which service is actually broken, and by the time they identify the root cause they're too fried to think clearly about the fix. The fact that Relvy produces a notebook with the full investigation trail is the right design choice because even when the AI doesn't nail the root cause perfectly, having all the relevant logs, metrics, and traces pre-gathered in one place cuts that investigation phase from 40 minutes to 5. That alone would have saved my team hundreds of hours per quarter. How are you handling the cold start problem - when Relvy connects to a new codebase it hasn't seen before, how quickly does it get useful at correlating service dependencies?
@avrisimon Agree that the UX matters a lot here. You can only trust the AI if you can easily visualize and review the underlying data. Glad the notebook choice resonates.
Re: service dependencies, we do a few things, in order:
1. We infer service dependencies from distributed tracing data, when present.
2. You can also import your software catalog from sources like PagerDuty, Datadog, or Backstage.
3. We build a 'deepwiki'-style pre-processed knowledge base of your connected repos and code changes to keep track of external dependencies and how to validate different scenarios.
These steps are part of the setup process, so the AI starts with a good initial map of your systems. Beyond that, it builds memories over time that help it narrow down hypotheses faster than it would on day 1.
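To make the first step concrete, here is a minimal sketch of how service dependencies can be read off distributed tracing data. It is an illustration only, not Relvy's actual implementation: the span fields (`span_id`, `parent_id`, `service`), the function name `infer_dependencies`, and the sample services are all assumptions.

```python
from collections import defaultdict


def infer_dependencies(spans):
    """Return {caller_service: {callee_services}} from a flat list of spans.

    Assumes each span is a dict with hypothetical keys
    'span_id', 'parent_id' and 'service'.
    """
    by_id = {s["span_id"]: s for s in spans}
    deps = defaultdict(set)
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # A parent/child pair that crosses a service boundary is a dependency edge.
        if parent is not None and parent["service"] != span["service"]:
            deps[parent["service"]].add(span["service"])
    return dict(deps)


# Made-up trace: gateway -> checkout -> payments, plus an internal checkout span.
spans = [
    {"span_id": "a", "parent_id": None, "service": "api-gateway"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payments"},
    {"span_id": "d", "parent_id": "b", "service": "checkout"},
]
dependency_map = infer_dependencies(spans)
```

Here `dependency_map` records that `api-gateway` calls `checkout` and `checkout` calls `payments`; the span that stays inside `checkout` adds no edge.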
Hey folks, this is Bharath, one of the founders of Relvy.
A lot of teams are using AI in some form to reduce their on-call burden. You may be pasting logs into Cursor, or using Claude Code with Datadog’s MCP server to help debug. What we’ve seen is that autonomous root cause analysis is a hard problem for AI. This shows up in benchmarks: Claude Opus 4.6 is currently at 36% accuracy on the OpenRCA dataset, in contrast to its much stronger results on coding tasks.
There are three main reasons for this: (1) telemetry data volume can drown the model in noise; (2) interpreting and reasoning about the data depends on enterprise context; (3) on-call is a time-constrained, high-stakes problem, with little room for AI to explore during an investigation. Errors that send the user down the wrong path are not easily forgiven.
At Relvy, we are tackling these problems by building specialized tools for telemetry data analysis. Our tools can detect anomalies and identify problem slices from dense time series data, do log pattern search, and reason about span trees, all without overwhelming the agent context.
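As a rough illustration of why a dedicated tool helps here, the sketch below reduces a dense time series to just the indices that look anomalous, so an agent would see a handful of numbers rather than the raw data. It uses a generic median/MAD modified z-score, which is a stand-in technique and not a description of Relvy's tools; the function name, the threshold, and the sample values are assumptions.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Series is flat around the median; nothing can be scored.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up latency samples (ms) with one obvious spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 400, 103, 97, 100]
anomalous_indices = find_anomalies(latency_ms)
```

Only the spike is flagged, so the agent's context carries one index instead of the whole series.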
We also anchor the agent around runbooks, which means less open-ended agentic exploration and more deterministic steps that mirror what an experienced engineer would check first. That results in faster analysis, and less cognitive load on engineers to review and understand what the AI did.
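One way to picture runbook anchoring is as an ordered list of checks that run in a fixed sequence and leave a reviewable trail. The sketch below is purely illustrative and makes no claim about how Relvy represents runbooks: `run_runbook`, the step names, and the context fields are all hypothetical.

```python
def run_runbook(steps, context):
    """Run checks in order; stop at the first one that returns a finding.

    `steps` is a list of (name, check) pairs, where check(context) returns
    a finding string or None. Returns the trail of (name, finding) pairs.
    """
    trail = []
    for name, check in steps:
        finding = check(context)
        trail.append((name, finding))
        if finding is not None:
            break
    return trail


# Hypothetical runbook: the checks an engineer might try first, in order.
steps = [
    ("recent deploy", lambda ctx: "deploy shortly before alert"
        if ctx["minutes_since_deploy"] < 30 else None),
    ("dependency errors", lambda ctx: "upstream error rate elevated"
        if ctx["upstream_error_rate"] > 0.05 else None),
]
context = {"minutes_since_deploy": 240, "upstream_error_rate": 0.12}
trail = run_runbook(steps, context)
```

The trail shows which checks ran and what each one found, which is the part an engineer would review afterwards.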
All of this shows up in our benchmark performance. We are proud that our system improves Claude Opus 4.6's accuracy on the OpenRCA benchmark by 12 percentage points. You can read details here: https://relvy.ai/blog/relvy-improves-claude-accuracy-by-12pp-openrca-benchmark
Give Relvy a try today - you can run it entirely on your local machine, connected to your LLM of choice, with no data shared with Relvy. Appreciate any comments or suggestions!
SOC 2 plus self-hosting is a strong combo, especially for fintech teams. That alone removes a big barrier to adoption.