Cekura - Observe and analyze your voice and chat AI agents
30+ out-of-the-box predefined metrics for analysis of CX, accuracy, conversation, and voice quality. Compile accurate LLM judges by annotating just ~20 conversations, then auto-improve them in Cekura Labs. Real-time, segmented dashboards to identify trends in conversational AI. Smart statistical alerts so you get notified only when metrics shift from historical baselines. Automated system pings to catch silent production failures.



Replies
Cekura
Hi Product Hunt! 👋
We are excited to launch Cekura Monitoring for Voice and Chat AI companies. Most monitoring tools tell you if your AI is up. Cekura tells you if it is behaving.
When we first launched Cekura QA, we thought we had solved the problem for both testing and monitoring. But as our users scaled, we noticed a painful pattern: while pre-production QA was automated, teams were still spending dozens of hours manually listening to thousands of calls.
The two big blockers we saw were:
The Scaling Wall: Defining and optimizing custom metrics was taking too long, forcing teams back into manual spot-checks.
The Production Blindspot: Standard LLM metrics miss the customer experience in Voice AI - things like agent tone and customer sentiment that actually define customer success.
We have rebuilt the monitoring layer from the ground up to solve this. Cekura Monitoring turns that "wall of noisy logs" into actionable signals.
🚀 What’s New in Cekura Monitoring:
30+ Predefined Metric Suite: We track what actually breaks Voice and Chat agents across four critical categories:
Speech Quality: Voice clarity, pronunciation, and gibberish detection.
Conversational Flow: Silences, interruptions (barge-ins), and termination triggers.
Accuracy & Logic: Hallucinations, transcription accuracy, and relevancy.
Customer Experience: CSAT, Sentiment analysis, and drop-off points.
Metric Optimizer: Stop "vibes-based" prompt engineering. Define a metric (e.g., Successful User Authentication), tag 20 calls in our Labs interface, and our optimizer "compiles" a prompt that aligns with your specific feedback.
Statistical Intelligence: No more fixed, noisy thresholds. Our Alerting Engine learns your agent's baseline and only pings Slack when metrics shift 2σ from historical norms.
Automated Cron Jobs: Set up recurring health checks to simulate production conversations. Catch silent failures and regressions before your customers do.
Visual Dashboards: Real-time distribution charts for each metric, with views customized for each stakeholder.
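As a rough illustration of the statistical alerting idea above, here is a minimal sketch of a 2σ baseline check. This is a simplification, not Cekura's actual alerting engine; the function name and window handling are assumptions:

```python
import statistics

def should_alert(history, latest, sigma=2.0):
    """Flag a metric value that drifts more than `sigma` standard
    deviations from its historical baseline.
    Hypothetical sketch, not Cekura's actual alerting engine."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean  # flat baseline: any change is a shift
    return abs(latest - mean) > sigma * stdev

# Example: CSAT hovered around 4.5, then dropped sharply.
baseline = [4.4, 4.5, 4.6, 4.5, 4.4, 4.6, 4.5]
print(should_alert(baseline, 3.2))  # large shift from baseline
print(should_alert(baseline, 4.5))  # within normal range
```

The point of the baseline-relative check is that a fixed threshold (e.g. "alert if CSAT < 4.0") is either too noisy or too late, while a 2σ rule adapts to each agent's own history.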
Who is this for?
Teams scaling Voice & Chat AI who are tired of listening to calls manually and need a way to prove their agents are actually working.
Sign up and try for free at cekura.ai or drop your questions below! We would love to hear how you’re currently handling Voice and Chat AI in production👇
@kabra_sidhant Many congratulations on the launch, Sidhant! I've been tracking it since the Vocera days, it's evolved impressively and keeps getting better. Thrilled to see the buzz in voice AI communities especially on Reddit. Onwards and upwards! :)
Cekura
@rohanrecommends Thanks Rohan - also for all your guidance on best practices for Product Hunt Launches!
Cekura
Blind spots in production voice agents are brutal - you don't know your agent is skipping verification steps or missing required disclosures until a compliance team surfaces it weeks later. Monitoring 100% of live calls at the session level rather than spot-checking is the only real fix. The P50/P90 latency tracking and interruption detection on production traffic are also underrated - that's where infrastructure regressions hide.
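Those P50/P90 percentiles are cheap to compute over a window of production turn latencies. A minimal sketch, with an illustrative function and sample data that are not part of Cekura's API:

```python
def pctl(values, p):
    """Approximate percentile: picks the nearest index in the sorted
    sample. Good enough for dashboard-style P50/P90 tracking."""
    ordered = sorted(values)
    idx = round(p / 100 * (len(ordered) - 1))
    return ordered[idx]

# A window of recent turn latencies in milliseconds (illustrative data).
latencies_ms = [120, 135, 150, 180, 210, 250, 400, 900, 145, 160]
print(pctl(latencies_ms, 50))  # P50: the typical turn
print(pctl(latencies_ms, 90))  # P90: the slow tail where regressions hide
```

Tracking P90 alongside P50 matters because a mean or median can stay flat while the tail degrades, which is exactly the failure mode callers notice first.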
Cekura
We are thrilled to share Cekura Monitoring with the PH community!
Most teams focus solely on whether a voice AI agent reaches the 'correct' outcome, but they often overlook the nuances that actually define the user experience: tone, transcription accuracy, TTS quality, and pronunciation.
While working on scaling to handle thousands of parallel calls, we realized just how easily these small details can degrade at volume. Cekura was built to ensure your agents don't just work, but sound perfect.
Check out the product and let us know what you think!
Cekura
One of the most common issues we see voice agent makers run into is an agent that keeps interrupting the caller. It's frustrating for users and easy to miss during development. With our interruption metric, teams can catch this early and fix it before it reaches real users. And that's just one of the many predefined metrics we offer out of the box. Try it now!
DaoLens
How are you different from tracing platforms like Braintrust and Galileo, apart from voice metrics?
Cekura
@nimishg We are E2E conversational AI QA. Some of the big differences:
We run E2E multi-turn simulations instead of trace-level logging.
These platforms do not offer a metric optimizer; without one, fine-tuning LLM-as-a-judge metrics takes a huge amount of time.
We also offer replay of production conversations to verify that a fix has been incorporated.
In short, we are very deep and verticalized in conversational AI evals, while they are more horizontal, general agentic AI evals platforms.
Cekura
@nimishg Braintrust/Galileo are very horizontal, covering all LLM agents. We are specialised for conversations: our UI, metrics, and dashboards are all built specifically for them.
Cekura
So excited to see this live! 🎉
Been working closely on Cekura's monitoring features and what makes this special is how much it closes the loop for conversational AI teams — you're not just testing in pre-prod and hoping for the best, you're getting visibility into what's actually happening in production calls.
This one's been a long time coming! 🚀
Cekura
Really excited to see this out 🎉
Working on alerting and simulation quality made it clear how hard it is to catch subtle regressions early—this is a big step toward making that reliable in production.
Glad to finally have this live 🚀
Congratulations on the launch!!
Do you guys also support on-prem deployment to ensure privacy?
Cekura
@nikunjagarwal321 We support VPC deployments on the customer's instance. Additionally:
We sign BAAs and DPAs with customers.
We have PII redaction on our side, for both audio and transcripts.
Cekura
@nikunjagarwal321 yes we do
The "is it behaving" vs "is it up" distinction is spot on. We've had AI chat agents pass every health check while giving completely wrong answers to customers. Uptime metrics are useless if the AI is confidently hallucinating.
How granular does the sentiment tracking get? Like can it detect when an agent starts being passive aggressive or gives a technically correct but unhelpful response? That's the stuff that kills user trust slowly.
Cekura
@mihir_kanzariya We are currently building turn-level sentiment tracking - it should be live in a week's time. Currently it gives an overall sentiment score, but not granular feedback on where sentiment turned negative.
We also have a metric called relevancy, which tests whether the agent's response is relevant to the user's question.
Cekura
@mihir_kanzariya Sentiment analysis can be made as specific as you want. Our predefined metric has three states: neutral, positive, and negative, but it is very seamless to tune this metric to include many other states. You should be able to create a highly accurate custom metric within 5 minutes.
Are the metrics customizable? For example, I need to define success criteria by peak latency, not mean latency.
Cekura
@rishav_mishra3 Yes, Cekura is modular in a way that lets you go from full automation to full control, depending on your needs.
One of our key features is Python-based metrics with access to all processed data, so you can measure exactly what you care about - like peak latency instead of mean latency. We also support defining your own success criteria using a flexible, rubric-style configuration.
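A Python-based metric along those lines might look like the sketch below. The conversation shape, field names (`turns`, `latency_ms`), and return format are assumptions for illustration, not Cekura's actual metric interface:

```python
def peak_latency_metric(conversation, threshold_ms=1500):
    """Pass/fail on peak (max) turn latency instead of the mean.
    Hypothetical sketch: the real Cekura metric interface and the
    `turns`/`latency_ms` field names are assumed here."""
    latencies = [turn["latency_ms"] for turn in conversation["turns"]]
    peak = max(latencies)
    return {
        "value": peak,
        "passed": peak <= threshold_ms,
    }

# One slow turn is enough to fail, even though the mean is acceptable.
convo = {"turns": [{"latency_ms": 420}, {"latency_ms": 980}, {"latency_ms": 1750}]}
print(peak_latency_metric(convo))
```

Swapping `max(latencies)` for `statistics.mean(latencies)` (or a percentile) is the whole customization, which is the appeal of code-level metrics over fixed ones.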
Cekura
@rishav_mishra3 Yes, they are customisable. We expose the code of our latency metric, which you can customise to get peak latency instead.