Cekura - Observe and analyze your voice and chat AI agents
30+ out-of-the-box predefined metrics for analysis of CX, accuracy, conversation, and voice quality. Compile accurate LLM judges by annotating just ~20 conversations, then auto-improve them in Cekura Labs. Real-time, segmented dashboards to identify trends in conversational AI. Smart statistical alerts so you get notified only when metrics shift from historical baselines. Automated system pings to catch silent production failures.



Replies
Cekura
Hi Product Hunt! 👋
We are excited to launch Cekura Monitoring for Voice and Chat AI companies. Most monitoring tools tell you if your AI is up. Cekura tells you if it is behaving.
When we first launched Cekura QA, we thought we had solved the problem for both testing and monitoring. But as our users scaled, we noticed a painful pattern: while pre-production QA was automated, teams were still spending dozens of hours manually listening to thousands of calls.
The two big blockers we saw were:
The Scaling Wall: Defining and optimizing custom metrics was taking too long, forcing teams back into manual spot-checks.
The Production Blindspot: Standard LLM metrics miss the customer experience in Voice AI - things like agent tone and customer sentiment that actually define customer success.
We have rebuilt the monitoring layer from the ground up to solve this. Cekura Monitoring turns that "wall of noisy logs" into actionable signals.
🚀 What’s New in Cekura Monitoring:
30+ Predefined Metric Suite: We track what actually breaks Voice and Chat agents across four critical categories:
Speech Quality: Voice clarity, pronunciation, and gibberish detection.
Conversational Flow: Silences, interruptions (barge-ins), and termination triggers.
Accuracy & Logic: Hallucinations, transcription accuracy, and relevancy.
Customer Experience: CSAT, Sentiment analysis, and drop-off points.
Metric Optimizer: Stop "vibes-based" prompt engineering. Define a metric (e.g., Successful User Authentication), tag 20 calls in our Labs interface, and our optimizer "compiles" a prompt that aligns with your specific feedback.
Statistical Intelligence: No more fixed, noisy thresholds. Our Alerting Engine learns your agent's baseline and only pings Slack when metrics shift 2σ from historical norms.
Automated Cron Jobs: Set up recurring health checks to simulate production conversations. Catch silent failures and regressions before your customers do.
Visual Dashboards: Real-time distribution charts for each metric, with views customized for each stakeholder.
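As a rough illustration of the statistical alerting idea above, here is a minimal sketch of a 2σ baseline check. This is a simplification, not Cekura's actual alerting engine; the function name and window handling are assumptions:

```python
import statistics

def should_alert(history, latest, sigma=2.0):
    """Flag a metric value that drifts more than `sigma` standard
    deviations from its historical baseline.
    Hypothetical sketch, not Cekura's actual alerting engine."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean  # flat baseline: any change is a shift
    return abs(latest - mean) > sigma * stdev

# Example: CSAT hovered around 4.5, then dropped sharply.
baseline = [4.4, 4.5, 4.6, 4.5, 4.4, 4.6, 4.5]
print(should_alert(baseline, 3.2))  # large shift from baseline
print(should_alert(baseline, 4.5))  # within normal range
```

The point of the baseline-relative check is that a fixed threshold (e.g. "alert if CSAT < 4.0") is either too noisy or too late, while a 2σ rule adapts to each agent's own history.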
Who is this for?
Teams scaling Voice & Chat AI who are tired of listening to calls manually and need a way to prove their agents are actually working.
Sign up and try for free at cekura.ai or drop your questions below! We would love to hear how you’re currently handling Voice and Chat AI in production👇
@kabra_sidhant Many congratulations on the launch, Sidhant! I've been tracking it since the Vocera days, it's evolved impressively and keeps getting better. Thrilled to see the buzz in voice AI communities especially on Reddit. Onwards and upwards! :)
Cekura
@rohanrecommends Thanks Rohan - also for all your guidance on best practices for Product Hunt Launches!
Cekura
Blind spots in production voice agents are brutal - you don't know your agent is skipping verification steps or missing required disclosures until a compliance team surfaces it weeks later. Monitoring 100% of live calls at the session level rather than spot-checking is the only real fix. The P50/P90 latency tracking and interruption detection on production traffic are also underrated - that's where infrastructure regressions hide.
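Those P50/P90 percentiles are cheap to compute over a window of production turn latencies. A minimal sketch, with an illustrative function and sample data that are not part of Cekura's API:

```python
def pctl(values, p):
    """Approximate percentile: picks the nearest index in the sorted
    sample. Good enough for dashboard-style P50/P90 tracking."""
    ordered = sorted(values)
    idx = round(p / 100 * (len(ordered) - 1))
    return ordered[idx]

# A window of recent turn latencies in milliseconds (illustrative data).
latencies_ms = [120, 135, 150, 180, 210, 250, 400, 900, 145, 160]
print(pctl(latencies_ms, 50))  # P50: the typical turn
print(pctl(latencies_ms, 90))  # P90: the slow tail where regressions hide
```

Tracking P90 alongside P50 matters because a mean or median can stay flat while the tail degrades, which is exactly the failure mode callers notice first.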
Cekura
We are thrilled to share Cekura Monitoring with the PH community!
Most teams focus solely on whether a voice AI agent reaches the 'correct' outcome, but they often overlook the nuances that actually define the user experience: tone, transcription accuracy, TTS quality, and pronunciation.
While working on scaling to handle thousands of parallel calls, we realized just how easily these small details can degrade at volume. Cekura was built to ensure your agents don't just work, but sound perfect.
Check out the product and let us know what you think!
Cekura
One of the most common issues we see voice agent makers run into is an agent that keeps interrupting the caller. It's frustrating for users and easy to miss during development. With our interruption metric, teams can catch this early and fix it before it reaches real users. And that's just one of the many predefined metrics we offer out of the box. Try it now!
DaoLens
How are you different from tracing platforms like Braintrust and Galileo, apart from voice metrics?
Cekura
@nimishg We are E2E conversational AI QA. Some of the big differences:
We run E2E multi-turn simulations instead of trace-level logging.
These platforms do not offer a metric optimizer; without one, fine-tuning LLM-as-a-judge metrics takes a huge amount of time.
We also offer replay of production conversations to verify that a fix has been incorporated.
In short, we are very deep and verticalized in conversational AI evals, while they are more horizontal, general agentic AI evals platforms.
Cekura
@nimishg Braintrust/Galileo are very horizontal, covering all LLM agents. We are specialised for conversations: our UI, metrics, and dashboards are all built specifically for them.
Cekura
So excited to see this live! 🎉
Been working closely on Cekura's monitoring features and what makes this special is how much it closes the loop for conversational AI teams — you're not just testing in pre-prod and hoping for the best, you're getting visibility into what's actually happening in production calls.
This one's been a long time coming! 🚀
Cekura
Really excited to see this out 🎉
Working on alerting and simulation quality made it clear how hard it is to catch subtle regressions early—this is a big step toward making that reliable in production.
Glad to finally have this live 🚀
Congratulations on the launch!!
Do you guys also support on-prem deployment to ensure privacy?
Cekura
@nikunjagarwal321 We support VPC deployments on the customer's instance. Additionally:
We sign BAAs and DPAs with customers.
We have PII redaction on our side, for both audio and transcripts.
Cekura
@nikunjagarwal321 yes we do
The "is it behaving" vs "is it up" distinction is spot on. We've had AI chat agents pass every health check while giving completely wrong answers to customers. Uptime metrics are useless if the AI is confidently hallucinating.
How granular does the sentiment tracking get? Like can it detect when an agent starts being passive aggressive or gives a technically correct but unhelpful response? That's the stuff that kills user trust slowly.
Cekura
@mihir_kanzariya We are currently building turn-level sentiment tracking - it should be live in a week's time. Currently it gives an overall sentiment score, but not granular feedback on where sentiment turned negative.
We also have a metric called relevancy, which tests whether the agent's response is relevant to the user's question.
Cekura
@mihir_kanzariya Sentiment analysis can be made as specific as you want. Our predefined metric has three states: neutral, positive, and negative, but it is very seamless to tune this metric to include many other states. You should be able to create a highly accurate custom metric within 5 minutes.
Are the metrics customizable? For example, I need to define success criteria by peak latency, not mean latency.
Cekura
@rishav_mishra3 Yes, Cekura is modular in a way that lets you go from full automation to full control, depending on your needs.
One of our key features is Python-based metrics with access to all processed data, so you can measure exactly what you care about - like peak latency instead of mean latency. We also support defining your own success criteria using a flexible, rubric-style configuration.
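A Python-based metric along those lines might look like the sketch below. The conversation shape, field names (`turns`, `latency_ms`), and return format are assumptions for illustration, not Cekura's actual metric interface:

```python
def peak_latency_metric(conversation, threshold_ms=1500):
    """Pass/fail on peak (max) turn latency instead of the mean.
    Hypothetical sketch: the real Cekura metric interface and the
    `turns`/`latency_ms` field names are assumed here."""
    latencies = [turn["latency_ms"] for turn in conversation["turns"]]
    peak = max(latencies)
    return {
        "value": peak,
        "passed": peak <= threshold_ms,
    }

# One slow turn is enough to fail, even though the mean is acceptable.
convo = {"turns": [{"latency_ms": 420}, {"latency_ms": 980}, {"latency_ms": 1750}]}
print(peak_latency_metric(convo))
```

Swapping `max(latencies)` for `statistics.mean(latencies)` (or a percentile) is the whole customization, which is the appeal of code-level metrics over fixed ones.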
Cekura
@rishav_mishra3 Yes, they are customisable. We expose the code of our latency metric, which you can customise to get peak latency instead.