Cekura - Observe and analyze your voice and chat AI agents

30+ out-of-the-box predefined metrics for analysis across CX, accuracy, conversation, and voice quality. Calibrate accurate LLM judges by annotating just ~20 conversations and auto-improve them in Cekura Labs. Real-time, segmented dashboards to identify trends in conversational AI. Smart statistical alerts so that you get notified only when metrics shift from historical baselines. Automated system pings to catch silent production failures.
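
A back-of-the-envelope illustration of the baseline-alert idea (a minimal Python sketch; the z-score threshold and metric values are assumptions for illustration, not Cekura's actual implementation):

```python
from statistics import mean, stdev

def should_alert(history: list[float], current: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag a metric only when it shifts meaningfully from its baseline.

    history: past values of a metric (e.g. daily CSAT or latency scores).
    A plain z-score test; a production system would also handle seasonality,
    minimum sample sizes, and near-zero-variance metrics.
    """
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # any move off a perfectly flat baseline is notable
    return abs(current - mu) / sigma > z_threshold

# A week of daily "voice quality" scores, then two candidate readings.
baseline = [0.92, 0.93, 0.91, 0.94, 0.92, 0.93, 0.92]
print(should_alert(baseline, 0.90))  # False -- within normal variation
print(should_alert(baseline, 0.70))  # True  -- shifted from the historical baseline
```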

Pratyush Saini

What aspects of voice does it capture? I wanted to test the tonality and personality of my voice agent; is that achievable?

Sidhant Kabra

@pratyush1505 We have voice clarity and gibberish detection as metrics to capture the voice aspect of the agent

Satvik Dixit

@pratyush1505 For testing the personality of the agent, you can also check out the Customer Satisfaction (CSAT) and Sentiment metrics

Shashij Gupta

@pratyush1505 You can also use the voice clarity metric, which checks how clear the voice is

Yash Jain

Can we use Cekura to benchmark STT / TTS separately as well, or is it only used for Voice AI agents?

Sidhant Kabra

@yash_jain49 Yes, we have TTS-specific metrics like Pronunciation Issues and Voice Quality, and we measure transcription accuracy to compare STT providers.

While simulations run on full Voice AI agents, you can run simulations with the same set of test cases and the same config on your main agent, changing only the STT or TTS provider
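
Roughly, the controlled comparison could be expressed like this (a hypothetical config sketch; the field names and provider labels are made up for illustration, not Cekura's actual API):

```python
# Hold the agent, test cases, and metrics constant; vary exactly one component
# per run so any metric delta is attributable to the swapped provider.
base_config = {
    "agent": "support-agent-v3",
    "test_cases": "regression-suite-42",  # same scenarios for every run
    "metrics": ["transcription_accuracy", "pronunciation_issues", "voice_quality"],
}

runs = [
    {**base_config, "stt": "provider_a", "tts": "provider_x"},  # baseline
    {**base_config, "stt": "provider_b", "tts": "provider_x"},  # only STT swapped
    {**base_config, "stt": "provider_a", "tts": "provider_y"},  # only TTS swapped
]
```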

Shashij Gupta

@yash_jain49 I'm not able to understand you completely. What do you mean by "separately" here?

Dhruv Jaglan

Are these predefined metrics all audio-based or text-based?

Sidhant Kabra

@dhruvjaglan It's a mix. All the voice-specific metrics (silence, latency, interruptions, pronunciation issues, etc.) need audio. Accuracy metrics (relevancy, hallucination, response consistency, etc.) are text-based
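
A rough sketch of the difference (illustrative Python only; the silence threshold, and the keyword check standing in for an LLM judge, are assumptions rather than Cekura's implementation):

```python
import numpy as np

def longest_silence_seconds(samples: np.ndarray, sample_rate: int,
                            threshold: float = 0.01) -> float:
    """Audio-based metric: duration of the longest below-threshold stretch.
    samples is mono audio normalized to [-1, 1]; a production detector would
    use windowed energy or a VAD, but either way it needs the waveform."""
    quiet = np.abs(samples) < threshold
    longest = run = 0
    for q in quiet:
        run = run + 1 if q else 0
        longest = max(longest, run)
    return longest / sample_rate

def keyword_coverage(transcript: str, required: set[str]) -> float:
    """Text-based metric: operates on the transcript alone. (A stand-in for an
    LLM-judged relevancy or hallucination check, which is likewise text-only.)"""
    words = set(transcript.lower().split())
    return len(required & words) / len(required)

# One second of pure silence embedded in three seconds of noise at 16 kHz.
audio = np.random.uniform(-1, 1, 16000 * 3)
audio[16000:32000] = 0.0
print(longest_silence_seconds(audio, 16000))  # ~1.0
print(keyword_coverage("your refund was processed", {"refund", "processed"}))  # 1.0
```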

Shashij Gupta

@dhruvjaglan Some are text-based and some are voice-based

Dileep

Excited to see this go live! 🚀

Working on our voice simulations and agent stack taught me that reliability is all about the nuances. We built Cekura to give developers the specific visibility needed to master those details and move past the guesswork.

Can't wait to see everyone dive into the labs and start leveling up their agents!

Mykola Kondratiuk

The silent production failure detection is what catches my eye. When you're running AI agents in prod, the scariest failures are the ones where nothing errors out - it just gives bad output for days without anyone noticing. Curious how Cekura handles the baseline drift problem - do you need a human to label 'good' vs 'bad' outputs, or does it pick that up automatically?

Sidhant Kabra

@mykola_kondratiuk Human labelling is recommended for any metric you define - you label only 20 calls in our optimizer to ensure the LLM-as-a-judge covers all the edge cases

Mykola Kondratiuk

20 calls to bootstrap the judge is surprisingly low - that's actually pretty approachable for most teams. The LLM-as-judge approach makes sense for scale once you've got those calibration samples.

Shashij Gupta

@mykola_kondratiuk Human labelling helps fine-tune the metric and makes it highly accurate at good/bad identification. At scale, the metric then goes on to evaluate thousands of calls with very high accuracy
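
In sketch form, the two-phase flow might look like this (a toy keyword heuristic stands in for the real LLM judge, and all names and data here are hypothetical, not Cekura's API):

```python
from typing import Callable

def toy_judge(rubric: set[str]) -> Callable[[str], bool]:
    # Stand-in for an LLM judge: in practice this would be an LLM call with a
    # rubric prompt that gets refined until it matches the human labels.
    return lambda transcript: any(k in transcript.lower() for k in rubric)

def agreement(judge: Callable[[str], bool],
              labeled: list[tuple[str, bool]]) -> float:
    """Phase 1: fraction of the ~20 human-labeled calls the judge matches."""
    return sum(judge(t) == label for t, label in labeled) / len(labeled)

# The ~20 human-labeled calls would go here; two shown for brevity.
labeled_calls = [
    ("I'm sorry, let me transfer you to a human.", True),    # good escalation
    ("As I said before, as I said before, as I...", False),  # looping failure
]

judge = toy_judge({"transfer", "sorry"})
assert agreement(judge, labeled_calls) == 1.0  # calibrated against the labels

# Phase 2: the calibrated judge scores production calls it has never seen.
production = ["Let me transfer you now.", "as I said before, as I said..."]
print([judge(t) for t in production])  # [True, False]
```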

Mykola Kondratiuk

Right - the labelling bootstraps the judge, then the judge scales. Makes sense as a two-phase approach.

Mykola Kondratiuk

Glad it landed well. Good luck with the launch!

Tarush Agarwal

Huge congrats to the team! 🚀 Such a solid group of builders. This solves a lot of different use cases: instant alerting, human-in-the-loop reviews, A/B testing, and more, without feeling cluttered.

Sergio

Congrats on #2, @Cekura

Just flagged a UX loop on mobile signup: it's showing 'User Not Found' and forcing a logout for new users. It looks like a system crash rather than a filter.

I've got the fix details ready to help you keep your conversion high today. Where can I send the report?

Sidhant Kabra

@sergioding Oh, can you share the report at support@cekura.ai? That would be really helpful

Sergio

@kabra_sidhant Thanks, just sent the fix report and the UX optimization steps to your support email.

Dharamveer singh

@sergioding Likely caused by unsupported email domains: Gmail, iCloud, and other public providers aren't allowed, which triggers the 'User Not Found' error. Recommend using a work email (e.g., @cekura.ai).
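
A minimal sketch of the suggested fix, assuming a simple public-domain check (the domain list and message wording are hypothetical, not Cekura's actual signup code):

```python
# Return a specific, actionable validation message for public email domains
# instead of a generic "User Not Found" that reads like a crash.
PUBLIC_DOMAINS = {"gmail.com", "icloud.com", "yahoo.com", "outlook.com"}

def signup_error(email: str) -> str | None:
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in PUBLIC_DOMAINS:
        return "Please sign up with a work email; personal domains aren't supported."
    return None  # no error -> continue the signup flow

print(signup_error("alice@gmail.com"))    # explicit message instead of a dead end
print(signup_error("alice@company.com"))  # None -> proceed
```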

Sergio

@dddharamveeer Exactly: it's the Gmail/iCloud filter triggering a 'User Not Found' state. On mobile, that feels like a system crash to a new user. I've mapped out the fix to keep your enterprise funnel clean while you're at #3. Let's keep the momentum going!

Himank Jain

Congrats on the launch, team!

What challenges come up when teams try to build this internally?

Sidhant Kabra

@himank_jain1 Building and optimizing each metric over a dataset takes months of engineering effort and fine-tuning. A lot of these metrics are not even LLM-based but use heuristics and statistical models. Having said that, a team can build a basic analytics dashboard if voice metrics or smart alerts aren't that important and they only need to analyze a few specific workflow metrics

Rishabh Sanjay

@himank_jain1 Another challenge arises when a new LLM enters the market. If we want to switch, whether because the new model is better or because the old one is being deprecated, we have to re-optimize all our prompt metrics against the eval set, which is a huge undertaking. This makes the eval set the most important factor; it stays constant, while the prompts change regularly to adapt to new LLMs.

Satvik Dixit

Super excited to see this out!

Got to work closely on the metrics side of things. Seeing it come together into something teams can actually rely on in production is incredibly satisfying.

Huge shoutout to the team for pushing this across the finish line.

Roshan Rajak | byteio.in

Big congrats to the @Cekura team on the launch! 🚀