Garry Tan

Cekura - Observe and analyze your voice and chat AI agents

30+ predefined metrics out of the box for analysis of CX, accuracy, conversation, and voice quality. Build reliable LLM judges by annotating just ~20 conversations and auto-improve them in Cekura Labs. Real-time, segmented dashboards to identify trends in conversational AI. Smart statistical alerts, so you get notified only when metrics shift from historical baselines. Automated system pings to catch silent production failures.


Replies


Congratulations on the launch, team @Cekura!

Sidhant Kabra

Thanks @manmohit

Satvik Dixit

Thanks a lot @manmohit

Hamoodie Ali

Are the metrics customizable @kabra_sidhant?

Janhvi Nandwani

@humza_sheikh1 You can define Python-based custom metrics in Cekura with direct access to all processed call data, so you can measure exactly what matters to you. You can also define your own success criteria using a rubric-style setup tailored to your use case. The platform is fully modular, so you can go from full automation to fine-grained control depending on what you need.
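To make the idea concrete, here is a minimal sketch of what a rubric-style custom metric over processed call data could look like. The function name, call-data shape, and return format are all assumptions for illustration, not Cekura's actual interface.

```python
# Hypothetical sketch of a rubric-style custom metric.
# Assumes each processed call arrives as a dict with a list of
# transcript turns: {"transcript": [{"speaker": ..., "text": ...}, ...]}.

def insurance_info_collected(call: dict) -> dict:
    """Pass if the agent asked for both the caller's insurance
    provider and member ID at some point in the conversation."""
    required = {"insurance provider": False, "member id": False}
    for turn in call["transcript"]:
        if turn["speaker"] != "agent":
            continue
        text = turn["text"].lower()
        for phrase in required:
            if phrase in text:
                required[phrase] = True
    return {
        "metric": "insurance_info_collected",
        "passed": all(required.values()),
        "missing": [k for k, v in required.items() if not v],
    }
```

The `missing` list makes failures actionable: a dashboard can show which rubric item the agent skipped, not just that the call failed.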

Sidhant Kabra

@humza_sheikh1 @janhvi_nandwani1 Just to add to it - you can even use our predefined metrics, e.g. interruptions, to define your success criteria: if the agent interrupts the customer more than n times in the call, you can flag it as an interruption metric failure.
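As an illustration of that threshold-style rule, here is a small sketch that turns an interruption count into a pass/fail criterion. The event shape and function name are assumptions, not Cekura's API.

```python
# Hypothetical: each interruption event is a dict recording who
# interrupted, e.g. {"by": "agent"} or {"by": "customer"}.

def interruption_failure(events: list, max_allowed: int = 3) -> bool:
    """Return True (metric failure) when the agent interrupted the
    customer more than `max_allowed` times in the call."""
    agent_interruptions = [e for e in events if e["by"] == "agent"]
    return len(agent_interruptions) > max_allowed
```

The same pattern generalizes to any predefined metric with a numeric output (latency, silence duration, etc.): compare the measured value against a per-use-case threshold.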

Vishruth N

Love the speed at which this team ships! I was curious: do you also have plans to roll out observability for image/video agents?

Sidhant Kabra

@vishruth_n Currently we are focused only on the voice and chat modalities. Supporting simulations and observability across modalities is in our vision.

Auren Hoffman

this is super duper cool. future of voice

Sidhant Kabra

Thanks @auren

Satvik Dixit

@auren Thank you so much!!

Konstantin Sagachev

This is something we've been looking for. We deploy voice and chat AI agents for businesses (support, qualification, scheduling) and QA has always been the manual bottleneck — listening to call recordings, checking if the agent followed the script, catching edge cases.

The 30+ predefined metrics and CI/CD integration is exactly what's needed to ship agent updates with confidence. Do you support Vapi-based voice agents out of the box, or does it require custom integration?

Satvik Dixit

@ksagachev Yes, Vapi is supported out of the box, no custom integration needed. Takes <5 min to set up.

Shashij Gupta

@ksagachev We have a very deep integration with Vapi. It should feel seamless.

Sidhant Kabra

@ksagachev We have a native integration with Vapi for sending production conversations, tool calls and to run outbound simulations automatically

Randhir

@kabra_sidhant Congrats on the launch, and great to see how Cekura shifts the focus from "is the AI up?" to "is the AI behaving correctly?" for voice and chat agents. It was a missing layer for teams shipping real-world conversational AI at scale. But how do you handle wildly different voice/chat-agent use cases? Any approach?
Satvik Dixit

@kabra_sidhant @randhir_kumar7 We find that all conversational agents (chat or voice) need similar metrics to evaluate the content of the conversation - metrics like relevancy, hallucination, and customer satisfaction.
Voice agents add complexity, so we have metrics for interruption, latency, pronunciation, and voice quality.
For use-case-specific evaluation (did the agent book the appointment? collect insurance info?), teams can write custom LLM judge metrics in plain English.

Nimesh Chakravarthi

This is a massive launch for such a critical problem in conversational agents today. Curious, what are the most important metrics tracked by customers in the healthcare space?

Satvik Dixit

@nimeshmc Thanks! Healthcare is one of our most active verticals. Expected outcome is critical - did the agent follow required protocols like HIPAA disclaimers, consent, and verification steps?

Hallucination detection is equally important - the agent must not invent symptoms, dosages, or medical advice.

Kumar Abhishek

This feels like Datadog but for AI behavior instead of infrastructure. That's a good positioning. Congratulations!!

Sidhant Kabra

@zerotox Actually, we test both infrastructure (customers run cron jobs) and workflows, but yes, we are building Datadog for conversational AI.

Nuseir Yassin

How do you handle false positives in sentiment or hallucination detection?

Sidhant Kabra

@nuseir_yassin1 That's where our metric optimizer comes in. You can use it not only for your custom metrics but also to give feedback on our predefined metrics when false positives occur, and they auto-improve.

Mikita Aliaksandrovich

Congrats on the launch 🚀
Really important problem to solve!

Sidhant Kabra