Launching today

Spec27
Spec-driven testing for AI agents and AI apps
17 followers
Spec-driven testing for AI agents and AI apps
17 followers
Spec27 is a validation platform for AI agents. It helps teams move beyond manual, vibes-based testing by using machine-readable specifications to generate broader test coverage, catch regressions earlier, and validate both in-house and third-party systems without needing SDK integration or code-level access.








Spec27
@steven_willmott a getting deep relevant test coverage for LLMs is basically Mission Impossible right now because the input space is infinite. Specifying what the agent should be robust to, rather than just hoping it doesn't break, is a huge mental shift. Does Spec27 handle adversarial prompt generation as part of the automatic test sets?
Spec27
Thanks @priya_kushwaha1, yes, that's the biggest part of the challenge: infinite input space + it's not necessarily continuous (so a tiny shift in input might lead to a massive shift in the output). In response to your question, yes, in the platform we have a growing list of adversarial methods that perturb the inputs in different way. You can select which to use, and it'll effectively do a search in that adjacent input space. We use semantic similarity to keep the tests similar despite the variation.
@steven_willmott really helpful context, especially the bit about non-continuous input spaces. i'll check out the platform and see how it handles our specific edge cases. congrats on the ship!
Typeform
Spec27
Thanks so much@picsoung - means a lot! Doesn't matter which framework you're using to build the agents. We have a Javascript WASM engine that connects to pretty much anything (and we'll help if it's custom). The testing makes no assumptions about how the agent is built or about your access to the backend.
What are you building agents in?
Spec27
Excited to be part of the team launching /Spec27 today! I care a lot about making AI safer in practice, so it’s really nice to share something we’ve been building around that. Happy to chat with anyone working on agents, evals, or validation :)