
Tessl
Optimize agent skills, ship 3× better code.
321 followers
Tessl helps developers evaluate and optimize agent skills, so you can focus on building with smarter AI agents instead of fixing bugs and hallucinations - no signup required ➡️ tessl.io/registry/skills/submit

Really strong launch. The "package manager for agent skills" framing is exactly where teams are heading as multi-agent workflows get real.
What stood out to me is the eval + optimization loop: most teams can feel output drift but can’t isolate whether the issue is model choice, prompt context, or skill quality. If Tessl can make that diagnosis explicit (before/after score deltas per skill revision), that’s high leverage for shipping faster with fewer hallucination regressions.
Curious if you’re planning CI hooks so teams can gate skill changes on eval thresholds the same way we gate tests/lint in code pipelines.
Tessl
@danielsinewe Spot on about the diagnostic gap - isolating whether drift is coming from the model, prompt context, or skill quality is exactly what the eval loop surfaces. Before/after score deltas per skill revision are live today - perhaps we need to surface them better?
The CI hooks idea is really interesting, and we've been thinking a lot about it. I want to make sure I'm tracking what you're imagining though - are you thinking gating at the PR level, deployment level, or something else? Keen to get your thoughts on this!
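To make the gating idea concrete, a PR-level check could be as small as a script that fails the job when a changed skill's eval score drops below a threshold. The sketch below is illustrative only: the report path, field names, and threshold are assumptions, not Tessl's actual output format or API.

```python
"""Minimal sketch of a CI gate for skill changes, assuming an earlier eval
step wrote a JSON report like {"skill": "...", "score": 0.83}. The report
path, field names, and threshold are placeholders, not a documented format."""
import json
import sys
from pathlib import Path

THRESHOLD = 0.80                    # minimum acceptable eval score for the changed skill
REPORT = Path("eval-report.json")   # assumed output of the eval step

def main() -> int:
    report = json.loads(REPORT.read_text())
    score = float(report["score"])
    print(f"{report.get('skill', 'skill')}: score {score:.2f} (threshold {THRESHOLD:.2f})")
    if score < THRESHOLD:
        print("Score below threshold; failing the job to block the merge.")
        return 1  # non-zero exit fails the PR check
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Wired in after the eval step, a script like this blocks a merge on a skill regression the same way a failing test or lint rule would.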
BrandingStudio.ai
The "package manager for agent skills" framing clicks immediately, especially coming from the Snyk founder. The dependency management and security signal problem in traditional code is exactly what's now happening with agent skills, and most teams don't have the tooling to even see it yet.
The ElevenLabs 2x result is a concrete proof point that avoids the usual vague benchmark claims. That kind of before/after is what actually convinces teams to adopt a new tool in their workflow.
I use Claude Code daily for building my own AI platform and the skill quality problem is very real. You genuinely can't tell if a skill is helping or quietly degrading outputs without proper evals. This fills a gap that's been easy to ignore until it hurts. Congrats on the launch!
Tessl
@joao_seabra A bad dependency in traditional code throws an error; a bad skill just makes your agent slightly worse, and you end up blaming the model instead of the context. 😄 Skills are in exactly that moment right now. Since you're using Claude Code daily, try running an eval on one of your core skills. Would love to hear what you're building with them!
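For readers following along, a before/after eval is conceptually just running the same tasks with and without the skill in context and comparing scores. The stubs below are placeholders, not Tessl's API; they only show the shape of the comparison.

```python
"""Toy illustration of a with/without-skill score delta. `run_agent` and
`grade` are placeholder stubs; a real eval would call an actual agent and
use a proper grader instead of a keyword check."""

TASKS = [
    "Summarize this changelog for end users.",
    "Draft a migration note for the breaking API change.",
]
SKILL = "Write for non-technical readers; lead with impact, keep it under 100 words."

def run_agent(task: str, skill: str | None = None) -> str:
    # Placeholder: a real run would send the task (plus the skill, if any) to an agent.
    prefix = "impact-first" if skill else "generic"
    return f"{prefix} answer to: {task}"

def grade(output: str) -> float:
    # Toy grader: full marks if the output follows the skill's style cue.
    return 1.0 if "impact-first" in output else 0.4

def score(skill: str | None) -> float:
    return sum(grade(run_agent(t, skill)) for t in TASKS) / len(TASKS)

if __name__ == "__main__":
    baseline, with_skill = score(None), score(SKILL)
    print(f"baseline {baseline:.2f} -> with skill {with_skill:.2f} "
          f"(delta {with_skill - baseline:+.2f})")
```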
Tools like Tessl help bring the engineering mindset to context engineering. It's like Grammarly for skills: something actionable. Finally we can go beyond the simple "vibe check".
Tessl
@patrickdebois I like the Grammarly for skills analogy! Agreed that steering away from vibe checking is the path forward, and context evaluations + optimization is our solution to this problem.
You've watched the DevOps toolchain mature from chaos 😄 to CI/CD. Do you already see a similar standardization arc happening with agent skills / context engineering?
This feels like the missing layer in the agent stack.
Everyone’s shipping “skills” but very few are measuring whether they actually improve outcomes. The versioning + evaluation angle makes a lot of sense.
Curious how you think about benchmarking across models? A skill might behave very differently between Claude / GPT / open models.
Congrats on the launch — this could quietly become core infra for serious agent teams.
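The cross-model question above maps to a simple harness: run the same skill and task set against each model and compare mean scores. The sketch below uses placeholder model names and a toy grader purely to show the shape of such a benchmark; it is not Tessl's implementation.

```python
"""Rough sketch of cross-model skill benchmarking: score the same skill on
the same tasks for each model. `call_model` and the grader are stand-ins;
a real harness would call each provider's API and use a real eval."""

SKILL = "When asked for code, always include error handling and a usage example."
TASKS = [
    "Write a function that parses a date string.",
    "Write a function that fetches a URL and returns its status code.",
]
MODELS = ["claude", "gpt", "open-model"]  # placeholder identifiers, not real model IDs

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real harness would call each provider's API here.
    return f"[{model}] def example(): ...  # error handling and usage example included"

def grade(output: str) -> float:
    # Toy grader: a real eval would check correctness, style, hallucinations, etc.
    checks = ["error handling", "usage example"]
    return sum(c in output for c in checks) / len(checks)

def benchmark() -> dict[str, float]:
    return {
        model: sum(grade(call_model(model, f"{SKILL}\n\n{task}")) for task in TASKS) / len(TASKS)
        for model in MODELS
    }

if __name__ == "__main__":
    for model, score in benchmark().items():
        print(f"{model}: mean skill score {score:.2f}")
```

Comparing per-model scores this way makes it obvious when a skill that helps one model quietly hurts another.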
I'm still new and still learning. I do find this quite interesting and would like to learn more about it. What kind of tech is used for this type of "training"? How hard was it to make this, and how did you manage to perform this well?
Good tool, team! Currently working across documentation mainly. Just tested out Tessl, very easy to use, good user experience!
Tessl
@krupali_trivedi awesome to hear - glad the experience felt smooth! Documentation is one of the most common starting points we see. Out of curiosity, did the eval surface anything surprising about how agents were using your docs? That's usually the "aha moment" for folks.
I've been building with Claude Code and the difference between a well-written skill/instruction set and a mediocre one is night and day. The ElevenLabs case study is a compelling proof point. Most people are still treating agent instructions as an afterthought, just a markdown file in the repo. The idea that you can actually evaluate and iterate on them like any other piece of software makes a lot of sense.
Congrats on the launch! Excited to see where this goes.