Selene by Atla
Frontier models to evaluate generative AI
136 followers
Find and fix AI mistakes at scale, and build more reliable GenAI applications. Use our LLM-as-a-Judge to test and evaluate prompts and model versions.
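LLM-as-a-Judge here means prompting one model to grade another model's output against a rubric. A minimal sketch of that pattern, using the OpenAI SDK as a stand-in judge; the rubric and the judge() helper below are illustrative assumptions, not Atla's API:

```python
# LLM-as-a-Judge sketch: ask a judge model to score a response against a
# rubric. The prompt and helper are illustrative, not Atla's actual SDK.
import os
from openai import OpenAI  # assumption: OpenAI SDK as a stand-in judge backend

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

JUDGE_PROMPT = """You are an impartial evaluator.
Score the RESPONSE to the INPUT on a 1-5 scale for factual accuracy.
Reply with a one-sentence critique, then the score on its own line.

INPUT: {input}
RESPONSE: {response}"""

def judge(user_input: str, model_response: str) -> str:
    """Send input + response + rubric to the judge model, return its verdict."""
    result = client.chat.completions.create(
        model="gpt-4o",  # swap in a dedicated evaluator model such as Selene
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(input=user_input,
                                                  response=model_response)}],
        temperature=0,  # keep scoring as deterministic as possible
    )
    return result.choices[0].message.content

print(judge("What is the capital of France?",
            "The capital of France is Lyon."))
```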

Surgeflow
🚀 Congrats on launching Selene 1 on Product Hunt! 🎉 This looks like a game-changer for anyone building with AI—finally, an evaluation tool that’s both accurate and scalable.
I love how Selene 1 tackles the inconsistency of general-purpose LLMs as evaluators. The fact that it outperforms models like GPT-4o and Claude 3.5 Sonnet across multiple benchmarks is super impressive! 👏
One question: Since evaluations can be very domain-specific, have you considered allowing users to fine-tune Selene 1 itself for their niche use cases? That could add another layer of customization for teams working in highly specialized fields.
Excited to see how this evolves! 🚀
Atla
@rocsheh Hi Zepeng! Great question. Selene was trained to be a 'general-purpose evaluator', so it's capable of handling a wide variety of eval tasks across domains. To help steer Selene, we released our Alignment Platform, where users can create custom eval metrics that match their niche use case. The tool helps users generate an eval prompt, test it against test cases (which can be their own sample data), and refine the prompt until it's fit for use. Would be interested to hear your feedback if you try it out!
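The refine-until-fit loop described in that reply can be pictured as scoring your own labeled samples with a candidate eval prompt and checking agreement against the human labels. A minimal sketch under that assumption; TestCase, judge_score, and agreement are hypothetical names, not the Alignment Platform's API:

```python
# Sketch of refining an eval prompt against labeled test cases: measure how
# often the judge agrees with human scores before trusting the metric.
from dataclasses import dataclass

@dataclass
class TestCase:
    user_input: str
    response: str
    human_score: int  # ground-truth label from your own sample data

# Hypothetical labeled samples for a "summary faithfulness" metric.
CASES = [
    TestCase("Summarize the ticket.",
             "User cannot log in after a password reset.", 5),
    TestCase("Summarize the ticket.",
             "Something about passwords, maybe.", 2),
]

def judge_score(eval_prompt: str, case: TestCase) -> int:
    """Stand-in for a judge-model call; wire in a real evaluator here."""
    # Toy heuristic so the sketch runs end to end: longer, more specific
    # responses score higher. Replace with an LLM-as-a-Judge call.
    return 5 if len(case.response.split()) > 5 else 2

def agreement(eval_prompt: str) -> float:
    """Fraction of test cases where the judge matches the human label."""
    hits = sum(judge_score(eval_prompt, c) == c.human_score for c in CASES)
    return hits / len(CASES)

candidate_prompt = "Rate the response's faithfulness to the ticket, 1-5."
print(f"agreement: {agreement(candidate_prompt):.0%}")  # refine prompt if low
```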
Stripo.email
Reliable AI evaluation is a huge challenge, and Selene 1 looks like a major step forward in making AI performance more measurable and scalable.
Atla
@marianna_tymchuk Thanks for the support, Marianna! Exactly. And if you can't measure it, you can't improve it (stealing that from our CTO).
Wow, very interesting. Will try it out!