Selene by Atla
Frontier models to evaluate generative AI
136 followers
Find and fix AI mistakes at scale, and build more reliable GenAI applications. Use our LLM-as-a-Judge to test and evaluate prompts and model versions.
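LLM-as-a-Judge here means prompting one model to grade another model's output against a rubric. A minimal sketch of that pattern, using the OpenAI SDK as a stand-in judge; the rubric and the judge() helper below are illustrative assumptions, not Atla's API:

```python
# LLM-as-a-Judge sketch: ask a judge model to score a response against a
# rubric. The prompt and helper are illustrative, not Atla's actual SDK.
import os
from openai import OpenAI  # assumption: OpenAI SDK as a stand-in judge backend

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

JUDGE_PROMPT = """You are an impartial evaluator.
Score the RESPONSE to the INPUT on a 1-5 scale for factual accuracy.
Reply with a one-sentence critique, then the score on its own line.

INPUT: {input}
RESPONSE: {response}"""

def judge(user_input: str, model_response: str) -> str:
    """Send input + response + rubric to the judge model, return its verdict."""
    result = client.chat.completions.create(
        model="gpt-4o",  # swap in a dedicated evaluator model such as Selene
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(input=user_input,
                                                  response=model_response)}],
        temperature=0,  # keep scoring as deterministic as possible
    )
    return result.choices[0].message.content

print(judge("What is the capital of France?",
            "The capital of France is Lyon."))
```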

Surgeflow
🚀 Congrats on launching Selene 1 on Product Hunt! 🎉 This looks like a game-changer for anyone building with AI—finally, an evaluation tool that’s both accurate and scalable.
I love how Selene 1 tackles the inconsistency of general-purpose LLMs as evaluators. The fact that it outperforms models like GPT-4o and Claude 3.5 Sonnet across multiple benchmarks is super impressive! 👏
One question: Since evaluations can be very domain-specific, have you considered allowing users to fine-tune Selene 1 itself for their niche use cases? That could add another layer of customization for teams working in highly specialized fields.
Excited to see how this evolves! 🚀
Atla
@rocsheh Hi Zepeng! Great question. Selene was trained to be a 'general-purpose evaluator', so it's capable of handling a wide variety of eval tasks across domains. To help steer Selene, we released our Alignment Platform, where users can create custom eval metrics that match their niche use case. The tool helps users generate an eval prompt, test it against test cases (which can be their own sample data), and refine the prompt until it's fit for use. Would be interested to hear your feedback if you try it out!
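The refine-until-fit loop described in that reply can be pictured as scoring your own labeled samples with a candidate eval prompt and checking agreement against the human labels. A minimal sketch under that assumption; TestCase, judge_score, and agreement are hypothetical names, not the Alignment Platform's API:

```python
# Sketch of refining an eval prompt against labeled test cases: measure how
# often the judge agrees with human scores before trusting the metric.
from dataclasses import dataclass

@dataclass
class TestCase:
    user_input: str
    response: str
    human_score: int  # ground-truth label from your own sample data

# Hypothetical labeled samples for a "summary faithfulness" metric.
CASES = [
    TestCase("Summarize the ticket.",
             "User cannot log in after a password reset.", 5),
    TestCase("Summarize the ticket.",
             "Something about passwords, maybe.", 2),
]

def judge_score(eval_prompt: str, case: TestCase) -> int:
    """Stand-in for a judge-model call; wire in a real evaluator here."""
    # Toy heuristic so the sketch runs end to end: longer, more specific
    # responses score higher. Replace with an LLM-as-a-Judge call.
    return 5 if len(case.response.split()) > 5 else 2

def agreement(eval_prompt: str) -> float:
    """Fraction of test cases where the judge matches the human label."""
    hits = sum(judge_score(eval_prompt, c) == c.human_score for c in CASES)
    return hits / len(CASES)

candidate_prompt = "Rate the response's faithfulness to the ticket, 1-5."
print(f"agreement: {agreement(candidate_prompt):.0%}")  # refine prompt if low
```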
Stripo.email
Reliable AI evaluation is a huge challenge, and Selene 1 looks like a major step forward in making AI performance more measurable and scalable.
Atla
@marianna_tymchuk Thanks for the support, Marianna! Exactly. And if you can't measure it, you can't improve it (stealing that from our CTO).
Wow, very interesting. Will try it out!