Evidently AI - Open-source evaluations and observability for LLM apps

by Michael Seibel
Evidently is an open-source framework to evaluate, test and monitor AI-powered apps.

📚 100+ built-in checks, from classification to RAG.
🚦 Both offline evals and live monitoring.
🛠 Easily add custom metrics and LLM judges.


Replies

Devin doggy
Elena, this is an impressive step for Evidently! Expanding into LLM evaluation is so needed in today's landscape. With the variety of built-in checks and the flexibility to add custom metrics, it really simplifies a complex area. The challenges you outlined are so relatable—having a structured approach will be a game-changer for many developers. Excited to see how the community will contribute to this project! Keep pushing those boundaries! 🎉
Elena Samuylova
@devindsgbyq Thank you! Let us know if you have the chance to check it out!
Arseny Kravchenko
Wish you luck! AI reliability is still a massive problem
Elena Samuylova
@arseny_info Thanks for the support! Hopefully we can contribute to solving it. Looking forward to the feedback from the community 🚀
Mikhail Rozhkov
Congrats on the launch! Great milestone @elenasamuylova and @emeli_dral! Evidently is part of my MLOps stack and I recommend it to my friends and clients! I'm happy to contribute to Evidently and look forward to collaboration!
Elena Samuylova
@emeli_dral @mikhail_rozhkov Thanks for your support! I hope Evidently will fit in the updated LLMOps stack as well 🚀🚀
Alexey Dral
+1 amazing team, +1 amazing product. In addition: friendly open-source support (easy to add suggestions and see them in the next release)
Elena Samuylova
@aadral Thanks for your support! Looking forward to LLM-related feature requests 🚀
Alex
Congrats on the launch, @elenasamuylova! 🎉 It's amazing to see Evidently evolving into the realm of LLMs with such robust features. The focus on a quality workflow is crucial for us as we develop AI-powered applications. I love the idea of easily integrating custom metrics and having that interactive summary for evaluations. Looking forward to exploring the new capabilities and contributing to the community! Keep up the great work!
Elena Samuylova
@zanereed596 Thank you! 🚀
Vasili Shynkarenka
congrats on the launch! love the video :)
Elena Samuylova
@flreln Open-source production! :)
bunga
Could you explain how LLM judge templates empower users to define custom evaluation criteria and create tailored prompts?
Elena Samuylova
@bunga_trisnulia We make it easy for the user to focus only on the evaluation's contents (for example, write that "I want to label responses as concise or verbose") without thinking about how to write the rest of the evaluation prompt. We automatically add all the other parts, such as formatting prompts as JSON to get structured output, asking the LLM to provide the reasoning before outputting the label, etc. Basically, we help the users define only what's strictly necessary and do all the boilerplate in the background.
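The idea Elena describes can be sketched in a few lines. This is a hypothetical helper, not the actual Evidently API: the user supplies only the criteria and labels, and the template adds the boilerplate (allowed categories, reasoning-before-label instruction, structured JSON output format).

```python
def build_judge_prompt(criteria: str, categories: list[str]) -> str:
    """Wrap a user-written evaluation criterion with standard judge boilerplate.

    Illustrative sketch only: the user writes `criteria`; everything else
    (category constraint, reasoning-first instruction, JSON output format)
    is added automatically by the template.
    """
    category_list = " or ".join(categories)
    return (
        "You are an impartial evaluator.\n"
        f"Evaluation criteria: {criteria}\n"
        f"Classify the text as {category_list}.\n"
        "First explain your reasoning, then give the final label.\n"
        'Respond ONLY with JSON: {"reasoning": "...", "label": "..."}'
    )

# The user writes only the first argument; the rest is boilerplate.
prompt = build_judge_prompt(
    criteria="I want to label responses as concise or verbose.",
    categories=["CONCISE", "VERBOSE"],
)
print(prompt)
```

The full prompt sent to the judge LLM then contains the category constraint and the JSON schema, even though the user never wrote them.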
Eugene Ter-Avakyan
Congrats on the launch, Elena and Emeli! It's nice to see it released in open source!
Elena Samuylova
@eugene_ter_avakyan Thank you! We are looking forward to the community input 🚀
Bryan
Congrats on this launch, Elena! The transition from traditional ML to LLMs is a game changer. The ability to customize metrics and have a monitoring dashboard will definitely help many makers in evaluating their AI apps. Can't wait to see how the community uses Evidently! 🚀
Elena Samuylova
@dance17219 Thank you! 🚀
Ema Elisi
Hey @elenasamuylova! Exciting stuff with Evidently stepping into the LLM space! The challenges you've outlined around evaluating generative AI are real. Love the
Elena Samuylova
@ema_elisi Thanks for the support!