Evidently AI - Open-source evaluations and observability for LLM apps

by Michael Seibel
Evidently is an open-source framework to evaluate, test and monitor AI-powered apps.

📚 100+ built-in checks, from classification to RAG.
🚦 Both offline evals and live monitoring.
🛠 Easily add custom metrics and LLM judges.


Replies

Hamza Tahir
Amazing team + product. Been using Evidently for years now and can confidently say it's one of the best on the market!
Elena Samuylova
@hamza_tahir Thanks for the support ❤️
Star Boat
Excited to see Evidently AI launch! This could truly change the way we monitor AI-powered apps. Looking forward to trying those powerful features. Thanks for your hard work, @elenasamuylova!
Elena Samuylova
@star_boat Thanks for the support! Let us know how it works for you!
Tony Han
Love that there are toolkits like Evidently AI to cover some of the most important AIOps use cases, like monitoring model hallucinations. I'm new to building products with AI, so I'm curious whether there are learning resources for someone like me on topics like how to test AI-generated results. Or does the tool itself suggest which methods to use? Also, big props for making the Evidently platform open source - you have my support for making this available to the world! Congrats on the launch @elenasamuylova and team!
Elena Samuylova
@tonyhanded Hi Tony! Thanks a lot for the support. We are huge believers in open source, too! We are working on quite a lot of content on this topic for our blog (and we'll soon add more!) - you may find it useful. For example, this recent blog post on regression testing for LLM outputs: https://www.evidentlyai.com/blog... More to come soon! One popular approach we implemented is using an LLM as a judge, where you effectively use another LLM / a different prompt to label your outputs against certain criteria (we recommend using a binary True/False scale here). This is one of the approaches we shipped with this release!
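The LLM-as-judge idea described above can be sketched generically in Python. This is not Evidently's actual API; `JUDGE_PROMPT`, `call_llm`, and the criterion are illustrative placeholders standing in for whatever judge model and rubric you use:

```python
# Generic sketch of "LLM as a judge": a second LLM labels each output
# against a binary criterion. `call_llm` is a stand-in for any chat API call.
JUDGE_PROMPT = (
    "You are evaluating a chatbot answer.\n"
    "Criterion: the answer must not refuse to help.\n"
    "Answer to evaluate:\n{answer}\n"
    "Reply with exactly one word: True if the criterion is met, False otherwise."
)

def parse_verdict(raw: str) -> bool:
    """Map the judge's free-text reply onto a binary True/False label."""
    token = raw.strip().split()[0].rstrip(".,").lower()
    if token not in ("true", "false"):
        raise ValueError(f"Unexpected judge reply: {raw!r}")
    return token == "true"

def judge(answer: str, call_llm) -> bool:
    """Ask the judge model to label one answer."""
    return parse_verdict(call_llm(JUDGE_PROMPT.format(answer=answer)))

# Example with a stubbed model call:
verdict = judge("Sure, here is how you reset your password...",
                call_llm=lambda prompt: "True")
print(verdict)  # True
```

The binary True/False scale recommended above keeps parsing trivial and the labels unambiguous, which is why the parser rejects anything that is not clearly one of the two values.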
Tony Han
@elenasamuylova very interesting thanks for sharing this!
Lemuel Albotra
Hi Elena and the Evidently AI team! 🚀 It’s fantastic to see Evidently AI expanding into the LLM space. The ability to evaluate and monitor LLM performance is crucial, especially as these models become more integral to various applications. Can you provide an example or case study demonstrating how Evidently AI has successfully helped a project manage LLM evaluation and quality assurance, particularly highlighting the impact on model performance or user experience? Looking forward to seeing how Evidently AI evolves in this new domain and thanks for making such valuable tools open-source! 🛠️📊
Elena Samuylova
@lemuel_albotra1 Thanks for the support! In the LLM space, we see a lot of users who are still in the pre-production phase (they are iterating with beta users or testing the tool before rolling it out). Here, the main value of bringing in automated evals is the speed of iteration; otherwise, there is a lot of manual log analysis to find, e.g., inadequate responses and to figure out the failure modes. Then, once you identify those and want to fix them (e.g., by changing a prompt in a certain way), you can run a regression test efficiently to see whether the responses of your LLM change, and where. So it's a lot about making the iteration process smoother - and, of course, actually improving the quality of the product measurably, not just "based on vibes."
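The regression-test workflow described above can be sketched in a few lines of plain Python. This is a generic illustration, not Evidently's API; `run_model`, the test questions, and the exact-match comparison are simplifying assumptions (a real check might use similarity metrics or an LLM judge instead of string equality):

```python
# Minimal sketch of a regression test for LLM outputs: rerun a fixed set
# of test inputs against the changed prompt/model and flag responses that
# differ from a stored baseline, so you can review exactly what moved.
def regression_diff(test_inputs, run_model, baseline):
    """Return {input: (old_response, new_response)} for changed outputs."""
    changed = {}
    for q in test_inputs:
        new = run_model(q)
        if new != baseline.get(q):
            changed[q] = (baseline.get(q), new)
    return changed

# Stubbed example: the "new model" answers one question differently.
baseline = {"What is 2+2?": "4", "Capital of France?": "Paris"}
new_model = {"What is 2+2?": "4", "Capital of France?": "Paris, France"}.get
diffs = regression_diff(baseline.keys(), new_model, baseline)
print(diffs)  # {'Capital of France?': ('Paris', 'Paris, France')}
```

The payoff is exactly the iteration speed mentioned above: instead of rereading every log after a prompt change, you review only the inputs whose responses actually changed.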
Ilia Semenov
@elenasamuylova, @emeli_dral Way to go! The space needs more OSS!
Elena Samuylova
@emeli_dral @iliasemenov Thanks for your support! 🎉
Suman Ray
Nice system.
Elena Samuylova
@suman_ray4 Thanks!
Ivan Shcheklein
Congrats @elenasamuylova and @emeli_dral. It's an amazing product. I've been recommending your course to our users and customers (https://www.evidentlyai.com/ml-o...) - it's one of the best, I think. Exciting progress on the LLM side.
Elena Samuylova
@emeli_dral @shcheklein Thanks for your support! 🚀
Daniel W. Chen
This solves so many of my current pain points working with LLMs. I'm developing AI mentors and therapists and I need a better way to run evals for each update and prompt optimization. Upvoting, bookmarking, and going to try this out! Thank you Elena!
Elena Samuylova
@danielwchen Thank you! Let us know how it works for you. We see a lot of usage with healthcare-related apps; these are the use cases where quality is paramount - you can't just ship on vibes!
Ivelin Kozarev
love that this is open-source, planning to explore it more
Elena Samuylova
@ivelin_kozarev Thank you! Let us know your feedback as you try it!
Jenifer De Figueiredo
Congrats Elena and team! Taking evaluation to the next level with LLMs! 🔥
Elena Samuylova
@jenifer_de_figueiredo Thanks for the support, Jenifer! 🚀🚀🚀