Evidently AI - Open-source evaluations and observability for LLM apps

by Michael Seibel
Evidently is an open-source framework to evaluate, test and monitor AI-powered apps.

📚 100+ built-in checks, from classification to RAG.
🚦 Both offline evals and live monitoring.
🛠 Easily add custom metrics and LLM judges.


Replies

Hamza Tahir
Amazing team + product. Been using Evidently for years now and can confidently say it's one of the best on the market!
Elena Samuylova
@hamza_tahir Thanks for the support ❤️
Star Boat
Excited to see Evidently AI launch! This could truly change the way we monitor AI-powered apps. Looking forward to trying those powerful features. Thanks for your hard work, @elenasamuylova!
Elena Samuylova
@star_boat Thanks for the support! Let us know how it works for you!
Tony Han
Love that there are toolkits like Evidently AI to cover some of the most important AIOps use cases, like monitoring model hallucinations. I'm new to building products with AI, so I'm curious whether there are learning resources for someone like me on topics like how to test AI-generated results. Or does the tool itself suggest which methods to use? Also, big props for making the Evidently platform open source - you have my support for making this available to the world! Congrats on the launch @elenasamuylova and team!
Elena Samuylova
@tonyhanded Hi Tony! Thanks a lot for the support. We are huge believers in open source, too! We are working on quite a lot of content on this topic for our blog (and we'll soon add more!) - you may find it useful. For example, this recent blog post on regression testing for LLM outputs: https://www.evidentlyai.com/blog... More to come soon! One popular approach we implemented is using an LLM as a judge, where you effectively use another LLM / a different prompt to label your outputs against certain criteria (we recommend using a binary True/False scale here). This is one of the approaches we shipped with this release!
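The LLM-as-judge idea described above can be sketched generically in Python. This is not Evidently's actual API; `JUDGE_PROMPT`, `call_llm`, and the criterion are illustrative placeholders standing in for whatever judge model and rubric you use:

```python
# Generic sketch of "LLM as a judge": a second LLM labels each output
# against a binary criterion. `call_llm` is a stand-in for any chat API call.
JUDGE_PROMPT = (
    "You are evaluating a chatbot answer.\n"
    "Criterion: the answer must not refuse to help.\n"
    "Answer to evaluate:\n{answer}\n"
    "Reply with exactly one word: True if the criterion is met, False otherwise."
)

def parse_verdict(raw: str) -> bool:
    """Map the judge's free-text reply onto a binary True/False label."""
    token = raw.strip().split()[0].rstrip(".,").lower()
    if token not in ("true", "false"):
        raise ValueError(f"Unexpected judge reply: {raw!r}")
    return token == "true"

def judge(answer: str, call_llm) -> bool:
    """Ask the judge model to label one answer."""
    return parse_verdict(call_llm(JUDGE_PROMPT.format(answer=answer)))

# Example with a stubbed model call:
verdict = judge("Sure, here is how you reset your password...",
                call_llm=lambda prompt: "True")
print(verdict)  # True
```

The binary True/False scale recommended above keeps parsing trivial and the labels unambiguous, which is why the parser rejects anything that is not clearly one of the two values.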
Tony Han
@elenasamuylova very interesting thanks for sharing this!
Lemuel Albotra
Hi Elena and the Evidently AI team! 🚀 It’s fantastic to see Evidently AI expanding into the LLM space. The ability to evaluate and monitor LLM performance is crucial, especially as these models become more integral to various applications. Can you provide an example or case study demonstrating how Evidently AI has successfully helped a project manage LLM evaluation and quality assurance, particularly highlighting the impact on model performance or user experience? Looking forward to seeing how Evidently AI evolves in this new domain and thanks for making such valuable tools open-source! 🛠️📊
Elena Samuylova
@lemuel_albotra1 Thanks for the support! In the LLM space, we see a lot of users who are still in the pre-production phase (they are iterating with beta users or testing the tool before rolling it out). Here, the main value of bringing in automated evals is the speed of iteration; otherwise, there is a lot of manual log analysis to find, e.g., inadequate responses and to figure out the failure modes. Then, once you identify those and want to fix them (e.g., by changing a prompt in a certain way), you can run a regression test efficiently to see whether the responses of your LLM change, and where. So it's a lot about making the iteration process smoother - and, of course, actually improving the quality of the product measurably, not just "based on vibes."
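The regression-test workflow described above can be sketched in a few lines of plain Python. This is a generic illustration, not Evidently's API; `run_model`, the test questions, and the exact-match comparison are simplifying assumptions (a real check might use similarity metrics or an LLM judge instead of string equality):

```python
# Minimal sketch of a regression test for LLM outputs: rerun a fixed set
# of test inputs against the changed prompt/model and flag responses that
# differ from a stored baseline, so you can review exactly what moved.
def regression_diff(test_inputs, run_model, baseline):
    """Return {input: (old_response, new_response)} for changed outputs."""
    changed = {}
    for q in test_inputs:
        new = run_model(q)
        if new != baseline.get(q):
            changed[q] = (baseline.get(q), new)
    return changed

# Stubbed example: the "new model" answers one question differently.
baseline = {"What is 2+2?": "4", "Capital of France?": "Paris"}
new_model = {"What is 2+2?": "4", "Capital of France?": "Paris, France"}.get
diffs = regression_diff(baseline.keys(), new_model, baseline)
print(diffs)  # {'Capital of France?': ('Paris', 'Paris, France')}
```

The payoff is exactly the iteration speed mentioned above: instead of rereading every log after a prompt change, you review only the inputs whose responses actually changed.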
Ilia Semenov
@elenasamuylova, @emeli_dral Way to go! The space needs more OSS!
Elena Samuylova
@emeli_dral @iliasemenov Thanks for your support! 🎉
Suman Ray
Nice system.
Elena Samuylova
@suman_ray4 Thanks!
Ivan Shcheklein
Congrats @elenasamuylova and @emeli_dral. It's an amazing product. I've been recommending your course to our users and customers (https://www.evidentlyai.com/ml-o...) - it's one of the best, I think. Exciting progress on the LLM side.
Elena Samuylova
@emeli_dral @shcheklein Thanks for your support! 🚀
Daniel W. Chen
This solves so many of my current pain points working with LLMs. I'm developing AI mentors and therapists and I need a better way to run evals for each update and prompt optimization. Upvoting, bookmarking, and going to try this out! Thank you Elena!
Elena Samuylova
@danielwchen Thank you! Let us know how it works for you. We see a lot of usage with healthcare-related apps; these are the use cases where quality is paramount - you can't just ship on vibes!
Ivelin Kozarev
love that this is open-source, planning to explore it more
Elena Samuylova
@ivelin_kozarev Thank you! Let us know your feedback as you try it!
Jenifer De Figueiredo
Congrats Elena and team! Taking evaluation to the next level with LLMs! 🔥
Elena Samuylova
@jenifer_de_figueiredo Thanks for the support, Jenifer! 🚀🚀🚀