Excited to see Evidently AI launch! This could truly change the way we monitor AI-powered apps. Looking forward to trying out those powerful features. Thanks for your hard work, @elenasamuylova!
@star_boat Thanks for the support! Let us know how it works for you!
Love that these toolkits are part of Evidently AI and cover some of the most important AIOps use cases, like monitoring model hallucinations. I'm new to building products with AI, so I'm curious whether there are learning resources for someone like me to learn more about topics like how to test AI-generated results. Or does the AI have some suggestions on what methods to use?
Also, big props for making the Evidently platform open source - you have my support for making this available to the world!
Congrats on the launch @elenasamuylova and team!
@tonyhanded Hi Tony! Thanks a lot for the support. We are huge believers in open source, too!
We are working on quite a lot of content on this topic on our blog (and we'll soon add more!) - you may find it useful. For example, this recent post on regression testing for LLM outputs: https://www.evidentlyai.com/blog... More to come soon!
One popular approach we implemented is using an LLM as a judge, where you effectively use another LLM (with a different prompt) to label your outputs against certain criteria (we recommend using a binary True/False scale here). This is one of the approaches we shipped with this release!
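For illustration, here is a minimal sketch of that LLM-as-a-judge idea using a generic OpenAI-style chat call - the judge prompt, criterion, and model name are placeholders, not our built-in implementation:

```python
# Minimal LLM-as-a-judge sketch: a second LLM call labels each output
# against a binary criterion. Prompt, criterion, and model are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are evaluating a chatbot answer.
Criterion: the answer must not decline to help the user.
Reply with a single word: TRUE if the answer meets the criterion, FALSE otherwise.

Answer to evaluate:
{answer}
"""

def judge(answer: str) -> bool:
    """Label one output with a binary True/False verdict from a judge LLM."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(answer=answer)}],
        temperature=0,  # keep the judge deterministic
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("TRUE")

# Label a small batch of logged outputs
outputs = [
    "Sure, here is how you reset your password...",
    "Sorry, I can't help with that.",
]
print([judge(o) for o in outputs])  # e.g. [True, False]
```

The binary scale is deliberate: asking the judge for a simple TRUE/FALSE verdict is much more reliable than asking it to produce a graded score.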
Hi Elena and the Evidently AI team!
It's fantastic to see Evidently AI expanding into the LLM space. The ability to evaluate and monitor LLM performance is crucial, especially as these models become more integral to various applications.
Can you provide an example or case study demonstrating how Evidently AI has successfully helped a project manage LLM evaluation and quality assurance, particularly highlighting the impact on model performance or user experience?
Looking forward to seeing how Evidently AI evolves in this new domain, and thanks for making such valuable tools open-source!
@lemuel_albotra1 Thanks for the support! In the LLM space, we see a lot of users who are still in the pre-production phase (they are iterating with beta users or testing the tool before rolling it out).
Here, the main value of bringing in automated evals is the speed of iteration; otherwise, there is a lot of manual log analysis to find, e.g., inadequate responses and figure out the failure modes.
Then, once you identify those and want to fix them (e.g., by changing a prompt in a certain way), you can efficiently run a regression test to see if your LLM's responses change and where. So it's a lot about making the iteration process smoother - and, of course, actually improving the quality of the product measurably, not just "based on vibes."
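To make that concrete, here is a rough sketch of such a regression test - it assumes a fixed list of evaluation inputs, `generate_v1`/`generate_v2` placeholders for your app with the old and new prompt, and a `judge()` labeling function like the one sketched above; it is not our actual API:

```python
# Regression-test sketch: run the same evaluation inputs through the old and
# new prompt versions, label both with a judge, and surface regressions.
import pandas as pd

eval_inputs = [
    "How do I reset my password?",
    "Cancel my subscription.",
    "What are your support hours?",
]

def run_regression(generate_v1, generate_v2, judge):
    rows = []
    for question in eval_inputs:
        old_answer = generate_v1(question)
        new_answer = generate_v2(question)
        rows.append({
            "input": question,
            "old_answer": old_answer,
            "new_answer": new_answer,
            "old_ok": judge(old_answer),
            "new_ok": judge(new_answer),
            "changed": old_answer != new_answer,  # did the prompt change the output?
        })
    report = pd.DataFrame(rows)
    # Regressions: cases that passed before the prompt change but fail after it
    regressions = report[report["old_ok"] & ~report["new_ok"]]
    return report, regressions
```

Reviewing the `changed` and `regressions` views side by side is what replaces the manual log-reading step after each prompt tweak.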
This solves so many of my current pain points working with LLMs. I'm developing AI mentors and therapists and I need a better way to run evals for each update and prompt optimization. Upvoting, bookmarking, and going to try this out!
Thank you Elena!
@danielwchen Thank you! Let us know how it works for you. We see a lot of usage with healthcare-related apps; these are the use cases where quality is paramount - you can't just ship on vibes!
love that this is open-source, planning to explore it more