LLM testing made easy with a spreadsheet-like interface. Score tests with natural language, pattern matching, or code. Optimize LLM apps by experimenting with models, parameters, and prompts. Gain insights from test results and analytics.
This approach could really accelerate AI deployment and feedback cycles. Excited to see how Langtail handles integration with various AI models and if there are tools for automating complex test scenarios!
@evelyn_kumsah Thanks! Take a look, and let me know if you have any questions.
I have used quite a few options out there, and this is probably the nicest UI I've seen.
Now, it does lack (maybe I didn't see it) a couple of things I'd like to see:
- Can I do bulk updates from API?
- How would I unify version control with the prompts in my system?
- Prompts in actual systems are usually composed (i.e. dynamically created) so the evaluation should ideally pick up from that moment on.
- More preset evaluations, particularly for RAG evals
Really great work @petrbrzek, happy to chat if you want to brainstorm some options!
Thanks for the kind words Jorge! I'm definitely happy to chat. Now to your questions:
Yeah, we don't currently support bulk updates via the API, but we're flexible and can prioritize the features our users actually need.
In terms of version control, it depends on how you currently store your prompts. You can store them in Langtail if you want, and there's version history built in. That's usually a good fit if non-developers need to view and change the prompts too. On the other hand, if you store them in your code, there's currently no good way to automatically keep the two in sync. I'm definitely open to ideas on how to do this nicely.
Regarding dynamically created prompts:
- There's a concept of variables that lets you send dynamic data into a prompt
- It's also possible to use function calling with a function handler to fetch external data, which could be useful for testing
- We definitely have ideas for dynamic variables, so you could even use JavaScript functions to compute data for the prompt
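To make the variables idea concrete, here's a minimal sketch of how a prompt template with placeholders could be filled with dynamic data at evaluation time. The `{{name}}` syntax and the `renderPrompt` helper are illustrative assumptions, not Langtail's actual API:

```typescript
// Hedged sketch: fill {{placeholders}} in a prompt template with
// dynamic data before sending it for evaluation. Template syntax
// and helper names are assumptions for illustration only.

type Variables = Record<string, string>;

function renderPrompt(template: string, variables: Variables): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name) => {
    if (!(name in variables)) {
      throw new Error(`Missing variable: ${name}`);
    }
    return variables[name];
  });
}

// A "dynamic variable" could be backed by a function instead of a literal,
// e.g. computing today's date at render time:
const dynamicVars = { today: () => new Date().toISOString().slice(0, 10) };

const prompt = renderPrompt(
  "Summarize open support tickets for {{customer}} as of {{date}}.",
  { customer: "Acme Corp", date: dynamicVars.today() }
);
```

The same render step could run right before an eval, so tests pick up the prompt exactly as it would be composed in production.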
But yeah, as I said, I'm super happy to chat about it - we want to build the best possible testing experience!
Big congratulations on the 1.0 launch! 🎉 Testing prompts and AI in general can be pretty cumbersome, and it looks like you've found a really good take on it. The testing feature with multiple models and assertions seems useful and very well thought out 🚀
The proxy feature for gradually adopting existing codebases is also a great idea!
It'd be awesome to craft responseFormat directly in the Langtail interface (specifically in zod or another schema library). For the time being (or maybe I just missed the feature?), it's great that we can invoke the deployed prompt with the response format option or use the proxy as a workaround.
Thanks @susickypavel. Good idea with the responseFormat. We could definitely support it when you're only interested in text output. It gets a bit tricky with function calling. Overall, Vercel AI SDK probably has the best DX (developer experience).
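As a sketch of the workaround discussed above: a structured responseFormat can be expressed as an OpenAI-style `json_schema` object, which is the kind of thing a zod-to-JSON-Schema bridge would generate automatically. The `buildResponseFormat` helper and field spec below are illustrative assumptions, not part of Langtail or the Vercel AI SDK:

```typescript
// Hedged sketch: hand-building an OpenAI-style response_format object.
// The shape follows OpenAI's structured-outputs convention; the helper
// is an assumption for illustration, not a real Langtail API.

interface FieldSpec {
  type: "string" | "number" | "boolean";
  description?: string;
}

function buildResponseFormat(name: string, fields: Record<string, FieldSpec>) {
  return {
    type: "json_schema" as const,
    json_schema: {
      name,
      strict: true,
      schema: {
        type: "object",
        properties: fields,
        required: Object.keys(fields),
        additionalProperties: false,
      },
    },
  };
}

const responseFormat = buildResponseFormat("ticket_summary", {
  title: { type: "string", description: "One-line summary of the ticket" },
  urgent: { type: "boolean" },
});
```

An in-interface schema editor could produce exactly this object behind the scenes, so the deployed prompt and the proxy would both consume the same structure.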
@raghavendra_devadiga4 Yeah, from our experience, self-hosting is a must-have. Bigger companies need it to keep their data secure and comply with regulations like GDPR and SOC2. It's just the safest option.
Great to see Langtail evolving into a full-fledged testing suite for AI apps! The spreadsheet-style interface is exactly what I've been looking for - been struggling with messy prompt iterations in my recent projects. Love that you've added hosted tools and that AI firewall feature (seriously, prompt injection has been keeping me up at night 😅). The self-hosting option is a huge plus for enterprise teams who need to keep everything in-house. Feels like you guys really listened to the community pain points and delivered. Definitely giving this a spin on my next LLM project! 👍
@_ivan1 Woohoo! Glad to hear that. When you start your next project, hit us up in the chat and we'd be happy to walk you through the setup process and show you around.