Promptly - An AI Cost Optimization Infrastructure for LLM Applications

by Yatharth Nehra

Promptly is an OpenAI-compatible proxy that cuts your LLM spend by up to 60% with smart routing, prompt optimization, semantic caching, and context pruning. Works with OpenAI, Anthropic, and Google.
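As a rough sketch of what "OpenAI-compatible" typically means in practice, an existing OpenAI client can be repointed at the proxy by swapping the base URL. The endpoint below is a hypothetical placeholder, not Promptly's real one, and this assumes the official `openai` Python SDK:

```python
from openai import OpenAI

# Hypothetical proxy endpoint -- substitute Promptly's real base URL.
client = OpenAI(
    base_url="https://api.promptly.example/v1",  # placeholder, not a real endpoint
    api_key="YOUR_KEY",
)

# Requests keep the standard OpenAI request/response shape; the proxy
# applies its optimizations upstream before forwarding to the provider.
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Summarize this document."}],
# )
```

Because the wire format is unchanged, no application code beyond the client configuration needs to move.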

Replies
Yatharth Nehra
Promptly started from a problem we kept seeing while building AI applications. LLMs are incredibly powerful, but once you start using them in production, the costs and inefficiencies add up quickly. Long prompts, repeated context, unnecessary tokens, and a lack of caching can make AI workflows much more expensive than they need to be.

We realized that most teams were solving the same problems over and over: building custom caching layers, trimming prompts manually, or writing complex infrastructure just to control costs and performance. That's what inspired Promptly. We wanted to create a simple optimization layer between your app and the model, something developers could adopt instantly without changing their existing workflows.

So Promptly works as an OpenAI-compatible proxy or SDK. You point your app at Promptly, and it automatically handles prompt optimization, context pruning, semantic caching, and smart routing.

During development, the biggest evolution was realizing that developers want minimal friction. Early versions had more configuration, but we simplified it significantly, making it a drop-in integration: change the base URL or use the SDK and everything works.

Our goal is simple: help developers run AI systems more efficiently without building a lot of infrastructure.
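To make the caching idea concrete, here is a minimal, self-contained sketch of a response cache in that spirit. It is not Promptly's implementation: it matches on an exact normalized prompt rather than true semantic (embedding-based) similarity, and `fake_model` stands in for a real LLM call so the example runs without an API key.

```python
import hashlib

class PromptCache:
    """Toy response cache keyed by a normalized prompt.

    Real semantic caching matches on embedding similarity; this sketch
    only deduplicates prompts that normalize to the same string.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize whitespace and case so trivially reworded prompts collide.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_call(self, prompt: str, call_model):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]          # served from cache, no model call
        self.misses += 1
        response = call_model(prompt)        # stand-in for the upstream LLM call
        self._store[key] = response
        return response

# Stand-in "model" so the sketch is runnable without any API key.
def fake_model(prompt: str) -> str:
    return f"answer to: {prompt.strip()}"

cache = PromptCache()
cache.get_or_call("What is RAG?", fake_model)      # miss -> calls the model
cache.get_or_call("  what is RAG? ", fake_model)   # hit  -> served from cache
print(cache.hits, cache.misses)                    # -> 1 1
```

Every cache hit is one upstream call (and its tokens) saved, which is where this class of savings comes from; a production version would also need eviction and a similarity threshold for near-duplicate prompts.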
Akshat Gupta

Really excited to finally share Promptly 🚀

If you’re working with LLMs, you’ve probably seen how quickly costs can scale in production.

Promptly helps optimize requests, reduce unnecessary tokens, and make AI systems more efficient - without changing your existing setup.

Would love to hear your thoughts and feedback 🙌

Shubham Goel

What started as a constant frustration while building AI has now turned into something we truly believe in.

The future of AI isn’t just about powerful models; it’s about making them efficient, scalable, and accessible.

Really excited to finally share Promptly 🚀