Hey, I'm Sacha, co-founder at @Edgee
Over the last few months, we've been working on a problem we kept seeing in production AI systems:
LLM costs don't scale linearly with usage; they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.
So we built a token compression layer designed to run before inference.
Hey Product Hunt 👋
I’m Sacha, co-founder of Edgee. Thanks for checking us out!
We built Edgee because we kept seeing the same thing everywhere:
AI costs are going crazy!
LLMs are easy to try, but once you ship them in production, costs explode and reliability becomes a mess.
Most teams start with direct calls to OpenAI or Anthropic, or simply use a coding assistant… then quickly end up dealing with:
Unpredictable token spend
Multiple provider APIs
Outages / rate limits
Security & privacy constraints
And no real observability across teams
Edgee is an AI Gateway built to reduce LLM costs and simplify production inference.
It gives you a single OpenAI-compatible API across providers, plus a layer of intelligence around inference:
✅ Token compression to remove redundant tokens and cut costs, with no semantic loss
✅ Routing & fallbacks across providers
✅ Observability + cost tracking you can trust
✅ Privacy & security controls (ZDR, BYOK...)
✅ Support for public + private models
✅ Edge Tools 🚀
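Since the gateway speaks the OpenAI API shape, adopting it is mostly a matter of pointing your client at a different base URL. A minimal sketch of what such a request looks like; the base URL, key, and `metadata` field name here are illustrative assumptions, not Edgee's documented values:

```python
import json

# Hypothetical gateway endpoint -- substitute the real base URL and API key
# from your own gateway dashboard.
GATEWAY_BASE_URL = "https://gateway.example.com/v1"

def build_chat_request(model, messages, tags=None):
    """Build an OpenAI-style chat completion request routed through a gateway.

    `tags` illustrates per-request metadata for cost attribution; the exact
    field name is an assumption for this sketch, not a documented schema.
    """
    payload = {"model": model, "messages": messages}
    if tags:
        payload["metadata"] = tags  # hypothetical field for usage/cost tracking
    return {
        "url": f"{GATEWAY_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": "Bearer YOUR_GATEWAY_KEY",
            "Content-Type": "application/json",
        },
        "body": json.dumps(payload),
    }

req = build_chat_request(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
    tags={"team": "support", "feature": "summarizer"},
)
```

The point of the sketch: because the request body is unchanged from a direct provider call, the compression, routing, and tracking all happen behind the same endpoint, with no application-level rewrite.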
We're launching early and working closely with a small group of design partners, so feedback (even brutal feedback 😅) would mean a lot.
Happy to answer any questions here, and I’d love to hear how you’re handling LLM infra in production today!
Sacha
We're experimenting with cheaper models to control costs, but quality suffers.
Can Edgee help us stay on expensive models but reduce token usage instead?
@pierregodret Yes, that’s exactly what Edgee does.
Edgee optimizes your prompts at the edge using intelligent token compression, removing redundancy while preserving meaning, then forwards the compressed request to your LLM provider of choice. You can also tag requests with metadata to track usage/costs and get alerts when spend spikes.
Happy to discuss this further if you’d like.
Absolutely @pierregodret. With our token-compression model, the LLM bill drops automatically, so it's actually a good opportunity to afford a slightly more expensive model... for the same price ;)
@sachamorard But how do you ensure that critical context is not lost after compression?
How do you evaluate your model?
This would be a huge gain, but I'm skeptical about quality: two pieces of text can be semantically similar yet not mean the same thing.
@sachamorard @somangshu Edgee falling back to the original prompt when BERT similarity drops below a threshold is the right production default. You don't silently lose meaning; you just skip the savings on that request. The harder problem is using one threshold across all request types. RAG context with repeated chunks compresses well, but structured outputs and few-shot examples are dense and break easily. You end up either too conservative on easy wins or too aggressive on fragile stuff. Per-request overrides fix it, but now you're maintaining compression config alongside prompt config.
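The fallback behavior described above can be sketched in a few lines. Note that `compress` and `similarity` below are deliberately crude stand-ins (whitespace collapsing and Jaccard token overlap) so the sketch runs end to end; the real system would use a learned compression model and a BERT-style semantic scorer:

```python
def compress(prompt: str) -> str:
    """Stand-in for a real compression model: here we just collapse
    redundant whitespace so the sketch is runnable."""
    return " ".join(prompt.split())

def similarity(a: str, b: str) -> float:
    """Stand-in for a BERT-style semantic similarity score in [0, 1].
    Here: crude Jaccard overlap of lowercased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def safe_compress(prompt: str, threshold: float = 0.9) -> str:
    """Use the compressed prompt only if it stays semantically close to
    the original; otherwise skip the savings and send it unchanged."""
    candidate = compress(prompt)
    if similarity(prompt, candidate) >= threshold:
        return candidate
    return prompt
```

The per-request-override tension is visible in the `threshold` parameter: a single default value is either too strict for repetitive RAG context or too loose for dense few-shot examples, which is exactly the configuration burden discussed above.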
As a product guy in the agentic platform space, I’m definitely going to keep a close eye on this one. Good luck with the launch!
@yannick_mthy The agentic space is exactly where we’re seeing things get interesting (and complex) fast, especially with growing context sizes, tool calls, and multi-model orchestration.
Would love to hear how you're currently handling cost + routing on the agent side. Always keen to learn from teams building in this space. Thx
The idea is very interesting. But how does it work?
For example, I have a travel AI — essentially a wrapper around ChatGPT and Gemini. Some of the prompts are huge. How would you reduce the number of tokens? Would you compress my prompts? But that could affect quality.
Could you suggest where parts could be replaced with free or cheaper tools? But then you would need to know our product as well as we do… How do you do that?
Congrats on the launch! Will follow closely, as the topic is complex and moves fast!
@olivier_lemarie1 Thank you! Indeed, it's an exciting and challenging topic, with so many things to explore and improve :D We'll soon publish a series of blog posts going through all the details and the research around compression, so stay tuned!
I've been waiting to see companies start tackling this issue. Cost and efficiency will only grow in importance as AI platforms come under more pressure to generate revenue.
Congrats on the launch! Will definitely be following this project closely. I've always thought there should be a way to provide prompts to LLMs more efficiently, especially when the latest models consume a lot of tokens for complex work. Hopefully this will eventually mean lower usage rates and higher limits.