Hey, I'm Sacha, co-founder at @Edgee
Over the last few months, we've been working on a problem we kept seeing in production AI systems:
LLM costs don't scale linearly with usage; they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.
So we built a token compression layer designed to run before inference.
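To make the idea concrete, here's a minimal sketch of what "a compression layer before inference" means in practice. The function names (`compress_prompt`, `call_llm`) and the whitespace-collapsing logic are purely illustrative stand-ins, not Edgee's actual API or algorithm:

```python
# Hypothetical sketch: a compression step sits between your app and the model.
# A real compressor would prune low-signal tokens with a learned model; this
# stand-in just collapses repeated whitespace and drops empty lines.

def compress_prompt(prompt: str) -> str:
    """Illustrative compressor: normalize whitespace, drop blank lines."""
    lines = [" ".join(line.split()) for line in prompt.splitlines()]
    return "\n".join(line for line in lines if line)

def call_llm(prompt: str) -> str:
    # Placeholder for the actual inference call to your provider.
    return f"<model response to {len(prompt)} chars of context>"

raw = "You are a helpful assistant.\n\n\n   Please   summarize:   \n" + "context " * 50
compressed = compress_prompt(raw)
print(len(raw), "->", len(compressed))
response = call_llm(compressed)  # inference sees the smaller prompt
```

The point is simply that compression happens before tokens ever reach the provider, so the savings apply to whatever model sits behind the call.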
Love this! Congrats @sachamorard. Great onboarding XP; managed to get going in under 5 minutes ❤️. Curious whether and how we can control the compression level and adjust it per endpoint or use case, as I imagine there's a quality trade-off?
@sachamorard Super clear. Thanks!
Impressed by the edge-native architecture with 100+ PoPs and the token compression approach.
I noticed Edgee is built with Claude Code. For developers using AI coding agents (Claude Code, Cursor, etc.) that make heavy API calls during development, does Edgee support integration at the agent workflow level? Specifically, can we route AI agent requests through Edgee to compress tool call contexts and reduce token consumption during iterative coding sessions?
Thanks for sharing! Exciting to hear about the Claude Code-specific token compressor. Looking forward to seeing the gains in iterative coding sessions.
Would like to see benchmarks across different model providers and prompt types. If the compression holds under real production loads, this could become default infra in most LLM stacks.
This looks amazing, @gilles_raymond! Reducing token costs by 50% is a game changer for anyone building agents for a big audience 🤯 Question: how does the compression impact latency for real-time applications? Congrats on the launch!
@sgiraudie Since our architecture runs at the edge, there is no noticeable effect on latency.
Love the focus on production problems vs demo features. Does the cost tracking integrate with existing observability tools (DataDog, etc.)?
@nielsrolland You raise a very interesting point! For now, data can be exported as CSV/JSON, but we're already working on integrations with partner solutions. If you know our history (which seems to be the case), you know how easy it is for us to send data to any destination... so we won't hold back from offering this feature to our users ;)
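Until native integrations land, the exported data can be aggregated by hand before forwarding it to an observability tool. A hedged sketch, assuming a JSON export shaped like the records below (the field names `endpoint`, `tokens_in`, `tokens_out` are hypothetical, not Edgee's documented schema):

```python
# Hypothetical sketch: summarize a JSON cost export per endpoint before
# shipping it to DataDog or similar. Field names are illustrative.
import json

export = json.loads("""
[
  {"endpoint": "/chat", "tokens_in": 1200, "tokens_out": 300},
  {"endpoint": "/chat", "tokens_in": 800, "tokens_out": 250},
  {"endpoint": "/search", "tokens_in": 400, "tokens_out": 100}
]
""")

totals: dict[str, int] = {}
for row in export:
    totals[row["endpoint"]] = totals.get(row["endpoint"], 0) + row["tokens_in"] + row["tokens_out"]

print(totals)  # {'/chat': 2550, '/search': 500}
```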
This would be game-changing for our margins. Does the compression work for both prompts and completions?
@hajar_lamjadab2 Yes, it does! And it's even more efficient as the context window grows larger.
Cool idea! Do you get transparency into how the prompt was trimmed/manipulated, so you can ensure nothing was missed?
@daniele_packard We have information that allows us to understand how our model performs, yes. However, we do not keep the original prompt, for obvious privacy reasons. To validate the compressed prompt, we perform a similarity analysis by computing several metrics (ROUGE, BERTScore, cosine similarity...), and we allow our users to define a threshold that guarantees semantic similarity.
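The threshold idea above can be sketched in a few lines. A real system would compare embeddings (e.g. BERT-based) of the original and compressed prompts; to keep this example self-contained, a simple bag-of-words cosine similarity stands in, and the `accept_compression` gate is a hypothetical illustration of the user-defined threshold, not Edgee's implementation:

```python
# Illustrative semantic-similarity gate for compressed prompts.
# Bag-of-words cosine similarity is a simplified stand-in for embedding-based
# metrics like BERTScore.
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def accept_compression(original: str, compressed: str, threshold: float = 0.85) -> bool:
    """Reject the compressed prompt if similarity falls below the user's threshold."""
    return cosine_similarity(original, compressed) >= threshold

original = "Please summarize the quarterly revenue report for the board meeting"
compressed = "summarize quarterly revenue report board meeting"
print(accept_compression(original, compressed, threshold=0.7))  # True (similarity ~0.707)
```

With a stricter threshold (say 0.85), this particular compression would be rejected and the system could fall back to the original prompt, which is the trade-off the user-defined threshold controls.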