fmerian

Edgee Codex Compressor - Use Codex at 35.6% lower cost

We benchmarked Codex alone against Codex routed through Edgee's compression gateway on the same repo, with the same model, under the same workflow. The result: Codex + Edgee used 49.5% fewer input tokens, improved cache hit rate from 76.1% to 85.4%, and reduced total session cost by 35.6%. This post breaks down why context compression makes Codex more efficient, more frugal, and materially cheaper to run without sacrificing useful output.

Sacha MORARD
Maker

Hey PH πŸ‘‹


We're launching the Codex Compressor today.

But first, what is Edgee?

Edgee is an AI Gateway for Coding Agents that helps you save tokens. It's really simple to use: you only need two commands:

That's it! And it works the same with Claude Code.


The results:

As a gateway, Edgee can optimize the requests that are sent to OpenAI, remove noise and waste, and cut input token usage almost in half.

We ran a controlled benchmark (see the video): same repo, same model (gpt-5.4), same task sequence.
One session with plain Codex, one with Codex routed through Edgee.

  • Input tokens: βˆ’49.5%

  • Total cost: βˆ’35.6%

  • Cache hit rate: from 76.1% to 85.4%

The cache hit rate improvement is the part I find most interesting. Leaner prompts hit @OpenAI's cache more often, so the savings compound beyond the compression ratio alone.
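To see why the savings compound, here's a quick sketch of the arithmetic. The cache rates (76.1% β†’ 85.4%) and the βˆ’49.5% token reduction are the benchmark figures from the post; the 90% discount on cached input tokens is an assumed price, purely for illustration.

```python
def blended_input_cost(tokens, cache_rate, p_full=1.0, p_cached=0.1):
    """Cost of `tokens` input tokens when a fraction `cache_rate`
    of them is billed at the (assumed) cached-token price."""
    return tokens * (cache_rate * p_cached + (1 - cache_rate) * p_full)

baseline   = blended_input_cost(1.0,   0.761)   # plain Codex (tokens normalized to 1.0)
compressed = blended_input_cost(0.505, 0.854)   # Codex + Edgee: -49.5% tokens

savings = 1 - compressed / baseline
print(f"input-cost savings: {savings:.1%}")  # larger than the 49.5% token cut alone
```

Under these assumed prices, the input-cost savings come out well above the raw 49.5% token reduction, which is what "the savings compound" means here.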

Here's what makes this different from other token compression tools: we pull token counts directly from the OpenAI API usage fields. No character-based estimates. The numbers are what you're actually billed for.
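For reference, reading billed counts straight from the usage object looks roughly like this. The field names follow the OpenAI Chat Completions response shape; the sample values are made up for illustration.

```python
# A trimmed-down OpenAI API response with only the usage fields we need
# (sample numbers are invented, not from the benchmark).
response = {
    "usage": {
        "prompt_tokens": 12_400,
        "completion_tokens": 850,
        "prompt_tokens_details": {"cached_tokens": 10_600},
    }
}

usage = response["usage"]
input_tokens = usage["prompt_tokens"]                       # what you're billed for
cached = usage["prompt_tokens_details"]["cached_tokens"]    # served from cache
cache_hit_rate = cached / input_tokens

print(f"input tokens billed: {input_tokens}")
print(f"cache hit rate: {cache_hit_rate:.1%}")
```

No character-based estimating involved: the counts come from the same fields the invoice is computed from.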


⭐️ Please give a star to our brand-new OSS repository; we need the support ;)

And don't hesitate to try it: it's free!

Happy to answer any questions here all day. πŸ™

fmerian
Hunter

@sachamorardΒ s/o for this new launch -- keep up the great work

Lakshay Gupta

Coolest launch of the day!! Btw, what kinds of transformations are you applying: semantic compression, deduplication, summarization, or something else?

Sacha MORARD

@lak7Β We're doing token-level compression, not semantic. Concretely, we clean the tool results: smart filtering (strip ANSI codes, progress bars, whitespace noise), deduplication (collapse repeated log lines with counts), grouping (aggregate similar items), and truncation (keep the signal, cut the redundancy).

No summarization, no embedding-space compression. The approach stays fully transparent and deterministic: what gets sent to the model is readable and debuggable, just leaner.

The biggest gains come from tool outputs like `cargo build`, `git log`, and `go test`, which are designed for humans, not for models. That's where the βˆ’93% on cargo comes from.
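The transforms described above (strip ANSI codes, drop whitespace noise, collapse duplicate lines with counts, truncate) can be sketched in a few lines. This is an illustrative toy, not Edgee's actual implementation; the function name and line limit are made up.

```python
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # matches terminal color escape codes

def compress_tool_output(text: str, max_lines: int = 200) -> str:
    """Token-level cleanup of a tool's stdout: strip ANSI codes,
    drop blank lines, collapse consecutive duplicates with a count,
    and truncate the tail. (Sketch only, not Edgee's code.)"""
    lines: list[list] = []  # [line, repeat_count] pairs
    for raw in text.splitlines():
        line = ANSI_RE.sub("", raw).rstrip()
        if not line:
            continue  # whitespace noise
        if lines and lines[-1][0] == line:
            lines[-1][1] += 1  # collapse repeated log lines
        else:
            lines.append([line, 1])
    out = [l if n == 1 else f"{l}  (x{n})" for l, n in lines]
    if len(out) > max_lines:
        out = out[:max_lines] + [f"... truncated {len(out) - max_lines} lines"]
    return "\n".join(out)
```

Feeding it a couple of colorized, repetitive build lines yields plain deduplicated text, which is why verbose outputs like cargo's compress so well: the result stays readable and deterministic, just much shorter.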

fmerian
Hunter

"Coolest launch of the day!"

100%, @Edgee is underrated IMHO

Ryan W. McClellan, MS

That's a very specific figure β€” I like that. Is that an average across user sessions or a median? And what does the distribution look like β€” are most users clustered around that number or is it more bimodal between light and heavy usage?

Sacha MORARD

@ryanwmcc1Β Great question, and I want to be upfront: this is a single benchmark run, not an aggregate across user sessions. The βˆ’49.5% figure comes from one controlled test, same repo, same model, same task sequence, so it's a point measurement rather than a statistical distribution.

That said, the compression ratio in our architecture isn't random. It tracks directly with how much redundant context accumulates in a session, which is a function of session length, tool call frequency, and how repetitive the tool outputs are. Cargo build output, for example, is extremely compressible (βˆ’93% in this run) because it's verbose and structurally repetitive. File reads are less so (βˆ’34%).


Looking at the average compression rate across Codex sessions, we're closer to βˆ’40% of input tokens; it depends heavily on how the developer uses Codex.

Nicolas GreniΓ©
I would do anything to save tokens, thank you edgee team! 🀩
Sacha MORARD

@picsoungΒ haha, thanks. I would do anything to help friends save tokens.

fmerian
Hunter

@picsoungΒ  @sachamorardΒ "friends don't let friends waste tokens."

Jalmin

Very cool product!
I've been using it for 3 weeks and it's very efficient. A game changer for controlling API costs.

Sacha MORARD
@jalmin Thank you so much for making the most of Edgee. πŸ’ͺ
Jack Behar

Really interesting concept. Token compression plus routing in one layer feels powerful. How do you decide what gets compressed without affecting output quality?

Linoy Bar-Gal

35.6% is an oddly specific number, which makes me trust it more than "save up to 50%." What's actually being compressed: prompt-side context pruning, response caching, or something closer to semantic dedup across a session? Asking because I've been eyeing my own API bill lately and the honest breakdown matters.