fmerian

Edgee Codex Compressor - Use Codex at 35.6% lower cost

We benchmarked Codex alone against Codex routed through Edgee's compression gateway on the same repo, with the same model, under the same workflow. The result: Codex + Edgee used 49.5% fewer input tokens, improved cache hit rate from 76.1% to 85.4%, and reduced total session cost by 35.6%. This post breaks down why context compression makes Codex more efficient, more frugal, and materially cheaper to run without sacrificing useful output.

Sacha MORARD
Maker

Hey PH πŸ‘‹


We're launching the Codex Compressor today.

But first, what is Edgee?

Edgee is an AI Gateway for Coding Agents that helps you save tokens. It's really simple to use: you only need two commands:

That's it! And it works the same with Claude Code.


The results:

As a gateway, Edgee can optimize the requests that are sent to OpenAI, remove noise and waste, and cut input token usage almost in half.

We ran a controlled benchmark (see the video): same repo, same model (gpt-5.4), same task sequence.
One session with plain Codex, one with Codex routed through Edgee.

  • Input tokens: βˆ’49.5%

  • Total cost: βˆ’35.6%

  • Cache hit rate: from 76.1% to 85.4%

The cache hit rate improvement is the part I find most interesting. Leaner prompts hit @OpenAI's cache more often, so the savings compound beyond the compression ratio alone.
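To see why the savings compound, here's a quick sketch of the arithmetic. The cache rates (76.1% β†’ 85.4%) and the βˆ’49.5% token reduction are the benchmark figures from the post; the 90% discount on cached input tokens is an assumed price, purely for illustration.

```python
def blended_input_cost(tokens, cache_rate, p_full=1.0, p_cached=0.1):
    """Cost of `tokens` input tokens when a fraction `cache_rate`
    of them is billed at the (assumed) cached-token price."""
    return tokens * (cache_rate * p_cached + (1 - cache_rate) * p_full)

baseline   = blended_input_cost(1.0,   0.761)   # plain Codex (tokens normalized to 1.0)
compressed = blended_input_cost(0.505, 0.854)   # Codex + Edgee: -49.5% tokens

savings = 1 - compressed / baseline
print(f"input-cost savings: {savings:.1%}")  # larger than the 49.5% token cut alone
```

Under these assumed prices, the input-cost savings come out well above the raw 49.5% token reduction, which is what "the savings compound" means here.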

Here's what makes this different from other token compression tools: we pull token counts directly from the OpenAI API usage fields. No character-based estimates. The numbers are what you're actually billed for.
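For reference, reading billed counts straight from the usage object looks roughly like this. The field names follow the OpenAI Chat Completions response shape; the sample values are made up for illustration.

```python
# A trimmed-down OpenAI API response with only the usage fields we need
# (sample numbers are invented, not from the benchmark).
response = {
    "usage": {
        "prompt_tokens": 12_400,
        "completion_tokens": 850,
        "prompt_tokens_details": {"cached_tokens": 10_600},
    }
}

usage = response["usage"]
input_tokens = usage["prompt_tokens"]                       # what you're billed for
cached = usage["prompt_tokens_details"]["cached_tokens"]    # served from cache
cache_hit_rate = cached / input_tokens

print(f"input tokens billed: {input_tokens}")
print(f"cache hit rate: {cache_hit_rate:.1%}")
```

No character-based estimating involved: the counts come from the same fields the invoice is computed from.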


⭐️ Please give a star to our brand-new OSS repository; we need the support ;)

And don't hesitate to try it: it's free!

Happy to answer any questions here all day. πŸ™

fmerian
Hunter

@sachamorardΒ s/o for this new launch -- keep up the great work

Lakshay Gupta

Coolest launch of the day!! Btw, what kinds of transformations are you applying: semantic compression, deduplication, summarization, or something else?

Sacha MORARD

@lak7Β We're doing token-level compression, not semantic. Concretely, we clean the tool results: smart filtering (strip ANSI codes, progress bars, whitespace noise), deduplication (collapse repeated log lines with counts), grouping (aggregate similar items), and truncation (keep the signal, cut the redundancy).

No summarization, no embedding-space compression. The approach stays fully transparent and deterministic: what gets sent to the model is readable and debuggable, just leaner.

The biggest gains come from tool outputs like `cargo build`, `git log`, and `go test`, which are designed for humans, not for models. That's where the βˆ’93% on cargo comes from.
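The transforms described above (strip ANSI codes, drop whitespace noise, collapse duplicate lines with counts, truncate) can be sketched in a few lines. This is an illustrative toy, not Edgee's actual implementation; the function name and line limit are made up.

```python
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # matches terminal color escape codes

def compress_tool_output(text: str, max_lines: int = 200) -> str:
    """Token-level cleanup of a tool's stdout: strip ANSI codes,
    drop blank lines, collapse consecutive duplicates with a count,
    and truncate the tail. (Sketch only, not Edgee's code.)"""
    lines: list[list] = []  # [line, repeat_count] pairs
    for raw in text.splitlines():
        line = ANSI_RE.sub("", raw).rstrip()
        if not line:
            continue  # whitespace noise
        if lines and lines[-1][0] == line:
            lines[-1][1] += 1  # collapse repeated log lines
        else:
            lines.append([line, 1])
    out = [l if n == 1 else f"{l}  (x{n})" for l, n in lines]
    if len(out) > max_lines:
        out = out[:max_lines] + [f"... truncated {len(out) - max_lines} lines"]
    return "\n".join(out)
```

Feeding it a couple of colorized, repetitive build lines yields plain deduplicated text, which is why verbose outputs like cargo's compress so well: the result stays readable and deterministic, just much shorter.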

fmerian
Hunter

"Coolest launch of the day!"

100%, @Edgee is underrated IMHO

Ryan W. McClellan, MS

That's a very specific figure β€” I like that. Is that an average across user sessions or a median? And what does the distribution look like β€” are most users clustered around that number or is it more bimodal between light and heavy usage?

Sacha MORARD

@ryanwmcc1Β Great question, and I want to be upfront: this is a single benchmark run, not an aggregate across user sessions. The βˆ’49.5% figure comes from one controlled test, same repo, same model, same task sequence, so it's a point measurement rather than a statistical distribution.

That said, the compression ratio in our architecture isn't random. It tracks directly with how much redundant context accumulates in a session, which is a function of session length, tool call frequency, and how repetitive the tool outputs are. Cargo build output, for example, is extremely compressible (βˆ’93% in this run) because it's verbose and structurally repetitive. File reads are less so (βˆ’34%).


Looking at the average compression rate across Codex sessions, we're closer to βˆ’40% of input tokens; it depends heavily on how the developer uses Codex.

Nicolas GreniΓ©
I would do anything to save tokens, thank you edgee team! 🀩
Sacha MORARD

@picsoungΒ haha, thanks. I would do anything to help friends save tokens.

fmerian
Hunter

@picsoungΒ  @sachamorardΒ "friends don't let friends waste tokens."

Jalmin

Very cool product!
I've been using it for 3 weeks and it's very efficient. A game changer for controlling API costs.

Sacha MORARD
@jalmin Thank you so much for making the most of Edgee. πŸ’ͺ
Jack Behar

Really interesting concept. Token compression plus routing in one layer feels powerful. How do you decide what gets compressed without affecting output quality?

Linoy Bar-Gal

35.6% is an oddly specific number, which makes me trust it more than "save up to 50%." What's actually being compressed: prompt-side context pruning, response caching, or something closer to semantic dedup across a session? Asking because I've been eyeing my own API bill lately and the honest breakdown matters.