Hey, I'm Sacha, co-founder at @Edgee
Over the last few months, we've been working on a problem we kept seeing in production AI systems:
LLM costs don't scale linearly with usage; they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.
So we built a token compression layer designed to run before inference.
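To make the idea concrete, here's a toy sketch of what "compression before inference" can mean in the simplest case: deduplicating repeated context chunks and collapsing whitespace before the prompt is tokenized and billed. This is just an illustration of the general idea, not Edgee's actual method; real systems use far more sophisticated techniques.

```python
# Toy pre-inference prompt compression: drop exact-duplicate context
# chunks and collapse whitespace runs before the prompt is sent to the
# model. Illustrative only -- not Edgee's implementation.

def compress_prompt(chunks: list[str]) -> str:
    seen = set()
    out = []
    for chunk in chunks:
        normalized = " ".join(chunk.split())  # collapse whitespace runs
        if normalized and normalized not in seen:  # skip exact duplicates
            seen.add(normalized)
            out.append(normalized)
    return "\n".join(out)

prompt = compress_prompt([
    "You are a helpful assistant.",
    "Context:   the   user asked about pricing.",
    "You are a helpful assistant.",  # duplicated system chunk, dropped
])
```

Even this naive pass shrinks the prompt when RAG or memory layers re-inject the same chunk several times.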
@sachamorard Does compression hold up for non-English prompts? Thinking CJK specifically, tokenizers already split those into way more tokens per character.
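For context on why the question matters: byte-level BPE tokenizers start from UTF-8 bytes, and CJK characters are 3 bytes each in UTF-8 versus 1 for ASCII, so CJK text begins with roughly 3x the base units per character before any merges. A quick check of that baseline (this doesn't answer whether compression holds up, it just illustrates the inflation the question refers to):

```python
# CJK characters encode to 3 bytes in UTF-8 while ASCII letters take 1,
# so a byte-level tokenizer starts from ~3x as many base units per
# character on CJK text before any BPE merges apply.

def utf8_bytes_per_char(text: str) -> float:
    return len(text.encode("utf-8")) / len(text)

english = utf8_bytes_per_char("hello world")    # all 1-byte ASCII chars
japanese = utf8_bytes_per_char("こんにちは世界")  # all 3-byte CJK chars
```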
Congrats @sachamorard , did some quick market analysis for Edgee: https://www.ideajarvis.ai/idea-posts/ddbf4b65-76ed-46dc-978a-e3b656eb7109
One idea: flip the leaderboard from "biggest token spender" to tokens-per-merged-PR. You already have the GitHub attribution and the compression-adjusted token counts in one place, so joining them is mostly UX work. The reframe is bigger than it sounds: cost dashboards are observability, but tokens-per-PR is actual AI engineering productivity. It's also a stronger pitch for CTOs: "where did our budget go" is interesting, but "who ships the most with the least" is what they actually want to know.
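The join the reply describes is indeed small once both datasets share an author key. A minimal sketch, with hypothetical field names rather than Edgee's actual schema:

```python
# Sketch of the tokens-per-merged-PR leaderboard: join per-author token
# spend with per-author merged-PR counts and rank ascending (fewer
# tokens per shipped PR = better). Names and numbers are hypothetical.

token_spend = {"alice": 1_200_000, "bob": 300_000}  # compression-adjusted tokens
merged_prs = {"alice": 12, "bob": 2}                # merged PRs per author

tokens_per_pr = {
    author: token_spend[author] / merged_prs[author]
    for author in token_spend
    if merged_prs.get(author)  # skip authors with no merged PRs
}

leaderboard = sorted(tokens_per_pr.items(), key=lambda kv: kv[1])
```

Note the guard against authors with zero merged PRs: they have spend but no denominator, so they belong on a separate "unattributed spend" view rather than the ranking.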