Launched this week

Context Gateway
Make Claude Code faster and cheaper without losing context
287 followers
Context Gateway cuts latency and token spend for Claude Code / Codex / OpenClaw by compressing tool output while preserving important context. Setup takes less than a minute. Quality-of-life features include instant context compaction and a spend limit for Claude Code.



Context Gateway
The spend cap and Slack notifications are almost more valuable than the compression itself. Running Claude Code on a large codebase without any spending guardrails is genuinely stressful. You check back after 20 minutes and it's burned through $40 on a rabbit hole.
Is the compression lossy in practice? I've seen context window summaries drop important details (like specific variable names or error messages) that then cause the agent to hallucinate fixes. How do you handle preserving the details that actually matter vs. trimming the boilerplate?
Context Gateway
@whatworkedforme Hey Jack, thanks! Compression works very differently from summarization: we preserve the structure of each tool output but remove the "irrelevant" tokens. We condition compression on the user's query, making sure useful info is kept while the boilerplate is trimmed. So in practice we actually see quality improve. We're also running comprehensive benchmarking right now.
Would be awesome if you could give Context Gateway a shot and share your feedback!
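The query-conditioned, extractive approach described above can be sketched roughly like this. Context Gateway uses a trained classifier; here a toy lexical scorer stands in so the shape of the idea is runnable, and every name in this snippet is illustrative, not the product's actual API:

```python
def compress(tool_output: str, query: str, ratio: float = 0.5) -> str:
    """Keep ~ratio of the tokens, preferring those relevant to the query.

    Tokens are scored (here: lexical overlap with the query), the top
    fraction is kept, and the survivors are re-emitted in their original
    order so the structure of the output is preserved. A trained model
    would produce real relevance scores instead of this 0/1 heuristic.
    """
    query_terms = set(query.lower().split())
    tokens = tool_output.split()
    # Score each token: 1 if its lowercase form appears in the query, else 0.
    scored = [(1 if t.lower().strip(".,:;") in query_terms else 0, i, t)
              for i, t in enumerate(tokens)]
    keep_n = max(1, int(len(tokens) * ratio))
    # Keep the highest-scoring tokens (ties broken by position),
    # then restore original order so structure survives.
    kept = sorted(sorted(scored, key=lambda s: (-s[0], s[1]))[:keep_n],
                  key=lambda s: s[1])
    return " ".join(t for _, _, t in kept)
```

The key property is that nothing is generated: the output is a strict subsequence of the input, which is why verbatim details like variable names or error strings that match the query survive compression.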
Told
The token compression angle is the right problem to attack — once devs start hitting context limits mid-session, the cognitive cost of managing that manually kills flow. Curious how the compression handles cases where the 'noise' in tool output turns out to be context a later step actually needed — that edge case is where these systems tend to break trust with developers. The Claude Code integration is smart timing given how fast that tool's adoption is moving right now. Would be interested to see what the latency reduction looks like in practice on a typical 30-minute coding session.
Context Gateway
Hey @jscanzi, great question - the edge cases are a real concern! As of now, the model would likely just call the same tool again to get the necessary info. We're currently working on a feature that lets the model "pull" any previously compressed output in full, on demand. As for latency, in this particular example we saw a 30% speed-up (https://youtu.be/idmbFE6L5HU?si=Mcha2h-BYZ2z4gpk). We're working on more comprehensive benchmarking now.
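The planned "pull the full output on demand" feature could look roughly like this: compressed outputs are cached under their tool-call id, and the agent is given a handle to expand them later. This is a hypothetical sketch of the mechanism, not Context Gateway's implementation; the class and method names are made up:

```python
class OutputCache:
    """Hypothetical store mapping tool-call ids to full, uncompressed outputs."""

    def __init__(self) -> None:
        self._full: dict[str, str] = {}

    def store(self, call_id: str, full_output: str, compressed: str) -> str:
        """Cache the full output; return the compressed text plus a handle
        the model can use to request the original later."""
        self._full[call_id] = full_output
        return f"{compressed}\n[compressed - expand with id={call_id}]"

    def expand(self, call_id: str) -> str:
        """Return the original, uncompressed tool output on demand."""
        return self._full[call_id]
```

The appeal of this design is that compression becomes reversible from the agent's point of view: if a later step needs a detail that was trimmed, one cheap lookup recovers it instead of re-running the tool.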
Told
@ivanzak thanks a lot for your reply!
BrandingStudio.ai
Congrats on the launch! Curious how the compression handles tool outputs that contain mixed content, structured data alongside verbose logs, for example. Does it preserve the structured parts reliably while trimming the noise, or is it more of a blunt summarization?
Context Gateway
@joao_seabra Thanks for the question!
Right now we don't explicitly differentiate between structured and unstructured data, and compression runs across tool outputs as they are. Even with that simple approach we're seeing pretty significant gains in accuracy and reductions in cost and latency.
That said, you're touching on something we're actively working on. Our next major update will start treating structured and unstructured parts differently, handling things like JSON/schema fields atomically while being more aggressive with verbose logs.
Expect improvements here soon.
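The structured/unstructured split described above might be sketched like this: chunks that parse as JSON pass through verbatim, while free text goes to the compressor. Purely illustrative, assuming a paragraph-level split and a caller-supplied `compress_text` function:

```python
import json

def compress_mixed(output: str, compress_text) -> str:
    """Keep JSON chunks atomic; compress only the free-text chunks."""
    parts = []
    for chunk in output.split("\n\n"):
        try:
            json.loads(chunk)              # structured: keep verbatim
            parts.append(chunk)
        except ValueError:                 # unstructured: compress
            parts.append(compress_text(chunk))
    return "\n\n".join(parts)
```

Treating parseable structures atomically avoids the failure mode Joao asks about, where trimming individual tokens out of a JSON blob would leave it syntactically broken.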
Really smart approach to a problem I hit constantly - agent tool calls returning massive outputs that bloat context and burn tokens. The instant compaction feature is clutch too, waiting 3 min for /compact in Claude Code always kills my flow. Curious how the compression models handle code-heavy outputs vs prose - do you see different compression ratios?
Context Gateway
Hey @emad_ibrahim, thank you! The compression ratio is currently fixed at 0.5 - we'll make it auto-tunable in the future to account for the varying "density" of different inputs, but empirically it already works well!
So to cut token spend in Claude Code you actually spend more tokens on a summarizer model? And this model summarizes your content at its own discretion, running the risk of cutting important information?
Could the summarization be done with a local model instead?
Context Gateway
Hey @daniel_sitnik, we don't spend more tokens, because our compression model doesn't generate anything. It acts as a classifier, keeping some tokens and removing others. It's cheap, fast, and works well. We don't remove important info, because we always condition compression on the user's request, making sure we keep information relevant to it.
Right now we run compression on our side, but in principle it can be done on-prem - happy to chat more about that.
Copus
Context compression is one of those problems that becomes critical as agentic workflows scale. Running Claude Code on large codebases eats through tokens fast, so a proxy that intelligently compacts history while preserving the important bits is genuinely useful. The fact that it works across Claude Code, OpenClaw, Codex, and other agents makes it a nice universal layer. Curious how it handles preserving semantic meaning during compression - does it use summarization or more of a selective pruning approach? Great launch, and the open-source angle makes it easy to try.