TwoTrim - Cut LLM API costs by 65%. No GPU. No code changes.
TwoTrim — The Mathematical Prompt Compression Fabric for LLM APIs. Cut up to 65% of your AI token costs without losing accuracy.
TwoTrim is an open-source prompt compression middleware for LLM applications.
It sits between your app and any LLM API — OpenAI, Anthropic, or any OpenAI-compatible endpoint — and removes the tokens your model doesn't need
before the request is sent. Your code doesn't change. Your costs do.
What it does:
→ Strips filler words, redundant sentences, and formatting noise (lossless)
→ Semantic sentence scoring + Lost-in-the-Middle reordering (balanced)
→ BART summarization for long documents (aggressive)
→ FAISS semantic cache — works on similar queries, not just identical ones
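To make the semantic cache idea concrete, here is a minimal, dependency-free sketch of the concept: a new query hits the cache when its embedding is close enough to a previously seen one, not only when the strings match exactly. TwoTrim uses FAISS and a real sentence embedder for this at scale; the `embed()` function below is a hypothetical bag-of-characters stand-in, and the 0.95 threshold is illustrative, not TwoTrim's actual setting.

```python
from typing import Optional

import numpy as np


def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a real sentence embedder: a normalized
    # bag-of-letters vector, used here only to make the sketch runnable.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


class SemanticCache:
    """Return a cached answer for queries similar to ones seen before."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.keys: list = []    # embeddings of cached queries
        self.values: list = []  # cached answers

    def get(self, query: str) -> Optional[str]:
        if not self.keys:
            return None
        q = embed(query)
        sims = np.array([k @ q for k in self.keys])  # cosine similarity
        best = int(sims.argmax())
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.keys.append(embed(query))
        self.values.append(answer)


cache = SemanticCache()
cache.put("How do I reset my password?", "Use the reset link.")
# A near-identical rewording hits the cache; an unrelated query misses.
hit = cache.get("How do I reset my password")
miss = cache.get("What is the refund policy?")
```

Swapping `embed()` for a real embedding model and the linear scan for a FAISS index gives the scalable version of the same behavior: high-volume support traffic with reworded repeat questions skips the LLM call entirely.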
What makes it different:
→ CPU-only. No GPU infrastructure required.
→ Zero refactoring — drop-in base_url swap for any OpenAI-compatible client
→ Works across providers via LiteLLM, vLLM, and more
→ Honest benchmarks: the cases where it fails are published too.
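The "zero refactoring" claim can be sketched as follows: with any OpenAI-compatible SDK, the integration is a single `base_url` change pointing at the middleware, e.g. `OpenAI(base_url=..., api_key=...)`. The proxy address below is hypothetical; check the repo for TwoTrim's actual default. The request configs are shown as plain dicts so the diff is explicit.

```python
# Before: the client talks straight to the provider.
before = {
    "base_url": "https://api.openai.com/v1",  # direct to OpenAI
    "api_key": "sk-...",
    "model": "gpt-4o-mini",
}

# After: point base_url at the local TwoTrim proxy (address hypothetical).
# TwoTrim compresses the prompt and forwards the request upstream, so the
# model, key, and request payload stay exactly as they were.
after = {**before, "base_url": "http://localhost:8000/v1"}

# Only base_url differs between the two configurations.
changed = {k for k in before if before[k] != after[k]}
print(changed)
```

Because the swap happens at the transport layer, the same change works for Anthropic or any other backend that the middleware can route to via LiteLLM or vLLM.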
Works best on: document summarization, long-context tasks, and high-volume chatbot/support systems with repeated queries.
Does not work well on: extreme multi-hop RAG at aggressive compression.
Full benchmark data is public in the repo.
Open source. Apache 2.0. Free forever.
github.com/overseek944/twotrim