We built Agent-Corex after hitting 'context bloat hell' with 200+ tools
Hey everyone!
We just shipped Agent-Corex, and I want to share the story of why we built it.
The Problem We Faced:
Six months ago, we were building an LLM agent system that had access to ~200 different tools. We did what seemed logical: we dumped all of them into the system prompt.
It was a disaster.
Our API costs exploded (30K tokens per request)
Inference was slow (2.3 seconds per response)
The LLM kept getting confused about which tool to use
We were burning through context windows like crazy
We realized we had a problem: how do you intelligently select which tools to include without manually curating for every scenario?
The Solution:
We built a hybrid ranking system that:
Matches keywords in your query against tool names and descriptions (<1 ms)
Uses embeddings to find semantically related tools (50-100 ms)
Blends the two scores into one ranking (30% keyword + 70% semantic)
Result? Only 5-10 tools per query instead of 200.
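For anyone who wants to see the blending idea in code, here is a minimal sketch. It is not the Agent-Corex implementation: `hybrid_rank`, `keyword_score`, and `toy_embed` are hypothetical names, and the character-trigram vectors are only a dependency-free stand-in for a real embedding model. The 30/70 weights are the ones quoted above.

```python
import math
import re
from collections import Counter

KEYWORD_WEIGHT = 0.3   # 30% keyword, as described in the post
SEMANTIC_WEIGHT = 0.7  # 70% semantic, as described in the post


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, tool_text):
    """Fraction of query tokens that also appear in the tool text."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(tokenize(tool_text))) / len(query_tokens)


def toy_embed(text):
    """Stand-in for an embedding model (assumption): character-trigram counts."""
    normalized = " ".join(tokenize(text))
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query, tools, top_k=5):
    """Return the top_k tools by blended keyword + semantic score."""
    query_vec = toy_embed(query)
    scored = []
    for tool in tools:
        text = tool["name"] + " " + tool["description"]
        kw = keyword_score(query, text)
        sem = cosine(query_vec, toy_embed(text))
        scored.append((KEYWORD_WEIGHT * kw + SEMANTIC_WEIGHT * sem, tool))
    # Sort on the score only, so tool dicts are never compared to each other.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]
```

Swapping `toy_embed` for a real sentence-embedding model is where the semantic half would start to matter; the trigram version mostly rewards spelling overlap.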
The impact:
68% reduction in API costs
4.6x faster inference
Same capability (the LLM still has access to everything, just with smarter selection)
95%+ test coverage, production ready
Why Open Source:
We realized this is a problem every team building LLM agents faces. So we open-sourced it (MIT license) with zero dependencies for basic usage.
What We're Looking For:
Early adopters - Try it, break it, tell us what sucks
Use cases - How are you using it? What edge cases are we missing?
Contributions - Better ranking algorithms? Different embedding models? We're all ears
Feedback - Before we build the enterprise version, what features would actually help?
Quick Start:
pip install agent-corex
Then:
from agent_core import rank_tools
# One line to get smart tool selection
relevant_tools = rank_tools(
    query="your task here",
    tools=all_your_tools,
    method="hybrid",
    top_k=5,
)
We're at v1.0.1 and this is just the beginning. Would love to hear what you think, especially if you're already dealing with tool selection headaches.
Ask us anything:
How does it compare to your current approach?
Are there use cases we're not thinking about?
What would make this 10x better for your workflow?
Looking forward to building this with the community!


Replies
@dipjyoti_sharma
Appreciate it! That's exactly the problem we ran into as well.
Scaling is where things really start to break:
tool selection quality drops
token usage spikes
latency grows with every additional tool
What we've seen so far is that once you cross ~20-30 tools, naive approaches don't hold up anymore.
With Agent-Corex, the focus is on keeping the toolset minimal per request using a retrieval + ranking layer, so even if you have hundreds of tools overall, the model only sees a small, relevant subset at runtime.
Still early, but initial tests with larger tool sets are promising, especially in keeping both token usage and latency under control.
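To get a feel for why the subset matters for token usage, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 4-characters-per-token rule is only a rough heuristic, the catalog is synthetic, and `estimate_tokens` and `tool_block_tokens` are hypothetical helpers, not part of Agent-Corex.

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def tool_block_tokens(tools):
    """Estimated tokens needed to list the given tools in a prompt."""
    return sum(estimate_tokens(f'{t["name"]}: {t["description"]}') for t in tools)


# Synthetic catalog for illustration only; not real Agent-Corex data.
catalog = [
    {"name": f"tool_{i}",
     "description": "Placeholder description of what this tool does. " * 3}
    for i in range(200)
]
selected = catalog[:5]  # pretend a ranking layer picked these five

full = tool_block_tokens(catalog)
subset = tool_block_tokens(selected)
print(f"all {len(catalog)} tools: ~{full} tokens")
print(f"top {len(selected)} tools: ~{subset} tokens")
```

For real numbers, swap in your model's tokenizer and your actual tool schemas.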
Would love to hear how you're handling this on your side if you've worked with larger systems.
@dipjyoti_sharma
Great question, and honestly this is something I was curious about early on too.
For smaller / beginner workflows, the benefits are still there, just less obvious at first:
fewer tools → less confusion for the model
more consistent outputs (especially for simple tasks)
lower token usage even on basic setups
Where it really starts to shine is when workflows grow a bit:
combining multiple tools (APIs, scrapers, content generators, etc.)
or chaining steps like research → draft → optimize
For content creators (blogging, SEO, etc.), I actually think it's quite relevant:
Instead of exposing every possible tool (keyword research, SERP analysis, writing, editing, publishing), the system can just bring in what's needed per step.
So:
"write blog intro" → only writing tools
"optimize for SEO" → keyword + SEO tools
That keeps things faster and more predictable, even if the overall system has a lot of capabilities under the hood.
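As a rough illustration of that per-step idea, here is a small sketch. The tool catalog and the `tools_for_step` helper are hypothetical, and it uses plain word overlap rather than the hybrid ranking Agent-Corex actually does, just to keep the example dependency-free.

```python
import re

# Hypothetical content-workflow catalog, for illustration only.
TOOLS = [
    {"name": "keyword_research", "description": "Find SEO keyword ideas and search volume"},
    {"name": "serp_analysis", "description": "Analyze SEO search result pages for a keyword"},
    {"name": "draft_writer", "description": "Write a blog intro, outline, or section draft"},
    {"name": "publisher", "description": "Publish a finished post to the CMS"},
]


def words(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def tools_for_step(step, tools, top_k=2):
    """Names of up to top_k tools sharing at least one word with the step."""
    step_words = words(step)
    hits = []
    for tool in tools:
        overlap = len(step_words & words(tool["name"] + " " + tool["description"]))
        if overlap > 0:
            hits.append((overlap, tool["name"]))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in hits[:top_k]]


for step in ["write blog intro", "optimize for SEO"]:
    print(step, "->", tools_for_step(step, TOOLS))
```

Word overlap is crude (it will happily match on filler words like "for"), which is part of why a semantic score is worth blending in.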
You don't really feel the complexity, but you benefit from it as things scale.
Curious: are you using AI more for content workflows or something else?