Majid Fekri

Where $2.5 Million a Year Actually Goes

by

When most people think about the cost of AI search, they think about the vector database. But the database is just the tip of the iceberg. Here's what the full cost stack actually looks like for a typical enterprise running retrieval-augmented generation (RAG) at scale:

The vector database cluster is the obvious one. To serve 150 multi-tenant customers with real-time retrieval, this customer was running Qdrant on a fleet of AWS memory-optimized instances (r7g.2xlarge) plus Kubernetes orchestration. Annual cost: ~$591,000. And that infrastructure runs 24/7, whether it's peak hours or 3 AM on a Sunday. You're renting RAM by the year to hold vectors that might get queried once an hour.

The reranking API is the cost nobody budgets for. Traditional vector databases use approximate search — they give you "close enough" results using a probabilistic algorithm called HNSW. For enterprise use cases in regulated industries, "close enough" isn't good enough. So teams bolt on a reranking service like Cohere Rerank to improve accuracy after the initial retrieval. That API call on every query, at this volume, costs roughly ~$1.5 million per year. It's the single biggest line item, and most teams don't see it coming until they're already in production.

The middleware and observability layer adds another surprise. Enterprise RAG requires auditability — you need to trace exactly which documents were retrieved, with what parameters, through what logic. Teams typically bolt on LangSmith or similar observability tooling on top of LangChain, which adds token overhead and tracing costs. For this customer: ~$378,000 per year.

The engineering team to keep it all running is the hidden human cost. A Qdrant cluster on Kubernetes doesn't manage itself. This customer had two DevOps engineers and one backend engineer dedicated to maintaining the vector pipeline — provisioning, scaling, troubleshooting, patching. Fully loaded: ~$450,000 per year in salaries tied to infrastructure maintenance rather than product development.

Add it up: $2.5 million per year — and the Qdrant software license itself is free. The open-source database is the cheapest part of the stack. Everything around it is where the money burns.


3 views

Add a comment

Replies

Be the first to comment