The open-source AI gateway with built-in observability, automatic failover, and a one-line integration. Add credits, and get instant access to 100+ models through one API key. OpenAI compatible, zero markup, and trusted by teams like DeepAI, PodPitch, and Sunrun.
Helicone AI
Hey everyone 👋
I'm Cole, Co-Founder of Helicone.
We build open-source tools that help AI startups ship faster and break less.
Today, we're launching the Helicone AI Gateway: one API key for every model, with observability and automatic failover built in.
The Why
Over 90% of AI products today use five or more LLMs.
Every AI engineer I talk to is struggling with:
- Writing custom logic to handle provider outages
- Hitting constant 429s and waiting weeks for limit increases
- Managing multiple APIs, keys, and auth flows
- Paying 5-10% markup fees just to use a gateway
- No visibility into routing or performance
The How
The Helicone AI Gateway fixes that. It's open source, transparent, and simple to use.
- 1 API key, 100+ models: add credits and get instant access to every major provider
- 0% markup fees: you pay exactly what the provider charges
- Observability included: logs, latency, costs, and traces built in
- Reliable by design: automatic failover, caching, and routing that avoids provider rate limits entirely (see the sketch after this list)
- Custom rate limits: define your own per-user or per-segment caps right in the gateway
- Fully open source: MIT licensed, self-host or contribute, no lock-in
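For the curious, the failover boils down to something like this simplified sketch (illustrative TypeScript only, not the gateway's actual routing code): try providers in order and fall back on rate-limit or server errors.

```typescript
// Simplified failover sketch: try providers in order, falling back on
// rate-limit (429) or server (5xx) errors. Illustrative only; the real
// routing is more involved.
type Provider = { name: string; url: string; apiKey: string };

async function completeWithFailover(
  providers: Provider[],
  body: unknown,
): Promise<Response> {
  let lastError: unknown = new Error("no providers configured");
  for (const p of providers) {
    try {
      const res = await fetch(p.url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${p.apiKey}`,
        },
        body: JSON.stringify(body),
      });
      if (res.status === 429 || res.status >= 500) {
        lastError = new Error(`${p.name} returned ${res.status}`);
        continue; // rate-limited or erroring: fall back to the next provider
      }
      return res; // success, or a client error worth surfacing as-is
    } catch (err) {
      lastError = err; // network failure: try the next provider
    }
  }
  throw lastError;
}
```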
The What
- OpenAI SDK-compatible (change the baseURL, access 100+ models; see the snippet after this list)
- Supports all major providers (OpenAI, Anthropic, Gemini, TogetherAI, and more)
- Real-time dashboards and analytics
- Built-in caching and request deduplication
- Automatic failover and retry logic
- Custom per-user rate limits
- 0% markup fees, pay provider pricing
- Fully open source
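Concretely, the integration is the standard OpenAI SDK pointed at the gateway. A rough sketch; the gateway URL below is a placeholder, so check the docs for the exact endpoint and supported model names:

```typescript
import OpenAI from "openai";

// Standard OpenAI SDK with only the baseURL changed to the gateway.
// URL and model name are placeholders; see the docs for exact values.
const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY, // one key for every provider
  baseURL: "https://ai-gateway.example.com", // placeholder gateway URL
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini", // or any other supported model, from any provider
  messages: [{ role: "user", content: "Hello through the gateway!" }],
});

console.log(response.choices[0].message.content);
```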
Traction
Already processing billions of tokens monthly for teams at Sunrun, DeepAI, and PodPitch.
We've been building this in the open for six months, shaped by feedback from hundreds of developers.
Try it now and tell us what you think: https://www.helicone.ai/signup
GitHub: https://github.com/Helicone/heli...
Docs: https://docs.helicone.ai/gateway...
Would love your feedback!
@cole_gottdank Does the caching work across different models if the prompts are identical?
Helicone AI
@masump It does not. We hash the entire request body, including the model & all metadata. If it's different, it will be a cache miss.
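A minimal sketch of that idea (illustrative, not Helicone's exact implementation): canonicalize the body so key order doesn't matter, then hash everything, model and metadata included.

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so two semantically identical bodies
// serialize to the same string regardless of key order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, canonicalize(v)] as [string, unknown]),
    );
  }
  return value;
}

// Hash the entire request body; any difference (model included) is a miss.
function cacheKey(requestBody: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(requestBody)))
    .digest("hex");
}

// Same prompt, different model => different keys => cache miss.
const msgs = [{ role: "user", content: "hi" }];
console.log(
  cacheKey({ model: "gpt-4o", messages: msgs }) ===
    cacheKey({ model: "claude-3-5-sonnet", messages: msgs }), // false
);
```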
@cole_gottdank You say you avoid rate limits entirely through routing. That only works if you have pooled credits across providers or you're just shifting the problem to a different API. Which one is it?
Cal ID
Congrats on the launch!
How do you handle observability for streaming responses compared to traditional request/response patterns?
Helicone AI
@sanskarix Observability works out of the box for both of them. We split the stream & return it immediately to the client & we read the other side. If the client cancels the stream, we cancel it on our end as well.
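In other words, something like this simplified sketch (assumes fetch-style Responses and web streams; real code would also wire cancellation through, e.g. with an AbortController):

```typescript
// Tee the upstream body: one branch streams straight to the client,
// the other is consumed in the background for logging. Simplified
// sketch of the approach described above, not Helicone's actual code.
function proxyWithObservability(upstream: Response): Response {
  const [toClient, toLog] = upstream.body!.tee();

  // Drain the logging branch without delaying the client.
  (async () => {
    const reader = toLog.getReader();
    const decoder = new TextDecoder();
    let captured = "";
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      captured += decoder.decode(value, { stream: true });
    }
    // ...record tokens, latency, and cost from `captured` here...
  })();

  return new Response(toClient, {
    status: upstream.status,
    headers: upstream.headers,
  });
}
```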
Prism AI
Can't wait to try it!
Helicone AI
@ayxliu19 Thanks Alex!
Dreamlit AI
LFG Justin!!
We've been using Helicone for the past few months. For us the benefits are:
- not having to maintain our own proxy translation layer between models
- latency, cost, and usage metrics are really helpful
- easy debugging of when there is an AI failure and why
- supports complex API uses like streaming, rich media, etc.
- minimal latency impact
- friendly pricing (unlike competitors who sometimes take a cut of the model inference itself, which is bonkers)
What it lacks (unless this has changed):
- an authentication layer. We still have to proxy every request to handle authentication ourselves, which incurs additional infra and compute cost and adds another failure point (see the sketch after this list).
- model support rollout lags badly: GPT-5 took 2-3 months to become available on Helicone. I understand this was a major API change on OpenAI's part (shame on them), but this will be unacceptably slow for many companies, given that OpenAI is a non-negotiable provider to support.
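For context, that extra hop looks roughly like this (hypothetical names and URLs; an Express proxy that authenticates the caller, then forwards to the gateway):

```typescript
import express from "express";

// Hypothetical user lookup; stands in for the app's real auth layer.
async function lookupUser(
  authHeader?: string,
): Promise<{ id: string } | null> {
  return authHeader ? { id: "user_123" } : null;
}

const app = express();
app.use(express.json());

// Every request pays for this extra hop: authenticate, then forward.
app.post("/v1/chat/completions", async (req, res) => {
  const user = await lookupUser(req.header("Authorization"));
  if (!user) return res.status(401).json({ error: "unauthorized" });

  // Placeholder gateway URL; the real key never reaches the client.
  const upstream = await fetch(
    "https://ai-gateway.example.com/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.GATEWAY_API_KEY}`,
      },
      body: JSON.stringify(req.body),
    },
  );

  res.status(upstream.status).json(await upstream.json());
});

app.listen(3000);
```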
Overall Helicone is an excellent product and I'm excited for what the future brings.
This is seriously impressive. Does Helicone handle token usage tracking per user across multiple providers automatically?
Why no GPT-5-Pro?
Cuckoo
Congrats team. This, combined with your existing observability stack, will be awesome.
Also looking forward to how this can integrate with memory and cache handling in the future.
Finally, a single gateway for all the LLM chaos! Open-source, no markup, observability built-in. Super excited to test this!
We're also building an AI startup, but we currently have just one LLM. We might add a second one soon. I'll keep you in my notes for the future :)