How are you keeping Claude Code token spend visible while you build?
I like Claude Code a lot, but one thing still feels weirdly opaque to me: token burn while you are deep in a session.
When I am iterating fast, the bill usually shows up after the fact. By then I have already made the expensive choices. Long context, repeated retries, and bouncing between models can get surprisingly costly before you really notice it.
I am curious how people here handle this in practice.
Are you just trusting the monthly bill?
Do you set hard limits somewhere?
Do you watch usage in the Anthropic console?
Do you have your own scripts or dashboards for this?
I ended up building a tiny macOS menu bar app for myself called TokenBar that shows token usage live across AI tools, mostly because I wanted cost visibility before the bill arrives, not after. It has been especially useful during long coding sessions where small prompts add up.
Not trying to make this an ad. I am genuinely interested in the workflow side of this because I feel like a lot of people optimize prompts, models, and coding speed, but still fly blind on usage.
What is your setup?


Replies
One thing I learned after watching my own Claude usage more closely is that the expensive part usually is not one giant prompt. It is the compound effect of small decisions.
The biggest cost spikes for me were:
1. long sessions where context quietly keeps growing
2. retry loops when I am not happy with an answer yet
3. switching between tools and models without noticing the total session burn
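To make the first point concrete, here is a rough back-of-the-envelope sketch. It assumes every turn resends the full conversation history as input and that nothing is cached or trimmed, which is a simplification (caching or context compaction would lower the real numbers). The function name and the per-turn token figure are made up for illustration.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent if every turn resends the whole history.

    Assumption: each turn adds `tokens_per_turn` tokens to the context,
    and no caching or trimming reduces what is sent.
    """
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # the history grows by one turn
        total += context            # the full history is sent again
    return total


# One long 40-turn session versus two fresh 20-turn sessions.
one_long = cumulative_input_tokens(turns=40, tokens_per_turn=2_000)
two_short = 2 * cumulative_input_tokens(turns=20, tokens_per_turn=2_000)
print(one_long, two_short)
```

With these made-up numbers, the single 40-turn session sends 1,640,000 input tokens, while two 20-turn sessions send 840,000 in total. The growth is roughly quadratic in the number of turns, which is why a session can feel cheap per message and still add up.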
What helped most was treating token spend like any other live dev metric instead of something to check at the end of the month.
My current setup is simple:
- keep a live usage number visible while I work
- set a rough per-task budget in my head before I start
- reset or fork context earlier than feels natural
- pay attention to when a task should move from "keep chatting" to "just edit the file myself"
That is basically why I built TokenBar for macOS. It sits in the menu bar and shows AI token usage live so I can catch the costly behavior while it is happening, not after the bill lands. It has been most useful during coding sessions where I think I am making fast progress, but I am actually paying a tax on context drift.
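If you would rather script the per-task budget idea than keep it in your head, a minimal sketch could look like the one below. It assumes you can get input and output token counts for each request from whatever tool you use. The `TaskBudget` class is hypothetical, and the prices are placeholders, not real rates.

```python
from dataclasses import dataclass


@dataclass
class TaskBudget:
    """Running cost tracker for one task (hypothetical helper)."""

    limit_usd: float
    input_price_per_mtok: float   # placeholder USD per million input tokens
    output_price_per_mtok: float  # placeholder USD per million output tokens
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one request's usage and return that request's cost in USD."""
        cost = (
            input_tokens * self.input_price_per_mtok
            + output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        self.spent_usd += cost
        return cost

    def over_budget(self) -> bool:
        """True once the running total exceeds the limit."""
        return self.spent_usd > self.limit_usd


# Placeholder prices; substitute the current rates for your model.
budget = TaskBudget(limit_usd=1.00, input_price_per_mtok=3.0, output_price_per_mtok=15.0)
for _ in range(3):
    budget.record(input_tokens=100_000, output_tokens=10_000)
    if budget.over_budget():
        print(f"Over budget: ${budget.spent_usd:.2f} spent, time to reset context")
```

The point is less the exact dollar figure and more having a tripwire that fires during the session instead of at the end of the month.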
Curious if anyone here has found a good rule of thumb for when to restart context versus keep pushing through.
tokenbar.site
Flying blind on token burn is the ultimate jump-scare for any developer deep in a flow state. We’ve all had those sessions where a massive context window sneakily eats the budget before the code even runs. A live tracker is such a clever way to keep that high-speed momentum without the financial hangover later. It’s basically the "low fuel" light we all need to keep our infrastructure from breaking the bank!
Do you find yourself switching to cheaper models mid-session when you see the burn, or do you just power through for the sake of the logic?
Is there any way to track token usage in real-time instead of after the session ends?