How to Actually Monitor Your LLM Costs (Without a Spreadsheet)
I used to think I had a handle on my AI spending. I had a rough mental model: Claude is cheap, GPT-4 is expensive, Gemini is somewhere in the middle. Good enough, right?
Then I started actually logging what I was burning through. The gap between my mental model and reality was embarrassing.
The problem with just watching your bill
Every major AI provider gives you a monthly bill. That's fine for accounting. It's useless for actually understanding your costs.
By the time the invoice shows up, the context is gone. You don't remember which project, which feature, which dumb experiment ate half your budget. You just see a number and try to feel bad about it.
What you actually need is visibility at the call level. How many tokens did that chat completion use? How expensive was that context window? Is the cost per feature trending up as my codebase grows?
None of the dashboards the providers give you answer these questions in real time.
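To make the call-level math concrete: most chat APIs return a usage block with input and output token counts, and the cost is just those counts times a per-token rate. Here's a minimal sketch, with placeholder prices rather than any provider's real rates (always check the current pricing page):

```python
# (input, output) USD per million tokens. These numbers are illustrative
# placeholders, NOT real rates -- look up your provider's pricing page.
PRICE_PER_MTOK = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from its token counts."""
    in_rate, out_rate = PRICE_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Trivial arithmetic, but seeing it per call is exactly what the monthly invoice can't give you.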
What I tried first
Spreadsheets. Obviously. I had a tab for each provider, manually entered rough token counts after each session, tried to estimate costs.
This lasted about a week before I stopped maintaining it. The friction was too high. I'd forget to log things. I'd ballpark numbers. The data became meaningless noise.
I also tried building a lightweight proxy that logged every API call. That actually worked technically, but then I had to maintain a piece of infrastructure just to track my own costs. As a solo dev building two apps simultaneously, I don't have the bandwidth for that.
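For what it's worth, the core of that proxy never needed to be real infrastructure. It boiled down to something like this append-only JSONL logger, where the shape of the usage dict and the tag scheme are my own assumptions, not any provider's format:

```python
import json
import time

def log_call(path: str, model: str, usage: dict, tag: str) -> None:
    """Append one API call's token usage to a JSONL file, tagged by
    project/feature so the number on the invoice has context later."""
    record = {
        "ts": time.time(),
        "model": model,
        "tag": tag,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Ten lines, no server. The maintenance burden wasn't the code; it was remembering to wire it into every call site.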
The habit that actually worked
I started paying attention to token counts in real time, at the point of use, not after the fact.
This sounds obvious but there's a specific reason it works: when you see the number immediately, you can actually connect cause and effect. Oh, that system prompt is 2,000 tokens every single call. Oh, I'm re-sending the entire conversation history when I only needed the last three messages.
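That second mistake, re-sending the whole conversation, has a mechanical fix. Here's a minimal sketch of history trimming, assuming the usual role/content message shape rather than any specific API's format: keep the system prompt, drop everything but the last few turns.

```python
def trim_history(messages: list[dict], keep_last: int = 3) -> list[dict]:
    """Keep the system prompt (if any) plus only the last few turns,
    instead of re-sending the entire conversation on every call."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

If your system prompt is 2,000 tokens, that part you're stuck with; the forty-message tail you're dragging along out of habit, you're not.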
For my Mac menu bar workflows, I ended up using TokenBar — it shows live token counts and estimated cost right in the menu bar as I work. The thing about having it persistent and always visible is that it changes how you think. You start making micro-decisions constantly: is this context worth the extra tokens? Is this feature request worth spinning up a full Claude session or can I handle it with a lighter model?
The three questions I ask now
After a few months of actually paying attention, I've settled into asking three things about every AI interaction:
- What's the token density? Not just how many tokens, but how much useful work per token. A 5,000-token call that produces a complete working feature is cheap. A 1,000-token call that produces a vague response I have to iterate on three more times is expensive.
- Is this the right model for this job? I was defaulting to Claude Sonnet for everything for a long time. Then I realized: for quick validation tasks, formatting, or simple transformations, Haiku costs a fraction as much and is fast enough that it doesn't matter. I probably cut my costs by roughly 40% just by routing tasks properly.
- Am I paying for laziness? This one stings. A lot of my token burn came from not thinking carefully about prompts before sending them. I'd throw a messy, half-formed request at the API, get a mediocre response, and iterate three more times. A little upfront clarity would have made it one call instead of four.
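The second question is the easiest one to automate. A toy router along these lines captures the idea, with task names and model labels made up purely for illustration:

```python
# Hypothetical task-based router: cheap, fast model for mechanical work,
# the heavier model only when the task actually needs it.
CHEAP_TASKS = {"format", "validate", "extract", "summarize"}

def pick_model(task: str) -> str:
    """Route simple transformations to the cheap model, everything
    else to the default workhorse."""
    return "haiku" if task in CHEAP_TASKS else "sonnet"
```

Even a lookup table this dumb beats defaulting everything to the expensive model, which is what I was doing.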
What actual monitoring looks like
I check my token usage the same way I check my git diffs — regularly, not obsessively, but with intention.
The key insight is that monitoring shouldn't require effort. If you have to go somewhere to check it, you won't. The visibility needs to be ambient — always there, not intrusive.
Menu bar for live tracking. Provider dashboards for weekly reviews. That's the whole stack.
The spreadsheet phase was necessary because it forced me to pay attention, even if the data was garbage. What replaced it is better because it's automatic — the numbers are just there, and over time you develop intuitions about what's normal and what's a red flag.
The meta-lesson
AI costs are weird because they feel like they should be predictable (it's just API calls!) but they're actually highly variable based on how you're working, what you're building, and how carefully you're thinking.
You can't optimize what you're not measuring. And you can't sustain measurement that requires manual effort.
Find the lowest-friction way to make costs visible in your actual workflow, not in a separate dashboard you have to remember to check. That's the whole game.