
How to Actually Monitor Your LLM Costs (Without a Spreadsheet)

Dev.to AI · by Henry Godnick · April 4, 2026 · 4 min read


I used to think I had a handle on my AI spending. I had a rough mental model: Claude is cheap, GPT-4 is expensive, Gemini is somewhere in the middle. Good enough, right?

Then I started actually logging what I was burning through. The gap between my mental model and reality was embarrassing.

The problem with just watching your bill

Every major AI provider gives you a monthly bill. That's fine for accounting. It's useless for actually understanding your costs.

By the time the invoice shows up, the context is gone. You don't remember which project, which feature, which dumb experiment ate half your budget. You just see a number and try to feel bad about it.

What you actually need is visibility at the call level. How many tokens did that chat completion use? How expensive was that context window? Is the cost per feature trending up as my codebase grows?

None of the dashboards the providers give you answer these questions in real time.
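One low-effort way to get that call-level visibility is to record the token counts your provider already returns with every response and bucket them by feature. Here's a minimal sketch; the model names, per-million-token rates, and feature labels are all illustrative, not real pricing:

```python
from collections import defaultdict

# Illustrative per-million-token rates -- real pricing varies by model
# and changes often, so treat these as placeholders.
RATES = {
    "sonnet": {"in": 3.00, "out": 15.00},
    "haiku":  {"in": 0.25, "out": 1.25},
}

class UsageLog:
    """Accumulates token usage and estimated cost per feature,
    so the invoice number stays attributable to actual work."""

    def __init__(self):
        self.by_feature = defaultdict(lambda: {"in": 0, "out": 0, "cost": 0.0})

    def record(self, feature, model, input_tokens, output_tokens):
        r = RATES[model]
        cost = (input_tokens * r["in"] + output_tokens * r["out"]) / 1_000_000
        row = self.by_feature[feature]
        row["in"] += input_tokens
        row["out"] += output_tokens
        row["cost"] += cost
        return cost

log = UsageLog()
# In real use, input/output token counts come from the usage metadata
# on each API response; here they're hard-coded for the example.
log.record("chat-summary", "sonnet", 2_000, 500)
log.record("chat-summary", "sonnet", 1_800, 400)
print(f"chat-summary: ${log.by_feature['chat-summary']['cost']:.4f}")
```

A few dozen lines like this won't replace a dashboard, but it answers "which feature ate the budget" while you still remember what you were doing.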

What I tried first

Spreadsheets. Obviously. I had a tab for each provider, manually entered rough token counts after each session, tried to estimate costs.

This lasted about a week before I stopped maintaining it. The friction was too high. I'd forget to log things. I'd ballpark numbers. The data became meaningless noise.

I also tried building a lightweight proxy that logged every API call. That actually worked technically, but then I had to maintain a piece of infrastructure just to track my own costs. As a solo dev building two apps simultaneously, I don't have the bandwidth for that.

The habit that actually worked

I started paying attention to token counts in real time, at the point of use, not after the fact.

This sounds obvious but there's a specific reason it works: when you see the number immediately, you can actually connect cause and effect. Oh, that system prompt is 2,000 tokens every single call. Oh, I'm re-sending the entire conversation history when I only needed the last three messages.
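The conversation-history mistake is worth putting numbers on. A rough back-of-envelope sketch, with made-up but plausible sizes (a 2,000-token system prompt, ~300 tokens per message, a 20-turn session):

```python
# Illustrative numbers only: system prompt 2,000 tokens, ~300 tokens
# per message, 20 turns in one session.
SYSTEM = 2_000
MSG = 300
TURNS = 20

def tokens_full_history():
    # Turn k re-sends the system prompt plus all k prior messages.
    return sum(SYSTEM + k * MSG for k in range(1, TURNS + 1))

def tokens_last_three():
    # Turn k sends the system prompt plus at most the last 3 messages.
    return sum(SYSTEM + min(k, 3) * MSG for k in range(1, TURNS + 1))

full = tokens_full_history()     # 40,000 + 300 * (1 + ... + 20) = 103,000
trimmed = tokens_last_three()    # 40,000 + 300 * 57 = 57,100
print(f"{full} vs {trimmed}: {100 * (1 - trimmed / full):.0f}% fewer input tokens")
```

Under these assumptions, trimming to the last three messages cuts input tokens by roughly 45% over one session, and the gap widens the longer the conversation runs.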

For my Mac menu bar workflows, I ended up using TokenBar — it shows live token counts and estimated cost right in the menu bar as I work. The thing about having it persistent and always visible is that it changes how you think. You start making micro-decisions constantly: is this context worth the extra tokens? Is this feature request worth spinning up a full Claude session or can I handle it with a lighter model?

The three questions I ask now

After a few months of actually paying attention, I've settled into asking three things about every AI interaction:

  1. What's the token density? Not just how many tokens, but how much useful work per token. A 5,000-token call that produces a complete working feature is cheap. A 1,000-token call that produces a vague response I have to iterate on three more times is expensive.

  2. Is this the right model for this job? I was defaulting to Claude Sonnet for everything for a long time. Then I realized: for quick validation tasks, formatting, or simple transformations, Haiku costs a fraction as much and is fast enough that the quality gap doesn't matter. I probably cut my costs by 40% just by routing tasks properly.

  3. Am I paying for laziness? This one stings. A lot of my token burn was from not thinking carefully about prompts before sending them. I'd throw a messy half-formed request at the API, get a mediocre response, and iterate three more times. A little upfront clarity would have been one call instead of four.
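The routing idea from question 2 doesn't need anything fancy. A hypothetical sketch, where the task categories, model names, and rates are all stand-ins for whatever your own workload looks like:

```python
# Hypothetical router: task types and per-million-token input rates
# are illustrative, not real pricing.
CHEAP, STRONG = "haiku", "sonnet"
LIGHT_TASKS = {"format", "validate", "transform"}
RATE = {"haiku": 0.25, "sonnet": 3.00}

def pick_model(task_type: str) -> str:
    """Route simple mechanical tasks to the cheap model,
    everything else to the strong one."""
    return CHEAP if task_type in LIGHT_TASKS else STRONG

def input_cost(calls):
    """calls: list of (task_type, input_tokens) pairs."""
    return sum(tokens * RATE[pick_model(task)] / 1_000_000
               for task, tokens in calls)

workload = [("format", 50_000), ("validate", 30_000), ("build-feature", 40_000)]
routed = input_cost(workload)
all_strong = sum(t * RATE[STRONG] / 1_000_000 for _, t in workload)
print(f"routed ${routed:.3f} vs all-Sonnet ${all_strong:.3f}")
```

With this toy workload the routed version is well under half the all-Sonnet cost, which is roughly the shape of the 40% savings I saw in practice.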

What actual monitoring looks like

I check my token usage the same way I check my git diffs — regularly, not obsessively, but with intention.

The key insight is that monitoring shouldn't require effort. If you have to go somewhere to check it, you won't. The visibility needs to be ambient — always there, not intrusive.

Menu bar for live tracking. Provider dashboards for weekly reviews. That's the whole stack.

The spreadsheet phase was necessary because it forced me to pay attention, even if the data was garbage. What replaced it is better because it's automatic — the numbers are just there, and over time you develop intuitions about what's normal and what's a red flag.

The meta-lesson

AI costs are weird because they feel like they should be predictable (it's just API calls!) but they're actually highly variable based on how you're working, what you're building, and how carefully you're thinking.

You can't optimize what you're not measuring. And you can't sustain measurement that requires manual effort.

Find the lowest-friction way to make costs visible in your actual workflow, not in a separate dashboard you have to remember to check. That's the whole game.
