The hidden cost of GPT-4o: what every SaaS founder should know about per-user LLM spend
So you're running a SaaS that leans on an LLM. You check your OpenAI bill at the end of the month, it's a few hundred bucks, you shrug and move on. As long as it's not five figures, who cares, right?
Wrong. That total is hiding a nasty secret: you're probably losing money on some of your users.
I'm not talking about the obvious free-tier leeches. I'm talking about paying customers who are costing you more in API calls than they're giving you in subscription fees. You're literally paying for them to use your product.
The problem with averages
Let's do some quick, dirty math. GPT-4o pricing settled at around $3/1M tokens for input and $10/1M for output. It's cheap, but it's not free.
Say you have a summarization feature. A user pastes in 50,000 tokens of text (around 37.5k words) and gets a 1,000 token summary back.
• Input cost: 50,000 / 1,000,000 * $3.00 = $0.15
• Output cost: 1,000 / 1,000,000 * $10.00 = $0.01
• Total cost for one summary: $0.16
If a user on a $19/mo plan does this just four times a day, every day, their usage looks like this:
• Daily cost: $0.16 * 4 = $0.64
• Monthly cost: $0.64 * 30 = $19.20
You just lost twenty cents on that customer. And that's one feature. What if your app is a chatbot? What if they're running complex agentic workflows? It's easy to see how a single "power user" can quietly burn through their subscription fee and start eating into your margins.
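The arithmetic above is easy to turn into a helper you can reuse anywhere. A minimal sketch, using the rough GPT-4o prices quoted in this post (not live rates, so verify before depending on them):

```javascript
// Estimated GPT-4o pricing from this post -- check current rates before relying on these.
const INPUT_PRICE_PER_M = 3.0;   // $ per 1M input tokens
const OUTPUT_PRICE_PER_M = 10.0; // $ per 1M output tokens

function estimateCost(promptTokens, completionTokens) {
  const inputCost = (promptTokens / 1_000_000) * INPUT_PRICE_PER_M;
  const outputCost = (completionTokens / 1_000_000) * OUTPUT_PRICE_PER_M;
  return inputCost + outputCost;
}

// One 50k-token summary with a 1k-token output:
console.log(estimateCost(50_000, 1_000).toFixed(2)); // "0.16"
```

Run it against your own typical request sizes and the per-customer math stops being abstract.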
Your monthly bill averages this out. You see the total, you see your total MRR, and if one is bigger than the other, you think you're fine. But you're flying blind. You have no idea which customers are profitable and which are financial dead weight.
You can't fix what you can't see
The real issue is attribution. The OpenAI invoice is just a number. It doesn't tell you that customer-123 on the Pro plan cost you $45 last month while customer-456 cost you $1.50. Without that breakdown, you can't make smart decisions.
• You can't identify users who need to be moved to a higher tier.
• You can't set fair rate limits.
• You can't detect abuse.
• You can't accurately price your service.
You're just guessing.
To give you a clearer picture, let's look at how the main providers stack up. Prices are always in flux, but as of early 2026, here's the landscape for the flagship models per million tokens:
Model                          Input cost / 1M tokens    Output cost / 1M tokens
OpenAI GPT-4o                  ~$3.00                    ~$10.00
Anthropic Claude 3.5 Sonnet    ~$3.00                    ~$15.00
Google Gemini 1.5 Pro          ~$3.50                    ~$10.50
As you can see, output costs for a model like Claude 3.5 Sonnet are 50% higher than for GPT-4o. If your application is write-heavy (generating long reports, articles, etc.), that difference will show up on your bill. Without per-user tracking, you'd have no idea if a profitable GPT-4o user would become a loss-leader on a different model.
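To see how much the model choice matters for your particular workload, you can run your monthly token volumes through each provider's rates. A quick sketch using the rough figures from the table above (verify against current price pages before acting on them):

```javascript
// Rough prices from the table above ($ per 1M tokens) -- verify before use.
const PRICING = {
  "gpt-4o":            { input: 3.0, output: 10.0 },
  "claude-3.5-sonnet": { input: 3.0, output: 15.0 },
  "gemini-1.5-pro":    { input: 3.5, output: 10.5 },
};

function monthlyCost(model, promptTokens, completionTokens) {
  const p = PRICING[model];
  return (promptTokens / 1e6) * p.input + (completionTokens / 1e6) * p.output;
}

// A write-heavy month: 2M input tokens, 5M output tokens.
for (const model of Object.keys(PRICING)) {
  console.log(model, "$" + monthlyCost(model, 2e6, 5e6).toFixed(2));
}
```

For that write-heavy example, the same workload costs $56 on GPT-4o and $81 on Claude 3.5 Sonnet: the output-price gap dominates.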
4 ways to stop the bleeding
Okay, so tracking is the first step. But once you can see the problem, how do you fix it? Here are a few practical strategies. This isn't rocket science, but it's amazing how many startups ignore the basics.
1. Strategic Rate Limiting
This is the simplest tool in your arsenal. Don't offer an unlimited buffet. Set generous but firm limits based on your tiers: a free user might get 10 complex summaries per day, while a Pro user gets 100. This prevents a single user from running up a massive bill, accidentally or maliciously.
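A per-tier daily limit is a few lines of code. A minimal in-memory sketch (in production you'd back the counter with Redis or your database; the tier limits here are the illustrative numbers from this post, not a real plan):

```javascript
// Illustrative per-tier daily limits -- swap the Map for Redis in production.
const DAILY_LIMITS = { free: 10, pro: 100 };
const usage = new Map(); // key: `${customerId}:${date}` -> request count

function allowRequest(customerId, tier, today = new Date().toISOString().slice(0, 10)) {
  const key = `${customerId}:${today}`;
  const count = usage.get(key) ?? 0;
  if (count >= DAILY_LIMITS[tier]) return false; // over the limit: reject before calling the API
  usage.set(key, count + 1);
  return true;
}
```

Check `allowRequest` before every LLM call, and the eleventh free-tier summary of the day gets rejected instead of billed to you.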
2. Introduce Usage-Based Tiers
Flat-rate subscriptions are simple, but they're a poor fit for variable costs like LLM APIs. A better model is to include a generous token allowance with each plan (e.g., 5 million tokens/month for $19) and then charge for overages. This ensures your power users pay for what they use, keeping your business profitable.
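The overage math is trivial once you track usage. A sketch using the $19 / 5M-token plan from above; the $4-per-extra-million overage rate is a made-up example, not a real price:

```javascript
// Illustrative plan: 5M tokens included for $19, overage billed per extra 1M tokens.
// The $4/1M overage rate is an assumption for this example.
function monthlyCharge(tokensUsed, { base = 19, included = 5_000_000, overagePerM = 4 } = {}) {
  const overageTokens = Math.max(0, tokensUsed - included);
  return base + (overageTokens / 1_000_000) * overagePerM;
}

console.log(monthlyCharge(3_000_000)); // 19  (within the allowance)
console.log(monthlyCharge(8_000_000)); // 31  (3M tokens over)
```

The point: the heavy user who burns 8M tokens now pays $31 instead of quietly costing you money at $19 flat.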
3. Implement Smart Caching
Is your tool summarizing popular articles? Are multiple users asking your chatbot the same question? Cache the results. Hitting a database is orders of magnitude cheaper than hitting an LLM API. A simple Redis cache layer can save a surprising amount of money on redundant queries.
4. Use Cheaper Models for Simpler Tasks
Not every task needs a flagship model. For things like text classification, basic formatting, or simple Q&A, a cheaper and faster model like Claude 3 Haiku or Gemini 1.5 Flash can do the job for a fraction of the cost. Route tasks intelligently based on complexity: don't bring a chainsaw to a job a scalpel can handle.
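Routing can start out embarrassingly simple. A naive sketch that sends short, simple task types to a cheap model and everything else to the flagship; the task names and the 2,000-token threshold are arbitrary assumptions for illustration:

```javascript
// Naive complexity-based router. Task names and the token threshold are
// illustrative assumptions -- tune them for your own workload.
const CHEAP_TASKS = new Set(["classify", "format", "simple-qa"]);

function pickModel(taskType, promptTokens) {
  if (CHEAP_TASKS.has(taskType) && promptTokens < 2_000) {
    return "gemini-1.5-flash"; // or claude-3-haiku
  }
  return "gpt-4o";
}

console.log(pickModel("classify", 500));     // "gemini-1.5-flash"
console.log(pickModel("summarize", 50_000)); // "gpt-4o"
```

Even a crude rule like this can cut the bill noticeably, because the cheap models handle the high-volume, low-stakes calls.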
A simple logging wrapper (example)
You don't need a complex system to get started. Here’s a conceptual JavaScript snippet showing how you could wrap your OpenAI calls to log usage per customer.
```javascript
// This is a simplified example, not production code.
async function callOpenAIWithCostTracking(prompt, customerId) {
  // Your existing OpenAI API call logic
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });

  const usage = response.usage; // { prompt_tokens: 123, completion_tokens: 456, ... }
  const inputCost = (usage.prompt_tokens / 1_000_000) * 3.00;      // GPT-4o input pricing
  const outputCost = (usage.completion_tokens / 1_000_000) * 10.00; // GPT-4o output pricing
  const totalCost = inputCost + outputCost;

  // Log it to your database
  console.log(`LOGGING: Customer ${customerId} request cost $${totalCost.toFixed(4)}`);
  // await db.logLLMUsage({
  //   customerId: customerId,
  //   model: 'gpt-4o',
  //   promptTokens: usage.prompt_tokens,
  //   completionTokens: usage.completion_tokens,
  //   cost: totalCost
  // });

  return response.choices[0].message.content;
}

// When a user makes a request:
// const result = await callOpenAIWithCostTracking("Summarize this for me...", "customer-123");
```
Start tracking today
Building your own logging wrapper is a solid first step, but maintaining it at scale gets annoying fast. fwiw, I use a simple open-source tool called LLMeter that does exactly this — it wraps the provider APIs and logs costs per user to a dashboard, no proxying required. Might be worth a look if you're in the same boat and don't want to build the tracking yourself.
But honestly, whether you build it, use a tool, or just run a script, the important thing is to start tracking your per-user LLM spend today. Your bottom line will thank you.