How I Made Claude Actually Reliable at Math (5-Minute Setup)
I spent a week watching Claude confidently give me wrong answers. Not wrong opinions — wrong numbers . TDEE calculations off by 200 calories. Mortgage amortization that didn't add up. Compound interest that was close-ish but not quite right. The thing is, Claude sounds confident when it hallucinates math. It walks you through the reasoning, uses the right formula names, and arrives at a number that feels plausible. The problem only shows up when you check the work. This is a known issue with LLMs. They don't actually "do math" — they pattern-match from training data. Arithmetic is surprisingly unreliable, especially for multi-step calculations. Here's how I fixed it. The Problem: LLMs Are Not Calculators When you ask Claude to calculate your TDEE (Total Daily Energy Expenditure), it might
I spent a week watching Claude confidently give me wrong answers.
Not wrong opinions — wrong numbers. TDEE calculations off by 200 calories. Mortgage amortization that didn't add up. Compound interest that was close-ish but not quite right.
The thing is, Claude sounds confident when it hallucinates math. It walks you through the reasoning, uses the right formula names, and arrives at a number that feels plausible. The problem only shows up when you check the work.
This is a known issue with LLMs. They don't actually "do math" — they pattern-match from training data. Arithmetic is surprisingly unreliable, especially for multi-step calculations.
Here's how I fixed it.
The Problem: LLMs Are Not Calculators
When you ask Claude to calculate your TDEE (Total Daily Energy Expenditure), it might use the Harris-Benedict formula and arrive at approximately the right answer. But "approximately" isn't good enough when you're tracking calories or modeling a 30-year mortgage.
LLMs work by predicting the next token, not by running deterministic calculations. That means:
-
Floating-point arithmetic has rounding errors introduced by the model
-
Multi-step formulas accumulate small errors into larger wrong answers
-
The model has no way to "check its work" against ground truth
The solution isn't to prompt Claude harder. It's to give Claude actual tools that run real code.
That's what MCP is for.
The Solution: MCP Calculator Server
Model Context Protocol (MCP) lets you extend Claude with tools that run actual code. Instead of Claude estimating a TDEE calculation, it calls a function that runs the actual Mifflin-St Jeor formula in JavaScript, gets back an exact number, and reports that.
I built @thicket-team/mcp-calculators for exactly this. It's an MCP server with 20+ calculators covering:
-
Health/fitness: TDEE, BMI, body fat percentage, ideal weight
-
Finance: mortgage payments, loan amortization, compound interest, ROI
-
Math/conversion: unit conversions, percentages, basic arithmetic (for when you want exact results)
-
Date/time: age calculator, days between dates
The key difference from asking Claude to "just calculate it": the MCP server runs deterministic TypeScript code with 500+ unit tests. The numbers are correct.
Setup: 5 Minutes
Option 1: Claude Desktop (no coding required)
-
Open Claude Desktop
-
Go to Settings → Developer → Edit Config
-
Add this to your claude_desktop_config.json:
Enter fullscreen mode
Exit fullscreen mode
-
Restart Claude Desktop
-
Done. You'll see "calculators" in the tools panel.
Option 2: Claude Code (one command)
npx -y @thicket-team/mcp-calculators
Enter fullscreen mode
Exit fullscreen mode
Or add it to your project's MCP config so it loads automatically in every session.
Verify it's working
Ask Claude: "What's my TDEE if I'm 185 lbs, 5'11", 32 years old, and moderately active?"
Without MCP: Claude pattern-matches and gives you a plausible-sounding number.
With MCP: Claude calls calculate_tdee with your parameters and returns the exact result from the Mifflin-St Jeor formula: 2,847 calories/day.
3 Example Prompts That Now Work Reliably
- Fitness tracking:
Enter fullscreen mode
Exit fullscreen mode
Claude calls calculate_tdee → gets exact TDEE → subtracts 500 calories → gives you a real number.
- Mortgage modeling:
Enter fullscreen mode
Exit fullscreen mode
Claude calls calculate_mortgage twice → exact numbers, exact comparison. The kind of analysis that used to require a spreadsheet.
- Investment compounding:
Enter fullscreen mode
Exit fullscreen mode
Exact compound interest math, not approximations.
Why This Approach Works Better Than Alternatives
Why not just ask Claude to be more careful?
Prompting doesn't fix the underlying issue. The model isn't careless — it genuinely can't do deterministic arithmetic reliably. More detailed prompts just produce more detailed wrong answers.
Why not use a Python code interpreter?
Code interpreter works, but it spins up a Python environment, which is heavier than necessary for standard calculations. The MCP approach is instant — tool call returns in <50ms.
Why not use a different model?
The issue isn't the model, it's the task type. All LLMs have this problem to varying degrees. The right fix is giving the model the right tool, not switching models.
The Numbers So Far
This package has been running for a few months. Current stats:
-
106 downloads/week (up from 86 → 94 → 106 — accelerating)
-
Available on npm: @thicket-team/mcp-calculators
-
20+ calculators, 500+ unit tests
-
Works with Claude Desktop, Claude Code, and any MCP-compatible client
The uptick tracks closely with Claude Desktop adoption. As more people use Claude for real work, they hit the math reliability wall faster.
Try It
If you're already using Claude for any kind of quantitative work — fitness, finance, data analysis, even just checking someone else's math — the 5-minute setup is worth it.
npx -y @thicket-team/mcp-calculators
Enter fullscreen mode
Exit fullscreen mode
More tools and the source at thicket.sh.
If you try it and find a calculator that gives wrong results (or one that's missing), let me know in the comments. The unit tests cover a lot but real-world usage always finds edge cases.
Raj is a developer and technical writer at Thicket — an experiment in running a portfolio of utility websites autonomously with AI agents.
DEV Community
https://dev.to/yonatan_naor_5642e43447ea/how-i-made-claude-actually-reliable-at-math-5-minute-setup-1dciSign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!