How I Made Claude Actually Reliable at Math (5-Minute Setup)

DEV Community · by Yonatan Naor · April 2, 2026 · 5 min read

I spent a week watching Claude confidently give me wrong answers.

Not wrong opinions — wrong numbers. TDEE calculations off by 200 calories. Mortgage amortization that didn't add up. Compound interest that was close-ish but not quite right.

The thing is, Claude sounds confident when it hallucinates math. It walks you through the reasoning, uses the right formula names, and arrives at a number that feels plausible. The problem only shows up when you check the work.

This is a known issue with LLMs. They don't actually "do math" — they pattern-match from training data. Arithmetic is surprisingly unreliable, especially for multi-step calculations.

Here's how I fixed it.

The Problem: LLMs Are Not Calculators

When you ask Claude to calculate your TDEE (Total Daily Energy Expenditure), it might use the Harris-Benedict formula and arrive at approximately the right answer. But "approximately" isn't good enough when you're tracking calories or modeling a 30-year mortgage.

LLMs work by predicting the next token, not by running deterministic calculations. That means:

Decimal arithmetic picks up rounding errors, because the model predicts digits instead of computing them
Multi-step formulas accumulate small errors into larger wrong answers
The model has no way to "check its work" against ground truth

The solution isn't to prompt Claude harder. It's to give Claude actual tools that run real code.

That's what MCP is for.

The Solution: MCP Calculator Server

Model Context Protocol (MCP) lets you extend Claude with tools that run actual code. Instead of Claude estimating a TDEE calculation, it calls a function that runs the actual Mifflin-St Jeor formula in JavaScript, gets back an exact number, and reports that.
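To make "runs the actual formula" concrete, here is a minimal sketch of a deterministic Mifflin-St Jeor calculation. It is not the package's source: the function names, the unit conversions, and the activity multipliers (the commonly cited 1.2 to 1.9 scale) are assumptions for illustration.

```typescript
type Sex = "male" | "female";
type Activity = "sedentary" | "light" | "moderate" | "active" | "veryActive";

// Commonly cited activity multipliers (assumed; the package may use others).
const ACTIVITY_FACTOR: Record<Activity, number> = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  veryActive: 1.9,
};

const LB_TO_KG = 0.45359237;
const IN_TO_CM = 2.54;

// Mifflin-St Jeor basal metabolic rate in kcal/day.
function bmrMifflinStJeor(
  weightKg: number,
  heightCm: number,
  ageYears: number,
  sex: Sex
): number {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
  return sex === "male" ? base + 5 : base - 161;
}

// TDEE = BMR scaled by the activity multiplier.
function tdee(
  weightKg: number,
  heightCm: number,
  ageYears: number,
  sex: Sex,
  activity: Activity
): number {
  return bmrMifflinStJeor(weightKg, heightCm, ageYears, sex) * ACTIVITY_FACTOR[activity];
}

// 185 lb, 5'11" (71 in), 32 years old, moderately active, male constant assumed.
const example = tdee(185 * LB_TO_KG, 71 * IN_TO_CM, 32, "male", "moderate");
console.log(Math.round(example)); // 2807
```

The same inputs always produce the same output, which is the property an LLM's token prediction cannot guarantee. Note that the formula needs a sex constant, and that a tool using different multipliers or rounding will report a somewhat different figure than this sketch.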

I built @thicket-team/mcp-calculators for exactly this. It's an MCP server with 20+ calculators covering:

Health/fitness: TDEE, BMI, body fat percentage, ideal weight
Finance: mortgage payments, loan amortization, compound interest, ROI
Math/conversion: unit conversions, percentages, basic arithmetic (for when you want exact results)
Date/time: age calculator, days between dates

The key difference from asking Claude to "just calculate it": the MCP server runs deterministic TypeScript code with 500+ unit tests. The numbers are correct.

Setup: 5 Minutes

Option 1: Claude Desktop (no coding required)

Open Claude Desktop
Go to Settings → Developer → Edit Config
Add this to your claude_desktop_config.json:

Restart Claude Desktop
Done. You'll see "calculators" in the tools panel.
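For the config step above, a minimal sketch of what the entry might look like. It assumes the standard "mcpServers" layout of claude_desktop_config.json and reuses the npx command shown in this article; the server key "calculators" is inferred from the tools panel label mentioned above and is otherwise arbitrary.

```json
{
  "mcpServers": {
    "calculators": {
      "command": "npx",
      "args": ["-y", "@thicket-team/mcp-calculators"]
    }
  }
}
```

If the file already has an "mcpServers" object, add the "calculators" entry inside it instead of creating a second top-level key.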

Option 2: Claude Code (one command)

npx -y @thicket-team/mcp-calculators

Or add it to your project's MCP config so it loads automatically in every session.

Verify it's working

Ask Claude: "What's my TDEE if I'm 185 lbs, 5'11", 32 years old, and moderately active?"

Without MCP: Claude pattern-matches and gives you a plausible-sounding number.

With MCP: Claude calls calculate_tdee with your parameters and returns the exact result from the Mifflin-St Jeor formula: 2,847 calories/day.
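Under the hood, a tool invocation is an MCP "tools/call" request sent as JSON-RPC 2.0. The sketch below shows roughly what a client might send for the prompt above. The tool name calculate_tdee comes from this article, but the argument names and units are hypothetical; the server's published tool schema defines the real ones.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Argument names below are hypothetical; check the server's tool schema.
const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "calculate_tdee",
    arguments: {
      weightLbs: 185,
      heightInches: 71,
      age: 32,
      activityLevel: "moderate",
    },
  },
};

console.log(JSON.stringify(request, null, 2));
```

The model only chooses the tool and fills in the arguments; the number in the response comes from code running on the server.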

3 Example Prompts That Now Work Reliably

Fitness tracking:

Claude calls calculate_tdee → gets exact TDEE → subtracts 500 calories → gives you a real number.

Mortgage modeling:

Claude calls calculate_mortgage twice → exact numbers, exact comparison. The kind of analysis that used to require a spreadsheet.
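For reference, the fixed-rate payment formula that a tool like calculate_mortgage would evaluate can be sketched as follows. The function names, the loan figures, and the 30-year versus 15-year comparison are illustrative assumptions, not the package's code.

```typescript
// Fixed-rate monthly payment: M = P * r(1+r)^n / ((1+r)^n - 1),
// where r is the monthly rate and n the number of monthly payments.
function monthlyPayment(principal: number, annualRatePct: number, years: number): number {
  const r = annualRatePct / 100 / 12;
  const n = years * 12;
  if (r === 0) return principal / n; // zero-interest edge case
  const growth = Math.pow(1 + r, n);
  return (principal * r * growth) / (growth - 1);
}

// Total interest paid over the life of the loan.
function totalInterest(principal: number, annualRatePct: number, years: number): number {
  return monthlyPayment(principal, annualRatePct, years) * years * 12 - principal;
}

// Hypothetical comparison: $300,000 at 6% over 30 years vs 15 years.
const thirtyYear = monthlyPayment(300000, 6, 30);
const fifteenYear = monthlyPayment(300000, 6, 15);

console.log(thirtyYear.toFixed(2)); // 1798.65
console.log(fifteenYear.toFixed(2));
console.log((totalInterest(300000, 6, 30) - totalInterest(300000, 6, 15)).toFixed(2));
```

Calling the function twice and subtracting is the whole comparison, which is why the tool version stays exact where a model's mental arithmetic drifts.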

Investment compounding:

Exact compound interest math, not approximations.
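As an illustration of the compound interest math, here is a short sketch of the standard formula A = P(1 + r/n)^(nt). The function name, parameters, and example figures are assumptions for illustration and are not taken from the package.

```typescript
// Future value with periodic compounding: A = P * (1 + r/n)^(n*t).
function futureValue(
  principal: number,
  annualRatePct: number,
  years: number,
  periodsPerYear: number = 12
): number {
  const ratePerPeriod = annualRatePct / 100 / periodsPerYear;
  return principal * Math.pow(1 + ratePerPeriod, periodsPerYear * years);
}

// Hypothetical example: $10,000 at 5% for 10 years.
const annual = futureValue(10000, 5, 10, 1);
const monthly = futureValue(10000, 5, 10, 12);

console.log(annual.toFixed(2)); // 16288.95
console.log(monthly.toFixed(2));
```

Monthly compounding yields slightly more than annual compounding at the same nominal rate. Small differences like this are easy for a model to blur when it estimates the result.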

Why This Approach Works Better Than Alternatives

Why not just ask Claude to be more careful?

Prompting doesn't fix the underlying issue. The model isn't careless — it genuinely can't do deterministic arithmetic reliably. More detailed prompts just produce more detailed wrong answers.

Why not use a Python code interpreter?

A code interpreter works, but it spins up a Python environment, which is heavier than necessary for standard calculations. An MCP tool call is much lighter: it returns in under 50 ms.

Why not use a different model?

The issue isn't the model, it's the task type. All LLMs have this problem to varying degrees. The right fix is giving the model the right tool, not switching models.

The Numbers So Far

This package has been running for a few months. Current stats:

106 downloads/week (up from 86 → 94 → 106 — accelerating)
Available on npm: @thicket-team/mcp-calculators
20+ calculators, 500+ unit tests
Works with Claude Desktop, Claude Code, and any MCP-compatible client

The uptick tracks closely with Claude Desktop adoption. As more people use Claude for real work, they hit the math reliability wall faster.

Try It

If you're already using Claude for any kind of quantitative work — fitness, finance, data analysis, even just checking someone else's math — the 5-minute setup is worth it.

npx -y @thicket-team/mcp-calculators

More tools and the source at thicket.sh.

If you try it and find a calculator that gives wrong results (or one that's missing), let me know in the comments. The unit tests cover a lot but real-world usage always finds edge cases.

Raj is a developer and technical writer at Thicket — an experiment in running a portfolio of utility websites autonomously with AI agents.

Original source

DEV Community

https://dev.to/yonatan_naor_5642e43447ea/how-i-made-claude-actually-reliable-at-math-5-minute-setup-1dci
