
How I Made Claude Actually Reliable at Math (5-Minute Setup)

DEV Community · by Yonatan Naor · April 2, 2026 · 5 min read

I spent a week watching Claude confidently give me wrong answers.

Not wrong opinions — wrong numbers. TDEE calculations off by 200 calories. Mortgage amortization that didn't add up. Compound interest that was close-ish but not quite right.

The thing is, Claude sounds confident when it hallucinates math. It walks you through the reasoning, uses the right formula names, and arrives at a number that feels plausible. The problem only shows up when you check the work.

This is a known issue with LLMs. They don't actually "do math" — they pattern-match from training data. Arithmetic is surprisingly unreliable, especially for multi-step calculations.

Here's how I fixed it.

The Problem: LLMs Are Not Calculators

When you ask Claude to calculate your TDEE (Total Daily Energy Expenditure), it might use the Harris-Benedict formula and arrive at approximately the right answer. But "approximately" isn't good enough when you're tracking calories or modeling a 30-year mortgage.

LLMs work by predicting the next token, not by running deterministic calculations. That means:

  • Decimal arithmetic picks up rounding errors, because the model generates digits as tokens instead of computing them

  • Multi-step formulas accumulate small errors into larger wrong answers

  • The model has no way to "check its work" against ground truth
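To make the second bullet concrete, here is a minimal sketch of how a small per-step error grows over a multi-step calculation. Truncating to whole dollars each year is only a stand-in for a model carrying approximate intermediate values; the function names and the $1,000 at 7% inputs are illustrative assumptions, not taken from any real Claude transcript.

```typescript
// Full-precision compound growth: principal * (1 + rate)^years.
function growExact(principal: number, rate: number, years: number): number {
  return principal * Math.pow(1 + rate, years);
}

// Same calculation, but the cents are dropped after every yearly step.
// This imitates carrying "close enough" intermediate values forward.
function growWithTruncation(principal: number, rate: number, years: number): number {
  let balance = principal;
  for (let year = 0; year < years; year++) {
    balance = Math.floor(balance * (1 + rate));
  }
  return balance;
}

const exact = growExact(1000, 0.07, 30);
const truncated = growWithTruncation(1000, 0.07, 30);
const drift = exact - truncated;

// Each step loses less than a dollar, yet the final gap is several dollars.
console.log(exact.toFixed(2), truncated, drift.toFixed(2));
```

Each truncation is smaller than one dollar, but every lost fraction is itself compounded by the remaining steps, which is why the final gap is larger than any single error.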

The solution isn't to prompt Claude harder. It's to give Claude actual tools that run real code.

That's what MCP is for.

The Solution: MCP Calculator Server

Model Context Protocol (MCP) lets you extend Claude with tools that run actual code. Instead of Claude estimating a TDEE calculation, it calls a function that runs the actual Mifflin-St Jeor formula in JavaScript, gets back an exact number, and reports that.
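As a rough idea of what such a function can look like, here is a minimal sketch of a Mifflin-St Jeor TDEE calculation. It is not the package's source: the names `calculateTdee` and `mifflinStJeorBmr`, the input shape, and the activity multipliers (the commonly cited 1.2 to 1.9 set) are assumptions, and the package may use different values.

```typescript
type Sex = "male" | "female";
type ActivityLevel = "sedentary" | "light" | "moderate" | "active" | "very_active";

// Commonly cited activity multipliers (assumed; the package may differ).
const ACTIVITY_MULTIPLIER: Record<ActivityLevel, number> = {
  sedentary: 1.2,
  light: 1.375,
  moderate: 1.55,
  active: 1.725,
  very_active: 1.9,
};

interface TdeeInput {
  weightKg: number;
  heightCm: number;
  ageYears: number;
  sex: Sex;
  activity: ActivityLevel;
}

// Mifflin-St Jeor basal metabolic rate in kcal/day:
//   10 * weight(kg) + 6.25 * height(cm) - 5 * age + 5   (male)
//   10 * weight(kg) + 6.25 * height(cm) - 5 * age - 161 (female)
function mifflinStJeorBmr({ weightKg, heightCm, ageYears, sex }: TdeeInput): number {
  const base = 10 * weightKg + 6.25 * heightCm - 5 * ageYears;
  return sex === "male" ? base + 5 : base - 161;
}

// TDEE is the BMR scaled by the activity multiplier.
function calculateTdee(input: TdeeInput): number {
  return mifflinStJeorBmr(input) * ACTIVITY_MULTIPLIER[input.activity];
}

const tdee = calculateTdee({
  weightKg: 80,
  heightCm: 180,
  ageYears: 30,
  sex: "male",
  activity: "moderate",
});
console.log(Math.round(tdee)); // 2759
```

Note that the formula needs sex as an input, so a tool built on it has to either ask for it or assume a default.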

I built @thicket-team/mcp-calculators for exactly this. It's an MCP server with 20+ calculators covering:

  • Health/fitness: TDEE, BMI, body fat percentage, ideal weight

  • Finance: mortgage payments, loan amortization, compound interest, ROI

  • Math/conversion: unit conversions, percentages, basic arithmetic (for when you want exact results)

  • Date/time: age calculator, days between dates

The key difference from asking Claude to "just calculate it": the MCP server runs deterministic TypeScript code with 500+ unit tests. The numbers are correct.

Setup: 5 Minutes

Option 1: Claude Desktop (no coding required)

  • Open Claude Desktop

  • Go to Settings → Developer → Edit Config

  • Add this to your claude_desktop_config.json:


  • Restart Claude Desktop

  • Done. You'll see "calculators" in the tools panel.
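For the config step above: Claude Desktop reads MCP servers from an `mcpServers` object in `claude_desktop_config.json`. The sketch below builds one plausible entry for this package and prints it as JSON. The key name `calculators` is an assumption based on the tools-panel label, and the command simply mirrors the `npx` invocation used elsewhere in this article, so treat it as a starting point and not as the package's official snippet.

```typescript
// Shape of one MCP server entry in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

const config: ClaudeDesktopConfig = {
  mcpServers: {
    // "calculators" is an assumed key, chosen to match the tools-panel label.
    calculators: {
      command: "npx",
      args: ["-y", "@thicket-team/mcp-calculators"],
    },
  },
};

// Print JSON that can be pasted into, or merged with, the existing config file.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```

If the file already has an `mcpServers` object, add the new key inside it instead of replacing the whole file.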

Option 2: Claude Code (one command)

npx -y @thicket-team/mcp-calculators


Or add it to your project's MCP config so it loads automatically in every session.

Verify it's working

Ask Claude: "What's my TDEE if I'm 185 lbs, 5'11", 32 years old, and moderately active?"

Without MCP: Claude pattern-matches and gives you a plausible-sounding number.

With MCP: Claude calls calculate_tdee with your parameters and returns the result of the Mifflin-St Jeor formula; in my run that was 2,847 calories/day.

3 Example Prompts That Now Work Reliably

  1. Fitness tracking:


Claude calls calculate_tdee → gets exact TDEE → subtracts 500 calories → gives you a real number.

  2. Mortgage modeling:


Claude calls calculate_mortgage twice → exact numbers, exact comparison. The kind of analysis that used to require a spreadsheet.
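For a sense of what a deterministic mortgage tool computes, here is a minimal sketch of the standard fixed-rate amortization formula. `monthlyPayment` and `totalInterest` are hypothetical names, not the package's `calculate_mortgage` implementation, and the $300,000 at 6% inputs are illustrative.

```typescript
// Standard fixed-rate amortization formula:
//   M = P * r * (1 + r)^n / ((1 + r)^n - 1)
// where P is the principal, r the monthly rate, and n the number of payments.
function monthlyPayment(principal: number, annualRatePct: number, years: number): number {
  const n = years * 12;
  const r = annualRatePct / 100 / 12;
  if (r === 0) return principal / n; // zero-interest loan: plain division
  const growth = Math.pow(1 + r, n);
  return (principal * r * growth) / (growth - 1);
}

// Total interest paid over the life of the loan.
function totalInterest(principal: number, annualRatePct: number, years: number): number {
  return monthlyPayment(principal, annualRatePct, years) * years * 12 - principal;
}

// Compare a 30-year and a 15-year term on the same loan.
const thirty = monthlyPayment(300_000, 6, 30);
const fifteen = monthlyPayment(300_000, 6, 15);
console.log(thirty.toFixed(2));  // 1798.65
console.log(fifteen.toFixed(2)); // 2531.57
console.log((totalInterest(300_000, 6, 30) - totalInterest(300_000, 6, 15)).toFixed(2));
```

The last line prints how much extra interest the longer term costs, which is the comparison a two-call prompt like this one is after.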

  3. Investment compounding:


Exact compound interest math, not approximations.
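The underlying formula is short enough to sketch. This is an illustration with a hypothetical `compoundBalance` function and made-up inputs ($10,000 at 5% for 10 years), not the package's code.

```typescript
// Compound interest: A = P * (1 + r / n)^(n * t)
// P = principal, r = annual rate as a decimal,
// n = compounding periods per year, t = years.
function compoundBalance(
  principal: number,
  annualRate: number,
  periodsPerYear: number,
  years: number,
): number {
  return principal * Math.pow(1 + annualRate / periodsPerYear, periodsPerYear * years);
}

const annual = compoundBalance(10_000, 0.05, 1, 10);
const monthly = compoundBalance(10_000, 0.05, 12, 10);
console.log(annual.toFixed(2));  // 16288.95
console.log(monthly.toFixed(2)); // 16470.09
```

The gap between annual and monthly compounding is the kind of detail that gets lost when a number is estimated instead of computed.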

Why This Approach Works Better Than Alternatives

Why not just ask Claude to be more careful?

Prompting doesn't fix the underlying issue. The model isn't careless — it genuinely can't do deterministic arithmetic reliably. More detailed prompts just produce more detailed wrong answers.

Why not use a Python code interpreter?

A code interpreter works, but it spins up a Python environment, which is heavier than necessary for standard calculations. The MCP approach is lighter: a tool call returns in under 50 ms.

Why not use a different model?

The issue isn't the model, it's the task type. All LLMs have this problem to varying degrees. The right fix is giving the model the right tool, not switching models.

The Numbers So Far

This package has been running for a few months. Current stats:

  • 106 downloads/week, up from 86 and then 94 (the increase grew from 8 to 12, so growth is accelerating)

  • Available on npm: @thicket-team/mcp-calculators

  • 20+ calculators, 500+ unit tests

  • Works with Claude Desktop, Claude Code, and any MCP-compatible client

The uptick tracks closely with Claude Desktop adoption. As more people use Claude for real work, they hit the math reliability wall faster.

Try It

If you're already using Claude for any kind of quantitative work — fitness, finance, data analysis, even just checking someone else's math — the 5-minute setup is worth it.

npx -y @thicket-team/mcp-calculators


More tools and the source at thicket.sh.

If you try it and find a calculator that gives wrong results (or one that's missing), let me know in the comments. The unit tests cover a lot but real-world usage always finds edge cases.

Raj is a developer and technical writer at Thicket — an experiment in running a portfolio of utility websites autonomously with AI agents.
