
The 5-Hour Quota, Boris's Tweet, and What the Source Code Actually Reveals

DEV Community - by Jonathan Barazany - April 1, 2026 - 3 min read

Yesterday I published a deep dive into Claude Code's compaction engine. At the end, I made a promise: go deeper on the caching optimizations that happen outside of compaction.

But actually, the caching rabbit hole started before that post - because of a tweet from about ten days ago.

The Tweet That Confused Me

If you're a heavy Claude Code user, you felt the 5-hour usage cap snap shut after Anthropic's two-week promotional window closed. The complaints flooded in. Someone tagged Boris - the engineer who built Claude Code - asking what he planned to do about it.

His answer: improvements are coming to squeeze more out of the current quota.

My first reaction: what can he possibly do? The quota is server-side. It's rate limits and token budgets. There's no client trick that changes how many tokens you're allowed per hour.

That question sat with me. Then yesterday's compaction post led me to look harder at the source - and the answer became obvious.

Cache Hit Ratio Is the Quota

Every message you send to Claude Code costs tokens, but not every token costs the same. Cache hits are discounted significantly. Cache misses cost 1.25x the base input price, because the missed content has to be written to the cache again - you're not just paying full price, you're paying a penalty.

If your cache hit ratio is high, you stretch the same quota dramatically further than someone whose cache keeps busting. The quota doesn't change. What you extract from it does.
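To make the arithmetic concrete, here is a minimal sketch of how the hit ratio changes what the same input costs. The 1.25x miss multiplier comes from the paragraph above; the 0.1x hit multiplier is an assumption standing in for "discounted significantly", and the function name is hypothetical. It also assumes quota is consumed in proportion to billed cost, which is this post's reading rather than a documented rule.

```python
def effective_cost(input_tokens: int, hit_ratio: float,
                   hit_multiplier: float = 0.1,     # assumed discount for cache hits
                   miss_multiplier: float = 1.25    # miss penalty quoted in this post
                   ) -> float:
    """Return the billed cost, in base-token units, for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * hit_ratio
    miss_tokens = input_tokens - hit_tokens
    return hit_tokens * hit_multiplier + miss_tokens * miss_multiplier


if __name__ == "__main__":
    # Compare a session whose cache mostly holds with one that keeps busting.
    for ratio in (0.9, 0.5):
        cost = effective_cost(100_000, ratio)
        print(f"hit ratio {ratio:.0%}: about {cost:,.0f} base-token units")
```

Under these assumed multipliers, 100,000 input tokens bill as roughly 21,500 base-token units at a 90% hit ratio and roughly 67,500 at 50%, so the same quota goes about three times further.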

This is the reframe. When Boris says improvements are coming, he's not talking about changing server limits. He's talking about recovering cache hit ratio - which is the same thing as handing quota back to users.

What Claude Code Already Does About This

When I asked Claude to analyze its own source code, what came back wasn't a simple "we cache the system prompt." It was twelve distinct mechanisms working together, each one plugging a specific leak.

Two stood out - and they reveal how deeply Anthropic thinks about cache economics.

The first solves a combinatorial explosion: five runtime booleans in the system prompt mean 2^5 = 32 possible cache entries, most of which will never get a second hit. Claude Code's fix involves a literal boundary string in the source that splits stable content from dynamic content, with the stable prefix shared globally across every user on Earth.
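A rough sketch of why the split helps, not Claude Code's actual implementation: the marker text, prompt contents, and function names below are all hypothetical. It hashes the prompt with Blake2b two ways, once over the whole prompt and once over only the part before the boundary, and counts how many distinct cache keys five booleans produce in each case.

```python
import hashlib
from itertools import product

# Hypothetical marker text; the real boundary string is not shown in this post.
BOUNDARY = "<<SYSTEM_PROMPT_DYNAMIC_BOUNDARY>>"


def build_prompt(flags: tuple) -> str:
    """Stable instructions first, then the boundary, then per-session flags."""
    stable = "You are a coding assistant. Follow the tool-use rules."
    dynamic = "\n".join(
        f"flag_{i}={'on' if flag else 'off'}" for i, flag in enumerate(flags)
    )
    return stable + BOUNDARY + dynamic


def whole_prompt_key(prompt: str) -> str:
    """Cache key over the entire prompt: any flag change yields a new key."""
    return hashlib.blake2b(prompt.encode("utf-8"), digest_size=16).hexdigest()


def stable_prefix_key(prompt: str) -> str:
    """Cache key over only the content before the boundary."""
    stable, _, _ = prompt.partition(BOUNDARY)
    return hashlib.blake2b(stable.encode("utf-8"), digest_size=16).hexdigest()


# Every combination of five runtime booleans.
prompts = [build_prompt(flags) for flags in product((False, True), repeat=5)]
whole_keys = {whole_prompt_key(p) for p in prompts}
prefix_keys = {stable_prefix_key(p) for p in prompts}

if __name__ == "__main__":
    print(len(whole_keys), "distinct keys when hashing the whole prompt")
    print(len(prefix_keys), "distinct key when hashing only the stable prefix")
```

Hashing the whole prompt gives one key per flag combination, while hashing only the stable prefix gives a single key that every combination shares, which is the property that lets the prefix be reused across sessions.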

The second is even more interesting: a side-channel called cache_edits that surgically removes old tool results from the cached KV store without changing a single byte in the actual message. No cache invalidation. No reprocessing penalty.

But those are just two of twelve mechanisms. The full picture includes a 728-line diagnostic system that treats cache misses as bugs, a function literally named DANGEROUS_uncachedSystemPromptSection(), and a one-sentence prompt rewrite that saved 20K tokens per budget flip.

Read the full source code analysis on my blog →

Here's what you'll find in the full post:

  • How SYSTEM_PROMPT_DYNAMIC_BOUNDARY solves the 2^N cache key explosion with Blake2b prefix hashing

  • The cache_edits side-channel: surgery without invalidation

  • Why there's a function called DANGEROUS_uncachedSystemPromptSection() and what it forces engineers to do

  • The real mechanism behind the /clear warning (it's called "willow" internally)

  • What Boris can actually ship to stretch your quota further

Previously: Claude Code's Compaction Engine: What the Source Code Actually Reveals
