Anthropic says Claude Code's usage drain comes down to peak-hour caps and ballooning contexts
Anthropic explains why Claude Code users have been burning through their limits so fast and shares tips to cut token usage. The article "Anthropic says Claude Code's usage drain comes down to peak-hour caps and ballooning contexts" appeared first on The Decoder.
Apr 3, 2026
Anthropic has looked into complaints from users who were hitting their Claude Code usage limits much faster than expected. According to Anthropic's Lydia Hallie, the two main causes are tighter rate limits during peak hours and sessions whose 1-million-token context windows keep growing. Hallie says Anthropic also fixed some bugs, though none of them led to incorrect billing. The company has additionally shipped efficiency improvements and added in-product pop-ups to keep users informed about their usage.
Hallie recommends using Sonnet 4.6 instead of Opus, since Opus burns through limits roughly twice as fast. She also suggests turning off Extended Thinking when it's not needed, starting fresh sessions instead of continuing old ones, and limiting the context window. Users who still notice unusually high usage should report it through the feedback function.
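In practice, most of these tips map onto commands inside a Claude Code session. As a rough sketch (command names and model aliases can differ between versions, so check `/help` in your own install):

```
# inside an interactive Claude Code session:
/model sonnet   # switch from Opus to Sonnet; Opus burns limits ~2x faster
/compact        # summarize the conversation to shrink a ballooning context
/clear          # start a fresh session instead of continuing an old one
```

Starting a new session with `/clear` is the most reliable way to stop an oversized context from being resent with every turn, since each request is billed on the full context it carries.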
