Claude Code memory: how to survive a 200k context window filling up
If you've used Claude Code for more than a few hours on a big project, you've hit this wall.
You're in the middle of a refactor. Claude is tracking 15 files, your CLAUDE.md, the conversation history, tool call outputs. Then it slows down. Responses get shorter. It starts forgetting things you told it an hour ago.
You're not imagining it. Claude Code's context window is filling up — and there's a specific way to handle it.
What's actually consuming your context
Claude Code tracks several layers of context simultaneously:
```
┌─────────────────────────────────────┐
│ System prompt (CLAUDE.md)     ~2k   │
│ Project context (settings)    ~1k   │
│ Conversation history          fills │
│ Tool call results             large │
│ File contents (read_file)     large │
│ Available: 200k total               │
└─────────────────────────────────────┘
```
The biggest culprits:
- Tool call results — every `read_file`, `bash`, and `grep` appends its full output
- Long conversation threads — each message adds up
- Repeated file reads — Claude re-reads the same files multiple times
The early warning signs
Before the window fills completely, you'll notice:
- Responses get shorter and less specific
- Claude starts asking you to re-explain things you already covered
- `read_file` outputs get truncated
- Multi-step plans lose track of earlier steps
- Claude starts hedging more: "I'm not sure if we already..."
Strategy 1: The /clear command
The nuclear option. Wipes the conversation history entirely.
```
/clear
```
But you lose all context — including the reasoning behind decisions. Use this as a last resort.
Strategy 2: Checkpoint summaries
Before hitting the limit, ask Claude to summarize its own progress:
```
Before we continue, write a markdown summary of:
- What we've accomplished so far
- What files we've modified and why
- What the next 3 steps are
- Any decisions or constraints I should know about

Save it to PROGRESS.md
```
Now you can /clear and paste the summary back in. You lose the conversation but keep the knowledge.
Strategy 3: Compact the context mid-session
Instead of clearing everything, compress it:
```
Summarize everything we've done in this session into a single paragraph I can
paste into a fresh conversation to continue. Be extremely concise — just the
decisions, changes made, and current state.
```
This gives you a handoff document that fits in ~500 tokens instead of 50k.
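You can sanity-check whether a handoff paragraph actually fits the budget with a character count. A rough sketch: the ~4-characters-per-token ratio is a common rule of thumb, not Claude's actual tokenizer, and the 500-token budget is this article's target, not a hard limit.

```shell
# Rough check that a handoff summary fits a ~500-token budget.
# Heuristic: ~4 characters per token (an approximation, not the real tokenizer).
handoff_fits() {
  local tokens
  tokens=$(( $(wc -c < "$1") / 4 ))
  if [ "$tokens" -le 500 ]; then
    echo "ok: ~$tokens tokens"
  else
    echo "too big: ~$tokens tokens, tighten the summary"
  fi
}
```

If the summary comes back "too big", ask Claude to cut it again before you `/clear`.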
Strategy 4: Prevent it with targeted reads
Instead of:
```
# Bad: loads entire files into context
read the codebase and understand how auth works
```
Do:
```shell
# Good: surgical reads
grep -n 'auth\|login\|token' src/routes/*.js | head -30
```
Prompt Claude to use grep/find before read_file. It gets the answer with 1/10th the context cost.
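You can measure the difference on your own files before pointing Claude at them. A hypothetical sketch: `context_cost` is an invented helper name, the file path and pattern are placeholders, and the ~4 chars/token ratio is a rough heuristic.

```shell
# Compare the approximate context cost of a full file read vs a targeted grep.
# $1 = file, $2 = grep pattern. The ~4 chars/token ratio is only a heuristic.
context_cost() {
  local full targeted
  full=$(( $(wc -c < "$1") / 4 ))
  targeted=$(( $(grep -n "$2" "$1" | head -30 | wc -c) / 4 ))
  echo "full read: ~$full tokens, targeted grep: ~$targeted tokens"
}
```

If the grep output already answers the question, skip the full read entirely.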
Strategy 5: Use subagents for isolated tasks
For work that doesn't need full project context, spin up a subagent:
```
Create a subagent that ONLY has access to src/utils/format.js. Its only job:
add JSDoc comments to every function. Report back the changes when done.
```
The subagent runs in its own context window. The results come back to your main session as a compact summary, not a 10k token diff.
Strategy 6: CLAUDE.md context pruning
Your CLAUDE.md gets loaded every session. Keep it under 500 lines.
Periodically audit:
```
Review my CLAUDE.md and identify:
- Instructions that are outdated
- Instructions that are redundant
- Instructions that could be shorter

Suggest a pruned version under 200 lines.
```
Strategy 7: The session handoff pattern
For long projects, end each session with a handoff ritual:
```
# In CLAUDE.md, add this section:

## Session Handoff Protocol

At the end of each session, create SESSION-NOTES.md with:
- What was accomplished
- What was NOT done (and why)
- Current blockers
- Next session starting point
- Any important context future sessions need
```
This externalizes memory to the filesystem, where it's free.
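A small helper makes the ritual one command. A sketch: `new_session_notes` is a hypothetical name, and the headings simply mirror the protocol above.

```shell
# Scaffold SESSION-NOTES.md with the handoff sections from the protocol above.
# new_session_notes is a hypothetical helper; adjust the headings to taste.
new_session_notes() {
  {
    echo "# Session notes, $(date +%F)"
    echo
    echo "## Accomplished"
    echo
    echo "## Not done (and why)"
    echo
    echo "## Blockers"
    echo
    echo "## Next session starting point"
    echo
    echo "## Context future sessions need"
  } > "${1:-SESSION-NOTES.md}"
}
```

Run it at the end of each session, fill in the blanks (or have Claude fill them), and the next session starts from the file instead of from memory.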
What about rate limits?
Here's the thing that's frustrating with standard Claude access: hitting context limits often coincides with hitting rate limits, because you're doing your most intensive work.
If you're using Claude Code via ANTHROPIC_BASE_URL pointed at a flat-rate proxy:
```shell
export ANTHROPIC_BASE_URL=https://api.simplylouie.com
```
You get unlimited requests at $2/month — so at least rate limits stop compounding the context window problem.
TL;DR: the memory survival kit
1. Watch for early warning signs (shorter responses, hedging)
2. Use checkpoint summaries before `/clear`
3. Surgical grep instead of full file reads
4. Subagents for isolated tasks
5. Keep CLAUDE.md under 200 lines
6. End every long session with SESSION-NOTES.md
The 200k context window feels huge until you're doing real work. These patterns keep you productive across the full project lifecycle.
Claude Code power user? Try pointing ANTHROPIC_BASE_URL at simplylouie.com for flat-rate access — no rate limits, $2/month.
Dev.to AI
https://dev.to/subprime2010/claude-code-memory-how-to-survive-a-200k-context-window-filling-up-idk
