I Built a Production Agent Orchestrator. Then Claude Code's Source Leaked and I Saw the Same Architecture
On March 31st, someone discovered that version 2.1.88 of Claude Code's npm package shipped with an unobfuscated source map pointing to Anthropic's entire TypeScript codebase. Around 1,900 files. Over half a million lines of code. Everything.
I've spent the better part of a year building a production system that uses Claude as the brain of a voice-controlled automation tool. Five specialist agents, an Opus orchestrator, over 1,400 tests across 22 major versions. When I read through what the leak revealed, I wasn't looking for secrets. I wanted to know: did they arrive at the same patterns? Turns out the answer was yes.
The orchestrator and specialists pattern
The most important architectural decision in any agent system is how you split work between a coordinator and its specialists. A high-capability model (Opus) handles strategic decisions: what needs to happen, which agent should do it, how to combine results. Lower-cost models (Sonnet) handle tactical execution: read these files, run these tests, update this documentation. The orchestrator thinks. The specialists do.
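The shape of that split can be sketched in a few lines. This is a toy, not Anthropic's code: `call_model` stands in for a real LLM API call, and the model names and prompts are illustrative. The point it demonstrates is the division of labour, with the orchestrator seeing only summaries.

```python
# Toy sketch of the orchestrator/specialist split. call_model is a
# stand-in for a real LLM API call; models and prompts are illustrative.

def call_model(model: str, system: str, task: str) -> str:
    # A real implementation would call an LLM API and execute tool calls.
    return f"[{model}] {task}"

def orchestrate(goal: str, steps: list[str]) -> str:
    # Specialists burn their own context on each step and return only
    # a concise summary; their full transcripts never reach the
    # orchestrator.
    summaries = [
        call_model("sonnet", "Do exactly this step; reply with a summary.", s)
        for s in steps
    ]
    # The orchestrator spends its context window on coordination only.
    return call_model("opus", "Combine these summaries.", " | ".join(summaries))

result = orchestrate("ship feature", ["edit code", "run tests"])
```

In a real system the orchestrator would also produce the step list itself; it's hardcoded here to keep the sketch self-contained.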
The leaked coordinator code confirms Anthropic built the same thing. Their orchestrator delegates to specialists with focused tool access and scoped responsibilities. The coordinator prompt even echoes guidance I'd written independently: don't be vaguely deferential when delegating; specify exactly what to do. This isn't a coincidence. It's convergent engineering. Orchestrators need to preserve their context window for coordination.
Specialists need to burn through context freely and return only a concise summary. If the specialist pollutes the orchestrator's context, the orchestrator drowns. If the orchestrator tries to do the specialist's job, it runs out of room for strategy.
Subagents don't inherit your configuration
Here's the most important thing I learned the hard way. Custom subagents do not inherit your project configuration. I spent weeks debugging why specialists ignored conventions documented in my main configuration file. The orchestrator followed them perfectly. The specialists acted like they'd never seen them. Because they hadn't. Each specialist gets its own context window. It starts fresh.
The leaked code confirms this is by design. Each specialist prompt needs to be a complete behavioral contract: who it is, what it can do, what files it owns, what format to return results in, and when to stop. If a specialist needs your coding conventions, those conventions go in the specialist prompt itself.
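In Claude Code terms, that contract lives in the subagent's own definition file. The frontmatter fields below follow Claude Code's documented subagent format; the body is an illustrative example (the test command and directory layout are made up) of restating conventions the subagent will never inherit from CLAUDE.md:

```markdown
---
name: test-runner
description: Runs the test suite and reports failures. Never edits source.
tools: Read, Bash
---
You are a test runner for this project.

Conventions (restated here because you will not see CLAUDE.md):
- Tests live in tests/ and run with `npm test`.
- Report each failure as: file, test name, one-line cause.
- If a test fails, report it. Do not modify source files.
- Stop after reporting. Do not retry more than once.
```

Everything the specialist must know goes inside this file: identity, tools, conventions, output format, and stopping condition.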
This is the number one mistake in agent systems. People write beautiful CLAUDE.md files and wonder why their spawned agents ignore every rule. Now you know why.
Guardrails that actually work
Early on, I tried to enforce critical rules through prompt wording. Don't modify files outside your scope. Don't add features that weren't requested. I capitalised things. I wrote "MUST" and "NEVER" in important places.
It didn't work. Not reliably. The agent would follow the rules most of the time, then occasionally blow past them without warning. The failure rate was low enough to be dangerous.
The leaked permission engine shows Anthropic solved this the same way I did: with code, not words. Critical restrictions are enforced by a permission engine that checks before every operation, not by prompt instructions the model might or might not follow. Hooks fire deterministically. Deny rules block operations regardless of what the model wants to do.
I call this deterministic guardrails over probabilistic compliance. You cannot rely on the model remembering to do something consistently. A linter that blocks bad imports beats a coding guideline. A verification script that runs automatically beats a prompt that says "remember to run tests." Rules enforced by tools are always followed. Rules enforced by discipline are followed until something goes wrong.
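A deterministic guardrail is just a check that runs in code before the operation, not a sentence the model might forget. The sketch below is my illustration of the idea, not the leaked permission engine: the deny rules and tool names are invented, and pattern matching is done with the standard library's `fnmatch`.

```python
# Illustrative permission gate: deny rules are checked in code before
# any tool call, so compliance never depends on the model remembering
# an instruction. Rules and tool names are invented for this sketch.
import fnmatch

DENY_RULES = [
    ("edit", "src/billing/*"),   # critical path: no agent may edit it
    ("bash", "rm -rf*"),         # destructive commands always blocked
]

def check_permission(tool: str, target: str) -> bool:
    """Return False if any deny rule matches; runs before every operation."""
    return not any(
        tool == rule_tool and fnmatch.fnmatch(target, pattern)
        for rule_tool, pattern in DENY_RULES
    )

def run_tool(tool: str, target: str) -> str:
    if not check_permission(tool, target):
        # Deterministic: blocked regardless of what the model "wants".
        return f"denied: {tool} {target}"
    return f"ok: {tool} {target}"
```

The crucial property is that `check_permission` fires on every call path. The model never gets a vote.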
The most critical rules in Claude Code are not in the system prompt. They're in the permission engine.
Graduated autonomy
I assign freedom levels to each specialist based on risk. Low-freedom agents that touch critical infrastructure get exact formats, narrow file ownership, and explicit boundaries. My testing agent can read source code and run tests, but cannot edit source files. If it finds a bug, it reports it. It does not fix it. High-freedom agents like the orchestrator get goals, constraints, and heuristics.
Anthropic's agent architecture maps to the same spectrum. A doc reviewer gets read access only, a test runner gets read and execute but no edit, a code modifier gets full read and write but no command execution. The leak further confirms that subagents cannot spawn other subagents, preventing scope creep and infinite nesting.
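Graduated autonomy works best declared as data rather than prose. The agent names and tool sets below are illustrative, not taken from the leak; the point is that each specialist's clearance is written down once and enforced mechanically.

```python
# Graduated autonomy as a lookup table. Agents and tool sets are
# illustrative; clearance is declared once, then enforced in code.
AGENT_TOOLS = {
    "doc-reviewer":  {"read"},              # lowest freedom: look, don't touch
    "test-runner":   {"read", "execute"},   # can run tests, cannot edit
    "code-modifier": {"read", "write"},     # can edit, cannot run commands
    "orchestrator":  {"read", "delegate"},  # coordinates; no nested spawning
}

def allowed(agent: str, tool: str) -> bool:
    # Unknown agents get no tools at all: deny by default.
    return tool in AGENT_TOOLS.get(agent, set())
```

Note the deny-by-default lookup: an agent nobody configured can do nothing, which is the same posture the no-nested-subagents rule enforces at the spawning level.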
Most people give every agent the same autonomy. That's like giving every employee the same security clearance.
What surprised me
Not everything confirmed what I already knew. Some of it sent me back to rethink my own architecture.
The leaked code references KAIROS, an autonomous daemon mode with "autoDream" memory consolidation that periodically reviews, consolidates, and prunes stored memories to prevent context bloat. I had been handling memory reactively. This proactive approach is something I'm planning to adopt.
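My reading of the consolidation idea, reduced to a sketch: periodically score stored memories and prune low-value ones on a schedule, instead of reacting once the context is already bloated. The scoring heuristic and thresholds here are invented for illustration; the leak describes the behaviour, not this implementation.

```python
# Illustrative proactive memory consolidation: drop stale entries,
# then keep only the most-used ones. Heuristics are invented here.

def consolidate(memories: list[dict], now: float,
                max_age: float = 86400 * 30, keep: int = 100) -> list[dict]:
    # 1. Drop anything not touched within max_age seconds.
    fresh = [m for m in memories if now - m["last_used"] < max_age]
    # 2. Rank survivors by usage and cap the store at `keep` entries.
    fresh.sort(key=lambda m: m["uses"], reverse=True)
    return fresh[:keep]
```

Run on a timer rather than on demand, this turns memory management from a failure you debug into a budget you enforce.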
The source apparently injects fake tool definitions into system prompts to poison competitor training data. Whatever you think about the ethics, it's fascinating adversarial engineering. If you're building systems that call LLM APIs, responses you receive might contain deliberate noise designed for purposes that have nothing to do with your query.
And then there's "Undercover Mode," which instructs Claude to never reveal it's an AI when contributing to public repositories. This has generated the most controversy, understandably. But from an engineering perspective, it reveals that Anthropic treats agent identity as a configurable property, not a fixed trait.
Interesting architectural choice regardless of whether you agree with how they used it.
The real lesson
Most coverage of this leak has focused on the controversial parts. But for anyone building agent systems, the deeper story is quieter and more useful.
The patterns in Claude Code are not novel. Orchestrator plus specialists. Context isolation. Deterministic enforcement. Graduated autonomy. These are the engineering reality of building systems where AI agents need to coordinate reliably, and anyone working seriously with agent orchestration arrives at them through trial and error.
What's encouraging is that the leak lowers the barrier. Before, you had to discover these patterns through months of mistakes. Now the reference implementation is on GitHub with 30,000 stars. You can read how Anthropic's team solved the same problems you're going to face, and skip the worst of the dead ends.
The patterns aren't the hard part though. Knowing when to apply them, which tradeoffs matter for your system, and how to debug failures that don't show up in any architecture diagram — that's still something you learn by building. The leak gives you a map. You still have to walk the territory.