The bottleneck for AI coding assistants isn't intelligence — it's navigation
Not another code graph engine. A lightweight navigation workflow for AI coding agents.
You ask Claude about a function. It gives you a confident, detailed explanation. You build on it for an hour. Then you find out it was wrong.
Or: you change a function, tests pass, you ship. Three days later — four other places called that function, all broken. Claude never mentioned them.
Same root cause: Claude doesn't have a way to navigate your codebase.
It starts from scratch every time. It reads what you give it. It guesses what it doesn't have. You get hallucinations, missed impact, bugs introduced in blind spots.
The fix isn't a smarter model. It's a map.
Why this matters for real-world engineers
If you use AI for coding regularly, you've probably seen this already:
You ask it to fix a bug or add a feature, and it starts confidently exploring the wrong area of the codebase.
It reads a few files that look related, misses one critical connection, and then builds the wrong mental model from there.
Sometimes it still produces code that looks plausible. Sometimes it even "almost works." And that's exactly what makes it dangerous.
Because now you're not saving time anymore. You're doing one of these instead:
- babysitting its search process
- repeatedly correcting its assumptions
- re-pasting the right files into context
- cleaning up a solution built on the wrong part of the system
At that point, the bottleneck is no longer code generation. It's codebase navigation.
And the bigger or messier the repo gets, the worse this becomes.
Most real repositories are not clean demo projects. They have:
- historical baggage
- duplicated patterns
- stale modules
- hidden wiring
- config-driven behavior
- "one weird file" that everything secretly depends on
A human engineer eventually learns those paths over time. AI does not. Every session starts with partial memory, incomplete context, and a high chance of exploring the wrong route.
This project is built for that exact reality. Not for idealized benchmarks. Not for toy repos. Not for "look how smart the model is" demos.
But for actual day-to-day engineering work:
"Where do I even start?"
"What else does this change affect?"
"What files should I read before I touch this?"
"What did the AI miss?"
That is the problem this project is trying to solve.
The dilemma
Here's the dilemma every Claude Code user faces:
Option A: Let Claude read everything. It greps your entire repo, opens 20 files, reads thousands of lines. Thorough, but it burns through your token budget before getting to the actual work. On a large repo, you hit context limits and Claude starts forgetting what it read 5 minutes ago.
Option B: Let Claude read what it thinks is relevant. It guesses which files matter, opens 3-4, and gives you a confident answer. Fast, but it quietly misses files it didn't know existed. You ship the fix. A week later, someone finds the broken code path Claude never looked at.
Both options suck. Read too much = expensive and slow. Read too little = miss things.
There's a third option: give Claude a map.
Think of your codebase like Tokyo's subway system. Without a map, you wander between stations trying random lines until you find your destination. With a map, you glance at it once, see the route, and go. The map doesn't limit where you can go — it just keeps you off the wrong train.
"But isn't this just another CLAUDE.md / AI_INDEX template?"
No. And this is the key difference.
A typical AI_INDEX is a phone book:
```
auth     → src/auth/
payments → src/payments/
billing  → src/billing/
```
It tells Claude where files live. That's it. Claude finds auth, reads it, fixes the bug. Done. But it has no idea that payments calls auth through processRefund(), or that billing depends on both.
This plugin builds a graph — a map with connections:
```
auth
  → connects to: payments (via processRefund in payments/checkout.py)
  → connects to: billing (via verifyToken in billing/api.py)
  → tests: tests/test_auth.py, tests/test_checkout.py
```
When you fix a bug in auth, the plugin walks those connections and finds every downstream path that breaks. A phone book can't do that. grep returns noisy results that Claude has to manually filter — and it often misses connections buried in the noise. A graph with edges traces impact directly, no filtering needed.
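The traversal itself is nothing exotic. Here is a minimal BFS sketch in Python, using a hypothetical adjacency list that mirrors the auth/payments/billing example above (not the plugin's actual data structure):

```python
from collections import deque

# Hypothetical adjacency list parsed from AI_INDEX.md: each edge records
# which downstream domain depends on this one, and through which function.
GRAPH = {
    "auth": [("payments", "processRefund"), ("billing", "verifyToken")],
    "payments": [("billing", "charge")],
    "billing": [],
}

def trace_impact(start):
    """BFS from the changed domain; returns every downstream domain reached."""
    seen, queue, impacted = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for neighbor, via in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                impacted.append((neighbor, via))
                queue.append(neighbor)
    return impacted

# Changing auth flags both payments and billing for review.
print(trace_impact("auth"))
# → [('payments', 'processRefund'), ('billing', 'verifyToken')]
```

The point is that impact tracing is a graph walk, not a text search: every downstream path comes back, with the connecting function attached, and nothing irrelevant comes with it.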
Here's what it looks like in practice:
Without the graph, you say "fix the deleteItem bug" and Claude:
1. greps for "deleteItem" → 12 results across 6 files
2. Opens mover.js (good guess) → finds the bug → fixes it
3. Done. Ships. But the restore/undo handler in server.js had the same bug. Claude never looked there because nothing told it to.
With the graph, Claude reads AI_INDEX.md first and sees:
```
Mover → Connects to: Server (via POST /api/delete, POST /api/restore)
```
Now it knows: "mover connects to server through delete AND restore." It checks both. Finds the cascade bug. Fixes both files. Ships clean.
That's why the cascade bug in our benchmark (Test 1) was only caught by the graph version — it followed the Connects to edge from the delete handler to the restore/undo handler that had the same bug. Without edges, Claude fixed one and shipped the other broken.
The problem is not intelligence. It's navigation.
We tested existing approaches — Aider's repo map, RepoMapper, and similar tools.
They are useful. But they solve a different problem.
They help models understand a repository. But in practice, AI coding assistants don't fail because they lack understanding. They fail because they:
- read too many irrelevant files
- miss critical connections (registries, routing, config wiring)
- build incorrect mental models
- waste tokens exploring blindly
- fail to trace impact correctly
The bottleneck is not intelligence. The bottleneck is navigation.
This project is not trying to compete with Aider, RepoMapper, or full code intelligence systems. Those tools are good at summarizing a repository. This project solves a different problem:
AI coding assistants don't fail because they are not smart enough. They fail because they don't know where to look.
We focus on navigation, not summarization.
Instead of giving the model more context, we give it a routing system:
- A lightweight graph (like a subway map)
- Deterministic structure (no heavy infra, no databases)
- Skill-driven traversal (BFS, impact tracing)
- Fallback exploration when the graph is incomplete
The goal is simple:
Use the least amount of tokens to find the right code paths — reliably.
Why not just use search, grep, or repo maps?
A common reaction to this problem is: "Can't the AI just search the repo?" or "Isn't grep enough?" or "Don't repo maps already solve this?"
We tried all of those. They help — but they don't solve the core issue.
Search and grep are reactive
Search works only if you already know what to look for. In real tasks, you often don't.
You might start with a vague bug, an unclear entry point, or a feature request with no obvious anchor. So the AI searches for keywords. That usually returns too many files, partially related code, and misleading matches. Now the model still has to guess: "Which of these actually matters?"
This is where things go wrong.
Repo maps are static summaries
Tools like Aider generate a condensed view of the repository — key classes, important functions, structural highlights. That's useful for orientation.
But it doesn't tell the model:
- where to start for this specific task
- how to move from one file to another
- what paths to follow when tracing impact
- how to recover when the initial path is wrong
A summary is not a navigation system.
The real problem is path selection
The failure mode we observed repeatedly is not: "The model doesn't understand the code." It's: "The model is looking at the wrong code."
Once it goes down the wrong path, every subsequent step compounds the error, the context gets polluted, and the solution drifts further away from reality.
This project focuses on routing, not searching
Instead of asking the model to figure everything out from scratch, we give it:
- a starting point
- a set of connected nodes
- a structured way to expand outward (BFS-style)
- a fallback when the graph is incomplete
The goal is not to eliminate exploration. The goal is to guide it.
Think of it this way: search is like dropping someone into a city with Google. They can find things — eventually. This graph is a subway map. It doesn't replace exploration. But it prevents wandering blindly.
Without this: AI searches → opens random files → guesses → drifts
With this: AI starts at the right place → follows connections → reads source → expands only when needed
Less noise. Fewer wrong turns. Lower token usage. More reliable results.
That's the difference between finding code and navigating a system.
What this plugin does
Four Claude Code skills (slash commands that run structured workflows) that give Claude a persistent, structured map of your codebase — and the workflows to use it effectively.
| Skill | What it does |
| --- | --- |
| /generate-graph | Builds the codebase map (domain → files → relationships → docs links) |
| /sync-graph | Keeps the map fresh after changes |
| /debug | Locate → root cause → fix |
| /new-feature | Find pattern → trace impact → implement |
The map (AI_INDEX.md) lives in your repo. Claude reads it at the start of every task. It knows which files belong to which domain, which patterns exist, where the docs are.
How it works
The map
/generate-graph produces an AI_INDEX.md — a structured routing manifest:
```
## Domain: auth
Files: src/auth/login.py, src/auth/tokens.py, src/auth/middleware.py
Patterns: JWT tokens, session handling
Docs: docs/auth/overview.md
```
Claude reads this at the start of every task. It knows which files belong to which domain, which patterns exist, where the docs are. No hallucination. No guessing.
How big does the graph get? It grows slowly relative to codebase size: 10-file repo = 62 lines. 77K-file repo = ~420 lines. A 420-line map still costs only ~1,500 tokens to read — cheaper than a single grep that returns 40 noisy results.
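To make the format concrete, here is a minimal sketch of parsing such a manifest into a lookup table. The parsing logic is my illustration of the format shown above, not the plugin's internals:

```python
def parse_index(text):
    """Parse AI_INDEX.md-style domain blocks into {domain: {field: [values]}}."""
    index, domain = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## Domain:"):
            domain = line.split(":", 1)[1].strip()
            index[domain] = {}
        elif domain and ":" in line:
            field, values = line.split(":", 1)
            index[domain][field.strip()] = [v.strip() for v in values.split(",")]
    return index

# The example block from above.
manifest = """## Domain: auth
Files: src/auth/login.py, src/auth/tokens.py, src/auth/middleware.py
Patterns: JWT tokens, session handling
Docs: docs/auth/overview.md"""

idx = parse_index(manifest)
print(idx["auth"]["Files"][0])  # → src/auth/login.py
```

Because the manifest is plain markdown, it stays diffable, reviewable in PRs, and cheap to load: no database, just string parsing.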
The workflow
Instead of dumping context into the model, we do:
1. Generate a lightweight graph
   - Deterministic (imports, structure, tests, entry points)
   - No database, no heavy setup
   - Stored directly in the repo
2. Navigate using the graph
   - Start from a node (file / feature / bug)
   - Traverse using BFS-style expansion
   - Follow connections (imports, tests, routing, etc.)
3. Read source code only when needed
   - The graph narrows the search space
   - The model reads actual source for correctness
4. Fallback when the graph is incomplete
   - Use grep / references / search
   - Discover missing connections
   - Optionally patch the graph
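Put together, the steps above can be sketched as a small loop. The graph shape and the grep fallback below are my illustration, not the plugin's actual API:

```python
def grep_for_references(node):
    """Stand-in for a real grep fallback; assumed here, not the plugin's code."""
    return []

def navigate(graph, start, max_depth=3):
    """Expand outward from a starting node, BFS-style; fall back to search
    when a node has no entry in the graph (incomplete graph)."""
    frontier, visited, to_read = [(start, 0)], set(), []
    while frontier:
        node, depth = frontier.pop(0)
        if node in visited or depth > max_depth:
            continue
        visited.add(node)
        to_read.append(node)  # only these files get read as source
        edges = graph.get(node)
        if edges is None:
            # Fallback when the node is missing from the graph: search instead.
            edges = grep_for_references(node)
        frontier.extend((n, depth + 1) for n in edges)
    return to_read

graph = {"mover.js": ["server.js"], "server.js": []}
print(navigate(graph, "mover.js"))  # → ['mover.js', 'server.js']
```

The `max_depth` cap is what keeps token usage bounded: expansion stops a few hops out instead of flooding the context with the whole repo.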
The skills
/debug — a structured workflow, not a prompt
1. Locate the entry point (graph → domain → file)
2. Read the relevant code
3. Identify root cause
4. Exhaustive scan for the same pattern across all files
5. Fix all instances
Think of it like dropping a piece of food in a petri dish with slime mold. The slime mold doesn't search the entire dish — it starts from the food and sends out tendrils in every direction, following the paths that lead somewhere and cutting the ones that don't. Eventually it finds every connected point without wasting energy on dead ends. That's what /debug does — starts from the bug and follows connections outward until it's found everything affected.
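Step 4, the exhaustive scan, is the part ad-hoc debugging usually skips. A minimal sketch of what such a sweep could look like, with a made-up bug pattern and file contents echoing the deleteItem example:

```python
import re

def sweep(files, pattern):
    """Scan every file for a bug pattern; return (filename, line_no, line) hits."""
    hits = []
    rx = re.compile(pattern)
    for name, text in files.items():
        for i, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((name, i, line.strip()))
    return hits

# Hypothetical repo: the same unguarded delete appears in two handlers.
files = {
    "mover.js": "function del(id) {\n  db.remove(id);\n}",
    "server.js": "app.post('/restore', (req) => {\n  db.remove(req.id);\n});",
}
for hit in sweep(files, r"db\.remove"):
    print(hit)
# → ('mover.js', 2, 'db.remove(id);') and ('server.js', 2, 'db.remove(req.id);')
```

The graph scopes which files are worth sweeping; the sweep then guarantees no instance of the pattern survives inside that scope.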
/new-feature — find the existing pattern, copy it
1. Graph → find a similar existing feature
2. Trace impact of that feature to understand all layers it touches
3. Implement the new feature at every layer, following the same pattern
4. Verify before shipping
/sync-graph — keep the map fresh
After significant changes, /sync-graph updates AI_INDEX.md. Adds new files to the right domains, updates pattern lists, keeps docs links current.
What if the map goes stale? /sync-graph runs after every bug fix and feature — the plugin reminds Claude to update. If you forget, the graph is unlikely to give wrong answers — at worst, it points to files that moved or connections that changed, and Claude discovers this immediately upon reading the file and falls back to grep. A stale graph degrades to grep-level performance, not worse. The graph is additive, not a replacement.
When to regenerate: After major refactors, or weekly on active repos. Takes under 30 seconds. On a team, commit AI_INDEX.md to the repo and add /sync-graph to your PR checklist — if someone forgets, the graph just becomes incomplete, never harmful.
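A cheap staleness check is to verify that the files the graph lists still exist on disk. This is a sketch of assumed behavior, not the plugin's actual sync logic:

```python
import os

def stale_entries(index_files, repo_root="."):
    """Return files listed in the graph that no longer exist on disk."""
    return [f for f in index_files
            if not os.path.exists(os.path.join(repo_root, f))]

# Hypothetical file list taken from an AI_INDEX.md "Files:" line.
listed = ["src/auth/login.py", "src/auth/tokens.py"]
missing = stale_entries(listed)
# Any hits mean it's time to run /sync-graph.
```

A check like this could run in a pre-commit hook or CI step, turning "remember to sync" into an automatic nudge.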
Design principles
- The graph is minimal. We only include information that affects navigation: where to start, what connects to what, which tests are relevant. No summaries. No explanations. No fluff.
- Source code is always the truth. The graph never replaces code reading.
- Deterministic first, AI second. Scripts generate the base graph. AI only fills critical gaps when needed.
- No heavy infrastructure. No graph databases. No vector stores. No indexing services. Everything lives inside the repo.
- Graceful degradation. If the graph is wrong or incomplete, the system falls back to search, exploration continues, and the graph can be patched.
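The deterministic-first principle is cheap to implement. For Python files, here is a sketch of extracting import edges with the standard-library ast module (illustrative only; the plugin's actual generation script isn't shown in this post):

```python
import ast

def import_edges(filename, source):
    """Parse one Python file and return (filename, imported_module) edges."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.extend((filename, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((filename, node.module))
    return edges

# Hypothetical file contents standing in for a real repo scan.
src = "import os\nfrom auth.tokens import verify\n"
print(import_edges("payments/checkout.py", src))
# → [('payments/checkout.py', 'os'), ('payments/checkout.py', 'auth.tokens')]
```

Because this step is a parser, not a model, it produces the same edges every run: the AI only refines on top of a reproducible base.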
Your workflow (the human part)
You don't need to understand the internals. You don't choose between approaches. The plugin handles that automatically. Here's what your day actually looks like:
First time on a repo:
```
/generate-graph
```
Done. Takes 30 seconds. You now have a graph.
Someone reports a bug:
```
You: "fix this bug: [paste the Slack message / error / screenshot]"
```
Claude automatically reads the graph, finds the right domain, reads the docs, traces the code, finds root cause, and proposes a fix. You review and merge.
Someone requests a feature:
```
You: "add this feature: [paste the requirement]"
```
Claude finds a similar existing feature, copies the pattern across all layers, and implements it. You review and merge.
That's it. You paste the problem, Claude follows the workflow, you review the output. The graph, the docs, the search logic — all of that happens behind the scenes. You don't invoke skills manually. You don't choose an approach. You just say what you need.
The only thing you need to remember:
- First time → /generate-graph
- After that → just paste your task and let Claude work
Does it actually work?
Eight benchmark tasks across repos of different sizes (small hobby project to 77K-file monorepo), comparing: graph-guided navigation vs. no map vs. project docs vs. fullstack-debug vs. Aider's PageRank map.
Test 1 — Bug fix: missing rate limit (small repo)
| Metric | A (graph) | B (no map) |
| --- | --- | --- |
| Tokens | 14K | 14K |
| Tool calls | 10 | 12 |
| Found root cause? | ✅ | ✅ |
| Found cascade impact? | ✅ | ❌ |
Same tokens, but B missed the restore/undo path. It fixed the main bug and left a secondary code path broken. A found it because it walked the full call graph.
Test 2 — Bug fix: UI refresh issue (small repo)
| Metric | A (graph) | B (no map) |
| --- | --- | --- |
| Tokens | 5K | 5.1K |
| Tool calls | 4 | 5 |
| Found root cause? | ✅ | ✅ |
Simple UI bug — comparable performance. Graph doesn't help much when the entry point is obvious.
Test 3 — New feature planning (small repo)
| Metric | A (graph) | B (no map) |
| --- | --- | --- |
| Tokens | 11K | 14K |
| Tool calls | 10 | 14 |
| Identified impact correctly? | ✅ | ✅ |
23% fewer tokens. The graph told Claude which files to skip. B explored files that turned out to be irrelevant.
Test 4 — Understanding a flow (small repo)
| Metric | A (graph) | B (no map) |
| --- | --- | --- |
| Tokens | 5K | 6K |
| Tool calls | 5 | 8 |
| Accurate explanation? | ✅ | ✅ |
17% fewer tokens, 37% fewer tool calls. Graph provided entry points directly.
Test 5 — Pattern audit: find all instances of a bug pattern (small repo)
| Metric | A (graph) | B (no map) | A + exhaustive sweep |
| --- | --- | --- | --- |
| Tokens | 16K | 22K | 16K + $0.02 |
| Tool calls | 12 | 18 | 12 + sweep |
| Coverage | ~80% | ~60% | 100% |
Neither agent alone hits 100%. Graph scopes the search area, then an optional exhaustive sweep scans every file for the same bug pattern — costs about $0.02 on a large repo. Full coverage.
Test 6 — Bug fix: missing feature flag (large repo, 77K files)
| Metric | A (graph) | C (no map) |
| --- | --- | --- |
| Tokens | 48K | 72K |
| Tool calls | 14 | 26 |
| Found root cause? | ✅ | ✅ |
33% fewer tokens on a 77K-file repo. The graph narrowed the search from the entire monorepo to a single domain. C explored broadly before finding the right area.
Test 7 — Cross-repo investigation: frontend calling backend (large repo)
| Metric | A (graph) | C (no map) |
| --- | --- | --- |
| Tokens | 55K | 82K |
| Tool calls | 18 | 33 |
| Found the backend endpoint? | ✅ | ✅ |
| Found the wiring gap? | ✅ | ❌ |
C found the backend endpoint. A found that too — plus the wiring gap around get_tool_input_text() in the frontend component: infrastructure ready, caller not wired. Graph saved 33% tokens over no-map.
Test 8 — New feature investigation: session context tool calls (large repo, 4 approaches)
Frontend developer asks: can we add tool calls, in/out flags, and tool names to the session context API?
| Metric | A (graph) | C (no map) | D (project docs) | E (fullstack-debug) | Aider map |
| --- | --- | --- | --- | --- | --- |
| Tokens | 61K | 47K | 64K | 49K | N/A |
| Tool calls | 17 | 30 | 35 | 32 | N/A |
| Found endpoint? | ✅ | ✅ | ✅ | ✅ | ❌ |
| Found existing helpers? | ✅ | ✅ | ✅ | ✅ | — |
| Extra insight | — | — | ⚠️ ingestion caveat | — | — |
Aider's map optimizes for editing context, not investigation. Its PageRank-based ranking prioritizes "globally important" functions — on the 77K-file repo, the session context endpoint wasn't important enough to make it into the 560-line map. A task-specific graph with explicit edges performs better for tracing and investigation. Agent D (project docs) found a critical caveat about data storage that the others missed. Agent A used the fewest tool calls (17 vs 30-35).
Honest note: in Test 8, the graph version actually used MORE tokens (61K vs 47K). The graph guided Claude to read deeper — it found an ingestion caveat the others missed, but it cost more tokens doing so. The graph doesn't always save tokens. Its value is coverage, not cost.
Summary: when does each approach help?
| Task type | Token savings (graph vs no map) | Quality difference |
| --- | --- | --- |
| Bug fix (clear entry point) | ~0% | Graph finds cascade impact others miss |
| Bug fix (UI flow) | ~3% | Comparable |
| New feature planning | 23% | Graph knows which files to skip |
| Understanding a flow | 17% | Graph provides entry points directly |
| Pattern audit (large repo) | 42% | Graph + exhaustive sweep = 100% coverage |
| Cross-repo investigation | 33% | Graph points to the right repo/domain |
| Feature investigation (large repo) | Varies | Aider optimizes for editing, not investigation; graph + docs wins |
Key findings
The graph's biggest value isn't saving tokens — it's preventing missed impact. On a 10-file repo, savings are 17-23%. On a 77K-file repo, savings jump to 33-42%. But finding the cascade bug (the restore/undo path that only the graph version caught) — that's a qualitative difference, not quantitative.
(42% is the peak saving on pattern audits across large repos. Average across all task types is 17–33%. We show the full range in the benchmarks above.)
Aider's map and this graph solve different problems. Aider optimizes for editing context (which files to include when making changes). This plugin optimizes for investigation and impact tracing (which files are connected to your change). On the 77K-file repo, the session context endpoint wasn't in Aider's 560-line map at all — it wasn't globally important, just task-relevant.
No single approach achieves 100% coverage on pattern audits. The best workflow is a hybrid: graph scopes down the search area, then an exhaustive sweep finds every remaining instance for ~$0.02.
Project documentation adds unique value — domain-specific caveats and business logic that code alone won't tell you. The graph's Docs: field links to these per-domain docs automatically.
What this is NOT
- Not a full code intelligence platform
- Not a semantic search engine
- Not a replacement for reading code
- Not trying to be the most "accurate" graph
What this IS
A lightweight, practical navigation layer for AI coding workflows.
Not perfect understanding. Not complete graphs. Just this:
Find the right code, fast, with minimal tokens, and don't miss critical paths.
Get it
Install as a plugin — drop it into any project:
```
cd your-project
git clone https://github.com/ithiria894/claude-code-best-practices .claude-plugin
```
Then run /generate-graph in Claude Code. That's it.
github.com/ithiria894/claude-code-best-practices
Built from research, source code analysis, and way too many hours of watching Claude confidently explain code it hadn't read.