
We tested structured ontology vs Markdown+RAG for AI agents — "why?" recall was 0% vs 100%

Dev.to AI · by Martin Arva · April 4, 2026 · 6 min read


Our AI agent knew the company uses Provider A for identity verification. It could name the provider, list the integration specs, recite the timeline.

Then we asked why Provider A was chosen over Provider B.

The agent couldn't answer. Not once across 24 attempts. Zero percent recall on reasoning questions.

So we built the layer that was missing — and ran 48 controlled experiments to measure the difference.

The problem: AI agents can't answer "why?"

If you give an AI agent a folder of Markdown docs and let it use RAG to find answers, it handles factual questions well. What modules exist? Who owns this component? When was this decision made?

But "why?" is different.

Reasoning is rarely stored as a discrete fact. It's spread across meeting notes, scattered through Slack threads, buried in the third paragraph of a design doc written six months ago. The connection between a strategic goal and an operational decision almost never appears as a single retrievable chunk.

This means vector search finds the documents that mention the decision, but not the reasoning chain that justifies it. The agent knows what happened. It doesn't know why.

This matters more than it sounds. An agent that doesn't understand why a decision was made will make follow-up decisions that are technically correct but institutionally wrong — optimizing for the wrong goal, violating an unwritten constraint, repeating a mistake that was already analyzed and rejected.

Our approach: structured ontology as a navigation layer

We didn't replace the Markdown docs. We added a structured layer on top — a four-level ontology that maps business reasoning into queryable relationships:

```
LORE (foundational beliefs, worldview)
  ↓ interpreted_into
VISION (goals, priorities, boundaries)
  ↓ operationalized_into
RULES (policies, decision rules, constraints)
  ↓ applied_to
OPERATIONS (initiatives, decisions, tasks)
```


Every connection between layers carries an assertion — an explicit explanation of why that relationship exists. This means an agent can trace from any operational decision back to the foundational beliefs that justify it.

Here's what that looks like in practice. Ask: "Why did we choose Provider A for identity verification?"

The agent traces:

```
OPERATIONS → Chose Provider A (affordable, OIDC-compatible)
  ← applied_to
RULES → Start with affordable identity provider, plan migration later
  ← operationalized_into
VISION → Build self-service tools for micro-entrepreneurs
  ← interpreted_into
LORE → Small business owners want to handle accounting themselves
```


No vector search. No probabilistic retrieval. SQL queries over a versioned database.
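The trace above reduces to a recursive SQL query over a justification-edge table. Here is a minimal, self-contained sketch using Python's `sqlite3`; the `links` table, its column names, and the object IDs are hypothetical stand-ins — the project's actual schema will differ, and its queries run against the versioned backend rather than an in-memory database.

```python
import sqlite3

# Hypothetical justification edges: each row links a lower-layer object
# to the higher-layer object that justifies it, with an explicit assertion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (child TEXT, parent TEXT, relation TEXT, assertion TEXT);
INSERT INTO links VALUES
 ('chose_provider_a', 'affordable_idp_rule', 'applied_to',
  'Cheapest OIDC-compatible option'),
 ('affordable_idp_rule', 'self_service_vision', 'operationalized_into',
  'Low cost keeps self-service tools viable'),
 ('self_service_vision', 'diy_accounting_lore', 'interpreted_into',
  'Owners want to handle accounting themselves');
""")

# Walk from the operational decision up to LORE with a recursive CTE --
# deterministic retrieval, no embeddings involved.
chain = conn.execute("""
WITH RECURSIVE trace(child, parent, relation, assertion, depth) AS (
  SELECT child, parent, relation, assertion, 0
    FROM links WHERE child = 'chose_provider_a'
  UNION ALL
  SELECT l.child, l.parent, l.relation, l.assertion, t.depth + 1
    FROM links l JOIN trace t ON l.child = t.parent
)
SELECT * FROM trace ORDER BY depth
""").fetchall()

for child, parent, relation, assertion, _depth in chain:
    print(f"{child} --{relation}--> {parent}  ({assertion})")
```

The point of the sketch: once reasoning is stored as edges, "why?" is an ordinary graph walk, so the answer is either fully present or provably absent.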

The backend is Dolt — a database with Git semantics. Branch, commit, diff, merge, pull request. Every change to the ontology goes through human review before it becomes canonical.

The interface is MCP (Model Context Protocol) — the de facto standard for connecting AI agents to external tools. Our server exposes 18 tools: 9 for querying, 4 for proposing changes, 3 for generating reasoning envelopes, and 2 for Dolt version control.
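For readers unfamiliar with MCP: a tool invocation is a JSON-RPC 2.0 request. The sketch below shows what calling one of the query tools might look like on the wire — the tool name and object ID come from this article, but the argument key (`object_id`) and request `id` are assumptions, not the server's documented signature.

```python
import json

# A client-side MCP tool call, per the JSON-RPC 2.0 framing the
# Model Context Protocol uses ("tools/call" with name + arguments).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_explanation_packet",      # one of the 9 query tools
        "arguments": {"object_id": "ex_ops_02"},  # hypothetical key name
    },
}
print(json.dumps(request, indent=2))
```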

The experiment

We tested this on a real business domain — a SaaS company's market expansion project. Same knowledge base, same questions, two modes:

  • Mode A: Agent gets Markdown documentation + file search tools

  • Mode B: Agent gets the same knowledge as a structured ontology + Dolt MCP tools

48 sessions. 8 task types. 3 runs per task per mode. Two independent LLM judges (GPT-5.4 and Claude Opus 4.5) evaluated every answer against ground truth.

Results

| Metric | Markdown + RAG | Right Reasons | Delta |
| --- | --- | --- | --- |
| Entity recall | 0.514 | 0.976 | +90% |
| "Why?" question recall | 0.000 | 1.000 | 0% → 100% |
| Reasoning quality (1–5) | 1.96 | 4.33 | +121% |
| Stability (variance) | 1.457 | 0.472 | 3× more stable |
| Latency | 284.6 s | 183.8 s | 35% faster |
| Pairwise wins | 0 | 20 | 4 ties |

The "why?" result is the headline: Mode A scored 0.0 entity recall across all 6 runs on reasoning questions. Not low — zero. Mode B scored 1.0 across all 6 runs. This isn't statistical noise. It's a deterministic gap.

The conventional assumption is that structured retrieval is a tradeoff — better recall but more overhead and higher latency. This experiment showed the opposite: the structured approach was simultaneously more accurate, faster, more stable, and more compact in its answers.

Judge agreement was 83.3%. Average judge confidence was 0.927. The only disagreements were on impact analysis tasks where multiple valid reasoning paths existed.

What we didn't prove (honestly)

  • Ingest: Getting business knowledge into the ontology was manual. This is the hardest unsolved problem.

  • Write path: We only tested reading. The flow where agents propose ontology changes is designed but not yet benchmarked.

  • Generalization: Tested on one domain (dev planning). Other domains are next.

How knowledge enters the ontology: EPICAL

We're not expecting anyone to manually populate SQL tables. The designed ingest pipeline is called EPICAL:

Source docs → EXTRACT → PONDER → INTERROGATE → CALIBRATE → AUTHENTICATE → LOAD


The first two stages (Extract and Ponder) are agent-driven — the AI proposes candidate objects and relationships from source documents. Interrogate and Calibrate refine confidence. Authenticate is the human gate — a Dolt diff review, just like a code PR. Only after human approval does knowledge become canonical.

The epistemic boundary is strict: an agent cannot bypass human validation. The promote_candidate tool requires authenticated status.
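That boundary is simple to enforce in code: promotion checks status, and only a human reviewer can set the status it requires. A minimal sketch, assuming the article's `promote_candidate` tool works roughly like this — the statuses and error type are illustrative, not the project's actual implementation:

```python
class NotAuthenticated(Exception):
    """Raised when an agent tries to promote unreviewed knowledge."""

def promote_candidate(candidate: dict) -> dict:
    """Promote a candidate object to canonical knowledge.

    Agents may advance a candidate through the EPICAL stages
    (extracted -> pondered -> interrogated -> calibrated), but only a
    human review (the Dolt diff gate) sets status to 'authenticated',
    and only authenticated candidates can be promoted.
    """
    if candidate.get("status") != "authenticated":
        raise NotAuthenticated(
            f"cannot promote {candidate.get('id')!r}: "
            f"status is {candidate.get('status')!r}"
        )
    return {**candidate, "status": "canonical"}

# An authenticated candidate promotes; a calibrated one is rejected.
promoted = promote_candidate({"id": "cand_42", "status": "authenticated"})
```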

OPS Contracts: reasoning envelopes for external work

One more concept worth mentioning. When work happens in external systems (Jira, GitHub, CI/CD), the agent can generate an OPS Contract — a reasoning envelope that attaches institutional context to a work item:

```
generate_ops_contract(
  external_work_ref="jira://TASK-123",
  description="Prepare annual report for submission",
  contract_kind="annual_reporting"
)
```


The contract tells the executing agent why this task matters, what rules apply, and which boundaries must not be crossed — without the agent needing to query the full ontology itself.
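To make that concrete, here is a guess at the shape of the envelope such a call might return. Only the three inputs come from the article's `generate_ops_contract` example; every field name and value below (`why`, `applicable_rules`, `boundaries`, and their contents) is invented for illustration:

```python
# Hypothetical reasoning envelope for the Jira task above. The three
# input fields are from the article; the rest is an illustrative guess
# at how "why it matters / what rules apply / what not to cross" could
# be packaged for an executing agent.
ops_contract = {
    "external_work_ref": "jira://TASK-123",
    "contract_kind": "annual_reporting",
    "description": "Prepare annual report for submission",
    "why": "Supports a compliance goal traced up through VISION",
    "applicable_rules": [
        "All filings are reviewed by a human before submission",
    ],
    "boundaries": [
        "Do not submit without authenticated sign-off",
    ],
}
```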

Try it

The full repo is open source:

```shell
git clone https://github.com/Right-Reasons/right-reasons
cd right-reasons
docker compose up -d
cd mcp-server && pip install -e .
```


Connect your agent, then ask:

"Why did we choose Provider A over Provider B for identity? Use the get_explanation_packet tool with object ID ex_ops_02."

The agent will trace the full reasoning chain across all four layers.

  • 📦 GitHub repo

  • 🌐 Website

  • 📝 Background article by Kaspar Loit

  • 📊 Full experiment results

Right Reasons is built by MindWorks Industries. We're looking for early users who want to give their AI agents actual institutional reasoning. Reach out at [email protected].
