
AgentManifest: A Declarative Spec Where the Harness Is the First-Class Decision

Dev.to AI · by MouseRider · April 3, 2026 · 9 min read

RFC v0.3 — design proposal, not a shipping product. CC0 licensed. Feedback and critique welcome. GitHub: MouseRider/agentmanifest-rfc

When you run AI agents across more than one role, the execution environment turns out to matter more than it first appears. The model gets most of the attention — benchmarks, leaderboards, capability comparisons — but the harness shapes runtime behavior in ways that model selection alone doesn’t account for.

A personal assistant, an ops monitor, a coding agent, a trading bot: these aren’t the same agent with different prompts. They need different memory models, different autonomy levels, different guardrail enforcement, different lifecycle behaviors. Current agent harnesses are mostly either finished platforms you adopt wholesale, or open-ended toolkits that reward deep specialisation. There’s no standardised, composable layer in between: a way to declare what an agent needs, select the right harness for its role, and assemble the configuration portably.

AgentManifest is a design proposal for that missing layer.

This is part of an ongoing series on building persistent AI agents. Article 1 covered TSVC — context isolation across topics. Article 2 covered agent epistemology — how an agent knows what it knows. AgentManifest grew out of the same body of work: a production personal assistant running on OpenClaw, and the questions that surface when you push a system like that into real daily use.

The Spec

Dockerfile-like syntax. FROM selects the harness — the primary design decision in any manifest.

```
# Personal Assistant
FROM openclaw:latest

MODEL claude-opus
ROLE personal-assistant

TOOLS browser, email, calendar, file-system, sub-agents
MEMORY persistent, cross-session
PERSONALITY ./soul.md

GUARDRAILS approval-for-external-sends, budget-cap-daily=5.00
AUTONOMY high
HEARTBEAT interval=30m, quiet-hours=23:00-08:00

CHANNELS telegram=in-out, email=in-out, twitter=out
SPENDING daily-cap=50.00, per-transaction-cap=20.00
IDENTITY did:web:agents.example.com:assistant

DEPLOY always-on
RESTART on-failure
```

```
# Ops Monitor
# Same harness. Completely different agent.
FROM openclaw:latest

MODEL claude-haiku
ROLE ops-monitor

TOOLS file-system, ssh, docker, http, alerting
MEMORY session-only

GUARDRAILS strict-instructions, no-generative-output, read-only-by-default
AUTONOMY medium
HEARTBEAT interval=5m

ALERT_CHANNEL telegram-ops-thread
ON_ERROR alert-and-retry, max-retries=3

DEPLOY always-on
RESOURCES memory=256m
```

Same base harness. Completely different agent. The spec makes the differences explicit, auditable, and portable — without requiring both to fit a single one-size-fits-all runtime.
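
Each line is a keyword plus arguments, so a manifest reduces to plain data that tooling can compare. The following is a minimal sketch, not the RFC's grammar (a formal grammar is still on the roadmap). It assumes one directive per line, comma-separated arguments, and # comments. The helper names parse_manifest and diff_manifests are hypothetical. Diffing the two openclaw manifests turns "same harness, different agent" into an auditable list.

```python
# Hypothetical sketch: parse AgentManifest text into directives and diff two manifests.
# Assumes one directive per line ("KEYWORD args"), comma-separated arguments and
# "#" comments, mirroring the examples above. The real v0.3 grammar may differ.

def parse_manifest(text: str) -> dict[str, list[str]]:
    """Return a mapping of directive keyword -> list of argument tokens."""
    directives: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, rest = line.partition(" ")
        directives[keyword.upper()] = [a.strip() for a in rest.split(",") if a.strip()]
    return directives


def diff_manifests(a: dict[str, list[str]], b: dict[str, list[str]]) -> dict[str, tuple]:
    """Report every directive whose arguments differ (None means absent)."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


assistant = parse_manifest("""
# Personal Assistant
FROM openclaw:latest
MODEL claude-opus
MEMORY persistent, cross-session
AUTONOMY high
""")
ops = parse_manifest("""
# Ops Monitor
FROM openclaw:latest
MODEL claude-haiku
MEMORY session-only
AUTONOMY medium
""")

for key, (left, right) in diff_manifests(assistant, ops).items():
    print(f"{key}: {left} -> {right}")
# AUTONOMY: ['high'] -> ['medium']
# MEMORY: ['persistent', 'cross-session'] -> ['session-only']
# MODEL: ['claude-opus'] -> ['claude-haiku']
```

In this sketch, a repeated directive simply overwrites the earlier one. A real validator would more likely reject it.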

Swap the harness and the same directives target a different execution environment:

```
FROM langgraph:latest
# or
FROM claude-code:latest
# or
FROM crewai:latest
```

Why Harness Selection Belongs in the Spec

Model selection is reasonably well-served by existing tooling — benchmarks, leaderboards, capability comparisons are all mature. Harness selection is less well-served, and it has more influence over runtime behavior than the current tooling reflects.

Here’s a concrete distinction worth making explicit. Writing “always ask for approval before deleting files” in a system prompt is a soft constraint — the model follows it as part of its instruction-following behavior. A deterministic guardrail at the harness level enforces the same rule unconditionally, independent of context length or task complexity. Both are valid approaches; they’re not equivalent, and the choice between them is a meaningful design decision that currently lives in implementation rather than in the agent definition.
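
To make the distinction concrete, here is a minimal sketch of the deterministic side. The approval check lives in the harness's tool-execution path, so it applies on every call, whatever the model's context contains. ToolCall, GuardrailViolation, the delete_file tool name and the approval callback are all hypothetical names chosen for illustration.

```python
# Hypothetical harness-level guardrail: the check runs in code on every tool call,
# so it fires regardless of what the model retains from its system prompt.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    tool: str
    args: dict


class GuardrailViolation(Exception):
    pass


def require_approval_for(tools: set[str], approve: Callable[[ToolCall], bool]):
    """Wrap a tool executor so the listed tools only run after explicit approval."""
    def wrap(execute: Callable[[ToolCall], str]) -> Callable[[ToolCall], str]:
        def guarded(call: ToolCall) -> str:
            if call.tool in tools and not approve(call):
                raise GuardrailViolation(f"{call.tool} blocked: approval not granted")
            return execute(call)
        return guarded
    return wrap


def execute(call: ToolCall) -> str:
    return f"ran {call.tool}"  # stand-in for real tool execution


def deny_all(call: ToolCall) -> bool:
    return False  # e.g. no human has approved yet


guarded_execute = require_approval_for({"delete_file"}, deny_all)(execute)

print(guarded_execute(ToolCall("read_file", {"path": "notes.txt"})))  # ran read_file
try:
    guarded_execute(ToolCall("delete_file", {"path": "notes.txt"}))
except GuardrailViolation as err:
    print(err)  # delete_file blocked: approval not granted
```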

Different roles suit different harness configurations:

A coding agent fits Claude Code — git integration, sandboxed terminal, pre-commit guardrails in the infrastructure
A research pipeline fits LangGraph — graph-native execution, defined workflow shape, explicit checkpoints
A personal assistant fits OpenClaw — persistent memory, heartbeat behavior, cross-session continuity, sub-agent delegation (see the TSVC article for what running this in production actually looks like)
A team workflow fits CrewAI — role-based agent structure, structured task handoffs, shared goal propagation

AgentManifest makes that selection explicit and portable. The spec sits above the harness layer — it doesn’t replace harnesses, it selects and configures them.
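
One way to picture "selects and configures" works in three steps:

1. FROM names a harness.
2. A registry maps that name to an adapter.
3. The adapter translates the shared directives into that harness's own configuration.

This is a hypothetical sketch. The registry, the adapter functions and the output keys are assumptions and do not reflect any real harness's config format. Directives are assumed to be already parsed into a keyword-to-arguments mapping.

```python
# Hypothetical resolver sketch: FROM picks an adapter; the adapter maps the shared
# directives onto that harness's own config. Keys emitted here are illustrative only.
from typing import Callable

Directives = dict[str, list[str]]
Adapter = Callable[[Directives], dict]

RESOLVERS: dict[str, Adapter] = {}


def resolver(harness: str):
    """Register an adapter for a harness name (the part of FROM before ':')."""
    def register(fn: Adapter) -> Adapter:
        RESOLVERS[harness] = fn
        return fn
    return register


@resolver("openclaw")
def openclaw_adapter(d: Directives) -> dict:
    return {"model": d["MODEL"][0], "memory": d.get("MEMORY", [])}


@resolver("langgraph")
def langgraph_adapter(d: Directives) -> dict:
    return {"llm": d["MODEL"][0], "checkpointing": "persistent" in d.get("MEMORY", [])}


def resolve(d: Directives) -> tuple[str, dict]:
    """Return the pinned harness image and the harness-specific config."""
    harness, _, tag = d["FROM"][0].partition(":")
    if harness not in RESOLVERS:
        raise LookupError(f"no resolver registered for harness {harness!r}")
    return f"{harness}:{tag or 'latest'}", RESOLVERS[harness](d)


spec = {"MODEL": ["claude-opus"], "MEMORY": ["persistent", "cross-session"]}
print(resolve({**spec, "FROM": ["openclaw:latest"]}))
print(resolve({**spec, "FROM": ["langgraph:latest"]}))
```

An unregistered harness is a hard error here rather than a fallback to a default. That keeps the harness choice explicit, which is the point of making FROM first-class.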

Three Directives Worth Examining

GUARDRAILS

```
GUARDRAILS strict-instructions, read-only-by-default, no-external-sends
```

Guardrails in AgentManifest are compiled into the harness configuration, not embedded in the prompt. The harness enforces them at the infrastructure level. This is the practical distinction between a behavioral instruction and a behavioral constraint.
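
Here is a minimal sketch of what "compiled into the harness configuration" could mean. GUARDRAILS tokens become fields on a policy object that the harness consults before each action, rather than sentences in a prompt. The interpretation given to each token, the action names and the Policy fields are assumptions for illustration. Tokens the sketch does not understand are kept aside instead of being silently dropped.

```python
# Hypothetical compilation of GUARDRAILS tokens into a policy the harness checks in code.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    read_only: bool = False
    block_external_sends: bool = False
    budget_cap_daily: Optional[float] = None
    passthrough: tuple[str, ...] = ()  # tokens this sketch does not interpret


def compile_guardrails(tokens: list[str]) -> Policy:
    """Turn GUARDRAILS tokens into a Policy object."""
    read_only = block_sends = False
    budget: Optional[float] = None
    passthrough: list[str] = []
    for token in tokens:
        name, _, value = token.partition("=")
        if name == "read-only-by-default":
            read_only = True
        elif name == "no-external-sends":
            block_sends = True
        elif name == "budget-cap-daily":
            budget = float(value)
        else:
            passthrough.append(token)  # e.g. strict-instructions
    return Policy(read_only, block_sends, budget, tuple(passthrough))


def permits(policy: Policy, action: str, external: bool = False) -> bool:
    """Deterministic check the harness would run before every action."""
    if policy.read_only and action == "write":
        return False
    if policy.block_external_sends and action == "send" and external:
        return False
    return True


policy = compile_guardrails(["strict-instructions", "read-only-by-default", "no-external-sends"])
print(permits(policy, "read"))                 # True
print(permits(policy, "write"))                # False
print(permits(policy, "send", external=True))  # False
print(policy.passthrough)                      # ('strict-instructions',)
```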

IDENTITY

```
IDENTITY did:web:agents.example.com:purchasing-agent
SPENDING daily-cap=500.00, per-transaction-cap=100.00
```

IDENTITY assigns a cryptographic identity — immutable per manifest version, verifiable by external systems. Once identity is verifiable, it becomes the binding point for systems that require an accountable party on the other end of a transaction or access request.
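
The IDENTITY values in these examples use the did:web method. In did:web, an identifier maps deterministically to an HTTPS URL hosting the agent's DID document, which carries the public keys that external systems verify against. The sketch below follows the mapping in the W3C CCG did:web method specification:

- The host comes first and may percent-encode a port.
- Later colon-separated segments become a path.
- A bare host resolves under /.well-known/.

Fetching the document and checking signatures are out of scope.

```python
# Sketch of did:web -> URL mapping per the W3C CCG did:web method specification.
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did}")
    host, *path = did[len(prefix):].split(":")
    host = unquote(host)  # a port is percent-encoded, e.g. "localhost%3A8443"
    if not path:
        return f"https://{host}/.well-known/did.json"
    return f"https://{host}/{'/'.join(path)}/did.json"


print(did_web_to_url("did:web:agents.example.com:assistant"))
# https://agents.example.com/assistant/did.json
```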

Wallets and payment systems. An agent with a stable cryptographic identity can be issued a spending account scoped to that identity. SPENDING declares the limits; the wallet enforces them at infrastructure level. If something goes wrong, the audit trail is complete: which agent, which manifest version, which guardrails were active, what it spent and when.
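
Here is a hypothetical sketch of the wallet side. Caps are checked on every charge, and every attempt, approved or not, is written to an audit log tagged with the agent's identity and manifest version. The Wallet class, its fields and the "manifest-1" version label are illustrative assumptions. The cap values are taken from the purchasing-agent example above. Decimal avoids floating-point rounding on currency amounts.

```python
# Hypothetical wallet-side enforcement of SPENDING caps with an audit trail.
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class Wallet:
    agent_id: str
    manifest_version: str  # hypothetical label; the spec's versioning scheme may differ
    daily_cap: Decimal
    per_transaction_cap: Decimal
    spent: dict[date, Decimal] = field(default_factory=dict)
    audit: list[tuple[str, ...]] = field(default_factory=list)

    def charge(self, amount: Decimal, day: date) -> bool:
        today_total = self.spent.get(day, Decimal("0"))
        allowed = (amount <= self.per_transaction_cap
                   and today_total + amount <= self.daily_cap)
        if allowed:
            self.spent[day] = today_total + amount
        # Every attempt is logged: which agent, which manifest, what, when, outcome.
        self.audit.append((day.isoformat(), self.agent_id, self.manifest_version,
                           str(amount), "approved" if allowed else "rejected"))
        return allowed


wallet = Wallet("did:web:agents.example.com:purchasing-agent", "manifest-1",
                daily_cap=Decimal("500.00"), per_transaction_cap=Decimal("100.00"))
day = date(2026, 4, 3)
results = [wallet.charge(Decimal("100.00"), day) for _ in range(5)]  # reaches 500.00
results.append(wallet.charge(Decimal("20.00"), day))   # would exceed the daily cap
results.append(wallet.charge(Decimal("150.00"), day))  # exceeds per-transaction cap
print(results)  # [True, True, True, True, True, False, False]
```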

OAuth and API credentials. Rather than embedding credentials in config or prompts, the harness can resolve access rights from the agent’s verified identity at runtime. An agent identity can be an OAuth client_id, a service account in Azure AD or AWS IAM, or a member of a permissioned data feed — scoped to that agent specifically, not a shared credential.

Inter-agent trust. In a multi-agent system, a coordinator can verify that the specialist it’s delegating to is genuinely running the manifest it claims — same spec version, same guardrails in force. This connects to the coordinator model described in the TSVC article: one coordinator, many specialists, each independently verifiable.
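
Here is one possible shape for that check, as a hypothetical sketch. The coordinator hashes the manifest it expects and compares the result with a digest the specialist reports. The normalisation rule (drop comments and blank lines) is an assumption. A real deployment would also need the report signed with the key bound to the specialist's IDENTITY, which is not shown here.

```python
# Hypothetical manifest-digest check a coordinator could run before delegating.
import hashlib
import hmac


def manifest_digest(text: str) -> str:
    """SHA-256 of the manifest with comments and blank lines removed."""
    lines = (ln.strip() for ln in text.splitlines())
    body = "\n".join(ln for ln in lines if ln and not ln.startswith("#"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def verify_specialist(expected_manifest: str, reported_digest: str) -> bool:
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(manifest_digest(expected_manifest), reported_digest)


expected = "FROM claude-code:latest\nGUARDRAILS read-only-by-default\n"
honest = manifest_digest("# coder\nFROM claude-code:latest\n\nGUARDRAILS read-only-by-default")
drifted = manifest_digest("FROM claude-code:latest\nGUARDRAILS none")
print(verify_specialist(expected, honest))   # True
print(verify_specialist(expected, drifted))  # False
```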

PROMPT_PROFILE and LOCALE

```
PROMPT_PROFILE claude-opus
LOCALE en-GB
```

The harness adapts prompt scaffolding to the selected model and language. The spec author doesn’t maintain model-specific variants or locale-specific rewrites. The harness handles that as an implementation detail.

agent-compose: Coordination Above the Single Agent

A single AgentManifest defines a single agent. agent-compose is the layer above — the analog to docker-compose for multi-agent systems. It references individual manifests, defines inter-agent interfaces, and declares the coordination topology.

Hierarchy

The most common pattern. A lead agent delegates to specialists; each specialist runs whatever harness suits its role.

```yaml
topology: hierarchy

agents:
  coordinator:
    manifest: ./coordinator.agentmanifest
    role: lead
  researcher:
    manifest: ./researcher.agentmanifest  # FROM langgraph:latest
    role: specialist
  coder:
    manifest: ./coder.agentmanifest  # FROM claude-code:latest
    role: specialist

delegation:
  coordinator -> [researcher, coder]:
    protocol: task-dispatch
```

The coordinator doesn’t need to know which harness each specialist uses. Harness heterogeneity is internal to the system.
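
Here is a minimal sketch of that separation. It uses hypothetical class names and a simple name-based route standing in for the task-dispatch protocol. The coordinator depends only on a shared handle() interface. Each specialist's harness label is a private detail the coordinator never reads.

```python
# Hypothetical task-dispatch sketch: the coordinator sees only the Specialist interface.
from typing import Protocol


class Specialist(Protocol):
    name: str

    def handle(self, task: str) -> str: ...


class LangGraphResearcher:
    name = "researcher"
    _harness = "langgraph:latest"  # internal detail, never read by the coordinator

    def handle(self, task: str) -> str:
        return f"research notes for: {task}"


class ClaudeCodeCoder:
    name = "coder"
    _harness = "claude-code:latest"

    def handle(self, task: str) -> str:
        return f"patch for: {task}"


class Coordinator:
    def __init__(self, specialists: list[Specialist]):
        self.routes = {s.name: s for s in specialists}

    def dispatch(self, to: str, task: str) -> str:
        return self.routes[to].handle(task)


lead = Coordinator([LangGraphResearcher(), ClaudeCodeCoder()])
print(lead.dispatch("researcher", "compare vector stores"))
print(lead.dispatch("coder", "add retry to fetch()"))
```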

Council

For high-stakes decisions, a council routes a proposal to a set of agents for independent evaluation before any action is taken. No single agent’s judgment is final.

```yaml
topology: council

agents:
  proposer:
    manifest: ./agents/proposer.agentmanifest
  council:
    - manifest: ./agents/compliance-reviewer.agentmanifest
    - manifest: ./agents/context-checker.agentmanifest
    - manifest: ./agents/risk-assessor.agentmanifest

council_config:
  trigger: action-type=financial OR confidence < 0.7
  evaluation: independent
  quorum: all
  on_rejection: halt-and-alert
```

The “evaluation: independent” setting matters. Each agent evaluates the proposal without first seeing the others’ output, which prevents anchoring.
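
The council configuration can be sketched as follows. The trigger expression, independent evaluation, unanimous quorum and halt behaviour come from the example. The proposal fields and the three reviewer rules are hypothetical stand-ins for real agents. Independence here means each reviewer receives only the proposal, never another reviewer's verdict.

```python
# Hypothetical council sketch mirroring council_config: trigger, independent
# evaluation, quorum "all", and halt-and-alert on any rejection.
from typing import Callable

Proposal = dict
Reviewer = Callable[[Proposal], bool]


def needs_council(p: Proposal) -> bool:
    # trigger: action-type=financial OR confidence < 0.7
    return p["action_type"] == "financial" or p["confidence"] < 0.7


def run_council(p: Proposal, reviewers: dict[str, Reviewer]) -> str:
    if not needs_council(p):
        return "proceed"
    # evaluation: independent -> each reviewer gets a copy of the proposal only
    verdicts = {name: review(dict(p)) for name, review in reviewers.items()}
    if all(verdicts.values()):  # quorum: all
        return "proceed"
    rejected = sorted(name for name, ok in verdicts.items() if not ok)
    return f"halt-and-alert: rejected by {', '.join(rejected)}"  # on_rejection


reviewers = {
    "compliance-reviewer": lambda p: p["amount"] <= 100,
    "context-checker": lambda p: bool(p.get("reason")),
    "risk-assessor": lambda p: p["confidence"] >= 0.5,
}
print(run_council({"action_type": "financial", "confidence": 0.9,
                   "amount": 250, "reason": "restock"}, reviewers))
# halt-and-alert: rejected by compliance-reviewer
print(run_council({"action_type": "lookup", "confidence": 0.95, "amount": 0}, reviewers))
# proceed
```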

Consensus

A more flexible variant. Rather than unanimous approval, agents reach a decision through structured agreement with configurable thresholds.

```yaml
topology: consensus

agents:
  council:
    - manifest: ./agents/reviewer-a.agentmanifest
      weight: 1.0
    - manifest: ./agents/reviewer-b.agentmanifest
      weight: 1.0
    - manifest: ./agents/senior-reviewer.agentmanifest
      weight: 2.0

consensus_config:
  method: weighted-majority  # options: majority, supermajority, unanimity, weighted-majority
  threshold: 0.6
  on_no_consensus: hold-for-human
```

Useful for moderation decisions, borderline classification cases, or any workflow where structured disagreement should surface before acting. The conditions that trigger a council, the quorum required, and the fallback behavior are all declarable in the spec — not embedded in custom orchestration code.
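
The weighted-majority example can be worked through directly. Assume the threshold is read as a share of total voting weight (the spec does not define this). The three reviewers carry 1.0 + 1.0 + 2.0 = 4.0 in total, so approval needs at least 0.6 × 4.0 = 2.4. It is also an assumption that a rejection reaching the same threshold counts as a "reject" outcome. Anything else falls through to on_no_consensus.

```python
# Hypothetical weighted-majority consensus mirroring the consensus_config example.
# Assumption: threshold is a fraction of the total weight of the agents that voted.

def weighted_consensus(votes: dict[str, bool], weights: dict[str, float],
                       threshold: float = 0.6) -> str:
    total = sum(weights[name] for name in votes)
    approve = sum(weights[name] for name, ok in votes.items() if ok)
    if approve / total >= threshold:
        return "approve"
    if (total - approve) / total >= threshold:  # assumption: symmetric rejection
        return "reject"
    return "hold-for-human"  # on_no_consensus


weights = {"reviewer-a": 1.0, "reviewer-b": 1.0, "senior-reviewer": 2.0}

# Senior plus one reviewer: 3.0 / 4.0 = 0.75 >= 0.6
print(weighted_consensus({"reviewer-a": True, "reviewer-b": False,
                          "senior-reviewer": True}, weights))  # approve
# Senior alone: 2.0 / 4.0 = 0.5 on each side, below 0.6 either way
print(weighted_consensus({"reviewer-a": False, "reviewer-b": False,
                          "senior-reviewer": True}, weights))  # hold-for-human
```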

When council members carry verifiable IDENTITY credentials, the audit trail for a decision includes the verified identity of each participating agent, the manifest version each was running, and the guardrails in force at the time.

Landscape

| | Oracle Agent Spec | docker-agent | gitagent | AgentManifest |
|---|---|---|---|---|
| Goal | Portability across runtimes | Declarative config, one runtime | Git-native definition, export anywhere | Role-appropriate harness per agent |
| Harness selection | Abstracted away | Fixed | Adapter-based | First-class (FROM) |
| Behavioral enforcement | Framework-dependent | Prompt-based | RULES.md + compliance config | Harness-compiled |
| Multi-agent | Single spec | Coordinator model | Inheritance + deps | agent-compose with topology declarations |
| Identity / payments | Not in scope | Not in scope | Not in scope | First-class directives |
| Format | YAML | YAML | File system structure | Dockerfile-like DSL |
| Status | Shipped | Shipped | Shipped | Design proposal / RFC |

On gitagent: it’s worth using today if your goal is git-native agent versioning and framework portability. AgentManifest is working on a different axis — not how to make the runtime invisible, but how to declare it explicitly. The two are potentially complementary: a gitagent repo could reference an AgentManifest to declare its harness requirements.

What This Is and Isn’t

AgentManifest is RFC v0.3. The spec is concrete enough to debate; no implementation exists yet. Validator tooling, a reference harness resolver, and a formal grammar are on the roadmap.

The spec is CC0. I’d genuinely welcome a working group or standards body taking it further — the goal was to get the idea into a form concrete enough to argue with.

Open Questions

A few things the spec doesn’t resolve yet, where input would be useful:

Harness resolver ecosystem. The spec works best if harness maintainers ship their own resolvers. That requires community buy-in that isn’t there yet. How do you bootstrap that?

Inter-agent protocol. agent-compose defines topology; it doesn’t yet commit to a wire protocol for agent-to-agent communication. Candidates on the table: A2A (Google’s agent communication protocol), MCP (Anthropic’s tool protocol, which is seeing increasing use for agent-to-agent calls), or plain HTTP with interfaces declared in the compose file. Each has different tradeoffs around standardisation, harness coupling, and implementation complexity.

Testing and simulation. For safety-critical agents — trading bots, autonomous purchasing agents — dry-run capability seems important. How do you test guardrail firing without live tool execution?

Cross-harness observability. When agents on different harnesses participate in a shared workflow, coherent distributed tracing is an open problem. Through the IDENTITY directive, the spec creates a clear seam where tracing could be solved, but the spec itself doesn’t solve it.

Repo

MANIFEST.md — full spec, v0.3
examples/ — AgentManifest files for six agent roles
docs/design-rationale.md — why harness heterogeneity, not portability
docs/agent-compose.md — topology patterns and multi-agent coordination
docs/identity.md — identity model, wallet binding, inter-agent trust

If you’ve run agents across multiple roles in production and have thoughts on where this framing holds or breaks down — open an issue. The RFC is designed to be argued with.

AgentManifest was designed in collaboration with a persistent AI agent running on OpenClaw and through extended conversations with Claude AI (claude.ai). The spec, the repo, and this article are the output of that process — an example of the kind of work the system is designed to support.

Original source

Dev.to AI

https://dev.to/mouserider/agentmanifest-a-declarative-spec-where-the-harness-is-the-first-class-decision-lnc
