AgentManifest: A Declarative Spec Where the Harness Is the First-Class Decision
RFC v0.3 — design proposal, not a shipping product. CC0 licensed. Feedback and critique welcome. GitHub: MouseRider/agentmanifest-rfc
When you run AI agents across more than one role, the execution environment turns out to matter more than it first appears. The model gets most of the attention — benchmarks, leaderboards, capability comparisons — but the harness shapes runtime behavior in ways that model selection alone doesn’t account for.
A personal assistant, an ops monitor, a coding agent, a trading bot: these aren’t the same agent with different prompts. They need different memory models, different autonomy levels, different guardrail enforcement, different lifecycle behaviors. Current agent harnesses are mostly either finished platforms you adopt wholesale, or open-ended toolkits that reward deep specialisation. There’s no standardised, composable layer in between: a way to declare what an agent needs, select the right harness for its role, and assemble the configuration portably.
AgentManifest is a design proposal for that missing layer.
This is part of an ongoing series on building persistent AI agents. Article 1 covered TSVC — context isolation across topics. Article 2 covered agent epistemology — how an agent knows what it knows. AgentManifest grew out of the same body of work: a production personal assistant running on OpenClaw, and the questions that surface when you push a system like that into real daily use.
The Spec
Dockerfile-like syntax. FROM selects the harness — the primary design decision in any manifest.
    # Personal Assistant
    FROM openclaw:latest

    MODEL claude-opus
    ROLE personal-assistant

    TOOLS browser, email, calendar, file-system, sub-agents
    MEMORY persistent, cross-session
    PERSONALITY ./soul.md

    GUARDRAILS approval-for-external-sends, budget-cap-daily=5.00
    AUTONOMY high
    HEARTBEAT interval=30m, quiet-hours=23:00-08:00

    CHANNELS telegram=in-out, email=in-out, twitter=out
    SPENDING daily-cap=50.00, per-transaction-cap=20.00
    IDENTITY did:web:agents.example.com:assistant

    DEPLOY always-on
    RESTART on-failure
    # Ops Monitor
    # Same harness. Completely different agent.
    FROM openclaw:latest

    MODEL claude-haiku
    ROLE ops-monitor

    TOOLS file-system, ssh, docker, http, alerting
    MEMORY session-only

    GUARDRAILS strict-instructions, no-generative-output, read-only-by-default
    AUTONOMY medium
    HEARTBEAT interval=5m

    ALERT_CHANNEL telegram-ops-thread
    ON_ERROR alert-and-retry, max-retries=3

    DEPLOY always-on
    RESOURCES memory=256m
Same base harness. Completely different agent. The spec makes the differences explicit, auditable, and portable — without requiring both to fit a single one-size-fits-all runtime.
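To make "explicit and auditable" concrete, here is a minimal sketch of a tool that parses two manifests and reports where they differ. The RFC has no formal grammar yet, so the parsing rules used here (one directive per line, `#` for comments, a keyword followed by a free-form value) are assumptions, and `parse_manifest` and `diff_manifests` are hypothetical helper names rather than part of the spec.

```python
def parse_manifest(text: str) -> dict[str, str]:
    """Parse 'KEYWORD value' lines into a dict (assumed syntax, not the RFC grammar)."""
    directives: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, value = line.partition(" ")
        directives[keyword.upper()] = value.strip()
    return directives


def diff_manifests(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return directives whose values differ, as {keyword: (value_in_a, value_in_b)}."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


assistant = parse_manifest("""
# Personal Assistant
FROM openclaw:latest
MODEL claude-opus
MEMORY persistent, cross-session
AUTONOMY high
""")

monitor = parse_manifest("""
# Ops Monitor
FROM openclaw:latest
MODEL claude-haiku
MEMORY session-only
AUTONOMY medium
""")

changes = diff_manifests(assistant, monitor)
for keyword, (left, right) in changes.items():
    print(f"{keyword}: {left!r} -> {right!r}")
```

Because both manifests share the same `FROM` line, the diff lists only the directives that actually distinguish the two agents.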
Swap the harness and the same directives target a different execution environment:
    FROM langgraph:latest
    # or
    FROM claude-code:latest
    # or
    FROM crewai:latest
Why Harness Selection Belongs in the Spec
Model selection is reasonably well-served by existing tooling — benchmarks, leaderboards, capability comparisons are all mature. Harness selection is less well-served, and it has more influence over runtime behavior than the current tooling reflects.
Here’s a concrete distinction worth making explicit. Writing “always ask for approval before deleting files” in a system prompt is a soft constraint — the model follows it as part of its instruction-following behavior. A deterministic guardrail at the harness level enforces the same rule unconditionally, independent of context length or task complexity. Both are valid approaches; they’re not equivalent, and the choice between them is a meaningful design decision that currently lives in implementation rather than in the agent definition.
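The difference is easier to see in code. The sketch below shows one way a harness could enforce an approval rule in its tool-dispatch path, so the check runs regardless of what the prompt says. Everything in it is hypothetical: `guarded_dispatch`, `GuardrailViolation`, and the stub tools are illustrative names, not an API of OpenClaw or any other harness.

```python
from typing import Callable


class GuardrailViolation(Exception):
    """Raised when a tool call is blocked by a harness-level rule."""


def guarded_dispatch(
    tool: str,
    args: dict,
    tools: dict[str, Callable[..., str]],
    needs_approval: set[str],
    approve: Callable[[str, dict], bool],
) -> str:
    """Run a tool call only if the guardrail allows it.

    The check is ordinary code that runs before the tool does, so it holds
    no matter what the model was prompted with or decided to do.
    """
    if tool in needs_approval and not approve(tool, args):
        raise GuardrailViolation(f"{tool} blocked: approval required")
    return tools[tool](**args)


# Stub tools standing in for real file operations.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}


def deny_all(tool: str, args: dict) -> bool:
    """Simulates a human reviewer who declines every request."""
    return False


print(guarded_dispatch("read_file", {"path": "notes.txt"}, tools, {"delete_file"}, deny_all))

try:
    guarded_dispatch("delete_file", {"path": "notes.txt"}, tools, {"delete_file"}, deny_all)
except GuardrailViolation as err:
    print(err)
```

A prompt-level rule would rely on the model choosing not to call `delete_file`; here the call is refused even if the model makes it.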
Different roles suit different harness configurations:
- A coding agent fits Claude Code: git integration, sandboxed terminal, pre-commit guardrails in the infrastructure
- A research pipeline fits LangGraph: graph-native execution, defined workflow shape, explicit checkpoints
- A personal assistant fits OpenClaw: persistent memory, heartbeat behavior, cross-session continuity, sub-agent delegation (see the TSVC article for what running this in production actually looks like)
- A team workflow fits CrewAI: role-based agent structure, structured task handoffs, shared goal propagation
AgentManifest makes that selection explicit and portable. The spec sits above the harness layer — it doesn’t replace harnesses, it selects and configures them.
Three Directives Worth Examining
GUARDRAILS
    GUARDRAILS strict-instructions, read-only-by-default, no-external-sends
Guardrails in AgentManifest are compiled into the harness configuration, not embedded in the prompt. The harness enforces them at the infrastructure level. This is the practical distinction between a behavioral instruction and a behavioral constraint.
IDENTITY
    IDENTITY did:web:agents.example.com:purchasing-agent
    SPENDING daily-cap=500.00, per-transaction-cap=100.00
IDENTITY assigns a cryptographic identity — immutable per manifest version, verifiable by external systems. Once identity is verifiable, it becomes the binding point for systems that require an accountable party on the other end of a transaction or access request.
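The example identifiers use the `did:web` method, under which an identifier maps to an HTTPS URL where the DID document (including public keys) is published. The sketch below shows that mapping as the did:web method defines it; whether an AgentManifest verifier would use it unchanged is an assumption, and `did_web_to_url` is a hypothetical helper name. No network request is made.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did}")
    parts = did[len(prefix):].split(":")
    # The first segment is the host; a port is percent-encoded (example.com%3A8443).
    host = unquote(parts[0])
    path = "/".join(parts[1:])
    if path:
        return f"https://{host}/{path}/did.json"
    # With no path segments, the document lives under /.well-known/.
    return f"https://{host}/.well-known/did.json"


print(did_web_to_url("did:web:agents.example.com:assistant"))
# https://agents.example.com/assistant/did.json
```

An external system that wants to verify the agent would fetch that document and check signatures against the keys it lists.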
Wallets and payment systems. An agent with a stable cryptographic identity can be issued a spending account scoped to that identity. SPENDING declares the limits; the wallet enforces them at infrastructure level. If something goes wrong, the audit trail is complete: which agent, which manifest version, which guardrails were active, what it spent and when.
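As a rough illustration of wallet-side enforcement, the sketch below parses a SPENDING value and applies both caps before recording a charge. The `Wallet` class, `parse_spending`, and the audit-log shape are hypothetical; the RFC does not define a wallet API, and the daily reset is left out for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SpendingLimits:
    daily_cap: Decimal
    per_transaction_cap: Decimal


def parse_spending(value: str) -> SpendingLimits:
    """Parse a value like 'daily-cap=500.00, per-transaction-cap=100.00'."""
    fields = dict(item.strip().split("=", 1) for item in value.split(","))
    return SpendingLimits(
        daily_cap=Decimal(fields["daily-cap"]),
        per_transaction_cap=Decimal(fields["per-transaction-cap"]),
    )


class Wallet:
    """Hypothetical wallet that enforces limits and logs every attempt."""

    def __init__(self, limits: SpendingLimits) -> None:
        self.limits = limits
        self.spent_today = Decimal("0")
        self.audit_log: list[tuple[str, Decimal, str]] = []

    def charge(self, agent_id: str, amount: Decimal) -> bool:
        if amount > self.limits.per_transaction_cap:
            outcome = "rejected: per-transaction cap"
        elif self.spent_today + amount > self.limits.daily_cap:
            outcome = "rejected: daily cap"
        else:
            self.spent_today += amount
            outcome = "approved"
        # Rejected attempts are logged too, so the audit trail is complete.
        self.audit_log.append((agent_id, amount, outcome))
        return outcome == "approved"


limits = parse_spending("daily-cap=500.00, per-transaction-cap=100.00")
wallet = Wallet(limits)
agent = "did:web:agents.example.com:purchasing-agent"

amounts = ["80.00", "150.00", "100.00", "100.00", "100.00", "100.00", "50.00"]
results = [wallet.charge(agent, Decimal(a)) for a in amounts]

for entry in wallet.audit_log:
    print(entry)
```

Amounts are held as `Decimal` rather than `float` so that cap comparisons are exact.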
OAuth and API credentials. Rather than embedding credentials in config or prompts, the harness can resolve access rights from the agent’s verified identity at runtime. An agent identity can be an OAuth client_id, a service account in Azure AD or AWS IAM, or a member of a permissioned data feed — scoped to that agent specifically, not a shared credential.
Inter-agent trust. In a multi-agent system, a coordinator can verify that the specialist it’s delegating to is genuinely running the manifest it claims — same spec version, same guardrails in force. This connects to the coordinator model described in the TSVC article: one coordinator, many specialists, each independently verifiable.
PROMPT_PROFILE and LOCALE
    PROMPT_PROFILE claude-opus
    LOCALE en-GB
The harness adapts prompt scaffolding to the selected model and language. The spec author doesn’t maintain model-specific variants or locale-specific rewrites. The harness handles that as an implementation detail.
agent-compose: Coordination Above the Single Agent
A single AgentManifest defines a single agent. agent-compose is the layer above — the analog to docker-compose for multi-agent systems. It references individual manifests, defines inter-agent interfaces, and declares the coordination topology.
Hierarchy
The most common pattern. A lead agent delegates to specialists; each specialist runs whatever harness suits its role.
    topology: hierarchy

    agents:
      coordinator:
        manifest: ./coordinator.agentmanifest
        role: lead
      researcher:
        manifest: ./researcher.agentmanifest  # FROM langgraph:latest
        role: specialist
      coder:
        manifest: ./coder.agentmanifest  # FROM claude-code:latest
        role: specialist

    delegation:
      coordinator -> [researcher, coder]:
        protocol: task-dispatch
The coordinator doesn’t need to know which harness each specialist uses. Harness heterogeneity is internal to the system.
Council
For high-stakes decisions, a council routes a proposal to a set of agents for independent evaluation before any action is taken. No single agent’s judgment is final.
    topology: council

    agents:
      proposer:
        manifest: ./agents/proposer.agentmanifest
      council:
        - manifest: ./agents/compliance-reviewer.agentmanifest
        - manifest: ./agents/context-checker.agentmanifest
        - manifest: ./agents/risk-assessor.agentmanifest

    council_config:
      trigger: action-type=financial OR confidence < 0.7
      evaluation: independent
      quorum: all
      on_rejection: halt-and-alert
The evaluation: independent setting matters: agents evaluate without seeing each other’s output first, which prevents anchoring.
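One way to read the council semantics is sketched below: the trigger decides whether a council runs at all, each reviewer sees only the proposal, and a quorum of "all" means a single rejection halts the action. The function names, the proposal fields (`action_type`, `confidence`, `amount`), and the stub reviewers are assumptions made for illustration; the RFC does not define a runtime.

```python
from typing import Callable

Reviewer = Callable[[dict], bool]


def council_triggered(proposal: dict) -> bool:
    """Mirror of the example trigger: action-type=financial OR confidence < 0.7."""
    return proposal["action_type"] == "financial" or proposal["confidence"] < 0.7


def run_council(proposal: dict, reviewers: dict[str, Reviewer]) -> str:
    if not council_triggered(proposal):
        return "proceed"
    # Independent evaluation: each reviewer receives its own copy of the
    # proposal and never sees another reviewer's verdict.
    verdicts = {name: review(dict(proposal)) for name, review in reviewers.items()}
    # quorum: all, so any rejection halts the action.
    return "proceed" if all(verdicts.values()) else "halt-and-alert"


# Stub reviewers standing in for agents defined by their own manifests.
reviewers = {
    "compliance-reviewer": lambda p: p["amount"] <= 1000,
    "context-checker": lambda p: True,
    "risk-assessor": lambda p: p["confidence"] >= 0.5,
}

small = {"action_type": "financial", "confidence": 0.9, "amount": 250}
large = {"action_type": "financial", "confidence": 0.9, "amount": 5000}
routine = {"action_type": "read", "confidence": 0.95, "amount": 0}

for proposal in (small, large, routine):
    print(proposal["action_type"], proposal["amount"], run_council(proposal, reviewers))
```

The `routine` proposal never reaches the reviewers, because neither trigger condition holds.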
Consensus
A more flexible variant. Rather than unanimous approval, agents reach a decision through structured agreement with configurable thresholds.
    topology: consensus

    agents:
      council:
        - manifest: ./agents/reviewer-a.agentmanifest
          weight: 1.0
        - manifest: ./agents/reviewer-b.agentmanifest
          weight: 1.0
        - manifest: ./agents/senior-reviewer.agentmanifest
          weight: 2.0

    consensus_config:
      method: weighted-majority  # options: majority, supermajority, unanimity, weighted-majority
      threshold: 0.6
      on_no_consensus: hold-for-human
Useful for moderation decisions, borderline classification cases, or any workflow where structured disagreement should surface before acting. The conditions that trigger a council, the quorum required, and the fallback behavior are all declarable in the spec — not embedded in custom orchestration code.
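The weighted-majority arithmetic can be worked through with the weights and threshold from the example. The interpretation below is an assumption, since the RFC does not pin it down: a decision passes when the approving share of total weight reaches the threshold, is rejected when the rejecting share does, and otherwise falls through to `hold-for-human`. The function name `weighted_decision` is hypothetical.

```python
def weighted_decision(
    votes: dict[str, bool], weights: dict[str, float], threshold: float
) -> str:
    """Decide by share of total weight; assumed reading of weighted-majority."""
    total = sum(weights[name] for name in votes)
    approving = sum(weights[name] for name, ok in votes.items() if ok)
    approve_share = approving / total
    if approve_share >= threshold:
        return "approve"
    if 1 - approve_share >= threshold:
        return "reject"
    return "hold-for-human"


weights = {"reviewer-a": 1.0, "reviewer-b": 1.0, "senior-reviewer": 2.0}

# Senior reviewer plus one other: 3.0 of 4.0 = 0.75, which clears 0.6.
print(weighted_decision(
    {"reviewer-a": True, "reviewer-b": False, "senior-reviewer": True}, weights, 0.6))

# Both juniors approve, senior rejects: 0.5 each way, so neither side clears 0.6.
print(weighted_decision(
    {"reviewer-a": True, "reviewer-b": True, "senior-reviewer": False}, weights, 0.6))
```

With these weights, the senior reviewer cannot decide alone (2.0 of 4.0 is 0.5), but a split between the senior reviewer and both juniors surfaces to a human instead of resolving silently.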
When council members carry verifiable IDENTITY credentials, the audit trail for a decision includes the verified identity of each participating agent, the manifest version each was running, and the guardrails in force at the time.
Landscape
| | Oracle Agent Spec | docker-agent | gitagent | AgentManifest |
|---|---|---|---|---|
| Goal | Portability across runtimes | Declarative config, one runtime | Git-native definition, export anywhere | Role-appropriate harness per agent |
| Harness selection | Abstracted away | Fixed | Adapter-based | First-class (FROM) |
| Behavioral enforcement | Framework-dependent | Prompt-based | RULES.md + compliance config | Harness-compiled |
| Multi-agent | Single spec | Coordinator model | Inheritance + deps | agent-compose with topology declarations |
| Identity / payments | Not in scope | Not in scope | Not in scope | First-class directives |
| Format | YAML | YAML | File system structure | Dockerfile-like DSL |
| Status | Shipped | Shipped | Shipped | Design proposal / RFC |
On gitagent: it’s worth using today if your goal is git-native agent versioning and framework portability. AgentManifest is working on a different axis — not how to make the runtime invisible, but how to declare it explicitly. The two are potentially complementary: a gitagent repo could reference an AgentManifest to declare its harness requirements.
What This Is and Isn’t
AgentManifest is RFC v0.3. The spec is concrete enough to debate; no implementation exists yet. Validator tooling, a reference harness resolver, and a formal grammar are on the roadmap.
The spec is CC0. I’d genuinely welcome a working group or standards body taking it further — the goal was to get the idea into a form concrete enough to argue with.
Open Questions
A few things the spec doesn’t resolve yet, where input would be useful:
Harness resolver ecosystem. The spec works best if harness maintainers ship their own resolvers. That requires community buy-in that isn’t there yet. How do you bootstrap that?
Inter-agent protocol. agent-compose defines topology; it doesn’t yet commit to a wire protocol for agent-to-agent communication. Candidates on the table: A2A (Google’s agent communication protocol), MCP (Anthropic’s tool protocol, which is seeing increasing use for agent-to-agent calls), or plain HTTP with interfaces declared in the compose file. Each has different tradeoffs around standardisation, harness coupling, and implementation complexity.
Testing and simulation. For safety-critical agents — trading bots, autonomous purchasing agents — dry-run capability seems important. How do you test guardrail firing without live tool execution?
Cross-harness observability. When agents on different harnesses participate in a shared workflow, coherent distributed tracing is an open problem. The IDENTITY directive creates a clear seam where the problem needs to be solved; the spec itself doesn’t solve it.
Repo
- MANIFEST.md: full spec, v0.3
- examples/: AgentManifest files for six agent roles
- docs/design-rationale.md: why harness heterogeneity, not portability
- docs/agent-compose.md: topology patterns and multi-agent coordination
- docs/identity.md: identity model, wallet binding, inter-agent trust
If you’ve run agents across multiple roles in production and have thoughts on where this framing holds or breaks down — open an issue. The RFC is designed to be argued with.
AgentManifest was designed in collaboration with a persistent AI agent running on OpenClaw and through extended conversations with Claude AI (claude.ai). The spec, the repo, and this article are the output of that process — an example of the kind of work the system is designed to support.