Show HN: agenteval – static analysis for AI coding instruction files
Article URL: https://github.com/lukasmetzler/agenteval Comments URL: https://news.ycombinator.com/item?id=47632919 Points: 3 # Comments: 0
Your CLAUDE.md is untested. So are your AGENTS.md, your copilot-instructions.md, and your .cursorrules. agenteval is a linter, benchmarker, and CI gate for AI coding instructions. Stop hoping your instructions work. Measure them.
## Get Started in 10 Seconds

```shell
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash
```

Then lint your instruction files:

```shell
agenteval lint
```

No Bun, no Node, no runtime. The binary is self-contained.
## Why This Exists
Your codebase has tests. Your APIs have contracts. Your AI instructions have... hope.
Every team using AI coding tools writes instruction files. You change a paragraph, push it, and cross your fingers. Maybe the agent performs better. Maybe you just broke something. You have no way to know.
agenteval gives you that way. `lint` catches problems statically. `harvest` builds benchmarks from your git history. `run` scores agent performance. `compare` tells you whether your changes helped. `ci` gates regressions before they merge.
```mermaid
flowchart LR
    A["Your CLAUDE.md"] --> B["agenteval lint"]
    B --> C["Fix quality issues"]
    D["Git history"] --> E["agenteval harvest"]
    E --> F["Task YAML files"]
    F --> G["agenteval run"]
    G --> H["Scored results"]
    H --> I["agenteval compare"]
    I --> J{{"Did my instructions improve?"}}
    style A fill:#2d333b,stroke:#444,color:#e6edf3
    style D fill:#2d333b,stroke:#444,color:#e6edf3
    style J fill:#1a7f37,stroke:#2ea043,color:#fff
```
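The left-to-right loop in the diagram can be sketched as a plain shell script. This is a sketch, not the project's official workflow: it uses only subcommands listed in the Commands table, and it deliberately degrades to a message when `agenteval` is not on `PATH`.

```shell
# Hedged sketch of the lint -> harvest -> gate loop, using only
# documented agenteval subcommands. If the binary is missing, we
# print a pointer instead of failing.
if command -v agenteval >/dev/null 2>&1; then
  agenteval lint --explain   # static checks, with rule rationales
  agenteval harvest          # mine AI commits into task YAML files
  agenteval ci               # run every task, fail on regressions
else
  echo "agenteval not installed; see the Installation section"
fi
```

Because each step reads the previous step's output from the repo itself, the same three lines work unchanged as a local pre-push check or inside a CI job.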
## What It Catches

- Dead references to files that don't exist
- Filler phrases that waste context tokens ("make sure to", "it is important that")
- Contradictions ("always use X" and "never use X" in the same file)
- Content overlap between instruction files
- Token budget overruns that crowd out code context
- Vague instructions without specifics ("be careful", "write good code")
- Broken markdown links and heading anchors
- Invalid skill metadata (per the Anthropic spec)
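Several of these categories can be provoked in a single deliberately broken file. The file name and contents below are made up for illustration, the referenced `docs/style-guide.md` intentionally does not exist, and the final `grep` is only a crude stand-in for the linter's real filler-phrase rule, not its implementation.

```shell
# A deliberately flawed instructions file (contents are illustrative):
cat > CLAUDE.md <<'EOF'
# Project instructions

Make sure to read docs/style-guide.md before editing.
It is important that you write good code.
Always use tabs for indentation.
Never use tabs for indentation.
EOF

# agenteval lint would flag the dead reference, the filler phrases,
# the vague "write good code", and the always/never contradiction.
# As a crude stand-in, count the filler-phrase lines ourselves:
grep -icE "make sure to|it is important" CLAUDE.md   # prints 2
```

Running `agenteval lint` in the same directory should surface all four problem classes at once.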
## Commands

| Command | What it does |
| --- | --- |
| `agenteval lint` | Find quality issues in instruction files |
| `agenteval lint --explain` | Same, but shows why each rule matters |
| `agenteval harvest` | Build eval tasks from your AI commit history |
| `agenteval harvest --live` | Score your working-tree changes before committing |
| `agenteval run --task` | Run an AI agent against a task and score the result |
| `agenteval compare` | Compare two runs side by side |
| `agenteval ci` | Run all tasks, fail on regressions |
| `agenteval trends` | Score history and trend analysis |
| `agenteval init` | Create a starter config |
| `agenteval update` | Self-update to the latest version |
| `agenteval doctor` | Check environment health |
## Supports Every Instruction Format

- `CLAUDE.md` (Claude Code)
- `AGENTS.md` (OpenAI Codex, generic agents)
- `.github/copilot-instructions.md` (GitHub Copilot)
- `.github/instructions/*.instructions.md` (scoped Copilot instructions)
- `.claude/skills/*/SKILL.md` (Anthropic skills)
- `.cursorrules` and `.cursor/rules/*.mdc` (Cursor)
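A quick way to see which of these files a repo actually contains is a plain `find` over the names above. The scratch tree below is fabricated just to have something to match; it is not how agenteval discovers files internally.

```shell
# Fabricated scratch tree containing a few of the supported file names:
mkdir -p scratch/.github scratch/.claude/skills/example
touch scratch/CLAUDE.md scratch/AGENTS.md scratch/.cursorrules
touch scratch/.github/copilot-instructions.md
touch scratch/.claude/skills/example/SKILL.md

# Locate them the way a linter's file-discovery pass might:
find scratch \( -name CLAUDE.md -o -name AGENTS.md -o -name .cursorrules \
             -o -name copilot-instructions.md -o -name SKILL.md \) | sort
```

All five paths come back, including the nested Copilot and skill files, which is why lint rules like cross-file overlap need to look beyond the repo root.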
Try it on the included demo files that cover all formats.
## Documentation

| Guide | What it covers |
| --- | --- |
| Core Concepts | Instructions, tasks, assertions, harnesses, scoring |
| Getting Started | Installation, first run, full walkthrough |
| Linting | All lint rules, output formats, CI integration |
| Running Evals | Task definitions, harness adapters, scoring pipeline |
| Harvesting | AI commit detection, task generation, live review |
| CI Guide | Regression detection, thresholds, GitHub Actions example |
| Configuration | Every config option with types and defaults |
## Installation

Quick install (Linux, macOS):

```shell
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash
```

Or download a prebuilt binary from GitHub Releases.

Build from source (requires Bun v1.3+):

```shell
git clone https://github.com/lukasmetzler/agenteval.git
cd agenteval
bun install && bun run build
```
## Contributing

See CONTRIBUTING.md.

## License

MIT