
Show HN: agenteval – static analysis for AI coding instruction files

Hacker News AI Top · by lukasmetzler · April 3, 2026

Article URL: https://github.com/lukasmetzler/agenteval
Comments URL: https://news.ycombinator.com/item?id=47632919
Points: 3 | Comments: 0

Your CLAUDE.md is untested. So is your AGENTS.md, your copilot-instructions.md, and your .cursorrules. agenteval is a linter, benchmarker, and CI gate for AI coding instructions. Stop hoping your instructions work. Measure them.

Get Started in 10 Seconds

curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash

Then lint your instruction files:

agenteval lint

No Bun, no Node, no runtime. The binary is self-contained.

Why This Exists

Your codebase has tests. Your APIs have contracts. Your AI instructions have... hope.

Every team using AI coding tools writes instruction files. You change a paragraph, push it, and cross your fingers. Maybe the agent performs better. Maybe you just broke something. You have no way to know.

agenteval gives you that way. Lint catches problems statically. Harvest builds benchmarks from your git history. Run scores agent performance. Compare tells you if your changes helped. CI gates regressions before they merge.
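The compare-and-gate idea at the end of that pipeline is simple to state: given scores from a baseline run and a candidate run, block the merge if any task got worse. A minimal sketch of that logic (the task names, score values, and dict shape here are hypothetical, not agenteval's actual output format):

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.0):
    """Return tasks whose score dropped by more than `tolerance`.

    Each argument maps task name -> score in [0, 1]. Tasks present only
    in one run are ignored; only shared tasks can regress.
    """
    return {
        task: (baseline[task], candidate[task])
        for task in baseline
        if task in candidate and candidate[task] < baseline[task] - tolerance
    }

# Hypothetical scores: one task held steady, one dropped after an
# instruction-file edit. A CI gate would fail on the non-empty result.
baseline = {"fix-null-check": 0.9, "add-endpoint": 0.8}
candidate = {"fix-null-check": 0.9, "add-endpoint": 0.6}

regressions = find_regressions(baseline, candidate)
assert regressions == {"add-endpoint": (0.8, 0.6)}
```

In CI, a non-empty result would translate to a non-zero exit code, which is what lets `agenteval ci` act as a merge gate.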

```mermaid
flowchart LR
  A["Your CLAUDE.md"] --> B["agenteval lint"]
  B --> C["Fix quality issues"]
  D["Git history"] --> E["agenteval harvest"]
  E --> F["Task YAML files"]
  F --> G["agenteval run"]
  G --> H["Scored results"]
  H --> I["agenteval compare"]
  I --> J{{"Did my instructions improve?"}}

  style A fill:#2d333b,stroke:#444,color:#e6edf3
  style D fill:#2d333b,stroke:#444,color:#e6edf3
  style J fill:#1a7f37,stroke:#2ea043,color:#fff
```

What It Catches

  • Dead references to files that don't exist

  • Filler phrases that waste context tokens ("make sure to", "it is important that")

  • Contradictions ("always use X" and "never use X" in the same file)

  • Content overlap between instruction files

  • Token budget overruns that crowd out code context

  • Vague instructions without specifics ("be careful", "write good code")

  • Broken markdown links and heading anchors

  • Invalid skill metadata (per Anthropic spec)
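Several of these checks are plain static text analysis. A toy sketch of the first two, dead file references and filler phrases, is below; this is illustrative only, not agenteval's implementation, and the phrase list and backtick-reference convention are assumptions:

```python
import re
from pathlib import Path

# Assumed filler phrases, taken from the examples in the list above.
FILLER_PHRASES = ("make sure to", "it is important that")

def lint_instructions(text: str, root: Path) -> list[str]:
    """Toy linter: flag filler phrases and backtick-quoted file
    references that don't exist under `root`."""
    issues = []
    lowered = text.lower()
    for phrase in FILLER_PHRASES:
        if phrase in lowered:
            issues.append(f"filler phrase: '{phrase}'")
    # Treat `path/to/file.ext` spans as file references and check them.
    for ref in re.findall(r"`([\w./-]+\.\w+)`", text):
        if not (root / ref).exists():
            issues.append(f"dead reference: {ref}")
    return issues
```

A real linter would also need the harder checks (contradictions, cross-file overlap, token budgets), but the shape is the same: parse the instruction files, apply rules, report findings.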

Commands

| Command | What it does |
| --- | --- |
| `agenteval lint` | Find quality issues in instruction files (guide) |
| `agenteval lint --explain` | Same, but shows why each rule matters |
| `agenteval harvest` | Build eval tasks from your AI commit history (guide) |
| `agenteval harvest --live` | Score your working-tree changes before committing |
| `agenteval run --task` | Run an AI agent against a task and score the result (guide) |
| `agenteval compare` | Compare two runs side by side (guide) |
| `agenteval ci` | Run all tasks, fail on regressions (guide) |
| `agenteval trends` | Score history and trend analysis (guide) |
| `agenteval init` | Create a starter config (guide) |
| `agenteval update` | Self-update to the latest version |
| `agenteval doctor` | Check environment health |

Supports Every Instruction Format

  • CLAUDE.md (Claude Code)

  • AGENTS.md (OpenAI Codex, generic agents)

  • .github/copilot-instructions.md (GitHub Copilot)

  • .github/instructions/*.instructions.md (scoped Copilot instructions)

  • .claude/skills/*/SKILL.md (Anthropic skills)

  • .cursorrules and .cursor/rules/*.mdc (Cursor)

Try it on the included demo files that cover all formats.
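Discovering these files across a repository amounts to a handful of glob patterns. A sketch, assuming the formats listed above are the complete set (the function and its name are illustrative, not part of agenteval's API):

```python
from pathlib import Path

# Glob patterns mirroring the instruction-file formats listed above.
INSTRUCTION_GLOBS = [
    "CLAUDE.md",
    "AGENTS.md",
    ".github/copilot-instructions.md",
    ".github/instructions/*.instructions.md",
    ".claude/skills/*/SKILL.md",
    ".cursorrules",
    ".cursor/rules/*.mdc",
]

def find_instruction_files(repo_root: Path) -> list[Path]:
    """Collect every instruction file a repo contains, in a stable order."""
    return sorted(
        path
        for pattern in INSTRUCTION_GLOBS
        for path in repo_root.glob(pattern)
    )
```

Any tool in this space does a pass like this first; everything else (linting, overlap detection, token budgeting) operates on the files it returns.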

Documentation

| Guide | What it covers |
| --- | --- |
| Core Concepts | Instructions, tasks, assertions, harnesses, scoring |
| Getting Started | Installation, first run, full walkthrough |
| Linting | All lint rules, output formats, CI integration |
| Running Evals | Task definitions, harness adapters, scoring pipeline |
| Harvesting | AI commit detection, task generation, live review |
| CI Guide | Regression detection, thresholds, GitHub Actions example |
| Configuration | Every config option with types and defaults |

Installation

Quick install (Linux, macOS):

curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash

Download binary from GitHub Releases.

Build from source (requires Bun v1.3+):

git clone https://github.com/lukasmetzler/agenteval.git && cd agenteval && bun install && bun run build

Contributing

See CONTRIBUTING.md.

License

MIT
