Show HN: agenteval – static analysis for AI coding instruction files
Article URL: https://github.com/lukasmetzler/agenteval Comments URL: https://news.ycombinator.com/item?id=47632919 Points: 3 # Comments: 0
Your CLAUDE.md is untested. So are your AGENTS.md, your copilot-instructions.md, and your .cursorrules. agenteval is a linter, benchmarker, and CI gate for AI coding instructions. Stop hoping your instructions work. Measure them.
## Get Started in 10 Seconds

```shell
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash
```

Then lint your instruction files:

```shell
agenteval lint
```

No Bun, no Node, no runtime. The binary is self-contained.
## Why This Exists
Your codebase has tests. Your APIs have contracts. Your AI instructions have... hope.
Every team using AI coding tools writes instruction files. You change a paragraph, push it, and cross your fingers. Maybe the agent performs better. Maybe you just broke something. You have no way to know.
agenteval gives you that way. `lint` catches problems statically. `harvest` builds benchmarks from your git history. `run` scores agent performance. `compare` tells you whether your changes helped. `ci` gates regressions before they merge.
```mermaid
flowchart LR
    A["Your CLAUDE.md"] --> B["agenteval lint"]
    B --> C["Fix quality issues"]
    D["Git history"] --> E["agenteval harvest"]
    E --> F["Task YAML files"]
    F --> G["agenteval run"]
    G --> H["Scored results"]
    H --> I["agenteval compare"]
    I --> J{{"Did my instructions improve?"}}
    style A fill:#2d333b,stroke:#444,color:#e6edf3
    style D fill:#2d333b,stroke:#444,color:#e6edf3
    style J fill:#1a7f37,stroke:#2ea043,color:#fff
```
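The left-to-right loop in the diagram can be sketched as a plain shell script. This is a sketch, not the project's official workflow: it uses only subcommands listed in the Commands table, and it deliberately degrades to a message when `agenteval` is not on `PATH`.

```shell
# Hedged sketch of the lint -> harvest -> gate loop, using only
# documented agenteval subcommands. If the binary is missing, we
# print a pointer instead of failing.
if command -v agenteval >/dev/null 2>&1; then
  agenteval lint --explain   # static checks, with rule rationales
  agenteval harvest          # mine AI commits into task YAML files
  agenteval ci               # run every task, fail on regressions
else
  echo "agenteval not installed; see the Installation section"
fi
```

Because each step reads the previous step's output from the repo itself, the same three lines work unchanged as a local pre-push check or inside a CI job.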
## What It Catches

- Dead references to files that don't exist
- Filler phrases that waste context tokens ("make sure to", "it is important that")
- Contradictions ("always use X" and "never use X" in the same file)
- Content overlap between instruction files
- Token budget overruns that crowd out code context
- Vague instructions without specifics ("be careful", "write good code")
- Broken markdown links and heading anchors
- Invalid skill metadata (per the Anthropic spec)
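Several of these categories can be provoked in a single deliberately broken file. The file name and contents below are made up for illustration, the referenced `docs/style-guide.md` intentionally does not exist, and the final `grep` is only a crude stand-in for the linter's real filler-phrase rule, not its implementation.

```shell
# A deliberately flawed instructions file (contents are illustrative):
cat > CLAUDE.md <<'EOF'
# Project instructions

Make sure to read docs/style-guide.md before editing.
It is important that you write good code.
Always use tabs for indentation.
Never use tabs for indentation.
EOF

# agenteval lint would flag the dead reference, the filler phrases,
# the vague "write good code", and the always/never contradiction.
# As a crude stand-in, count the filler-phrase lines ourselves:
grep -icE "make sure to|it is important" CLAUDE.md   # prints 2
```

Running `agenteval lint` in the same directory should surface all four problem classes at once.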
## Commands

| Command | What it does |
| --- | --- |
| `agenteval lint` | Find quality issues in instruction files |
| `agenteval lint --explain` | Same, but shows why each rule matters |
| `agenteval harvest` | Build eval tasks from your AI commit history |
| `agenteval harvest --live` | Score your working-tree changes before committing |
| `agenteval run --task` | Run an AI agent against a task and score the result |
| `agenteval compare` | Compare two runs side by side |
| `agenteval ci` | Run all tasks, fail on regressions |
| `agenteval trends` | Score history and trend analysis |
| `agenteval init` | Create a starter config |
| `agenteval update` | Self-update to the latest version |
| `agenteval doctor` | Check environment health |
## Supports Every Instruction Format

- `CLAUDE.md` (Claude Code)
- `AGENTS.md` (OpenAI Codex, generic agents)
- `.github/copilot-instructions.md` (GitHub Copilot)
- `.github/instructions/*.instructions.md` (scoped Copilot instructions)
- `.claude/skills/*/SKILL.md` (Anthropic skills)
- `.cursorrules` and `.cursor/rules/*.mdc` (Cursor)
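A quick way to see which of these files a repo actually contains is a plain `find` over the names above. The scratch tree below is fabricated just to have something to match; it is not how agenteval discovers files internally.

```shell
# Fabricated scratch tree containing a few of the supported file names:
mkdir -p scratch/.github scratch/.claude/skills/example
touch scratch/CLAUDE.md scratch/AGENTS.md scratch/.cursorrules
touch scratch/.github/copilot-instructions.md
touch scratch/.claude/skills/example/SKILL.md

# Locate them the way a linter's file-discovery pass might:
find scratch \( -name CLAUDE.md -o -name AGENTS.md -o -name .cursorrules \
             -o -name copilot-instructions.md -o -name SKILL.md \) | sort
```

All five paths come back, including the nested Copilot and skill files, which is why lint rules like cross-file overlap need to look beyond the repo root.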
Try it on the included demo files that cover all formats.
## Documentation

| Guide | What it covers |
| --- | --- |
| Core Concepts | Instructions, tasks, assertions, harnesses, scoring |
| Getting Started | Installation, first run, full walkthrough |
| Linting | All lint rules, output formats, CI integration |
| Running Evals | Task definitions, harness adapters, scoring pipeline |
| Harvesting | AI commit detection, task generation, live review |
| CI Guide | Regression detection, thresholds, GitHub Actions example |
| Configuration | Every config option with types and defaults |
## Installation

Quick install (Linux, macOS):

```shell
curl -fsSL https://raw.githubusercontent.com/lukasmetzler/agenteval/main/install.sh | bash
```

Or download a prebuilt binary from GitHub Releases.

Build from source (requires Bun v1.3+):

```shell
git clone https://github.com/lukasmetzler/agenteval.git
cd agenteval
bun install && bun run build
```
## Contributing

See CONTRIBUTING.md.

## License

MIT