
AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

ArXiv CS.AI
by Harsh Mankodiya, Chase Gallik, Theodoros Galanos, Andriy Mulyar
April 1, 2026

arXiv:2603.29199v1 Announce Type: new

Abstract: AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark motivation, dataset taxonomy, evaluation protocol, and baseline results across several domain-specific foundation model harnesses. We use AEC-Bench to identify consistent tools and harness design techniques that uniformly improve performance across foundation models in their own base harnesses, such as Claude Code and Codex. We openly release our benchmark dataset, agent harness, and evaluation code for full replicability at this https URL under an Apache 2 license.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29199 [cs.AI]

(or arXiv:2603.29199v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.29199

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Andriy Mulyar
[v1] Tue, 31 Mar 2026 03:10:28 UTC (7,806 KB)
