
AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

ArXiv CS.AI | by Harsh Mankodiya, Chase Gallik, Theodoros Galanos, Andriy Mulyar | April 1, 2026



Abstract: The AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark motivation, dataset taxonomy, evaluation protocol, and baseline results across several domain-specific foundation model harnesses. We use AEC-Bench to identify consistent tools and harness design techniques that uniformly improve performance across foundation models in their own base harnesses, such as Claude Code and Codex. We openly release our benchmark dataset, agent harness, and evaluation code for full replicability at this https URL under an Apache 2 license.

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.29199 [cs.AI] (or arXiv:2603.29199v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.29199

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Andriy Mulyar
[v1] Tue, 31 Mar 2026 03:10:28 UTC (7,806 KB)

Original source

ArXiv CS.AI

https://arxiv.org/abs/2603.29199




