ComBench: A Repo-level Real-world Benchmark for Compilation Error Repair

arXiv, March 31, 2026
arXiv:2603.27333v1, Announce Type: cross
Authors: Jia Li, Zeyang Zhuang, Zhuangbin Chen, Yuxin Su, Wei Meng, Michael R. Lyu

Abstract: Compilation errors pose pervasive and critical challenges in software development, significantly hindering productivity. Automated Compilation Error Repair (ACER) techniques have therefore been proposed to mitigate these issues. Despite recent advancements in ACER, its real-world performance remains poorly evaluated. This can be largely attributed to the limitations of existing benchmarks, i.e., decontextualized single-file data, lack of authentic source diversity, and biased local task modeling that ignores crucial repository-level complexities. To bridge this critical gap, we propose ComBench, the first repository-level, reproducible real-world benchmark for C/C++ compilation error repair. ComBench is constructed through a novel, automated framework that systematically mines real-world failures from the GitHub CI histories of large-scale open-source projects. Our framework contributes techniques for the high-precision identification of ground-truth repair patches from complex version histories and a high-fidelity mechanism for reproducing the original, ephemeral build environments. To ensure data quality, all samples in ComBench are execution-verified, guaranteeing reproducible failures and build success with ground-truth patches. Using ComBench, we conduct a comprehensive evaluation of 12 modern LLMs under both direct and agent-based repair settings. Our experiments reveal a significant gap between a model's ability to achieve syntactic correctness (a 73% success rate for GPT-5) and its ability to ensure semantic correctness (only 41% of its patches are valid). We also find that different models exhibit distinct specializations for different error types. ComBench provides a robust and realistic platform to guide the future development of ACER techniques capable of addressing the complexities of modern software development.
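
The execution-verification criterion in the abstract (the failure must reproduce, and the build must succeed once the ground-truth patch is applied) can be illustrated with a minimal sketch. This is not the authors' framework: the `Sample` record, its field names, and the `build` callable are hypothetical stand-ins, and the toy `fake_build` table replaces what would be a real build in the reproduced environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one mined CI failure (field names are assumptions)."""
    sample_id: str
    broken_revision: str   # revision whose CI build failed
    patched_revision: str  # same code with the candidate ground-truth patch applied


def is_execution_verified(sample: Sample, build: Callable[[str], bool]) -> bool:
    """Return True only if the failure reproduces and the patch restores the build.

    `build` is a caller-supplied function that returns True when the given
    revision compiles; how it runs the build is outside this sketch.
    """
    failure_reproduces = not build(sample.broken_revision)
    patch_builds = build(sample.patched_revision)
    return failure_reproduces and patch_builds


def filter_verified(samples: Iterable[Sample], build: Callable[[str], bool]) -> List[Sample]:
    """Keep only the samples that pass both checks."""
    return [s for s in samples if is_execution_verified(s, build)]


# Toy stand-in for a real build: maps revision name -> "build succeeded".
_BUILD_OK = {"a1": False, "a2": True, "b1": True, "b2": True, "c1": False, "c2": False}


def fake_build(revision: str) -> bool:
    return _BUILD_OK[revision]


samples = [
    Sample("A", "a1", "a2"),  # fails before, builds after: kept
    Sample("B", "b1", "b2"),  # failure does not reproduce: dropped
    Sample("C", "c1", "c2"),  # patch does not fix the build: dropped
]
verified_ids = [s.sample_id for s in filter_verified(samples, fake_build)]
print(verified_ids)
```

Passing the build step in as a callable keeps the filter independent of any particular build system, which is a design choice of this sketch rather than something the abstract specifies.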

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27333 [cs.SE]

(or arXiv:2603.27333v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.27333

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jia Li [view email] [v1] Sat, 28 Mar 2026 16:35:34 UTC (281 KB)
