Research Papers research paper arxiv ai artificial-intelligence

ComBench: A Repo-level Real-world Benchmark for Compilation Error Repair

arXivMarch 31, 202610 min read0 views

arXiv:2603.27333v1 Announce Type: cross Abstract: Compilation errors pose pervasive and critical challenges in software development, significantly hindering productivity. Therefore, Automated Compilation Error Repair (ACER) techniques are proposed to mitigate these issues. Despite recent advancements in ACER, its real-world performance remains poorly evaluated. This can be largely attributed to the limitations of existing benchmarks, \ie decontextualized single-file data, lack of authentic source diversity, and biased local task modeling that ignores crucial repository-level complexities. To b — Jia Li, Zeyang Zhuang, Zhuangbin Chen, Yuxin Su, Wei Meng, Michael R. Lyu

View PDF HTML (experimental)

Abstract:Compilation errors pose pervasive and critical challenges in software development, significantly hindering productivity. Therefore, Automated Compilation Error Repair (ACER) techniques are proposed to mitigate these issues. Despite recent advancements in ACER, its real-world performance remains poorly evaluated. This can be largely attributed to the limitations of existing benchmarks, \ie decontextualized single-file data, lack of authentic source diversity, and biased local task modeling that ignores crucial repository-level complexities. To bridge this critical gap, we propose ComBench, the first repository-level, reproducible real-world benchmark for C/C++ compilation error repair. ComBench is constructed through a novel, automated framework that systematically mines real-world failures from the GitHub CI histories of large-scale open-source projects. Our framework contributes techniques for the high-precision identification of ground-truth repair patches from complex version histories and a high-fidelity mechanism for reproducing the original, ephemeral build environments. To ensure data quality, all samples in ComBench are execution-verified -- guaranteeing reproducible failures and build success with ground-truth patches. Using ComBench, we conduct a comprehensive evaluation of 12 modern LLMs under both direct and agent-based repair settings. Our experiments reveal a significant gap between a model's ability to achieve syntactic correctness (a 73% success rate for GPT-5) and its ability to ensure semantic correctness (only 41% of its patches are valid). We also find that different models exhibit distinct specializations for different error types. ComBench provides a robust and realistic platform to guide the future development of ACER techniques capable of addressing the complexities of modern software development.

Subjects:

Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27333 [cs.SE]

(or arXiv:2603.27333v1 [cs.SE] for this version)

https://doi.org/10.48550/arXiv.2603.27333

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jia Li [view email] [v1] Sat, 28 Mar 2026 16:35:34 UTC (281 KB)

Original source

arXiv

https://arxiv.org/abs/2603.27333

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Countries

European Union Outlines Strategies to Boost AI Adoption, Research - WSJ

European Union Outlines Strategies to Boost AI Adoption, Research WSJ

GNews AI EU

1m6 months ago

Countries

European Union Outlines Strategies to Boost AI Adoption, Research - WSJ

European Union Outlines Strategies to Boost AI Adoption, Research WSJ

GNews AI EU

1m6 months ago

Releases

Obsolescence without hostility: optimization, uniformity, and the erosion of human meaning in a post-AI world

Most contemporary discussions of artificial intelligence focus on misalignment, loss of control, or catastrophic harm. This paper examines a different and comparatively neglected possibility: that advanced AI may erode the social conditions under which human meaning has historically been generated, without conflict, coercion, or displacement. The central question is not whether AI dominates humanity, but whether human participation remains causally significant once AI systems outperform humans across core instrumental domains. The argument is conditional and long-horizon in scope. It proceeds from the observation that existing limits on AI superiority are primarily technological and economic rather than principled. If these constraints are progressively overcome, and AI systems come to out

AI & Society Journal

7m3 days ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 148 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research Papers

Neo-Nazi Exploitation Online: AI Voice-Cloning and the Revival of Hitler Speeches - gnet-research.org

Neo-Nazi Exploitation Online: AI Voice-Cloning and the Revival of Hitler Speeches gnet-research.org

GNews AI voice

1m4 months ago

Research PapersFresh

Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

arXiv:2604.01756v1 Announce Type: new Abstract: Realistic lip synchronization is essential for the natural human-robot non-verbal interaction of humanoid robots. Motivated by this need, this paper presents a lip motion generation framework based on 3D dynamic viseme and coarticulation modeling. By analyzing Chinese pronunciation theory, a 3D dynamic viseme library is constructed based on the ARKit standard, which offers coherent prior trajectories of lips. To resolve motion conflicts within continuous speech streams, a coarticulation mechanism is developed by incorporating initial-final (Shengmu-Yunmu) decoupling and energy modulation. After developing a strategy to retarget high-dimensional spatial lip motion to a 14-DOF lip actuation system of a humanoid head platform, the efficiency and

arXiv cs.RO

2mabout 7 hours ago

Research PapersFresh

3-D Relative Localization for Multi-Robot Systems with Angle and Self-Displacement Measurements

arXiv:2604.01703v1 Announce Type: new Abstract: Realizing relative localization by leveraging inter-robot local measurements is a challenging problem, especially in the presence of measurement noise. Motivated by this challenge, in this paper we propose a novel and systematic 3-D relative localization framework based on inter-robot interior angle and self-displacement measurements. Initially, we propose a linear relative localization theory comprising a distributed linear relative localization algorithm and sufficient conditions for localizability. According to this theory, robots can determine their neighbors' relative positions and orientations in a purely linear manner. Subsequently, in order to deal with measurement noise, we present an advanced Maximum a Posterior (MAP) estimator by a

arXiv cs.RO

2mabout 7 hours ago

Research PapersFresh

Coupler Position Optimization and Channel Estimation for Flexible Coupler Antenna Aided Multiuser Communication

arXiv:2602.11319v2 Announce Type: replace-cross Abstract: In this paper, we propose a distributed flexible coupler antenna (FCA) array to enhance communication performance with low hardware cost. At each FCA, there is one fixed-position active antenna and multiple passive couplers that can move within a designated region around the active antenna. Moreover, each FCA is equipped with a local processing unit (LPU). All LPUs exchange signals with a central processing unit (CPU) for joint signal processing. We study an FCA-aided multiuser multiple-input multiple-output (MIMO) system, where an FCA array base station (BS) is deployed to enhance the downlink communication between the BS and multiple single-antenna users. We formulate optimization problems to maximize the achievable sum rate of us

arXiv eess.SP

2mabout 7 hours ago