Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
arXiv:2603.26233v1 Announce Type: new. Authors: Nicholas Edwards, Sebastian Schuster
Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimized for autonomous execution. In this work, we systematically evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. We propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. Our results demonstrate that this multi-agent system using OpenHands + Claude Sonnet 4.5 achieves a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and closing the performance gap with agents operating on fully specified instructions. Furthermore, we find that the multi-agent system exhibits well-calibrated uncertainty, conserving queries on simple tasks while proactively seeking information on more complex issues. These findings indicate that current models can be turned into proactive collaborators, where agents independently recognize when to ask questions to elicit missing information in real-world, underspecified tasks.
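The decoupling the abstract describes, where one component decides whether to ask and another writes the code, can be illustrated with a minimal sketch. This is not the authors' scaffold: the names `Assessment`, `resolve_task`, `toy_detect`, the `threshold` parameter, and the word-count heuristic are all hypothetical stand-ins, and in a real system `detect` and `execute` would wrap LLM agent calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Assessment:
    """Output of a hypothetical detection agent (not from the paper)."""
    uncertainty: float        # 0.0 = fully specified, 1.0 = very vague
    question: Optional[str]   # clarifying question to ask, if any


def resolve_task(
    issue: str,
    detect: Callable[[str], Assessment],
    ask_user: Callable[[str], str],
    execute: Callable[[str], str],
    threshold: float = 0.5,   # assumed cut-off, chosen only for illustration
) -> str:
    """Run detection first; ask the user only when uncertainty is high."""
    assessment = detect(issue)
    if assessment.question and assessment.uncertainty >= threshold:
        answer = ask_user(assessment.question)
        # Append the answer so the execution agent sees the missing context.
        issue = f"{issue}\n\nClarification: {answer}"
    return execute(issue)


def toy_detect(issue: str) -> Assessment:
    # Toy heuristic standing in for an LLM call: very short issues are vague.
    if len(issue.split()) < 6:
        return Assessment(0.9, "Which function fails, and with what input?")
    return Assessment(0.1, None)


if __name__ == "__main__":
    patch = resolve_task(
        "Fix the bug",
        detect=toy_detect,
        ask_user=lambda q: "parse_date() crashes on empty strings",
        execute=lambda spec: f"PATCH FOR: {spec}",
    )
    print(patch)
```

Keeping `detect` separate from `execute` means the decision to ask can be tuned (here through `threshold`) without touching the code-writing agent, which is one plausible reading of why the decoupled design conserves queries on simple tasks.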
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2603.26233 [cs.CL]
(or arXiv:2603.26233v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.26233
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Nicholas Edwards [v1] Fri, 27 Mar 2026 09:56:26 UTC (158 KB)