
Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

arXiv, March 30, 2026

arXiv:2603.26233v1. Announce type: new. Authors: Nicholas Edwards, Sebastian Schuster


Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software engineering, they frequently encounter underspecified instructions that lack crucial context. While human developers naturally resolve underspecification by asking clarifying questions, current agents are largely optimized for autonomous execution. In this work, we systematically evaluate the clarification-seeking abilities of LLM agents on an underspecified variant of SWE-bench Verified. We propose an uncertainty-aware multi-agent scaffold that explicitly decouples underspecification detection from code execution. Our results demonstrate that this multi-agent system using OpenHands + Claude Sonnet 4.5 achieves a 69.40% task resolve rate, significantly outperforming a standard single-agent setup (61.20%) and closing the performance gap with agents operating on fully specified instructions. Furthermore, we find that the multi-agent system exhibits well-calibrated uncertainty, conserving queries on simple tasks while proactively seeking information on more complex issues. These findings indicate that current models can be turned into proactive collaborators, where agents independently recognize when to ask questions to elicit missing information in real-world, underspecified tasks.
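The decoupling described in the abstract can be pictured with a minimal Python sketch. This is not the authors' scaffold: the names `detect_underspecification`, `run_scaffold`, and `Decision`, the keyword list, and the 0.5 threshold are all hypothetical, and the keyword heuristic is only a toy stand-in for whatever uncertainty estimate a real detector agent would produce. The sketch shows only the control flow, in which one component decides whether to ask and a separate component executes the task.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    ask: bool                 # True means: query the user before coding
    question: Optional[str]   # the clarifying question, if one is needed


def detect_underspecification(issue: str, threshold: float = 0.5) -> Decision:
    """Hypothetical detector: toy stand-in for an LLM-based uncertainty estimate."""
    # Assumed signals of a well-specified issue (illustrative only).
    signals = ["traceback", "expected", "steps to reproduce", "version"]
    text = issue.lower()
    present = sum(1 for s in signals if s in text)
    uncertainty = 1.0 - present / len(signals)
    if uncertainty > threshold:
        return Decision(True, "What is the expected behavior, and is there an error trace?")
    return Decision(False, None)


def run_scaffold(
    issue: str,
    ask_user: Callable[[str], str],
    execute: Callable[[str], str],
) -> str:
    """Detection runs first; the executor only ever sees the final task context."""
    decision = detect_underspecification(issue)
    context = issue
    if decision.ask and decision.question is not None:
        answer = ask_user(decision.question)
        context = f"{issue}\n\nClarification: {answer}"
    return execute(context)
```

Under these assumptions, a vague issue such as "Fix the bug in the parser." triggers exactly one call to `ask_user`, while an issue that already states the expected behavior, a traceback, reproduction steps, and a version goes straight to `execute` without a query.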

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.26233 [cs.CL] (or arXiv:2603.26233v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26233

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Nicholas Edwards. [v1] Fri, 27 Mar 2026 09:56:26 UTC (158 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26233



