Research Papers research paper arxiv ai artificial-intelligence

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

arXivMarch 31, 202610 min read0 views

arXiv:2603.28387v1 Announce Type: new Abstract: Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, \textsc{FOR2107} (affective disorders) and \textsc{OASIS-3} (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58\% F1 upon introduction of neuroimaging context, w — Doan Nam Long Vu, Simone Balloccu

View PDF HTML (experimental)

Abstract:Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, \textsc{FOR2107} (affective disorders) and \textsc{OASIS-3} (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely \emph{mentioning} MRI availability in the task prompt accounts for 70-80% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the \emph{scaffold effect}. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, while eliminating MRI-referencing behavior, collapses both conditions toward random baseline. Our findings demonstrate that surface evaluations are inadequate indicators of multimodal reasoning, with direct implications for the deployment of VLMs in clinical settings.

Subjects:

Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2603.28387 [cs.AI]

(or arXiv:2603.28387v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.28387

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Doan Nam Long Vu [view email] [v1] Mon, 30 Mar 2026 12:58:10 UTC (2,077 KB)

Original source

arXiv

https://arxiv.org/abs/2603.28387

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ReleasesFresh

Reachability-Aware Time Scaling for Path Tracking

arXiv:2604.00439v1 Announce Type: new Abstract: This paper studies tracking of collision-free waypoint paths produced by an offline planner for a planar double-integrator system with bounded speed and acceleration. Because sampling-based planners must route around obstacles, the resulting waypoint paths can contain sharp turns and high-curvature regions, so one-step reachability under acceleration limits becomes critical even when the path geometry is collision-free. We build on a pure-pursuit-style, reachability-guided quadratic-program (QP) tracker with a one-step acceleration margin. Offline, we evaluate this margin along a spline fitted to the waypoint path and update a scalar speed-scaling profile so that the required one-step acceleration remains below the available bound. Online, th

arXiv cs.RO

1mabout 5 hours ago

Research PapersFresh

Sampling-based Task and Kinodynamic Motion Planning under Semantic Uncertainty

arXiv:2604.00401v1 Announce Type: new Abstract: This paper tackles the problem of integrated task and kinodynamic motion planning in uncertain environments. We consider a robot with nonlinear dynamics tasked with a Linear Temporal Logic over finite traces ($\ltlf$) specification operating in a partially observable environment. Specifically, the uncertainty is in the semantic labels of the environment. We show how the problem can be modeled as a Partially Observable Stochastic Hybrid System that captures the robot dynamics, $\ltlf$ task, and uncertainty in the environment state variables. We propose an anytime algorithm that takes advantage of the structure of the hybrid system, and combines the effectiveness of decision-making techniques and sampling-based motion planning. We prove the sou

arXiv cs.RO

1mabout 5 hours ago

ModelsFresh

Behavioral Score Diffusion: Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data

arXiv:2604.00391v1 Announce Type: new Abstract: Diffusion-based trajectory optimization has emerged as a powerful planning paradigm, but existing methods require either learned score networks trained on large datasets or analytical dynamics models for score computation. We introduce \emph{Behavioral Score Diffusion} (BSD), a training-free and model-free trajectory planner that computes the diffusion score function directly from a library of trajectory data via kernel-weighted estimation. At each denoising step, BSD retrieves relevant trajectories using a triple-kernel weighting scheme -- diffusion proximity, state context, and goal relevance -- and computes a Nadaraya-Watson estimate of the denoised trajectory. The diffusion noise schedule naturally controls kernel bandwidths, creating a m

arXiv cs.RO

2mabout 5 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 166 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersFresh

Sampling-based Task and Kinodynamic Motion Planning under Semantic Uncertainty

arXiv cs.RO

1mabout 5 hours ago

Research PapersFresh

Real Time Local Wind Inference for Robust Autonomous Navigation

arXiv:2604.00343v1 Announce Type: new Abstract: This thesis presents a solution that enables aerial robots to reason about surrounding wind flow fields in real time using on board sensors and embedded flight hardware. The core novelty of this research is the fusion of range measurements with sparse in situ wind measurements to predict surrounding flow fields. We aim to address two fundamental questions: first, the sufficiency of topographical data for accurate wind prediction in dense urban environments; and second, the utility of learned wind models for motion planning with an emphasis on energy efficiency and obstacle avoidance. Drawing on tools from deep learning, fluid mechanics, and optimal control, we establish a framework for local wind prediction using navigational LiDAR, and then

arXiv cs.RO

2mabout 5 hours ago

Research PapersFresh

Play-Testing REMind: Evaluating an Educational Robot-Mediated Role-Play Game

arXiv:2604.00300v1 Announce Type: new Abstract: This paper presents REMind, an innovative educational robot-mediated role-play game designed to support anti-bullying bystander intervention among children. REMind invites players to observe a bullying scenario enacted by social robots, reflect on the perspectives of the characters, and rehearse defending strategies by puppeteering a robotic avatar. We evaluated REMind through a mixed-methods play-testing study with 18 children aged 9--10. The findings suggest that the experience supported key learning goals related to self-efficacy, perspective-taking, understanding outcomes of defending, and intervention strategies. These results highlight the promise of Robot-Mediated Applied Drama (RMAD) as a novel pedagogical framework to support Social-

arXiv cs.RO

1mabout 5 hours ago

Research PapersFresh

Long-Horizon Geometry-Aware Navigation among Polytopes via MILP-MPC and Minkowski-Based CBFs

arXiv:2604.00162v1 Announce Type: new Abstract: Autonomous navigation in complex, non-convex environments remains challenging when robot dynamics, control limits, and exact robot geometry must all be taken into account. In this paper, we propose a hierarchical planning and control framework that bridges long-horizon guidance and geometry-aware safety guarantees for a polytopic robot navigating among polytopic obstacles. At the high level, Mixed-Integer Linear Programming (MILP) is embedded within a Model Predictive Control (MPC) framework to generate a nominal trajectory around polytopic obstacles while modeling the robot as a point mass for computational tractability. At the low level, we employ a control barrier function (CBF) based on the exact signed distance in the Minkowski-differenc

arXiv cs.RO

1mabout 5 hours ago