MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies
MEDOPENCLAW and MEDFLOWBENCH enable evaluation of vision-language models in medical imaging by allowing dynamic interaction with 3D medical volumes through standard viewers, revealing limitations in spatial reasoning when accessing professional tools.
Published on Mar 25 · Submitted by liu on Mar 30
Abstract
Current evaluations of vision-language models (VLMs) on medical imaging tasks oversimplify clinical reality by relying on pre-selected 2D images that require significant manual curation. This setup misses the core challenge of real-world diagnostics: a true clinical agent must actively navigate full 3D volumes across multiple sequences or modalities to gather evidence and ultimately support a final decision. To address this, we propose MEDOPENCLAW, an auditable runtime designed to let VLMs operate dynamically within standard medical tools or viewers (e.g., 3D Slicer). On top of this runtime, we introduce MEDFLOWBENCH, a full-study medical imaging benchmark covering multi-sequence brain MRI and lung CT/PET. It systematically evaluates medical agentic capabilities across viewer-only, tool-use, and open-method tracks. Initial results reveal a critical insight: while state-of-the-art LLMs/VLMs (e.g., Gemini 3.1 Pro and GPT-5.4) can successfully navigate the viewer to solve basic study-level tasks, their performance paradoxically degrades when they are given access to professional support tools, because they lack precise spatial grounding. By bridging the gap between static-image perception and interactive clinical workflows, MEDOPENCLAW and MEDFLOWBENCH establish a reproducible foundation for developing auditable, full-study medical imaging agents.
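To make the idea of an auditable, viewer-driven agent more concrete, the following is a minimal sketch of a session that records every navigation step an agent takes through a multi-sequence study. It is not the MEDOPENCLAW API: the class `AuditedVolumeViewer`, its methods, the log format, and the toy slice labels are all hypothetical names chosen for illustration, and the volumes are plain Python lists rather than real image data.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AuditedVolumeViewer:
    """Hypothetical stand-in for a viewer session (not the MEDOPENCLAW API)."""

    # Maps a sequence name (e.g. "T1") to its ordered list of slices.
    sequences: dict[str, list[Any]]
    active: str
    index: int = 0
    log: list[dict[str, Any]] = field(default_factory=list)

    def _record(self, action: str, **args: Any) -> None:
        # Append one audit entry holding the action and the resulting state.
        self.log.append({
            "step": len(self.log),
            "action": action,
            "args": args,
            "state": {"sequence": self.active, "slice": self.index},
        })

    def select_sequence(self, name: str) -> None:
        if name not in self.sequences:
            raise KeyError(name)
        self.active = name
        # Clamp the slice index in case the new sequence has fewer slices.
        self.index = min(self.index, len(self.sequences[name]) - 1)
        self._record("select_sequence", name=name)

    def scroll(self, delta: int) -> None:
        last = len(self.sequences[self.active]) - 1
        self.index = max(0, min(last, self.index + delta))
        self._record("scroll", delta=delta)

    def current_slice(self) -> Any:
        self._record("read_slice")
        return self.sequences[self.active][self.index]


# Toy study: string labels stand in for 2D slices (assumed data).
viewer = AuditedVolumeViewer(
    sequences={"T1": ["t1_0", "t1_1", "t1_2"], "FLAIR": ["fl_0", "fl_1"]},
    active="T1",
)
viewer.scroll(2)
viewer.select_sequence("FLAIR")
evidence = viewer.current_slice()
actions = [entry["action"] for entry in viewer.log]
```

In this sketch the log, rather than only the final answer, is what a reviewer would inspect: each entry ties an action to the sequence and slice that were on screen when it completed.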
arXiv: arxiv.org/abs/2603.24649