
CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning

[Submitted on 19 Dec 2025 (v1), last revised 1 Apr 2026 (this version, v2)]

Authors: Qi Song, Honglin Li, Yingchen Yu, Haoyi Zhou, Lin Yang, Song Bai, Qi She, Zilong Huang, Yunqing Zhao


Abstract: Recent releases such as o3 highlight human-like "thinking with images" reasoning that combines tool use with stepwise verification, yet most open-source approaches still rely on text-only chains, rigid visual schemas, or single-step pipelines, limiting flexibility, interpretability, and transferability on complex tasks. We introduce CodeDance, which explores executable code as a general solver for visual reasoning. Unlike fixed-schema calls (e.g., only predicting bounding-box coordinates), CodeDance defines, composes, and executes code to orchestrate multiple tools, compute intermediate results, and render visual artifacts (e.g., boxes, lines, plots) that support transparent, self-checkable reasoning. To guide this process, we introduce a reward for balanced and adaptive tool calling, which balances exploration with efficiency and mitigates tool overuse. Interestingly, beyond the expected capabilities taught by atomic supervision, we empirically observe novel emergent behaviors during RL training: CodeDance demonstrates novel tool invocations, unseen compositions, and cross-task transfer. These behaviors arise without task-specific fine-tuning, suggesting a general and scalable mechanism for executable visual reasoning. Extensive experiments across reasoning benchmarks (e.g., visual search, math, chart QA) show that CodeDance not only consistently outperforms schema-driven and text-only baselines, but also surpasses closed models such as GPT-4o and larger open-source models.
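The contrast the abstract draws between a fixed-schema call and executable code can be illustrated with a toy sketch. This is not CodeDance's implementation: the tool names (`crop`, `mean_intensity`, `draw_box`), the `run_generated_program` helper, and the sample program are all hypothetical, and the "image" is just a small grid of grayscale values. The sketch only shows the general idea of a model-written program composing several tools, computing an intermediate result, and rendering a visual artifact that can be checked afterwards.

```python
from typing import Any, Dict, List

Image = List[List[int]]  # grayscale pixels, row-major (toy stand-in for a real image)


def crop(img: Image, x0: int, y0: int, x1: int, y1: int) -> Image:
    """Return the sub-image covering columns [x0, x1) and rows [y0, y1)."""
    return [row[x0:x1] for row in img[y0:y1]]


def mean_intensity(img: Image) -> float:
    """Average pixel value of a non-empty image."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def draw_box(img: Image, x0: int, y0: int, x1: int, y1: int, value: int = 255) -> Image:
    """Return a copy of img with the outline of the box [x0, x1) x [y0, y1) drawn."""
    out = [row[:] for row in img]
    for x in range(x0, x1):
        out[y0][x] = value
        out[y1 - 1][x] = value
    for y in range(y0, y1):
        out[y][x0] = value
        out[y][x1 - 1] = value
    return out


# Hypothetical tool registry exposed to the generated program.
TOOLS: Dict[str, Any] = {
    "crop": crop,
    "mean_intensity": mean_intensity,
    "draw_box": draw_box,
}


def run_generated_program(source: str, img: Image) -> Dict[str, Any]:
    """Execute model-written code against the tools and collect its outputs.

    Assumption: the program stores its result in `answer` and any rendered
    image in `artifact`. A real system would run this in a sandbox; plain
    exec() is used here only to keep the sketch short.
    """
    namespace: Dict[str, Any] = dict(TOOLS)
    namespace["image"] = img
    exec(source, namespace)
    return {"answer": namespace.get("answer"), "artifact": namespace.get("artifact")}


# A program of the kind a model might emit: compare two halves, then mark the winner.
PROGRAM = """
left = crop(image, 0, 0, 3, 4)
right = crop(image, 3, 0, 6, 4)
brighter_is_left = mean_intensity(left) > mean_intensity(right)
box = (0, 0, 3, 4) if brighter_is_left else (3, 0, 6, 4)
artifact = draw_box(image, *box)
answer = "left" if brighter_is_left else "right"
"""

IMAGE: Image = [[10, 10, 10, 200, 200, 200] for _ in range(4)]

result = run_generated_program(PROGRAM, IMAGE)
print(result["answer"])
for row in result["artifact"]:
    print(row)
```

A fixed-schema interface would stop at emitting the box coordinates. Here the program also records how the answer was reached (the intensity comparison) and returns an annotated image, which is the sort of self-checkable intermediate output the abstract describes.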

Comments: CVPR 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.17312 [cs.CV]

(or arXiv:2512.17312v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2512.17312

arXiv-issued DOI via DataCite

Submission history

From: Honglin Li

[v1] Fri, 19 Dec 2025 07:52:23 UTC (33,386 KB)

[v2] Wed, 1 Apr 2026 07:43:19 UTC (27,950 KB)

Original source

arXiv

https://arxiv.org/abs/2512.17312
