CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning
arXiv:2512.17312v2 Announce Type: replace Abstract: Recent releases such as o3 highlight human-like "thinking with images" reasoning that combines tool use with stepwise verification, yet most open-source approaches still rely on text-only chains, rigid visual schemas, or single-step pipelines, limiting flexibility, interpretability, and transferability on complex tasks. We introduce CodeDance, which explores executable code as a general solver for visual reasoning. Unlike fixed-schema calls (e.g., only predicting bounding-box coordinates), CodeDance defines, composes, and executes code to orc — Qi Song, Honglin Li, Yingchen Yu, Haoyi Zhou, Lin Yang, Song Bai, Qi She, Zilong Huang, Yunqing Zhao
View PDF HTML (experimental)
Abstract:Recent releases such as o3 highlight human-like "thinking with images" reasoning that combines tool use with stepwise verification, yet most open-source approaches still rely on text-only chains, rigid visual schemas, or single-step pipelines, limiting flexibility, interpretability, and transferability on complex tasks. We introduce CodeDance, which explores executable code as a general solver for visual reasoning. Unlike fixed-schema calls (e.g., only predicting bounding-box coordinates), CodeDance defines, composes, and executes code to orchestrate multiple tools, compute intermediate results, and render visual artifacts (e.g., boxes, lines, plots) that support transparent, self-checkable reasoning. To guide this process, we introduce a reward for balanced and adaptive tool calling, which balances exploration with efficiency and mitigates tool overuse. Interestingly, beyond the expected capabilities taught by atomic supervision, we empirically observe novel emergent behaviors during RL training: CodeDance demonstrates novel tool invocations, unseen compositions, and cross-task transfer. These behaviors arise without task-specific fine-tuning, suggesting a general and scalable mechanism for executable visual reasoning. Extensive experiments across reasoning benchmarks (e.g., visual search, math, chart QA) show that CodeDance not only consistently outperforms schema-driven and text-only baselines, but also surpasses closed models such as GPT-4o and larger open-source models.
Comments: CVPR 2026. Project page: this https URL
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2512.17312 [cs.CV]
(or arXiv:2512.17312v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2512.17312
arXiv-issued DOI via DataCite
Submission history
From: Honglin Li [view email] [v1] Fri, 19 Dec 2025 07:52:23 UTC (33,386 KB) [v2] Wed, 1 Apr 2026 07:43:19 UTC (27,950 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
Ask HN: Learning resources for building AI agents?
I’ve recently gone through several materials, including Antonio Gulli’s AI Agentic Design Patterns, Sam Bhagwat’s Principles of Building AI Agents and Patterns for Building AI Agents, as well as the courses from LangGraph Academy and some content on DataCamp. This space is evolving very quickly, so I’m curious how others here are approaching learning. What resources, courses, papers, or hands-on approaches have you found most useful while building AI agents? Comments URL: https://news.ycombinator.com/item?id=47637083 Points: 2 # Comments: 3
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Multi-fidelity approaches for general constrained Bayesian optimization with application to aircraft design
Aircraft design relies heavily on solving challenging and computationally expensive Multidisciplinary Design Optimization problems. In this context, there has been growing interest in multi-fidelity models for Bayesian optimization to improve the MDO process by balancing computational cost and accuracy through the combination of high- and low-fidelity simulation models, enabling efficient exploration of the design process at a minimal computational effort. In the existing literature, fidelity selection focuses only on the objective function to decide how to integrate multiple fidelity levels, — Oihan Cordelier, Youssef Diouane, Nathalie Bartoli





Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!