
CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning

arXiv, April 2, 2026. Submitted on 19 Dec 2025 (v1), last revised 1 Apr 2026 (this version, v2).

Authors: Qi Song, Honglin Li, Yingchen Yu, Haoyi Zhou, Lin Yang, Song Bai, Qi She, Zilong Huang, Yunqing Zhao


Abstract: Recent releases such as o3 highlight human-like "thinking with images" reasoning that combines tool use with stepwise verification, yet most open-source approaches still rely on text-only chains, rigid visual schemas, or single-step pipelines, limiting flexibility, interpretability, and transferability on complex tasks. We introduce CodeDance, which explores executable code as a general solver for visual reasoning. Unlike fixed-schema calls (e.g., only predicting bounding-box coordinates), CodeDance defines, composes, and executes code to orchestrate multiple tools, compute intermediate results, and render visual artifacts (e.g., boxes, lines, plots) that support transparent, self-checkable reasoning. To guide this process, we introduce a reward for balanced and adaptive tool calling, which balances exploration with efficiency and mitigates tool overuse. Interestingly, beyond the expected capabilities taught by atomic supervision, we empirically observe novel emergent behaviors during RL training: CodeDance demonstrates novel tool invocations, unseen compositions, and cross-task transfer. These behaviors arise without task-specific fine-tuning, suggesting a general and scalable mechanism for executable visual reasoning. Extensive experiments across reasoning benchmarks (e.g., visual search, math, chart QA) show that CodeDance not only consistently outperforms schema-driven and text-only baselines, but also surpasses closed models such as GPT-4o and larger open-source models.
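To make the contrast with fixed-schema calls concrete, here is a minimal sketch of the kind of small program a model could write and execute instead of emitting a single bounding box. It is an illustration only, not the authors' code: the helper names (area, enclosing_box, pad_and_clip), the candidate boxes, and the 640x480 image size are all hypothetical. It composes several helpers and computes intermediate results that can be checked step by step.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def enclosing_box(boxes: List[Box]) -> Box:
    """Smallest box that contains every input box."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


def pad_and_clip(box: Box, pad: float, width: float, height: float) -> Box:
    """Grow a box by `pad` pixels on each side, clipped to the image bounds."""
    x0, y0, x1, y1 = box
    return (max(0.0, x0 - pad), max(0.0, y0 - pad),
            min(width, x1 + pad), min(height, y1 + pad))


# Assumed inputs: two candidate regions in a 640x480 image (made-up values).
candidates = [(40.0, 60.0, 120.0, 140.0), (100.0, 50.0, 180.0, 130.0)]
image_w, image_h = 640.0, 480.0

# Compose the helpers: merge candidates, pad the region, measure its share.
region = enclosing_box(candidates)
crop = pad_and_clip(region, pad=20.0, width=image_w, height=image_h)
coverage = area(crop) / (image_w * image_h)

print("region:", region)
print("crop:", crop)
print("coverage: %.4f" % coverage)
```

Each intermediate value (the merged region, the padded crop, the coverage fraction) is inspectable, which is the property the abstract describes as transparent, self-checkable reasoning. A real system would also render such boxes onto the image, a step omitted here to keep the sketch dependency-free.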

Comments: CVPR 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.17312 [cs.CV]

(or arXiv:2512.17312v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2512.17312

arXiv-issued DOI via DataCite

Submission history

From: Honglin Li
[v1] Fri, 19 Dec 2025 07:52:23 UTC (33,386 KB)
[v2] Wed, 1 Apr 2026 07:43:19 UTC (27,950 KB)
