TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba
arXiv:2603.27314v1 (Announce Type: new)
Authors: Ziyue Yang, Kaixing Yang, Xulong Tang
Abstract: Music-to-dance generation has broad applications in virtual reality, dance education, and digital character animation. However, the limited coverage of existing 3D dance datasets confines current models to a narrow subset of music styles and choreographic patterns, resulting in poor generalization to real-world music. Consequently, generated dances often become overly simplistic and repetitive, substantially degrading expressiveness and realism. To tackle this problem, we present TokenDance, a two-stage music-to-dance generation framework that explicitly addresses this limitation through dual-modality tokenization and efficient token-level generation. In the first stage, we discretize both dance and music using Finite Scalar Quantization: dance motions are factorized into upper-body and lower-body components with kinematic-dynamic constraints, and music is decomposed into semantic and acoustic features with dedicated codebooks to capture choreography-specific structures. In the second stage, we introduce a Local-Global-Local token-to-token generator built on a Bidirectional Mamba backbone, enabling coherent motion synthesis, strong music-dance alignment, and efficient non-autoregressive inference. Extensive experiments demonstrate that TokenDance achieves overall state-of-the-art (SOTA) performance in both generation quality and inference speed, highlighting its effectiveness and practical value for real-world music-to-dance applications.
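The first stage relies on Finite Scalar Quantization (FSQ), which may be easier to picture with a small example. The sketch below is a generic illustration of the FSQ idea only, not the authors' implementation: each latent channel is squashed with `tanh`, rounded to one of a fixed number of integer levels, and the per-channel codes are packed into a single token index. The function names, the example level counts `[5, 5, 3]`, and the restriction to odd level counts are assumptions made for illustration. The abstract does not state the paper's level configuration, encoders, or body-part factorization, and the straight-through gradient used during training is omitted here.

```python
import math

def fsq_quantize(z, levels):
    """Map one latent vector to integer codes, one code per channel.

    Hypothetical helper: assumes every entry of `levels` is odd, so the
    codes for a channel with L levels lie in [-(L-1)/2, (L-1)/2].
    """
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half = (n_levels - 1) // 2
        bounded = math.tanh(value) * half  # squash into (-half, half)
        codes.append(round(bounded))       # snap to the nearest integer level
    return codes

def codes_to_index(codes, levels):
    """Pack per-channel codes into one token index (mixed-radix encoding)."""
    index, basis = 0, 1
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index += (code + half) * basis     # shift the code into 0..L-1
        basis *= n_levels
    return index

def index_to_codes(index, levels):
    """Invert codes_to_index: recover the per-channel codes from a token."""
    codes = []
    for n_levels in levels:
        half = (n_levels - 1) // 2
        codes.append(index % n_levels - half)
        index //= n_levels
    return codes

# Example: three channels with 5, 5, and 3 levels give 5 * 5 * 3 = 75 tokens.
LEVELS = [5, 5, 3]
codes = fsq_quantize([10.0, -10.0, 10.0], LEVELS)
token = codes_to_index(codes, LEVELS)
```

Because the codebook is implied by the level counts rather than learned, the vocabulary size is simply the product of the levels. Under this reading, a token-to-token generator would map sequences of music token indices to sequences of dance token indices.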
Comments: CVPR2026 Workshop on HuMoGen
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
Cite as: arXiv:2603.27314 [cs.AI]
(or arXiv:2603.27314v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.27314
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Ziyue Yang
[v1] Sat, 28 Mar 2026 15:38:14 UTC (13,856 KB)