The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

arXiv, March 31, 2026

Authors: Jing Lin, Ruisi Wang, Junzhe Lu, Ziqi Huang, Guorui Song, Ailing Zeng, Xian Liu, Chen Wei, Wanqi Yin, Qingping Sun, Zhongang Cai, Lei Yang, Ziwei Liu

Abstract: Despite recent advances in 3D human motion generation (MoGen) on standard benchmarks, existing text-to-motion models still face a fundamental bottleneck in their generalization capability. In contrast, adjacent generative fields, most notably video generation (ViGen), have demonstrated remarkable generalization in modeling human behaviors, highlighting transferable insights that MoGen can leverage. Motivated by this observation, we present a comprehensive framework that systematically transfers knowledge from ViGen to MoGen across three key pillars: data, modeling, and evaluation. First, we introduce ViMoGen-228K, a large-scale dataset comprising 228,000 high-quality motion samples that integrates high-fidelity optical MoCap data with semantically annotated motions from web videos and synthesized samples generated by state-of-the-art ViGen models. The dataset includes both text-motion pairs and text-video-motion triplets, substantially expanding semantic diversity. Second, we propose ViMoGen, a flow-matching-based diffusion transformer that unifies priors from MoCap data and ViGen models through gated multimodal conditioning. To enhance efficiency, we further develop ViMoGen-light, a distilled variant that eliminates video generation dependencies while preserving strong generalization. Finally, we present MBench, a hierarchical benchmark designed for fine-grained evaluation across motion quality, prompt fidelity, and generalization ability. Extensive experiments show that our framework significantly outperforms existing approaches in both automatic and human evaluations. The code, data, and benchmark will be made publicly available. Homepage: this https URL.
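
The abstract names flow matching and gated multimodal conditioning but gives no equations. The sketch below is a generic illustration of those two ideas, not the ViMoGen implementation: it assumes a straight-line path between Gaussian noise and a motion sample, a constant-velocity regression target, and a single sigmoid gate that blends a text embedding with a video embedding. All function names, shapes, and the gate design are hypothetical.

```python
import numpy as np

def gated_condition(text_emb, video_emb, gate_logit):
    """Blend two condition embeddings with a sigmoid gate (assumed design)."""
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # gate value in (0, 1)
    return g * text_emb + (1.0 - g) * video_emb

def flow_matching_pair(x1, rng):
    """Build one training example for a linear noise-to-data path.

    x1: array of shape (batch, features), e.g. flattened motion samples.
    Returns the interpolated point x_t, the time t, and the target velocity.
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # one time value per sample
    x_t = (1.0 - t) * x0 + t * x1            # point on the straight path
    v_target = x1 - x0                       # velocity is constant along it
    return x_t, t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))
```

In a training loop under these assumptions, a network would receive `x_t`, `t`, and the gated condition, and its output would be passed to `flow_matching_loss` as `v_pred`.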

Comments: Homepage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2510.26794 [cs.CV] (or arXiv:2510.26794v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.26794 (arXiv-issued DOI via DataCite)

Submission history

From: Ruisi Wang
[v1] Thu, 30 Oct 2025 17:59:27 UTC (4,366 KB)
[v2] Sun, 29 Mar 2026 13:21:39 UTC (4,967 KB)
