
Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

arXiv · March 31, 2026 · 2 min read


Authors: Yue Ma, Yulong Liu, Qiyuan Zhu, Ayden Yang, Kunyu Feng, Xinhua Zhang, Zexuan Yan, Zhifeng Li, Sirui Han, Chenyang Qi, Qifeng Chen


Abstract: Recent breakthroughs in video diffusion transformers have shown remarkable capabilities in generating diverse motion. For the motion-transfer task, current methods mainly use two-stage Low-Rank Adaptation (LoRA) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to large video diffusion transformers. Naive two-stage LoRA tuning struggles to maintain motion consistency between generated and input videos due to the inherent spatial-temporal coupling in the 3D attention operator. It also requires time-consuming fine-tuning in both stages. To tackle these issues, we propose Follow-Your-Motion, an efficient two-stage video motion transfer framework that finetunes a powerful video diffusion transformer to synthesize complex motion. Specifically, we propose a spatial-temporal decoupled LoRA that decouples the attention architecture into spatial appearance processing and temporal motion processing. During the second training stage, we design sparse motion sampling and adaptive RoPE to accelerate tuning. To address the lack of a benchmark for this field, we introduce MotionBench, a comprehensive benchmark comprising diverse motion types, including creative camera motion, single-object motion, multiple-object motion, and complex human motion. We show extensive evaluations on MotionBench to verify the superiority of Follow-Your-Motion.
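
For readers unfamiliar with the Low-Rank Adaptation that the abstract builds on, the following is a minimal sketch of a generic LoRA layer in plain Python. It is not the paper's spatial-temporal decoupled LoRA, whose details the abstract does not give; the class name LoRALinear, the helper matmul, and the defaults for alpha and the initialization scale are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Generic LoRA layer (illustrative, not the paper's method).

    The pretrained weight W (d_out x d_in) stays frozen; only the low-rank
    factors A (rank x d_in) and B (d_out x rank) would be trained, giving an
    effective weight W + (alpha / rank) * B @ A.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # frozen pretrained weight
        self.scale = alpha / rank       # hypothetical default scaling
        # A is small random, B is zero, so the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A) as a list of rows."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to x (n x d_in); returns n x d_out."""
        w_t = [list(col) for col in zip(*self.effective_weight())]
        return matmul(x, w_t)

    def num_trainable(self):
        """Count the entries of A and B, the only trainable parameters."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: an 8x8 identity weight with a rank-2 adapter.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(identity, rank=2)
output = layer.forward([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]])
```

With rank 2 the adapter trains 32 values instead of the 64 in the full 8x8 weight, which is the saving that makes LoRA attractive for large models. The abstract's contribution concerns how such adapters are split between spatial and temporal attention, which this sketch does not attempt to reproduce.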

Comments: Accepted by ICLR 2026, project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2506.05207 [cs.CV] (or arXiv:2506.05207v4 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2506.05207

arXiv-issued DOI via DataCite

Submission history

From: Kunyu Feng
[v1] Thu, 5 Jun 2025 16:18:32 UTC (25,007 KB)
[v2] Wed, 13 Aug 2025 16:07:46 UTC (25,007 KB)
[v3] Tue, 24 Mar 2026 12:52:27 UTC (19,664 KB)
[v4] Mon, 30 Mar 2026 15:48:00 UTC (19,665 KB)
