Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude subscriptions will no longer cover usage on third-party tools like OpenClaw starting April 4 at 12pm PT, to better manage capacity (Boris Cherny/@bcherny)TechmemeDoes GPT-2 Have a Fear Direction?lesswrong.comY Combinator's CEO says he ships 37,000 lines of AI code per dayHacker News AI TopShow HN: SpeechSDK – free, open-source SDK that unifies all AI voice modelsHacker News AI TopWe Ditched LangChain. Here’s What We Built Instead — and Why It’s Better for Serious AI Research.Medium AIAMD vs. Nvidia: The AI Supercycle Is Big Enough for Both. Here's the Better Buy. - AOL.comGNews AI NVIDIAFiling: Anthropic has formed AnthroPAC, a new PAC that will be funded exclusively and voluntarily by its employees and is expected to be bipartisan (Miranda Nazzaro/The Hill)TechmemeI Broke Up With ChatGPT (And My Productivity Thanked Me)Medium AIAI startup envisions '100M new people' making videogamesHacker News AI TopEsquire Singapore's One Piece "interview" mashes up AI slop and ghoulishness to make ghoulislop - AV ClubGNews AI SingaporeMost Students Think ChatGPT Helps Them Study — Here’s Why It Actually Slows Them Down (And How to…Medium AIWhen the server crashes the soulMedium AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAnthropic says Claude subscriptions will no longer cover usage on third-party tools like OpenClaw starting April 4 at 12pm PT, to better manage capacity (Boris Cherny/@bcherny)TechmemeDoes GPT-2 Have a Fear Direction?lesswrong.comY Combinator's CEO says he ships 37,000 lines of AI code per dayHacker News AI TopShow HN: SpeechSDK – free, open-source SDK that unifies all AI voice modelsHacker News AI TopWe Ditched LangChain. Here’s What We Built Instead — and Why It’s Better for Serious AI Research.Medium AIAMD vs. Nvidia: The AI Supercycle Is Big Enough for Both. Here's the Better Buy. - AOL.comGNews AI NVIDIAFiling: Anthropic has formed AnthroPAC, a new PAC that will be funded exclusively and voluntarily by its employees and is expected to be bipartisan (Miranda Nazzaro/The Hill)TechmemeI Broke Up With ChatGPT (And My Productivity Thanked Me)Medium AIAI startup envisions '100M new people' making videogamesHacker News AI TopEsquire Singapore's One Piece "interview" mashes up AI slop and ghoulishness to make ghoulislop - AV ClubGNews AI SingaporeMost Students Think ChatGPT Helps Them Study — Here’s Why It Actually Slows Them Down (And How to…Medium AIWhen the server crashes the soulMedium AI
AI NEWS HUBbyEIGENVECTOREigenvector

FastVMT: Eliminating Redundancy in Video Motion Transfer

arXivMarch 31, 20262 min read0 views
Source Quiz

arXiv:2602.05551v3 Announce Type: replace Abstract: Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT — Yue Ma, Zhikai Wang, Tianhao Ren, Mingzhe Zheng, Hongyu Liu, Jiayi Guo, Kunyu Feng, Yuxuan Xue, Zixiang Zhao, Konrad Schindler, Qifeng Chen, Linfeng Zhang

Authors:Yue Ma, Zhikai Wang, Tianhao Ren, Mingzhe Zheng, Hongyu Liu, Jiayi Guo, Kunyu Feng, Yuxuan Xue, Zixiang Zhao, Konrad Schindler, Qifeng Chen, Linfeng Zhang

View PDF HTML (experimental)

Abstract:Video motion transfer aims to synthesize videos by generating visual content according to a text prompt while transferring the motion pattern observed in a reference video. Recent methods predominantly use the Diffusion Transformer (DiT) architecture. To achieve satisfactory runtime, several methods attempt to accelerate the computations in the DiT, but fail to address structural sources of inefficiency. In this work, we identify and remove two types of computational redundancy in earlier work: motion redundancy arises because the generic DiT architecture does not reflect the fact that frame-to-frame motion is small and smooth; gradient redundancy occurs if one ignores that gradients change slowly along the diffusion trajectory. To mitigate motion redundancy, we mask the corresponding attention layers to a local neighborhood such that interaction weights are not computed unnecessarily distant image regions. To exploit gradient redundancy, we design an optimization scheme that reuses gradients from previous diffusion steps and skips unwarranted gradient computations. On average, FastVMT achieves a 3.43x speedup without degrading the visual fidelity or the temporal consistency of the generated videos.

Comments: Accepted by ICLR2026, Project page: this http URL, Code: this https URL

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2602.05551 [cs.CV]

(or arXiv:2602.05551v3 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2602.05551

arXiv-issued DOI via DataCite

Submission history

From: Kunyu Feng [view email] [v1] Thu, 5 Feb 2026 11:15:59 UTC (13,558 KB) [v2] Tue, 24 Mar 2026 13:23:33 UTC (23,840 KB) [v3] Mon, 30 Mar 2026 16:09:33 UTC (23,840 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
FastVMT: El…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 111 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!