
Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers

arXiv, March 31, 2026

arXiv:2601.14959v2 Announce Type: replace

Authors: Xinyu Peng, Han Li, Yuyang Huang, Ziyang Zheng, Yaoming Wang, Xin Chen, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong


Abstract: Existing video frame interpolation (VFI) methods often adopt a frame-centric approach, processing videos as independent short segments (e.g., triplets), which leads to temporal inconsistencies and motion artifacts. To overcome this, we propose a holistic, video-centric paradigm named Local Diffusion Forcing for Video Frame Interpolation (LDF-VFI). Our framework is built upon an auto-regressive diffusion transformer that models the entire video sequence to ensure long-range temporal coherence. To mitigate error accumulation inherent in auto-regressive generation, we introduce a novel skip-concatenate sampling strategy that effectively maintains temporal stability. Furthermore, LDF-VFI incorporates sparse, local attention and tiled VAE encoding, a combination that not only enables efficient processing of long sequences but also allows generalization to arbitrary spatial resolutions (e.g., 4K) at inference without retraining. An enhanced conditional VAE decoder, which leverages multi-scale features from the input video, further improves reconstruction fidelity. Empirically, LDF-VFI achieves state-of-the-art performance on challenging VFI benchmarks, demonstrating superior per-frame quality and temporal consistency, especially in scenes with large motion. The source code is available at this https URL.
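The abstract names sparse, local attention and tiled VAE encoding but does not spell out either mechanism. The sketch below illustrates the two generic ideas only and is not the authors' implementation. It assumes that locality means a fixed window along the frame axis and that tiling means overlapping spatial spans. The function names `local_attention_mask` and `tile_spans`, and the parameters `window`, `tile`, and `overlap`, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def local_attention_mask(num_frames: int, window: int) -> List[List[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption: "local" means |i - j| <= window along the frame axis.
    """
    return [
        [abs(i - j) <= window for j in range(num_frames)]
        for i in range(num_frames)
    ]


def tile_spans(size: int, tile: int, overlap: int) -> List[Tuple[int, int]]:
    """Cover [0, size) with spans of length `tile` that overlap their neighbours.

    Assumption: a fixed tile size with a fixed minimum overlap; the last
    span is shifted back so that it ends exactly at the border.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if size <= tile:
        return [(0, size)]  # a single span covers the whole axis
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # final span flush with the border
    return [(start, start + tile) for start in starts]
```

With a window of 1, each frame attends only to itself and its immediate neighbours, so attention cost grows linearly with the number of frames instead of quadratically. Calling `tile_spans` once for the width and once for the height gives a grid of fixed-size tiles for any resolution, which is one way a model trained at a fixed tile size could be applied to larger inputs such as 4K.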

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2601.14959 [cs.CV] (or arXiv:2601.14959v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2601.14959 (arXiv-issued DOI via DataCite)

Submission history

From: Xinyu Peng
[v1] Wed, 21 Jan 2026 12:58:52 UTC (1,633 KB)
[v2] Mon, 30 Mar 2026 08:09:27 UTC (24,420 KB)
