Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training
Recent advances in video generation models have achieved impressive results. However, these models heavily rely on the use of high-quality data that combines both high visual quality and high motion quality. In this paper, we identify a key challenge in video data curation: the Motion-Vision Quality Dilemma. We discovered that visual quality and motion intensity inherently exhibit a negative correlation, making it hard to obtain golden data that excels in both aspects. To address this challenge, we first examine the hierarchical learning dynamics of video diffusion models and conduct gradient- — Xiangyang Luo, Qingyu Li, Yuming Li
View PDF HTML (experimental)
Abstract:Recent advances in video generation models have achieved impressive results. However, these models heavily rely on the use of high-quality data that combines both high visual quality and high motion quality. In this paper, we identify a key challenge in video data curation: the Motion-Vision Quality Dilemma. We discovered that visual quality and motion intensity inherently exhibit a negative correlation, making it hard to obtain golden data that excels in both aspects. To address this challenge, we first examine the hierarchical learning dynamics of video diffusion models and conduct gradient-based analysis on quality-degraded samples. We discover that quality-imbalanced data can produce gradients similar to golden data at appropriate timesteps. Based on this, we introduce the novel concept of Timestep selection in Training Process. We propose Timestep-aware Quality Decoupling (TQD), which modifies the data sampling distribution to better match the model's learning process. For certain types of data, the sampling distribution is skewed toward higher timesteps for motion-rich data, while high visual quality data is more likely to be sampled during lower timesteps. Through extensive experiments, we demonstrate that TQD enables training exclusively on separated imbalanced data to achieve performance surpassing conventional training with better data, challenging the necessity of perfect data in video generation. Moreover, our method also boosts model performance when trained on high-quality data, showcasing its effectiveness across different data scenarios.
Comments: Accepted to CVPR 2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.25527 [cs.CV]
(or arXiv:2603.25527v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.25527
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Xiangyang Luo [view email] [v1] Thu, 26 Mar 2026 14:59:57 UTC (7,381 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv#475 – Demis Hassabis: Future of AI, Simulating Reality, Physics and Video Games
Demis Hassabis is the CEO of Google DeepMind and Nobel Prize winner for his groundbreaking work in protein structure prediction using AI. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep475-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript: https://lexfridman.com/demis-hassabis-2-transcript CONTACT LEX: Feedback give feedback to Lex: https://lexfridman.com/survey AMA submit questions, videos or call-in: https://lexfridman.com/ama Hiring join our team: https://lexfridman.com/hiring Other other ways to get in touch: https://lexfridman.com/contact EPISODE LINKS: Demis s X: https://x.com/demishassabis DeepMind s X: https://x.com/GoogleDeepMind DeepMind s Instagram: https://instagram.com/GoogleDeepMi
#480 – Dave Hone: T-Rex, Dinosaurs, Extinction, Evolution, and Jurassic Park
Dave Hone is a paleontologist, expert on dinosaurs, co-host of the Terrible Lizards podcast, and author of numerous scientific papers and books on the behavior and ecology of dinosaurs. He lectures at Queen Mary University of London on topics of Ecology, Zoology, Biology, and Evolution. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep480-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc. Transcript: https://lexfridman.com/dave-hone-transcript CONTACT LEX: Feedback give feedback to Lex: https://lexfridman.com/survey AMA submit questions, videos or call-in: https://lexfridman.com/ama Hiring join our team: https://lexfridman.com/hiring Other other ways to get in touch: https://lexfridman.com/contact EPISODE LI
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic
arXiv:2604.01134v1 Announce Type: cross Abstract: The Operational Design Domain (ODD) of urbanoriented Level 4 (L4) autonomous driving, especially for autonomous robotaxis, confronts formidable challenges in complex urban mixed traffic environments. These challenges stem mainly from the high density of Vulnerable Road Users (VRUs) and their highly uncertain and unpredictable interaction behaviors. However, existing open-source datasets predominantly focus on structured scenarios such as highways or regulated intersections, leaving a critical gap in data representing chaotic, unstructured urban environments. To address this, this paper proposes an efficient, high-precision method for constructing drone-based datasets and establishes the Vehicle-Vulnerable Road User Interaction Dataset (VRUD
From Code Changes to Quality Gains: An Empirical Study in Python ML Systems with PyQu
arXiv:2511.02827v3 Announce Type: replace Abstract: In an era shaped by Generative Artificial Intelligence for code generation and the rising adoption of Python-based Machine Learning systems (MLS), software quality has emerged as a major concern. As these systems grow in complexity and importance, a key obstacle lies in understanding exactly how specific code changes affect overall quality-a shortfall aggravated by the lack of quality assessment tools and a clear mapping between ML systems code changes and their quality effects. Although prior work has explored code changes in MLS, it mostly stops at what the changes are, leaving a gap in our knowledge of the relationship between code changes and the MLS quality. To address this gap, we conducted a large-scale empirical study of 3,340 ope
Mitigating Omitted Variable Bias in Empirical Software Engineering
arXiv:2501.17026v5 Announce Type: replace Abstract: Omitted variable bias occurs when a statistical model leaves out variables that are relevant determinants of the effects under study. This results in the model attributing the missing variables' effect to some of the included variables -- hence over- or under-estimating the latter's true effect. Omitted variable bias presents a significant threat to the validity of empirical research, particularly in non-experimental studies such as those prevalent in empirical software engineering. This paper illustrates the impact of omitted variable bias on two illustrative examples in the software engineering domain, and uses them to present methods to investigate the possible presence of omitted variable bias, to estimate its impact, and to mitigate


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!