
Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation

arXiv, March 31, 2026

arXiv:2505.21545v4 Announce Type: replace-cross

Authors: Chika Maduabuchi, Hao Chen, Yujin Han, Jindong Wang


Abstract: Latent Video Diffusion Models (LVDMs) have achieved state-of-the-art generative quality for image and video generation; however, they remain brittle under noisy conditioning, where small perturbations in text or multimodal embeddings can cascade over timesteps and cause semantic drift. Existing corruption strategies from image diffusion (Gaussian, Uniform) fail in video settings because static noise disrupts temporal fidelity. In this paper, we propose CAT-LVDM, a corruption-aware training framework with structured, data-aligned noise injection tailored for video diffusion. Our two operators, Batch-Centered Noise Injection (BCNI) and Spectrum-Aware Contextual Noise (SACN), align perturbations with batch semantics or spectral dynamics to preserve coherence. CAT-LVDM yields substantial gains: BCNI reduces FVD by 31.9 percent on WebVid-2M, MSR-VTT, and MSVD, while SACN improves UCF-101 by 12.3 percent, outperforming Gaussian, Uniform, and even large diffusion baselines like DEMO (2.3B) and LaVie (3B) despite training on 5x less data. Ablations confirm the unique value of low-rank, data-aligned noise, and theory establishes why these operators tighten robustness and generalization bounds. CAT-LVDM thus sets a new framework for robust video diffusion, and our experiments show that it can also be extended to autoregressive generation and multimodal video understanding LLMs. Code, models, and samples are available at this https URL
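The abstract names the two operators but does not define them, so the following is only an illustrative sketch of the contrast it draws between static noise and data-aligned noise on conditioning embeddings. The function names, the uniform noise distribution, the scale parameter `rho`, and the use of an SVD of the centered batch are all assumptions made for illustration; they are not the paper's BCNI or SACN definitions.

```python
import numpy as np

def gaussian_corruption(emb, rho, rng):
    """Baseline: isotropic Gaussian noise with a fixed scale, independent of the batch."""
    return emb + rho * rng.standard_normal(emb.shape)

def batch_centered_corruption(emb, rho, rng):
    """Hypothetical batch-centered noise (assumption, not the paper's BCNI).

    Each embedding's noise is scaled by its distance to the batch mean, so the
    perturbation size tracks how spread out the batch is.
    """
    center = emb.mean(axis=0, keepdims=True)                     # (1, D)
    dist = np.linalg.norm(emb - center, axis=1, keepdims=True)   # (B, 1)
    noise = rng.uniform(-1.0, 1.0, size=emb.shape)               # (B, D)
    return emb + rho * dist * noise

def low_rank_spectral_corruption(emb, rho, rank, rng):
    """Hypothetical low-rank noise (assumption, not the paper's SACN).

    Noise is confined to the top-`rank` right singular directions of the
    centered batch and weighted by the corresponding singular values.
    """
    batch = emb.shape[0]
    centered = emb - emb.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(rank, vt.shape[0])
    coeffs = rng.standard_normal((batch, k)) * s[:k] / np.sqrt(batch)
    return emb + rho * coeffs @ vt[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_emb = rng.standard_normal((8, 16))  # toy batch: 8 prompts, 16-dim embeddings
    for name, out in [
        ("gaussian", gaussian_corruption(text_emb, 0.1, rng)),
        ("batch-centered", batch_centered_corruption(text_emb, 0.1, rng)),
        ("low-rank", low_rank_spectral_corruption(text_emb, 0.1, 2, rng)),
    ]:
        print(name, float(np.linalg.norm(out - text_emb)))
```

In this sketch a batch of identical embeddings receives no batch-centered perturbation at all, whereas the Gaussian baseline perturbs it regardless; that difference is one way to read "data-aligned" in the abstract.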

Comments: ICLR 2026 ReALM-GEN

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2505.21545 [cs.CV] (or arXiv:2505.21545v4 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2505.21545 (arXiv-issued DOI via DataCite)

Submission history

From: Chika Maduabuchi
[v1] Sat, 24 May 2025 20:11:14 UTC (19,426 KB)
[v2] Wed, 11 Feb 2026 10:26:17 UTC (19,416 KB)
[v3] Thu, 26 Mar 2026 13:23:26 UTC (19,417 KB)
[v4] Sun, 29 Mar 2026 16:05:22 UTC (19,418 KB)
