VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

arXiv, March 31, 2026

Authors: Sixiao Zheng, Minghao Yin, Wenbo Hu, Xiaoyu Li, Ying Shan, Yanwei Fu

Abstract: Video world models aim to simulate dynamic, real-world environments, yet existing methods struggle to provide unified and precise control over camera and multi-object motion, as videos inherently capture dynamics in the projected 2D image plane. To bridge this gap, we introduce VerseCrafter, a geometry-driven video world model that generates dynamic, realistic videos from a unified 4D geometric world state. Our approach is centered on a novel 4D Geometric Control representation, which encodes the world state as a static background point cloud and per-object 3D Gaussian trajectories. This representation captures each object's motion path and probabilistic 3D occupancy over time, providing a flexible, category-agnostic alternative to rigid bounding boxes and parametric models. We render 4D Geometric Control into 4D control maps for a pretrained video diffusion model, enabling high-fidelity, view-consistent video generation that faithfully follows the specified dynamics. To enable training at scale, we develop an automatic data engine and construct VerseControl4D, a real-world dataset of 35K training samples with automatically derived prompts and rendered 4D control maps. Extensive experiments show that VerseCrafter achieves superior visual quality and more accurate control over camera and multi-object motion than prior methods.
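To make the world-state description above more concrete, here is a minimal sketch of how a static background point cloud and per-object 3D Gaussian trajectories might be held in memory. It is an illustration only, not the authors' implementation: the names `Gaussian3D`, `WorldState4D`, `project`, and `object_centres_2d` are hypothetical, the Gaussians are simplified to axis-aligned ones, and the camera is assumed to be a pinhole at the origin looking down +z, so world and camera coordinates coincide.

```python
from dataclasses import dataclass
from math import exp
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian3D:
    """One object at one frame (hypothetical, axis-aligned simplification)."""
    mean: Vec3   # object centre
    scale: Vec3  # per-axis standard deviations, all > 0

    def occupancy(self, p: Vec3) -> float:
        """Unnormalised Gaussian falloff in (0, 1]; equals 1.0 at the mean."""
        d2 = sum(((pi - mi) / si) ** 2
                 for pi, mi, si in zip(p, self.mean, self.scale))
        return exp(-0.5 * d2)


@dataclass
class WorldState4D:
    """Static background plus one Gaussian per object per frame."""
    background: List[Vec3]                # static point cloud
    trajectories: List[List[Gaussian3D]]  # trajectories[k][t]: object k, frame t


def project(p: Vec3, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


def object_centres_2d(state: WorldState4D, t: int,
                      focal: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Project every object's Gaussian mean at frame t into the image plane."""
    return [project(traj[t].mean, focal, cx, cy) for traj in state.trajectories]


# Usage: one object moving to the right over two frames.
state = WorldState4D(
    background=[(0.0, 0.0, 5.0), (1.0, 1.0, 5.0)],
    trajectories=[[Gaussian3D((0.0, 0.0, 2.0), (0.5, 0.5, 0.5)),
                   Gaussian3D((0.5, 0.0, 2.0), (0.5, 0.5, 0.5))]],
)
centres_t1 = object_centres_2d(state, t=1, focal=100.0, cx=64.0, cy=64.0)
# centres_t1 == [(89.0, 64.0)]
```

A full control-map renderer would rasterise the background points and the Gaussian footprints per frame; the sketch stops at projecting object centres to keep the data layout visible.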

Comments: Project Page: this https URL, Accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2601.05138 [cs.CV]

(or arXiv:2601.05138v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2601.05138

arXiv-issued DOI via DataCite

Submission history

From: Sixiao Zheng
[v1] Thu, 8 Jan 2026 17:28:52 UTC (42,718 KB)
[v2] Mon, 30 Mar 2026 02:18:28 UTC (58,237 KB)
