Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessStartup funding shatters all records in Q1TechCrunch AIHow to Use Shaders in React (2026 WebGPU / WebGL Tutorial)DEV CommunityThe 5th Agent Orchestration Pattern: Market-Based Task AllocationDEV CommunityThe Hidden Cost of Copy-Pasting Code Into ChatGPTDEV Community14-Package Monorepo: How We Structured WAIaaS for AI Agent BuildersDEV CommunityPromoting raw BG3 gameplay bundle previews in the TD2 SDL portDEV CommunityWhat Is New In Helm 4 And How It Improves Over Helm 3DEV CommunityDevelopers Are Designing for AI Before Users NowDEV CommunityStop Using Elaborate Personas: Research Shows They Degrade Claude Code OutputDEV CommunityAnthropic Executive Blames Claude Code Leak on ‘Process Errors’Bloomberg TechnologyAn Engineering-grade breakdown of RAG PipelineDEV CommunityHate Speech Detection Still Cooks (Even in 2026)Towards AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessStartup funding shatters all records in Q1TechCrunch AIHow to Use Shaders in React (2026 WebGPU / WebGL Tutorial)DEV CommunityThe 5th Agent Orchestration Pattern: Market-Based Task AllocationDEV CommunityThe Hidden Cost of Copy-Pasting Code Into ChatGPTDEV Community14-Package Monorepo: How We Structured WAIaaS for AI Agent BuildersDEV CommunityPromoting raw BG3 gameplay bundle previews in the TD2 SDL portDEV CommunityWhat Is New In Helm 4 And How It Improves Over Helm 3DEV CommunityDevelopers Are Designing for AI Before Users NowDEV CommunityStop Using Elaborate Personas: Research Shows They Degrade Claude Code OutputDEV CommunityAnthropic Executive Blames Claude Code Leak on ‘Process Errors’Bloomberg TechnologyAn Engineering-grade breakdown of RAG PipelineDEV CommunityHate Speech Detection Still Cooks (Even in 2026)Towards AI

VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward

HuggingFace Papersby Zhaochong An ,March 27, 20262 min read1 views
Source Quiz

VGGRPO is a latent geometry-guided framework that enhances video diffusion models' geometric consistency through a Latent Geometry Model and latent-space reinforcement learning with camera motion and geometry reprojection rewards. (13 upvotes on HuggingFace)

Published on Mar 27

Authors:

,

,

,

,

,

,

Abstract

VGGRPO enhances video diffusion models with latent geometry guidance to improve geometric consistency and camera stability while avoiding expensive VAE decoding.

AI-generated summary

Large-scale video diffusion models achieve impressive visual quality, yet often fail to preserve geometric consistency. Prior approaches improve consistency either by augmenting the generator with additional modules or applying geometry-aware alignment. However, architectural modifications can compromise the generalization of internet-scale pretrained models, while existing alignment methods are limited to static scenes and rely on RGB-space rewards that require repeated VAE decoding, incurring substantial compute overhead and failing to generalize to highly dynamic real-world scenes. To preserve the pretrained capacity while improving geometric consistency, we propose VGGRPO (Visual Geometry GRPO), a latent geometry-guided framework for geometry-aware video post-training. VGGRPO introduces a Latent Geometry Model (LGM) that stitches video diffusion latents to geometry foundation models, enabling direct decoding of scene geometry from the latent space. By constructing LGM from a geometry model with 4D reconstruction capability, VGGRPO naturally extends to dynamic scenes, overcoming the static-scene limitations of prior methods. Building on this, we perform latent-space Group Relative Policy Optimization with two complementary rewards: a camera motion smoothness reward that penalizes jittery trajectories, and a geometry reprojection consistency reward that enforces cross-view geometric coherence. Experiments on both static and dynamic benchmarks show that VGGRPO improves camera stability, geometry consistency, and overall quality while eliminating costly VAE decoding, making latent-space geometry-guided reinforcement an efficient and flexible approach to world-consistent video generation.

View arXiv page View PDF Project page Add to collection

Get this paper in your agent:

hf papers read 2603.26599

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.26599 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.26599 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.26599 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
VGGRPO: Tow…researchpaperarxivvideo diffu…latent spacegeometry fo…HuggingFace…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 190 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers