Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessNavigating the Challenges of Cross-functional Teams: the Role of Governance and Common GoalsDEV Community[Side B] Pursuing OSS Quality Assurance with AI: Achieving 369 Tests, 97% Coverage, and GIL-Free CompatibilityDEV Community[Side A] Completely Defending Python from OOM Kills: The BytesIO Trap and D-MemFS 'Hard Quota' Design PhilosophyDEV CommunityFrom Attention Economy to Thinking Economy: The AI ChallengeDEV CommunityHow We're Approaching a County-Level Education Data System EngagementDEV CommunityI Built a Portable Text Editor for Windows — One .exe File, No Installation, Forever FreeDEV CommunityBuilding Global Crisis Monitor: A Real-Time Geopolitical Intelligence DashboardDEV CommunityGoogle's TurboQuant saves memory, but won't save us from DRAM-pricing hellThe Register AI/MLWriting Better RFCs and Design DocsDEV CommunityAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunchIntroducing The Screwtape LaddersLessWrong AIA Very Fine UntuningTowards AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessNavigating the Challenges of Cross-functional Teams: the Role of Governance and Common GoalsDEV Community[Side B] Pursuing OSS Quality Assurance with AI: Achieving 369 Tests, 97% Coverage, and GIL-Free CompatibilityDEV Community[Side A] Completely Defending Python from OOM Kills: The BytesIO Trap and D-MemFS 'Hard Quota' Design PhilosophyDEV CommunityFrom Attention Economy to Thinking Economy: The AI ChallengeDEV CommunityHow We're Approaching a County-Level Education Data System EngagementDEV CommunityI Built a Portable Text Editor for Windows — One .exe File, No Installation, Forever FreeDEV CommunityBuilding Global Crisis Monitor: A Real-Time Geopolitical Intelligence DashboardDEV CommunityGoogle's TurboQuant saves memory, but won't save us from DRAM-pricing hellThe Register AI/MLWriting Better RFCs and Design DocsDEV CommunityAnthropic took down thousands of Github repos trying to yank its leaked source code — a move the company says was an accidentTechCrunchIntroducing The Screwtape LaddersLessWrong AIA Very Fine UntuningTowards AI

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

arXiv cs.CVby Felix Wimbauer, Fabian Manhardt, Michael Oechsle, Nikolai Kalischek, Christian Rupprecht, Daniel Cremers, Federico TombariApril 1, 20261 min read0 views
Source Quiz

arXiv:2603.28980v1 Announce Type: new Abstract: The synthesis of immersive 3D scenes from text is rapidly maturing, driven by novel video generative models and feed-forward 3D reconstruction, with vast potential in AR/VR and world modeling. While panoramic images have proven effective for scene initialization, existing approaches suffer from a trade-off between visual fidelity and explorability: autoregressive expansion suffers from context drift, while panoramic video generation is limited to low resolution. We present Stepper, a unified framework for text-driven immersive 3D scene synthesis that circumvents these limitations via stepwise panoramic scene expansion. Stepper leverages a novel multi-view 360{\deg} diffusion model that enables consistent, high-resolution expansion, coupled wi

View PDF HTML (experimental)

Abstract:The synthesis of immersive 3D scenes from text is rapidly maturing, driven by novel video generative models and feed-forward 3D reconstruction, with vast potential in AR/VR and world modeling. While panoramic images have proven effective for scene initialization, existing approaches suffer from a trade-off between visual fidelity and explorability: autoregressive expansion suffers from context drift, while panoramic video generation is limited to low resolution. We present Stepper, a unified framework for text-driven immersive 3D scene synthesis that circumvents these limitations via stepwise panoramic scene expansion. Stepper leverages a novel multi-view 360° diffusion model that enables consistent, high-resolution expansion, coupled with a geometry reconstruction pipeline that enforces geometric coherence. Trained on a new large-scale, multi-view panorama dataset, Stepper achieves state-of-the-art fidelity and structural consistency, outperforming prior approaches, thereby setting a new standard for immersive scene generation.

Comments: Accepted at CVPR 2026 Findings; Find our project page under this https URL

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28980 [cs.CV]

(or arXiv:2603.28980v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28980

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Felix Wimbauer [view email] [v1] Mon, 30 Mar 2026 20:26:28 UTC (19,710 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modelannounceworld model

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Stepper: St…modelannounceworld modelarxivarXiv cs.CV

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 188 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Releases