Generating Humanless Environment Walkthroughs from Egocentric Walking Tour Videos
arXiv:2603.29036v1 Announce Type: new
Abstract: Egocentric "walking tour" videos provide a rich source of image data for developing diverse visual models of environments around the world. However, the significant presence of humans in these videos, due to crowds and eye-level camera perspectives, limits their usefulness for environment modeling. We address this challenge by developing a generative algorithm that realistically removes (i.e., inpaints) humans and their associated shadow effects from walking tour videos. Key to our approach is the construction of a rich semi-synthetic dataset of video clip pairs used to train this generative model. Each pair in the dataset consists of an environment-only background clip and a composite clip of walking humans, with simulated shadows, overlaid on the background. We randomly sourced both foreground and background components from real egocentric walking tour videos recorded around the world to maintain visual diversity. We then used this dataset to fine-tune Casper, a state-of-the-art video diffusion model for object and effects inpainting, and demonstrate that the resulting model performs far better than Casper, both qualitatively and quantitatively, at removing humans from walking tour clips with significant human presence and complex backgrounds. Finally, we show that the resulting generated clips can be used to build successful 3D/4D models of urban locations.
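The dataset construction described above composites segmented walking humans, plus simulated shadows, onto environment-only background clips. A minimal per-frame sketch of that compositing step, assuming float RGB frames in [0, 1] and precomputed person/shadow mattes (the function name, parameters, and the multiplicative shadow model are illustrative assumptions, not details from the paper):

```python
import numpy as np

def composite_with_shadow(background, person, person_mask, shadow_mask,
                          shadow_strength=0.5):
    """Overlay a segmented person and a simulated shadow on a background frame.

    background: (H, W, 3) floats in [0, 1], environment-only frame.
    person: (H, W, 3) floats, foreground pixels (valid where mask > 0).
    person_mask: (H, W) floats in [0, 1], person alpha matte.
    shadow_mask: (H, W) floats in [0, 1], where the simulated shadow falls.
    shadow_strength: fraction by which shadowed pixels are darkened.
    """
    # Darken the background where the simulated shadow falls.
    shaded = background * (1.0 - shadow_strength * shadow_mask[..., None])
    # Alpha-composite the person on top of the shaded background.
    alpha = person_mask[..., None]
    return alpha * person + (1.0 - alpha) * shaded
```

Applying this frame by frame yields a (composite, background) pair in which the background clip serves as the pixel-level ground truth for training the inpainting model.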
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.29036 [cs.CV]
(or arXiv:2603.29036v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.29036
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yujin Ham [view email] [v1] Mon, 30 Mar 2026 22:08:46 UTC (48,369 KB)