PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
Dinh-Khoi Vo, Van-Loc Nguyen, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le
Abstract: Removing objects from natural images is challenging because of the difficulty of synthesizing semantically coherent content while preserving background integrity. Existing methods often rely on fine-tuning, prompt engineering, or inference-time optimization, yet still suffer from texture inconsistency, rigid artifacts, weak foreground-background disentanglement, and poor scalability for multi-object removal. We propose PANDORA, a novel zero-shot object removal framework that operates directly on pre-trained text-to-image diffusion models and requires no fine-tuning, prompts, or optimization. Pixel-wise Attention Dissolution removes objects by nullifying the most correlated attention keys for masked pixels, effectively eliminating the object from the self-attention flow and allowing background context to dominate reconstruction. We further introduce Localized Attentional Disentanglement Guidance to steer denoising toward latent manifolds favorable to clean object removal. Together, these components enable precise, non-rigid, prompt-free, and scalable multi-object erasure in a single pass. Experiments demonstrate superior visual fidelity and semantic plausibility compared to state-of-the-art methods. The project page is available at this https URL.
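The abstract describes Pixel-wise Attention Dissolution only at a high level. As a minimal sketch, assuming a single-head self-attention layer over N flattened latent pixels, and reading "most correlated keys" as the top-k keys by scaled dot-product score for each masked query, the dissolution step can be illustrated in NumPy. The function name `dissolved_self_attention` and the `top_k` parameter are hypothetical; this is not the authors' implementation, and the exact key-selection rule, the layers and timesteps it is applied to, and its interaction with the guidance term are not specified in the abstract.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dissolved_self_attention(q, k, v, mask, top_k):
    """Hypothetical sketch: each masked query ignores its top_k most correlated keys.

    q, k, v: (N, d) arrays of queries, keys and values for N latent pixels.
    mask:    (N,) boolean array, True for pixels of the object to remove.
    top_k:   assumed number of keys nullified per masked query (must be < N).
    """
    n, d = q.shape
    if not 0 <= top_k < n:
        raise ValueError("top_k must leave at least one key per query")
    scores = q @ k.T / np.sqrt(d)  # (N, N) scaled dot-product scores
    masked_rows = np.flatnonzero(mask)
    # For each masked query, indices of its top_k highest-scoring keys.
    top_idx = np.argsort(-scores[masked_rows], axis=1)[:, :top_k]
    # Nullify those keys: -inf logits get exactly zero attention after softmax,
    # so the remaining (assumed background) keys dominate the output.
    scores[masked_rows[:, None], top_idx] = -np.inf
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Unmasked pixels attend exactly as in ordinary self-attention; only the rows of masked pixels are altered, which matches the abstract's description of removing the object from the self-attention flow while leaving the background path untouched.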
Comments: ICME 2026
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.27555 [cs.CV]
(or arXiv:2603.27555v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.27555
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Trung Nghia Le
[v1] Sun, 29 Mar 2026 07:34:08 UTC (8,981 KB)