
Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models

arXiv, March 31, 2026

arXiv:2603.28367v1. Announce type: new.

Authors: Tao Xia, Jiawei Liu, Yukun Zhang, Ting Liu, Wei Wang, Lei Zhang


Abstract: Visual autoregressive (VAR) models have recently emerged as a promising family of generative models, enabling a wide range of downstream vision tasks such as text-guided image editing. By shifting the editing paradigm from noise manipulation in diffusion-based methods to token-level operations, VAR-based approaches achieve better background preservation and significantly faster inference. However, existing VAR-based editing methods still face two key challenges: accurately localizing editable tokens and maintaining structural consistency in the edited results. In this work, we propose a novel text-guided image editing framework rooted in an analysis of intermediate feature distributions within VAR models. First, we introduce a coarse-to-fine token localization strategy that can refine editable regions, balancing editing fidelity and background preservation. Second, we analyze the intermediate representations of VAR models and identify structure-related features, by which we design a simple yet effective feature injection mechanism to enhance structural consistency between the edited and source images. Third, we develop a reinforcement learning-based adaptive feature injection scheme that automatically learns scale- and layer-specific injection ratios to jointly optimize editing fidelity and structure preservation. Extensive experiments demonstrate that our method achieves superior structural consistency and editing quality compared with state-of-the-art approaches, across both local and global editing scenarios.
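The abstract does not give the exact form of the feature injection, so the following is only a minimal sketch of one plausible reading: a linear blend of source-image features into edited-image features, with a separate ratio per (scale, layer) pair. The names `inject`, `inject_all`, the dictionary layout, and the linear blend itself are illustrative assumptions, not the authors' implementation; in the paper the ratios are learned with reinforcement learning, whereas here they are simply passed in.

```python
from typing import Dict, List, Tuple

Feature = List[float]      # toy stand-in for one intermediate feature vector
Key = Tuple[int, int]      # (scale index, layer index); hypothetical layout


def inject(source: Feature, edited: Feature, ratio: float) -> Feature:
    """Blend source features into edited features (assumed linear form).

    ratio = 0 keeps the edited features unchanged;
    ratio = 1 copies the source features outright.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if len(source) != len(edited):
        raise ValueError("feature vectors must have the same length")
    return [ratio * s + (1.0 - ratio) * e for s, e in zip(source, edited)]


def inject_all(
    source_feats: Dict[Key, Feature],
    edited_feats: Dict[Key, Feature],
    ratios: Dict[Key, float],
    default_ratio: float = 0.0,
) -> Dict[Key, Feature]:
    """Apply a scale- and layer-specific injection ratio to every feature.

    Keys missing from `ratios` fall back to `default_ratio` (no injection
    by default), so only the chosen scales and layers are affected.
    """
    out: Dict[Key, Feature] = {}
    for key, edited in edited_feats.items():
        out[key] = inject(source_feats[key], edited, ratios.get(key, default_ratio))
    return out
```

Under this reading, a higher ratio at a given scale and layer pulls the edited result toward the source structure, while a lower ratio leaves more room for the edit. That is the trade-off between structure preservation and editing fidelity that the abstract says the learned ratios are meant to balance.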

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28367 [cs.CV]

(or arXiv:2603.28367v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28367

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tao Xia
[v1] Mon, 30 Mar 2026 12:35:33 UTC (7,417 KB)
