
A training-free framework for high-fidelity appearance transfer via diffusion transformers

arXiv · [Submitted on 24 Mar 2026] · March 31, 2026

arXiv:2603.26767v1 Announce Type: new
Authors: Shengrong Gu, Ye Wang, Song Wu, Rui Ma, Qian Wang, Lanjun Wang, Zili Yi


Abstract: Diffusion Transformers (DiTs) excel at generation, but their global self-attention makes controllable, reference-image-based editing a distinct challenge. Unlike U-Nets, naively injecting local appearance into a DiT can disrupt its holistic scene structure. We address this by proposing the first training-free framework specifically designed to tame DiTs for high-fidelity appearance transfer. Our core is a synergistic system that disentangles structure and appearance. We leverage high-fidelity inversion to establish a rich content prior for the source image, capturing its lighting and micro-textures. A novel attention-sharing mechanism then dynamically fuses purified appearance features from a reference, guided by geometric priors. Our unified approach operates at 1024px and outperforms specialized methods on tasks ranging from semantic attribute transfer to fine-grained material application. Extensive experiments confirm our state-of-the-art performance in both structural preservation and appearance fidelity.
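
The abstract does not spell out how its attention-sharing mechanism works, so the following is only a generic illustration of what "attention sharing" usually means in training-free editing, not the authors' method. In this minimal sketch, queries from the source image attend over keys and values from both the source and a reference image. The function names (`attention`, `shared_attention`) and the toy vectors are hypothetical, and the geometric guidance and feature purification mentioned in the abstract are deliberately left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def shared_attention(src_q, src_k, src_v, ref_k, ref_v):
    """Hypothetical sharing step: source queries attend jointly over
    source and reference tokens by concatenating their keys and values."""
    return attention(src_q, src_k + ref_k, src_v + ref_v)


# Toy example: one source token and one reference token with identical keys,
# so the source query splits its attention evenly between the two values.
src_q = [[1.0, 0.0]]
src_k = [[1.0, 0.0]]
src_v = [[1.0, 0.0]]
ref_k = [[1.0, 0.0]]
ref_v = [[0.0, 1.0]]
print(shared_attention(src_q, src_k, src_v, ref_k, ref_v))  # [[0.5, 0.5]]
```

Because the queries come only from the source, the output keeps one token per source position, which is one plausible reason this family of techniques tends to preserve layout while borrowing appearance from the reference.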

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26767 [cs.CV]

(or arXiv:2603.26767v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26767

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shengrong Gu [view email] [v1] Tue, 24 Mar 2026 07:53:50 UTC (34,959 KB)
