
Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

arXiv, March 31, 2026

arXiv:2603.27666v1 Announce Type: new
Authors: Yuhe Liu, Zhenxiong Tan, Yujia Hu, Songhua Liu, Xinchao Wang


Abstract: Recent advances in diffusion-based controllable visual generation have led to remarkable improvements in image quality. However, these powerful models are typically deployed on cloud servers due to their large computational demands, raising serious concerns about user data privacy. To enable secure and efficient on-device generation, we explore in this paper controllable diffusion models built upon linear attention architectures, which offer superior scalability and efficiency, even on edge devices. Yet, our experiments reveal that existing controllable generation frameworks, such as ControlNet and OminiControl, either lack the flexibility to support multiple heterogeneous condition types or suffer from slow convergence on such linear-attention models. To address these limitations, we propose a novel controllable diffusion framework tailored for linear attention backbones like SANA. The core of our method lies in a unified gated conditioning module working in a dual-path pipeline, which effectively integrates multi-type conditional inputs, such as spatially aligned and non-aligned cues. Extensive experiments on multiple tasks and benchmarks demonstrate that our approach achieves state-of-the-art controllable generation performance based on linear-attention models, surpassing existing methods in terms of fidelity and controllability.
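The abstract names two ingredients without detailing them: linear attention and gated condition injection. The sketch below is only an illustration of those general ideas, not the paper's module. It assumes the common elu(x) + 1 feature map for linear attention and a hypothetical per-channel tanh gate for injection; the function names `feature_map`, `linear_attention`, and `gated_inject` are invented for this example, and the dual-path pipeline is not modeled.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention (assumed here)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention(queries, keys, values):
    """Attention in O(N): build key-value summaries once, reuse them for every query."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum over tokens of phi(k) v^T
    z = [0.0] * d_k                          # sum over tokens of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[a] * z[a] for a in range(d_k))  # positive by construction
        out.append([sum(fq[a] * kv[a][b] for a in range(d_k)) / norm
                    for b in range(d_v)])
    return out

def gated_inject(tokens, cond, gate):
    """Hypothetical gate: add condition features scaled by tanh(gate), per channel.

    No condition tokens are appended to the attention sequence; the condition
    must be spatially aligned with the tokens in this simplified form.
    """
    return [[t + math.tanh(g) * c for t, c, g in zip(tok, cnd, gate)]
            for tok, cnd in zip(tokens, cond)]

# Usage: a zero gate leaves the image tokens untouched.
tokens = [[0.5, -0.2], [0.1, 0.3]]
cond = [[1.0, 1.0], [2.0, 2.0]]
mixed = gated_inject(tokens, cond, gate=[0.0, 0.0])
attended = linear_attention(mixed, mixed, mixed)
```

In this sketch the sequence length seen by attention stays fixed no matter how many conditions are added, which is one plausible reading of "without multimodal attention" in the title. How the authors handle non-aligned cues is not stated in the abstract and is left out here.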

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27666 [cs.CV] (or arXiv:2603.27666v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27666

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yuhe Liu
[v1] Sun, 29 Mar 2026 12:31:34 UTC (2,252 KB)
