Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessTutorials vs. Transformations: What Beauty Content Wins in 2026Dev.to AIAnthropic employee error exposes Claude Code source - InfoWorldGoogle News: ClaudeMulti-Factor Strategies Aren't Exclusive to Big Firms: A Research Framework for Independent QuantsDev.to AISystem Instead of Team: Rethinking How Businesses Are BuiltDev.to AI10 лучших системных промптов ChatGPT: секреты успеха без опыта!Dev.to AIAI Post 4: When AI Gets It Wrong: Why AI Fails (And What That Teaches Us)Medium AIGoogle AI Overviews Are Reshaping Search — Here’s How to Get Your Business CitedDev.to AIThe $500/Month “Tool Trap” (And How Beginners Are Escaping It for Just $1)Medium AIAnthropic Accidentally Exposes Source Code for Claude Code - CNETGoogle News: ClaudeThe 4,500 Micro-Adjustment Question: Why the Best AI Still Needs a “Commander” in the Control Room.Medium AIJournal Figure Replication | Python Implementation of Sector Violin PlotsMedium AICommunity Without Tokens: What AI Dev Tools Can Learn from Crypto's Community PlaybookDev.to AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessTutorials vs. Transformations: What Beauty Content Wins in 2026Dev.to AIAnthropic employee error exposes Claude Code source - InfoWorldGoogle News: ClaudeMulti-Factor Strategies Aren't Exclusive to Big Firms: A Research Framework for Independent QuantsDev.to AISystem Instead of Team: Rethinking How Businesses Are BuiltDev.to AI10 лучших системных промптов ChatGPT: секреты успеха без опыта!Dev.to AIAI Post 4: When AI Gets It Wrong: Why AI Fails (And What That Teaches Us)Medium AIGoogle AI Overviews Are Reshaping Search — Here’s How to Get Your Business CitedDev.to AIThe $500/Month “Tool Trap” (And How Beginners Are Escaping It for Just $1)Medium AIAnthropic Accidentally Exposes Source Code for Claude Code - CNETGoogle News: ClaudeThe 4,500 Micro-Adjustment Question: Why the Best AI Still Needs a “Commander” in the Control Room.Medium AIJournal Figure Replication | Python Implementation of Sector Violin PlotsMedium AICommunity Without Tokens: What AI Dev Tools Can Learn from Crypto's Community PlaybookDev.to AI

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

arXivMarch 31, 20262 min read0 views
Source Quiz

arXiv:2603.28713v1 Announce Type: new Abstract: Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focus on T2I generation and lack support for image editing. In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network — Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao

View PDF HTML (experimental)

Abstract:Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focus on T2I generation and lack support for image editing. In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. It concatenates images horizontally as input, using a (target | blank) configuration for generation tasks and (target | source) for editing tasks. To stabilize the training of this compact model, we introduce a task-progressive joint pretraining strategy that sequentially targets T2I, editing, and joint tasks. After high-quality SFT and reinforcement learning, DreamLite achieves GenEval (0.72) for image generation and ImgEdit (4.11) for image editing, outperforming existing on-device models and remaining competitive with several server-side models. By employing step distillation, we further reduce denoising processing to just 4 steps, enabling our DreamLite could generate or edit a 1024 x 1024 image in less than 1s on a Xiaomi 14 smartphone. To the best of our knowledge, DreamLite is the first unified on-device diffusion model that supports both image generation and image editing.

Comments: this https URL

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28713 [cs.CV]

(or arXiv:2603.28713v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28713

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kailai Feng [view email] [v1] Mon, 30 Mar 2026 17:30:25 UTC (37,028 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
DreamLite: …researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 173 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers