ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation

arXiv, March 31, 2026

arXiv:2511.11483v4 (Announce Type: replace-cross). Authors: Kaishen Wang, Ruibo Chen, Tong Zheng, Heng Huang

Abstract: Recent text-to-image (T2I) models have made remarkable progress in generating visually realistic and semantically coherent images. However, they still suffer from randomness and inconsistency with the given prompts, particularly when textual descriptions are vague or underspecified. Existing approaches, such as prompt rewriting, best-of-N sampling, and self-refinement, can mitigate these issues but usually require additional modules and operate independently, hindering test-time scaling efficiency and increasing computational overhead. In this paper, we introduce ImAgent, a training-free unified multimodal agent that integrates reasoning, generation, and self-evaluation within a single framework for efficient test-time scaling. Guided by a policy controller, multiple generation actions dynamically interact and self-organize to enhance image fidelity and semantic alignment without relying on external models. Extensive experiments on image generation and editing tasks demonstrate that ImAgent consistently improves over the backbone and even surpasses other strong baselines where the backbone model fails, highlighting the potential of unified multimodal agents for adaptive and efficient image generation under test-time scaling.
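The abstract does not describe how the policy controller chooses its next action, so what follows is only a toy sketch of score-driven test-time scaling under stated assumptions, not ImAgent's algorithm. It assumes three hypothetical callables (`generate`, `evaluate`, `refine`) that a unified model would serve from one backbone; here an "image" is just the set of prompt words that were rendered, and the thresholds `low` and `threshold` are illustrative values. The controller spends more compute (best-of-N resampling) when self-evaluation is poor, applies a local fix (refinement) when the result is close, and stops once the score clears the threshold.

```python
from typing import Callable, FrozenSet, List, Tuple

# Toy stand-in for an image: the set of prompt concepts actually rendered.
# All names below are hypothetical; they are not ImAgent's API.
Image = FrozenSet[str]
Generate = Callable[[str, int], Image]
Evaluate = Callable[[str, Image], float]
Refine = Callable[[str, Image], Image]


def best_of_n(prompt: str, generate: Generate, evaluate: Evaluate,
              n: int) -> Tuple[float, Image]:
    """Sample n candidates with seeds 0..n-1 and keep the highest-scoring one."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    best = max(candidates, key=lambda img: evaluate(prompt, img))
    return evaluate(prompt, best), best


def policy_controller(prompt: str, generate: Generate, evaluate: Evaluate,
                      refine: Refine, low: float = 0.6, threshold: float = 0.9,
                      n: int = 4, max_steps: int = 3) -> Tuple[Image, float, List[str]]:
    """Hypothetical controller: pick the next action from the self-evaluation score."""
    image = generate(prompt, 0)
    score = evaluate(prompt, image)
    trace = ["generate"]
    for _ in range(max_steps):
        if score >= threshold:       # good enough: stop spending compute
            break
        if score < low:              # far off: resample broadly
            new_score, new_image = best_of_n(prompt, generate, evaluate, n)
            trace.append("best_of_n")
        else:                        # close: apply a targeted fix
            new_image = refine(prompt, image)
            new_score = evaluate(prompt, new_image)
            trace.append("refine")
        if new_score > score:        # accept only improvements
            score, image = new_score, new_image
    return image, score, trace


def toy_generate(prompt: str, seed: int) -> Image:
    """Renders every prompt word except the one at index seed % (len + 1)."""
    words = prompt.split()
    drop = seed % (len(words) + 1)   # drop == len(words) means nothing is dropped
    return frozenset(w for i, w in enumerate(words) if i != drop)


def toy_evaluate(prompt: str, image: Image) -> float:
    """Fraction of prompt words present in the image (toy self-evaluation)."""
    wanted = set(prompt.split())
    return len(wanted & image) / len(wanted)


def toy_refine(prompt: str, image: Image) -> Image:
    """Adds the first missing prompt word, mimicking a local edit."""
    missing = [w for w in prompt.split() if w not in image]
    return image | {missing[0]} if missing else image


if __name__ == "__main__":
    _, score, trace = policy_controller("a red cube", toy_generate,
                                        toy_evaluate, toy_refine)
    print(score, trace)
```

Running the file prints `1.0 ['generate', 'refine']`: the first sample misses one of three concepts, so the controller takes a single refinement step instead of resampling. Scoring against the original prompt and accepting only improvements are design choices of this sketch; they keep the loop bounded, but a deterministic best-of-N step that fails to improve is simply retried until `max_steps` runs out.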

Comments: 8 tables, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2511.11483 [cs.CV] (or arXiv:2511.11483v4 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2511.11483 (arXiv-issued DOI via DataCite)

Submission history

From: Kaishen Wang
[v1] Fri, 14 Nov 2025 17:00:29 UTC (2,888 KB)
[v2] Mon, 24 Nov 2025 02:28:18 UTC (2,892 KB)
[v3] Sat, 31 Jan 2026 23:39:06 UTC (2,829 KB)
[v4] Sat, 28 Mar 2026 16:18:55 UTC (2,378 KB)
