
Compositional Image Synthesis with Inference-Time Scaling

arXiv · March 30, 2026

Authors: Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn


Abstract: Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To address this challenge, we present a training-free framework that combines an object-centric approach with self-refinement to improve layout faithfulness while preserving aesthetic quality. Specifically, we leverage large language models (LLMs) to synthesize explicit layouts from input prompts and inject these layouts into the image generation process, where an object-centric vision-language model (VLM) judge iteratively reranks multiple candidates to select the most prompt-aligned outcome. By unifying explicit layout grounding with self-refine-based inference-time scaling, our framework achieves stronger scene alignment with prompts than recent text-to-image models. The code is available at this https URL.
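The abstract describes the selection loop only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' code. `Box`, `generate`, `judge`, `rerank_and_refine`, `num_candidates`, `num_rounds`, and `target` are all hypothetical names: the generator stands in for a layout-conditioned image model, the judge for the object-centric VLM, and feeding the running best image back into the next round is one assumed reading of "self-refinement".

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """One object in an LLM-synthesized layout (hypothetical schema, normalized coords)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


Layout = List[Box]
# Hypothetical stand-in for a layout-conditioned generator:
# generate(prompt, layout, seed, init) -> image, where `init` is the current best image or None.
Generator = Callable[[str, Layout, int, Optional[Any]], Any]
# Hypothetical stand-in for the object-centric VLM judge: higher score = better prompt alignment.
Judge = Callable[[str, Layout, Any], float]


def rerank_and_refine(
    prompt: str,
    layout: Layout,
    generate: Generator,
    judge: Judge,
    num_candidates: int = 4,
    num_rounds: int = 3,
    target: float = 1.0,
) -> Tuple[Optional[Any], float]:
    """Best-of-N reranking repeated over several rounds (assumed self-refine scheme)."""
    best_image: Optional[Any] = None
    best_score = float("-inf")
    seed = 0
    for _ in range(num_rounds):
        # Sample N layout-conditioned candidates; each round is seeded from the running best.
        candidates = []
        for _ in range(num_candidates):
            candidates.append(generate(prompt, layout, seed, best_image))
            seed += 1
        # Rerank with the judge and keep the top candidate if it beats the running best.
        round_score, round_image = max(
            ((judge(prompt, layout, img), img) for img in candidates),
            key=lambda pair: pair[0],
        )
        if round_score > best_score:
            best_score, best_image = round_score, round_image
        if best_score >= target:
            break  # Stop early once the judge is satisfied.
    return best_image, best_score


# Toy usage: "images" are just seed integers, and the toy judge prefers seed 6.
def toy_generate(prompt: str, layout: Layout, seed: int, init: Optional[Any]) -> int:
    return seed


def toy_judge(prompt: str, layout: Layout, image: int) -> float:
    return 1.0 - abs(image - 6) / 10


layout = [Box("cat", 0.1, 0.5, 0.4, 0.9), Box("dog", 0.6, 0.5, 0.9, 0.9)]
image, score = rerank_and_refine("a cat to the left of a dog", layout, toy_generate, toy_judge)
print(image, score)  # 6 1.0
```

With the toy stubs, the first round's best candidate (seed 3) scores 0.7, so a second round runs and stops early at seed 6 with a score of 1.0. Keeping generation and judging as separate callables makes the sampling budget (`num_candidates` × `num_rounds`) the single inference-time-scaling knob in this sketch.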

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2510.24133 [cs.CV] (or arXiv:2510.24133v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.24133 (arXiv-issued DOI via DataCite)

Submission history

From: Minsuk Ji
[v1] Tue, 28 Oct 2025 07:16:21 UTC (781 KB)
[v2] Fri, 27 Mar 2026 08:35:54 UTC (786 KB)
