ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction
arXiv:2512.05422v2 Announce Type: replace
Authors: Jiangtong Tan, Lin Liu, Jie Huang, Xiaopeng Zhang, Qi Tian, Feng Zhao
Abstract: Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. However, existing methods struggle to balance sufficient interaction with flexible implementation because of the large differences in representation. Considering the abundant, hierarchical information in a VLM's layers, which ranges from low-level details to high-level semantics, we propose ParaUni. It extracts features from various VLM layers in a parallel (Para) way for comprehensive information interaction and retains a flexible separation architecture to enhance generation in a unified (Uni) multimodal model. Concretely, visual features from all of the VLM's layers are fed in parallel into a Layer Integration Module (LIM), which efficiently integrates fine-grained details and semantic abstractions and provides the fused representation as a condition to the diffusion model. To further enhance performance, we reveal that these hierarchical layers respond unequally to different rewards in Reinforcement Learning (RL). Accordingly, we design a Layer-wise Dynamic Adjustment Mechanism (LDAM) that facilitates improvements across multiple rewards by aligning with the hierarchical properties of these layers during RL. Extensive experiments show that ParaUni leverages complementary multi-layer features to substantially improve generation quality and shows strong potential for advancing multiple rewards during RL stages. Code is available at this https URL.
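The abstract does not say how the LIM combines per-layer features, so the following is only a minimal sketch of the general idea of fusing features from several layers in parallel into one conditioning vector. It assumes a softmax-weighted sum over layers, which may differ from the authors' module; the names `fuse_layers`, `layer_feats`, and `layer_logits` are hypothetical and do not come from the paper or its code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(
    layer_feats: Sequence[Sequence[float]],
    layer_logits: Sequence[float],
) -> List[float]:
    """Fuse one feature vector per layer into a single vector.

    ASSUMPTION: a softmax-weighted sum stands in for the paper's Layer
    Integration Module, whose actual design is not given in the abstract.
    """
    if len(layer_feats) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_feats[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, layer_feats):
        if len(feat) != dim:
            raise ValueError("all layers must share the same feature size")
        for i, x in enumerate(feat):
            fused[i] += w * x
    return fused


# Toy input: three layers, each with a 2-dimensional feature vector.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Equal logits give every layer the same weight (1/3 each).
condition = fuse_layers(feats, [0.0, 0.0, 0.0])
# Raising one logit shifts the fused vector toward that layer's features.
shallow_heavy = fuse_layers(feats, [5.0, 0.0, 0.0])
```

In this toy setup the per-layer logits are the only place where layers are treated differently, which loosely mirrors the abstract's observation that layers respond unequally to different rewards; how the LDAM actually adjusts layers during RL is not described in the abstract and is not modeled here.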
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2512.05422 [cs.CV]
(or arXiv:2512.05422v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2512.05422
arXiv-issued DOI via DataCite
Submission history
From: Jiangtong Tan
[v1] Fri, 5 Dec 2025 04:41:57 UTC (1,300 KB)
[v2] Sat, 28 Mar 2026 13:02:42 UTC (1,278 KB)