Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

arXiv [Submitted on 26 Mar 2026]

Authors: Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang

Abstract: Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling long-range cross-modal context. To address this issue, we decompose interleaved generation into textual planning and visual consistency modeling, and introduce a framework consisting of a planner and a visualizer. The planner produces dense textual descriptions for visual content, while the visualizer synthesizes images accordingly. Under this guidance, we construct large-scale textual-proxy interleaved data (where visual content is represented in text) to train the planner, and curate reference-guided image data to train the visualizer. These designs give rise to Wan-Weaver, which exhibits emergent interleaved generation ability with long-range textual coherence and visual consistency. Meanwhile, the integration of diverse understanding and generation data into planner training enables Wan-Weaver to achieve robust task reasoning and generation proficiency. To assess the model's capability in interleaved generation, we further construct a benchmark that spans a wide range of use cases across multiple dimensions. Extensive experiments demonstrate that, even without access to any real interleaved data, Wan-Weaver achieves superior performance over existing methods.
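
The planner/visualizer split described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: every name below (`TextSegment`, `ImagePlan`, `Image`, `toy_planner`, `toy_visualizer`, `generate_interleaved`) is hypothetical, the "models" are plain stub functions, and passing all earlier images to the visualizer as references is an assumption made here to illustrate reference-guided consistency.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TextSegment:
    text: str  # ordinary text that appears in the final output


@dataclass
class ImagePlan:
    description: str  # dense textual description standing in for an image


@dataclass
class Image:
    description: str        # the description this image was rendered from
    references: List[str]   # descriptions of earlier images used as references


Segment = Union[TextSegment, ImagePlan]


def toy_planner(prompt: str) -> List[Segment]:
    """Hypothetical planner stub: emits text interleaved with image plans."""
    return [
        TextSegment(f"Step 1 for: {prompt}"),
        ImagePlan(f"Illustration of step 1 of {prompt}"),
        TextSegment(f"Step 2 for: {prompt}"),
        ImagePlan(f"Illustration of step 2 of {prompt}"),
    ]


def toy_visualizer(description: str, references: List[Image]) -> Image:
    """Hypothetical visualizer stub: records which references it was given."""
    return Image(description, [ref.description for ref in references])


def generate_interleaved(
    prompt: str,
    planner: Callable[[str], List[Segment]],
    visualizer: Callable[[str, List[Image]], Image],
) -> List[Union[TextSegment, Image]]:
    """Run the planner, then render each image plan with the visualizer."""
    output: List[Union[TextSegment, Image]] = []
    previous_images: List[Image] = []
    for segment in planner(prompt):
        if isinstance(segment, ImagePlan):
            # Assumption: earlier images are supplied as references.
            image = visualizer(segment.description, list(previous_images))
            previous_images.append(image)
            output.append(image)
        else:
            output.append(segment)
    return output


result = generate_interleaved("bake bread", toy_planner, toy_visualizer)
```

In this sketch the planner never touches pixels and the visualizer never sees the prompt, which mirrors the decoupling the abstract describes: the two parts communicate only through text descriptions and reference images.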

Comments: CVPR 2026 Camera-ready, Webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.25706 [cs.CV]

(or arXiv:2603.25706v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25706

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jinbo Xing [v1] Thu, 26 Mar 2026 17:50:37 UTC (21,809 KB)
