
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization

arXiv · March 31, 2026

arXiv:2603.28082v1 Announce Type: new

Authors: Chutian Meng, Fan Ma, Chi Zhang, Jiaxu Miao, Yi Yang, Yueting Zhuang

Abstract: Generating coherent and communicative visual sequences, such as image sequences and videos, remains a significant challenge for current multimodal systems. Despite advances in visual quality and the integration of world knowledge, existing models still struggle to maintain logical flow, often resulting in disjointed actions, fragmented narratives, and unclear storylines. We attribute these issues to the lack of attention to visual logic, a critical yet underexplored dimension of visual sequence generation that we define as the perceptual and causal coherence among characters, actions, and scenes over time. To bridge this gap, we propose a logic-aware multi-image story visualization framework, LogiStory. The framework is built around the central innovation of explicitly modeling visual logic in story visualization. To realize this idea, we design a multi-agent system that grounds roles, extracts causal chains, and verifies story-level consistency, transforming narrative coherence from an implicit byproduct of image generation into an explicit modeling objective. This design effectively bridges structured story planning with visual generation, enhancing both narrative clarity and visual quality in story visualization. Furthermore, to evaluate generation capacity, we construct LogicTale, a benchmark comprising richly annotated stories that emphasizes causal reasoning and visual logic interpretability. We establish comprehensive automatic and human evaluation protocols designed to measure both visual logic and perceptual quality. Experiments demonstrate that our approach significantly improves the narrative logic of generated visual stories. This work provides a foundational step towards modeling and enforcing visual logic in general image sequence and video generation tasks.
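The abstract names three planning stages (role grounding, causal chain extraction, story-level consistency verification) without giving implementation details. The sketch below is only an illustration of what a verification step over such a plan might check. It is not the authors' method: the `Event` and `StoryPlan` schemas, the `verify_plan` function, and the two rules it applies (every actor must be a grounded role, every cause must precede its effect) are all assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One story beat, intended to map to one image (hypothetical schema)."""
    index: int                 # position of the beat in the story
    actors: tuple[str, ...]    # names of the roles appearing in this beat
    action: str                # short description of what happens


@dataclass
class StoryPlan:
    """Hypothetical intermediate plan shared between planning stages."""
    roles: set[str] = field(default_factory=set)
    events: list[Event] = field(default_factory=list)
    # (cause_index, effect_index) pairs forming the causal chain
    causal_links: list[tuple[int, int]] = field(default_factory=list)


def verify_plan(plan: StoryPlan) -> list[str]:
    """Return human-readable consistency problems (empty list if none)."""
    problems: list[str] = []
    known = {e.index for e in plan.events}

    # Assumed rule 1: every actor in an event must be a grounded role.
    for e in plan.events:
        for actor in e.actors:
            if actor not in plan.roles:
                problems.append(f"event {e.index}: ungrounded role '{actor}'")

    # Assumed rule 2: a causal link must join known events, cause first.
    for cause, effect in plan.causal_links:
        if cause not in known or effect not in known:
            problems.append(f"link {cause}->{effect}: unknown event")
        elif cause >= effect:
            problems.append(
                f"link {cause}->{effect}: cause does not precede effect"
            )
    return problems


# Toy plan with two deliberate flaws: an ungrounded role and a backward link.
plan = StoryPlan(
    roles={"fox", "crow"},
    events=[
        Event(0, ("crow",), "finds cheese"),
        Event(1, ("fox",), "flatters crow"),
        Event(2, ("crow", "wolf"), "drops cheese"),
    ],
    causal_links=[(0, 1), (1, 2), (2, 1)],
)
problems = verify_plan(plan)
for p in problems:
    print(p)
```

In a system like the one the abstract describes, a check of this kind would presumably run before image generation, so that a plan with unresolved problems could be sent back for revision; that control flow is likewise an assumption, not something the abstract states.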

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)

Cite as: arXiv:2603.28082 [cs.CV]

(or arXiv:2603.28082v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.28082

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chutian Meng
[v1] Mon, 30 Mar 2026 06:37:12 UTC (15,243 KB)
