
Missing No More: Dictionary-Guided Cross-Modal Image Fusion under Missing Infrared

[Submitted on 9 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2)]

Authors: Yafei Zhang, Meng Ma, Huafeng Li, Yu Liu


Abstract: Infrared-visible (IR-VIS) image fusion is vital for perception and security, yet most methods rely on the availability of both modalities during training and inference. When the infrared modality is absent, pixel-space generative substitutes become hard to control and inherently lack interpretability. We address missing-IR fusion by proposing a dictionary-guided, coefficient-domain framework built upon a shared convolutional dictionary. The pipeline comprises three key components: (1) Joint Shared-dictionary Representation Learning (JSRL) learns a unified and interpretable atom space shared by both IR and VIS modalities; (2) VIS-Guided IR Inference (VGII) transfers VIS coefficients to pseudo-IR coefficients in the coefficient domain and performs a one-step closed-loop refinement guided by a frozen large language model as a weak semantic prior; and (3) Adaptive Fusion via Representation Inference (AFRI) merges VIS structures and inferred IR cues at the atom level through window attention and convolutional mixing, followed by reconstruction with the shared dictionary. This encode-transfer-fuse-reconstruct pipeline avoids uncontrolled pixel-space generation while ensuring prior preservation within an interpretable dictionary-coefficient representation. Experiments under missing-IR settings demonstrate consistent improvements in perceptual quality and downstream detection performance. To our knowledge, this represents the first framework that jointly learns a shared dictionary and performs coefficient-domain inference-fusion to tackle missing-IR fusion. The source code is publicly available at this https URL.
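To make the encode-transfer-fuse-reconstruct flow concrete, here is a minimal NumPy sketch of the data flow only. It is not the authors' implementation, and every function name in it is hypothetical. It assumes odd-sized square atoms, and it swaps each learned component for a simple classical stand-in: ISTA sparse coding in place of the learned JSRL encoder, a per-pixel linear atom map in place of VGII (the LLM-guided refinement is omitted entirely), and a choose-max rule in place of AFRI's window attention and convolutional mixing.

```python
import numpy as np

def reconstruct(coeffs, atoms):
    """Synthesis: image = sum_k atoms[k] * coeffs[k] ('same' 2-D convolution)."""
    _, size, _ = atoms.shape              # assumes odd, square atoms
    _, height, width = coeffs.shape
    pad = size // 2
    padded = np.pad(coeffs, ((0, 0), (pad, pad), (pad, pad)))
    image = np.zeros((height, width))
    for i in range(size):
        for j in range(size):
            # Convolution flips the kernel relative to the sliding window.
            taps = atoms[:, size - 1 - i, size - 1 - j]
            window = padded[:, i:i + height, j:j + width]
            image += np.einsum("k,khw->hw", taps, window)
    return image

def analyze(image, atoms):
    """Adjoint of reconstruct: correlate the image with every atom."""
    n_atoms, size, _ = atoms.shape
    height, width = image.shape
    padded = np.pad(image, size // 2)
    coeffs = np.zeros((n_atoms, height, width))
    for i in range(size):
        for j in range(size):
            coeffs += atoms[:, i, j][:, None, None] * padded[i:i + height, j:j + width]
    return coeffs

def encode(image, atoms, lam=0.01, n_iter=50):
    """Sparse coefficients via ISTA on 0.5*||x - Dz||^2 + lam*||z||_1."""
    # Conservative step: 1 / (upper bound on the squared operator norm of D).
    step = 1.0 / np.sum(np.abs(atoms).sum(axis=(1, 2)) ** 2)
    coeffs = np.zeros((atoms.shape[0],) + image.shape)
    for _ in range(n_iter):
        residual = image - reconstruct(coeffs, atoms)
        coeffs = coeffs + step * analyze(residual, atoms)
        # Soft-thresholding promotes sparse coefficient maps.
        coeffs = np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam * step, 0.0)
    return coeffs

def transfer(vis_coeffs, atom_map):
    """Hypothetical VIS -> pseudo-IR step: a K x K linear map applied per pixel."""
    return np.einsum("ij,jhw->ihw", atom_map, vis_coeffs)

def fuse(vis_coeffs, ir_coeffs):
    """Choose-max rule per atom and pixel (stand-in for learned attention)."""
    return np.where(np.abs(vis_coeffs) >= np.abs(ir_coeffs), vis_coeffs, ir_coeffs)

def fuse_missing_ir(vis_image, atoms, atom_map):
    """Encode VIS, infer pseudo-IR coefficients, fuse, and reconstruct."""
    vis_coeffs = encode(vis_image, atoms)
    pseudo_ir = transfer(vis_coeffs, atom_map)
    return reconstruct(fuse(vis_coeffs, pseudo_ir), atoms)
```

Calling `fuse_missing_ir(vis_image, atoms, atom_map)` with a 2-D grayscale array, a `(K, k, k)` atom array, and a `(K, K)` map returns an image of the same size as the input. The point of the sketch is that the only pixel-space operation is the final `reconstruct` call; everything else acts on coefficient maps tied to the shared atoms, which is the property the abstract emphasizes.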

Comments: This paper has been accepted by CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.08018 [cs.CV]

(or arXiv:2603.08018v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.08018

arXiv-issued DOI via DataCite

Submission history

From: Meng Ma
[v1] Mon, 9 Mar 2026 06:48:46 UTC (41,256 KB)
[v2] Wed, 1 Apr 2026 03:06:33 UTC (41,255 KB)

Original source: arXiv, https://arxiv.org/abs/2603.08018
