
MM-OVSeg: Multimodal Optical-SAR Fusion for Open-Vocabulary Segmentation in Remote Sensing

arXiv [Submitted on 18 Mar 2026 (v1), last revised 27 Mar 2026 (this version, v2)]

Authors: Yimin Wei, Aoran Xiao, Hongruixuan Chen, Junshi Xia, Naoto Yokoya


Abstract: Open-vocabulary segmentation enables pixel-level recognition from an open set of textual categories, allowing generalization beyond fixed classes. Despite great potential in remote sensing, progress in this area remains largely limited to clear-sky optical data and struggles under cloudy or haze-contaminated conditions. We present MM-OVSeg, a multimodal Optical-SAR fusion framework for resilient open-vocabulary segmentation under adverse weather conditions. MM-OVSeg leverages the complementary strengths of the two modalities: optical imagery provides rich spectral semantics, while synthetic aperture radar (SAR) offers cloud-penetrating structural cues. To address the cross-modal domain gap and the limited dense prediction capability of current vision-language models, we propose two key designs: a cross-modal unification process for multi-sensor representation alignment, and a dual-encoder fusion module that integrates hierarchical features from multiple vision foundation models for text-aligned multimodal segmentation. Extensive experiments demonstrate that MM-OVSeg achieves superior robustness and generalization across diverse cloud conditions. The source dataset and code are available at this https URL.
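To make the general idea concrete, here is a minimal sketch of open-vocabulary labeling for a single pixel: a fused optical/SAR feature is compared against text embeddings by cosine similarity, and the best-matching category name is returned. This is not the MM-OVSeg architecture. The `fuse` rule (a linear blend driven by a `cloud_prob` weight), the three-dimensional toy embeddings, and all function names are assumptions introduced purely for illustration; the abstract does not specify how fusion or text alignment is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse(optical, sar, cloud_prob):
    """Hypothetical fusion rule: shift weight from optical to SAR as cloud cover rises."""
    return [(1.0 - cloud_prob) * o + cloud_prob * s for o, s in zip(optical, sar)]


def label_pixel(optical, sar, cloud_prob, text_embeddings):
    """Return the category whose text embedding is closest to the fused pixel feature."""
    fused = fuse(optical, sar, cloud_prob)
    return max(text_embeddings, key=lambda name: cosine(fused, text_embeddings[name]))


# Toy text embeddings (assumed); a real system would obtain these from a text encoder.
text_embeddings = {
    "water": [1.0, 0.0, 0.0],
    "building": [0.0, 1.0, 0.0],
    "forest": [0.0, 0.0, 1.0],
}

# Clear sky: the optical feature dominates the fused representation.
clear = label_pixel([0.9, 0.1, 0.0], [0.2, 0.7, 0.1], 0.0, text_embeddings)

# Heavy cloud: the optical feature is uninformative, so the SAR feature drives the label.
cloudy = label_pixel([0.3, 0.3, 0.3], [0.1, 0.9, 0.0], 0.9, text_embeddings)

print(clear, cloudy)
```

Because the category set is just a dictionary of text embeddings, new class names can be added at inference time without retraining, which is the property "open-vocabulary" refers to.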

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.17528 [cs.CV]

(or arXiv:2603.17528v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.17528

arXiv-issued DOI via DataCite

Submission history

From: YiMin Wei
[v1] Wed, 18 Mar 2026 09:34:23 UTC (2,631 KB)
[v2] Fri, 27 Mar 2026 14:52:22 UTC (3,129 KB)
