Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessHere's what 'cracking' bitcoin in 9 minutes by quantum computers actually meansCoinDesk AIAnthropic is having a moment in the private markets; SpaceX could spoil the partyTechCrunchChinese AI lab DeepSeek to run v4 on Huawei chips - Tech in AsiaGNews AI HuaweiAmazon is selling a Samsung Galaxy tablet with AI-capabilities for just $270 - aol.comGNews AI SamsungThe Tool That Built the Modern World Is Still the Most Powerful Thing in an Engineer’s ArsenalMedium AI[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIAReddit r/MachineLearningI Tested AI Coding Assistants on the Same Full-Stack App — Here’s the Real WinnerMedium AIIs the Arrow of Time a Crucial Missing Component in Artificial Intelligence?Medium AIv0.20.1: Revert "enable flash attention for gemma4 (#15296)" (#15311)Ollama ReleasesAutomation vs AI: Not Just Similar — They Solve Fundamentally Different ProblemsMedium AIWalmart's AI Checkout Converted 3x Worse. The Interface Is Why.DEV Community✨ Why Humanity Still Moves Toward AI.Medium AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessHere's what 'cracking' bitcoin in 9 minutes by quantum computers actually meansCoinDesk AIAnthropic is having a moment in the private markets; SpaceX could spoil the partyTechCrunchChinese AI lab DeepSeek to run v4 on Huawei chips - Tech in AsiaGNews AI HuaweiAmazon is selling a Samsung Galaxy tablet with AI-capabilities for just $270 - aol.comGNews AI SamsungThe Tool That Built the Modern World Is Still the Most Powerful Thing in an Engineer’s ArsenalMedium AI[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIAReddit r/MachineLearningI Tested AI Coding Assistants on the Same Full-Stack App — Here’s the Real WinnerMedium AIIs the Arrow of Time a Crucial Missing Component in Artificial Intelligence?Medium AIv0.20.1: Revert "enable flash attention for gemma4 (#15296)" (#15311)Ollama ReleasesAutomation vs AI: Not Just Similar — They Solve Fundamentally Different ProblemsMedium AIWalmart's AI Checkout Converted 3x Worse. The Interface Is Why.DEV Community✨ Why Humanity Still Moves Toward AI.Medium AI
AI NEWS HUBbyEIGENVECTOREigenvector

ConInfer: Context-Aware Inference for Training-Free Open-Vocabulary Remote Sensing Segmentation

arXiv cs.CVby Wenyang Chen, Zhanxuan Hu, Yaping Zhang, Hailong Ning, Yonghang TaiApril 1, 20261 min read0 views
Source Quiz

arXiv:2603.29271v1 Announce Type: new Abstract: Training-free open-vocabulary remote sensing segmentation (OVRSS), empowered by vision-language models, has emerged as a promising paradigm for achieving category-agnostic semantic understanding in remote sensing imagery. Existing approaches mainly focus on enhancing feature representations or mitigating modality discrepancies to improve patch-level prediction accuracy. However, such independent prediction schemes are fundamentally misaligned with the intrinsic characteristics of remote sensing data. In real-world applications, remote sensing scenes are typically large-scale and exhibit strong spatial as well as semantic correlations, making isolated patch-wise predictions insufficient for accurate segmentation. To address this limitation, we

View PDF HTML (experimental)

Abstract:Training-free open-vocabulary remote sensing segmentation (OVRSS), empowered by vision-language models, has emerged as a promising paradigm for achieving category-agnostic semantic understanding in remote sensing imagery. Existing approaches mainly focus on enhancing feature representations or mitigating modality discrepancies to improve patch-level prediction accuracy. However, such independent prediction schemes are fundamentally misaligned with the intrinsic characteristics of remote sensing data. In real-world applications, remote sensing scenes are typically large-scale and exhibit strong spatial as well as semantic correlations, making isolated patch-wise predictions insufficient for accurate segmentation. To address this limitation, we propose ConInfer, a context-aware inference framework for OVRSS that performs joint prediction across multiple spatial units while explicitly modeling their inter-unit semantic dependencies. By incorporating global contextual cues, our method significantly enhances segmentation consistency, robustness, and generalization in complex remote sensing environments. Extensive experiments on multiple benchmark datasets demonstrate that our approach consistently surpasses state-of-the-art per-pixel VLM-based baselines such as SegEarth-OV, achieving average improvements of 2.80% and 6.13% on open-vocabulary semantic segmentation and object extraction tasks, respectively. The implementation code is available at: this https URL

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.29271 [cs.CV]

(or arXiv:2603.29271v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29271

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Wenyang Chen [view email] [v1] Tue, 31 Mar 2026 05:12:02 UTC (22,408 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
ConInfer: C…modellanguage mo…benchmarktrainingannounceavailablearXiv cs.CV

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 147 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!