
FUSAR-GPT: A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery

arXiv · Submitted on 22 Feb 2026 (v1), last revised 30 Mar 2026 (this version, v3) · March 31, 2026

arXiv:2602.19190v3 Announce Type: replace-cross

Authors: Xiaokun Zhang, Yi Yang, Ziqi Ye, Baiyun, Xiaorong Guo, Qingchen Fang, Ruyi Zhang, Xinpeng Zhou, Haipeng Wang

Abstract: Research on the intelligent interpretation of all-weather, all-time Synthetic Aperture Radar (SAR) imagery is crucial for advancing remote sensing applications. Although Visual Language Models (VLMs) have recently demonstrated strong open-world understanding on RGB images, their performance is severely limited when they are applied directly to SAR because of the complexity of the imaging mechanism, the sensitivity to scattering features, and the scarcity of high-quality text corpora. To address this issue systematically, we constructed the first SAR Image-Text-AlphaEarth feature triplet dataset and developed FUSAR-GPT, a VLM designed specifically for SAR. FUSAR-GPT introduces a geospatial baseline model as a 'world knowledge' prior and embeds multi-source remote-sensing temporal features into the model's visual backbone via 'spatiotemporal anchors', enabling dynamic compensation for the sparse representation of targets in SAR images. Furthermore, we designed a two-stage supervised fine-tuning (SFT) strategy to decouple knowledge injection from task execution in large models. Together, the spatiotemporal feature embedding and the two-stage decoupling paradigm enable FUSAR-GPT to achieve state-of-the-art performance across several typical remote sensing vision-language benchmarks, outperforming mainstream baseline models by over 10%.
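The abstract does not say how the geospatial features are merged into the visual backbone, so the following is only an illustrative sketch of one generic option: a gated residual fusion in which a per-patch auxiliary feature vector is linearly projected to the token width and added to the visual token. The function names (`project`, `fuse_tokens`), the `gate` parameter, and the toy dimensions are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(features: Vector, weights: Matrix) -> Vector:
    """Apply a linear map; `weights` has one row per output dimension."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def fuse_tokens(
    visual_tokens: Matrix,
    geo_features: Matrix,
    weights: Matrix,
    gate: float,
) -> Matrix:
    """Hypothetical gated residual fusion (an assumption, not the paper's method).

    Each visual token receives `gate` times the projected auxiliary
    feature of the same patch. With gate = 0 the tokens pass through unchanged.
    """
    if len(visual_tokens) != len(geo_features):
        raise ValueError("need exactly one auxiliary feature vector per token")
    fused = []
    for token, geo in zip(visual_tokens, geo_features):
        prior = project(geo, weights)
        if len(prior) != len(token):
            raise ValueError("projection must match the token width")
        fused.append([t + gate * p for t, p in zip(token, prior)])
    return fused


# Toy example: two patches, token width 2, auxiliary feature width 3.
visual = [[1.0, 0.0], [0.0, 1.0]]
geo = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 x 3 projection matrix
print(fuse_tokens(visual, geo, w, gate=0.5))
```

In a trained model the projection and the gate would be learned parameters; fixed values are used here only to keep the sketch self-contained.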

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.19190 [cs.CV]

(or arXiv:2602.19190v3 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2602.19190

arXiv-issued DOI via DataCite

Submission history

From: Xiaokun Zhang
[v1] Sun, 22 Feb 2026 13:40:17 UTC (21,833 KB)
[v2] Thu, 26 Feb 2026 09:45:03 UTC (21,833 KB)
[v3] Mon, 30 Mar 2026 02:38:55 UTC (27,924 KB)
