FUSAR-GPT : A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery
Authors: Xiaokun Zhang, Yi Yang, Ziqi Ye, Baiyun, Xiaorong Guo, Qingchen Fang, Ruyi Zhang, Xinpeng Zhou, Haipeng Wang
Abstract: Research on the intelligent interpretation of all-weather, all-time Synthetic Aperture Radar (SAR) imagery is crucial for advancing remote sensing applications. Although Visual Language Models (VLMs) have recently demonstrated strong open-world understanding on RGB images, their performance is severely limited when they are applied directly to SAR because of the complexity of the imaging mechanism, the sensitivity to scattering features, and the scarcity of high-quality text corpora. To address this systematically, we constructed the first SAR Image-Text-AlphaEarth feature triplet dataset and developed FUSAR-GPT, a VLM designed specifically for SAR. FUSAR-GPT introduces a geospatial baseline model as a 'world knowledge' prior and embeds multi-source remote-sensing temporal features into the model's visual backbone via 'spatiotemporal anchors', enabling dynamic compensation for the sparse representation of targets in SAR images. Furthermore, we designed a two-stage supervised fine-tuning (SFT) strategy to decouple knowledge injection from task execution in large models. Together, the spatiotemporal feature embedding and the two-stage decoupling paradigm enable FUSAR-GPT to achieve state-of-the-art performance on several typical remote sensing vision-language benchmarks, outperforming mainstream baseline models by over 10%.
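The abstract does not specify how the prior features enter the visual backbone. As a rough illustration of the general idea only, the sketch below uses gated additive fusion, a generic technique swapped in here and not the authors' method: each auxiliary geospatial feature is linearly projected to the visual token width and added to the visual token at the same grid position. The names `fuse_tokens`, `w_proj`, and `gate`, and all dimensions, are hypothetical.

```python
import random


def linear(x, w):
    """Project vector x (length d_in) with weight matrix w (d_in rows, d_out columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def fuse_tokens(visual_tokens, prior_tokens, w_proj, gate):
    """Gated additive fusion (illustrative assumption, not the paper's design).

    Each prior feature is projected to the visual token width, scaled by
    `gate`, and added to the visual token at the same grid position.
    """
    if len(visual_tokens) != len(prior_tokens):
        raise ValueError("expected one prior feature per visual token")
    fused = []
    for v, p in zip(visual_tokens, prior_tokens):
        p_proj = linear(p, w_proj)
        fused.append([vi + gate * pi for vi, pi in zip(v, p_proj)])
    return fused


# Toy sizes chosen arbitrarily for the example.
rng = random.Random(0)
N_TOKENS, D_VIS, D_PRIOR = 4, 8, 3
visual = [[rng.gauss(0, 1) for _ in range(D_VIS)] for _ in range(N_TOKENS)]
prior = [[rng.gauss(0, 1) for _ in range(D_PRIOR)] for _ in range(N_TOKENS)]
w_proj = [[rng.gauss(0, 0.1) for _ in range(D_VIS)] for _ in range(D_PRIOR)]

unchanged = fuse_tokens(visual, prior, w_proj, gate=0.0)  # prior contributes nothing
fused = fuse_tokens(visual, prior, w_proj, gate=1.0)      # prior fully mixed in
```

With `gate=0.0` the visual tokens pass through unchanged, so a scheme of this kind could start from the pretrained backbone's behavior and let the prior's contribution grow during fine-tuning. Whether FUSAR-GPT does anything similar is not stated in the abstract.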
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.19190 [cs.CV]
(or arXiv:2602.19190v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2602.19190
arXiv-issued DOI via DataCite
Submission history
From: Xiaokun Zhang
[v1] Sun, 22 Feb 2026 13:40:17 UTC (21,833 KB)
[v2] Thu, 26 Feb 2026 09:45:03 UTC (21,833 KB)
[v3] Mon, 30 Mar 2026 02:38:55 UTC (27,924 KB)