
Progressive Prompt-Guided Cross-Modal Reasoning for Referring Image Segmentation

arXiv, March 31, 2026

arXiv:2603.27993v1 (announce type: new)

Authors: Jiachen Li, Hongyun Wang, Jinyu Xu, Wenbo Jiang, Yanchun Ma, Yongjian Liu, Qing Xie, Bolong Zheng


Abstract: Referring image segmentation aims to localize and segment a target object in an image based on a free-form referring expression. The core challenge lies in bridging linguistic descriptions with object-level visual representations, especially when referring expressions involve detailed attributes and complex inter-object relationships. Existing methods either rely on cross-modal alignment or employ Semantic Segmentation Prompts, but they often lack explicit reasoning mechanisms for grounding language descriptions to target regions in the image. To address these limitations, we propose PPCR, a Progressive Prompt-guided Cross-modal Reasoning framework for referring image segmentation. PPCR explicitly structures the reasoning process as a Semantic Understanding, Spatial Grounding, Instance Segmentation pipeline. Specifically, PPCR first employs multimodal large language models (MLLMs) to generate Semantic Segmentation Prompts that capture key semantic cues of the target object. Based on this semantic context, Spatial Segmentation Prompts are then generated to reason about object location and spatial extent, enabling a progressive transition from semantic understanding to spatial grounding. The Semantic and Spatial Segmentation Prompts are then jointly integrated into the segmentation module to guide accurate target localization and segmentation. Extensive experiments on standard referring image segmentation benchmarks demonstrate that PPCR consistently outperforms existing methods. The code will be publicly released to facilitate reproducibility.
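
The three-stage flow described in the abstract (semantic understanding, then spatial grounding, then segmentation) can be pictured with a small toy sketch. This is not the authors' implementation, which has not been released at the time of this listing. Every name here (run_pipeline, toy_describe, toy_locate, toy_segment, the box format, the keyword rule) is a hypothetical stand-in: the real system would use an MLLM for the first two stages and a learned segmentation module for the third.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical conventions, not taken from the paper:
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), x1/y1 exclusive
Size = Tuple[int, int]            # (width, height)
Mask = List[List[int]]            # rows of 0/1 values


@dataclass
class SemanticPrompt:
    text: str  # key semantic cues about the target


@dataclass
class SpatialPrompt:
    box: Box   # coarse location and extent of the target


def run_pipeline(
    size: Size,
    expression: str,
    describe: Callable[[str], SemanticPrompt],
    locate: Callable[[Size, SemanticPrompt], SpatialPrompt],
    segment: Callable[[Size, SemanticPrompt, SpatialPrompt], Mask],
):
    """Chain the stages so each one consumes the previous stage's output."""
    semantic = describe(expression)            # stage 1: semantic understanding
    spatial = locate(size, semantic)           # stage 2: spatial grounding
    mask = segment(size, semantic, spatial)    # stage 3: uses both prompts
    return semantic, spatial, mask


def toy_describe(expression: str) -> SemanticPrompt:
    # Stand-in for an MLLM call: just normalise the expression.
    return SemanticPrompt(text=expression.lower().strip())


def toy_locate(size: Size, semantic: SemanticPrompt) -> SpatialPrompt:
    # Stand-in for spatial reasoning: a keyword rule picks half the image.
    width, height = size
    if "left" in semantic.text:
        return SpatialPrompt(box=(0, 0, width // 2, height))
    return SpatialPrompt(box=(width // 2, 0, width, height))


def toy_segment(size: Size, semantic: SemanticPrompt, spatial: SpatialPrompt) -> Mask:
    # Stand-in for a segmentation module: fill the box with ones.
    width, height = size
    x0, y0, x1, y1 = spatial.box
    return [
        [1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    sem, spa, mask = run_pipeline(
        (4, 2), "The cup on the LEFT", toy_describe, toy_locate, toy_segment
    )
    print(sem.text, spa.box)
    for row in mask:
        print(row)
```

The point of the sketch is only the data flow: the spatial stage is conditioned on the semantic prompt, and the final stage receives both prompts together, mirroring the progressive ordering the abstract describes.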

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27993 [cs.CV] (or arXiv:2603.27993v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27993

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jiachen Li
[v1] Mon, 30 Mar 2026 03:33:10 UTC (15,837 KB)
