Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessWhat is the effect on the Human mind from AI?discuss.huggingface.coUnderstanding Token Classification in NLP: NER, POS Tagging & Chunking ExplainedMedium AIIntroducing ForestFire, a new tree-learning libraryMedium AIBuy Verified Coinbase Accounts - 100% active and safeDev.to AI90% людей используют нейросети как поисковик. И проигрывают.Dev.to AIContinuing the idea of building a one-person unicorn, it is important to recognize that this…Medium AIHow to Build an AI Content Playbook That Actually Protects Your VoiceDev.to AIExploring Early Web Patterns for Modern AI Agent DevelopmentDev.to AIUnderstanding NLP Token Classification : A Beginner-Friendly GuideMedium AIHow Do You Actually Scale High-Throughput LLM Serving in Production with vLLM?Medium AIGemma 4 and the On-Device AI Revolution No One Prepared You ForDev.to AI5 Claude Entrances That Doubled My Workflow EfficiencyDev.to AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessWhat is the effect on the Human mind from AI?discuss.huggingface.coUnderstanding Token Classification in NLP: NER, POS Tagging & Chunking ExplainedMedium AIIntroducing ForestFire, a new tree-learning libraryMedium AIBuy Verified Coinbase Accounts - 100% active and safeDev.to AI90% людей используют нейросети как поисковик. И проигрывают.Dev.to AIContinuing the idea of building a one-person unicorn, it is important to recognize that this…Medium AIHow to Build an AI Content Playbook That Actually Protects Your VoiceDev.to AIExploring Early Web Patterns for Modern AI Agent DevelopmentDev.to AIUnderstanding NLP Token Classification : A Beginner-Friendly GuideMedium AIHow Do You Actually Scale High-Throughput LLM Serving in Production with vLLM?Medium AIGemma 4 and the On-Device AI Revolution No One Prepared You ForDev.to AI5 Claude Entrances That Doubled My Workflow EfficiencyDev.to AI
AI NEWS HUBbyEIGENVECTOREigenvector

Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

arXivby [Submitted on 27 Mar 2026]March 30, 20262 min read1 views
Source Quiz

arXiv:2603.26049v1 Announce Type: cross Abstract: Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored by existing methods. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propos — Kang Liu, Zhuoqi Ma, Siyu Liang, Yunan Li, Xiyue Gao, Chao Liang, Kun Xie, Qiguang Miao

View PDF HTML (experimental)

Abstract:Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored by existing methods. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propose a context-infused vision encoder that models how radiologists integrate clinical context -- including patient history, symptoms, and diagnostic intent -- to guide diagnostic reasoning. We then present a multi-level supervision paradigm that (1) enforces intra- and inter-modal semantic alignment through hybrid-positive contrastive learning, (2) injects diagnostic priors via disease-aware cross-modal representation learning, and (3) leverages radiologists' gaze as probabilistic priors to guide attention toward diagnostically salient regions. Extensive experiments demonstrate that CoGaze consistently outperforms state-of-the-art methods across diverse tasks, achieving up to +2.0% CheXbertF1 and +1.2% BLEU2 for free-text and structured report generation, +23.2% AUROC for zero-shot classification, and +12.2% Precision@1 for image-text retrieval. Code is available at this https URL.

Comments: Code: this https URL

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26049 [cs.CV]

(or arXiv:2603.26049v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26049

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kang Liu [view email] [v1] Fri, 27 Mar 2026 03:37:52 UTC (2,559 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Seeing Like…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 149 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!