Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

arXiv. Submitted on 27 Mar 2026; posted March 30, 2026.

arXiv:2603.26049v1 Announce Type: cross

Authors: Kang Liu, Zhuoqi Ma, Siyu Liang, Yunan Li, Xiyue Gao, Chao Liang, Kun Xie, Qiguang Miao

Abstract: Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining framework for chest X-rays. We first propose a context-infused vision encoder that models how radiologists integrate clinical context -- including patient history, symptoms, and diagnostic intent -- to guide diagnostic reasoning. We then present a multi-level supervision paradigm that (1) enforces intra- and inter-modal semantic alignment through hybrid-positive contrastive learning, (2) injects diagnostic priors via disease-aware cross-modal representation learning, and (3) leverages radiologists' gaze as probabilistic priors to guide attention toward diagnostically salient regions. Extensive experiments demonstrate that CoGaze consistently outperforms state-of-the-art methods across diverse tasks, achieving up to +2.0% CheXbert F1 and +1.2% BLEU-2 for free-text and structured report generation, +23.2% AUROC for zero-shot classification, and +12.2% Precision@1 for image-text retrieval. Code is available at this https URL.
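The abstract names two of its supervision signals, gaze as a probabilistic prior over attention and contrastive learning with more than one positive, without giving their exact form. The sketch below is not the authors' implementation. It assumes the gaze prior is applied as a KL divergence between a normalised gaze heatmap and the model's patch attention, and it substitutes a SupCon-style multi-positive InfoNCE loss for the paper's hybrid-positive loss. The function names, the per-patch gaze weights, and the temperature of 0.07 are all hypothetical choices made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a flat list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gaze_prior_kl(attn_logits, gaze_heatmap):
    """KL(gaze || attention) over image patches.

    Assumption: gaze arrives as non-negative per-patch dwell weights,
    which are normalised into a probability distribution here.
    """
    if len(attn_logits) != len(gaze_heatmap):
        raise ValueError("need exactly one gaze weight per patch")
    total = sum(gaze_heatmap)
    if total <= 0:
        raise ValueError("gaze heatmap must have positive mass")
    prior = [g / total for g in gaze_heatmap]
    log_attn = log_softmax(attn_logits)
    # Patches with zero gaze mass contribute nothing to the divergence.
    return sum(p * (math.log(p) - la) for p, la in zip(prior, log_attn) if p > 0)


def multi_positive_info_nce(similarities, positive_idx, temperature=0.07):
    """Mean negative log-likelihood of several positives for one anchor.

    similarities: similarity of the anchor to every candidate.
    positive_idx: indices of the candidates treated as positives
                  (assumed here: the paired report plus any extra positives).
    """
    if not positive_idx:
        raise ValueError("at least one positive is required")
    log_probs = log_softmax([s / temperature for s in similarities])
    return -sum(log_probs[i] for i in positive_idx) / len(positive_idx)


if __name__ == "__main__":
    # Attention that matches the gaze prior is penalised less than attention
    # that concentrates on a patch the radiologist never looked at.
    gaze = [1.0, 0.0, 0.0, 0.0]
    print(gaze_prior_kl([5.0, 0.0, 0.0, 0.0], gaze))
    print(gaze_prior_kl([0.0, 0.0, 0.0, 5.0], gaze))
    # Two positives (indices 0 and 1) among three candidates.
    print(multi_positive_info_nce([0.9, 0.8, 0.1], [0, 1]))
```

Under these assumptions the two terms would simply be added to the pretraining objective with scalar weights. The KL direction chosen here penalises the model for ignoring regions the radiologist fixated on, while leaving it free to attend elsewhere as well.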

Comments: Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26049 [cs.CV]

(or arXiv:2603.26049v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26049

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Kang Liu

[v1] Fri, 27 Mar 2026 03:37:52 UTC (2,559 KB)

Original source: arXiv, https://arxiv.org/abs/2603.26049
