CT-to-X-ray Distillation Under Tiny Paired Cohorts: An Evidence-Bounded Reproducible Pilot Study
Abstract: Chest X-ray and computed tomography (CT) provide complementary views of thoracic disease, yet most computer-aided diagnosis models are trained and deployed within a single imaging modality. The concrete question studied here is narrower and deployment-oriented: on a patient-level paired chest cohort, can CT act as training-only supervision for a binary disease versus non-disease X-ray classifier without requiring CT at inference time? We study this setting as a cross-modality teacher-student distillation problem and use JDCNet as an executable pilot scaffold rather than as a validated superior architecture. On the original patient-level paired split from a public paired chest imaging cohort, a stripped-down plain cross-modal logit-KD control attains the highest mean result on the four-image validation subset (0.875 accuracy and 0.714 macro-F1), whereas the full module-augmented JDCNet variant remains at 0.750 accuracy and 0.429 macro-F1. To test whether that ranking is a split artifact, we additionally run eight patient-level Monte Carlo resamples with same-case comparisons, stronger mechanism controls based on attention transfer and feature hints, and imbalance-sensitive analyses. Under this resampled protocol, late fusion attains the highest mean accuracy (0.885), same-modality distillation attains the highest mean macro-F1 (0.554) and balanced accuracy (0.660), the plain cross-modal control drops to 0.500 mean balanced accuracy, and neither attention transfer nor feature hints recover a robust cross-modality advantage. The contribution of this study is therefore not a validated CT-to-X-ray architecture, but a reproducible and evidence-bounded pilot protocol that makes the exact task definition, failure modes, ranking instability, and the minimum requirements for future credible CT-to-X-ray transfer claims explicit.
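The plain cross-modal logit-KD control referenced in the abstract follows the standard recipe of distilling a frozen teacher's softened logits into a student of the deployment modality. The sketch below is not the paper's code; all names (CT teacher, X-ray student, temperature T, mixing weight alpha) are illustrative assumptions about how such a control is typically implemented, with CT used only at training time.

```python
# Minimal sketch (assumed, not from the paper) of plain cross-modal logit
# distillation: a frozen CT teacher supervises an X-ray student through
# temperature-softened logits, so inference needs only the X-ray input.
import torch
import torch.nn.functional as F

T = 4.0       # distillation temperature (placeholder value)
alpha = 0.5   # weight between hard-label CE and soft-label KD (placeholder)

def kd_step(student, teacher, xray, ct, labels, optimizer):
    """One training step on a paired (CT, X-ray) case."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(ct)        # CT teacher logits, training-time only
    s_logits = student(xray)          # X-ray student logits, used at deployment

    # Hard-label loss on the binary disease / non-disease target.
    ce = F.cross_entropy(s_logits, labels)

    # Soft-label KD loss: KL between softened distributions, scaled by T^2
    # as in standard logit distillation.
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = alpha * ce + (1.0 - alpha) * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The resampled protocol pairs patient-level Monte Carlo splits with imbalance-sensitive metrics (balanced accuracy, macro-F1) averaged across resamples. A minimal sketch of that evaluation loop, assuming a hypothetical `predict_fn` that retrains on each train split and returns validation predictions, and placeholder split sizes and seeds:

```python
# Sketch (assumed) of patient-level Monte Carlo resampling with
# imbalance-sensitive metrics averaged across resamples.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import GroupShuffleSplit

def resampled_eval(predict_fn, images, labels, patient_ids, n_resamples=8):
    bal_accs, macro_f1s = [], []
    splitter = GroupShuffleSplit(n_splits=n_resamples, test_size=0.2,
                                 random_state=0)
    for train_idx, val_idx in splitter.split(images, labels, groups=patient_ids):
        preds = predict_fn(train_idx, val_idx)  # hypothetical train-and-predict hook
        bal_accs.append(balanced_accuracy_score(labels[val_idx], preds))
        macro_f1s.append(f1_score(labels[val_idx], preds, average="macro"))
    return float(np.mean(bal_accs)), float(np.mean(macro_f1s))
```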
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.29167 [cs.CV]
(or arXiv:2603.29167v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.29167
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Bo Ma [v1] Tue, 31 Mar 2026 02:25:33 UTC (673 KB)