CT-to-X-ray Distillation Under Tiny Paired Cohorts: An Evidence-Bounded Reproducible Pilot Study
Abstract: Chest X-ray and computed tomography (CT) provide complementary views of thoracic disease, yet most computer-aided diagnosis models are trained and deployed within a single imaging modality. The concrete question studied here is narrower and deployment-oriented: on a patient-level paired chest cohort, can CT act as training-only supervision for a binary disease versus non-disease X-ray classifier without requiring CT at inference time? We study this setting as a cross-modality teacher-student distillation problem and use JDCNet as an executable pilot scaffold rather than as a validated superior architecture. On the original patient-level paired split from a public paired chest imaging cohort, a stripped-down plain cross-modal logit-KD control attains the highest mean result on the four-image validation subset (0.875 accuracy and 0.714 macro-F1), whereas the full module-augmented JDCNet variant remains at 0.750 accuracy and 0.429 macro-F1. To test whether that ranking is a split artifact, we additionally run eight patient-level Monte Carlo resamples with same-case comparisons, stronger mechanism controls based on attention transfer and feature hints, and imbalance-sensitive analyses. Under this resampled protocol, late fusion attains the highest mean accuracy (0.885), same-modality distillation attains the highest mean macro-F1 (0.554) and balanced accuracy (0.660), the plain cross-modal control drops to 0.500 mean balanced accuracy, and neither attention transfer nor feature hints recover a robust cross-modality advantage. The contribution of this study is therefore not a validated CT-to-X-ray architecture, but a reproducible and evidence-bounded pilot protocol that makes the exact task definition, failure modes, ranking instability, and the minimum requirements for future credible CT-to-X-ray transfer claims explicit.
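The plain cross-modal logit-KD control referenced in the abstract follows the standard recipe of distilling a frozen teacher's softened logits into a student of the deployment modality. The sketch below is not the paper's code; all names (CT teacher, X-ray student, temperature T, mixing weight alpha) are illustrative assumptions about how such a control is typically implemented, with CT used only at training time.

```python
# Minimal sketch (assumed, not from the paper) of plain cross-modal logit
# distillation: a frozen CT teacher supervises an X-ray student through
# temperature-softened logits, so inference needs only the X-ray input.
import torch
import torch.nn.functional as F

T = 4.0       # distillation temperature (placeholder value)
alpha = 0.5   # weight between hard-label CE and soft-label KD (placeholder)

def kd_step(student, teacher, xray, ct, labels, optimizer):
    """One training step on a paired (CT, X-ray) case."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(ct)        # CT teacher logits, training-time only
    s_logits = student(xray)          # X-ray student logits, used at deployment

    # Hard-label loss on the binary disease / non-disease target.
    ce = F.cross_entropy(s_logits, labels)

    # Soft-label KD loss: KL between softened distributions, scaled by T^2
    # as in standard logit distillation.
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    loss = alpha * ce + (1.0 - alpha) * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The resampled protocol pairs patient-level Monte Carlo splits with imbalance-sensitive metrics (balanced accuracy, macro-F1) averaged across resamples. A minimal sketch of that evaluation loop, assuming a hypothetical `predict_fn` that retrains on each train split and returns validation predictions, and placeholder split sizes and seeds:

```python
# Sketch (assumed) of patient-level Monte Carlo resampling with
# imbalance-sensitive metrics averaged across resamples.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import GroupShuffleSplit

def resampled_eval(predict_fn, images, labels, patient_ids, n_resamples=8):
    bal_accs, macro_f1s = [], []
    splitter = GroupShuffleSplit(n_splits=n_resamples, test_size=0.2,
                                 random_state=0)
    for train_idx, val_idx in splitter.split(images, labels, groups=patient_ids):
        preds = predict_fn(train_idx, val_idx)  # hypothetical train-and-predict hook
        bal_accs.append(balanced_accuracy_score(labels[val_idx], preds))
        macro_f1s.append(f1_score(labels[val_idx], preds, average="macro"))
    return float(np.mean(bal_accs)), float(np.mean(macro_f1s))
```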
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.29167 [cs.CV]
(or arXiv:2603.29167v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.29167
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Bo Ma [v1] Tue, 31 Mar 2026 02:25:33 UTC (673 KB)