Scaling Self-Supervised and Cross-Modal Pretraining for Volumetric CT Transformers

arXiv, March 31, 2026

arXiv:2511.17209v2 (announce type: replace)

Authors: Cris Claessens, Christiaan Viviers, Giacomo D'Amicantonio, Egor Bondarev, Fons van der Sommen


Abstract: We introduce SPECTRE, a fully transformer-based foundation model for volumetric computed tomography (CT). Our Self-Supervised & Cross-Modal Pretraining for CT Representation Extraction (SPECTRE) approach utilizes scalable 3D Vision Transformer architectures and modern self-supervised and vision-language pretraining strategies to learn general-purpose CT representations. Volumetric CT poses unique challenges, such as extreme token scaling, geometric anisotropy, and weak or noisy clinical supervision, that make standard transformer and contrastive learning recipes ineffective out of the box. The framework jointly optimizes a local transformer for high-resolution volumetric feature extraction and a global transformer for whole-scan context modeling, making large-scale 3D attention computationally tractable. Notably, SPECTRE is trained exclusively on openly available CT datasets, demonstrating that high-performing, generalizable representations can be achieved without relying on private data. Pretraining combines DINO-style self-distillation with SigLIP-based vision-language alignment using paired radiology reports, yielding features that are both geometrically consistent and clinically meaningful. Across multiple CT benchmarks, SPECTRE consistently outperforms prior CT foundation models in both zero-shot and fine-tuned settings, establishing SPECTRE as a scalable, open, and fully transformer-based foundation model for 3D medical imaging.
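The abstract names SigLIP-based vision-language alignment but gives no implementation details. As a rough illustration only, the pairwise sigmoid loss from the SigLIP formulation can be sketched in plain Python as below. The function names, the toy embeddings, and the default `temperature=10.0` and `bias=-10.0` are assumptions made for this sketch, not values taken from SPECTRE; how the authors batch CT volumes and reports is not stated here.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def siglip_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over all image/text pairs in a batch.

    Pair (i, i) is treated as a match (label +1); every other
    pair (i, j) is treated as a non-match (label -1).
    temperature and bias are hypothetical fixed values here; in
    practice they are usually learned.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(imgs[i], txts[j]))
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * (temperature * sim + bias))
    # Sum over all pairs, averaged over the batch size.
    return -total / n

# Toy usage: scan embeddings paired with report embeddings.
scans = [[1.0, 0.0], [0.0, 1.0]]
reports_aligned = [[1.0, 0.0], [0.0, 1.0]]
reports_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(siglip_loss(scans, reports_aligned) < siglip_loss(scans, reports_swapped))
```

Unlike a softmax contrastive loss, each image/text pair contributes an independent binary term, so the loss does not require normalizing over the whole batch.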

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2511.17209 [cs.CV] (or arXiv:2511.17209v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2511.17209 (arXiv-issued DOI via DataCite)

Submission history

From: Cris Claessens
[v1] Fri, 21 Nov 2025 12:41:27 UTC (1,335 KB)
[v2] Mon, 30 Mar 2026 12:09:34 UTC (1,345 KB)
