MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis
arXiv:2602.17901v2 Announce Type: replace
Abstract: Self-supervised learning (SSL) and diffusion models have advanced representation learning and image synthesis, but in 3D medical imaging they are still largely used separately for analysis and synthesis, respectively. Unifying them is appealing but difficult, because multi-source data exhibit pronounced style shifts while downstream tasks rely primarily on anatomy, causing anatomical content and acquisition style to become entangled. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework in the variational autoencoder latent space. Our central idea is to treat unified pretraining under heterogeneous multi-center data as a factor identifiability problem, where content should consistently capture anatomy and style should consistently capture appearance. MeDUET addresses this problem through three components. Token demixing provides controllable supervision for factor separation, Mixed Factor Token Distillation reduces factor leakage under mixed regions, and Swap-invariance Quadruplet Contrast promotes factor-wise invariance and discriminability. With these learned factors, MeDUET transfers effectively to both synthesis and analysis, yielding higher fidelity, faster convergence, and better controllability for synthesis, while achieving competitive or superior domain generalization and label efficiency on diverse medical benchmarks. Overall, MeDUET shows that multi-source heterogeneity can serve as useful supervision, with disentanglement providing an effective interface for unifying 3D medical image synthesis and analysis. Our code is available at this https URL.
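The abstract describes disentangling latent tokens into a content factor (anatomy) and a style factor (acquisition appearance), with a swap-invariance objective: exchanging styles between two samples should leave content unchanged. The following is a minimal, hypothetical NumPy sketch of that invariance check only; the function names, the concatenation-based recombination, and the cosine-distance penalty are illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine comparisons."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def swap_styles(content_a, style_a, content_b, style_b):
    """Hypothetical recombination: pair A's anatomy with B's style and
    vice versa. A real model would decode these mixed latents."""
    mixed_ab = np.concatenate([content_a, style_b], axis=-1)
    mixed_ba = np.concatenate([content_b, style_a], axis=-1)
    return mixed_ab, mixed_ba

def content_invariance_loss(content_orig, content_after_swap):
    """Mean cosine distance between content factors before and after a
    style swap; zero when the swap leaves content untouched."""
    c1 = l2_normalize(content_orig)
    c2 = l2_normalize(content_after_swap)
    return float(np.mean(1.0 - np.sum(c1 * c2, axis=-1)))
```

In this toy interface, a perfectly disentangled encoder would re-extract identical content tokens from the style-swapped latents, driving the invariance term to zero, while a separate contrastive term (not sketched here) would keep style factors discriminative across sources.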
Subjects:
Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
Cite as: arXiv:2602.17901 [eess.IV]
(or arXiv:2602.17901v2 [eess.IV] for this version)
https://doi.org/10.48550/arXiv.2602.17901
arXiv-issued DOI via DataCite
Submission history
From: Junkai Liu [view email] [v1] Thu, 19 Feb 2026 23:45:23 UTC (3,515 KB) [v2] Sun, 5 Apr 2026 15:40:36 UTC (5,032 KB)