MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis
arXiv:2602.17901v2 Announce Type: replace
Abstract: Self-supervised learning (SSL) and diffusion models have advanced representation learning and image synthesis, but in 3D medical imaging they are still largely used separately for analysis and synthesis, respectively. Unifying them is appealing but difficult, because multi-source data exhibit pronounced style shifts while downstream tasks rely primarily on anatomy, causing anatomical content and acquisition style to become entangled. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework in the variational autoencoder latent space. Our central idea is to treat unified pretraining under heterogeneous multi-center data as a factor identifiability problem, where content should consistently capture anatomy and style should consistently capture appearance. MeDUET addresses this problem through three components. Token demixing provides controllable supervision for factor separation, Mixed Factor Token Distillation reduces factor leakage under mixed regions, and Swap-invariance Quadruplet Contrast promotes factor-wise invariance and discriminability. With these learned factors, MeDUET transfers effectively to both synthesis and analysis, yielding higher fidelity, faster convergence, and better controllability for synthesis, while achieving competitive or superior domain generalization and label efficiency on diverse medical benchmarks. Overall, MeDUET shows that multi-source heterogeneity can serve as useful supervision, with disentanglement providing an effective interface for unifying 3D medical image synthesis and analysis. Our code is available at this https URL.
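The abstract describes disentangling latent tokens into a content factor (anatomy) and a style factor (acquisition appearance), with a swap-invariance objective: exchanging styles between two samples should leave content unchanged. The following is a minimal, hypothetical NumPy sketch of that invariance check only; the function names, the concatenation-based recombination, and the cosine-distance penalty are illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length for cosine comparisons."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def swap_styles(content_a, style_a, content_b, style_b):
    """Hypothetical recombination: pair A's anatomy with B's style and
    vice versa. A real model would decode these mixed latents."""
    mixed_ab = np.concatenate([content_a, style_b], axis=-1)
    mixed_ba = np.concatenate([content_b, style_a], axis=-1)
    return mixed_ab, mixed_ba

def content_invariance_loss(content_orig, content_after_swap):
    """Mean cosine distance between content factors before and after a
    style swap; zero when the swap leaves content untouched."""
    c1 = l2_normalize(content_orig)
    c2 = l2_normalize(content_after_swap)
    return float(np.mean(1.0 - np.sum(c1 * c2, axis=-1)))
```

In this toy interface, a perfectly disentangled encoder would re-extract identical content tokens from the style-swapped latents, driving the invariance term to zero, while a separate contrastive term (not sketched here) would keep style factors discriminative across sources.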
Subjects:
Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
Cite as: arXiv:2602.17901 [eess.IV]
(or arXiv:2602.17901v2 [eess.IV] for this version)
https://doi.org/10.48550/arXiv.2602.17901
arXiv-issued DOI via DataCite
Submission history
From: Junkai Liu [view email] [v1] Thu, 19 Feb 2026 23:45:23 UTC (3,515 KB) [v2] Sun, 5 Apr 2026 15:40:36 UTC (5,032 KB)