
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

arXiv, March 30, 2026

Authors: Quan Dao, Dimitris Metaxas


Abstract: Transformer architectures, particularly Diffusion Transformers (DiTs), have become widely used in diffusion and flow-matching models due to their strong performance compared to convolutional UNets. However, the isotropic design of DiTs processes the same number of patchified tokens in every block, leading to relatively heavy computation during training. In this work, we introduce a multi-patch transformer design in which early blocks operate on larger patches to capture coarse global context, while later blocks use smaller patches to refine local details. This hierarchical design can reduce computational cost by up to 50% in GFLOPs while achieving good generative performance. In addition, we propose improved designs for time and class embeddings that accelerate training convergence. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our architectural choices. Code is released at this https URL.
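To make the token-count argument in the abstract concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It compares an isotropic stack, where every block sees the fine-patch tokens, with a two-stage stack whose early blocks use larger patches. The latent size, hidden width, depth, the split between coarse and fine blocks, and the function names are all illustrative assumptions. The cost formula is the usual rough estimate for a standard transformer block and ignores patch embedding, any up-sampling between stages, and the conditioning layers.

```python
def num_tokens(latent_size: int, patch_size: int) -> int:
    """Number of tokens when a square latent is split into square patches."""
    if latent_size % patch_size != 0:
        raise ValueError("latent_size must be divisible by patch_size")
    return (latent_size // patch_size) ** 2


def block_cost(tokens: int, hidden_dim: int) -> int:
    """Rough multiply-accumulate count for one standard transformer block.

    Linear layers (QKV, output projection, 4x MLP) cost about 12 * N * d^2;
    the attention score and value products cost about 2 * N^2 * d.
    """
    return 12 * tokens * hidden_dim**2 + 2 * tokens**2 * hidden_dim


# Illustrative settings (assumptions, not values reported in the paper).
LATENT_SIZE = 32     # side length of the latent grid
HIDDEN_DIM = 1152    # transformer width
DEPTH = 28           # total number of blocks
COARSE_BLOCKS = 14   # hypothetical number of early blocks on large patches

fine_tokens = num_tokens(LATENT_SIZE, patch_size=2)    # later blocks
coarse_tokens = num_tokens(LATENT_SIZE, patch_size=4)  # early blocks

isotropic_cost = DEPTH * block_cost(fine_tokens, HIDDEN_DIM)
multi_patch_cost = (
    COARSE_BLOCKS * block_cost(coarse_tokens, HIDDEN_DIM)
    + (DEPTH - COARSE_BLOCKS) * block_cost(fine_tokens, HIDDEN_DIM)
)
cost_ratio = multi_patch_cost / isotropic_cost

print(f"fine tokens: {fine_tokens}, coarse tokens: {coarse_tokens}")
print(f"multi-patch / isotropic cost ratio: {cost_ratio:.2f}")
```

Doubling the patch size cuts the token count by a factor of four, so every block that runs on coarse patches is much cheaper. How much of the total budget is saved depends on how many blocks run at the coarse resolution, which this sketch treats as a free parameter.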

Comments: Accepted at CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26357 [cs.CV] (or arXiv:2603.26357v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26357

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Quan Dao
[v1] Fri, 27 Mar 2026 12:30:10 UTC (5,241 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26357



