
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

arXiv | March 31, 2026 | 10 min read

arXiv:2603.27481v1 Announce Type: cross

Authors: Chongyang Zhao, Mingsong Li, Haodong Lu, Dong Gong


Abstract: Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping the existing ones frozen. However, despite expert isolation, MoE-based continual learners still suffer from forgetting due to routing-drift: old-task tokens become mistakenly attracted to newly added experts, degrading performance on prior tasks. We analyze the failure mode at the token level and reveal the token's dilemma: ambiguous and old tokens in new-task data offer minimal learning benefit yet induce forgetting when routed to new experts, due to their ambiguous routing assignment during training. Motivated by this, we propose LLaVA-DyMoE, a dynamic MoE framework that incrementally expands the MoE with drift-aware token assignment. We characterize token types via their routing score distributions and apply targeted regularization. Specifically, a token-level assignment guidance steers ambiguous and old tokens away from new experts to preserve established routing patterns and alleviate routing-drift, while complementary routing score regularizations enforce expert-group separation and promote new-expert specialization. Extensive experiments demonstrate that our LLaVA-DyMoE effectively mitigates routing-drift-induced forgetting, achieving over a 7% gain in mean final accuracy and a 12% reduction in forgetting compared to baselines. The project page is this https URL.
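The idea of characterizing tokens by their routing scores and steering ambiguous and old tokens away from new experts can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes guidance and regularization applied during training, while the code below substitutes a simple hard mask on router logits. The function names, the `margin` threshold, and the rule of comparing probability mass on the old versus new expert group are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax; tolerates -inf entries."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classify_tokens(logits, n_old, margin=0.2):
    """Label each token 'old', 'new', or 'ambiguous' (hypothetical rule).

    logits: (num_tokens, num_experts) router scores, where the first
    n_old columns belong to frozen old experts and the rest to new ones.
    A token is 'new' or 'old' when one group's routing mass exceeds the
    other's by more than `margin`; otherwise it is 'ambiguous'.
    """
    probs = softmax(logits)
    old_mass = probs[:, :n_old].sum(axis=1)
    new_mass = probs[:, n_old:].sum(axis=1)
    gap = new_mass - old_mass
    return np.where(gap > margin, "new",
                    np.where(gap < -margin, "old", "ambiguous"))

def drift_aware_logits(logits, n_old, margin=0.2):
    """Block new experts for tokens that are not clearly 'new'.

    Assumed simplification: a hard mask stands in for the paper's
    token-level assignment guidance.
    """
    labels = classify_tokens(logits, n_old, margin)
    guided = logits.copy()
    guided[labels != "new", n_old:] = -np.inf
    return guided, labels

# Three tokens, two old experts (columns 0-1), one new expert (column 2).
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 0.0, 4.0],
                   [1.0, 0.0, 1.1]])
guided, labels = drift_aware_logits(logits, n_old=2)
top1 = guided.argmax(axis=1)
```

In this toy input, the third token would have picked the new expert under plain top-1 routing (its highest logit is 1.1 in column 2), even though its total routing mass leans toward the old experts. The mask keeps it on an old expert, which is the kind of established routing pattern the abstract says the method aims to preserve.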

Comments: Accepted at CVPR 2026

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27481 [cs.LG]

(or arXiv:2603.27481v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27481

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chongyang Zhao

[v1] Sun, 29 Mar 2026 02:30:55 UTC (1,023 KB)

Original source: arXiv, https://arxiv.org/abs/2603.27481
