A Step Toward Federated Pretraining of Multimodal Large Language Models

arXiv · March 31, 2026 · 10 min read

arXiv:2603.26786v1 (announce type: cross)

Authors: Baochen Xiong, Yifan Xu, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu

Abstract: The rapid evolution of Multimodal Large Language Models (MLLMs) is bottlenecked by the saturation of high-quality public data, while vast amounts of diverse multimodal data remain inaccessible in privacy-sensitive silos. Federated Learning (FL) offers a promising solution to unlock these distributed resources, but existing research focuses predominantly on fine-tuning, leaving the foundational pre-training phase largely unexplored. In this paper, we formally introduce the Federated MLLM Alignment (Fed-MA) task, a lightweight pre-training paradigm that freezes the vision encoder and LLM while collaboratively training the cross-modal projector. We identify two critical challenges in this setting: (i) parameter interference in aggregating local projectors; and (ii) gradient oscillations in one-pass collaborative SGD. To address these challenges, we propose Fed-CMP, a pioneering framework for federated MLLM pre-training. Fed-CMP employs Canonical Reliability-Aware Aggregation, which constructs a canonical space to decompose client projectors into a shared alignment basis and client-specific coefficients, then performs reliability-weighted fusion to suppress parameter interference. Furthermore, Fed-CMP introduces Orthogonality-Preserved Momentum, which applies momentum to the shared alignment basis via orthogonal projection, accumulating historical optimization directions while preserving geometric structure. We construct four federated pre-training scenarios based on public datasets, and extensive experiments validate that Fed-CMP significantly outperforms existing baselines.
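To make the Fed-MA setting in the abstract concrete, here is a minimal sketch of one server-side round in which only the cross-modal projector is aggregated while the vision encoder and LLM stay frozen. It is not the paper's Canonical Reliability-Aware Aggregation: the abstract does not specify the canonical-space decomposition or how reliability is scored, so a plain weighted average stands in for the fusion step. The names `aggregate_projectors`, `federated_round`, the dictionary keys, and the example weights are all hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def aggregate_projectors(client_projectors: List[Matrix],
                         weights: List[float]) -> Matrix:
    """Weighted average of client projector matrices.

    ASSUMPTION: a plain FedAvg-style fusion used as a stand-in; the paper's
    reliability-weighted fusion in a canonical space is not reproduced here.
    """
    if not client_projectors or len(client_projectors) != len(weights):
        raise ValueError("need one weight per client projector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rows, cols = len(client_projectors[0]), len(client_projectors[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for projector, weight in zip(client_projectors, weights):
        share = weight / total  # normalised contribution of this client
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += share * projector[i][j]
    return fused


def federated_round(global_model: Dict[str, Matrix],
                    client_projectors: List[Matrix],
                    weights: List[float]) -> Dict[str, Matrix]:
    """Replace only the 'projector' entry; frozen parts pass through untouched."""
    updated = dict(global_model)
    updated["projector"] = aggregate_projectors(client_projectors, weights)
    return updated


# Toy example: two clients, a 1x2 projector, hypothetical weights 1 and 3.
model = {"vision_encoder": [[1.0]], "projector": [[0.0, 0.0]], "llm": [[2.0]]}
clients = [[[1.0, 2.0]], [[3.0, 6.0]]]
new_model = federated_round(model, clients, weights=[1.0, 3.0])
print(new_model["projector"])  # [[2.5, 5.0]]
```

In this sketch the frozen components are never transmitted or modified, which is what keeps the setting lightweight: each round only exchanges the projector parameters.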

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26786 [cs.LG] (or arXiv:2603.26786v1 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.26786 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Baochen Xiong. [v1] Wed, 25 Mar 2026 08:16:23 UTC (6,795 KB)
