
High dimensional theory of two-phase optimizers

arXiv, March 31, 2026

arXiv:2603.26954v1, Atish Agarwala

Abstract: The trend towards larger training setups has brought renewed interest in partially asynchronous two-phase optimizers, which optimize locally and then synchronize across workers. Additionally, recent work suggests that the one-worker version of one of these algorithms, DiLoCo, shows promising results as a (synchronous) optimizer. Motivated by these studies, we present an analysis of LA-DiLoCo, a simple member of the DiLoCo family, on a high-dimensional linear regression problem. We show that the one-worker variant, LA, provides a different tradeoff between signal and noise than SGD, which is beneficial in many scenarios. We also show that the multi-worker version generates more noise than the single-worker version, but that this additional noise can be ameliorated by an appropriate choice of hyperparameters. We conclude with an analysis of SLA (LA with momentum) and show that stacking two momentum operators gives an opportunity for acceleration via a non-linear transformation of the "effective" Hessian spectrum, which is maximized for Nesterov momentum. Altogether, our results show that two-phase optimizers represent a fruitful new paradigm for understanding and improving training algorithms.
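The abstract does not spell out the LA-DiLoCo update rule, so the following is only a minimal sketch of a generic two-phase scheme on a small linear regression problem. It assumes plain SGD as the inner (local) optimizer and a Lookahead-style interpolation toward the worker average as the outer (synchronization) step. All function names and hyperparameter values (`two_phase_train`, `inner_lr`, `outer_lr`, `inner_steps`, and so on) are illustrative assumptions, not taken from the paper.

```python
import random


def sample_batch(w_star, batch_size, noise_std, rng):
    """Draw (x, y) pairs from a noisy linear model y = x . w_star + noise."""
    batch = []
    for _ in range(batch_size):
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_std)
        batch.append((x, y))
    return batch


def sgd_step(w, batch, lr):
    """One SGD step on the batch mean of 0.5 * (x . w - y)^2."""
    grad = [0.0] * len(w)
    for x, y in batch:
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        for i, xi in enumerate(x):
            grad[i] += residual * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def two_phase_train(w_star, num_workers=4, outer_steps=60, inner_steps=5,
                    inner_lr=0.05, outer_lr=0.5, batch_size=8,
                    noise_std=0.1, seed=0):
    """Assumed two-phase loop: local SGD, then a Lookahead-style outer step."""
    rng = random.Random(seed)
    w = [0.0] * len(w_star)
    for _ in range(outer_steps):
        # Phase 1: each worker starts from the shared weights and runs
        # inner_steps of SGD on its own freshly sampled batches.
        local_weights = []
        for _ in range(num_workers):
            w_local = list(w)
            for _ in range(inner_steps):
                batch = sample_batch(w_star, batch_size, noise_std, rng)
                w_local = sgd_step(w_local, batch, inner_lr)
            local_weights.append(w_local)
        # Phase 2: synchronize by averaging the workers, then move the
        # shared weights a fraction outer_lr of the way toward the average.
        avg = [sum(col) / num_workers for col in zip(*local_weights)]
        w = [wi + outer_lr * (ai - wi) for wi, ai in zip(w, avg)]
    return w


def squared_error(w, w_star):
    """Squared Euclidean distance between w and the true weights."""
    return sum((a - b) ** 2 for a, b in zip(w, w_star))


w_star = [1.0, -2.0, 0.5, 3.0, -1.5]
w_multi = two_phase_train(w_star, num_workers=4)
w_single = two_phase_train(w_star, num_workers=1)
print(squared_error(w_multi, w_star), squared_error(w_single, w_star))
```

In this sketch, setting `num_workers=1` gives a single-worker loop in the spirit of the LA variant mentioned in the abstract, while `num_workers > 1` adds the averaging step across workers. The toy problem is low-dimensional and is meant only to show the control flow, not to reproduce the paper's high-dimensional analysis.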

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)

Cite as: arXiv:2603.26954 [cs.LG]

(or arXiv:2603.26954v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26954

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Atish Agarwala
[v1] Fri, 27 Mar 2026 19:50:12 UTC (143 KB)
